CN115509848A - Log analysis method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN115509848A
Authority
CN
China
Prior art keywords
log
logs
template
label
length
Legal status
Pending
Application number
CN202210986726.1A
Other languages
Chinese (zh)
Inventor
黄丹
潘缘
Current Assignee
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Application filed by China Telecom Corp Ltd
Priority to CN202210986726.1A
Publication of CN115509848A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting

Abstract

The embodiments of the invention provide a log analysis method, a log analysis device, an electronic device and a storage medium. The method comprises: obtaining a plurality of logs and determining the length and the labels of each log; classifying the logs according to their lengths and labels; generating a log template from logs of the same type; detecting the log template; and raising an anomaly alarm if the log template is detected to be abnormal. Because similarity is determined from the log length and the log labels, logs are classified by that similarity, and a log template is generated for each type of log, repeated inspection of similar logs is avoided and inspection time is greatly reduced. Detecting the log templates can surface potential anomalies and help locate them quickly, which improves log analysis efficiency and strengthens the digital capability of the platform.

Description

Log analysis method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of log management technologies, and in particular, to a log analysis method, a log analysis device, an electronic device, and a computer-readable storage medium.
Background
Logs record how a program runs. They make it easy to observe a program's running state and to analyze its execution.
Telecom operators now run more and more data application platforms, these platforms are becoming increasingly complex, and the volume of logs they generate is massive. When a fault occurs, locating the anomaly in these logs manually is very costly, for two main reasons. First, log formats vary widely and are hard to categorize by hand; traditional rule-based log classification requires configuring complex rules and regular expressions and generalizes poorly. Second, the log volume is large and alarms are numerous, so the anomalies that deserve attention are hard to locate, and irrelevant logs easily mask the real problem.
Disclosure of Invention
In view of the above problems, embodiments of the present invention are proposed to provide a log analysis method that overcomes or at least partially solves the above problems.
The embodiment of the invention also provides a log analysis device, electronic equipment and a storage medium, so as to ensure the implementation of the method.
In order to solve the above problem, an embodiment of the present invention discloses a log analysis method, where the method includes:
acquiring a plurality of logs, and determining the length and the label corresponding to each log;
classifying the plurality of logs according to the corresponding length and the corresponding label of each log;
generating a log template according to the logs of the same type;
detecting the log template;
and if the log template is detected to be abnormal, raising an anomaly alarm.
Optionally, the determining the length and the label corresponding to each log includes:
for each log, dividing the log into a plurality of labels according to separators/spaces; each log corresponds to a plurality of labels;
and determining the length corresponding to each log according to the plurality of labels corresponding to each log.
Optionally, the classifying the plurality of logs according to the length and the label corresponding to each log includes:
respectively inputting the logs to a preset prefix tree; the prefix tree comprises length nodes and label nodes connected with the length nodes;
classifying each log to a target length node matched with the corresponding length;
and classifying the logs to target label nodes matched with the corresponding labels.
Optionally, the label corresponding to each log includes a plurality of labels; the target length node is connected with at least one label node; the classifying the logs into target label nodes matched with the corresponding labels comprises:
determining a label sequence corresponding to each log classified to the target length node;
according to the corresponding label sequence, matching each label corresponding to each log with the at least one label node in sequence;
and when a target label node identical to one label is matched, classifying the log corresponding to the label into the target label node.
Optionally, each log has a corresponding log number; the target label node is linked with a log group list, the log group list comprises a plurality of log groups, and each log group comprises a log event and a log number set;
the generating of the log template according to the logs of the same type includes:
determining the logs classified to the same target label node as logs of the same type;
for each log of the same type, calculating the similarity between the log and the log event of each log group to obtain a plurality of similarity values; each log corresponds to a plurality of similarity values;
determining the maximum similarity value corresponding to each log from a plurality of similarity values corresponding to each log;
if the maximum similarity value is greater than or equal to a first threshold, determining that the log corresponding to the maximum similarity value matches a log group;
determining the logs matched to the log group as first-class logs;
and adding the log number corresponding to each first type log to the log number set in the matched log group to obtain a log template.
Optionally, after determining the maximum similarity value corresponding to each log from the multiple similarity values corresponding to each log, the method further includes:
if the maximum similarity value is smaller than the first threshold, determining that the log corresponding to the maximum similarity value does not match any log group;
determining the logs which are not matched with the log group as second-type logs;
respectively creating new log groups in the log group lists corresponding to the second type of logs;
and adding each second type of log to the log event in the corresponding new log group, and adding the log number corresponding to each second type of log to the log number set in the corresponding new log group to obtain a log template.
Optionally, before the detecting the log template, the method further includes:
acquiring all log templates;
sequentially computing the longest common subsequence (LCS) between every two log templates;
and if the longest common subsequence is greater than or equal to a second threshold, merging the two log templates concerned, until the last pair of log templates has been compared, to obtain updated log templates.
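The claim does not say how two templates are merged, or whether the second threshold applies to the raw LCS length or to a ratio. A minimal Python sketch, assuming templates are label lists paired with sets of log numbers, that the threshold is compared with the raw LCS length as the claim states, and that merging keeps the earlier template's labels and pools the log numbers (the names `lcs_length` and `merge_templates` are hypothetical):

```python
from typing import List, Set, Tuple

Template = Tuple[List[str], Set[int]]  # (template labels, log numbers)


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two label sequences (classic DP)."""
    # dp[i][j] holds the LCS length of a[:i] and b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def merge_templates(templates: List[Template], threshold: int) -> List[Template]:
    """Greedy pass: fold each template into the first kept template whose LCS with it reaches the threshold."""
    merged: List[Template] = []
    for labels, ids in templates:
        for kept_labels, kept_ids in merged:
            if lcs_length(kept_labels, labels) >= threshold:
                kept_ids.update(ids)  # assumption: merging = pooling the log numbers
                break
        else:  # no kept template was similar enough: keep this one as-is
            merged.append((list(labels), set(ids)))
    return merged
```

A single greedy pass depends on input order; repeating it until no further merge happens is one way to approximate comparing every pair.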
Optionally, the detecting the log template includes:
acquiring all updated log templates and the log-volume data corresponding to each of them;
aggregating the log-volume data of all the updated log templates based on the chi-square distribution to obtain aggregated log data;
detecting the aggregated log data;
and if the aggregated log data is detected to be abnormal, separately checking the log-volume data of each updated log template to determine whether an abnormal log template exists.
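The claim leaves the chi-square aggregation unspecified. One plausible reading, sketched below in Python as an assumption rather than the patented procedure: standardize each template's current log volume against its history, sum the squared z-scores (approximately chi-square with one degree of freedom per template if volumes are roughly normal and independent), and inspect individual templates only when that sum exceeds the chi-square critical value. The quantile uses the Wilson-Hilferty approximation so that only the standard library is needed; `detect_abnormal_templates` and `alpha` are hypothetical names.

```python
from statistics import NormalDist, mean, stdev
from typing import Dict, List


def chi2_quantile(p: float, k: int) -> float:
    """Wilson-Hilferty approximation to the p-quantile of a chi-square with k degrees of freedom."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * k)
    return k * (1.0 - c + z * c ** 0.5) ** 3


def detect_abnormal_templates(history: Dict[str, List[int]],
                              current: Dict[str, int],
                              alpha: float = 0.01) -> List[str]:
    """history: past per-window log volumes per template (at least 2 windows each);
    current: log volume per template in the window being checked."""
    z: Dict[str, float] = {}
    for template, counts in history.items():
        sd = stdev(counts) or 1.0  # guard against a perfectly flat history
        z[template] = (current.get(template, 0) - mean(counts)) / sd
    # Aggregate step: one statistic covering all templates together
    q = sum(v * v for v in z.values())
    if q <= chi2_quantile(1.0 - alpha, len(z)):
        return []  # aggregate volume looks normal, skip per-template checks
    # Per-template step: two-sided z test on each template's volume
    cutoff = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return [t for t, v in z.items() if abs(v) > cutoff]
```

The two-level design mirrors the claim: one cheap aggregate test per window, with per-template checks only when the aggregate is flagged.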
Optionally, if the log template is detected to be abnormal, performing an abnormal alarm, including:
and if an abnormal log template exists, raising an anomaly alarm based on the abnormal log template.
The embodiment of the invention also discloses a log analysis device, which comprises:
the log acquisition module is used for acquiring a plurality of logs and determining the length and the label corresponding to each log;
the log classification module is used for classifying the logs according to the corresponding lengths and labels of the logs;
the log template generating module is used for generating a log template according to the logs of the same type;
the log template detection module is used for detecting the log template;
and the anomaly alarm module is used for raising an anomaly alarm if the log template is detected to be abnormal.
Optionally, the log acquisition module includes:
the label dividing submodule is used for dividing each log into a plurality of labels according to separators/spaces; each log corresponds to a plurality of labels;
and the length determining submodule is used for determining the length corresponding to each log according to the plurality of labels corresponding to each log.
Optionally, the log classification module includes:
the log input submodule is used for respectively inputting the logs into a preset prefix tree; the prefix tree comprises length nodes and label nodes connected with the length nodes;
a first classification submodule, configured to classify each log into a target length node matched with the corresponding length;
and the second classification submodule is used for classifying the logs to target label nodes matched with the corresponding labels.
Optionally, the label corresponding to each log includes a plurality of labels; the target length node is connected with at least one label node; the second classification submodule comprises:
a label sequence determining unit, configured to determine, for each log classified to the target length node, a label sequence corresponding to each log;
a label matching unit, configured to match, according to the corresponding label sequence, each label corresponding to each log with the at least one label node in sequence;
and the log classifying unit is used for classifying the log corresponding to one label into the target label node when the target label node which is the same as the label is matched.
Optionally, each log has a corresponding log number; the target label node is linked with a log group list, the log group list comprises a plurality of log groups, and each log group comprises a log event and a log number set;
the log template generation module comprises:
the log type determining submodule is used for determining the logs classified to the same target label node as the logs of the same type;
the similarity calculation submodule is used for calculating, for each log of the same type, the similarity between the log and the log event of each log group to obtain a plurality of similarity values; each log corresponds to a plurality of similarity values;
the maximum similarity value determining submodule is used for determining the maximum similarity value corresponding to each log from a plurality of similarity values corresponding to each log;
the first log group determining submodule is used for determining that the log corresponding to the maximum similarity value is matched with the log group if the maximum similarity value is larger than or equal to a first threshold value;
the first-class log determining submodule is used for determining the logs matched with the log group as first-class logs;
and the first adding submodule is used for adding the log number corresponding to each first type of log to the log number set in the matched log group to obtain the log template.
Optionally, the log template generating module further includes:
a second log group determining submodule, configured to determine that, if the maximum similarity value is smaller than a first threshold, a log corresponding to the maximum similarity value is not matched with a log group;
the second type log determining submodule is used for determining the logs which are not matched with the log group as second type logs;
the creating submodule is used for creating new log groups in the log group lists corresponding to the second type of logs respectively;
and the second adding submodule is used for adding each second-type log to the log event in the corresponding new log group, and adding the log number of each second-type log to the log number set in that new log group, to obtain the log template.
Optionally, the device further includes the following modules, which operate before the log template is detected:
the log template acquisition module is used for acquiring all log templates;
the longest common subsequence matching module is used for sequentially computing the longest common subsequence between every two log templates;
and the log template updating module is used for, if the longest common subsequence is greater than or equal to a second threshold, merging the two log templates concerned, until the last pair of log templates has been compared, to obtain updated log templates.
Optionally, the log template detection module includes:
the log data acquisition submodule is used for acquiring all updated log templates and the log-volume data corresponding to each of them;
the log data aggregation submodule is used for aggregating the log-volume data of all the updated log templates based on the chi-square distribution to obtain aggregated log data;
the first detection submodule is used for detecting the aggregated log data;
and the second detection submodule is used for, if the aggregated log data is detected to be abnormal, separately checking the log-volume data of each updated log template to determine whether a log template with an abnormal volume exists.
Optionally, the anomaly alarm module includes:
and the anomaly alarm submodule is used for raising an anomaly alarm based on the abnormal log template if an abnormal log template exists.
The embodiments of the invention also disclose an electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another via the communication bus;
the memory is used for storing a computer program;
the processor is configured to implement the method according to the embodiment of the present invention when executing the program stored in the memory.
Also disclosed are one or more computer-readable media having instructions stored thereon, which, when executed by one or more processors, cause the processors to perform a method according to an embodiment of the invention.
Compared with the prior art, the embodiment of the invention has the following advantages:
In the embodiments of the invention, a plurality of logs are obtained and the length and labels of each log are determined; the logs are then classified according to their lengths and labels, a log template is generated from logs of the same type, and the log template is detected, with an anomaly alarm raised if it is found to be abnormal. Determining similarity from log length and labels, classifying logs by that similarity, and generating one template per type of log avoids repeated inspection of similar logs and greatly reduces inspection time, while detecting the templates surfaces potential anomalies and helps locate them quickly. This improves log analysis efficiency and strengthens the digital capability of the platform.
Drawings
To explain the technical solutions in the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart illustrating steps of a log analysis method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a connection between a log analysis system and a machine provided by an embodiment of the invention;
FIG. 3 is a schematic structural diagram of a prefix tree provided by an embodiment of the present invention;
FIG. 4 is a flowchart of secondary classification based on the longest common subsequence (LCS) provided by an embodiment of the present invention;
FIG. 5 is a flowchart of aggregation based on the chi-square distribution provided by an embodiment of the present invention;
FIG. 6 is a block diagram of a log analysis apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
Referring to fig. 1, a flowchart illustrating steps of a log analysis method according to an embodiment of the present invention is shown, where the method specifically includes the following steps:
Step 101, obtaining a plurality of logs, and determining the length and the labels corresponding to each log.
In the embodiment of the invention, the method can be applied to a log analysis system that is communicatively connected to a plurality of machines. In practice, when a machine fault is detected, the log analysis system can fetch the logs of each machine in turn, machine by machine, and analyze each machine's logs separately to identify the faulty machine, so that an anomaly alarm can be raised for it.
Referring to fig. 2, which shows the connection between the log analysis system and the machines according to an embodiment of the present invention: the log analysis system is communicatively connected to machine 1 through machine N, where N is a positive integer. When a machine fault is detected, the log analysis system obtains the logs of machine 1 and analyzes them. If machine 1 is found to be normal, the system obtains and analyzes the logs of machine 2, and so on, until some machine is found to be abnormal. That machine is determined to be the faulty machine, and an anomaly alarm is raised for it.
As shown in fig. 2, after acquiring the logs of a machine, the log analysis system first preprocesses them; preprocessing determines the length and the labels of each log. A label may also be called a token; the term is used here in its text-processing sense, a unit produced by splitting the log, rather than in its computer-authentication sense.
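The machine-by-machine search described for fig. 2 amounts to a linear scan that stops at the first abnormal machine. A minimal Python sketch; `fetch_logs` and `is_abnormal` are hypothetical callables standing in for log collection and for the analysis of steps 101 to 103:

```python
from typing import Callable, Iterable, List, Optional


def find_faulty_machine(machines: Iterable[str],
                        fetch_logs: Callable[[str], List[str]],
                        is_abnormal: Callable[[List[str]], bool]) -> Optional[str]:
    """Analyze machines in order and return the first one whose logs look abnormal."""
    for machine in machines:
        logs = fetch_logs(machine)  # logs are fetched only for machines actually examined
        if is_abnormal(logs):
            return machine          # treated as the faulty machine; the alarm targets it
    return None                     # every machine was analyzed as normal
```

Because logs are fetched lazily, machines after the faulty one are never pulled.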
In an alternative embodiment of the present invention, step 101 may comprise the following sub-steps:
substep S11, for each log, dividing the log into a plurality of labels according to separators/spaces; each log corresponds to a plurality of labels;
and a substep S12, determining the length corresponding to each log according to the plurality of labels corresponding to each log.
In a specific implementation, a log usually contains content such as strings, characters, numbers, Base64-encoded data and addresses, separated by separators or spaces. The log analysis system can split each log at these separators/spaces into individual labels, so each log corresponds to a plurality of labels. For example, log A, [Receive from node 4], splits into the labels [Receive], [from], [node] and [4].
After the labels are obtained, the length of each log is determined from its labels; specifically, the length is the number of labels the log was split into. For example, log A, [Receive from node 4], yields 4 labels, so its length is 4.
Traceback logs, however, differ from ordinary logs. An ordinary log can be split into labels at separators/spaces, but a traceback log has a multi-line stack structure that cannot be split this way and is best inspected line by line. The embodiment of the invention therefore treats each line of a traceback log as one label, so that similar traceback errors are grouped together, which makes problems easier to find.
Step 102, classifying the plurality of logs according to the length and the labels corresponding to each log.
Drawing on accumulated operational experience, and to improve classification accuracy, the embodiment of the invention classifies logs in a streaming fashion with a two-stage model. The first stage is a pre-classification; the second stage reclassifies its result, merging logs that the pre-classification did not separate well, to produce a final classification that meets expectations.
The first stage uses an improved prefix tree whose depth is preset, usually to a small value, so matching is very fast. In this prefix-tree-based first stage, the logs are classified by their lengths and labels to obtain the pre-classification result.
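The splitting rule can be sketched in Python as follows. The exact separator set is not given in the text, so the pattern below (whitespace, commas and semicolons) is an assumption, as is recognizing a traceback by its leading `Traceback` line; `to_labels` and `log_length` are hypothetical names.

```python
import re
from typing import List

# Assumed separator set: runs of whitespace, commas or semicolons
SEPARATORS = re.compile(r"[\s,;]+")


def to_labels(log: str) -> List[str]:
    """Split one log into labels; a traceback log yields one label per line."""
    if log.startswith("Traceback"):  # assumed traceback detection rule
        return [line.strip() for line in log.splitlines() if line.strip()]
    return [tok for tok in SEPARATORS.split(log) if tok]  # drop empty edge tokens


def log_length(log: str) -> int:
    """Length of a log = number of labels it splits into."""
    return len(to_labels(log))
```

With this sketch, `to_labels("Receive from node 4")` gives `['Receive', 'from', 'node', '4']`, so `log_length` returns 4, matching log A.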
In an alternative embodiment of the present invention, step 102 may comprise the following sub-steps:
a substep S21 of inputting the plurality of logs into a preset prefix tree respectively; the prefix tree comprises length nodes and label nodes connected with the length nodes;
substep S22, classifying each log to a target length node matched with the corresponding length;
and a substep S23, classifying the logs to target label nodes matched with the corresponding labels.
Referring to fig. 3, a schematic structural diagram of a prefix tree provided in the embodiment of the present invention is shown, where the prefix tree may include a Root Node, an Internal Node, and a Leaf Node, where the Internal Node may include a length Node and a label Node.
Once the length and labels of each log are determined, the logs are fed into the preset prefix tree one at a time. In the prefix tree, each length node corresponds to one length and is connected to at least one label node, and each label node corresponds to one label. A log enters the tree at the root node; the second layer is the length layer, where logs of the same length enter the same length node. The next node is then chosen by label, and only logs that match the same label are classified into the corresponding label node.
In one example, log A is [Receive from node 4]. As shown in fig. 3, since log A has length 4, the target length node matching it is "Length: 4", so log A is classified into "Length: 4". Then, from the labels {[Receive], [from], [node], [4]} of log A, the target label node matching its labels is determined to be "Receive", and log A is classified into that node. The example is only intended to help a person skilled in the art better understand the embodiments of the invention; the invention is not limited thereto.
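The length layer of the prefix tree behaves like a dictionary keyed by label count. A minimal Python sketch of that first routing step only, with each log given as a list of labels; `group_by_length` is a hypothetical name:

```python
from collections import defaultdict
from typing import Dict, List


def group_by_length(logs: List[List[str]]) -> Dict[int, List[List[str]]]:
    """Route each log (a list of labels) to the length node for its label count."""
    length_nodes: Dict[int, List[List[str]]] = defaultdict(list)
    for labels in logs:
        length_nodes[len(labels)].append(labels)  # same length -> same length node
    return dict(length_nodes)
```

Keying by length is what keeps matching cheap: a log is only ever compared with logs that have the same number of labels.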
In an optional embodiment of the present invention, the label corresponding to each log includes a plurality of labels; the target length node is connected with at least one label node; the substep S23 may comprise the substeps of:
substep S231, determining, for each log classified to the target length node, a label order corresponding to each log;
substep S232, sequentially matching each label corresponding to each log with the at least one label node according to the corresponding label sequence;
and a substep S233, when a target label node identical to one of the labels is matched, classifying the log corresponding to the one of the labels into the target label node.
Because each log is split into multiple labels, the label order of each log classified into the target length node can be determined. Since the target length node is connected to at least one label node, the labels of each log are matched against those label nodes one by one in label order. As soon as a log matches a target label node identical to one of its labels, label matching for that log stops and the log is classified into that target label node.
In one example, log A is [Receive from node 4]. As shown in fig. 3, log A is classified into the target length node "Length: 4", which is connected to two label nodes, "Send" and "Receive". Log A is split into 4 labels, [Receive], [from], [node] and [4], so for log A in the target length node "Length: 4", its label order is determined first: [Receive] → [from] → [node] → [4]. Each label of log A is then matched in that order against "Send" and "Receive", and log A is classified into the first target label node identical to one of its labels. Specifically, the first label [Receive] is compared with the first label node "Send" and the match fails; it is then compared with the second label node "Receive" and the match succeeds, so label matching for log A stops and log A is classified into the target label node "Receive". The example is only intended to help a person skilled in the art better understand the embodiments of the invention; the invention is not limited thereto.
Step 103, generating a log template according to the logs of the same type.
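The in-order matching within one length node can be sketched in Python as below. Label nodes are modeled as a dictionary from label to the logs classified there. What happens when no label matches is not stated in the text, so opening a node keyed by the log's first label is an assumption; `match_label_node` and `classify_label` are hypothetical names.

```python
from typing import Dict, List, Optional


def match_label_node(labels: List[str], label_nodes: Dict[str, list]) -> Optional[str]:
    """Return the first label, in label order, that equals an existing label node."""
    for label in labels:          # e.g. [Receive] -> [from] -> [node] -> [4]
        if label in label_nodes:  # checked against every label node of this length node
            return label          # matching stops at the first hit
    return None


def classify_label(labels: List[str], label_nodes: Dict[str, list]) -> str:
    """Classify one log into a label node and return that node's label."""
    target = match_label_node(labels, label_nodes)
    if target is None:
        # Assumption: nothing matched, so open a new node for the first label
        target = labels[0]
        label_nodes[target] = []
    label_nodes[target].append(labels)
    return target
```

For log A under "Length: 4" with nodes "Send" and "Receive", `classify_label` returns `"Receive"` on the first label, as in the example.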
After the plurality of logs are classified, different types of logs can be obtained, so that the log analysis system can generate a log template according to the same type of logs.
In an optional embodiment of the present invention, each log has a corresponding log number; the target label node is linked with a log group list, the log group list comprises a plurality of log groups, and each log group comprises a log event and a log number set; step 103 may comprise the following sub-steps:
substep S31, determining the logs classified to the same target label node as the logs of the same type;
substep S32, for each log of the same type, calculating the similarity between the log and the log event of each log group to obtain a plurality of similarity values; each log corresponds to a plurality of similarity values;
substep S33, determining a maximum similarity value corresponding to each log from the plurality of similarity values corresponding to each log;
substep S34, if the maximum similarity value is greater than or equal to a first threshold, determining that the log corresponding to the maximum similarity value matches the log group;
substep S35, determining the log matched to the log group as a first type log;
and a substep S36, adding the log number corresponding to each first type log to the log number set in the matched log group to obtain a log template.
According to the length and the label corresponding to each log, after each log is classified to the corresponding length node and the corresponding label node, the logs classified to the same target label node can be determined to be the logs of the same type. While for logs of the same type, further refinement of the classification is required.
As shown in FIG. 3, the target tag Node "Receive" is linked with a Log group list (Alist of Log Groups) located in a Leaf Node (Leaf Node). The Log Group list comprises a plurality of Log groups (Log groups), and each Log Group comprises a Log Event (Log Event) and a Log number set (Log IDs).
The log events can be preset, and the embodiment of the invention can fill variables such as numbers, base64 codes, address codes and the like into < + >, and keep constants, so that the convergence rate of the log template can be greatly improved, and the log template which needs a long period (such as one week) to be stable and available originally can be used immediately after being accessed.
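Variable masking can be sketched with regular expressions. The patterns below for numbers, IPv4 addresses, hex addresses and Base64-like strings are illustrative assumptions, not the patent's rules, and `mask_variables` is a hypothetical name; a real deployment would tune them to its own log formats.

```python
import re
from typing import List

WILDCARD = "<+>"

# Illustrative variable patterns (assumed); a label matching none of them is a constant
VARIABLE_PATTERNS = [
    re.compile(r"^-?\d+(\.\d+)?$"),                     # integers and decimals
    re.compile(r"^\d{1,3}(\.\d{1,3}){3}(:\d+)?$"),      # IPv4 address, optional port
    re.compile(r"^0x[0-9a-fA-F]+$"),                    # hexadecimal addresses
    re.compile(r"^(?=.*\d)[A-Za-z0-9+/]{16,}={0,2}$"),  # long Base64-like strings containing a digit
]


def mask_variables(labels: List[str]) -> List[str]:
    """Replace variable labels with the wildcard and keep constant labels."""
    return [WILDCARD if any(p.match(label) for p in VARIABLE_PATTERNS) else label
            for label in labels]
```

Requiring a digit in the Base64 pattern is a heuristic to avoid masking long ordinary words.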
In a specific implementation, for each log of the same type, a similarity simSeq between each log and the log event of each log group may be calculated, so as to obtain a plurality of similarity values. Wherein each log may correspond to a plurality of similarity values. And then determining the maximum similarity value corresponding to each log from a plurality of similarity values corresponding to each log. Since the similarity value is the similarity between the log and the log group, if the maximum similarity value of a log is greater than or equal to the first threshold st, it may be determined that the log matches to an appropriate log group, that is, the log already matches to a log group that satisfies the similarity requirement. For convenience of description, the log matched to the log group may be determined as the first type log, and then the log number corresponding to each first type log may be added to the log number set in the matched log group, so that the log template may be obtained.
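The text names simSeq but does not define it. A Python sketch assuming the definition used by the Drain log parser, which this prefix-tree design closely resembles: the fraction of positions at which the log's labels equal the log event's labels (logs under one length node have equal length). `sim_seq` and `best_group` are hypothetical names, and log groups are modeled as dicts with `event` and `ids` keys.

```python
from typing import Dict, List, Optional, Tuple


def sim_seq(event: List[str], labels: List[str]) -> float:
    """Share of positions where the log event and the log have the same label."""
    if len(event) != len(labels):
        raise ValueError("logs under one length node must have equal length")
    same = sum(1 for e, x in zip(event, labels) if e == x)
    return same / len(labels)


def best_group(groups: List[Dict], labels: List[str], st: float) -> Tuple[Optional[Dict], float]:
    """Return (matched group, max similarity); the group is None when max similarity < st."""
    if not groups:
        return None, 0.0
    scored = [(sim_seq(g["event"], labels), g) for g in groups]
    top_sim, top_group = max(scored, key=lambda pair: pair[0])
    return (top_group if top_sim >= st else None), top_sim
```

For a first-type log, the caller would then record the match with `group["ids"].add(log_id)`, as in sub-step S36.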
In addition, if the similarity between a log and the log events of the log groups were calculated directly, the log templates would take a long time to converge. The converged templates would also depend heavily on the order in which the logs arrive, so the result would not be robust. Therefore, before the similarity is calculated, the variables in the log are identified in advance and replaced with <*>, while the constants are retained. In other words, among the labels of a log, those that differ from the log event of the log group are replaced with <*>, and those that are identical are kept. This greatly speeds up convergence of the log templates and yields more robust templates.
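The pre-masking step can be sketched with regular expressions. The patterns below are hypothetical stand-ins for the variable kinds the text names (numbers, base64 codes, addresses), and real rules would need tuning. For example, here any token of 16 or more base64 characters that contains a digit is treated as base64. `merge_event` illustrates the stated rule that labels differing from the log event become <*>.

```python
import re

WILDCARD = "<*>"

# Hypothetical rules for the variable kinds named in the text.
VARIABLE_PATTERNS = [
    re.compile(r"(\d{1,3}\.){3}\d{1,3}(:\d+)?"),        # IPv4 address, optional port
    re.compile(r"0x[0-9a-fA-F]+"),                      # hexadecimal memory address
    re.compile(r"[+-]?\d+(\.\d+)?"),                    # integer or decimal number
    re.compile(r"(?=.*\d)[A-Za-z0-9+/]{16,}={0,2}"),    # base64-like blob containing a digit
]


def mask_variables(labels):
    """Replace labels that look like variables with <*>; keep constants unchanged."""
    return [WILDCARD if any(p.fullmatch(t) for p in VARIABLE_PATTERNS) else t
            for t in labels]


def merge_event(event, labels):
    """Keep labels identical to the log event; replace differing positions with <*>."""
    return [e if e == t else WILDCARD for e, t in zip(event, labels)]
```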
In an optional embodiment of the present invention, after the sub-step S33, the following sub-steps may be further included:
substep S41, if the maximum similarity value is smaller than the first threshold, determining that the log corresponding to that maximum similarity value does not match any log group;
substep S42, determining each log that does not match a log group as a second-type log;
substep S43, creating a new log group in the log group list corresponding to each second-type log;
and substep S44, adding each second-type log to the log event of its new log group, and adding the log number of each second-type log to the log number set of that new log group, to obtain a log template.
Once the maximum similarity value of each log has been determined, recall that each similarity value measures the similarity between the log and a log group. If a log's maximum similarity value is smaller than the first threshold st, the log is determined not to match a suitable log group, that is, no log group satisfies the similarity requirement. For convenience of description, such a log is called a second-type log. A new log group is then created in the log group list corresponding to each second-type log. The second-type log is added as the log event of the new log group, and its log number is added to the new group's log number set. In other words, the log event of the new log group is the second-type log itself, and the log number set of the new group contains only that log's number. The prefix tree is updated with the new log group to obtain the log template.
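Combining both branches, the following hedged sketch shows how a single log could be assigned. If the maximum simSeq reaches st, the best-matching group receives the log number. Otherwise, the log becomes the log event of a new group. The sketch assumes simSeq is the share of positions with identical labels and that st defaults to 0.5; `assign_log` and its `(group, created)` return value are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LogGroup:
    log_event: list
    log_ids: set = field(default_factory=set)


def sim_seq(labels, event):
    """Assumed simSeq: share of positions with identical labels."""
    if not labels or len(labels) != len(event):
        return 0.0
    return sum(a == b for a, b in zip(labels, event)) / len(labels)


def assign_log(labels, log_id, groups, st=0.5):
    """Add the log to its best group, or create a new group when max simSeq < st."""
    best, best_sim = None, -1.0
    for g in groups:
        s = sim_seq(labels, g.log_event)
        if s > best_sim:
            best, best_sim = g, s
    if best is not None and best_sim >= st:
        best.log_ids.add(log_id)                 # first-type log
        return best, False
    new_group = LogGroup(list(labels), {log_id})  # second-type log becomes the log event
    groups.append(new_group)                     # update the leaf's log group list
    return new_group, True
```

The `created` flag is one way a caller could learn that a new log group appeared, for example to feed an alarm.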
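Substeps S34 to S36 and substeps S41 to S44 are parallel alternatives: the former handle logs whose maximum similarity value reaches the first threshold, and the latter handle logs whose maximum similarity value falls below it.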
Further, in some cases it is desirable to raise an alarm whenever a new log group appears. However, while the log classification results are not yet stable, new log groups may appear frequently and generate many false positives. The embodiment of the invention therefore uses adaptive alarming for new log groups. For example, when the number of new log groups within one day exceeds a certain threshold, alarms for new log groups are suppressed.
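The daily suppression rule can be sketched as a per-day counter. The limit of 20 is a hypothetical default, and `NewGroupAlerter` is an illustrative name, not part of the described embodiment.

```python
from collections import defaultdict
from datetime import date


class NewGroupAlerter:
    """Alarm on new log groups until a per-day limit is exceeded, then suppress."""

    def __init__(self, daily_limit=20):
        self.daily_limit = daily_limit
        self.counts = defaultdict(int)  # day -> number of new log groups seen that day

    def on_new_group(self, day: date) -> bool:
        """Record one new log group on `day`; return True if an alarm should be sent."""
        self.counts[day] += 1
        return self.counts[day] <= self.daily_limit
```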
In addition, a periodic anomaly is one that also occurred at the same time yesterday or last week, and an alarm on it is a periodic false alarm. In that case the anomaly score at the corresponding time point yesterday or last week is also high, so the historical score can be used to suppress the current score and eliminate periodic false alarms. The time at which the anomaly occurs may shift somewhat, however. Therefore, a centered rolling_max operation is first applied to yesterday's and last week's scores within a time window. The larger of the two results is then subtracted from the current anomaly score to obtain the suppressed score.
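Below is a minimal sketch of the periodic suppression. It assumes equal-length score series aligned by time of day and a hypothetical centered window of 5 points. The text does not say how negative differences are handled, so clipping them to 0 is also an assumption. For an odd window, the centered rolling max matches pandas' `s.rolling(window, center=True, min_periods=1).max()`.

```python
def centered_rolling_max(scores, window=5):
    """Max over a window centred on each point (partial windows at the edges)."""
    half = window // 2
    n = len(scores)
    return [max(scores[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def suppress_periodic(current, yesterday, last_week, window=5):
    """Subtract the larger of yesterday's and last week's smoothed scores from the current score."""
    y = centered_rolling_max(yesterday, window)
    w = centered_rolling_max(last_week, window)
    # Assumption: negative results are clipped to 0.
    return [max(0.0, c - max(a, b)) for c, a, b in zip(current, y, w)]
```

In the test below, yesterday's peak occurs one step later than today's, and the centered window still lets it suppress most of today's score.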
In an optional embodiment of the present invention, before step 104, the following steps may be further included:
acquiring all log templates; sequentially matching the longest common subsequence (LCS) between every two log templates; and, if the LCS is greater than or equal to a second threshold, merging the two log templates corresponding to that LCS, until the last two log templates have completed LCS matching, thereby obtaining updated log templates.
As shown in FIG. 2, after the log templates are obtained by the prefix-tree-based first classification and before log anomaly detection is performed on them, the embodiment of the present invention may additionally perform a second classification on the obtained log templates based on the LCS (Longest Common Subsequence). There are two main reasons. First, the second layer of the first classification matches on length, so logs of different lengths can never be classified into the same type. Second, the first classification matches labels one by one in order, and the similarity is also computed position by position. As a result, two logs that look very similar cannot be classified into the same type if only a few tokens in the middle fail to align.
To overcome these problems, the embodiment of the invention introduces a matching algorithm based on the longest common subsequence and merges the log templates obtained by the first classification a second time, yielding log templates with a better clustering effect. Because computing the LCS has high time complexity, two pre-matching methods are also designed: a prefix tree and a simple loop. They avoid part of the LCS matching and improve the overall efficiency of the algorithm. Here, the simple loop refers to performing the prefix-tree-based first classification again.
In the embodiment of the invention, the LCS-based secondary classification takes all log templates produced by the prefix-tree-based first classification and matches the LCS between every two log templates in turn. If the LCS is greater than or equal to the second threshold, the two corresponding log templates are merged. This continues until the last two log templates have completed LCS matching, yielding the updated log templates.
In a specific implementation, a log object LCSObject is first defined, containing a log key LCSseq and a line number list lineIds. A log object list LCSMap is also defined to store the LCSObjects. The log templates are then read line by line. For each line, LCSMap is traversed to check whether it contains an LCSObject with the same log key LCSseq. If such an LCSObject exists, the line number of the log template is added to that object's lineIds. Otherwise, a new LCSObject is created and added to LCSMap. Reading continues until all log templates have been processed.
In an example, refer to FIG. 4, which shows a flowchart of the LCS-based secondary classification provided by the embodiment of the present invention. Log template 1 [Temperature (41C) exceeded communication threshold] is first saved as an LCSObject in the log object list LCSMap, with log key LCSseq [Temperature (41C) exceeded communication threshold] and line number list lineIds {0}. Next, log template 2 is read and the LCSObjects in LCSMap are traversed. LCSMap contains the LCSObject of log template 1, so the LCS between log template 2 and log template 1 is computed. Assume the second threshold is 1/2 of the length of the input log template and the LCS reaches it. Log template 2 and log template 1 are then determined to share the same log key LCSseq and are merged: the tokens of LCSseq in which the two templates differ are replaced with <*>, and the line number {1} of log template 2 is added to the lineIds of that LCSObject, yielding an updated log template. Next, log template 3 [command has completed subsequent retrieval] is read and LCSMap is traversed again. LCSMap now holds the LCSObject covering log templates 1 and 2. The LCS between log template 3 and that object is empty, so log template 3 does not share a log key LCSseq with log templates 1 and 2. It is therefore added to LCSMap as a new LCSObject with log key LCSseq [command has completed subsequent retrieval] and line number list lineIds {2}, yielding an updated log template.
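The LCSObject/LCSMap procedure can be sketched in Python using a textbook dynamic-programming LCS. As in the example, the second threshold is taken as 1/2 of the input template's length. Collapsing each run of differing tokens into a single <*> follows the common Spell-style convention and is an assumption, as are the function names. The sketch omits the prefix-tree and simple-loop pre-matching that skip some LCS computations.

```python
from dataclasses import dataclass, field

WILDCARD = "<*>"


@dataclass
class LCSObject:
    lcs_seq: list                                 # log key LCSseq (template tokens)
    line_ids: list = field(default_factory=list)  # line number list lineIds


def lcs(a, b):
    """Longest common subsequence of two token lists via O(len(a) * len(b)) DP."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = dp[i][j] + 1 if a[i] == b[j] else max(dp[i][j + 1], dp[i + 1][j])
    out, i, j = [], m, n
    while i and j:  # backtrack to recover the common tokens
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]


def merge(seq, common):
    """Keep tokens of seq that belong to the LCS; collapse each differing run into one <*>."""
    out, k = [], 0
    for tok in seq:
        if k < len(common) and tok == common[k]:
            out.append(tok)
            k += 1
        elif not out or out[-1] != WILDCARD:
            out.append(WILDCARD)
    return out


def add_template(lcs_map, tokens, line_id, ratio=0.5):
    """Merge into the LCSObject with the longest LCS if it reaches ratio * len(tokens)."""
    best, best_common = None, []
    for obj in lcs_map:
        common = lcs(tokens, obj.lcs_seq)
        if len(common) > len(best_common):
            best, best_common = obj, common
    if best is not None and len(best_common) >= ratio * len(tokens):
        best.lcs_seq = merge(best.lcs_seq, best_common)
        best.line_ids.append(line_id)
    else:
        lcs_map.append(LCSObject(list(tokens), [line_id]))
```

The test feeds in the example's templates 1 and 3 plus a hypothetical template 2, `Temperature (43C) exceeded communication threshold`. The result is two LCSObjects: the merged key `Temperature <*> exceeded communication threshold` with lineIds [0, 1], and template 3 with lineIds [2].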
The examples are only for the purpose of enabling those skilled in the art to better understand the embodiments of the present invention, and the present invention is not limited thereto.
Step 104, detecting the log template.
After the log templates are obtained, they can be examined to determine whether any of them is abnormal. The detection mainly uses statistical, unsupervised anomaly detection algorithms for log templates. These require no manual labeling at all, are simple to compute and are relatively interpretable.
In an alternative embodiment of the present invention, step 104 may include the following sub-steps:
substep S51, acquiring all updated log templates and their corresponding log-volume data;
substep S52, aggregating the log-volume data of all the updated log templates based on the chi-square distribution to obtain aggregated log data;
substep S53, detecting the aggregated log data;
and substep S54, if the aggregated log data is detected to be abnormal, detecting the log-volume data of each updated log template to determine whether an abnormal log template exists.
In the embodiment of the invention, the log templates are obtained by the prefix-tree-based first classification and updated by the LCS-based secondary classification. All updated log templates and their log-volume data can then be obtained. The log-volume data refers to the log data assigned to each updated log template.
In the embodiment of the invention, the detection period is 1 min (minute). Because logs are collected per machine, log anomaly detection likewise obtains, per machine, all updated log templates of the current machine and their log-volume data.
Next, the log-volume data of all updated log templates are aggregated based on the chi-square distribution (Chi-square Distribution) to obtain aggregated log data. In other words, the chi-square distribution is used to combine the log-volume data of different log templates. FIG. 5 shows a flowchart of the chi-square-based aggregation provided by the embodiment of the present invention, which proceeds as follows. The frequency features of the n templates are taken to follow a standard normal distribution. On this basis, a sum-of-squares feature over the templates is constructed, and this feature follows a chi-square distribution. When n is large, the chi-square distribution is approximately normal, so the 3σ method can be used for detection.
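In symbols, with standardized frequencies z_i = (x_i - μ_i) / σ_i for each template, the feature Q = z_1² + ... + z_n² follows χ²(n), which has mean n and variance 2n. For large n, (Q - n) / √(2n) is therefore approximately standard normal. The sketch below makes two assumptions: that μ_i and σ_i are estimated from each template's history, and that only the upper tail (score > 3) is flagged, since a small Q means the templates deviate less than usual.

```python
import math
from statistics import mean, pstdev


def chi_square_score(current, history):
    """Aggregate per-template log volumes into one approximately N(0, 1) score.

    current[i]: log volume of template i in this 1-min period
    history[i]: past 1-min log volumes of template i (used to estimate mu_i, sigma_i)
    """
    n = len(current)
    q = 0.0
    for x, past in zip(current, history):
        mu, sigma = mean(past), pstdev(past)
        z = (x - mu) / (sigma if sigma > 0 else 1.0)  # guard against a flat history
        q += z * z                                      # Q = sum of z_i^2 ~ chi2(n)
    return (q - n) / math.sqrt(2 * n)                   # E[Q] = n, Var[Q] = 2n


def is_abnormal(score, k=3.0):
    """3-sigma rule on the upper tail only."""
    return score > k
```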
Besides the 3σ method, the boxplot (Boxplot) method can be used. It describes data with five statistics: the minimum, the first quartile, the median, the third quartile and the maximum.
After detection results have been obtained with the 3σ method and the boxplot method, the results of the two methods can be converted into anomaly scores and fused, giving the detection result for the aggregated log data.
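The following is a hedged sketch of turning the two rules into scores and fusing them. The text specifies neither the conversion nor the fusion, so all of these are hypothetical choices: scaling both scores so that they equal 1 exactly at the rule's boundary, using 1.5 × IQR whiskers, and taking an equal-weight average.

```python
from statistics import mean, pstdev, quantiles


def sigma_score(x, history):
    """3-sigma rule as a score: >= 1 means |x - mean| >= 3 * std."""
    sigma = pstdev(history)
    return abs(x - mean(history)) / (3 * sigma) if sigma > 0 else 0.0  # flat history: no evidence


def boxplot_score(x, history, k=1.5):
    """Boxplot rule as a score: >= 1 means x lies beyond a whisker (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, med, q3 = quantiles(history, n=4)
    iqr = q3 - q1
    upper, lower = q3 + k * iqr, q1 - k * iqr
    if x >= med:
        return (x - med) / (upper - med) if upper > med else 0.0
    return (med - x) / (med - lower) if med > lower else 0.0


def fused_score(x, history, weight=0.5):
    """Hypothetical fusion: weighted mean of the two scores; >= 1 is treated as abnormal."""
    return weight * sigma_score(x, history) + (1 - weight) * boxplot_score(x, history)
```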
The aggregated log data are then detected. If no anomaly is found, the detection result returned is normal. If an anomaly is found, the log-volume data of each updated log template are further detected to determine whether an abnormal log template exists.
In the embodiment of the invention, there are two main reasons for designing log anomaly detection as a two-level process. First, each machine's logs generate many log templates, and their log-volume data are noisy, so detecting anomalies directly on the log templates easily produces many invalid alarms. Second, with so many log templates, running anomaly detection on all of them at 1-min granularity imposes a heavy computational load, and detection may not finish within 1 min. The embodiment therefore designs a low-noise index from a global perspective and detects on that index. Only when the result is abnormal does it drill down to the individual log templates to find the abnormal ones. This reduces both false alarms and CPU load.
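The two-level flow can be sketched as follows: the aggregate chi-square score is checked first, and per-template checks run only when it is abnormal. For brevity, the per-template check here is a plain |z| > 3 test; in a real system, the drill-down step is where heavier per-template detectors would run. The function names, the use of history means and standard deviations, and k = 3 are assumptions.

```python
import math
from statistics import mean, pstdev


def z_score(x, past):
    sigma = pstdev(past)
    return (x - mean(past)) / (sigma if sigma > 0 else 1.0)


def two_level_detect(current, history, k=3.0):
    """current/history: dicts keyed by template id. Returns abnormal template ids ([] if normal)."""
    zs = {tid: z_score(current[tid], history[tid]) for tid in current}
    n = len(zs)
    if n == 0:
        return []
    aggregate = (sum(z * z for z in zs.values()) - n) / math.sqrt(2 * n)
    if aggregate <= k:
        return []  # aggregate normal: no drill-down, so no per-template alarms
    # Drill down: check each updated log template individually.
    return sorted(tid for tid, z in zs.items() if abs(z) > k)
```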
Step 105, if the log template is detected to be abnormal, raising an anomaly alarm.
After the log templates are obtained, anomalies are judged from the log-volume data of each log template. If an abnormal log template is found, the corresponding anomaly can be sent to the user as an alarm.
In an alternative embodiment of the present invention, step 105 may comprise the following sub-steps:
and substep S61, if an abnormal log template exists, raising an anomaly alarm using the abnormal log template.
In a specific implementation, the historical log-volume data of each updated log template is examined to determine whether an abnormal log template exists. If none exists, the returned detection result is normal. If one exists, the returned detection result is abnormal, and an anomaly alarm can be raised using the abnormal log template, so that the user can determine that the current machine has a fault.
In summary, in the embodiment of the present invention, a plurality of logs are acquired and the length and labels of each log are determined. The logs are then classified according to their lengths and labels, and log templates are generated from logs of the same type and detected. If a log template is detected to be abnormal, an anomaly alarm is raised. Similarity is determined from log length and labels, logs are classified by that similarity, and log templates are generated from logs of the same type. This avoids pointless inspection of similar logs and greatly reduces inspection time. Detecting the log templates also surfaces potential anomalies and helps locate them quickly, which improves log analysis efficiency and the platform's digitalization capability.
Referring to fig. 6, a block diagram of a log analysis apparatus according to an embodiment of the present invention is shown, which may specifically include the following modules:
a log obtaining module 601, configured to obtain multiple logs, and determine a length and a label corresponding to each log;
a log classification module 602, configured to classify the multiple logs according to the lengths and labels corresponding to the logs;
a log template generating module 603, configured to generate a log template according to logs of the same type;
a log template detection module 604, configured to detect the log template;
an anomaly alarm module 605, configured to perform an anomaly alarm if the log template is detected as abnormal.
In an optional embodiment of the present invention, the log obtaining module 601 may include:
the label dividing submodule is used for dividing each log into a plurality of labels according to separators/spaces; each log corresponds to a plurality of labels;
and the length determining submodule is used for determining the length corresponding to each log according to the plurality of labels corresponding to each log.
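A minimal sketch of the label dividing and length determining submodules follows. The text says only "separators/spaces", so the separator set used here (whitespace, commas, semicolons and equals signs) is a hypothetical choice.

```python
import re

# Hypothetical separator set: whitespace, commas, semicolons and equals signs.
SEPARATORS = re.compile(r"[\s,;=]+")


def split_labels(line):
    """Divide a log line into labels and return (labels, length)."""
    labels = [t for t in SEPARATORS.split(line.strip()) if t]
    return labels, len(labels)
```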
In an optional embodiment of the present invention, the log classification module 602 may include:
the log input submodule is used for respectively inputting the logs into a preset prefix tree; the prefix tree comprises length nodes and label nodes connected with the length nodes;
a first classification submodule, configured to classify each log into a target length node matched with the corresponding length;
and the second classification submodule is used for classifying the logs to target label nodes matched with the corresponding labels.
In an optional embodiment of the present invention, the label corresponding to each log includes a plurality of labels; the target length node is connected with at least one label node; the second classification submodule may include:
a label sequence determining unit, configured to determine, for each log classified to the target length node, a label sequence corresponding to each log;
the label matching unit is used for matching each label corresponding to each log with the at least one label node in sequence according to the corresponding label sequence;
and the log classifying unit is used for classifying, when a target label node identical to one of the labels is matched, the log corresponding to that label into the target label node.
In an optional embodiment of the present invention, each log has a corresponding log number; the target label node is linked with a log group list, the log group list comprises a plurality of log groups, and each log group comprises a log event and a log number set;
the log template generating module 603 may include:
the log type determining submodule is used for determining the logs classified to the same target label node as the logs of the same type;
the similarity calculation submodule is used for calculating, for each log of the same type, the similarity between the log and the log event of each log group to obtain a plurality of similarity values; each log corresponds to a plurality of similarity values;
the maximum similarity value determining sub-module is used for determining the maximum similarity value corresponding to each log from a plurality of similarity values corresponding to each log;
the first log group determining submodule is used for determining that the log corresponding to the maximum similarity value is matched with the log group if the maximum similarity value is larger than or equal to a first threshold value;
the first-class log determining sub-module is used for determining the logs matched to the log group as first-class logs;
and the first adding submodule is used for adding the log number corresponding to each first type of log to the log number set in the matched log group to obtain a log template.
In an optional embodiment of the present invention, the log template generating module 603 may further include:
a second log group determining submodule, configured to determine that, if the maximum similarity value is smaller than a first threshold, a log corresponding to the maximum similarity value is not matched with a log group;
the second type log determining submodule is used for determining the logs which are not matched with the log group as second type logs;
the creating submodule is used for creating new log groups in the log group lists corresponding to the second type of logs respectively;
and the second adding submodule is used for adding each second-type log to the log event in the corresponding new log group, and adding the log number corresponding to each second-type log to the log number set in the corresponding new log group to obtain the log template.
In an optional embodiment of the present invention, the apparatus may further include the following modules, which operate before the log template is detected:
the log template acquisition module is used for acquiring all log templates;
the longest common subsequence matching module is used for sequentially matching the longest common subsequence (LCS) between every two log templates;
and the log template updating module is used for, if the LCS is greater than or equal to a second threshold, merging the two log templates corresponding to that LCS, until the last two log templates have completed LCS matching, to obtain updated log templates.
In an optional embodiment of the present invention, the log template detecting module 604 may include:
the log data acquisition submodule is used for acquiring all updated log templates and their corresponding log-volume data;
the log data aggregation submodule is used for aggregating the log-volume data of all the updated log templates based on the chi-square distribution to obtain aggregated log data;
the first detection submodule is used for detecting the aggregated log data;
and the second detection submodule is used for, if the aggregated log data is detected to be abnormal, detecting the log-volume data of each updated log template to determine whether an abnormal log template exists.
In an optional embodiment of the present invention, the anomaly alarm module 605 may include:
and the anomaly alarm submodule is used for raising an anomaly alarm using the abnormal log template if an abnormal log template exists.
In summary, in the embodiment of the present invention, a plurality of logs are acquired and the length and labels of each log are determined. The logs are then classified according to their lengths and labels, log templates are generated from logs of the same type and detected, and an anomaly alarm is raised if a log template is detected to be abnormal. Because similarity is determined from log length and labels and logs are classified accordingly before templates are generated, the embodiment avoids pointless inspection of similar logs and greatly reduces inspection time. Detecting the log templates further surfaces potential anomalies and helps locate them quickly, improving log analysis efficiency and the platform's digitalization capability.
For the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and reference may be made to the partial description of the method embodiment for relevant points.
An embodiment of the present invention further provides an electronic device, including a processor, a memory, and a computer program stored in the memory and executable on the processor. When executed by the processor, the computer program implements each process of the above log analysis method embodiment and achieves the same technical effect; to avoid repetition, the details are not repeated here.
The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when being executed by a processor, the computer program implements each process of the log analysis method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or terminal apparatus that comprises the element.
The log analysis method, the log analysis device, the electronic device and the computer-readable storage medium provided by the present invention are described in detail, and a specific example is applied in the present document to explain the principle and the implementation of the present invention, and the description of the above embodiment is only used to help understanding the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (12)

1. A method of log analysis, the method comprising:
acquiring a plurality of logs, and determining the length and the label corresponding to each log;
classifying the plurality of logs according to the corresponding length and the corresponding label of each log;
generating a log template according to the logs of the same type;
detecting the log template;
and if the log template is detected to be abnormal, performing abnormal alarm.
2. The method of claim 1, wherein determining the length and the label corresponding to each log comprises:
for each log, dividing the log into a plurality of labels according to separators/spaces; each log corresponds to a plurality of labels;
and determining the length corresponding to each log according to the plurality of labels corresponding to each log.
3. The method of claim 1, wherein the classifying the plurality of logs according to the length and the label corresponding to each log comprises:
respectively inputting the logs into a preset prefix tree; the prefix tree comprises length nodes and label nodes connected with the length nodes;
classifying each log to a target length node matched with the corresponding length;
and classifying the logs to target label nodes matched with the corresponding labels.
4. The method of claim 3, wherein the label corresponding to each log comprises a plurality of labels; the target length node is connected with at least one label node; the classifying the logs into target label nodes matched with the corresponding labels comprises:
determining a label sequence corresponding to each log classified to the target length node;
sequentially matching each label corresponding to each log with the at least one label node according to the corresponding label sequence;
and when a target label node identical to one label is matched, classifying the log corresponding to the label into the target label node.
5. The method of claim 4, wherein each log has a corresponding log number, the target label node is linked to a log group list comprising a plurality of log groups, and each log group comprises a log event and a log number set;
generating the log template according to logs of the same type comprises:
determining the logs classified into the same target label node as logs of the same type;
for each log of the same type, calculating the similarity between the log and the log event of each log group, to obtain a plurality of similarity values for that log;
determining the maximum similarity value among the plurality of similarity values of each log;
if the maximum similarity value is greater than or equal to a first threshold, determining that the corresponding log matches the log group yielding the maximum similarity value;
determining the logs matched to a log group as first-type logs; and
adding the log number of each first-type log to the log number set of the matched log group, to obtain a log template.
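A minimal sketch of the matching step in claim 5 follows. The claim does not define the similarity measure; this example assumes a Drain-style positional score (the fraction of token positions where the log and the log event agree), which is meaningful here because logs under one label node share the same length. `LogGroup`, `similarity`, `match_first_type` and the default threshold of 0.5 are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Set


@dataclass
class LogGroup:
    event: List[str]                                   # log event (template tokens)
    log_numbers: Set[int] = field(default_factory=set)  # log number set


def similarity(tokens: Sequence[str], event: Sequence[str]) -> float:
    """Assumed measure: share of positions where the tokens are identical."""
    if not tokens or len(tokens) != len(event):
        return 0.0
    same = sum(1 for t, e in zip(tokens, event) if t == e)
    return same / len(tokens)


def match_first_type(log_number: int, tokens: Sequence[str],
                     groups: List[LogGroup], threshold: float = 0.5) -> Optional[LogGroup]:
    """Add the log number to the most similar group if it clears the first threshold."""
    if not groups:
        return None
    scores = [similarity(tokens, g.event) for g in groups]
    best = max(range(len(groups)), key=scores.__getitem__)  # ties: first group wins
    if scores[best] >= threshold:
        groups[best].log_numbers.add(log_number)  # first-type log
        return groups[best]
    return None  # below threshold: a second-type log (see claim 6)
```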
6. The method according to claim 5, further comprising, after determining the maximum similarity value of each log from the plurality of similarity values of that log:
if the maximum similarity value is less than the first threshold, determining that the corresponding log does not match any log group;
determining the logs that do not match any log group as second-type logs;
creating a new log group in the log group list corresponding to each second-type log; and
setting each second-type log as the log event of its new log group, and adding the log number of each second-type log to the log number set of that new log group, to obtain a log template.
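Claims 5 and 6 together describe a match-or-create update of a label node's log group list. The sketch below combines the two, again assuming the positional similarity measure and a hypothetical default threshold of 0.5; the names `LogGroup`, `similarity` and `add_log` are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Set


@dataclass
class LogGroup:
    event: List[str]
    log_numbers: Set[int] = field(default_factory=set)


def similarity(tokens: Sequence[str], event: Sequence[str]) -> float:
    # Assumed measure: fraction of equal tokens at equal positions.
    if not tokens or len(tokens) != len(event):
        return 0.0
    return sum(t == e for t, e in zip(tokens, event)) / len(tokens)


def add_log(groups: List[LogGroup], log_number: int, tokens: Sequence[str],
            threshold: float = 0.5) -> LogGroup:
    """Place one log into a label node's log group list, creating a group if needed."""
    best, best_score = None, -1.0
    for group in groups:
        score = similarity(tokens, group.event)
        if score > best_score:
            best, best_score = group, score
    if best is not None and best_score >= threshold:
        best.log_numbers.add(log_number)  # first-type log (claim 5)
        return best
    # Second-type log (claim 6): the log itself becomes the event of a new group.
    new_group = LogGroup(event=list(tokens), log_numbers={log_number})
    groups.append(new_group)
    return new_group
```

Because an unmatched log seeds its own group, every log ends up in exactly one group, and the number of groups grows only when no existing event is similar enough.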
7. The method of claim 6, further comprising, before detecting the log template:
acquiring all log templates;
sequentially matching the longest common subsequence between each pair of log templates; and
if the length of the longest common subsequence is greater than or equal to a second threshold, merging the pair of log templates, and continuing until the last pair of log templates has been matched, to obtain updated log templates.
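The template-merging step of claim 7 can be sketched with a standard dynamic-programming longest common subsequence (LCS) over tokens. The claim states only that two templates are merged when their LCS reaches the second threshold. This sketch assumes that the threshold is a raw token count, that merging keeps the common tokens and replaces each unmatched stretch with a `<*>` wildcard, and that templates are processed in a single greedy pass. All of these, and the function names, are illustrative choices.

```python
from typing import List, Sequence

WILDCARD = "<*>"  # hypothetical placeholder token; the patent does not name one


def _suffix_lcs(a: Sequence[str], b: Sequence[str]) -> List[List[int]]:
    """dp[i][j] = LCS length of a[i:] and b[j:]."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    return dp


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    return _suffix_lcs(a, b)[0][0]


def _push(out: List[str], token: str) -> None:
    # Collapse consecutive wildcards into one.
    if token == WILDCARD and out and out[-1] == WILDCARD:
        return
    out.append(token)


def merge_pair(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Keep an LCS of a and b; replace every unmatched stretch with WILDCARD."""
    dp = _suffix_lcs(a, b)
    i = j = 0
    out: List[str] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            _push(out, a[i])
            i += 1
            j += 1
        else:
            _push(out, WILDCARD)
            # Skip the token whose removal preserves the longer remaining LCS.
            if dp[i + 1][j] >= dp[i][j + 1]:
                i += 1
            else:
                j += 1
    if i < len(a) or j < len(b):
        _push(out, WILDCARD)
    return out


def merge_templates(templates: Sequence[Sequence[str]], threshold: int) -> List[List[str]]:
    """Greedy pass: merge each template into the first kept template it shares enough with."""
    merged: List[List[str]] = []
    for t in templates:
        for k, m in enumerate(merged):
            if lcs_length(m, t) >= threshold:
                merged[k] = merge_pair(m, t)
                break
        else:
            merged.append(list(t))
    return merged
```

With a threshold of 4, `send packet 12 to host A` and `send packet 99 to host B` share the four-token LCS `send packet to host` and merge into `send packet <*> to host <*>`.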
8. The method of claim 7, wherein detecting the log template comprises:
acquiring all updated log templates and the log volume data corresponding to each of them;
aggregating the log volume data of all the updated log templates based on a chi-square distribution, to obtain aggregated log data;
detecting the aggregated log data; and
if the aggregated log data is detected to be abnormal, separately detecting the log volume data of each updated log template, to determine whether an abnormal log template exists.
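Claim 8 does not specify how the chi-square distribution is used. One plausible reading, sketched below purely as an assumption, converts each template's current log volume into a z-score against that template's own history and sums the squared z-scores. If the z-scores are independent standard normals, this sum follows a chi-square distribution with one degree of freedom per template. Per-template z-tests run only when the aggregate exceeds its critical value. The quantile uses the Wilson-Hilferty approximation so that the example needs only the standard library (Python 3.8+ for `statistics.NormalDist`). Each history must contain at least two samples for `statistics.stdev`. The function names and the significance level `alpha` are hypothetical.

```python
from statistics import NormalDist, mean, stdev
from typing import Dict, List


def chi2_quantile(p: float, k: int) -> float:
    """Wilson-Hilferty approximation of the chi-square quantile with k degrees of freedom."""
    z = NormalDist().inv_cdf(p)
    h = 2.0 / (9.0 * k)
    return k * (1.0 - h + z * h ** 0.5) ** 3


def detect_abnormal_templates(history: Dict[str, List[float]],
                              current: Dict[str, float],
                              alpha: float = 0.01) -> List[str]:
    """Aggregate per-template z-scores into one chi-square statistic, then drill down."""
    z_scores = {}
    for template_id, counts in history.items():
        mu = mean(counts)
        sigma = stdev(counts) or 1.0  # guard against a zero-variance history
        z_scores[template_id] = (current.get(template_id, 0.0) - mu) / sigma
    if not z_scores:
        return []

    # Aggregate test: sum of squared z-scores vs. the chi-square critical value.
    q = sum(z * z for z in z_scores.values())
    if q <= chi2_quantile(1.0 - alpha, len(z_scores)):
        return []  # aggregated log volume looks normal

    # Aggregate is abnormal: test each template separately (two-sided z-test).
    limit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return sorted(t for t, z in z_scores.items() if abs(z) > limit)
```

In the healthy case a single aggregate comparison decides the window; the per-template tests serve only to name the abnormal templates used for the alarm in claim 9.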
9. The method of claim 8, wherein performing an anomaly alarm if the log template is detected to be abnormal comprises:
if an abnormal log template exists, raising an anomaly alarm based on the abnormal log template.
10. A log analysis apparatus, comprising:
a log acquisition module, configured to acquire a plurality of logs and determine the length and the label corresponding to each log;
a log classification module, configured to classify the logs according to the length and the label corresponding to each log;
a log template generation module, configured to generate a log template according to logs of the same type;
a log template detection module, configured to detect the log template; and
an anomaly alarm module, configured to perform an anomaly alarm if the log template is detected to be abnormal.
11. An electronic device, comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the log analysis method according to any one of claims 1 to 9.
12. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the log analysis method according to any one of claims 1 to 9.
CN202210986726.1A 2022-08-17 2022-08-17 Log analysis method and device, electronic equipment and storage medium Pending CN115509848A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202210986726.1A, CN115509848A (en) | 2022-08-17 | 2022-08-17 | Log analysis method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202210986726.1A, CN115509848A (en) | 2022-08-17 | 2022-08-17 | Log analysis method and device, electronic equipment and storage medium

Publications (1)

Publication Number | Publication Date
CN115509848A (en) | 2022-12-23

Family

ID=84502604

Family Applications (1)

Application Number | Status | Priority Date | Filing Date | Title
CN202210986726.1A, CN115509848A (en) | Pending | 2022-08-17 | 2022-08-17 | Log analysis method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115509848A (en)

Similar Documents

Publication Publication Date Title
CN110609759B (en) Fault root cause analysis method and device
CN111612038B (en) Abnormal user detection method and device, storage medium and electronic equipment
CN111143838B (en) Database user abnormal behavior detection method
CN111160021A (en) Log template extraction method and device
CN113254255B (en) Cloud platform log analysis method, system, device and medium
CN109933502B (en) Electronic device, user operation record processing method and storage medium
WO2016093839A1 (en) Structuring of semi-structured log messages
CN113821630B (en) Data clustering method and device
CN113886237A (en) Analysis report generation method and device, electronic equipment and storage medium
CN113723542A (en) Log clustering processing method and system
US10320636B2 (en) State information completion using context graphs
CN117221087A (en) Alarm root cause positioning method, device and medium
CN113746780A (en) Abnormal host detection method, device, medium and equipment based on host image
CN116578700A (en) Log classification method, log classification device, equipment and medium
CN112905370A (en) Topological graph generation method, anomaly detection method, device, equipment and storage medium
CN116707859A (en) Feature rule extraction method and device, and network intrusion detection method and device
CN115509848A (en) Log analysis method and device, electronic equipment and storage medium
CN114090850A (en) Log classification method, electronic device and computer-readable storage medium
CN108304467A (en) For matched method between text
CN115629945A (en) Alarm processing method and device and electronic equipment
CN112750047A (en) Behavior relation information extraction method and device, storage medium and electronic equipment
CN113128213A (en) Log template extraction method and device
CN111475380A (en) Log analysis method and device
CN114756401B (en) Abnormal node detection method, device, equipment and medium based on log
CN110990810A (en) User operation data processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination