CN118093325A

CN118093325A - Log template acquisition method, electronic equipment and storage medium

Info

Publication number: CN118093325A
Application number: CN202410518808.2A
Authority: CN
Inventors: 顾兆军; 张智凯; 刘春波; 岳文龙
Original assignee: Civil Aviation University of China
Current assignee: Civil Aviation University of China
Priority date: 2024-04-28
Filing date: 2024-04-28
Publication date: 2024-05-28
Anticipated expiration: 2044-04-28
Also published as: CN118093325B

Abstract

The invention relates to the field of computer technology applications, and in particular to a log template acquisition method, an electronic device and a storage medium. The method comprises: obtaining a plurality of initial key logs from an original log data set; performing preliminary identification of the word types in the data set; generating, based on the initial key logs, a first regular expression representing non-numeric variables and a second regular expression representing constants that contain numbers; correcting the preliminarily identified word types using the generated expressions; obtaining a corresponding log template based on the corrected word types; and judging whether the current log template meets a preset condition. If it does, the current log template is taken as the target log template; otherwise, the current expressions are adjusted, the word types in the current data set are adjusted using the new expressions to obtain a new log template, and the judging step is repeated until the preset condition is met. The method can improve the efficiency and accuracy of log template generation.

Description

Log template acquisition method, electronic equipment and storage medium

Technical Field

The present invention relates to the field of computer technology application, and in particular, to a log template obtaining method, an electronic device, and a storage medium.

Background

In the development and maintenance of modern software, logs provide critical information about system and network activities, helping developers and operation and maintenance engineers understand system behavior, trace the sources of system problems, detect and respond to security events, and conduct troubleshooting and vulnerability analysis. In practice, operation and maintenance engineers typically employ rule-based log parsing methods, such as the Grok filter technique employed by Logstash, in which regular expressions matching entire log templates are written by hand. Grok matches a log line against a regular expression, maps specific portions of the log line to dedicated fields, and performs operations based on this mapping. The first problem with this type of approach is that each Grok filter rule corresponds to one class of log events, so the Grok rule base is difficult to maintain and expand for modern software systems that contain a large number of heterogeneous log event types and are continually updated. Second, each newly added Grok rule results in an additional regular expression match against the entire log line. The logs of a modern software system may contain hundreds of thousands of log templates; the cost of manually writing regular expressions that match all of them is unacceptable, and such a rule base cannot adapt to newly appearing log templates. Another type of log parsing method is based on predefined heuristic rules: researchers identify certain features inherent in log data, and algorithms use these features to perform template acquisition. For example, SLCT (Simple Log Cluster Tool), which is based on frequent word statistics, considers that words occurring more frequently in log files are constants. This approach may be effective for preliminary parsing, but it has difficulty identifying rare log templates that occur at low frequency.
Building on this, LFA (Log File Analyzer) also takes the position of each word into account in its statistics, and Logram uses n-grams as its statistical index so that the context of each word is taken into consideration. IPLoM (Iterative Partition Log Mining) proposes an iterative partitioning approach that repeatedly partitions the logs into small clusters based on log length and word position. Drain, a log parsing algorithm widely used in recent years, is essentially a tree representation of an iterative partitioning algorithm, built on a prefix parse tree of the logs. There are two problems with this type of strategy:

a) The similarity threshold is a hyperparameter that strongly affects algorithm performance and is difficult to tune to an optimal value;

b) Logs are merged whenever the similarity within a set of logs is high; this strategy derives from the algorithm designer's observation of the properties of most logs and is not appropriate for all logs.

Another type of log parsing method uses deep learning techniques to learn the constant/variable features of logs from a large number of annotated log data sets. For example, UniParser uses an LSTM network with a contrastive learning strategy to encode log context, and LogPPT uses a RoBERTa network to extract log sequence features. However, neural network methods place additional GPU hardware requirements on the system, and their operating efficiency can hardly keep up with the huge rate at which logs are generated.

In summary, existing log template acquisition methods either depend on a large number of rules or require complex deep learning algorithms, so the acquisition cost is too high and it is difficult to meet the real-time processing requirements of large-scale log data sets.

Disclosure of Invention

Aiming at the technical problems, the invention adopts the following technical scheme:

According to a first aspect of the present invention, there is provided a log template acquisition method, the method comprising the steps of:

S100, acquiring an original log data set D = {D_1, D_2, …, D_i, …, D_n}, where D_i is the i-th log in D, i ranges from 1 to n, and n is the number of logs in D.

S200, sampling the D to obtain k initial key logs.

S300, acquiring a word segmentation symbol set, an initial first regular expression and an initial second regular expression based on the k initial key logs; the first regular expression is a regular expression corresponding to a non-digital variable, and the second regular expression is a regular expression corresponding to a constant containing a number.

S400, performing word segmentation on D using the word segmentation symbol set to obtain a corresponding word segmentation result set W = {W_1, W_2, …, W_i, …, W_n}, where W_i is the word segmentation result corresponding to D_i, W_i = {W_i1, W_i2, …, W_ij, …, W_if(i)}, W_ij is the j-th word in W_i, j ranges from 1 to f(i), and f(i) is the number of words in W_i.

S500, for W_ij in W_i, if W_ij contains a number, the type identifier corresponding to W_ij is marked as a variable identifier, and if W_ij does not contain a number, the type identifier corresponding to W_ij is marked as a constant identifier.

S600, based on the current first regular expression and the current second regular expression, the type identifier corresponding to W_ij in W_i is adjusted, and D with the adjusted type identifiers is obtained and used as the current log data set.

S700, acquiring a corresponding current log template set based on the current log data set, taking the current log template set as a target log template corresponding to the D if the current log template set is the same as the last log template set, and exiting the current control program, otherwise, executing S800.

S800, adjusting the current first regular expression, taking the adjusted first regular expression as the current first regular expression, adjusting the current second regular expression, taking the adjusted second regular expression as the current second regular expression, and executing S600.

According to a second aspect of the present invention, there is provided an electronic device comprising a processor and a memory; the processor is configured to execute the steps of the method according to the first aspect of the present invention by calling a program or instructions stored in the memory.

According to a third aspect of the present invention there is provided a non-transitory computer readable storage medium storing a program or instructions which cause a computer to perform the steps of the method of the first aspect of the present invention.

The invention has at least the following beneficial effects:

According to the technical scheme provided by the embodiment of the invention, k initial key logs are firstly obtained from an original log data set, then, the word types in the data set are primarily marked, a first regular expression representing a non-digital variable and a second regular expression containing a digital constant are generated based on the k initial key logs, the generated expression is used for correcting the primarily marked word types, the corrected word types are obtained, a corresponding log template is obtained based on the corrected word types, then, whether the current log template meets preset conditions is judged, if yes, the current log template is used as a target log template, otherwise, the current first regular expression and the current second regular expression are adjusted, the word types in the current data set are adjusted by using the new regular expression, a new log template is obtained, and the previous judging step is repeated until the preset conditions are met. The method and the device can improve the generation efficiency and accuracy of the log template.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart of a log template obtaining method according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.

It should be noted that some exemplary embodiments are described as a process or a method depicted as a flowchart. Although a flowchart depicts steps as a sequential process, many of the steps may be implemented in parallel, concurrently, or with other steps. Furthermore, the order of the steps may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figures. The processes may correspond to methods, functions, procedures, subroutines, and the like.

An embodiment of the present invention provides a log template obtaining method, as shown in fig. 1, where the method may include the following steps:

S100, acquiring an original log data set D = {D_1, D_2, …, D_i, …, D_n}, where D_i is the i-th log in D, i ranges from 1 to n, and n is the number of logs in D.

In an embodiment of the present invention, the original log data set may be a log data set input by a user. The log data set may include log data sets generated by distributed systems (HDFS, Hadoop, Zookeeper, OpenStack, Spark), supercomputers (BGL, HPC), client applications (Proxifier, Thunderbird), server applications (Apache, OpenSSH), mobile applications (HealthApp), operating systems (Windows, Linux, Mac, Android), and the like.

S200, sampling the D to obtain k initial key logs.

In the embodiment of the invention, k can be set based on actual needs, for example, 32, 64 or 128.

Further, in the embodiment of the present invention, S200 may specifically include:

S201, for D_i, performing word segmentation on D_i using a general delimiter to obtain a corresponding word segmentation result W0_i, and acquiring a special symbol set M_i corresponding to D_i; wherein W0_i = {W0_i1, W0_i2, …, W0_ir, …, W0_ig(i)}, W0_ir is the r-th word in W0_i, r ranges from 1 to g(i), and g(i) is the number of words in W0_i; M_i = {M_i1, M_i2, …, M_is, …, M_ih(i)}, M_is is the s-th special symbol in M_i, s ranges from 1 to h(i), and h(i) is the number of special symbols in M_i.

In an embodiment of the present invention, the general delimiter may be any commonly used delimiter, such as a space. Special symbols may follow an existing definition, for example symbols other than digits and letters.

S202, based on the log length and the special symbol set, performing preliminary classification on all logs in the D to obtain m initial log clusters.

In the embodiment of the invention, the length of a log is the number of words obtained by segmenting the log. All logs in a given log cluster have the same log length and the same special symbol set.
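The preliminary classification of S201 and S202 can be sketched as follows. This is only an illustrative sketch: the function names `cluster_key` and `initial_clusters` are hypothetical, whitespace is assumed as the general delimiter, and a special symbol is assumed to be any character that is neither a letter, a digit, nor whitespace. The second and third sample logs are invented for illustration.

```python
import re
from collections import defaultdict


def cluster_key(log):
    """Return (log length, special symbol set) for one raw log line."""
    words = log.split()  # assumption: whitespace is the general delimiter
    # assumption: a special symbol is any non-alphanumeric, non-space character
    specials = frozenset(re.findall(r"[^A-Za-z0-9\s]", log))
    return len(words), specials


def initial_clusters(logs):
    """Group logs that share the same length and the same special symbols."""
    clusters = defaultdict(list)
    for log in logs:
        clusters[cluster_key(log)].append(log)
    return dict(clusters)


logs = [
    "Invalid user test from 52.80.34.196",
    "Invalid user admin from 10.0.0.1",         # hypothetical log
    "Connection closed by 10.0.0.2 [preauth]",  # hypothetical log
]
clusters = initial_clusters(logs)
```

Here the first two logs fall into one cluster, while the third, although it has the same length, is separated because its special symbol set also contains the square brackets.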

S203, a counter t=1 is set.

S204, if t is less than or equal to k, executing S205, otherwise, executing S208.

S205, randomly extracting a log cluster from the initial log clusters that have not yet been extracted as the t-th sample log cluster C_t, and randomly extracting d logs from C_t as the t-th log sample candidate set PC_t = {PC_t1, PC_t2, …, PC_ta, …, PC_td}, where PC_ta is the a-th log sample in PC_t and a ranges from 1 to d.

In the embodiment of the invention, d can be set based on actual needs, for example, 32, 64 or 128. Those skilled in the art will appreciate that if the number of logs in C _t is less than d, then all logs in C _t are taken as log samples.

S206, acquiring the log sample corresponding to min(S_t1, S_t2, …, S_ta, …, S_td) as the t-th key log and adding it to the current key log set; wherein S_ta is the maximum similarity between PC_ta and the logs in the current key log set, S_ta = max(S^1_ta, S^2_ta, …, S^u_ta, …, S^p_ta), S^u_ta is the similarity between PC_ta and the u-th log in the current key log set, u ranges from 1 to p, and p is the number of logs in the current key log set; the initial value of the key log set is the empty set; min() denotes taking the minimum value and max() denotes taking the maximum value.

In an embodiment of the present invention, S^u_ta is determined based on the longest common subsequence between PC_ta and the u-th log in the current key log set. Specifically, S^u_ta satisfies the following condition: S^u_ta = LP^max_ta-u / L^max_ta-u, where LP^max_ta-u is the length of the longest common subsequence between PC_ta and the u-th log in the current key log set, L^max_ta-u = max(L_ta, L_u), L_ta is the length of PC_ta, and L_u is the length of the u-th log in the current key log set; that is, L^max_ta-u is the larger of the length of PC_ta and the length of the u-th log in the current key log set.
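The similarity defined above can be illustrated with a short sketch. It assumes that both the longest common subsequence and the log length are measured in whitespace-separated words (the text defines log length as a word count); the function names `lcs_length` and `similarity` are hypothetical, and the second sample log is invented.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two word lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(log_a, log_b):
    """LCS length divided by the length of the longer log."""
    a, b = log_a.split(), log_b.split()  # assumption: word-level comparison
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0


s = similarity(
    "Invalid user test from 52.80.34.196",
    "Invalid user admin from 10.0.0.1",  # hypothetical log
)
```

For this pair the common subsequence is "Invalid user from" (3 words) and both logs have 5 words, so the similarity is 3/5.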

S207, t=t+1 is set, and S204 is executed.

S208, taking k logs in the current key log set as the k initial key logs.
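Steps S203 to S208 amount to a greedy selection that, for each sampled cluster, keeps the candidate least similar to the key logs chosen so far. The sketch below is one possible reading under stated assumptions: `sample_key_logs` is a hypothetical name, the maximum similarity against an empty key log set is taken to be 0, sampling stops early if fewer than k clusters exist, and the `seed` parameter is added only to make the example repeatable.

```python
import random


def lcs_length(a, b):
    """Length of the longest common subsequence of two word lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(log_a, log_b):
    a, b = log_a.split(), log_b.split()
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0


def sample_key_logs(clusters, k, d, seed=0):
    """Pick up to k key logs, one per randomly chosen cluster (S203-S208)."""
    rng = random.Random(seed)
    order = list(clusters)
    rng.shuffle(order)  # random extraction of clusters without repetition
    key_logs = []
    for cluster in order[:k]:
        # S205: take all logs if the cluster holds at most d logs
        candidates = cluster if len(cluster) <= d else rng.sample(cluster, d)

        def max_sim(log):
            # S206: S_ta, with 0.0 assumed while the key log set is empty
            return max((similarity(log, key) for key in key_logs), default=0.0)

        key_logs.append(min(candidates, key=max_sim))
    return key_logs


clusters = [
    ["a b c", "a b d"],
    ["x y 1"],
    ["a b c e", "q r s t"],
]
keys = sample_key_logs(clusters, k=3, d=2)
```

Choosing the candidate with the smallest maximum similarity tends to spread the key logs across dissimilar log formats, which is the apparent purpose of the sampling step.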

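S300, acquiring a word segmentation symbol set, an initial first regular expression and an initial second regular expression based on the k initial key logs; the first regular expression is a regular expression corresponding to a non-numeric variable, and the second regular expression is a regular expression corresponding to a constant containing a number.

In the embodiment of the invention, the word segmentation symbol set may be the union of the word segmentation symbols corresponding to the k initial key logs.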

In an embodiment of the present invention, the first regular expression and the second regular expression may be obtained manually or by means of a trained neural network model. Specifically, the words in each key log that are non-numeric variables or constants containing numbers may be identified manually or by a trained neural network model, e.g., in the log "Invalid user test from 52.80.34.196", "test" is a non-numeric variable, and in the log "… … available network connection on network <"Alt0" in "VIA INTERFACE ALT0" is a word that is a constant containing a number. It will be appreciated by those skilled in the art that any method of identifying, manually or by a trained neural network model, the words in each key log that are non-numeric variables or constants containing numbers falls within the scope of the present invention.

After all words belonging to the non-numeric variable and the numeric constant in the key log are acquired, the context-like words are represented using the same regular expression, for example, log "Invalid user test from 52.80.34.196" corresponds to a regular expression "(The regular expression corresponding to > VIA INTERFACE ALT0 "is" (. And finally, acquiring a preset number of regular expressions based on actual requirements. Those skilled in the art will recognize that any method for obtaining a regular expression corresponding to a word falls within the scope of the present invention.

Although word segmentation is a key element of log parsing, it has received little attention. The inventors of the present invention found through research that word segmentation has an important effect on log parsing results. In some cases, a log template that cannot be correctly identified under one word segmentation can be accurately identified under another. Therefore, word segmentation techniques need to be explored in more depth to improve the accuracy and efficiency of log parsing. Taking the logs "StackScrollAlgorithm: overlapAmount:220.0" and "StackScrollAlgorithm:state. Cliptopamount204" as examples, if a colon is not used as a word segmentation symbol, each log will be separated into three words. However, since the latter two words contain numbers, the algorithm cannot consider these words when classifying, and the two types of logs will be mapped to the same set ("StackScrollAlgorithm") and cannot be distinguished. In contrast, if a colon is used as a word segmentation symbol, the two types of logs will map to ("StackScrollAlgorithm", "overlapAmount") and ("StackScrollAlgorithm", "state. Even neural network models trained on a large number of varied log data sets can hardly be guaranteed to work on a new data set without fine-tuning. Therefore, in the invention, the union of the word segmentation symbols corresponding to the k initial key logs is used as the word segmentation symbol set, so that the original logs can be converted into as regular a form as possible and the word segmentation accuracy can be improved.

S500, for W_ij in W_i, if W_ij contains a number, marking the type identifier corresponding to W_ij as a variable identifier, i.e. marking W_ij as a variable; if W_ij does not contain a number, marking the type identifier corresponding to W_ij as a constant identifier, i.e. marking W_ij as a constant.

In embodiments of the present invention, existing constant/variable discriminators may be used to obtain the type identification for each word.
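Word segmentation with a symbol set followed by the preliminary labelling of S500 can be sketched as below. The names `tokenize` and `initial_labels`, the labels "VAR" and "CONST", and the sample log are hypothetical; the sketch also assumes that whitespace always acts as a delimiter in addition to the symbols in the set.

```python
import re


def tokenize(log, delimiters):
    """Split a log on whitespace and on every symbol in `delimiters`."""
    pattern = "[" + re.escape("".join(delimiters)) + r"\s]+"
    return [word for word in re.split(pattern, log) if word]


def initial_labels(words):
    """S500: a word containing a digit is a variable, otherwise a constant."""
    return ["VAR" if any(ch.isdigit() for ch in w) else "CONST" for w in words]


log = "PowerManager: wakelock:acquired uid=1000"  # hypothetical log

coarse = tokenize(log, set())          # whitespace only
fine = tokenize(log, {":", "="})       # symbol set derived from key logs

coarse_labels = initial_labels(coarse)
fine_labels = initial_labels(fine)
```

With whitespace only, "uid=1000" is a single word and is labelled a variable as a whole; with ":" and "=" in the symbol set, "uid" is kept as a constant and only "1000" becomes a variable.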

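S600, based on the current first regular expression and the current second regular expression, adjusting the type identifier corresponding to W_ij in W_i to obtain D with the adjusted type identifiers, which is used as the current log data set.

In the embodiment of the invention, the initial value of the current first regular expression is the initial first regular expression, and the initial value of the current second regular expression is the initial second regular expression.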

Further, in S600, each log in the original log data set may be respectively matched using the obtained first regular expression and the second regular expression, if the corresponding word is matched, the type identifier corresponding to the word is adjusted to the type identifier corresponding to the corresponding regular expression, for example, matching is performed with a regular expression "(.

Those skilled in the art will appreciate that the process of matching using regular expressions may be implemented based on existing regular matching algorithms.
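The correction of S600 can be illustrated with the following sketch. The regular expressions shown are hypothetical stand-ins written for this example, not the expressions of the embodiment; each is assumed to contain one capture group marking the word whose type is to be overridden. The function name `label_tokens`, the labels "VAR" and "CONST", whitespace tokenization, and the second sample log are likewise assumptions.

```python
import re

# Hypothetical rule sets; group 1 marks the word whose label is overridden.
VAR_PATTERNS = [r"Invalid user (\S+) from"]       # non-numeric variables
CONST_PATTERNS = [r"via interface (alt\d+)"]      # constants containing digits


def label_tokens(log, var_patterns, const_patterns):
    """Label words by the digit rule, then correct them with the regexes."""
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", log)]
    labels = [
        "VAR" if any(ch.isdigit() for ch in word) else "CONST"
        for word, _, _ in tokens
    ]
    for patterns, label in ((var_patterns, "VAR"), (const_patterns, "CONST")):
        for pattern in patterns:
            for match in re.finditer(pattern, log):
                start, end = match.span(1)
                for idx, (_, s, e) in enumerate(tokens):
                    if s >= start and e <= end:  # word lies inside group 1
                        labels[idx] = label
    return [(word, lab) for (word, _, _), lab in zip(tokens, labels)]


first = label_tokens(
    "Invalid user test from 52.80.34.196", VAR_PATTERNS, CONST_PATTERNS
)
second = label_tokens(
    "link up via interface alt0", VAR_PATTERNS, CONST_PATTERNS  # hypothetical log
)
```

In the first log, "test" contains no digit and would be labelled a constant by the digit rule alone; the first kind of expression relabels it as a variable. In the second log, "alt0" contains a digit and is relabelled as a constant by the second kind of expression.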

The ideal goal of log template extraction is to extract the constants in a log comprehensively and correctly. In practical applications, achieving high-accuracy log parsing with only a constant/variable discriminator is very difficult. Consider the following: assuming that a constant/variable classifier has a per-word accuracy of 99% and each log contains 10 words on average, the probability that the classifier correctly extracts all constants in one log drops to 0.99 to the 10th power, i.e., about 0.904; if the accuracy of the classifier is slightly lower, at 95%, its accuracy on an entire log of length 10 drops rapidly to about 0.599. Minor errors in the classifier are thus amplified in log template discrimination. In addition, in practical applications the form of logs is complex and changeable, and the continuous growth of log volume in large-scale systems limits the application of deep learning methods. Together, these factors make it difficult for a constant/variable discriminator to achieve both high accuracy and high efficiency. In view of this, in the embodiment of the present invention, the first regular expression and the second regular expression are obtained based on the key logs and used to correct the result of the constant/variable discriminator, so that the generated log templates are as accurate as possible.
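The figures quoted above can be checked with a few lines of arithmetic. The sketch assumes that the per-word classification errors are independent, which is the implicit assumption behind raising the accuracy to the power of the log length; the function name `whole_log_accuracy` is hypothetical.

```python
def whole_log_accuracy(per_word_accuracy, words_per_log):
    """Probability that every word of a log is classified correctly,
    assuming independent per-word errors."""
    return per_word_accuracy ** words_per_log


at_99 = round(whole_log_accuracy(0.99, 10), 3)
at_95 = round(whole_log_accuracy(0.95, 10), 3)
```

A drop of four percentage points in per-word accuracy therefore costs roughly thirty percentage points at the level of whole logs of length 10.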

S700, acquiring a corresponding current log template set based on the current log data set, taking the current log template set as a target log template corresponding to the D if the current log template set is the same as the last log template set, namely, the log template is not changed, and exiting the current control program, otherwise, executing S800.

In the embodiment of the invention, the log templates may be generated using the prior art. For example, for each log in the current log data set, the constants of the log may be used as the static portion of the log template and the variables in the log as the variable portion, and the variable portion may then be replaced with wildcards to obtain the corresponding log template. Finally, logs whose log templates are identical are merged to obtain the current log template set.
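The template generation just described can be sketched as follows. The wildcard string "<*>", the labels "VAR" and "CONST", the function names `to_template` and `template_set`, and the sample labelled logs are assumptions made for illustration.

```python
from collections import defaultdict


def to_template(labeled_words, wildcard="<*>"):
    """Keep constants, replace every variable with a wildcard."""
    return " ".join(w if label == "CONST" else wildcard for w, label in labeled_words)


def template_set(labeled_logs):
    """Merge logs with identical templates; map template -> log indices."""
    groups = defaultdict(list)
    for index, labeled_words in enumerate(labeled_logs):
        groups[to_template(labeled_words)].append(index)
    return dict(groups)


labeled_logs = [  # hypothetical output of the labelling steps
    [("Invalid", "CONST"), ("user", "CONST"), ("test", "VAR"),
     ("from", "CONST"), ("52.80.34.196", "VAR")],
    [("Invalid", "CONST"), ("user", "CONST"), ("admin", "VAR"),
     ("from", "CONST"), ("10.0.0.1", "VAR")],
    [("Connection", "CONST"), ("closed", "CONST"), ("by", "CONST"),
     ("10.0.0.2", "VAR")],
]
templates = template_set(labeled_logs)
```

The first two logs collapse into the single template "Invalid user <*> from <*>", and the third yields "Connection closed by <*>".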

S800, adjusting the current first regular expression, taking the adjusted first regular expression as the current first regular expression, adjusting the current second regular expression, taking the adjusted second regular expression as the current second regular expression, and executing S600.

The technical effect of S800 is to compensate, by finding potentially misclassified samples, for the accuracy that the log sampling step may sacrifice in order to reduce manual effort. Iterating in a loop continuously optimizes the rule set, so that it adapts better to changes in the system and to the evolution of the log data, improving the efficiency and accuracy of operation and maintenance personnel in log analysis and fault investigation.

Further, in embodiments of the present invention, potential variables in the misclassified samples may be identified based on the longest common subsequence, and potential constants in the misclassified samples may be identified based on word frequency statistics.

Further, in an exemplary embodiment of the present invention, in S800, adjusting the current first regular expression, and taking the adjusted first regular expression as the current first regular expression may specifically include:

S810, obtaining the similarity SL_bv between any two log templates LT_b and LT_v in the current log template set; b and v range from 1 to Q, Q is the number of log templates in the current log template set, and b is not equal to v.

In an embodiment of the invention, SL_bv is determined based on the ratio of the length of the longest common subsequence between LT_b and LT_v to the length of the longer of the two templates. Specifically, SL_bv satisfies the following condition: SL_bv = LP^max_bv / L^max_bv, where LP^max_bv is the length of the longest common subsequence between LT_b and LT_v, L^max_bv = max(L_b, L_v), L_b is the length of LT_b, L_v is the length of LT_v, and max() denotes taking the maximum value; that is, L^max_bv is the larger of the length of LT_b and the length of LT_v.

S811, if SL_bv > ST0, executing S812; ST0 is a preset similarity threshold.

If variables belonging to a particular log template are mistakenly classified as constants, several highly similar log templates may be generated; therefore, if SL_bv > ST0, misclassified samples may exist.

In an embodiment of the present invention, ST0 may be an empirical value and may be determined based on the corresponding log type. In one exemplary embodiment, 0.5 ≤ ST0 ≤ 0.7. The inventors of the present invention found through research that if ST0 < 0.5 or ST0 > 0.7, potentially misclassified samples cannot be identified effectively.

S812, if LT_b does not belong to LTC, adding LT_b to LTC, otherwise not adding LT_b to LTC; if LT_v does not belong to LTC, adding LT_v to LTC, otherwise not adding LT_v to LTC; LTC is the current candidate log template set, and its initial value is the empty set.
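Steps S810 to S812 can be sketched as a pairwise comparison of templates. The sketch assumes word-level longest common subsequences, a threshold of 0.6 chosen from within the exemplary range, and hypothetical names (`candidate_templates`, `st0`) and sample templates.

```python
from itertools import combinations


def lcs_length(a, b):
    """Length of the longest common subsequence of two word lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def candidate_templates(templates, st0=0.6):
    """Collect templates that are highly similar to another one (S810-S812)."""
    candidates = []  # LTC, initially empty
    for lt_b, lt_v in combinations(templates, 2):
        words_b, words_v = lt_b.split(), lt_v.split()
        longest = max(len(words_b), len(words_v))
        if longest == 0:
            continue  # skip empty templates to avoid dividing by zero
        if lcs_length(words_b, words_v) / longest > st0:
            for template in (lt_b, lt_v):
                if template not in candidates:
                    candidates.append(template)
    return candidates


templates = [  # hypothetical current log template set
    "Invalid user <*> from <*>",
    "Invalid user admin from <*>",
    "Connection closed by <*>",
]
ltc = candidate_templates(templates)
```

The first two templates share 4 of 5 words (similarity 0.8) and are therefore flagged; the word "admin" in the second is a likely variable that was kept as a constant.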

S813, extracting q logs from each log cluster corresponding to the current candidate log template set as new key logs.

In an embodiment of the present invention, q may be set based on actual needs, and in a non-limiting exemplary embodiment, q may be equal to 2 or 3.

S814, generating a new first regular expression based on the new key logs.

In the embodiment of the present invention, based on the new key logs, the method for generating the new first regular expression may be the same as the method for generating the first regular expression based on the k key logs.

S815, generating a corrected first regular expression based on the current first regular expression and the new first regular expression.

Specifically, the current first regular expression and the new first regular expression are merged to obtain the corrected first regular expression. The regular expression thus becomes richer, that is, new recognition rules are generated, and potentially misclassified samples can be found in a new round of iteration.

S816, taking the corrected first regular expression as the current first regular expression, and executing S600.

Further, in S800, adjusting the current second regular expression, and taking the adjusted second regular expression as the current second regular expression may specifically include:

S801, acquiring a word frequency set WF and a constant set CV corresponding to the current log data set; wherein WF = {WF_1, WF_2, …, WF_e, …, WF_x}, WF_e is the word frequency of the e-th word W_e in the word set WP corresponding to the current log data set, e ranges from 1 to x, and x is the number of words in WP; WP = W^c_1 ∪ W^c_2 ∪ … ∪ W^c_i ∪ … ∪ W^c_n, where W^c_i is the word segmentation result corresponding to the i-th log in the current log data set; CV = CV_1 ∪ CV_2 ∪ … ∪ CV_i ∪ … ∪ CV_n, where CV_i is the constant set corresponding to the i-th log in the current log data set.

S802, if WF_e > WF0 and W_e does not belong to CV, extracting q logs from the logs corresponding to the log template corresponding to W_e as new key logs; WF0 is a preset word frequency threshold.

If WF_e > WF0 and W_e does not belong to CV, i.e., the word occurs very frequently but is not in the constant set, this indicates that the word is likely to be a potential constant and may be a misclassified sample.

In an embodiment of the present invention, q may be set based on actual needs, and in a non-limiting exemplary embodiment, q may be equal to 2 or 3.

In an embodiment of the present invention, WF0 may be an empirical value, and in one exemplary embodiment WF0 satisfies the following condition: WF0 = f × WF_max, where f is a preset coefficient, 0 < f < 1, and WF_max is the largest value in WF. Preferably, f = 0.2 or 0.3.

Those skilled in the art will appreciate that if W_e corresponds to multiple log templates, q logs are extracted from the logs corresponding to each of those log templates.
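The word frequency check of S801 and S802 can be sketched as follows. The function name `frequent_non_constants`, the labels "VAR" and "CONST", and the sample data are hypothetical; word frequency is assumed to be a raw occurrence count over the current log data set, and f is set to 0.2 as in the preferred range.

```python
from collections import Counter


def frequent_non_constants(labeled_logs, f=0.2):
    """Words whose frequency exceeds WF0 = f * WF_max but that are not in CV."""
    freq = Counter(word for log in labeled_logs for word, _ in log)      # WF
    constants = {word for log in labeled_logs
                 for word, label in log if label == "CONST"}             # CV
    if not freq:
        return []
    wf0 = f * max(freq.values())
    return sorted(w for w, n in freq.items() if n > wf0 and w not in constants)


# Hypothetical labelled data: "alt0" was labelled a variable by the digit rule.
frequent_log = [("link", "CONST"), ("up", "CONST"), ("on", "CONST"), ("alt0", "VAR")]
rare_log = [("user", "CONST"), ("4711", "VAR"), ("logged", "CONST"), ("in", "CONST")]
labeled_logs = [frequent_log] * 5 + [rare_log]

suspects = frequent_non_constants(labeled_logs)
```

Here WF_max is 5, so WF0 is 1.0; "alt0" occurs 5 times without being in the constant set and is flagged as a potential constant, while "4711" occurs only once and is not.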

S803, generating a new second regular expression based on the new key log.

In the embodiment of the present invention, based on the new key logs, a manner of generating the new second regular expression may be the same as the foregoing method of generating the second regular expression based on the k key logs.

S804, generating a corrected second regular expression based on the current second regular expression and the new second regular expression.

Specifically, the current second regular expression and the new second regular expression are merged to obtain the corrected second regular expression. The regular expression thus becomes richer, that is, new recognition rules for constants containing numbers are generated, and potentially misclassified samples can be found in a new round of iteration.

S805, taking the corrected second regular expression as the current second regular expression, and executing S600.
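Taken together, S600 to S800 form a loop that stops when two consecutive rounds yield the same log template set. The driver below is a generic sketch of that control flow only: the three callables and the name `iterate_templates` are hypothetical placeholders for the labelling, template generation and rule adjustment steps, and the `max_rounds` cap is an added safeguard that is not part of the described method.

```python
def iterate_templates(label_fn, template_fn, refine_fn, rules, max_rounds=20):
    """Repeat S600-S800 until the template set stops changing."""
    previous = None
    for _ in range(max_rounds):
        labeled = label_fn(rules)            # S600: apply current expressions
        current = template_fn(labeled)       # S700: build current template set
        if current == previous:              # unchanged -> target templates
            return current, rules
        previous = current
        rules = refine_fn(rules, labeled, current)  # S800: adjust expressions
    return previous, rules


# Toy stand-ins: `rules` is just a set of words forced to be constants.
logs = ["link up alt0", "link up alt0"]  # hypothetical logs


def label_fn(rules):
    return [
        [(w, "CONST" if w in rules or not any(c.isdigit() for c in w) else "VAR")
         for w in log.split()]
        for log in logs
    ]


def template_fn(labeled):
    return {" ".join(w if lab == "CONST" else "<*>" for w, lab in log)
            for log in labeled}


def refine_fn(rules, labeled, templates):
    return rules | {"alt0"}  # pretend S800 discovered that "alt0" is a constant


final_templates, final_rules = iterate_templates(
    label_fn, template_fn, refine_fn, frozenset()
)
```

In this toy run the first round produces "link up <*>", the adjusted rules produce "link up alt0" in the second round, and the third round confirms that the template set no longer changes.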

In the embodiment of the invention, 16 log data sets provided by the LogPai team are used to carry out experiments on the log template acquisition method provided by the embodiment of the invention. These 16 data sets cover distributed systems (HDFS, Hadoop, Zookeeper, OpenStack, Spark), supercomputers (BGL, HPC), client applications (Proxifier, Thunderbird), server applications (Apache, OpenSSH), mobile applications (HealthApp) and operating systems (Windows, Linux, Mac, Android). For each type of data, the LogPai team sampled 2000 representative logs and annotated them manually.

Experiments show that the log template acquisition method provided by the embodiment of the invention achieves parsing accuracy similar to that of modern neural-network-based log parsing methods, which demonstrates the effectiveness of improving log parsing performance by manually providing rules for a small number of word patterns.

The embodiment of the invention also provides electronic equipment, which comprises: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being configured to perform the methods of embodiments of the present invention.

The embodiment of the invention also provides a non-transitory computer readable storage medium, which stores computer executable instructions for executing the method according to the embodiment of the invention.

It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution disclosed in the present invention can be achieved, and are not limited herein.

The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims

1. A log template acquisition method, the method comprising the steps of:

S100, acquiring an original log data set to be processed D = {D_1, D_2, …, D_i, …, D_n}, wherein D_i is the i-th log in D, i takes values from 1 to n, and n is the number of logs in D;

S200, sampling D to obtain k initial key logs;

S300, acquiring a word segmentation symbol set, an initial first regular expression and an initial second regular expression based on the k initial key logs; the first regular expression is a regular expression corresponding to a non-digital variable, and the second regular expression is a regular expression corresponding to a constant containing a number;

S400, performing word segmentation on D using the word segmentation symbol set to obtain a corresponding word segmentation result set W = {W_1, W_2, …, W_i, …, W_n}, wherein W_i is the word segmentation result corresponding to D_i, W_i = {W_i1, W_i2, …, W_ij, …, W_if(i)}, W_ij is the j-th word in W_i, j takes values from 1 to f(i), and f(i) is the number of words in W_i;

S500, for W_ij in W_i, if W_ij contains numbers, marking the type identifier corresponding to W_ij as a variable identifier, and if W_ij does not contain numbers, marking the type identifier corresponding to W_ij as a constant identifier;

S600, based on the current first regular expression and the current second regular expression, adjusting the type identifier corresponding to W_ij in W_i to obtain D with the type identifiers adjusted, and using it as the current log data set;

S700, acquiring a corresponding current log template set based on the current log data set; if the current log template set is the same as the previous log template set, taking the current log template set as the target log template corresponding to D and exiting the current control program; otherwise, executing S800;
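Steps S400 to S700 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names, the "VAR"/"CONST" labels, the "<*>" placeholder, the use of full-token matching, and the `max_rounds` safety cap are all assumptions made for illustration. The `refine` callback is a hypothetical stand-in for the expression adjustment of S800, which is not reproduced here.

```python
import re

def tokenize(log, delimiters):
    """S400: split a log line on the characters of the segmentation symbol set."""
    pattern = "[" + re.escape("".join(delimiters)) + "]+"
    return [tok for tok in re.split(pattern, log) if tok]

def initial_types(tokens):
    """S500: words containing a digit start as variables, all others as constants."""
    return ["VAR" if any(ch.isdigit() for ch in tok) else "CONST" for tok in tokens]

def adjust_types(tokens, types, variable_re, constant_re):
    """S600: correct the preliminary marks with the two regular expressions."""
    adjusted = []
    for tok, typ in zip(tokens, types):
        if typ == "CONST" and variable_re.fullmatch(tok):
            typ = "VAR"      # digit-free word that is actually a variable
        elif typ == "VAR" and constant_re.fullmatch(tok):
            typ = "CONST"    # digit-bearing word that is actually a constant
        adjusted.append(typ)
    return adjusted

def template_set(logs, delimiters, variable_re, constant_re):
    """Build one template per log by replacing variable words with a placeholder."""
    templates = set()
    for log in logs:
        tokens = tokenize(log, delimiters)
        types = adjust_types(tokens, initial_types(tokens), variable_re, constant_re)
        templates.add(" ".join("<*>" if t == "VAR" else tok
                               for tok, t in zip(tokens, types)))
    return templates

def acquire_templates(logs, delimiters, variable_re, constant_re, refine, max_rounds=10):
    """S600-S700 loop: stop once two successive template sets are identical."""
    previous = None
    for _ in range(max_rounds):          # max_rounds is an assumed safety cap
        current = template_set(logs, delimiters, variable_re, constant_re)
        if current == previous:
            return current
        previous = current
        # Hypothetical stand-in for S800: returns the adjusted expressions.
        variable_re, constant_re = refine(current, variable_re, constant_re)
    return previous
```

For example, with the first expression `alice|bob` and the second expression `eth0`, the logs "Connected to node5 via eth0 user alice" and "Connected to node7 via eth0 user bob" both map to the single template "Connected to <*> via eth0 user <*>".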

2. The method according to claim 1, wherein S200 specifically comprises:

S201, for D_i, performing word segmentation on D_i using a general segmentation character to obtain a corresponding word segmentation result W0_i, and acquiring a special symbol set M_i corresponding to D_i; wherein W0_i = {W0_i1, W0_i2, …, W0_ir, …, W0_ig(i)}, W0_ir is the r-th word in W0_i, r takes values from 1 to g(i), and g(i) is the number of words in W0_i; M_i = {M_i1, M_i2, …, M_is, …, M_ih(i)}, M_is is the s-th special symbol in M_i, s takes values from 1 to h(i), and h(i) is the number of special symbols in M_i;

S202, performing preliminary classification on all logs in D based on log lengths and special symbol sets to obtain m initial log clusters;

S203, setting a counter t=1;

S204, if t ≤ k, executing S205; otherwise, executing S208;

S205, randomly extracting a log cluster from the initial log clusters that have not yet been extracted as the t-th sample log cluster C_t, and randomly extracting d logs from C_t as the t-th log sample candidate set PC_t = {PC_t1, PC_t2, …, PC_ta, …, PC_td}, wherein PC_ta is the a-th log sample in PC_t, and a takes values from 1 to d;

S206, acquiring the log sample corresponding to min(S_t1, S_t2, …, S_ta, …, S_td) as the t-th key log and adding the t-th key log to the current key log set; wherein S_ta is the maximum similarity between PC_ta and the logs in the current key log set, S_ta = max(S^1_ta, S^2_ta, …, S^u_ta, …, S^p_ta), S^u_ta is the similarity between PC_ta and the u-th log in the current key log set, u takes values from 1 to p, and p is the number of logs in the current key log set; the initial value of the key log set is an empty set; min() denotes taking the minimum value, and max() denotes taking the maximum value;

S207, setting t=t+1, and executing S204;

S208, taking k logs in the current key log set as the k initial key logs.
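The sampling of S201 to S208 can be sketched as follows. This is a minimal illustration under stated assumptions rather than the claimed implementation: the function names are hypothetical, whitespace is assumed as the general segmentation character, any character that is neither alphanumeric, underscore, nor whitespace is assumed to count as a special symbol, a candidate's maximum similarity is taken as 0 while the key log set is still empty, and `similarity` is any caller-supplied function returning a value between 0 and 1.

```python
import random
import re
from collections import defaultdict

def initial_clusters(logs):
    """S201-S202: group logs by (word count, set of special symbols)."""
    clusters = defaultdict(list)
    for log in logs:
        words = log.split()                                # assumed generic split
        symbols = frozenset(re.findall(r"[^\w\s]", log))   # assumed special symbols
        clusters[(len(words), symbols)].append(log)
    return list(clusters.values())

def sample_key_logs(logs, k, d, similarity, rng=None):
    """S203-S208: pick k mutually dissimilar key logs, one per sampled cluster."""
    rng = rng or random.Random()
    clusters = initial_clusters(logs)
    rng.shuffle(clusters)            # random order gives draws without replacement
    key_logs = []                    # the key log set starts empty
    for cluster in clusters[:k]:     # t runs from 1 to k (fewer if m < k)
        candidates = rng.sample(cluster, min(d, len(cluster)))   # S205

        def max_similarity(candidate):
            # S_ta: highest similarity to any key log chosen so far.
            return max((similarity(candidate, key) for key in key_logs), default=0.0)

        # S206: keep the candidate that is least similar to the existing key logs.
        key_logs.append(min(candidates, key=max_similarity))
    return key_logs
```

Choosing the candidate with the smallest maximum similarity pushes each new key log away from those already selected, so the k samples tend to cover different log formats.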

3. The method according to claim 2, wherein S^u_ta satisfies the following condition: S^u_ta = LP^max_ta-u / L^max_ta-u, where LP^max_ta-u is the length of the longest common subsequence between PC_ta and the u-th log in the current key log set, L^max_ta-u = max(L_ta, L_u), L_ta is the length of PC_ta, L_u is the length of the u-th log in the current key log set, and max() denotes taking the maximum value.
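The similarity of claim 3 is the length of the longest common subsequence divided by the length of the longer log. A minimal Python sketch follows; the function names are hypothetical, and measuring length in whitespace-separated words (rather than characters) and returning 1.0 for two empty logs are assumptions.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, otherwise carry the best neighbour.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def lcs_similarity(log_a, log_b):
    """LCS length divided by the length of the longer log."""
    a, b = log_a.split(), log_b.split()   # assumption: length counted in words
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0
```

For "Connected to node5 via eth0" and "Connected to node7 via eth1", the longest common subsequence is "Connected to via" (3 words) and both logs have 5 words, so the similarity is 3/5 = 0.6.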

4. The method of claim 1, wherein in S800, the adjusting the current first regular expression and using the adjusted first regular expression as the current first regular expression specifically includes:

S810, obtaining the similarity SL_bv between any two log templates LT_b and LT_v in the current log template set; b and v take values from 1 to Q, wherein Q is the number of log templates in the current log template set, and b ≠ v;

S811, if SL_bv > ST0, executing S812; ST0 is a preset similarity threshold;

S812, if LT_b does not belong to LTC, adding LT_b to LTC, otherwise not adding LT_b to LTC; if LT_v does not belong to LTC, adding LT_v to LTC, otherwise not adding LT_v to LTC; LTC is the current candidate log template set, and the initial value of the current candidate log template set is an empty set;

S813, extracting q logs from each log cluster corresponding to the current candidate log template set as new key logs;

S814, generating a new first regular expression based on the new key log;

S815, generating a corrected first regular expression based on the current first regular expression and the new first regular expression;

S816, taking the corrected first regular expression as the current first regular expression, and executing S600.
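Steps S810 to S813 can be sketched as below. The names are hypothetical, `similarity` is a caller-supplied function assumed to be symmetric (so each unordered pair is compared once), and `logs_by_template` is an assumed mapping from each template to the logs it covers. The generation and merging of regular expressions in S814 to S816 is not shown.

```python
import random
from itertools import combinations

def candidate_templates(templates, similarity, st0):
    """S810-S812: collect every template that is too similar to another one."""
    ltc = []                                  # candidate set LTC, initially empty
    for lt_b, lt_v in combinations(templates, 2):
        if similarity(lt_b, lt_v) > st0:      # S811: compare against threshold ST0
            for lt in (lt_b, lt_v):           # S812: add each template at most once
                if lt not in ltc:
                    ltc.append(lt)
    return ltc

def new_key_logs(ltc, logs_by_template, q, rng=None):
    """S813: draw up to q logs from the cluster of each candidate template."""
    rng = rng or random.Random()
    picked = []
    for lt in ltc:
        cluster = logs_by_template[lt]
        picked.extend(rng.sample(cluster, min(q, len(cluster))))
    return picked
```

Highly similar templates often differ in a single position that is really a variable without digits, which is why their logs are resampled as input for a new first regular expression.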

5. The method of claim 4, wherein SL_bv satisfies the following condition: SL_bv = LP^max_bv / L^max_bv, wherein LP^max_bv is the length of the longest common subsequence between LT_b and LT_v, L^max_bv = max(L_b, L_v), L_b is the length of LT_b, L_v is the length of LT_v, and max() denotes taking the maximum value.

6. The method of claim 1, wherein in S800, the adjusting the current second regular expression and using the adjusted second regular expression as the current second regular expression specifically includes:

S801, acquiring a word frequency set WF and a constant set CV corresponding to the current log data set; wherein WF = {WF_1, WF_2, …, WF_e, …, WF_x}, WF_e is the word frequency of the e-th word W_e in the word set WP corresponding to the current log data set, e takes values from 1 to x, and x is the number of words in WP; WP = W^c_1 ∪ W^c_2 ∪ … ∪ W^c_i ∪ … ∪ W^c_n, wherein W^c_i is the word segmentation result corresponding to the i-th log in the current log data set; CV = CV_1 ∪ CV_2 ∪ … ∪ CV_i ∪ … ∪ CV_n, wherein CV_i is the constant set corresponding to the i-th log in the current log data set;

S802, if WF_e is larger than WF0 and W_e does not belong to CV, extracting q logs from the logs corresponding to the log templates corresponding to W_e as new key logs; WF0 is a preset word frequency threshold;

S803, generating a new second regular expression based on the new key log;

S804, generating a corrected second regular expression based on the current second regular expression and the new second regular expression;

7. The method of claim 6, wherein WF0 satisfies the following condition: WF0 = f × WF_max, where f is a preset coefficient, 0 < f < 1, and WF_max is the largest value in WF.
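The word selection of S801 and S802 with the threshold of claim 7 can be sketched as follows. The function name is hypothetical, and counting every occurrence of a word (rather than the number of logs that contain it) as its word frequency is an assumption.

```python
from collections import Counter

def frequent_unmarked_words(tokenized_logs, constants, f):
    """Return words whose frequency exceeds WF0 = f * WF_max and that are not in CV."""
    # Word frequencies WF over the word set WP (assumed: raw occurrence counts).
    wf = Counter(word for tokens in tokenized_logs for word in tokens)
    if not wf:
        return []
    wf0 = f * max(wf.values())       # claim 7: threshold relative to the top frequency
    return [word for word, count in wf.items()
            if count > wf0 and word not in constants]
```

With the tokenized logs [["eth0", "up"], ["eth0", "down"], ["eth0", "up"], ["eth1", "up"]], the constant set {"up", "down"}, and f = 0.5, the threshold is 0.5 × 3 = 1.5 and only "eth0" is returned: it is frequent but, because it contains a digit, was not marked as a constant.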

8. An electronic device comprising a processor and a memory;

the processor is adapted to perform the steps of the method according to any of claims 1 to 7 by invoking a program or instruction stored in the memory.

9. A non-transitory computer-readable storage medium storing a program or instructions that cause a computer to perform the steps of the method of any one of claims 1 to 7.