CN114528845A - Abnormal log analysis method and device and electronic equipment - Google Patents
- Publication number
- CN114528845A (application CN202210151153.0A)
- Authority
- CN
- China
- Prior art keywords
- abnormal
- log
- text
- historical
- processed
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3466—Performance evaluation by tracing or monitoring
- G06F11/3476—Data logging
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Probability & Statistics with Applications (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention discloses an abnormal log analysis method and device and electronic equipment, and relates to the field of artificial intelligence. The method comprises the following steps: acquiring a to-be-processed exception log generated when the system is abnormal; inputting the to-be-processed exception log into a pre-trained target language model to obtain a plurality of first-dimension vectors; determining, from the plurality of first-dimension vectors, a target first-dimension vector with the maximum similarity probability, and extracting a target label interpretation text from the sentence pair corresponding to the target first-dimension vector; and taking the target label interpretation text as the analysis result of the to-be-processed exception log. The invention solves the technical problem of poor analysis efficiency caused by the reliance on manual analysis of log files in the prior art.
Description
Technical Field
The invention relates to the field of artificial intelligence, in particular to an abnormal log analysis method and device and electronic equipment.
Background
Systems often produce a large number of logs at runtime. Where a system supports various complex enterprise-level web services and big data services, multiple service frameworks are integrated and logs are recorded in many different ways. A log mainly contains two kinds of content: a logical description of system operation and a state description of system operation. The logical description is expressed in natural language that humans can understand; when an error occurs, it describes the event that prevented the system from continuing, for example which error occurred when a certain module was called, which error occurred when a certain external interface was accessed, or that system resources were exhausted. The state description is expressed as a set of structured data, such as the timestamp at which each job was submitted, the resource utilization rate, the data throughput, and the job execution time. These parameters quantitatively describe the state of the system when it has run to a particular stage.
In the prior art, after a system anomaly occurs, the staff responsible for system operation and maintenance usually extract the system log by hand and manually analyze the logical description and state description recorded at the time of the error to find the cause of the anomaly, which results in poor analysis efficiency.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiment of the invention provides an abnormal log analysis method and device and electronic equipment, and aims to at least solve the technical problem of poor analysis efficiency caused by the fact that log files are analyzed in a manual mode in the prior art.
According to an aspect of the embodiments of the present invention, there is provided an analysis method of an exception log, including: acquiring a to-be-processed exception log generated when the system is abnormal; inputting the to-be-processed exception log into a pre-trained target language model to obtain a plurality of first-dimension vectors, wherein each first-dimension vector represents the similarity probability between the to-be-processed exception log and one of the sentence pairs corresponding to it, each sentence pair is generated by combining the to-be-processed exception log with a label interpretation text, and the label interpretation text at least comprises exception detail information and/or an exception solution for a preset exception type label; determining, from the plurality of first-dimension vectors, a target first-dimension vector with the maximum similarity probability, and extracting a target label interpretation text from the sentence pair corresponding to the target first-dimension vector; and taking the target label interpretation text as the analysis result of the to-be-processed exception log.
Furthermore, each of the plurality of first-dimension vectors corresponds to a second-dimension vector, wherein the second-dimension vector represents the dissimilarity probability between the to-be-processed exception log and the corresponding sentence pair.
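The relationship between the similarity probability and the dissimilarity probability can be illustrated with a small sketch. It assumes, purely for illustration, that the model emits two raw scores per sentence pair and that a two-class softmax turns them into probabilities; the source does not state how the two values are computed, and the function name `pair_probabilities` is hypothetical.

```python
import math
from typing import Tuple


def pair_probabilities(similar_logit: float, dissimilar_logit: float) -> Tuple[float, float]:
    """Map two raw scores to (similar, dissimilar) probabilities.

    Assumption: a two-class softmax; the source does not name the function used.
    """
    # Subtract the larger score before exponentiating for numerical stability.
    largest = max(similar_logit, dissimilar_logit)
    exp_similar = math.exp(similar_logit - largest)
    exp_dissimilar = math.exp(dissimilar_logit - largest)
    total = exp_similar + exp_dissimilar
    return exp_similar / total, exp_dissimilar / total


# Illustrative scores only; they are not taken from the source.
p_similar, p_dissimilar = pair_probabilities(2.0, 0.0)
```

Under this assumption the two values always sum to one, so ranking sentence pairs by the first value is equivalent to ranking them by the lowest second value.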
Further, the analysis method of the exception log further includes: before the to-be-processed exception log is input into the pre-trained target language model, acquiring a plurality of historical exception logs, wherein each historical exception log at least comprises a description text of system logic, and the description text of system logic at least comprises description information of the event that caused the system anomaly; and vectorizing each historical exception log according to its description text of system logic to obtain a semantic vector corresponding to each historical exception log.
Further, the analysis method of the exception log further includes: counting a first frequency with which each word occurs in the description text of system logic of each historical exception log, and a second frequency with which each word occurs across all historical exception logs; calculating a weight value for each word in each historical exception log from the first frequency and the second frequency; and, within each historical exception log, computing the weighted sum of the word semantic vectors of its words, using the weight value of each word as its weight, to obtain the semantic vector corresponding to that historical exception log.
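The weighting step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented formula: the first frequency is taken to be the relative frequency of a word inside one log, the second frequency is taken to be the number of logs that contain the word, and the two are combined with a smoothed TF-IDF weight. The function name `log_vectors`, the toy word vectors, and the sample tokens are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def log_vectors(logs: List[List[str]], word_vecs: Dict[str, List[float]]) -> List[List[float]]:
    """Build one semantic vector per log as a weighted sum of word vectors."""
    n_logs = len(logs)
    # Second frequency (assumed): number of logs in which each word appears.
    doc_freq = Counter(word for tokens in logs for word in set(tokens))
    dim = len(next(iter(word_vecs.values())))
    result = []
    for tokens in logs:
        # First frequency (assumed): occurrences of each word inside this log.
        term_freq = Counter(tokens)
        vec = [0.0] * dim
        for word, count in term_freq.items():
            if word not in word_vecs:
                continue  # skip words that have no word vector
            # Smoothed TF-IDF weight; the exact formula is an assumption.
            idf = math.log((1 + n_logs) / (1 + doc_freq[word])) + 1.0
            weight = (count / len(tokens)) * idf
            for i, x in enumerate(word_vecs[word]):
                vec[i] += weight * x
        result.append(vec)
    return result


# Toy data for illustration only.
logs = [["disk", "full", "error"], ["timeout", "error"]]
word_vecs = {
    "disk": [1.0, 0.0],
    "full": [1.0, 0.0],
    "timeout": [0.0, 1.0],
    "error": [0.5, 0.5],
}
vectors = log_vectors(logs, word_vecs)
```

A word such as "error" that appears in every log receives a lower weight than a word that is specific to one log, so the resulting vector leans toward the words that distinguish the log.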
Further, the analysis method of the exception log further includes: after each historical exception log has been vectorized according to its description text of system logic to obtain its semantic vector, clustering the semantic vectors of the plurality of historical exception logs to obtain the exception type corresponding to each historical exception log, wherein each exception type corresponds to at least one historical exception log; acquiring the preset exception type label corresponding to each exception type, and marking each historical exception log with the corresponding preset exception type label to obtain labeled historical exception logs, wherein each exception type corresponds to one preset exception type label; and training on the labeled historical exception logs to obtain the target language model.
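The clustering and labeling step can be sketched as below. The passage above does not name a clustering algorithm, so a plain k-means loop is used here only as one plausible choice; the function `kmeans`, the toy vectors, and the label names in `cluster_labels` are hypothetical.

```python
from typing import Dict, List


def kmeans(vectors: List[List[float]], k: int, iterations: int = 20) -> List[int]:
    """Return the cluster index of each vector (simple k-means, assumed algorithm)."""
    # Deterministic start: use the first k vectors as the initial centroids.
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        for i, v in enumerate(vectors):
            distances = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in centroids]
            assignment[i] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Toy semantic vectors for four historical logs (illustrative values).
semantic_vectors = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9]]
assignment = kmeans(semantic_vectors, k=2)

# Each cluster (exception type) is given one preset label, e.g. by an operator.
cluster_labels: Dict[int, str] = {0: "DISK_FULL", 1: "UPSTREAM_TIMEOUT"}
labeled_logs = [cluster_labels[a] for a in assignment]
```

The mapping from cluster index to label is kept separate from the clustering itself, which mirrors the text: clustering yields exception types, and a preset label is then attached to every log of that type.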
Further, the analysis method of the exception log further includes: performing text expansion on the labeled historical exception logs to obtain expanded historical exception logs, wherein each expanded historical exception log at least comprises a description text of system logic, a description text of system state, and a label interpretation text of a preset exception type label; and training an initial language model on the expanded historical exception logs to obtain the target language model.
Further, the analysis method of the exception log further includes: acquiring the label interpretation texts of the preset exception type labels, wherein each preset exception type label corresponds to at least one label interpretation text; and, where a preset exception type label corresponds to a plurality of label interpretation texts, adding each of those label interpretation texts to each labeled historical exception log that carries that preset exception type label, to obtain a plurality of expanded historical exception logs.
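The text expansion described above amounts to pairing every labeled log with every interpretation text of its label. The following sketch shows that pairing; the function name `expand_logs`, the tuple layout, and the sample texts are hypothetical.

```python
from typing import Dict, List, Tuple


def expand_logs(
    labeled_logs: List[Tuple[str, str]],
    interpretations: Dict[str, List[str]],
) -> List[Tuple[str, str, str]]:
    """Pair each (log text, label) with every interpretation text of that label."""
    expanded = []
    for log_text, label in labeled_logs:
        for interpretation in interpretations[label]:
            expanded.append((log_text, label, interpretation))
    return expanded


# Illustrative data only.
labeled = [
    ("write failed on /data", "DISK_FULL"),
    ("no space left on device", "DISK_FULL"),
    ("call to billing API timed out", "UPSTREAM_TIMEOUT"),
]
interpretations = {
    "DISK_FULL": ["The disk is full.", "Free space or extend the volume."],
    "UPSTREAM_TIMEOUT": ["An upstream service did not respond in time."],
}
expanded = expand_logs(labeled, interpretations)
```

With two interpretation texts for one label and one for the other, three labeled logs become five expanded logs, which is how the expansion enlarges the training set.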
Further, the to-be-processed exception log at least includes a description text of system logic and a description text of system state, and the analysis method of the exception log further includes: controlling the target language model to obtain a label interpretation text to be combined from the at least one label interpretation text corresponding to each preset exception type label; inserting a preset separator between the description text of system logic, the description text of system state, and the label interpretation text to be combined, to obtain a target text; inserting a preset sentence start tag at the sentence start position of the target text to obtain the sentence pair corresponding to each preset exception type label; and generating the plurality of first-dimension vectors based on the sentence pairs corresponding to the preset exception type labels.
According to another aspect of the embodiments of the present invention, there is also provided an apparatus for analyzing an exception log, including: an acquisition module, used for acquiring a to-be-processed exception log generated when the system is abnormal; an input module, used for inputting the to-be-processed exception log into a pre-trained target language model to obtain a plurality of first-dimension vectors, wherein each first-dimension vector represents the similarity probability between the to-be-processed exception log and one of the sentence pairs corresponding to it, each sentence pair is generated by combining the to-be-processed exception log with a label interpretation text, and the label interpretation text at least comprises exception detail information and/or an exception solution for a preset exception type label; a first determining module, used for determining, from the plurality of first-dimension vectors, a target first-dimension vector with the maximum similarity probability and extracting a target label interpretation text from the sentence pair corresponding to the target first-dimension vector; and a second determining module, used for taking the target label interpretation text as the analysis result of the to-be-processed exception log.
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, where the computer-readable storage medium includes a stored program, and when the program runs, the apparatus where the computer-readable storage medium is located is controlled to execute the above method for analyzing the abnormality log.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to perform the above-mentioned method for analyzing the abnormality log.
In the embodiment of the invention, the problem behind the to-be-processed exception log is located by means of a target language model: the to-be-processed exception log generated when the system is abnormal is obtained and input into the pre-trained target language model to obtain a plurality of first-dimension vectors; the target first-dimension vector with the maximum similarity probability is then determined from the plurality of first-dimension vectors, and the target label interpretation text is extracted from the sentence pair corresponding to it and used as the analysis result of the to-be-processed exception log. Each first-dimension vector represents the similarity probability between the to-be-processed exception log and one of the sentence pairs corresponding to it; each sentence pair is generated by combining the to-be-processed exception log with a label interpretation text; and the label interpretation text at least comprises exception detail information and/or an exception solution for a preset exception type label.
In the above process, the tag interpretation text in each sentence pair corresponds to different preset abnormal type tags, and each preset abnormal type tag corresponds to one abnormal type, so that the similarity probability between the abnormal log to be processed and each sentence pair corresponding to the abnormal log to be processed is obtained based on the target language model, and the similarity judgment between the abnormal type corresponding to the abnormal log to be processed and the abnormal type corresponding to each sentence pair is realized. Furthermore, the target label interpretation text is extracted from the sentence pair with the maximum similarity probability in the sentence pairs and is used as the analysis result of the abnormal log to be processed, so that the abnormal problem of the abnormal log to be processed is positioned, the abnormal log is prevented from being analyzed based on a manual mode, an operator can quickly acquire the occurrence reason of the abnormal log, the abnormal problem is quickly solved, and the analysis efficiency and the problem solution efficiency of the abnormal log are improved.
Therefore, the method and the device for processing the abnormal logs achieve the purpose of positioning the abnormal problems of the abnormal logs to be processed based on the target language model, achieve the technical effect of improving the analysis efficiency of the abnormal logs, and further solve the technical problem that the analysis efficiency is poor due to the fact that log files are analyzed in a manual mode in the prior art.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a schematic diagram of an alternative method of analyzing an anomaly log according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an alternative training target language model according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating an optional text augmentation of a labeled historical exception log according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an alternative anomaly log analysis apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an alternative electronic device according to an embodiment of the invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that, the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for presentation, analyzed data, etc.) referred to in the present disclosure are information and data authorized by the user or sufficiently authorized by each party.
Example 1
In accordance with an embodiment of the present invention, an embodiment of a method for analyzing an exception log is provided. It is noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system, such as one executing a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different from the one presented here.
Fig. 1 is a schematic diagram of an alternative method for analyzing an exception log according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
Step S101, obtaining a to-be-processed exception log generated when the system is abnormal.
In step S101, a to-be-processed exception log generated when an exception occurs in the system may be obtained by an application system, a processor, an electronic device, or the like, and in this embodiment, the to-be-processed exception log is obtained by a data analysis system. Optionally, the newly generated exception log may be directly read from the system in which the exception occurs by the data analysis system, or the exception log to be processed may be input to the data analysis system by a related operator, the system in which the exception occurs, or another system, so as to be acquired by the data analysis system.
The to-be-processed exception log at least comprises a description text of system logic and a description text of system state. The description text of the system logic is expressed in natural language understood by human beings, and when an error occurs in the system, the description text describes events which cannot be continuously executed by the system, such as what kind of error occurs when a certain module is called, what kind of error occurs when a certain external interface is accessed, the system resources are exhausted, and the like; the description text of the system state is represented as a set of structured data, such as a time stamp submitted by each job during the system operation, resource utilization rate, data throughput, job execution time and the like, and a series of parameters quantitatively describe the state of the system when the system is operated to a specific stage.
It should be noted that, by obtaining the to-be-processed exception log generated when the system is abnormal, the subsequent analysis of the to-be-processed exception log is facilitated.
Step S102, inputting the to-be-processed exception log into a pre-trained target language model to obtain a plurality of first-dimension vectors, wherein each first-dimension vector represents the similarity probability between the to-be-processed exception log and one of the sentence pairs corresponding to it, each sentence pair is generated by combining the to-be-processed exception log with a label interpretation text, and the label interpretation text at least comprises exception detail information and/or an exception solution for a preset exception type label.
In step S102, the data analysis system may input the aforementioned to-be-processed exception log into a pre-trained target language model. In this embodiment, the pre-trained target language model may be a BERT (Bidirectional Encoder Representations from Transformers) model, a classical large-scale pre-trained language model that is designed to pre-train deep bidirectional representations of text by jointly conditioning on both left and right context in all layers of the model.
Optionally, the data analysis system may control the target language model to combine the to-be-processed exception log with each of a plurality of label interpretation texts, generating sentence pairs in which the to-be-processed exception log is combined with one label interpretation text, and to calculate the similarity probability between the to-be-processed exception log and the label interpretation text in each sentence pair, so as to output a first-dimension vector for each sentence pair, that is, the similarity probability between the to-be-processed exception log and the label interpretation text. Each label interpretation text is in natural language form and corresponds to one preset exception type label, and each preset exception type label corresponds to one exception type. Each label interpretation text may be a detailed description of the exception type corresponding to the preset exception type label, a solution for that exception type, or a combination of the detailed description and the solution.
It should be noted that the label interpretation texts in the sentence pairs correspond to different preset exception type labels, and each preset exception type label corresponds to one exception type. Obtaining the similarity probability between the to-be-processed exception log and each of its sentence pairs from the target language model therefore amounts to judging how similar the exception type of the to-be-processed exception log is to the exception type of each sentence pair, which makes it easier to locate the problem behind the exception log in the subsequent steps.
Step S103, determining a target first-dimension vector with the maximum similarity probability from the plurality of first-dimension vectors, and extracting a target label interpretation text from a sentence pair corresponding to the target first-dimension vector.
In step S103, the higher the similarity probability between the to-be-processed exception log and a sentence pair, the more similar the exception type of the to-be-processed exception log is to the exception type corresponding to that sentence pair, and the larger the first-dimension vector is or the closer it is to a certain value. Therefore, the data analysis system can determine, based on the values of the first-dimension vectors, the target first-dimension vector with the maximum similarity probability from the plurality of first-dimension vectors, and extract the target label interpretation text from the sentence pair corresponding to the target first-dimension vector. The target label interpretation text is the relatively optimal interpretation of the cause of the anomaly recorded in the to-be-processed exception log.
It should be noted that, by determining the target first-dimension vector with the maximum similarity probability and extracting the target label interpretation text corresponding to the target first-dimension vector, the determination of the relatively optimal interpretation of the anomaly reason corresponding to the abnormal log to be processed is realized, that is, the positioning of the abnormal problem of the abnormal log to be processed is realized.
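Selecting the target label interpretation text reduces to taking the sentence pair with the largest similarity probability. A minimal sketch follows; it assumes each first-dimension vector has already been reduced to a single similarity probability per sentence pair, and the function name `pick_interpretation`, the texts, and the probabilities are hypothetical.

```python
from typing import List, Tuple


def pick_interpretation(scored_pairs: List[Tuple[str, float]]) -> str:
    """Return the interpretation text whose sentence pair scored highest.

    Each entry is (label interpretation text, similarity probability).
    """
    best_text, _ = max(scored_pairs, key=lambda pair: pair[1])
    return best_text


# Illustrative scores only; they are not results from the source.
scored = [
    ("Upstream call timed out; check the network.", 0.07),
    ("Disk is full; free space or extend the volume.", 0.91),
    ("Authentication token expired; renew the token.", 0.02),
]
analysis_result = pick_interpretation(scored)
```

The returned text is what step S104 presents to the operator as the analysis result.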
Step S104, taking the target label interpretation text as the analysis result of the to-be-processed exception log.
In step S104, the data analysis system may use the determined target tag interpretation text as an analysis result of the to-be-processed abnormal log, and display the analysis result to an operator through a human-computer interaction interface, so that the operator can obtain the analysis result and maintain the system generating the to-be-processed abnormal log based on the analysis result.
It should be noted that the target label interpretation text is used as an analysis result of the abnormal log to be processed, so that the abnormal log is prevented from being analyzed manually, an operator can quickly obtain the occurrence reason of the abnormal log, the abnormal problem can be quickly solved based on the abnormal solution in the target label interpretation text, and the abnormal analysis efficiency and the problem solution efficiency are improved.
From the scheme defined in steps S101 to S104, it can be seen that the embodiment of the present invention locates the problem behind the to-be-processed exception log by means of a target language model: the to-be-processed exception log generated when the system is abnormal is obtained and input into the pre-trained target language model to obtain a plurality of first-dimension vectors; the target first-dimension vector with the maximum similarity probability is then determined from the plurality of first-dimension vectors, and the target label interpretation text is extracted from the sentence pair corresponding to it and used as the analysis result of the to-be-processed exception log. Each first-dimension vector represents the similarity probability between the to-be-processed exception log and one of the sentence pairs corresponding to it; each sentence pair is generated by combining the to-be-processed exception log with a label interpretation text; and the label interpretation text at least comprises exception detail information and/or an exception solution for a preset exception type label.
It is easy to note that, in the above process, since the tag interpretation text in each sentence pair corresponds to different preset exception type tags, and each preset exception type tag corresponds to an exception type, respectively, the similarity probability between the exception log to be processed and each sentence pair corresponding to the exception log to be processed is obtained based on the target language model, so that the similarity determination between the exception type corresponding to the exception log to be processed and the exception type corresponding to each sentence pair is realized. Furthermore, the target label interpretation text is extracted from the sentence pair with the maximum similarity probability in the sentence pairs and is used as the analysis result of the abnormal log to be processed, so that the abnormal problem of the abnormal log to be processed is positioned, the abnormal log is prevented from being analyzed based on a manual mode, an operator can quickly acquire the occurrence reason of the abnormal log, the abnormal problem is quickly solved, and the analysis efficiency and the problem solution efficiency of the abnormal log are improved.
Therefore, the scheme provided by the application achieves the purpose of positioning the abnormal problem of the abnormal log to be processed based on the target language model, so that the technical effect of improving the analysis efficiency of the abnormal log is achieved, and the technical problem of poor analysis efficiency caused by the fact that log files are analyzed in a manual mode in the prior art is solved.
In an optional embodiment, in the process of inputting the abnormal log to be processed into a target language model obtained through pre-training to obtain a plurality of first-dimension vectors, the data analysis module may control the target language model to obtain a tag interpretation text to be combined from at least one tag interpretation text corresponding to each preset abnormal type tag, insert a preset separator between a description text of system logic, a description text of system state and the tag interpretation text to be combined to obtain the target text, insert a preset sentence start tag at a sentence start position of the target text to obtain a sentence pair corresponding to each preset abnormal type tag, and generate a plurality of first-dimension vectors based on the sentence pair corresponding to each preset abnormal type tag.
Optionally, each preset abnormal type tag and at least one tag interpretation text corresponding to the preset abnormal type tag may be stored in a preset storage area, specifically, a storage area such as a database and a cloud server. After the data analysis module inputs the log to be processed into a target language model obtained by pre-training, the data analysis module can control the target language model to obtain a label interpretation text corresponding to each preset abnormal type label from a preset storage area, and the label interpretation text is used as the label interpretation text to be combined, wherein the target language model is a BERT model. For example, when there are ten kinds of preset abnormal type tags in the preset storage area, the data analysis module controls the target language model to obtain one tag interpretation text of a plurality of tag interpretation texts corresponding to each kind of preset abnormal type tag, so as to obtain ten tag interpretation texts to be combined, and each tag interpretation text to be combined corresponds to one kind of preset abnormal type tag.
Further, the data analysis module can control the target language model to perform text processing on the exception log to be processed. In the text processing, the target language model may convert the character sequence of the input text <w_1, ..., w_i, ..., w_n> into <CLS, w_1, ..., w_i, ..., w_n, SEP>, where "CLS" is the prescribed sequence start symbol, i.e., the preset sentence start tag, and has no other linguistic meaning, and "SEP" is the prescribed sequence end symbol or separator, i.e., the preset separator. When the character sequence of the input text includes a plurality of sentences, a corresponding number of "SEP" symbols may be used.
Specifically, in this embodiment, the target language model may extract the description text of the system logic and the description text of the system state from the to-be-processed exception log, and convert them, together with each tag interpretation text to be combined, into the form <CLS, description text of the system logic, SEP, tag interpretation text, SEP> or <CLS, description text of the system logic, SEP, description text of the system state, SEP, preset exception type tag/tag interpretation text, SEP>, so as to obtain the sentence pair corresponding to each preset exception type tag. In the BERT model, the vector corresponding to "CLS" is generally used to represent the character sequence of the entire input text.
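The sentence pair construction described above can be sketched as follows. This is a minimal illustration, not the method's actual implementation: the function names are hypothetical, and the bracketed spellings "[CLS]" and "[SEP]" follow the common BERT vocabulary convention rather than anything stated in this description. In practice a BERT tokenizer would normally insert these symbols itself.

```python
CLS = "[CLS]"  # preset sentence start tag (spelling is an assumption)
SEP = "[SEP]"  # preset separator (spelling is an assumption)


def build_sentence_pair(logic_text, interpretation_text, state_text=None):
    """Join the log descriptions and one tag interpretation text into one sequence."""
    parts = [CLS, logic_text, SEP]
    if state_text is not None:
        # Optional second segment: description text of the system state.
        parts += [state_text, SEP]
    parts += [interpretation_text, SEP]
    return " ".join(parts)


def build_all_pairs(logic_text, state_text, interpretations_by_tag):
    """Return one sentence pair per preset exception type tag."""
    return {
        tag: build_sentence_pair(logic_text, text, state_text)
        for tag, text in interpretations_by_tag.items()
    }
```

With ten preset exception type tags, `build_all_pairs` would return ten sequences, one per tag, each of which is then scored by the model.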
Further, after obtaining the sentence pair corresponding to each preset abnormal type tag, the data analysis system may control the target language model to perform text representation processing on each sentence pair to generate the plurality of first-dimension vectors. Specifically, a multi-head self-attention module is constructed in the BERT model. The basic calculation formula of each head of the multi-head self-attention module is as follows:
$$H_r = \mathrm{Softmax}\left(\frac{(Q_r W_r^Q)(K_r W_r^K)^{\top}}{\sqrt{d_k}}\right) V_r W_r^V$$
wherein Softmax() represents a normalization function, $Q_r$, $K_r$ and $V_r$ represent the three parts into which the vector matrix corresponding to the sentence pair is divided, $W_r^Q$, $W_r^K$ and $W_r^V$ respectively represent different weight matrices, and $d_k$ is the dimension of the mapped key vectors. Specifically, in a single head r, the vector matrix corresponding to the sentence pair may be divided into the three parts $Q_r$, $K_r$ and $V_r$; the three matrices are then linearly mapped by the different weight matrices $W_r^Q$, $W_r^K$ and $W_r^V$ respectively, the similarity is calculated, weight values are determined based on the similarity, and the vectors in the V matrix are weighted and summed accordingly to obtain $H_r$, i.e., the output of the single head r.
Then the outputs of all the different heads r are spliced, linearly converted back to the original vector scale, and added to the original input to obtain the text representation corresponding to the input text. The formula is as follows:
$$X' = H W^H + X = [H_1, \ldots, H_r, \ldots, H_R] W^H + X$$
wherein $X'$ represents the text representation corresponding to the input text, which takes into account and constructs the relationship among the description text of the system logic, the description text of the system state and the tag interpretation text to be combined; $[H_1, \ldots, H_r, \ldots, H_R]$ represents the spliced outputs of the individual heads; $X$ represents the input text, i.e., the sentence pair; and $W^H$ represents the linear conversion of $[H_1, \ldots, H_r, \ldots, H_R]$ back to the original vector scale.
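As a rough illustration of the head computation and the splice-project-add step, the following pure-Python sketch works on small nested lists. It is a teaching sketch under stated assumptions, not the BERT implementation: all function names are hypothetical, the same input matrix is used for the query, key and value parts, and the division by the square root of the key dimension is taken from the standard Transformer formulation.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Normalize every row so that it sums to one."""
    out = []
    for row in m:
        peak = max(row)  # subtract the maximum for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention_head(q, k, v, w_q, w_k, w_v):
    """One head: map Q, K, V linearly, score similarity, weight the values."""
    q_p, k_p, v_p = matmul(q, w_q), matmul(k, w_k), matmul(v, w_v)
    scale = math.sqrt(len(k_p[0]))  # assumption: standard scaling factor
    scores = [[s / scale for s in row] for row in matmul(q_p, transpose(k_p))]
    return matmul(softmax_rows(scores), v_p)


def multi_head(x, heads, w_h):
    """heads is a list of (w_q, w_k, w_v); returns [H_1, ..., H_R] W^H + X."""
    outputs = [attention_head(x, x, x, *weights) for weights in heads]
    # Splice the head outputs row by row.
    concat = [sum((h[i] for h in outputs), []) for i in range(len(x))]
    projected = matmul(concat, w_h)
    # Residual connection: add the original input.
    return [[p + xi for p, xi in zip(p_row, x_row)] for p_row, x_row in zip(projected, x)]
```

Each row of a head's output is a convex combination of the mapped value vectors, which is what "weighted summation of the vectors in the V matrix" amounts to.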
Furthermore, the generated text representation vector is passed through a fully connected layer MLP() to extract and compress the relationships between different dimensions of the high-dimensional vector, reducing the vector dimension to suit the overall classification task. After the text processing of the abnormal log to be processed, the task is converted into a binary classification problem in which the model judges whether a text pair is similar, so the final output of the fully connected layer MLP() is a two-dimensional vector representing the similarity probability between the abnormal log to be processed and each sentence pair corresponding to it. Finally, to represent the probability more intuitively, the data analysis system may control the target language model to normalize the two-dimensional vector using the softmax function, converting every component into a decimal in the range [0, 1]. The formula is as follows:
$$P = \mathrm{Softmax}(\mathrm{MLP}(X'_{CLS}))$$
wherein Softmax() represents the normalization function, $\mathrm{MLP}(X'_{CLS})$ represents the output of the fully connected layer MLP(), and $P$ represents the probability vector that the abnormal log to be processed in the input sentence pair is classified into the exception type corresponding to the tag interpretation text in that sentence pair. The vector $P$ is two-dimensional and includes at least the first-dimension vector, so that the first-dimension vector is obtained.
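The normalization step can be sketched in a few lines. The function names are hypothetical, and treating index 0 as the "similar" component is an assumption, since the description does not fix the order of the two dimensions.

```python
import math


def softmax(logits):
    """Map raw scores to values in [0, 1] that sum to one."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def similarity_vector(mlp_output):
    """Return (similar, dissimilar) probabilities from a two-dimensional MLP output."""
    similar, dissimilar = softmax(mlp_output)  # assumption: index 0 means "similar"
    return similar, dissimilar
```

For example, `similarity_vector([2.0, 0.0])` yields a first component larger than the second, meaning the sentence pair is judged more likely similar than not.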
It should be noted that the sentence pair is generated by controlling the target language model, and a plurality of first-dimension vectors are generated based on the sentence pair, so that the similarity between the exception log to be processed and each exception type is accurately determined.
In an optional embodiment, each of the plurality of first-dimension vectors corresponds to a second-dimension vector, where the second-dimension vectors represent the dissimilarity probability between the exception log to be processed and each sentence pair corresponding to the exception log to be processed.
Optionally, the aforementioned vector P further includes the second-dimension vector, and the data analysis system may determine, from the plurality of first-dimension vectors, the target first-dimension vector with the highest similarity probability by combining the value of the first-dimension vector with the value of the second-dimension vector. The target first-dimension vector is thereby determined more accurately, which further improves the analysis accuracy.
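One possible way to pick the target from the scored sentence pairs is sketched below. The description does not specify how the two values are combined, so the rule used here (rank by the similar probability, break ties with the lower dissimilar probability) is purely an assumption, as are the function name and the tuple layout.

```python
def select_target(scored_pairs):
    """scored_pairs: list of (interpretation_text, p_similar, p_dissimilar)."""
    if not scored_pairs:
        raise ValueError("no sentence pairs to rank")
    # Assumed rule: highest similar probability wins; on a tie, prefer the
    # pair with the lower dissimilar probability.
    best = max(scored_pairs, key=lambda item: (item[1], -item[2]))
    return best[0]
```

The returned tag interpretation text is what the method reports as the analysis result. The tie-break only matters when the two values are not strictly complementary.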
In an optional embodiment, before the to-be-processed abnormal log is input into the pre-trained target language model, the data analysis system may obtain a plurality of historical abnormal logs, and then perform vectorization processing on each historical abnormal log according to the description text of the system logic to obtain a semantic vector corresponding to each historical abnormal log. Each historical abnormal log at least comprises a description text of the system logic, wherein the description text of the system logic at least comprises description information of an event causing the system to be abnormal.
Optionally, as shown in fig. 2, the data analysis system may obtain the plurality of historical abnormal logs from manually input data, or may read them directly from a related system or from a storage device such as a memory. After acquiring the plurality of historical abnormal logs, the data analysis system may extract the description text of the system logic in each historical abnormal log and vectorize it to obtain the semantic vector corresponding to that historical abnormal log.
It should be noted that, by obtaining the semantic vector corresponding to each historical abnormal log, the initial language model is conveniently trained based on each historical abnormal log in the following process to obtain the target language model.
In an optional embodiment, in the process of vectorizing each history abnormal log according to the description text of the system logic, the data analysis system may count a first frequency of each word appearing in each history abnormal log and a second frequency of each word appearing in all history abnormal logs in the description text of the system logic, and then calculate a weight value of each word in each history abnormal log according to the first frequency and the second frequency, so that in each history abnormal log, the weight value of each word and a word semantic vector of each word are weighted and summed to obtain a semantic vector corresponding to each history abnormal log.
Optionally, the data analysis system may vectorize the description text of the system logic based on the term frequency-inverse document frequency (TF-IDF) index. TF-IDF is a technique for extracting keywords from natural language paragraphs; the TF-IDF score is obtained by multiplying the frequency with which a word occurs in a given paragraph by the inverse document frequency of that word over the whole natural language corpus. The higher the TF-IDF score, the higher the weight of the word in the paragraph in which it appears, which means that the word contributes more to the semantics of the whole paragraph.
Specifically, taking any one of the plurality of historical abnormal logs as an example, the data analysis system may obtain the frequency with which each word in the description text of the system logic appears in that historical abnormal log, i.e., the first frequency, and the frequency with which each word appears in all the historical abnormal logs, i.e., the second frequency. The TF-IDF weight of each word for the historical abnormal log is then calculated from the first frequency and the second frequency using the TF-IDF method. Further, the word semantic vectors of all the words in the historical abnormal log are weighted and summed using the TF-IDF weight of each word to obtain the semantic vector corresponding to the historical abnormal log.
It should be noted that the first frequency and the second frequency of each word in the description text of the system logic corresponding to each historical abnormal log are calculated, and the semantic vector corresponding to each historical abnormal log is calculated based on the first frequency and the second frequency, so that the semantic vector corresponding to each historical abnormal log is accurately determined and the natural language semantics contained in the historical abnormal log are accurately extracted.
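The weighting and summation can be illustrated with a small sketch. Several details are assumptions rather than part of the description: the function names, the smoothed form of the inverse document frequency, and the `word_vectors` lookup table, since the source of the word semantic vectors is not specified here.

```python
import math
from collections import Counter


def tfidf_weights(doc_tokens, all_docs_tokens):
    """Weight of each word in one log, from its first and second frequency."""
    counts = Counter(doc_tokens)
    n_docs = len(all_docs_tokens)
    weights = {}
    for word, count in counts.items():
        tf = count / len(doc_tokens)  # first frequency
        df = sum(1 for doc in all_docs_tokens if word in doc)  # second frequency
        idf = math.log((1 + n_docs) / (1 + df)) + 1  # assumption: smoothed IDF
        weights[word] = tf * idf
    return weights


def semantic_vector(doc_tokens, all_docs_tokens, word_vectors):
    """TF-IDF weighted sum of the word semantic vectors of one log."""
    weights = tfidf_weights(doc_tokens, all_docs_tokens)
    dim = len(next(iter(word_vectors.values())))
    vec = [0.0] * dim
    for word, weight in weights.items():
        # Words without a known vector are skipped (assumption).
        for i, component in enumerate(word_vectors.get(word, [0.0] * dim)):
            vec[i] += weight * component
    return vec
```

A word that appears in every historical log, such as a generic prefix, receives a lower weight than a word specific to one log, so the resulting vector is dominated by the distinctive terms.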
In an optional embodiment, after vectorization processing is performed on each historical abnormal log according to a description text of system logic to obtain a semantic vector corresponding to each historical abnormal log, the data analysis system may perform clustering processing on the semantic vectors corresponding to the multiple historical abnormal logs to obtain an abnormal type corresponding to each historical abnormal log, then obtain a preset abnormal type tag corresponding to each abnormal type, label the preset abnormal type tag on the corresponding historical abnormal log to obtain a labeled historical abnormal log, and then train according to the labeled historical abnormal log to obtain a target language model. Each exception type corresponds to at least one historical exception log, and each exception type corresponds to a preset exception type label.
Optionally, as shown in fig. 2, the initial cluster number of the system abnormal logs may be determined by an operator, using the development manual written for the system during the development stage and experience gained from aggregating the system's problems during the operation stage; it may be set by the data analysis system to a preset value; or it may be calculated by the data analysis system using a related algorithm. After the initial cluster number is determined, the data analysis system may use the K-means clustering algorithm to perform multiple rounds of clustering on the semantic vectors corresponding to the plurality of historical abnormal logs based on the initial cluster number, obtain the clustering result of those semantic vectors, and determine the exception type corresponding to each historical abnormal log from the clustering result.
Further, after determining the exception type corresponding to each historical exception log, as shown in fig. 2, the data analysis system may obtain the preset exception type tag corresponding to each exception type and mark it on the corresponding historical exception log to obtain the labeled historical exception log. The labeled historical exception log includes at least the preset exception type tag, the description text of the system logic and the description text of the system state. The preset exception type tag may be a number, a letter or another identifier, or the name of the exception type may be used directly.
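A compact K-means sketch over the semantic vectors is given below to make the clustering step concrete. It is illustrative only: the function names are hypothetical, and seeding the centroids with the first k vectors is an assumption chosen to keep the example deterministic (practical implementations usually use random or k-means++ seeding).

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Return (cluster index per vector, centroids) after a fixed number of rounds."""
    centroids = [list(v) for v in vectors[:k]]  # assumption: seed with first k vectors
    assignments = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach every vector to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move every centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids
```

Each resulting cluster index stands for one exception type, to which a preset exception type tag is then attached.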
Furthermore, after the labeled historical abnormal log is obtained, the data analysis system takes the labeled historical abnormal log as a training sample to train the language model to be trained, so as to obtain the target language model.
It should be noted that, by determining the exception type corresponding to each historical exception log and labeling each historical exception log based on the exception type, the acquisition of the training sample is realized, so that the effective training of the initial language model can be realized, and the target language model capable of being accurately judged can be obtained.
In an optional embodiment, in the process of obtaining the target language model according to the labeled historical abnormal log training, the data analysis system may perform text expansion on the labeled historical abnormal log to obtain an expanded historical abnormal log, so that the initial language model is trained based on the expanded historical abnormal log to obtain the target language model. Wherein, the expanded historical abnormal log at least comprises: the system comprises a description text of system logic, a description text of system state and a label interpretation text of a preset abnormal type label.
Optionally, as shown in fig. 2, in order to deal with unbalanced distribution among the system exception types, in the present application text expansion may be performed on the training samples (i.e., the labeled historical abnormal logs) to enrich them, so that the distribution among the system exception types is more uniform. In the text expansion process, as shown in fig. 3, the data analysis system may extract the description text of the system logic from the labeled historical abnormal log, extract the effective description text of the system state based on manual screening, and create the tag interpretation text or obtain it from the preset storage area. Further, based on the relationship among the historical abnormal logs, the exception types and the tag interpretation texts, the multi-classification task is converted into a sentence pair task; that is, sentence pairs corresponding to the historical abnormal logs are generated from the historical abnormal logs, the exception types and the tag interpretation texts, and the sentence pairs are then transformed to realize text expansion.
Further, after text expansion is performed on the historical abnormal logs, the data analysis system may train the initial language model based on the expanded historical abnormal logs to obtain the target language model. In this embodiment, the method provided by the present application may be implemented in the Python language, version 3.7, using the PyTorch framework as the support library for the deep learning model and the open-source Hugging Face library as the PyTorch-based training framework for the BERT pre-trained model (i.e., the initial language model). The BERT pre-trained model may be the BERT-Base version published by Google, which contains 110M parameters in total.
It should be noted that, in general, the types of the abnormal log are not evenly distributed. In one aspect, an unbalanced data set may cause a language model to be prone to classes with a dominant data distribution when predicted, resulting in a model that is prone to fail in predicting some more severe but less dominant outlier classes. On the other hand, the data volume of the abnormal log may not meet the requirement of training a usable and effective model, so that the marked historical abnormal log is subjected to text expansion, the training data can be effectively expanded, and the robustness of problem location on the abnormal log is improved.
In an optional embodiment, in the process of performing text expansion on the labeled historical abnormal logs to obtain the expanded historical abnormal logs, the data analysis system may obtain the tag interpretation texts of the preset abnormal type tags, and add each tag interpretation text to each labeled historical abnormal log corresponding to the corresponding preset abnormal type tag under the condition that one preset abnormal type tag corresponds to a plurality of tag interpretation texts, so as to obtain a plurality of expanded historical abnormal logs. Each preset abnormal type label corresponds to at least one label interpretation text.
Specifically, in conventional applications, the data set format of a typical multi-classification task (text classification) is <sentence, tag>, and multi-classification prediction is performed directly on a data set in this format. No additional information is used, because the relevant system or model tries to map the input text directly to a tag. In the present application, each exception type has its own exception resolution mechanism, so the data set format of the typical multi-classification task can be converted into <description text of the system logic and description text of the system state of the historical exception log, sentence_2, tag (0, 1)>, where the content of sentence_2 is a tag interpretation text, and the tag (0, 1) is a binary label associated with the preset exception type tag corresponding to the tag interpretation text in each sentence_2.
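The format conversion can be sketched as follows. The reading used here, that the label is 1 when the tag interpretation text belongs to the log's own exception type and 0 otherwise, is an assumption based on the binary similarity judgment described earlier; the function name and the decision to also generate non-matching pairs are likewise illustrative.

```python
def to_sentence_pairs(logs, interpretations_by_tag):
    """logs: list of (log_text, tag). Returns (sentence_1, sentence_2, label) triples."""
    samples = []
    for log_text, true_tag in logs:
        for tag, texts in interpretations_by_tag.items():
            # Assumption: 1 marks a matching exception type, 0 a non-matching one.
            label = 1 if tag == true_tag else 0
            for text in texts:
                samples.append((log_text, text, label))
    return samples
```

Under this reading, a single <sentence, tag> record turns into several sentence pair records, one per available tag interpretation text.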
Further, after obtaining the sentence pair corresponding to each historical exception log, as shown in fig. 3, the data analysis system may obtain at least one tag interpretation text corresponding to each preset exception type tag, and generate an interpretation dictionary. The format of each pair of entries in the interpreted dictionary is as follows:
{
"label": [explanation 1, ..., explanation N],
"label response": [response 1, ..., response M]
}
wherein [explanation 1, ..., explanation N] represents the exception detail information of the corresponding preset exception type tag in the tag interpretation text, and [response 1, ..., response M] represents the exception solutions of the corresponding preset exception type tag in the tag interpretation text. When a tag interpretation text contains only an explanation or only a response, M + N tag interpretation texts can be generated; when a tag interpretation text contains both an explanation and a response, M × N tag interpretation texts can be generated; and when both kinds of tag interpretation text are used, M × N + M + N tag interpretation texts can be generated.
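The three counts can be checked with a short sketch that enumerates the tag interpretation texts for one dictionary entry. The function name, the `mode` parameter and the way an explanation and a response are joined into one text are hypothetical choices made for illustration.

```python
from itertools import product


def interpretation_texts(explanations, responses, mode="both"):
    """Enumerate tag interpretation texts for one preset exception type tag."""
    # Texts holding only an explanation or only a response: N + M items.
    singles = list(explanations) + list(responses)
    # Texts holding one explanation and one response: N * M items.
    combined = [f"{e} {r}" for e, r in product(explanations, responses)]
    if mode == "single":
        return singles
    if mode == "combined":
        return combined
    return combined + singles  # N * M + N + M items
```

With N = 2 explanations and M = 3 responses this gives 5, 6 and 11 texts for the three modes respectively.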
Then, based on the interpretation dictionary, the content corresponding to sentence_2 in the sentence pair is expanded. That is, for any historical exception log, if the exception type in the corresponding sentence pair corresponds to a plurality of the above tag interpretation texts, a sentence pair corresponding to the historical exception log may be generated from each tag interpretation text, so that one historical exception log has a plurality of corresponding sentence pairs belonging to the same exception type, thereby implementing text expansion. It should be noted that, in the foregoing text expansion process, the data analysis system may even out the unbalanced data distribution according to the actual data amount of each exception type, so that the data amount of each exception type is uniformly distributed in the plurality of expanded historical exception logs (i.e., sentence pairs) finally obtained.
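One way such evening-out might work is sketched below: exception types with few logs use more tag interpretation texts per log, and types with many logs use fewer. The function name, the `target_per_type` parameter and the specific rule are assumptions, since the description only states the goal of a uniform distribution.

```python
import math


def balanced_expansion(logs_by_type, texts_by_type, target_per_type):
    """Expand logs into (log, interpretation_text) pairs, roughly equal per type."""
    expanded = {}
    for tag, logs in logs_by_type.items():
        if not logs:
            expanded[tag] = []
            continue
        texts = texts_by_type[tag]
        # Use as many interpretation texts per log as needed to reach the target.
        per_log = min(len(texts), max(1, math.ceil(target_per_type / len(logs))))
        pairs = [(log, text) for log in logs for text in texts[:per_log]]
        expanded[tag] = pairs[:target_per_type]
    return expanded
```

A type with a single log and four interpretation texts then contributes as many sentence pairs as a type with four logs that each use one text.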
It should be noted that the historical abnormal logs are expanded based on at least one label interpretation text corresponding to each preset abnormal type label, so that the effective expansion of the historical abnormal logs is realized, the unbalanced data distribution in the training data is avoided, and the robustness of the target language model is improved.
It should be noted that, in the present application, first, based on the description text of the system logic in the exception log, the TF-IDF algorithm is used for text vectorization and the K-Means clustering algorithm is used to induce exceptions and errors automatically, rapidly and at large scale. Second, based on the BERT model, the description text of the system logic and the description text of the system state in the exception log are fused for modeling, which can comprehensively expose the relationship between the state and the hidden logic problems in the exception log and locate problems automatically. Third, the multi-classification task is converted into a sentence pair similarity judgment task and the data are expanded, which can effectively improve the robustness of the exception log problem locating method.
Therefore, the method and the device for processing the abnormal logs achieve the purpose of positioning the abnormal problems of the abnormal logs to be processed based on the target language model, achieve the technical effect of improving the analysis efficiency of the abnormal logs, and further solve the technical problem that the analysis efficiency is poor due to the fact that log files are analyzed in a manual mode in the prior art.
Example 2
According to an embodiment of the present invention, an embodiment of an apparatus for analyzing an anomaly log is provided, where fig. 4 is a schematic diagram of an optional apparatus for analyzing an anomaly log according to an embodiment of the present invention, as shown in fig. 4, the apparatus includes:
an obtaining module 401, configured to obtain an exception log to be processed, where the exception log is generated when a system is abnormal;
an input module 402, configured to input the to-be-processed exception log into a target language model obtained through pre-training, so as to obtain a plurality of first-dimension vectors, where each first-dimension vector represents a similar probability of the to-be-processed exception log and each sentence pair corresponding to the to-be-processed exception log, each sentence pair is generated by combining the to-be-processed exception log and a tag interpretation text, and the tag interpretation text at least includes exception detail information and/or an exception solution of a preset exception type tag;
a first determining module 403, configured to determine a target first-dimension vector with the highest similarity probability from the multiple first-dimension vectors, and extract a target tag interpretation text from a sentence pair corresponding to the target first-dimension vector;
and a second determining module 404, configured to use the target tag interpretation text as an analysis result of the to-be-processed exception log.
It should be noted that the obtaining module 401, the inputting module 402, the first determining module 403, and the second determining module 404 correspond to steps S101 to S104 in the above embodiment, and the four modules are the same as the corresponding steps in the implementation example and the application scenario, but are not limited to the disclosure in embodiment 1.
Optionally, each first-dimension vector in the plurality of first-dimension vectors corresponds to a second-dimension vector, where the second-dimension vector represents a dissimilarity probability between the abnormal log to be processed and each sentence pair corresponding to the abnormal log to be processed.
Optionally, the apparatus for analyzing an anomaly log further includes: a first sub-acquisition module, configured to obtain a plurality of historical exception logs, where each historical exception log includes at least a description text of system logic, and the description text of the system logic includes at least description information of an event causing an exception in the system; and a first processing module, configured to vectorize each historical exception log according to the description text of the system logic to obtain a semantic vector corresponding to each historical exception log.
Optionally, the first processing module further includes: the statistical module is used for counting a first frequency of each word in a description text of the system logic in each historical abnormal log and a second frequency of each word in all the historical abnormal logs; the first calculation module is used for calculating a weight value of each word in each historical abnormal log according to the first frequency and the second frequency; and the second calculation module is used for weighting and summing the weight value of each word and the word semantic vector of each word in each history abnormal log to obtain the semantic vector corresponding to each history abnormal log.
Optionally, the apparatus for analyzing an anomaly log further includes: the second processing module is used for clustering semantic vectors corresponding to the historical abnormal logs to obtain an abnormal type corresponding to each historical abnormal log, wherein each abnormal type corresponds to at least one historical abnormal log; the second sub-acquisition module is used for acquiring a preset abnormal type label corresponding to each abnormal type, marking the preset abnormal type label on a corresponding historical abnormal log and obtaining a marked historical abnormal log, wherein one abnormal type corresponds to one preset abnormal type label; and the third processing module is used for training according to the marked historical abnormal log to obtain a target language model.
Optionally, the third processing module further includes: the text expansion module is used for performing text expansion on the marked historical abnormal log to obtain an expanded historical abnormal log, wherein the expanded historical abnormal log at least comprises: a description text of system logic, a description text of system state and a label interpretation text of a preset abnormal type label; and the fourth processing module is used for training the initial language model based on the expanded historical abnormal logs to obtain the target language model.
Optionally, the text extension module further includes: the third sub-acquisition module is used for acquiring label interpretation texts of preset abnormal type labels, wherein each preset abnormal type label corresponds to at least one label interpretation text; and the fifth processing module is used for respectively adding each label interpretation text into each marked historical abnormal log corresponding to the corresponding preset abnormal type label to obtain a plurality of expanded historical abnormal logs under the condition that the preset abnormal type label corresponds to the plurality of label interpretation texts.
Optionally, the to-be-processed exception log at least includes a description text of the system logic and a description text of the system state, and the input module further includes: and the sixth processing module is used for controlling the target language model to obtain a label interpretation text to be combined from at least one label interpretation text corresponding to each preset abnormal type label, inserting a preset separator between a description text of system logic, a description text of system state and the label interpretation text to be combined to obtain the target text, inserting a preset sentence start label at the sentence start position of the target text to obtain a sentence pair corresponding to each preset abnormal type label, and generating a plurality of first-dimension vectors based on the sentence pair corresponding to each preset abnormal type label.
Example 3
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, where the computer-readable storage medium includes a stored program, and when the program runs, the apparatus where the computer-readable storage medium is located is controlled to execute the above method for analyzing the abnormality log.
Example 4
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, where fig. 5 is a schematic diagram of an alternative electronic device according to the embodiments of the present invention, and as shown in fig. 5, the electronic device includes a memory and a processor, the memory stores a computer program, and the processor is configured to execute the computer program to perform the above-mentioned method for analyzing the abnormality log.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, a division of a unit may be a division of a logic function, and an actual implementation may have another division, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or may not be executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and various other media capable of storing program code.
The above is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, a plurality of modifications and embellishments can be made without departing from the principle of the present invention, and these modifications and embellishments should also be regarded as the protection scope of the present invention.
Claims (11)
1. A method for analyzing an abnormal log, comprising:
acquiring a to-be-processed abnormal log generated when a system is abnormal;
inputting the to-be-processed abnormal log into a pre-trained target language model to obtain a plurality of first-dimension vectors, wherein each first-dimension vector represents the probability that the to-be-processed abnormal log is similar to a corresponding one of the sentence pairs generated for it, each sentence pair is generated by combining the to-be-processed abnormal log with a label interpretation text, and the label interpretation text comprises at least abnormality detail information and/or an abnormality solution for a preset abnormal type label;
determining, from the plurality of first-dimension vectors, a target first-dimension vector with the maximum similarity probability, and extracting a target label interpretation text from the sentence pair corresponding to the target first-dimension vector;
and taking the target label interpretation text as the analysis result of the to-be-processed abnormal log.
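The selection step of claim 1 can be sketched as an argmax over per-pair similarity scores. The scorer below, `toy_similarity`, is a deliberately simple word-overlap stand-in and is not the pre-trained target language model of the claim; the function names and sample texts are hypothetical.

```python
import re
from typing import Callable, List


def toy_similarity(log_text: str, interpretation: str) -> float:
    """Stand-in scorer: Jaccard overlap of lowercase word sets.

    This only replaces the language model for illustration purposes.
    """
    a = set(re.findall(r"[a-z]+", log_text.lower()))
    b = set(re.findall(r"[a-z]+", interpretation.lower()))
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def analyze_abnormal_log(
    log_text: str,
    interpretations: List[str],
    score: Callable[[str, str], float] = toy_similarity,
) -> str:
    """Return the interpretation text whose pair has the highest score."""
    if not interpretations:
        raise ValueError("at least one label interpretation text is required")
    scores = [score(log_text, text) for text in interpretations]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return interpretations[best_index]


candidates = [
    "Connection pool exhausted: raise the pool size or fix leaked connections",
    "Disk full: free space on the data volume",
]
result = analyze_abnormal_log(
    "connection pool exhausted while calling order service", candidates
)
```

Passing the scorer as a parameter keeps the selection logic independent of the model, so a real model's similarity probability could be substituted without changing `analyze_abnormal_log`.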
2. The method according to claim 1, wherein each of the plurality of first-dimension vectors corresponds to a second-dimension vector, and each second-dimension vector represents the probability that the to-be-processed abnormal log is dissimilar to the corresponding sentence pair.
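One common way to obtain such a pair of probabilities is a two-class softmax over the model's output for a sentence pair. That the patent uses a softmax is an assumption; the claim only states that one value represents similarity and the other dissimilarity. The function name and the logit values are hypothetical.

```python
import math
from typing import Tuple


def two_class_probabilities(similar_logit: float, dissimilar_logit: float) -> Tuple[float, float]:
    """Softmax over two logits, returning (p_similar, p_dissimilar)."""
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(similar_logit, dissimilar_logit)
    e_sim = math.exp(similar_logit - m)
    e_dis = math.exp(dissimilar_logit - m)
    total = e_sim + e_dis
    return e_sim / total, e_dis / total


p_similar, p_dissimilar = two_class_probabilities(2.0, -1.0)
```

Under this assumption the two values always sum to 1, so ranking sentence pairs by the similarity probability is equivalent to ranking them by the lowest dissimilarity probability.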
3. The method of claim 1, wherein before inputting the to-be-processed abnormal log into the pre-trained target language model, the method further comprises:
acquiring a plurality of historical abnormal logs, wherein each historical abnormal log comprises at least a description text of system logic, and the description text of the system logic comprises at least description information of an event that caused the system to become abnormal;
and vectorizing each historical abnormal log according to the description text of the system logic to obtain a semantic vector corresponding to each historical abnormal log.
4. The method according to claim 3, wherein vectorizing each historical abnormal log according to the description text of the system logic to obtain a semantic vector corresponding to each historical abnormal log comprises:
counting a first frequency with which each word in the description text of the system logic appears in each historical abnormal log, and a second frequency with which each word appears across all historical abnormal logs;
calculating a weight value of each word in each historical abnormal log according to the first frequency and the second frequency;
and, in each historical abnormal log, computing a weighted sum of the word semantic vectors of the words, using the weight value of each word as its weight, to obtain the semantic vector corresponding to that historical abnormal log.
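The weighting in claim 4 resembles a TF-IDF weighted average of word vectors, and the sketch below follows that reading. The exact weight formula is an assumption (a smoothed TF-IDF is used here), as are the function name `log_semantic_vectors` and the toy two-dimensional word vectors.

```python
import math
from collections import Counter
from typing import Dict, List


def log_semantic_vectors(
    logs: List[List[str]],
    word_vectors: Dict[str, List[float]],
) -> List[List[float]]:
    """Weighted sum of word vectors per log, weights from two frequencies.

    First frequency: share of a word among the tokens of one log.
    Second frequency: number of logs that contain the word.
    Assumed weight: tf * (ln((1 + N) / (1 + df)) + 1).
    """
    n_logs = len(logs)
    doc_freq = Counter(word for tokens in logs for word in set(tokens))
    dim = len(next(iter(word_vectors.values())))
    vectors = []
    for tokens in logs:
        vec = [0.0] * dim
        for word, count in Counter(tokens).items():
            if word not in word_vectors:
                continue  # skip words without a semantic vector
            tf = count / len(tokens)
            idf = math.log((1 + n_logs) / (1 + doc_freq[word])) + 1.0
            for i, component in enumerate(word_vectors[word]):
                vec[i] += tf * idf * component
        vectors.append(vec)
    return vectors


# Hypothetical tokenized logs and toy word vectors.
logs = [["timeout", "db"], ["timeout", "disk"]]
word_vectors = {"timeout": [1.0, 0.0], "db": [0.0, 1.0], "disk": [0.0, -1.0]}
vectors = log_semantic_vectors(logs, word_vectors)
```

The word "timeout" appears in both logs and therefore receives a lower weight than "db" or "disk", which is what lets the rarer words separate the two log vectors.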
5. The method according to claim 3, wherein after vectorizing each historical abnormal log according to the description text of the system logic to obtain a semantic vector corresponding to each historical abnormal log, the method further comprises:
clustering semantic vectors corresponding to the plurality of historical abnormal logs to obtain an abnormal type corresponding to each historical abnormal log, wherein each abnormal type corresponds to at least one historical abnormal log;
acquiring a preset abnormal type label corresponding to each abnormal type, and marking the preset abnormal type label on a corresponding historical abnormal log to obtain a marked historical abnormal log, wherein one abnormal type corresponds to one preset abnormal type label;
and training according to the labeled historical abnormal log to obtain the target language model.
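The clustering and labeling steps of claim 5 can be sketched as follows. The claim does not name a clustering algorithm; the greedy cosine-threshold grouping used here is a swapped-in simple technique chosen for brevity, and the threshold, function names, vectors, and label names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_vectors(vectors: List[List[float]], threshold: float = 0.9) -> List[int]:
    """Greedy grouping: join the first cluster whose leader is similar enough."""
    leaders: List[List[float]] = []
    assignments: List[int] = []
    for vec in vectors:
        for cluster_id, leader in enumerate(leaders):
            if cosine(vec, leader) >= threshold:
                assignments.append(cluster_id)
                break
        else:
            leaders.append(vec)  # no similar leader found: start a new cluster
            assignments.append(len(leaders) - 1)
    return assignments


def label_logs(
    logs: List[str], assignments: List[int], cluster_labels: Dict[int, str]
) -> List[Tuple[str, str]]:
    """Attach the preset abnormal type label of each cluster to its logs."""
    return [(log, cluster_labels[cid]) for log, cid in zip(logs, assignments)]


# Hypothetical semantic vectors and preset labels.
semantic_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
assignments = cluster_vectors(semantic_vectors)
labeled = label_logs(
    ["db timeout A", "db timeout B", "disk full"],
    assignments,
    {0: "db_timeout", 1: "disk_full"},
)
```

The mapping from cluster to preset abnormal type label is supplied from outside, reflecting that one abnormal type corresponds to one preset label; the labeled logs are then the input to training.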
6. The method of claim 5, wherein training the target language model according to the labeled historical abnormal log comprises:
performing text expansion on the marked historical abnormal log to obtain an expanded historical abnormal log, wherein the expanded historical abnormal log at least comprises: a description text of the system logic, a description text of the system state and a label interpretation text of the preset abnormal type label;
training an initial language model based on the expanded historical abnormal logs to obtain the target language model.
7. The method of claim 6, wherein performing text expansion on the labeled historical abnormal log to obtain an expanded historical abnormal log comprises:
acquiring label interpretation texts of the preset abnormal type labels, wherein each preset abnormal type label corresponds to at least one label interpretation text;
and, when a preset abnormal type label corresponds to a plurality of label interpretation texts, adding each of those label interpretation texts to each labeled historical abnormal log carrying that preset abnormal type label, to obtain a plurality of expanded historical abnormal logs.
8. The method according to claim 1, wherein the to-be-processed abnormal log comprises at least a description text of system logic and a description text of system state, and wherein inputting the to-be-processed abnormal log into the pre-trained target language model to obtain a plurality of first-dimension vectors comprises:
controlling the target language model to obtain, for each preset abnormal type label, a label interpretation text to be combined from the at least one label interpretation text corresponding to that label; inserting a preset separator between the description text of the system logic, the description text of the system state, and the label interpretation text to be combined, to obtain a target text; inserting a preset sentence-start tag at the beginning of the target text, to obtain the sentence pair corresponding to each preset abnormal type label; and generating the plurality of first-dimension vectors based on the sentence pairs corresponding to the preset abnormal type labels.
9. An apparatus for analyzing an abnormal log, the apparatus comprising:
an acquisition module, configured to acquire a to-be-processed abnormal log generated when a system is abnormal;
an input module, configured to input the to-be-processed abnormal log into a pre-trained target language model to obtain a plurality of first-dimension vectors, wherein each first-dimension vector represents the probability that the to-be-processed abnormal log is similar to a corresponding one of the sentence pairs generated for it, each sentence pair is generated by combining the to-be-processed abnormal log with a label interpretation text, and the label interpretation text comprises at least abnormality detail information and/or an abnormality solution for a preset abnormal type label;
a first determining module, configured to determine, from the plurality of first-dimension vectors, a target first-dimension vector with the maximum similarity probability, and to extract a target label interpretation text from the sentence pair corresponding to the target first-dimension vector;
and a second determining module, configured to take the target label interpretation text as the analysis result of the to-be-processed abnormal log.
10. A computer-readable storage medium, comprising a stored program, wherein when the program runs, the program controls a device in which the computer-readable storage medium is located to execute the method for analyzing an abnormal log according to any one of claims 1 to 8.
11. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and the processor is configured to run the computer program to perform the method for analyzing an abnormal log according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210151153.0A CN114528845A (en) | 2022-02-14 | 2022-02-14 | Abnormal log analysis method and device and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210151153.0A CN114528845A (en) | 2022-02-14 | 2022-02-14 | Abnormal log analysis method and device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114528845A (en) | 2022-05-24 |
Family
ID=81622180
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210151153.0A Pending CN114528845A (en) | 2022-02-14 | 2022-02-14 | Abnormal log analysis method and device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114528845A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114996536A (en) * | 2022-08-08 | 2022-09-02 | 深圳市信润富联数字科技有限公司 | Maintenance scheme query method, device, equipment and computer readable storage medium |
CN115687031A (en) * | 2022-11-15 | 2023-02-03 | 北京优特捷信息技术有限公司 | Method, device, equipment and medium for generating alarm description text |
CN115981910A (en) * | 2023-03-20 | 2023-04-18 | 建信金融科技有限责任公司 | Method, device, electronic equipment and computer readable medium for processing exception request |
CN116089231A (en) * | 2023-02-13 | 2023-05-09 | 北京优特捷信息技术有限公司 | Fault alarm method and device, electronic equipment and storage medium |
CN117389980A (en) * | 2023-12-08 | 2024-01-12 | 成都康特电子科技股份有限公司 | Log file analysis method and device, computer equipment and readable storage medium |
CN118468043A (en) * | 2024-07-12 | 2024-08-09 | 北京广通优云科技股份有限公司 | Abnormality detection method based on normal log correction in operation and maintenance system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104461842A (en) * | 2013-09-23 | 2015-03-25 | 伊姆西公司 | Log similarity based failure processing method and device |
CN105653444A (en) * | 2015-12-23 | 2016-06-08 | 北京大学 | Internet log data-based software defect failure recognition method and system |
CN109213655A (en) * | 2018-07-19 | 2019-01-15 | 东软集团股份有限公司 | Method, apparatus, storage medium and equipment are determined for the solution of alarm |
CN112433874A (en) * | 2020-11-05 | 2021-03-02 | 北京浪潮数据技术有限公司 | Fault positioning method, system, electronic equipment and storage medium |
US20210240691A1 (en) * | 2020-01-30 | 2021-08-05 | International Business Machines Corporation | Anomaly identification in log files |
2022-02-14: application CN202210151153.0A filed in CN (published as CN114528845A); status: Pending
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114996536A (en) * | 2022-08-08 | 2022-09-02 | 深圳市信润富联数字科技有限公司 | Maintenance scheme query method, device, equipment and computer readable storage medium |
CN115687031A (en) * | 2022-11-15 | 2023-02-03 | 北京优特捷信息技术有限公司 | Method, device, equipment and medium for generating alarm description text |
CN116089231A (en) * | 2023-02-13 | 2023-05-09 | 北京优特捷信息技术有限公司 | Fault alarm method and device, electronic equipment and storage medium |
CN116089231B (en) * | 2023-02-13 | 2023-09-15 | 北京优特捷信息技术有限公司 | Fault alarm method and device, electronic equipment and storage medium |
CN115981910A (en) * | 2023-03-20 | 2023-04-18 | 建信金融科技有限责任公司 | Method, device, electronic equipment and computer readable medium for processing exception request |
CN117389980A (en) * | 2023-12-08 | 2024-01-12 | 成都康特电子科技股份有限公司 | Log file analysis method and device, computer equipment and readable storage medium |
CN117389980B (en) * | 2023-12-08 | 2024-02-09 | 成都康特电子科技股份有限公司 | Log file analysis method and device, computer equipment and readable storage medium |
CN118468043A (en) * | 2024-07-12 | 2024-08-09 | 北京广通优云科技股份有限公司 | Abnormality detection method based on normal log correction in operation and maintenance system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111859960B (en) | Semantic matching method, device, computer equipment and medium based on knowledge distillation | |
CN114528845A (en) | Abnormal log analysis method and device and electronic equipment | |
US10417350B1 (en) | Artificial intelligence system for automated adaptation of text-based classification models for multiple languages | |
CN110580308B (en) | Information auditing method and device, electronic equipment and storage medium | |
US20150095017A1 (en) | System and method for learning word embeddings using neural language models | |
WO2022048363A1 (en) | Website classification method and apparatus, computer device, and storage medium | |
CN106778878B (en) | Character relation classification method and device | |
CN111930792A (en) | Data resource labeling method and device, storage medium and electronic equipment | |
CN114218958A (en) | Work order processing method, device, equipment and storage medium | |
CN112433874A (en) | Fault positioning method, system, electronic equipment and storage medium | |
CN116402630B (en) | Financial risk prediction method and system based on characterization learning | |
CN114969334B (en) | Abnormal log detection method and device, electronic equipment and readable storage medium | |
CN115809887A (en) | Method and device for determining main business range of enterprise based on invoice data | |
CN114491034B (en) | Text classification method and intelligent device | |
CN113761875B (en) | Event extraction method and device, electronic equipment and storage medium | |
CN112685548B (en) | Question answering method, electronic device and storage device | |
CN116798417B (en) | Voice intention recognition method, device, electronic equipment and storage medium | |
CN109902162B (en) | Text similarity identification method based on digital fingerprints, storage medium and device | |
CN114936326A (en) | Information recommendation method, device, equipment and storage medium based on artificial intelligence | |
KR102215259B1 (en) | Method of analyzing relationships of words or documents by subject and device implementing the same | |
CN114676699A (en) | Entity emotion analysis method and device, computer equipment and storage medium | |
CN111199170B (en) | Formula file identification method and device, electronic equipment and storage medium | |
CN114328894A (en) | Document processing method, document processing device, electronic equipment and medium | |
CN111782601A (en) | Electronic file processing method and device, electronic equipment and machine readable medium | |
Bouhoun et al. | Information Retrieval Using Domain Adapted Language Models: Application to Resume Documents for HR Recruitment Assistance |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||