CN114528845A - Abnormal log analysis method and device and electronic equipment

Info

Publication number: CN114528845A
Application number: CN202210151153.0A
Authority: CN
Inventors: 杨济银
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2022-02-14
Filing date: 2022-02-14
Publication date: 2022-05-24
Anticipated expiration: 2042-02-14
Also published as: CN114528845B

Abstract

The invention discloses a method and device for analyzing an abnormal log, and an electronic device, and relates to the field of artificial intelligence. The method comprises the following steps: acquiring a pending abnormal log generated when an abnormality occurs in a system; inputting the pending abnormal log into a pre-trained target language model to obtain a plurality of first-dimension vectors; determining, from the plurality of first-dimension vectors, a target first-dimension vector with the maximum similarity probability, and extracting a target label explanation text from the sentence pair corresponding to the target first-dimension vector; and taking the target label explanation text as the analysis result of the pending abnormal log. The invention solves the technical problem in the prior art of poor analysis efficiency caused by relying on manual analysis of log files.

Description

Analysis method, device and electronic device for abnormal log

Technical Field

The present invention relates to the field of artificial intelligence, and in particular to a method and device for analyzing an abnormal log, and an electronic device.

Background

A running system often generates a large number of logs. Today's diverse and complex enterprise-level web services and big data services integrate various service frameworks and record logs in many different ways. A log mainly contains two kinds of content: a logical description of system operation and a description of the system's runtime state. The logical description is expressed in natural language that humans can understand. When an error occurs, it describes the event that prevents the system from continuing to execute, such as which error occurred when a module was called, which error occurred when an external interface was accessed, or that system resources were exhausted. The runtime state description is a set of structured data, such as the timestamp at which each job was submitted, resource usage, data throughput, and job execution time. These parameters quantitatively describe the state of the system at a specific stage of operation.

In the prior art, after a system abnormality occurs, the operation and maintenance staff usually extract the system logs manually and manually analyze the logical description and state description recorded at the time of the error to find the cause of the abnormality, which results in poor analysis efficiency.

No effective solution to the above problem has yet been proposed.

Summary of the Invention

Embodiments of the present invention provide a method and device for analyzing an abnormal log, and an electronic device, so as to at least solve the technical problem in the prior art of poor analysis efficiency caused by relying on manual analysis of log files.

According to one aspect of the embodiments of the present invention, a method for analyzing an abnormal log is provided, comprising: acquiring a pending abnormal log generated when an abnormality occurs in a system; inputting the pending abnormal log into a pre-trained target language model to obtain a plurality of first-dimension vectors, wherein each first-dimension vector represents the similarity probability between the pending abnormal log and each sentence pair corresponding to the pending abnormal log, each sentence pair is generated by combining the pending abnormal log with one label explanation text, and the label explanation text includes at least abnormality detail information and/or an abnormality solution for a preset abnormality type label; determining, from the plurality of first-dimension vectors, a target first-dimension vector with the maximum similarity probability, and extracting a target label explanation text from the sentence pair corresponding to the target first-dimension vector; and taking the target label explanation text as the analysis result of the pending abnormal log.

Further, each first-dimension vector of the plurality of first-dimension vectors corresponds to one second-dimension vector, wherein the second-dimension vector represents the dissimilarity probability between the pending abnormal log and each sentence pair corresponding to the pending abnormal log.
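By way of a non-limiting illustration, the relationship between a first-dimension value and its second-dimension value can be sketched as follows. The sketch assumes, which the text above does not state, that the model emits two raw scores per sentence pair and normalizes them with a softmax so that the similarity and dissimilarity probabilities sum to one. The function name `pair_probabilities` and the sample scores are hypothetical.

```python
import math
from typing import Tuple


def pair_probabilities(similar_score: float, dissimilar_score: float) -> Tuple[float, float]:
    """Turn two raw scores into (similar, dissimilar) probabilities via softmax."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(similar_score, dissimilar_score)
    e_sim = math.exp(similar_score - m)
    e_dis = math.exp(dissimilar_score - m)
    total = e_sim + e_dis
    return e_sim / total, e_dis / total


# Hypothetical raw scores for one sentence pair.
p_similar, p_dissimilar = pair_probabilities(2.0, -1.0)
```

Under this assumption, ranking sentence pairs by the first value alone is sufficient, since the second value is simply its complement.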

Further, the method for analyzing an abnormal log further comprises: before the pending abnormal log is input into the pre-trained target language model, acquiring a plurality of historical abnormal logs, wherein each historical abnormal log contains at least a system logic description text, and the system logic description text includes at least description information of the event that caused the system abnormality; and vectorizing each historical abnormal log according to the system logic description text to obtain a semantic vector corresponding to each historical abnormal log.

Further, the method for analyzing an abnormal log further comprises: counting a first frequency with which each word of the system logic description text appears in each historical abnormal log and a second frequency with which each word appears in all historical abnormal logs; calculating a weight value of each word in each historical abnormal log according to the first frequency and the second frequency; and, in each historical abnormal log, computing a weighted sum of the word semantic vectors of the words using their weight values, to obtain the semantic vector corresponding to each historical abnormal log.
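The weighting described above can be sketched as follows. The text does not give the weight formula, so this sketch assumes a smoothed TF-IDF weight: the first frequency is taken as a word's share of the tokens in one log, and the second frequency as the number of logs containing the word. The toy word vectors, the tokenized logs, and the function name `log_semantic_vectors` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def log_semantic_vectors(
    logs: List[List[str]],
    word_vectors: Dict[str, List[float]],
) -> List[List[float]]:
    """Return one weighted-sum vector per tokenized log."""
    n_logs = len(logs)
    # Second frequency (assumed): number of logs in which each word appears.
    doc_freq = Counter(word for tokens in logs for word in set(tokens))
    dim = len(next(iter(word_vectors.values())))
    result = []
    for tokens in logs:
        counts = Counter(tokens)  # basis of the first frequency
        vec = [0.0] * dim
        for word, count in counts.items():
            if word not in word_vectors:
                continue  # skip words that have no semantic vector
            tf = count / len(tokens)
            idf = math.log((1 + n_logs) / (1 + doc_freq[word])) + 1.0
            weight = tf * idf
            # Accumulate weight * word vector into the log vector.
            for i, x in enumerate(word_vectors[word]):
                vec[i] += weight * x
        result.append(vec)
    return result


toy_vectors = {"timeout": [1.0, 0.0], "disk": [0.0, 1.0], "error": [0.5, 0.5]}
toy_logs = [["timeout", "error"], ["disk", "error"], ["disk", "disk", "error"]]
vectors = log_semantic_vectors(toy_logs, toy_vectors)
```

Because "error" appears in every toy log, it receives the lowest weight, so each log vector is dominated by its more distinctive word.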

Further, the method for analyzing an abnormal log further comprises: after each historical abnormal log is vectorized according to the system logic description text to obtain its semantic vector, clustering the semantic vectors corresponding to the plurality of historical abnormal logs to obtain the abnormality type corresponding to each historical abnormal log, wherein each abnormality type corresponds to at least one historical abnormal log; acquiring the preset abnormality type label corresponding to each abnormality type and annotating the corresponding historical abnormal logs with that label to obtain labeled historical abnormal logs, wherein one abnormality type corresponds to one preset abnormality type label; and training the target language model on the labeled historical abnormal logs.
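As a minimal sketch of the clustering and annotation step, the example below groups toy semantic vectors with a small k-means routine and then attaches a label to each log. The text does not name a clustering algorithm, so k-means is an assumption made only for illustration, as are the label names, the cluster-to-label mapping, and the choice to seed the centroids with the first k vectors.

```python
from typing import List


def squared_distance(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: List[List[float]], k: int, iterations: int = 10) -> List[int]:
    """Assign each vector to one of k clusters (first k vectors seed the centroids)."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every vector.
        assignment = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


semantic_vectors = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.2], [0.2, 0.9]]
cluster_ids = kmeans(semantic_vectors, k=2)

# Hypothetical mapping from cluster id to a preset abnormality type label;
# in practice a person reviewing each cluster would likely choose the label.
cluster_to_label = {0: "NETWORK_TIMEOUT", 1: "DISK_FULL"}
labeled_logs = [
    {"vector": v, "label": cluster_to_label[c]}
    for v, c in zip(semantic_vectors, cluster_ids)
]
```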

Further, the method for analyzing an abnormal log further comprises: performing text expansion on the labeled historical abnormal logs to obtain expanded historical abnormal logs, wherein an expanded historical abnormal log includes at least: the system logic description text, the system state description text, and the label explanation text of the preset abnormality type label; and training an initial language model on the expanded historical abnormal logs to obtain the target language model.

Further, the method for analyzing an abnormal log further comprises: acquiring the label explanation texts of the preset abnormality type labels, wherein each preset abnormality type label corresponds to at least one label explanation text; and, where one preset abnormality type label corresponds to a plurality of label explanation texts, adding each label explanation text separately to each labeled historical abnormal log corresponding to that preset abnormality type label, to obtain a plurality of expanded historical abnormal logs.
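The expansion rule above, under which every labeled log is paired with every explanation text of its label, can be sketched as follows. The dictionary layout, the field names, and the sample texts are hypothetical and chosen only to make the example self-contained.

```python
from typing import Dict, List


def expand_logs(
    labeled_logs: List[dict],
    explanations: Dict[str, List[str]],
) -> List[dict]:
    """Create one expanded log per (labeled log, explanation text) combination."""
    expanded = []
    for log in labeled_logs:
        for text in explanations[log["label"]]:
            # Copy the labeled log and attach one explanation text to it.
            expanded.append({**log, "explanation": text})
    return expanded


sample_logs = [
    {"logic": "connection to gateway timed out", "state": "cpu=0.41", "label": "NETWORK_TIMEOUT"},
    {"logic": "read from upstream timed out", "state": "cpu=0.38", "label": "NETWORK_TIMEOUT"},
    {"logic": "no space left on device", "state": "disk=1.00", "label": "DISK_FULL"},
]
sample_explanations = {
    "NETWORK_TIMEOUT": [
        "The remote service did not answer in time.",
        "Check network connectivity and retry the call.",
    ],
    "DISK_FULL": ["The disk is full; free space or extend the volume."],
}
expanded_logs = expand_logs(sample_logs, sample_explanations)
```

Here two logs share a label that has two explanation texts and one log has a label with a single text, so three labeled logs expand into five training samples.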

Further, the pending abnormal log includes at least a system logic description text and a system state description text, and the method for analyzing an abnormal log further comprises: controlling the target language model to acquire one label explanation text to be combined from the at least one label explanation text corresponding to each preset abnormality type label; inserting a preset separator between the system logic description text, the system state description text, and the label explanation text to be combined to obtain a target text; inserting a preset sentence-start tag at the beginning of the target text to obtain the sentence pair corresponding to each preset abnormality type label; and generating the plurality of first-dimension vectors based on the sentence pairs corresponding to the preset abnormality type labels.

According to another aspect of the embodiments of the present invention, a device for analyzing an abnormal log is further provided, comprising: an acquisition module, configured to acquire a pending abnormal log generated when an abnormality occurs in a system; an input module, configured to input the pending abnormal log into a pre-trained target language model to obtain a plurality of first-dimension vectors, wherein each first-dimension vector represents the similarity probability between the pending abnormal log and each sentence pair corresponding to the pending abnormal log, each sentence pair is generated by combining the pending abnormal log with one label explanation text, and the label explanation text includes at least abnormality detail information and/or an abnormality solution for a preset abnormality type label; a first determination module, configured to determine, from the plurality of first-dimension vectors, a target first-dimension vector with the maximum similarity probability, and to extract a target label explanation text from the sentence pair corresponding to the target first-dimension vector; and a second determination module, configured to take the target label explanation text as the analysis result of the pending abnormal log.

According to another aspect of the embodiments of the present invention, a computer-readable storage medium is further provided. The computer-readable storage medium includes a stored program, wherein, when the program runs, the device on which the computer-readable storage medium is located is controlled to execute the above method for analyzing an abnormal log.

According to another aspect of the embodiments of the present invention, an electronic device is further provided, comprising a memory and a processor, wherein a computer program is stored in the memory and the processor is configured to run the computer program to execute the above method for analyzing an abnormal log.

In the embodiments of the present invention, the abnormality behind a pending abnormal log is located by means of a target language model. A pending abnormal log generated when an abnormality occurs in a system is acquired and input into a pre-trained target language model to obtain a plurality of first-dimension vectors; a target first-dimension vector with the maximum similarity probability is then determined from the plurality of first-dimension vectors, and a target label explanation text is extracted from the sentence pair corresponding to the target first-dimension vector, so that the target label explanation text serves as the analysis result of the pending abnormal log. Each first-dimension vector represents the similarity probability between the pending abnormal log and each sentence pair corresponding to the pending abnormal log, each sentence pair is generated by combining the pending abnormal log with one label explanation text, and the label explanation text includes at least abnormality detail information and/or an abnormality solution for a preset abnormality type label.

In the above process, the label explanation text in each sentence pair corresponds to a different preset abnormality type label, and each preset abnormality type label in turn corresponds to one abnormality type. Obtaining, from the target language model, the similarity probability between the pending abnormal log and each of its sentence pairs therefore amounts to judging how similar the abnormality type of the pending abnormal log is to the abnormality type of each sentence pair. Further, extracting the target label explanation text from the sentence pair with the maximum similarity probability and taking it as the analysis result locates the abnormality behind the pending abnormal log without manual analysis, so that operators can quickly learn the cause of the abnormal log and resolve the abnormality, which improves both analysis efficiency and problem-solving efficiency.

It can be seen that the solution provided by the present application locates the abnormality behind a pending abnormal log based on a target language model, thereby achieving the technical effect of improving the efficiency of abnormal log analysis and solving the technical problem in the prior art of poor analysis efficiency caused by relying on manual analysis of log files.

Description of the Drawings

The drawings described here are provided for a further understanding of the present invention and constitute a part of the present application. The exemplary embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:

FIG. 1 is a schematic diagram of an optional method for analyzing an abnormal log according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of optional training of the target language model according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of optional text expansion of labeled historical abnormal logs according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of an optional device for analyzing an abnormal log according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of an optional electronic device according to an embodiment of the present invention.

Detailed Description

To enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.

It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used are interchangeable under appropriate circumstances, so that the embodiments of the present invention described here can be implemented in orders other than those illustrated or described here. In addition, the terms "comprising" and "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to such a process, method, product, or device.

It should be noted that the user information (including but not limited to user equipment information and user personal information) and data (including but not limited to data for display and data for analysis) involved in this disclosure are information and data authorized by the user or fully authorized by all parties.

Embodiment 1

According to an embodiment of the present invention, an embodiment of a method for analyzing an abnormal log is provided. It should be noted that the steps shown in the flowchart of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in a different order.

FIG. 1 is a schematic diagram of an optional method for analyzing an abnormal log according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:

Step S101: acquire a pending abnormal log generated when an abnormality occurs in a system.

In step S101, the pending abnormal log generated when an abnormality occurs in the system may be acquired by an application system, a processor, an electronic device, or a similar apparatus. In this embodiment, it is acquired by a data analysis system. Optionally, the data analysis system may read the newly generated abnormal log directly from the system in which the abnormality occurred, or an operator, the system in which the abnormality occurred, or another system may input the pending abnormal log into the data analysis system for it to acquire.

The pending abnormal log includes at least a system logic description text and a system state description text. The system logic description text is expressed in natural language that humans can understand. When an error occurs, it describes the event that prevents the system from continuing to execute, such as which error occurred when a module was called, which error occurred when an external interface was accessed, or that system resources were exhausted. The system state description text is a set of structured data, such as the timestamp at which each job was submitted, resource usage, data throughput, and job execution time. These parameters quantitatively describe the state of the system at a specific stage of operation.
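One possible in-memory representation of such a log is sketched below, purely for illustration. The class name `PendingAbnormalLog`, its field names, the `key=value` rendering of the state data, and the sample values are all assumptions; the text only requires a natural-language logic description and a set of structured state data.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PendingAbnormalLog:
    # Natural-language description of the event that stopped execution.
    logic_text: str
    # Structured runtime state, e.g. resource usage or job execution time.
    state: Dict[str, float] = field(default_factory=dict)

    def state_text(self) -> str:
        """Render the structured state as text, sorted by key for stability."""
        return ", ".join(f"{key}={value}" for key, value in sorted(self.state.items()))


log = PendingAbnormalLog(
    logic_text="Call to module payment failed: connection timed out",
    state={"cpu_usage": 0.93, "job_seconds": 12.5},
)
rendered_state = log.state_text()
```

Rendering the state as text is one way to let both parts of the log be fed to a text model together.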

It should be noted that acquiring the pending abnormal log generated when an abnormality occurs in the system makes the subsequent analysis of the pending abnormal log possible.

Step S102: input the pending abnormal log into a pre-trained target language model to obtain a plurality of first-dimension vectors, wherein each first-dimension vector represents the similarity probability between the pending abnormal log and each sentence pair corresponding to the pending abnormal log, each sentence pair is generated by combining the pending abnormal log with one label explanation text, and the label explanation text includes at least abnormality detail information and/or an abnormality solution for a preset abnormality type label.

In step S102, the data analysis system may input the pending abnormal log into the pre-trained target language model. In this embodiment, the pre-trained target language model may be a BERT (Bidirectional Encoder Representations from Transformers) model, a classic large-scale pre-trained language model designed to pre-train deep bidirectional representations of text by jointly conditioning on context in all layers of the model.

Optionally, the data analysis system may control the target language model to combine the pending abnormal log with a plurality of label explanation texts, so as to generate sentence pairs each consisting of the pending abnormal log and one label explanation text, and to calculate the similarity probability between the pending abnormal log and the label explanation text in each sentence pair, thereby outputting the first-dimension vector corresponding to each sentence pair, that is, the similarity probability between the pending abnormal log and the label explanation text. Each label explanation text is in natural-language form and corresponds to one preset abnormality type label, and each preset abnormality type label corresponds to one abnormality type. Each label explanation text may be a detailed description of the abnormality type corresponding to the preset abnormality type label, an explanation (that is, a solution) of that abnormality type, or a combination of the two.

It should be noted that the label explanation text in each sentence pair corresponds to a different preset abnormality type label, and each preset abnormality type label in turn corresponds to one abnormality type. Obtaining, from the target language model, the similarity probability between the pending abnormal log and each of its sentence pairs therefore amounts to judging how similar the abnormality type of the pending abnormal log is to the abnormality type of each sentence pair, which facilitates the subsequent locating of the problem behind the abnormal log.

Step S103: determine, from the plurality of first-dimension vectors, a target first-dimension vector with the maximum similarity probability, and extract a target label explanation text from the sentence pair corresponding to the target first-dimension vector.

In step S103, the higher the similarity probability between the pending abnormal log and a sentence pair, the more similar the abnormality type of the pending abnormal log is to the abnormality type corresponding to that sentence pair, and the larger the first-dimension vector is or the closer it is to a certain value. The data analysis system can therefore determine, based on the values of the first-dimension vectors, the target first-dimension vector with the maximum similarity probability, and extract the target label explanation text from the sentence pair corresponding to it. This target label explanation text is the relatively optimal explanation of the cause of the abnormality behind the pending abnormal log.

It should be noted that determining the target first-dimension vector with the maximum similarity probability and extracting the corresponding target label explanation text determines the relatively optimal explanation of the cause of the abnormality behind the pending abnormal log, that is, it locates the abnormality.
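The selection in step S103 reduces to picking the maximum over the similarity probabilities, which can be sketched as follows. The sketch assumes each first-dimension vector has already been reduced to a single similarity probability; the function name `pick_explanation`, the sample explanation texts, and the sample probabilities are hypothetical.

```python
from typing import List, Tuple


def pick_explanation(scored_pairs: List[Tuple[str, float]]) -> str:
    """Return the label explanation text with the largest similarity probability."""
    best_text, _ = max(scored_pairs, key=lambda pair: pair[1])
    return best_text


# Each entry: (label explanation text of a sentence pair, similarity probability).
scored = [
    ("The disk is full; free space or extend the volume.", 0.12),
    ("The remote service did not answer in time.", 0.81),
    ("The database connection pool is exhausted.", 0.34),
]
analysis_result = pick_explanation(scored)
```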

Step S104: take the target label explanation text as the analysis result of the pending abnormal log.

In step S104, the data analysis system may take the target label explanation text determined above as the analysis result of the pending abnormal log and display the analysis result to an operator through a human-computer interaction interface, so that the operator can learn the analysis result and, based on it, maintain the system that generated the pending abnormal log.

It should be noted that taking the target label explanation text as the analysis result of the pending abnormal log avoids manual analysis of the abnormal log, allows the operator to quickly learn the cause of the abnormal log, and allows the abnormality to be resolved quickly based on the abnormality solution in the target label explanation text, which improves both the efficiency of abnormality analysis and the efficiency of problem solving.

Based on the solution defined by steps S101 to S104 above, it can be seen that, in the embodiments of the present invention, the abnormality behind a pending abnormal log is located by means of a target language model. A pending abnormal log generated when an abnormality occurs in a system is acquired and input into a pre-trained target language model to obtain a plurality of first-dimension vectors; a target first-dimension vector with the maximum similarity probability is then determined from the plurality of first-dimension vectors, and a target label explanation text is extracted from the sentence pair corresponding to the target first-dimension vector, so that the target label explanation text serves as the analysis result of the pending abnormal log. Each first-dimension vector represents the similarity probability between the pending abnormal log and each sentence pair corresponding to the pending abnormal log, each sentence pair is generated by combining the pending abnormal log with one label explanation text, and the label explanation text includes at least abnormality detail information and/or an abnormality solution for a preset abnormality type label.

It is easy to see that, in the above process, the label explanation text in each sentence pair corresponds to a different preset abnormality type label, and each preset abnormality type label in turn corresponds to one abnormality type. Obtaining, from the target language model, the similarity probability between the pending abnormal log and each of its sentence pairs therefore amounts to judging how similar the abnormality type of the pending abnormal log is to the abnormality type of each sentence pair. Further, extracting the target label explanation text from the sentence pair with the maximum similarity probability and taking it as the analysis result locates the abnormality behind the pending abnormal log without manual analysis, so that operators can quickly learn the cause of the abnormal log and resolve the abnormality, which improves both analysis efficiency and problem-solving efficiency.

In an optional embodiment, in the process of inputting the to-be-processed exception log into the pre-trained target language model to obtain the plurality of first-dimension vectors, the data analysis module may control the target language model to obtain one to-be-combined label explanation text from the at least one label explanation text corresponding to each preset exception type label; insert a preset separator between the description text of the system logic, the description text of the system state, and the to-be-combined label explanation text to obtain a target text; insert a preset sentence-start tag at the beginning of the target text to obtain the sentence pair corresponding to each preset exception type label; and generate the plurality of first-dimension vectors based on the sentence pair corresponding to each preset exception type label.

Optionally, each preset exception type label and its corresponding at least one label explanation text may be stored in a preset storage area, for example a database or a cloud server. After the data analysis module inputs the to-be-processed log into the pre-trained target language model, the data analysis module may control the target language model to obtain, from the preset storage area, one label explanation text for each preset exception type label and use it as the to-be-combined label explanation text, where the target language model is a BERT model. For example, when the preset storage area holds ten preset exception type labels, the data analysis module controls the target language model to obtain one label explanation text from among the label explanation texts corresponding to each preset exception type label, thereby obtaining ten to-be-combined label explanation texts, each corresponding to one preset exception type label.

Further, the data analysis module may control the target language model to perform text processing on the to-be-processed exception log. During text processing, the target language model may convert the character sequence <w1,...,wi,...,wn> of the input text into <CLS,w1,...,wi,...,wn,SEP>, where "CLS" is the prescribed sequence start symbol, that is, the preset sentence-start tag, and carries no other linguistic meaning, and "SEP" is the prescribed sequence terminator or separator, that is, the preset separator. When the character sequence of the input text contains multiple sentences, a corresponding number of "SEP" symbols may be used.

Specifically, in this embodiment, the target language model may extract the description text of the system logic and the description text of the system state from the to-be-processed exception log, and convert these two texts together with each of the aforementioned to-be-combined label explanation texts into <CLS, description text of the system logic, SEP, description text of the system state, SEP, label explanation text, SEP> or <CLS, description text of the system logic, SEP, description text of the system state, SEP, preset exception type label/label explanation text, SEP>, thereby obtaining the sentence pair corresponding to each preset exception type label. In the BERT model, the vector corresponding to "CLS" is usually used as the vector representing the character sequence of the entire input text.
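The sentence-pair layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the sample log text, and the labels `DB_DOWN` and `OOM` are hypothetical, and the bracketed spellings `[CLS]` and `[SEP]` follow the usual BERT vocabulary convention rather than anything stated in the source. In practice a BERT tokenizer would normally insert these special tokens itself.

```python
from typing import Dict

CLS = "[CLS]"  # preset sentence-start tag (spelling assumed)
SEP = "[SEP]"  # preset separator (spelling assumed)


def build_sentence_pair(logic_text: str, state_text: str, label_text: str) -> str:
    """Lay out <CLS, logic, SEP, state, SEP, label explanation text, SEP>."""
    return " ".join([CLS, logic_text, SEP, state_text, SEP, label_text, SEP])


def build_candidates(logic_text: str, state_text: str,
                     label_texts: Dict[str, str]) -> Dict[str, str]:
    """Build one sentence pair per preset exception type label."""
    return {
        label: build_sentence_pair(logic_text, state_text, text)
        for label, text in label_texts.items()
    }


# Hypothetical log content and label explanation texts.
pairs = build_candidates(
    "connection to database refused",
    "cpu=35% mem=91%",
    {
        "DB_DOWN": "the database service is unreachable; restart it",
        "OOM": "memory is exhausted; increase heap size",
    },
)
```

Each value in `pairs` is one candidate input for the model, so ten preset labels would yield ten candidates for a single log.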

Further, after the sentence pair corresponding to each preset exception type label is obtained, the data analysis system may control the target language model to perform text representation processing on each sentence pair so as to generate the plurality of first-dimension vectors. Specifically, the BERT model is built from multi-head self-attention modules. The basic calculation formula of each multi-head self-attention module is as follows:

[Formula for H_r not reproduced in the source text]

Here, Softmax() denotes the normalization function, Q_r, K_r and V_r denote the three parts into which the vector matrix corresponding to the sentence pair is divided, and each of the three parts has its own weight matrix (the weight matrix symbols are likewise not reproduced in the source text). Specifically, within a single attention head r, the vector matrix corresponding to the sentence pair is divided into the three parts Q_r, K_r and V_r. The three matrices are each linearly mapped by their respective weight matrices, the similarity is computed, weight values are determined from the similarity, and the vectors in the V matrix are weighted and summed accordingly to obtain H_r, the output of the single head r.
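The per-head formula itself did not survive in this text. A form consistent with the surrounding definitions is the standard scaled dot-product attention, given here only as an assumption: the weight matrix symbols W_r^Q, W_r^K, W_r^V and the scaling dimension d_k are not taken from the source.

```latex
H_r = \mathrm{Softmax}\!\left(
        \frac{\left(Q_r W_r^{Q}\right)\left(K_r W_r^{K}\right)^{\top}}{\sqrt{d_k}}
      \right) V_r W_r^{V}
```

Under this reading, the Softmax term holds the weight values derived from the similarity between the mapped Q_r and K_r, and multiplying by the mapped V_r performs the weighted summation that yields H_r.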

Afterwards, the outputs of all the different heads r are concatenated, linearly transformed back to the scale of the original vector, and added to the original input to obtain the text representation corresponding to the input text. The formula is as follows:

$$X' = H W^{H} + X = [H_1, \ldots, H_r, \ldots, H_R]\, W^{H} + X$$

Here, X′ denotes the text representation corresponding to the input text, which takes into account and models the relationships among the description text of the system logic, the description text of the system state, and the to-be-combined label explanation text; [H₁, ..., H_r, ..., H_R] denotes the outputs of the individual heads; X denotes the input text, that is, the sentence pair; and W^H denotes the linear transformation that maps [H₁, ..., H_r, ..., H_R] back to the scale of the original vector.

Further, the generated text representation vector is passed through the fully connected layer MLP() to extract and compress the relationships between the different dimensions of the high-dimensional vector and to reduce the vector dimension to suit the overall classification task. After text processing of the to-be-processed exception log, the task becomes a binary classification problem in which the model judges whether the text pair is similar. The final output of the fully connected layer MLP is therefore a two-dimensional vector that represents the similarity probability between the to-be-processed exception log and each of its sentence pairs. Finally, to express the probability more intuitively, the data analysis system may control the target language model to normalize the two-dimensional vector with the softmax function, converting all of its components into decimals in the range [0,1]. The formula is as follows:

$$P = \mathrm{Softmax}\left(\mathrm{MLP}\left(X'_{CLS}\right)\right)$$

Here, Softmax() denotes the normalization function, MLP(X′_CLS) denotes the output of the fully connected layer MLP(), and P denotes the probability vector that the to-be-processed exception log in the input sentence pair is classified into the exception type corresponding to the label explanation text in that sentence pair. The P vector is a two-dimensional vector and includes at least the first-dimension vector, whereby the first-dimension vector is obtained.
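The normalization and the subsequent choice of the most similar sentence pair can be sketched as follows. This is a minimal illustration, not the patented implementation: the helper names, the label names, and the logit values are hypothetical, and the code assumes that index 0 of each two-dimensional MLP output is the "similar" class, which the source does not specify.

```python
import math
from typing import Dict, List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Normalise logits into probabilities in [0, 1] that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_label(logits_by_label: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return the label whose sentence pair has the largest similarity probability.

    Assumption: index 0 of each two-dimensional output is the "similar" class.
    """
    best_label, best_prob = "", -1.0
    for label, logits in logits_by_label.items():
        p_similar = softmax(logits)[0]  # the first-dimension value
        if p_similar > best_prob:
            best_label, best_prob = label, p_similar
    return best_label, best_prob


# Hypothetical MLP outputs for three candidate sentence pairs.
scores = {"DB_DOWN": [2.0, -1.0], "OOM": [0.0, 0.0], "DISK_FULL": [-1.5, 1.0]}
label, prob = pick_label(scores)
```

The label explanation text stored for the returned label would then serve as the analysis result of the log.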

It should be noted that controlling the target language model to generate the sentence pairs, and generating the plurality of first-dimension vectors from those sentence pairs, makes it possible to accurately determine the similarity between the to-be-processed exception log and each exception type.

In an optional embodiment, each of the plurality of first-dimension vectors corresponds to one second-dimension vector, where the second-dimension vector represents the dissimilarity probability between the to-be-processed exception log and each of its corresponding sentence pairs.

Optionally, the aforementioned P vector also includes the second-dimension vector, and the data analysis system may combine the value of the first-dimension vector with the value of the second-dimension vector when determining, from the plurality of first-dimension vectors, the target first-dimension vector with the highest similarity probability. This allows the target first-dimension vector to be determined more accurately and thereby improves the accuracy of the analysis.

In an optional embodiment, before the to-be-processed exception log is input into the pre-trained target language model, the data analysis system may acquire a plurality of historical exception logs and then vectorize each historical exception log according to the description text of the system logic to obtain the semantic vector corresponding to each historical exception log. Each historical exception log contains at least the description text of the system logic, and the description text of the system logic includes at least description information of the event that caused the system exception.

Optionally, as shown in FIG. 2, the data analysis system may acquire the plurality of historical exception logs from manually entered data, or may read them directly from a related system or from a storage device such as a memory. After acquiring the plurality of historical exception logs, the data analysis system may extract the description text of the system logic from each historical exception log and vectorize that description text to obtain the semantic vector corresponding to the historical exception log.

It should be noted that the semantic vector corresponding to each historical exception log is obtained so that the initial language model can subsequently be trained on the historical exception logs to obtain the target language model.

In an optional embodiment, in the process of vectorizing each historical exception log according to the description text of the system logic, the data analysis system may count a first frequency with which each word of the description text of the system logic appears in each historical exception log and a second frequency with which each word appears across all historical exception logs, and then calculate the weight value of each word in each historical exception log from the first frequency and the second frequency. Within each historical exception log, the word semantic vector of each word is then weighted by the weight value of that word and the results are summed to obtain the semantic vector corresponding to that historical exception log.

Optionally, the data analysis system may vectorize the description text of the system logic based on term frequency-inverse document frequency (TF-IDF). TF-IDF is a technique for extracting keywords from natural language paragraphs: the frequency with which a word occurs in a paragraph, multiplied by the inverse document frequency of that word in the overall natural language corpus, is the TF-IDF score. The higher the TF-IDF score, the higher the weight of the word within its paragraph, meaning that the word contributes more to the semantics of the whole paragraph.

Specifically, taking any one of the plurality of historical exception logs as an example, the data analysis system may obtain the frequency with which each word of the description text of the system logic appears in that exception log, that is, the first frequency, and obtain the frequency with which each word appears across all historical exception logs, that is, the second frequency. Using the TF-IDF method, the TF-IDF weight of each word for that historical exception log is then calculated from the first frequency and the second frequency. Further, the word semantic vectors of all words in the historical exception log are weighted by the TF-IDF weights of the words and summed, which yields the semantic vector corresponding to that historical exception log.
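The TF-IDF weighted sum can be illustrated with a small, self-contained sketch. Several details are assumptions rather than statements from the source: the unsmoothed IDF variant log(N/df), the requirement that the log being vectorized is itself part of the corpus (so that df is never zero), the tokenised sample logs, and the two-dimensional word vectors, which stand in for whatever word embeddings are actually used.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_weights(doc: List[str], corpus: List[List[str]]) -> Dict[str, float]:
    """TF-IDF weight of each word of `doc`; `doc` is assumed to be in `corpus`."""
    weights = {}
    for word, count in Counter(doc).items():
        tf = count / len(doc)                              # first frequency: within this log
        df = sum(1 for other in corpus if word in other)   # number of logs containing the word
        idf = math.log(len(corpus) / df)                   # derived from the second frequency
        weights[word] = tf * idf
    return weights


def log_vector(doc: List[str], corpus: List[List[str]],
               word_vectors: Dict[str, List[float]]) -> List[float]:
    """Sum the word vectors of `doc`, each scaled by its TF-IDF weight."""
    dim = len(next(iter(word_vectors.values())))
    vec = [0.0] * dim
    for word, weight in tfidf_weights(doc, corpus).items():
        for i, x in enumerate(word_vectors[word]):
            vec[i] += weight * x
    return vec


# Hypothetical tokenised logs and toy word vectors.
corpus = [["disk", "full", "error"], ["network", "timeout", "error"]]
word_vectors = {
    "disk": [1.0, 0.0], "full": [0.0, 1.0], "error": [1.0, 1.0],
    "network": [-1.0, 0.0], "timeout": [0.0, -1.0],
}
vec = log_vector(corpus[0], corpus, word_vectors)
```

In this toy corpus the word "error" occurs in every log, so its weight is zero and it does not influence the resulting semantic vector, which is the intended effect of the inverse document frequency.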

It should be noted that calculating the first frequency and the second frequency of each word in the description text of the system logic of each historical exception log, and computing the semantic vector of each historical exception log from the first frequency and the second frequency, allows the semantic vector of each historical exception log to be determined accurately and thus allows the natural language semantics contained in the historical exception logs to be extracted accurately.

In an optional embodiment, after each historical exception log is vectorized according to the description text of the system logic to obtain its semantic vector, the data analysis system may cluster the semantic vectors of the plurality of historical exception logs to obtain the exception type corresponding to each historical exception log, then obtain the preset exception type label corresponding to each exception type and annotate the corresponding historical exception logs with that label to obtain annotated historical exception logs, and then train the target language model on the annotated historical exception logs. Each exception type corresponds to at least one historical exception log, and each exception type corresponds to one preset exception type label.

Optionally, as shown in FIG. 2, the initial number of clusters for the system exception logs may be determined by an operator with the help of the development manual written during the development phase of the system and the issue summaries and related experience gathered during its operation phase; it may be determined by the data analysis system from a preset value; or it may be calculated by the data analysis system using a related algorithm. Once the initial number of clusters has been determined, the data analysis system may use the K-means clustering algorithm to cluster the semantic vectors of the plurality of historical exception logs over multiple iterations based on the initial number of clusters, obtain the clustering result of those semantic vectors, and determine the exception type corresponding to each historical exception log from the clustering result.
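A bare-bones K-means loop over the semantic vectors might look like the sketch below. It is a simplified illustration: the initial centroids are simply the first k points and the loop runs a fixed number of iterations, both of which are assumptions made to keep the example deterministic, and the sample points are hypothetical. A production system would more likely rely on an existing clustering library.

```python
from typing import List

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iterations: int = 20) -> List[int]:
    """Return the cluster index assigned to each point.

    Simplification: the first k points serve as the initial centroids.
    """
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical two-dimensional semantic vectors of five historical logs.
points = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0], [1.0, 0.0]]
clusters = kmeans(points, k=2)
```

Each cluster index would then be mapped to a preset exception type label, for instance by an operator who inspects a few logs from each cluster.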

Further, after the exception type corresponding to each historical exception log has been determined, as shown in FIG. 2, the data analysis system may obtain the preset exception type label corresponding to each exception type and annotate the corresponding historical exception logs with that label to obtain the annotated historical exception logs. An annotated historical exception log includes at least the preset exception type label, the description text of the system logic, and the description text of the system state. The preset exception type label may be a number, a letter, or another identifier, or may simply be the name of the exception type.

Further, after the annotated historical exception logs have been obtained, the data analysis system uses them as training samples to train the language model to be trained, thereby obtaining the target language model.

It should be noted that determining the exception type of each historical exception log and annotating each historical exception log according to its exception type yields the training samples, so that the initial language model can be trained effectively and a target language model capable of accurate judgment can be obtained.

In an optional embodiment, in the process of training the target language model on the annotated historical exception logs, the data analysis system may perform text augmentation on the annotated historical exception logs to obtain augmented historical exception logs, and train the initial language model on the augmented historical exception logs to obtain the target language model. An augmented historical exception log includes at least the description text of the system logic, the description text of the system state, and the label explanation text of the preset exception type label.

Optionally, as shown in FIG. 2, to cope with the unbalanced distribution among system exception types, the present application may perform text augmentation on the aforementioned training samples (that is, the annotated historical exception logs) to enrich them and make the distribution across system exception types more uniform. During text augmentation, as shown in FIG. 3, the data analysis system may extract the description text of the system logic from the annotated historical exception logs, extract the valid description text of the system state based on manual screening, and at the same time create label explanation texts or obtain them from the preset storage area. Based on the relationship among the historical exception logs, the exception types, and the label explanation texts, the multi-class task can then be converted into a sentence pair task: sentence pairs corresponding to each historical exception log are generated from the historical exception logs, the exception types, and the label explanation texts, and the sentence pairs are then transformed to achieve text augmentation.

Further, after text augmentation of the historical exception logs, the data analysis system may train the initial language model on the augmented historical exception logs to obtain the target language model. In this embodiment, the method provided by the present application may be implemented in Python, version 3.7, with the PyTorch framework as the supporting library for the deep learning model and the open-source HuggingFace library as the PyTorch-based training framework for the pre-trained BERT model (that is, the initial language model). The pre-trained BERT model may be the BERT-Base version released by Google, which contains 110M parameters in total.

It should be noted that, in general, the distribution of exception log types is unbalanced. On the one hand, an unbalanced dataset biases the language model toward the categories that dominate the data distribution, so the model tends to fail on exception categories that are more severe but less frequent. On the other hand, the volume of exception log data may be insufficient to train a usable and effective model. Text augmentation of the annotated historical exception logs therefore effectively enlarges the training data and improves the robustness of problem localization for exception logs.

In an optional embodiment, in the process of performing text augmentation on the annotated historical exception logs to obtain the augmented historical exception logs, the data analysis system may obtain the label explanation texts of the preset exception type labels and, where one preset exception type label corresponds to multiple label explanation texts, add each label explanation text to each annotated historical exception log that carries that preset exception type label, thereby obtaining multiple augmented historical exception logs. Each preset exception type label corresponds to at least one label explanation text.

Specifically, in conventional applications, the dataset of a typical multi-class task (text classification) has the format <sentence, label>, and prediction is made by direct multi-class classification on a dataset in this format. Because the system or model tries to map the input text directly to a label, no additional information is used. In the present application, since each exception type has its own resolution mechanism, the dataset format of the typical multi-class task can be converted into <description text of the system logic and description text of the system state of the historical exception log, sentence_2, label (0, 1)>, where the content of sentence_2 is the label explanation text, and label (0, 1) indicates the preset exception type label corresponding to the label explanation text in each sentence_2.
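One way to read this format conversion is sketched below. The interpretation is an assumption: the code treats the (0, 1) label as a binary flag that is 1 when sentence_2 explains the true exception type of the log and 0 otherwise, in line with the binary similarity task described earlier, and it generates a non-matching row for every other label, which the source does not state explicitly. The function name and the sample texts are hypothetical.

```python
from typing import Dict, List, Tuple

PairRow = Tuple[str, str, int]  # (sentence_1, sentence_2, 0/1 flag)


def to_sentence_pairs(log_text: str, true_label: str,
                      label_texts: Dict[str, str]) -> List[PairRow]:
    """Turn one <sentence, label> sample into <sentence_1, sentence_2, 0/1> rows."""
    rows = []
    for label, text in label_texts.items():
        flag = 1 if label == true_label else 0  # 1: the explanation matches the log
        rows.append((log_text, text, flag))
    return rows


# Hypothetical sample in the conventional <sentence, label> format.
rows = to_sentence_pairs(
    "connection to database refused cpu=35% mem=91%",
    "DB_DOWN",
    {
        "DB_DOWN": "the database service is unreachable; restart it",
        "OOM": "memory is exhausted; increase heap size",
    },
)
```

The design point is that sentence_2 carries the explanation of an exception type into the model input, so the model compares the log against a description instead of mapping it directly to an opaque class index.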

Further, after the sentence pair corresponding to each historical exception log has been obtained, as shown in FIG. 3, the data analysis system may obtain the at least one label explanation text corresponding to each preset exception type label and generate an explanation dictionary. Each entry pair in the explanation dictionary has the following format:

{
"label": [Explanation 1, ..., Explanation N],
"label response": [Response 1, ..., Response M]
}

Here, [Explanation 1, ..., Explanation N] denotes the exception detail information of the corresponding preset exception type label in the label explanation text, and [Response 1, ..., Response M] denotes the exception solutions of the corresponding preset exception type label in the label explanation text. When each label explanation text contains only an explanation or only a response, M+N label explanation texts can be generated; when each label explanation text contains both an explanation and a response, M*N label explanation texts can be generated; and when both forms are allowed, M*N+M+N label explanation texts can be generated.
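The three counts can be checked with a short sketch. The function name, the `mode` parameter, the way an explanation and a response are joined with a space, and the sample entry are all hypothetical choices made for illustration.

```python
from itertools import product
from typing import List


def label_explanation_texts(explanations: List[str], responses: List[str],
                            mode: str = "all") -> List[str]:
    """Generate label explanation texts from one explanation dictionary entry."""
    singles = list(explanations) + list(responses)  # N + M texts
    combined = [f"{e} {r}" for e, r in product(explanations, responses)]  # N * M texts
    if mode == "single":
        return singles
    if mode == "combined":
        return combined
    if mode == "all":
        return combined + singles  # N * M + N + M texts
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical dictionary entry with N = 2 explanations and M = 3 responses.
entry = {
    "label": ["heap space exhausted", "JVM out of memory"],
    "label response": ["increase heap size", "restart the service", "check for leaks"],
}
texts = {
    mode: label_explanation_texts(entry["label"], entry["label response"], mode)
    for mode in ("single", "combined", "all")
}
```

With N = 2 and M = 3 the three modes yield 5, 6 and 11 texts respectively, matching M+N, M*N and M*N+M+N.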

Afterwards, based on the explanation dictionary, the content of sentence_2 in the aforementioned sentence pairs is augmented. That is, for any historical exception log, if the exception type in its sentence pair corresponds to several of the aforementioned label explanation texts, one sentence pair for that historical exception log can be generated from each label explanation text, so that a single historical exception log has multiple sentence pairs belonging to the same exception type, which achieves text augmentation. It should be noted that, during this text augmentation, the data analysis system may, according to the actual situation and the data volume of each exception type, level out the unbalanced data distribution so that the data volume of each exception type is uniformly distributed across the resulting augmented historical exception logs (that is, sentence pairs).
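The augmentation and levelling step could be sketched as follows. The source does not say how the distribution is levelled; this sketch assumes a fixed target count per exception type and reaches it by cycling through the generated pairs, which repeats pairs for rare types (a simple form of oversampling). The function name, the `per_type` parameter, and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import cycle, islice, product
from typing import Dict, List, Tuple


def expand_and_balance(logs: List[Tuple[str, str]],
                       label_texts: Dict[str, List[str]],
                       per_type: int) -> List[Tuple[str, str, str]]:
    """Pair every log with every explanation text of its type, then level the counts.

    Assumption: each exception type contributes exactly `per_type` rows.
    """
    logs_by_type = defaultdict(list)
    for text, label in logs:
        logs_by_type[label].append(text)

    rows = []
    for label, texts in logs_by_type.items():
        # One sentence pair per (log, label explanation text) combination.
        pairs = [(t, e, label) for t, e in product(texts, label_texts[label])]
        # Cycle through the pairs until the target count is reached.
        rows.extend(islice(cycle(pairs), per_type))
    return rows


# Hypothetical data: type "A" is frequent, type "B" is rare.
logs = [("a1", "A"), ("a2", "A"), ("a3", "A"), ("b1", "B")]
label_texts = {"A": ["ea"], "B": ["eb1", "eb2"]}
rows = expand_and_balance(logs, label_texts, per_type=3)
```

Here the single log of type "B" first grows to two pairs through its two explanation texts and is then repeated once, so that both types end up with three rows.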

It should be noted that augmenting the historical exception logs with the at least one label explanation text corresponding to each preset exception type label enlarges the historical exception logs effectively and avoids an unbalanced data distribution in the training data, thereby improving the robustness of the target language model.

It should be noted that the present application has three aspects. First, based on the description text of the system logic in the exception logs, the TF-IDF algorithm is used to produce a vectorized representation of the text and the K-means clustering algorithm is applied to it, which achieves fast, large-scale automatic summarization of exceptions and errors. Second, based on the BERT model, the description text of the system logic and the description text of the system state in the exception logs are fused for modeling, which fully exposes the relationships between states and latent logic problems in the exception logs and automates problem localization. Third, converting the multi-class task into a sentence pair similarity judgment task and augmenting the data effectively improves the robustness of the exception log problem localization method.

Embodiment 2

According to an embodiment of the present invention, an embodiment of an exception log analysis apparatus is provided. FIG. 4 is a schematic diagram of an optional exception log analysis apparatus according to an embodiment of the present invention. As shown in FIG. 4, the apparatus includes:

an acquisition module 401, configured to acquire the to-be-processed exception log generated when the system becomes abnormal;

an input module 402, configured to input the to-be-processed exception log into the pre-trained target language model to obtain a plurality of first-dimension vectors, where each first-dimension vector represents the similarity probability between the to-be-processed exception log and each of its corresponding sentence pairs, each sentence pair is generated by combining the to-be-processed exception log with one label explanation text, and the label explanation text includes at least the exception detail information and/or the exception solution of a preset exception type label;

a first determination module 403, configured to determine, from the plurality of first-dimension vectors, the target first-dimension vector with the highest similarity probability, and to extract the target label explanation text from the sentence pair corresponding to the target first-dimension vector;

a second determination module 404, configured to take the target label explanation text as the analysis result of the to-be-processed exception log.

It should be noted that the acquisition module 401, the input module 402, the first determination module 403, and the second determination module 404 correspond to steps S101 to S104 in the above embodiment. The four modules implement the same examples and application scenarios as the corresponding steps, but are not limited to the content disclosed in Embodiment 1 above.
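As a non-limiting illustration, the cooperation of modules 401 to 404 might be sketched as follows. The function names `analyze_exception_log` and `toy_score` are hypothetical and do not appear in this application. In particular, `toy_score` is only a word-overlap stand-in for the trained target language model, used so that the sketch runs without any model.

```python
from typing import Callable, List


def analyze_exception_log(
    log_text: str,
    label_explanations: List[str],
    score_pair: Callable[[str, str], float],
) -> str:
    """Return the label explanation text whose sentence pair scores highest."""
    if not label_explanations:
        raise ValueError("at least one label explanation text is required")
    # One sentence pair per candidate explanation text (input module 402).
    scored = [(score_pair(log_text, text), text) for text in label_explanations]
    # Keep the pair with the largest similarity probability (module 403).
    _, best_text = max(scored, key=lambda item: item[0])
    # The explanation text itself is the analysis result (module 404).
    return best_text


def toy_score(log_text: str, explanation: str) -> float:
    """Hypothetical stand-in for the trained model: Jaccard word overlap."""
    a = set(log_text.lower().split())
    b = set(explanation.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    log = "connection timeout when calling payment interface"
    candidates = [
        "disk full: free space on the data volume",
        "interface timeout: check the connection to the payment interface and retry",
    ]
    print(analyze_exception_log(log, candidates, toy_score))
```

In an actual embodiment, `score_pair` would be replaced by the similarity probability produced by the target language model for each sentence pair.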

Optionally, each of the plurality of first-dimension vectors corresponds to one second-dimension vector, wherein the second-dimension vector represents the dissimilarity probability between the exception log to be processed and the corresponding sentence pair.
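One way such a pair of values could arise is from a two-way softmax over the model's output for a sentence pair, so that the similarity and dissimilarity probabilities sum to one. This application does not state how the two values are computed; the softmax and the name `pair_probabilities` below are assumptions made only for illustration.

```python
import math
from typing import Tuple


def pair_probabilities(logit_similar: float, logit_dissimilar: float) -> Tuple[float, float]:
    """Turn two raw scores into (similar, dissimilar) probabilities.

    Assumption: a two-class softmax; the source does not specify this.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logit_similar, logit_dissimilar)
    e_sim = math.exp(logit_similar - m)
    e_dis = math.exp(logit_dissimilar - m)
    total = e_sim + e_dis
    return e_sim / total, e_dis / total


if __name__ == "__main__":
    p_similar, p_dissimilar = pair_probabilities(2.0, 0.0)
    print(p_similar, p_dissimilar)
```

Under this assumption, only the first value needs to be compared across sentence pairs, since the second is its complement.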

Optionally, the apparatus for analyzing exception logs further includes: a first sub-acquisition module, configured to acquire a plurality of historical exception logs, wherein each historical exception log contains at least a system-logic description text, and the system-logic description text includes at least description information of the event that caused the system to become abnormal; and a first processing module, configured to vectorize each historical exception log according to the system-logic description text to obtain a semantic vector corresponding to each historical exception log.

Optionally, the first processing module further includes: a statistics module, configured to count a first frequency with which each word in the system-logic description text appears in each historical exception log and a second frequency with which each word appears across all historical exception logs; a first calculation module, configured to calculate the weight value of each word in each historical exception log from the first frequency and the second frequency; and a second calculation module, configured to compute, for each historical exception log, the weighted sum of the word semantic vectors of its words, using each word's weight value as its weight, to obtain the semantic vector corresponding to that historical exception log.
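The statistics and calculation modules above can be sketched, under stated assumptions, as a TF-IDF weighting followed by a weighted sum of word vectors. The exact weighting formula is not given in this passage; the sketch assumes term frequency for the first frequency, document frequency for the second, and a common smoothed inverse document frequency, ln((1 + N) / (1 + df)) + 1. The names `tfidf_weights`, `log_vector`, and the `word_vectors` lookup are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_weights(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Compute a TF-IDF weight for every word of every tokenized log."""
    n_docs = len(docs)
    # Second frequency: in how many logs each word appears.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)  # first frequency: occurrences within this log
        weights.append({
            word: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0)
            for word, count in counts.items()
        })
    return weights


def log_vector(
    weights: Dict[str, float],
    word_vectors: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Weighted sum of word vectors; words without a vector are skipped."""
    vec = [0.0] * dim
    for word, weight in weights.items():
        word_vec = word_vectors.get(word)
        if word_vec is None:
            continue
        for i in range(dim):
            vec[i] += weight * word_vec[i]
    return vec


if __name__ == "__main__":
    docs = [["timeout", "timeout", "db"], ["disk", "full", "db"]]
    w = tfidf_weights(docs)
    toy_vectors = {"timeout": [1.0, 0.0], "db": [0.0, 1.0]}  # illustrative only
    print(log_vector(w[0], toy_vectors, dim=2))
```

A word that is frequent in one log but rare across the corpus, such as "timeout" here, receives a larger weight than a word shared by every log.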

Optionally, the apparatus for analyzing exception logs further includes: a second processing module, configured to cluster the semantic vectors corresponding to the plurality of historical exception logs to obtain the exception type corresponding to each historical exception log, wherein each exception type corresponds to at least one historical exception log; a second sub-acquisition module, configured to acquire the preset exception type label corresponding to each exception type and to annotate the corresponding historical exception logs with that label to obtain labeled historical exception logs, wherein one exception type corresponds to one preset exception type label; and a third processing module, configured to train the target language model on the labeled historical exception logs.
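The clustering and labeling performed by these modules might look like the following minimal sketch. It uses a plain K-Means loop with a deterministic farthest-point initialization, which is a simplification chosen here so that the example is reproducible and is not a detail taken from this application. The mapping `cluster_labels` from cluster index to preset exception type label is hypothetical and would in practice be supplied after the clusters are inspected.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: List[Vector], k: int, iterations: int = 20) -> List[int]:
    """Return a cluster index for each vector (plain Lloyd iterations)."""
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    # Farthest-point initialization (an assumption, for determinism).
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors, key=lambda v: min(_sq_dist(v, c) for c in centroids)
        )
        centroids.append(list(farthest))
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        assignment = [
            min(range(k), key=lambda j: _sq_dist(v, centroids[j]))
            for v in vectors
        ]
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def label_logs(
    logs: List[str], assignment: List[int], cluster_labels: Dict[int, str]
) -> List[Tuple[str, str]]:
    """Attach the preset exception type label of its cluster to each log."""
    return [(log, cluster_labels[c]) for log, c in zip(logs, assignment)]


if __name__ == "__main__":
    vectors = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    clusters = kmeans(vectors, k=2)
    labels = {0: "TIMEOUT", 1: "DISK_FULL"}  # hypothetical label names
    print(label_logs(["log a", "log b", "log c", "log d"], clusters, labels))
```

The labeled pairs returned by `label_logs` correspond to the labeled historical exception logs on which the target language model is subsequently trained.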

Optionally, the third processing module further includes: a text expansion module, configured to perform text expansion on the labeled historical exception logs to obtain expanded historical exception logs, wherein each expanded historical exception log includes at least the system-logic description text, the system-state description text, and the label explanation text of the preset exception type label; and a fourth processing module, configured to train an initial language model on the expanded historical exception logs to obtain the target language model.

Optionally, the text expansion module further includes: a third sub-acquisition module, configured to acquire the label explanation texts of the preset exception type labels, wherein each preset exception type label corresponds to at least one label explanation text; and a fifth processing module, configured to, when one preset exception type label corresponds to multiple label explanation texts, add each of those label explanation texts separately to each labeled historical exception log that carries the preset exception type label, thereby obtaining a plurality of expanded historical exception logs.
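The expansion described above amounts to pairing every labeled log with every explanation text of its label, so that a log whose label has n explanation texts yields n training samples. A minimal sketch follows; the `LabeledLog` and `ExpandedLog` types and the function `expand_logs` are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LabeledLog:
    logic_text: str   # system-logic description text
    state_text: str   # system-state description text
    label: str        # preset exception type label


@dataclass(frozen=True)
class ExpandedLog:
    logic_text: str
    state_text: str
    label: str
    explanation: str  # one label explanation text for this label


def expand_logs(
    logs: List[LabeledLog], explanations: Dict[str, List[str]]
) -> List[ExpandedLog]:
    """Create one expanded log per (labeled log, explanation text) pair."""
    expanded = []
    for log in logs:
        for text in explanations.get(log.label, []):
            expanded.append(
                ExpandedLog(log.logic_text, log.state_text, log.label, text)
            )
    return expanded


if __name__ == "__main__":
    logs = [LabeledLog("call to interface failed", "cpu=90%", "TIMEOUT")]
    explanations = {
        "TIMEOUT": ["the remote interface did not respond", "retry after checking the network"],
    }
    for item in expand_logs(logs, explanations):
        print(item)
```

In this example a single labeled log produces two expanded logs, one for the exception detail text and one for the solution text.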

Optionally, the exception log to be processed includes at least a system-logic description text and a system-state description text, and the input module further includes: a sixth processing module, configured to control the target language model to select one label explanation text to be combined from the at least one label explanation text corresponding to each preset exception type label; insert a preset separator between the system-logic description text, the system-state description text, and the label explanation text to be combined to obtain a target text; insert a preset sentence-head tag at the beginning of the target text to obtain the sentence pair corresponding to each preset exception type label; and generate the plurality of first-dimension vectors based on the sentence pairs corresponding to the preset exception type labels.
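At the string level, the construction of a sentence pair could be sketched as below. This passage does not name the separator or the sentence-head tag; the defaults `[SEP]` and `[CLS]` are assumptions based on the usual BERT convention. The function names are hypothetical, and choosing the first explanation text for each label is an arbitrary simplification, since the selection rule is not specified here.

```python
from typing import Dict, List


def build_sentence_pair(
    logic_text: str,
    state_text: str,
    explanation: str,
    separator: str = "[SEP]",  # assumed preset separator
    head_tag: str = "[CLS]",   # assumed preset sentence-head tag
) -> str:
    """Join the three texts with separators and prepend the head tag."""
    target_text = f" {separator} ".join([logic_text, state_text, explanation])
    return f"{head_tag} {target_text}"


def build_pairs_per_label(
    logic_text: str,
    state_text: str,
    explanations: Dict[str, List[str]],
) -> Dict[str, str]:
    """Build one sentence pair for each preset exception type label."""
    return {
        # Simplification: take the first explanation text of each label.
        label: build_sentence_pair(logic_text, state_text, texts[0])
        for label, texts in explanations.items()
        if texts
    }


if __name__ == "__main__":
    pairs = build_pairs_per_label(
        "call to interface failed",
        "cpu=90% mem=70%",
        {"TIMEOUT": ["the remote interface did not respond"]},
    )
    print(pairs["TIMEOUT"])
```

Tokenizers for BERT-style models typically insert these special tokens themselves, so in practice the three texts would usually be passed to the tokenizer rather than concatenated by hand.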

Embodiment 3

Embodiment 4

According to another aspect of the embodiments of the present invention, an electronic device is further provided. FIG. 5 is a schematic diagram of an optional electronic device according to an embodiment of the present invention. As shown in FIG. 5, the electronic device includes a memory and a processor; a computer program is stored in the memory, and the processor is configured to run the computer program to execute the above method for analyzing exception logs.

The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.

In the above embodiments of the present invention, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, reference may be made to the related descriptions of the other embodiments.

In the several embodiments provided in this application, it should be understood that the disclosed technical content may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units may be a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or take other forms.

Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.

The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present invention, and such improvements and refinements shall also be regarded as falling within the protection scope of the present invention.

Claims

1. A method for analyzing an exception log, characterized by comprising:

acquiring an exception log to be processed, which is generated when a system becomes abnormal;

inputting the exception log to be processed into a pre-trained target language model to obtain a plurality of first-dimension vectors, wherein each first-dimension vector represents the similarity probability between the exception log to be processed and a corresponding sentence pair, each sentence pair is generated by combining the exception log to be processed with one label explanation text, and the label explanation text includes at least exception detail information and/or an exception solution for a preset exception type label;

determining, from the plurality of first-dimension vectors, the target first-dimension vector with the largest similarity probability, and extracting the target label explanation text from the sentence pair corresponding to the target first-dimension vector; and

using the target label explanation text as the analysis result of the exception log to be processed.

2. The method according to claim 1, wherein each of the plurality of first-dimension vectors corresponds to one second-dimension vector, and the second-dimension vector represents the dissimilarity probability between the exception log to be processed and the corresponding sentence pair.

3. The method according to claim 1, characterized in that, before inputting the exception log to be processed into the pre-trained target language model, the method further comprises:

acquiring a plurality of historical exception logs, wherein each historical exception log contains at least a system-logic description text, and the system-logic description text includes at least description information of the event that caused the system to become abnormal; and

vectorizing each historical exception log according to the system-logic description text to obtain a semantic vector corresponding to each historical exception log.

4. The method according to claim 3, characterized in that vectorizing each historical exception log according to the system-logic description text to obtain a semantic vector corresponding to each historical exception log comprises:

counting a first frequency with which each word in the system-logic description text appears in each historical exception log and a second frequency with which each word appears across all historical exception logs;

calculating the weight value of each word in each historical exception log from the first frequency and the second frequency; and

for each historical exception log, computing the weighted sum of the word semantic vectors of its words, using each word's weight value as its weight, to obtain the semantic vector corresponding to that historical exception log.

5. The method according to claim 3, characterized in that, after vectorizing each historical exception log according to the system-logic description text to obtain a semantic vector corresponding to each historical exception log, the method further comprises:

clustering the semantic vectors corresponding to the plurality of historical exception logs to obtain the exception type corresponding to each historical exception log, wherein each exception type corresponds to at least one historical exception log;

acquiring the preset exception type label corresponding to each exception type, and annotating the corresponding historical exception logs with that label to obtain labeled historical exception logs, wherein one exception type corresponds to one preset exception type label; and

training the target language model on the labeled historical exception logs.

6. The method according to claim 5, wherein training the target language model on the labeled historical exception logs comprises:

performing text expansion on the labeled historical exception logs to obtain expanded historical exception logs, wherein each expanded historical exception log includes at least the system-logic description text, a system-state description text, and the label explanation text of the preset exception type label; and

training an initial language model on the expanded historical exception logs to obtain the target language model.

7. The method according to claim 6, wherein performing text expansion on the labeled historical exception logs to obtain the expanded historical exception logs comprises:

acquiring the label explanation texts of the preset exception type labels, wherein each preset exception type label corresponds to at least one label explanation text; and

when one preset exception type label corresponds to multiple label explanation texts, adding each of those label explanation texts separately to each labeled historical exception log that carries the preset exception type label, to obtain a plurality of expanded historical exception logs.

8. The method according to claim 1, wherein the exception log to be processed includes at least a system-logic description text and a system-state description text, and inputting the exception log to be processed into the pre-trained target language model to obtain the plurality of first-dimension vectors comprises:

controlling the target language model to select one label explanation text to be combined from the at least one label explanation text corresponding to each preset exception type label; inserting a preset separator between the system-logic description text, the system-state description text, and the label explanation text to be combined to obtain a target text; inserting a preset sentence-head tag at the beginning of the target text to obtain the sentence pair corresponding to each preset exception type label; and generating the plurality of first-dimension vectors based on the sentence pairs corresponding to the preset exception type labels.

9. An apparatus for analyzing an exception log, wherein the apparatus comprises:

an acquisition module, configured to acquire an exception log to be processed, which is generated when a system becomes abnormal;

an input module, configured to input the exception log to be processed into a pre-trained target language model to obtain a plurality of first-dimension vectors, wherein each first-dimension vector represents the similarity probability between the exception log to be processed and a corresponding sentence pair, each sentence pair is generated by combining the exception log to be processed with one label explanation text, and the label explanation text includes at least exception detail information and/or an exception solution for a preset exception type label;

a first determination module, configured to determine, from the plurality of first-dimension vectors, the target first-dimension vector with the largest similarity probability, and to extract the target label explanation text from the sentence pair corresponding to the target first-dimension vector; and

a second determination module, configured to use the target label explanation text as the analysis result of the exception log to be processed.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored program, wherein, when the program runs, a device in which the computer-readable storage medium is located is controlled to execute the method for analyzing an exception log according to any one of claims 1 to 8.

11. An electronic device comprising a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to run the computer program to execute the method for analyzing an exception log according to any one of claims 1 to 8.