WO2025113711A1

WO2025113711A1 - Abnormal log processing method and device

Info

Publication number: WO2025113711A1
Application number: PCT/CN2024/136147
Authority: WO
Inventors: 周祥伟; 孙皓; 计叶; 苏妹; 曾宏霞; 周钢
Original assignee: 中国民航信息网络股份有限公司
Priority date: 2023-11-30
Filing date: 2024-12-02
Publication date: 2025-06-05
Also published as: CN117648214A

Abstract

An abnormal log processing method, comprising: collecting abnormal logs generated in the operation process of a service system; preprocessing the abnormal logs; on the basis of a pre-trained log classification model, classifying the preprocessed abnormal logs to obtain corresponding fault types; and searching a pre-created solution library to query solutions matching the fault types, and pushing optimal solutions to operations and maintenance workers. Thus, the operations and maintenance workers can quickly solve or process faults occurring in the operation process of the system, thereby improving the fault solving efficiency of the operations and maintenance workers. Further provided is an abnormal log processing device.

Description

Abnormal log processing method and device

Technical Field

The present application belongs to the field of computer technology, and in particular relates to an abnormal log processing method and device.

Background Art

With the rapid development of information technology, systems and applications have come into wide use. Running systems and applications generate massive amounts of log data, which include a large number of abnormal logs. Analyzing the abnormal logs makes it possible to locate a system fault and determine its cause.

However, when operation and maintenance personnel analyze abnormal logs to locate a fault, they usually take one of three approaches: first, trying various solutions directly in the system based on the abnormal log; second, asking other, more experienced operation and maintenance personnel; third, searching operation and maintenance logs or the Internet for a solution. Whichever approach is chosen, fault location is inefficient.

Summary of the invention

In view of this, the purpose of the present application is to provide an abnormal log processing method and device that solve at least part of the above technical problems. The technical solution provided is as follows:

In a first aspect, an embodiment of the present application provides an abnormal log processing method, including:

collecting abnormal logs generated during the operation of a business system;

preprocessing the abnormal logs;

classifying the preprocessed abnormal logs based on a pre-trained log classification model to obtain the corresponding fault types;

searching a pre-created solution library to query a solution matching the fault type, wherein the solution library includes fault types and the association relationships between the fault types and the corresponding solutions.

In a second aspect, an embodiment of the present application further provides an abnormal log processing device, including:

a log collection module, used to collect abnormal logs generated during the operation of a business system;

a log management module, used to preprocess the abnormal logs and to call a log classification model to classify the preprocessed abnormal logs to obtain the corresponding fault categories;

a solution recommendation module, used to search a pre-created solution library to query a solution matching the fault category.

Compared with the prior art, the technical solution provided by the present application has the following advantages. The abnormal log processing method collects the abnormal logs produced while a system (microservice or application) is running and classifies them quickly with a pre-trained log classification model. It then searches a pre-created solution library for solutions matching the classification category of the abnormal log and pushes the optimal solution to the operation and maintenance personnel, so that they can quickly resolve or handle faults that occur while the system is running, which improves their fault-solving efficiency.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present application. A person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is a flow chart of an abnormal log processing method provided by an embodiment of the present application;

FIG. 2 is a flow chart of another abnormal log processing method provided by an embodiment of the present application;

FIG. 3 is a schematic structural diagram of an abnormal log processing device provided by an embodiment of the present application;

FIG. 4 is a schematic structural diagram of another abnormal log processing device provided by an embodiment of the present application.

DETAILED DESCRIPTION

In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in the present application without creative effort fall within the scope of protection of the present application.

To keep the description of the following embodiments clear and concise, a brief introduction to the related technologies is given first:

Natural Language Processing (NLP): NLP converts text data into a form that computers can understand and process. In log classification, NLP can be applied to tasks such as word segmentation, part-of-speech tagging and named entity recognition to extract useful feature information.

Machine Learning: Machine learning algorithms can identify and classify different types of logs by training a model. Common algorithms include the naive Bayes classifier, the support vector machine (SVM), the decision tree and the random forest. These algorithms learn the characteristic patterns of log text and infer the category to which a log belongs.

Feature Engineering: Feature engineering means building effective feature representations from raw data to improve the performance of a classification algorithm. In log classification, it can include word frequency statistics, TF-IDF (term frequency-inverse document frequency) calculation, word embedding and similar techniques.

Deep Learning: Deep learning techniques such as convolutional neural networks (CNN) and recurrent neural networks (RNN) are also used in log classification. These models learn features from the data automatically and perform classification and prediction through multi-layer neural networks.

Incremental Learning: Because the volume of log data is huge and keeps growing, incremental learning algorithms allow new logs to be classified quickly and the model to be updated without retraining the entire model.

Please refer to FIG. 1, which shows a flow chart of an abnormal log processing method provided by an embodiment of the present application. The method is applied to an operation and maintenance system and may include the following steps:

S101, collecting abnormal logs generated during the operation of the business system.

Abnormal logs generated during operation are collected from each system (microservice or application). Collection focuses on abnormal logs and covers logs of different types, levels and events. For example, the levels of abnormal logs may include:

Debug: records debugging-related information, used for development and troubleshooting.

Warning: records warning information, indicating a possible problem or a potential error.

Error: records error information, indicating that a specific error or fault has occurred.

Critical: records critical error information, indicating that a very serious fault has occurred that may cause the system to crash or stop working properly.

Fatal: records fatal error information, indicating that an unrecoverable error has occurred and the system cannot continue to run.

In an exemplary embodiment, a corresponding Agent collection program can be deployed according to the logging framework used by the system or microservice. The log levels to be collected can be configured, for example warning, error, critical and fatal. When a log of the system or microservice reaches a configured collection level, the abnormal log collection action is triggered; for example, when a log is marked as an error, that log is collected.
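The level-triggered collection described above can be sketched as a simple threshold filter. This is a minimal sketch, not the Agent program itself: the `LogRecord` type, the numeric ranks in `LEVEL_RANK` and the `min_level` parameter are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import List

# Severity order follows the levels listed in the text.
# The numeric ranks themselves are an assumption for this sketch.
LEVEL_RANK = {"DEBUG": 0, "WARNING": 1, "ERROR": 2, "CRITICAL": 3, "FATAL": 4}


@dataclass
class LogRecord:
    level: str
    source: str
    message: str


def should_collect(record: LogRecord, min_level: str = "WARNING") -> bool:
    """Return True when the record reaches the configured collection level."""
    rank = LEVEL_RANK.get(record.level.upper(), -1)  # unknown levels are skipped
    return rank >= LEVEL_RANK[min_level]


def collect(records: List[LogRecord], min_level: str = "WARNING") -> List[LogRecord]:
    """Keep only the records that trigger the collection action."""
    return [r for r in records if should_collect(r, min_level)]


sample_records = [
    LogRecord("DEBUG", "order-service", "cache warmed"),
    LogRecord("ERROR", "order-service", "lock wait timeout"),
]
collected = collect(sample_records)
```

With the default threshold, only the error record in `sample_records` is kept; lowering `min_level` to "DEBUG" would keep both.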

S102, performing data preprocessing on the abnormal logs.

Log preprocessing is crucial to the effectiveness of the classification algorithm. The preprocessing steps can include removing stop words, standardizing the text format (that is, log formatting), and handling missing values and outliers, in order to reduce noise and improve classification accuracy.

In the embodiments of the present application, the purpose of log formatting is to standardize the collected logs: the original log information is sorted and organized in a defined format to facilitate subsequent processing, storage and analysis.

In an exemplary embodiment, log data formatting may include the following:

(1) Timestamp

Add timestamp information to the log to record the specific time at which the event occurred. The timestamp can use a standard date-time format, for example "2023-06-21T10:30:00Z", or another custom format.

(2) Log level

Assign a level to each log entry to indicate its importance or severity. Common log levels include debug, warning, error, critical and fatal.

(3) Log source

Record the name or identifier of the module, component or application that generated the log, so that the source of the log can be traced.

(4) Message content

Record the specific content of the log, including descriptive text, error stack traces and exception information. The message content should be concise and convey the required information clearly.

(5) Contextual information

Add contextual information related to the log as needed. This may include the user ID, device ID, request parameters and response code, so that the log can be better understood and analyzed.

(6) Separators and format specifications

Use appropriate separators and format specifications to keep the log data readable and easy to parse. For example, separators such as commas or tabs can be used, or a common log format such as the Apache log format or JSON.

(7) Visualization

As needed, display the formatted log data in graphical or tabular form to provide a more intuitive means of observation and analysis.
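Items (1) to (6) above can be illustrated by normalizing one raw line into a JSON record. This is a sketch under stated assumptions: the raw layout "date time LEVEL source message", the UTC interpretation of the timestamp, and the names `RAW_PATTERN` and `format_log` are all hypothetical and not taken from the application.

```python
import json
import re
from datetime import datetime, timezone

# Assumed raw layout: "YYYY-MM-DD HH:MM:SS LEVEL source message".
RAW_PATTERN = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<source>\S+) "
    r"(?P<message>.*)$"
)


def format_log(raw_line, context=None):
    """Normalize one raw log line into a JSON string with fixed fields."""
    match = RAW_PATTERN.match(raw_line.strip())
    if match is None:
        raise ValueError("unrecognized log line: %r" % raw_line)
    # Assumption: raw timestamps are already in UTC.
    ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
    ts = ts.replace(tzinfo=timezone.utc)
    record = {
        "timestamp": ts.strftime("%Y-%m-%dT%H:%M:%SZ"),  # (1) timestamp
        "level": match.group("level"),                   # (2) log level
        "source": match.group("source"),                 # (3) log source
        "message": match.group("message"),               # (4) message content
        "context": context or {},                        # (5) contextual information
    }
    return json.dumps(record, ensure_ascii=False, sort_keys=True)  # (6) JSON format


formatted = format_log(
    "2023-06-21 10:30:00 ERROR order-service Lock wait timeout",
    {"user_id": "42"},
)
```

A line that does not match the assumed layout raises `ValueError`, which makes missing or malformed values visible instead of silently passing them to the classifier.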

S103, extracting the feature vector of the preprocessed abnormal log and inputting it into the pre-trained log classification model for classification to obtain the fault category.

The log classification model is a model based on a machine learning algorithm. The model is designed and then trained with training data so that it automatically learns how to classify logs from that data, finally yielding a trained log classification model.

In one embodiment, the process of obtaining the log classification with the log classification model may include:

(1) Feature extraction

Useful features can be extracted from the formatted abnormal log data, such as the timestamp, log level, component name and keywords. For text features, operations such as word segmentation, stop word removal and stem extraction can also be performed.

For example, the log data "10.20.20.10; [2018-07-16 13:12:57]; GET/online/sample HTTP/1.1; 200" is segmented at punctuation marks or special symbols such as ; : < > [ ] { } / \n \t \r. After segmentation the log becomes: 10.20.20.10 2018-07-16 13 12 57 GET online sample HTTP 1.1 200. Stem extraction can then be performed; the stems extracted from this log are: get online sample http.
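The segmentation in this example can be reproduced with a regular expression over the listed delimiters. This is a minimal sketch: treating the space as an additional delimiter, and approximating stem extraction by lowercasing and keeping purely alphabetic tokens, are assumptions chosen because they reproduce the output given above; a real stemmer would do more.

```python
import re

# Delimiters named in the text: ; : < > [ ] { } / plus whitespace
# (\s covers space, \n, \t and \r). Dots and hyphens are not delimiters,
# so "10.20.20.10" and "2018-07-16" stay whole.
DELIMITERS = re.compile(r"[;:<>\[\]{}/\s]+")


def tokenize(line):
    """Split a log line at the delimiters and drop empty pieces."""
    return [tok for tok in DELIMITERS.split(line) if tok]


def keep_word_tokens(tokens):
    """Simplified stand-in for stem extraction: lowercase alphabetic tokens only."""
    return [tok.lower() for tok in tokens if tok.isalpha()]


raw = "10.20.20.10; [2018-07-16 13:12:57]; GET/online/sample HTTP/1.1; 200"
tokens = tokenize(raw)
stems = keep_word_tokens(tokens)
print(" ".join(tokens))  # 10.20.20.10 2018-07-16 13 12 57 GET online sample HTTP 1.1 200
print(" ".join(stems))   # get online sample http
```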

(2) Feature vectorization

Convert the extracted features into a numerical representation. For example, vectorization tools such as one-hot encoding, the bag-of-words model or TF-IDF (term frequency-inverse document frequency) can be used to convert the text data into sparse or dense vectors.
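As an illustration of the vectorization step, the sketch below builds bag-of-words counts and TF-IDF weights with the standard library only. The function names are hypothetical, and the unsmoothed weighting tf × ln(N / df) is one common variant chosen here as an assumption; the application does not prescribe a formula.

```python
import math
from collections import Counter


def build_vocabulary(documents):
    """Sorted list of every distinct token across all tokenized logs."""
    return sorted({tok for doc in documents for tok in doc})


def bag_of_words(doc, vocabulary):
    """Count vector: how often each vocabulary term occurs in one log."""
    counts = Counter(doc)
    return [counts[term] for term in vocabulary]


def tf_idf(documents, vocabulary):
    """TF-IDF vectors using tf = count / len(doc) and idf = ln(N / df)."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty logs
        vectors.append([
            (counts[term] / total) * math.log(n_docs / doc_freq[term])
            for term in vocabulary
        ])
    return vectors


docs = [["get", "online", "sample", "http"], ["lock", "timeout", "http"]]
vocab = build_vocabulary(docs)
bow = bag_of_words(docs[0], vocab)
weights = tf_idf(docs, vocab)
```

In this toy corpus the token "http" occurs in every log, so its TF-IDF weight is zero, while tokens specific to one log receive positive weight.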

(3) The vector corresponding to the log data is input into the log classification model, and the model's prediction function outputs the category label corresponding to the log.

S104, searching a pre-established solution library to obtain the optimal solution corresponding to the fault category, and pushing the optimal solution.

In one embodiment, the association relationships between faults and solutions are established based on a rule matching model to form the solution library. For example, the solution library may take the form of Table 1 below:

Table 1 (table content not reproduced in this text)

Based on the category label that the log classification model assigns to the abnormal log, the existing solution library is searched for solutions that match the fault manifestation and cause of the current abnormal log. For example, the search can use the category label, keywords or fault description of the abnormal log so that relevant solutions are found quickly.

For example, if the abnormal log shows that the current fault is a database fault caused by a lock conflict, the category of the abnormal log is "database fault-lock conflict", and the solution library is then searched for solutions matching "database fault-lock conflict".
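The category-to-solution lookup can be sketched as a dictionary keyed by the category label. The entries of `SOLUTION_LIBRARY` below are hypothetical placeholders, since the contents of Table 1 are not available in this text; only the "database fault-lock conflict" label comes from the example above.

```python
# Hypothetical library entries for illustration; not the contents of Table 1.
SOLUTION_LIBRARY = {
    "database fault-lock conflict": [
        "Identify the blocking session and end the long-running transaction.",
        "Shorten transactions and access tables in a consistent order.",
    ],
    "network fault-connection timeout": [
        "Check connectivity and firewall rules between the two services.",
    ],
}


def find_solutions(fault_category, library=SOLUTION_LIBRARY):
    """Return the solutions associated with a fault category, or an empty list."""
    key = fault_category.strip().lower()
    return list(library.get(key, []))


matches = find_solutions("Database fault-lock conflict")
```

Returning an empty list for an unknown category lets the caller decide what to do when the library has no matching entry, for example routing the log to manual handling.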

Furthermore, after solutions matching the abnormal log have been found in the solution library, it is possible to evaluate whether each retrieved solution actually matches the fault. For example, this can be judged by the confidence between the two: the higher the confidence, the better the solution matches the fault. This ensures that the steps and suggestions in the retrieved solution apply to the current fault, which improves the accuracy of the retrieved solution. For example, the confidence between a solution and a fault can be derived from an entropy value, which is obtained through a defined calculation logic; the smaller the entropy, the higher the confidence.
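The text does not specify how the entropy is calculated, so the following is only one possible reading. It assumes that each candidate solution has a non-negative match score, normalizes the scores into a distribution, computes the Shannon entropy, and maps low entropy to high confidence; the names `shannon_entropy` and `confidence_from_scores` are hypothetical.

```python
import math


def shannon_entropy(probabilities):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def confidence_from_scores(scores):
    """Map match scores to a confidence in [0, 1]: low entropy means high confidence.

    Assumption: confidence = 1 - H / ln(n), where H is the entropy of the
    normalized scores and n is the number of candidate solutions.
    """
    total = sum(scores)
    if total <= 0:
        return 0.0
    if len(scores) == 1:
        return 1.0
    probs = [s / total for s in scores]
    return 1.0 - shannon_entropy(probs) / math.log(len(probs))


flat = confidence_from_scores([1, 1, 1, 1])      # candidates indistinguishable
peaked = confidence_from_scores([10, 0, 0, 0])   # one clear winner
mixed = confidence_from_scores([6, 2, 1, 1])
```

Under this reading, evenly spread scores give a confidence near zero, and a single dominant candidate gives a confidence of one, which agrees with the statement that smaller entropy means higher confidence.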

In an exemplary embodiment, the optimal solution may be sent to a terminal device used by the operation and maintenance personnel, or displayed on a client of the operation and maintenance system. The present application places no particular limitation on how the optimal solution is pushed.

In addition, in other embodiments of the present application, after the retrieved solution has been pushed to the operation and maintenance personnel, they give feedback on it, that is, on whether the pushed solution resolves the fault corresponding to the abnormal log. The log classification model is then further trained on the basis of this feedback.

In addition, in practical use the solution library is continuously optimized and updated so that it contains the latest and most applicable solutions, which improves its accuracy and applicability.

The abnormal log processing method provided in this embodiment collects the abnormal logs produced while a system (microservice or application) is running and classifies them quickly with a pre-trained log classification model. It then searches a pre-created solution library for solutions matching the classification category of the abnormal log and pushes the optimal solution to the operation and maintenance personnel, so that they can quickly resolve or handle faults that occur while the system is running, which improves their fault-solving efficiency.

Please refer to FIG. 2, which shows a flow chart of another abnormal log processing method provided by an embodiment of the present application. This embodiment describes the processing of abnormal logs in detail and focuses on the training process of the log classification model. As shown in FIG. 2, the method may include the following steps:

S201, collecting abnormal logs generated during the operation of the business system.

S202, preprocessing the abnormal logs.

The implementation of S201 to S202 is the same as that of S101 to S102 and is not repeated here.

S203, classifying the preprocessed abnormal logs with the pre-trained log classification model. If the model can classify the abnormal log automatically, S204 is executed. If it cannot, S205 is executed, that is, the abnormal log data that could not be classified automatically is used for targeted training of the log classification model, which improves the applicability of the model.

The feature vector is extracted from the abnormal log and input into the trained log classification model. The model predicts the category label of the abnormal log and produces the candidate category labels, which are sorted by confidence from high to low. The category label with the highest confidence is taken as the fault category of the abnormal log.

For the process of extracting the feature vector of the abnormal log, please refer to the description of S103 in the embodiment shown in FIG. 1, which is not repeated here.
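The ranking and the branch between S204 and S205 can be sketched as follows. The text does not state how "cannot be classified automatically" is decided, so the `min_confidence` threshold is an assumption introduced for illustration, as is the function name `pick_fault_category`.

```python
def pick_fault_category(label_confidences, min_confidence=0.5):
    """Rank predicted labels by confidence and pick the best one.

    Returns (category, ranked). The category is None when no label reaches
    min_confidence, which this sketch treats as "cannot be classified
    automatically" (the S205 branch).
    """
    ranked = sorted(label_confidences.items(), key=lambda item: item[1], reverse=True)
    if not ranked or ranked[0][1] < min_confidence:
        return None, ranked
    return ranked[0][0], ranked


clear_case, clear_ranked = pick_fault_category(
    {"network fault": 0.1, "database fault-lock conflict": 0.8, "configuration fault": 0.1}
)
unclear_case, _ = pick_fault_category(
    {"network fault": 0.4, "database fault-lock conflict": 0.35, "configuration fault": 0.25}
)
```

Here `clear_case` would go on to the solution search in S204, while `unclear_case` is None and the log would be set aside as training material.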

S204, searching the pre-created solution library for solutions that match the fault category of the abnormal log, and pushing them.

S205, constructing a fault classification system.

A fault classification system classifies faults according to defined rules and standards so that the system can be better managed and maintained. The classification system is built along multiple dimensions, including but not limited to exception classification, performance classification and configuration problem classification.

Log data can be classified in different ways. In some embodiments, the classification methods may include the following:

(1) Classification by log level

Logs are classified according to their importance or severity, for example warning, error, critical and fatal. This helps operation and maintenance personnel (or developers) understand the meaning and importance of a log and locate problems quickly.

(2) Classification by log type

Logs are classified according to their function or purpose, for example application logs, system logs, security logs, performance logs and access logs. Each type records a different aspect of the system, which helps in analyzing and troubleshooting specific problems.

(3) Classification by log source

Logs are classified according to the module, component or application that generated them, for example database logs, network logs, server logs and application logs. This helps in tracking and filtering the relevant logs in a complex system.

(4) Classification by time period

Logs are classified by time period, for example by day, week, month or year. This supports practical log storage and archiving strategies.

(5) Classification by log format

Logs are classified according to the format of the log data, for example text logs, JSON logs and XML logs. Different formats suit different application scenarios and analysis needs.

(6) Classification by keywords

Logs are classified according to keywords or specific content in the log, for example error logs, warning logs, exception logs and information logs.

In practice, at least two of the above classification methods can be selected, and different application scenarios can use different classification methods.

S206, labeling historical abnormal log data to obtain fault sample data.

In some embodiments, the fault sample data is labeled manually. For example, the fault type and label content of the fault sample data (that is, the historical abnormal log data) can be determined according to the fault classification system constructed in the previous step, which yields the labeled data.

S207, designing the log classification model.

A classification model is selected, for example a naive Bayes model, a support vector machine (SVM) model, a decision tree model or a random forest. Depending on the situation, a deep learning model such as a convolutional neural network (CNN) model or a recurrent neural network (RNN) model can also be used.

S208, training the log classification model.

The labeled fault sample data is divided into a training set and a test set. Appropriate evaluation indicators are selected, and the log classification model is trained with the training set. The training process may include: extracting the feature vector of the fault sample data and inputting it into the log classification model to obtain a category prediction result. Then, based on the loss function, indicators such as accuracy, precision and recall are calculated between the category prediction result and the labeled fault type of that fault sample data. If an indicator is outside the allowed range, the parameters of the log classification model are tuned as needed until every indicator is within the allowed range, which yields a usable log classification model.
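As a sketch of S206 to S208, the code below splits labeled samples and fits a multinomial naive Bayes classifier, one of the model families named in S207. It is a toy illustration under assumptions: the sample logs, the `split_samples` helper, the class name `NaiveBayesLogClassifier` and the add-one smoothing are not taken from the application.

```python
import math
import random
from collections import Counter, defaultdict


def split_samples(samples, test_ratio=0.25, seed=0):
    """Shuffle labeled samples and split them into (training set, test set)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


class NaiveBayesLogClassifier:
    """Multinomial naive Bayes over log tokens with add-one smoothing."""

    def fit(self, samples):
        # samples: list of (tokens, label) pairs produced by manual labeling
        self.class_counts = Counter(label for _, label in samples)
        self.token_counts = defaultdict(Counter)
        for tokens, label in samples:
            self.token_counts[label].update(tokens)
        self.vocabulary = {t for c in self.token_counts.values() for t in c}
        return self

    def log_scores(self, tokens):
        """Unnormalized log posterior for every known label."""
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, n in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + vocab_size
            score = math.log(n / total)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


labeled = [
    (["lock", "wait", "timeout"], "database"),
    (["deadlock", "lock"], "database"),
    (["connection", "refused"], "network"),
    (["socket", "timeout", "connection"], "network"),
]
train_set, test_set = split_samples(labeled)
model = NaiveBayesLogClassifier().fit(labeled)  # fitted on all samples for the demo
```

In a real run the model would be fitted on `train_set` only and checked against `test_set`; it is fitted on all four samples here so that the tiny example stays deterministic.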

In addition, the trained log classification model is tested with the test set: the feature vectors of the test data are extracted and input into the model under test, and the model's prediction function outputs the category label for each test sample. Indicators such as accuracy, precision and recall are then calculated. If all indicators are within the allowed range, the accuracy and generalization performance of the trained log classification model are considered to meet the application requirements.
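The indicators named above can be computed directly from the true and predicted labels. This is a minimal sketch for a single label treated as the positive class; the function names and the example thresholds are assumptions, since the application does not state the allowed ranges.

```python
def evaluate(true_labels, predicted_labels, positive_label):
    """Accuracy over all samples, plus precision and recall for one label."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive_label and p == positive_label)
    fp = sum(1 for t, p in pairs if t != positive_label and p == positive_label)
    fn = sum(1 for t, p in pairs if t == positive_label and p != positive_label)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def within_allowed_range(metrics, minimums):
    """True when every listed indicator reaches its configured minimum."""
    return all(metrics[name] >= value for name, value in minimums.items())


truth = ["database", "database", "network", "network"]
predicted = ["database", "network", "network", "network"]
metrics = evaluate(truth, predicted, positive_label="database")
# One database log was missed: accuracy 0.75, precision 1.0, recall 0.5.
```

A check such as `within_allowed_range(metrics, {"accuracy": 0.7, "recall": 0.8})` would fail here because of the recall, which corresponds to the case in which the model parameters are tuned further.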

The trained log classification model is used to classify the collected abnormal log data and obtain category labels; that is, the trained log classification model can be used to execute S203.

The abnormal log processing method provided in this embodiment labels historical abnormal log data to obtain fault sample data, extracts the feature vectors of the fault sample data, and inputs them into the initial log classification model, which classifies the fault sample data to obtain the corresponding category labels. Indicators such as accuracy and recall are calculated from the categories produced by the model and the categories labeled on the fault sample data, and the model parameters are adjusted according to these indicators until every indicator meets the requirements. The model training process is simple and fast, and the log classification model outperforms rule-based and keyword-based log classification in accuracy and efficiency. The method therefore improves the efficiency of abnormal log classification. Moreover, it can directly find solutions that match the fault type of the abnormal log and push the optimal solution to the operation and maintenance personnel, so that they can quickly resolve or handle faults that occur while the system is running, which improves their fault-solving efficiency.

请参见图3，示出了本申请实施例提供的一种异常日志处理装置的结构示意图，如图3所示，该装置可以包括：Please refer to FIG3 , which shows a schematic diagram of the structure of an abnormal log processing device provided in an embodiment of the present application. As shown in FIG3 , the device may include:

日志采集模块101，用于收集业务系统运行过程中产生的异常日志。The log collection module 101 is used to collect abnormal logs generated during the operation of the business system.

日志管理模块102，对于采集到的异常日志进行管理，如主要包括日志格式化和日志分类。The log management module 102 manages the collected abnormal logs, such as mainly including log formatting and log classification.

日志格式化的目的是将采集到的日志进行数据标准化处理，原始的日志信息会按照一定格式进行整理和组织，以便于后续的处理、存储和分析。The purpose of log formatting is to standardize the collected logs. The original log information will be sorted and organized in a certain format to facilitate subsequent processing, storage and analysis.

日志分类是按照不同的分类方式对日志数据进行分类，本模块的日志分类会通过调用日志分类算法进行快速分类。Log classification is to classify log data according to different classification methods. The log classification of this module will be quickly classified by calling the log classification algorithm.

日志分类模型103，主要利用训练数据对分类模型进行训练，自动从训练数据中学习生成用于日志分类的模型，并将学习到的模型应用于未知的日志数据进行分类，实现日志快速分类。The log classification model 103 mainly uses the training data to train the classification model, automatically learns and generates a model for log classification from the training data, and applies the learned model to unknown log data for classification, thereby realizing rapid log classification.

A solution library 104, which is a knowledge base that organizes and stores methods, techniques, best practices, and experience for solving problems. Solutions are categorized along multiple dimensions, such as problem type, technical field, and product module. This categorization overlaps and is associated with the log classification, which provides the basis for solution recommendation.
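One possible shape for such a library is sketched below. The class and field names (`Solution`, `SolutionLibrary`, `fault_type`, `technical_field`, `product_module`) are hypothetical; the application only states that solutions are categorized along several dimensions and associated with the log classification.

```python
from dataclasses import dataclass


@dataclass
class Solution:
    title: str
    steps: list
    fault_type: str        # shared with the log classification labels
    technical_field: str   # assumed additional dimension
    product_module: str    # assumed additional dimension


class SolutionLibrary:
    """Stores solutions indexed by fault type, filterable by other dimensions."""

    def __init__(self):
        self._by_fault_type = {}

    def add(self, solution):
        self._by_fault_type.setdefault(solution.fault_type, []).append(solution)

    def find(self, fault_type, **dimensions):
        """Return solutions for a fault type that match every given dimension."""
        candidates = self._by_fault_type.get(fault_type, [])
        return [
            s for s in candidates
            if all(getattr(s, name) == value for name, value in dimensions.items())
        ]
```

Indexing by fault type keeps the lookup that follows classification cheap, while the extra dimensions narrow the candidates when several solutions share a fault type.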

A solution recommendation module 105, used to quickly search the existing solutions in the solution library 104 for the optimal solution matching the fault type of the abnormal log and to push that solution to operation and maintenance personnel.
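The application does not define how the "optimal" solution is chosen. As one illustrative assumption, the sketch below ranks candidate solutions by the fraction of past attempts that operators reported as resolved; the function name `recommend` and the `attempts` and `resolved` fields are hypothetical.

```python
def recommend(solutions_by_fault_type, fault_type):
    """Pick the candidate with the highest historical success rate, if any."""
    candidates = solutions_by_fault_type.get(fault_type, [])
    if not candidates:
        return None  # nothing in the library matches this fault type

    def success_rate(solution):
        # Assumed ranking signal: resolved cases divided by total attempts.
        if solution["attempts"] == 0:
            return 0.0
        return solution["resolved"] / solution["attempts"]

    return max(candidates, key=success_rate)
```

Other ranking signals, such as recency or operator ratings, could replace `success_rate` without changing the lookup structure.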

In the abnormal log processing device provided in this embodiment, the log collection module collects abnormal logs generated while the system (a microservice or an application) is running and sends them to the log management module for further processing. After preprocessing the abnormal logs, the log management module can classify them quickly based on the log classification model. The solution recommendation module then searches the pre-created solution library for solutions matching the classification category of the abnormal log and pushes the optimal solution to operation and maintenance personnel, so that they can quickly resolve or handle faults that occur during system operation, which improves their fault-resolution efficiency.

Referring to FIG. 4, which shows a schematic structural diagram of another abnormal log processing device provided in an embodiment of the present application, the device may, in addition to the embodiment shown in FIG. 3, further include:

A fault classification system construction module 201, used to construct a fault classification system.

A data annotation module 202, used to annotate historical abnormal log data to obtain fault sample data.

The fault type and annotation content of the fault sample data (i.e., the historical abnormal log data) can be determined according to the constructed fault classification system to obtain the annotated data.
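A minimal sketch of annotation against a fault classification system follows. The two-level taxonomy, its category names, and the function name `annotate` are all hypothetical; the application does not describe the structure of the classification system.

```python
# Assumed two-level taxonomy: top-level category -> allowed fault types.
TAXONOMY = {
    "database": {"connection_timeout", "deadlock"},
    "network": {"connection_reset"},
}


def annotate(log_message, category, fault_type, taxonomy=TAXONOMY):
    """Attach a label to a historical log, rejecting labels outside the taxonomy."""
    if fault_type not in taxonomy.get(category, set()):
        raise ValueError(f"{category}/{fault_type} is not in the fault taxonomy")
    return {"text": log_message, "label": f"{category}/{fault_type}"}
```

Validating each label against the taxonomy keeps the fault sample data consistent with the categories the classification model is later trained to predict.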

A classification model training module 203, used to train the initial log classification model using the fault sample data.

The abnormal log processing device provided in this embodiment annotates historical abnormal log data to obtain fault sample data, extracts feature vectors from the fault sample data, and inputs the feature vectors into an initial log classification model, which classifies the fault sample data to obtain corresponding category labels. Metrics such as accuracy and recall are calculated from the categories predicted by the model and the categories annotated on the fault sample data, and the model parameters are adjusted according to these metrics until each metric meets its requirement. The model training process is simple and fast, and the log classification model outperforms rule-based and keyword-based log classification in both accuracy and efficiency. The abnormal log processing method therefore improves the efficiency of abnormal log classification. Moreover, the method can directly search for solutions that match the fault type of the abnormal log and push the optimal solution to operation and maintenance personnel, so that they can quickly resolve or handle faults that occur during system operation, which improves their fault-resolution efficiency.

The present application provides a computing device that includes a processor and a memory, the memory storing a program executable on the processor. When the processor runs the program stored in the memory, the abnormal log processing method embodiments described above are implemented.

The present application also provides a storage medium readable by a computing device. The storage medium stores a program which, when executed by the computing device, implements the abnormal log processing method described above.

It should be noted that the technical features described in the embodiments of this specification may be substituted for or combined with one another, and their order may be adjusted, or they may be merged or omitted, according to actual needs. Each embodiment focuses on its differences from the other embodiments, and the identical or similar parts of the embodiments may be cross-referenced. Because the device embodiments are substantially similar to the method embodiments, they are described relatively briefly; for the relevant details, refer to the corresponding description of the method embodiments.

Modules or submodules described as separate components may or may not be physically separate, and components serving as modules or submodules may or may not be physical modules or submodules; that is, they may be located in one place or distributed across multiple network modules or submodules. Some or all of the modules or submodules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional modules or submodules in the embodiments of the present application may be integrated into one processing module, may each exist physically on their own, or two or more of them may be integrated into one module. The integrated modules or submodules may be implemented in the form of hardware or in the form of software functional modules or submodules.

Finally, it should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between those entities or operations.

The above is only a preferred implementation of the present application. It should be noted that those of ordinary skill in the art may make a number of improvements and refinements without departing from the principles of the present application, and such improvements and refinements shall also be regarded as falling within the scope of protection of the present application.

Claims

1. A method for processing abnormal logs, characterized by comprising:

collecting abnormal logs generated during the operation of a business system;

preprocessing the abnormal logs;

classifying the preprocessed abnormal logs based on a pre-trained log classification model to obtain corresponding fault types; and

searching a pre-created solution library to query a solution matching the fault type, wherein the solution library includes the fault type and an association between the fault type and the corresponding solution.

2. The method according to claim 1, characterized in that classifying the preprocessed abnormal logs based on the pre-trained log classification model to obtain corresponding fault types comprises:

extracting a feature vector of the preprocessed abnormal log; and

inputting the feature vector into the pre-trained log classification model for classification to obtain the fault type corresponding to the abnormal log.

3. The method according to claim 1, characterized in that the method further comprises:

if the log classification model cannot obtain the fault type corresponding to the abnormal log, further training the log classification model based on the abnormal log to strengthen it.

4. The method according to any one of claims 1 to 3, characterized in that the method further comprises: pushing the solution found by the search to operation and maintenance personnel.

5. The method according to claim 4, characterized in that the method further comprises:

receiving a feedback result from the operation and maintenance personnel on the pushed solution; and

if the feedback result is that the solution cannot resolve the fault indicated by the abnormal log, training the log classification model based on the abnormal log.

6. The method according to claim 1, characterized in that preprocessing the abnormal logs comprises:

removing stop words, missing values, and abnormal values from the abnormal log, and formatting the abnormal log to obtain a standard-format log.

7. The method according to any one of claims 1 to 3, characterized in that the training process of the log classification model comprises:

constructing a fault classification system;

annotating historical abnormal logs to obtain fault sample data;

extracting a feature vector of the fault sample data, and inputting the feature vector into an initial log classification model for classification to obtain a fault category prediction result;

obtaining a performance index of the initial log classification model based on the fault category prediction result and the annotated category corresponding to the historical abnormal log; and

if the performance index is not within the allowable range, adjusting the model parameters of the initial log classification model and classifying the historical abnormal log data using the adjusted log classification model to obtain a fault category prediction result, until the performance index is within the allowable range, thereby obtaining the log classification model.

8. An abnormal log processing device, characterized by comprising:

a log collection module, used to collect abnormal logs generated during the operation of a business system;

a log management module, used to preprocess the abnormal logs and to call a log classification model to classify the preprocessed abnormal logs to obtain corresponding fault categories; and

a solution recommendation module, used to search a pre-created solution library to query a solution matching the fault category.

9. The device according to claim 8, characterized in that the solution recommendation module is further used to push the solution found by the search to operation and maintenance personnel.

10. The device according to claim 8 or 9, characterized in that the device further comprises a log classification model retraining module, used to receive a feedback result from operation and maintenance personnel on the pushed solution and, if the feedback result is that the solution cannot resolve the fault indicated by the abnormal log, to train the log classification model based on the abnormal log.