CN117170926A - Methods, devices, equipment and storage media for determining the root cause of anomalies - Google Patents

Info

Publication number
CN117170926A
Authority
CN
China
Prior art keywords
log
abnormal
constant information
log set
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311236319.XA
Other languages
Chinese (zh)
Inventor
冯鹏
欧阳晔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Asiainfo Technologies China Inc
Original Assignee
Asiainfo Technologies China Inc
Application filed by Asiainfo Technologies China Inc
Priority to CN202311236319.XA
Publication of CN117170926A

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The embodiments of this application provide a method, device, equipment and storage medium for determining the root cause of an anomaly, and relate to the field of log analysis. The method includes: collecting the logs in a statistical cycle to obtain log sets, each corresponding to a piece of target constant information; determining at least one abnormal log set according to whether the target constant information of each log set is reference constant information; obtaining a first abnormal log set chain from adjacent abnormal log sets; and determining the root cause log in the statistical cycle according to the first abnormal log set chain. The embodiments of this application achieve more accurate and efficient log root cause analysis.

Description

Method, device, equipment and storage medium for determining the root cause of an anomaly

Technical field

This application relates to the field of operation and maintenance (O&M) technology, and in particular to a method, device, equipment and storage medium for determining the root cause of an anomaly.

Background

The industry has studied methods for mining IT system logs to locate faults, with some success. The most traditional method is manual, level-by-level troubleshooting based on O&M experience. Two methods are currently most common: pattern-based intelligent analysis, and root cause localization based on RCA (root cause analysis) rules. Both have shortcomings:

1. Pattern-based intelligent analysis has become a fairly common practice in the industry and achieves a certain hit rate. However, the actual calls behind IT system logs are complex and variable, so this approach still aggregates a large number of log models (a log model can be understood as the constant information of a log), and missed or false detections make root cause node localization inaccurate. For example, changes in log models are closely tied to the business, and an occasional increase or decrease in logs is not necessarily a real anomaly.

2. Root cause localization based on RCA rules mainly relies on the topology of components to mine the correlations among component anomalies. It depends heavily on the topology: if the topology is missing from the CMDB (Configuration Management Database) or is not updated in time, the accuracy of root cause localization suffers directly.

Summary of the invention

The embodiments of this application provide a method, device, equipment and storage medium for determining the root cause of an anomaly, to solve one of the above technical problems.

In one aspect, an embodiment of this application provides a method for determining the root cause of an anomaly. The method includes:

For each log generated in real time within a statistical cycle (the overall window being analyzed), determining the generation time and the constant information of the log; if it is determined that no statistical period corresponds to the log, taking the constant information of the log as target constant information, creating a statistical period associated with the target constant information according to the generation time of the log, and collecting the logs generated within the created statistical period to obtain a log set corresponding to the target constant information, where the constant information of every log in the log set is the target constant information. Determining at least one abnormal log set according to whether the target constant information of each log set is reference constant information, where reference constant information is the constant information of historical logs and the abnormal log sets are arranged in the order of their statistical periods. Obtaining a first abnormal log set chain from adjacent abnormal log sets, where the interval between the statistical periods of adjacent abnormal log sets in the chain does not exceed a first duration, and the interval between the statistical periods of the first and last abnormal log sets in the chain does not exceed a second duration. Determining the root cause log in the statistical cycle according to the first abnormal log set chain.

Optionally, determining the constant information of the log includes:

Determining the constant feature information and the variable feature information in the log; replacing the variable feature information with a preset constant; and taking the resulting log as the constant information of the log.

Optionally, determining that no statistical period corresponds to the log includes:

For the first log generated within the statistical cycle, determining that no statistical period corresponds to the log. For any later log generated within the statistical cycle, if none of the created statistical periods contains the generation time of the log, or the constant information of the log differs from the target constant information of every statistical period, determining that no statistical period corresponds to the log.

Optionally, determining at least one abnormal log set according to whether the target constant information of each log set is reference constant information includes:

For each log set, if its target constant information is not reference constant information, determining the log set to be an abnormal log set. If its target constant information is reference constant information, determining whether the log set is an abnormal log set according to the number of logs in it.

Optionally, each piece of reference constant information is associated with a first fluctuation threshold and a second fluctuation threshold, the first being greater than the second.

Determining whether the log set is an abnormal log set according to the number of logs in it includes:

Identifying the reference constant information that is identical to the target constant information of the log set. If the number of logs in the log set is greater than the first fluctuation threshold associated with that reference constant information, or less than the second fluctuation threshold associated with it, determining the log set to be an abnormal log set.

Optionally, obtaining the first abnormal log set chain from adjacent abnormal log sets includes:

Arranging the abnormal log sets in the time order of their statistical periods to obtain an abnormal log set sequence. Determining at least one second abnormal log set chain from the sequence according to the first duration, where the interval between the statistical periods of adjacent abnormal log sets in a second abnormal log set chain does not exceed the first duration. For each second abnormal log set chain, if the interval between the statistical periods of its first and last abnormal log sets does not exceed the second duration, taking the second abnormal log set chain as a first abnormal log set chain; if the interval exceeds the second duration, determining at least one first abnormal log set chain from the second abnormal log set chain.
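The chaining step can be illustrated with a minimal Python sketch. It is not the claimed implementation and rests on assumptions the text does not state: each abnormal log set is a tuple whose first element is the start time of its statistical period, the interval between two statistical periods is the difference of their start times, and a second chain that exceeds the second duration is split greedily from its head. The name `build_chains` is hypothetical.

```python
def build_chains(abnormal_sets, first_duration, second_duration, start=lambda s: s[0]):
    """Group abnormal log sets into first abnormal log set chains.

    Assumptions (not from the source): `start` returns the start time of a
    log set's statistical period, and the interval between two periods is
    the difference of their start times.
    """
    # Arrange the abnormal log sets in the time order of their periods.
    ordered = sorted(abnormal_sets, key=start)

    # Second chains: neighbours are at most `first_duration` apart.
    second_chains = []
    for item in ordered:
        if second_chains and start(item) - start(second_chains[-1][-1]) <= first_duration:
            second_chains[-1].append(item)
        else:
            second_chains.append([item])

    # First chains: head-to-tail span is at most `second_duration`.
    # Greedy splitting is an assumption; the text only says that at least
    # one first chain is derived from an over-long second chain.
    first_chains = []
    for chain in second_chains:
        current = [chain[0]]
        for item in chain[1:]:
            if start(item) - start(current[0]) <= second_duration:
                current.append(item)
            else:
                first_chains.append(current)
                current = [item]
        first_chains.append(current)
    return first_chains
```

Because every first chain is a run of consecutive members of a second chain, the first-duration limit between neighbours still holds after splitting.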

Optionally, determining the root cause log in the statistical cycle according to the first abnormal log set chain includes:

For each first abnormal log set chain, taking the first abnormal log set in the chain, together with any other abnormal log sets that share its statistical period, as candidate log sets. If there is one candidate log set, taking its logs as the root cause logs of the statistical cycle. If there are multiple candidate log sets, identifying those whose target constant information is not reference constant information and taking their logs as the root cause logs of the statistical cycle. If the target constant information of at least two of the candidate log sets is reference constant information, identifying among those candidate log sets the one that meets a preset fluctuation condition and taking its logs as the root cause logs of the statistical cycle; the preset fluctuation condition is that the number of logs in the candidate log set deviates the most from the first fluctuation threshold or the second fluctuation threshold.
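The selection rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: a log set is a dict with the hypothetical keys `start`, `constant_info` and `logs`; two log sets share a statistical period when their `start` values are equal; the fluctuation amount is how far the log count lies above the first threshold or below the second; and the threshold comparison is applied only when no candidate carries new constant information. The name `pick_root_cause` is hypothetical.

```python
def pick_root_cause(chain, reference):
    """Return the candidate log set(s) treated as the root cause of one chain.

    chain:     list of dicts with keys 'start', 'constant_info', 'logs',
               ordered by 'start' (all names are illustrative assumptions).
    reference: dict mapping reference constant information to a tuple
               (first_threshold, second_threshold), first > second.
    """
    first_start = chain[0]["start"]
    # Candidates: the head of the chain plus sets sharing its period.
    candidates = [s for s in chain if s["start"] == first_start]
    if len(candidates) == 1:
        return candidates

    # Candidates whose constant information is new (not in the reference).
    new_sets = [s for s in candidates if s["constant_info"] not in reference]
    if new_sets:
        return new_sets

    # All candidates are known: pick the largest deviation from a threshold.
    def fluctuation(log_set):
        upper, lower = reference[log_set["constant_info"]]
        count = len(log_set["logs"])
        return max(count - upper, lower - count)

    return [max(candidates, key=fluctuation)]
```

Returning log sets rather than individual logs keeps the sketch short; the root cause logs are the `logs` of the returned sets.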

In another aspect, an embodiment of this application provides a device for determining the root cause of an anomaly. The device includes:

A statistics module, configured to: for each log generated in real time within a statistical cycle, determine the generation time and the constant information of the log; and, if it is determined that no statistical period corresponds to the log, take the constant information of the log as target constant information, create a statistical period associated with the target constant information according to the generation time of the log, and collect the logs generated within the created statistical period to obtain a log set corresponding to the target constant information, where the constant information of every log in the log set is the target constant information.

A first determination module, configured to determine at least one abnormal log set according to whether the target constant information of each log set is reference constant information, where reference constant information is the constant information of historical logs and the abnormal log sets are arranged in the order of their statistical periods.

An aggregation and linking module, configured to obtain a first abnormal log set chain from adjacent abnormal log sets, where the interval between the statistical periods of adjacent abnormal log sets in the chain does not exceed a first duration, and the interval between the statistical periods of the first and last abnormal log sets in the chain does not exceed a second duration.

A second determination module, configured to determine the root cause log in the statistical cycle according to the first abnormal log set chain.

An embodiment of this application provides an electronic device including a memory, a processor, and a computer program stored in the memory, where the processor executes the computer program to implement the steps of the method for determining the root cause of an anomaly.

An embodiment of this application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the method for determining the root cause of an anomaly.

The technical solutions provided by the embodiments of this application bring the following beneficial effects:

Constant information and statistical periods are, in essence, tools for classifying logs. Determining the constant information of a log and, when no statistical period corresponds to the log, creating a statistical period associated with that constant information according to the generation time of the log therefore serves to create the classification tools. These two tools are then used to perform a preliminary classification of the logs generated in real time within the statistical cycle, yielding log sets from which the root cause log set of an abnormal event can later be analyzed.

Further, different anomaly criteria are applied to the log sets of "new" and "old" target constant information. Abnormal log sets are determined according to whether the target constant information is reference constant information: target constant information that is reference constant information is "old", and target constant information that is not is "new". Because the two kinds are judged by different criteria, this step allows each log set to be judged against the criterion that applies to it.

The first duration and the second duration are, in essence, time slices modeled on the characteristics of abnormal events in real scenarios. For example, the first duration is the interval between adjacent abnormal points within an abnormal event, and the second duration is the maximum duration of an abnormal event. Linking the abnormal log sets according to the two durations converges them, through time slicing, into first abnormal log set chains that each represent an abnormal event. The root cause log in the statistical cycle, that is, the root cause log of the abnormal event, can then be derived from the first abnormal log set chain.

In this way, the massive volume of logs generated in real time can be classified and counted, and accurate, efficient log root cause analysis can be performed on the resulting log sets.

Description of drawings

To explain the technical solutions in the embodiments of this application more clearly, the drawings used in describing the embodiments are briefly introduced below.

Figure 1 is a schematic flowchart of a method for determining the root cause of an anomaly provided by an embodiment of this application;

Figure 2 is a schematic structural diagram of multiple first abnormal log set chains provided by an embodiment of this application;

Figure 3 is a schematic structural diagram of an anomaly reasoning device provided by an embodiment of this application;

Figure 4 is a schematic diagram of a root cause localization process provided by an embodiment of this application;

Figure 5 is a schematic structural diagram of a device for determining the root cause of an anomaly provided by an embodiment of this application;

Figure 6 is a schematic structural diagram of an electronic device provided by an embodiment of this application.

Detailed description

The embodiments of this application are described below with reference to the accompanying drawings. It should be understood that the implementations set out below with reference to the drawings are exemplary descriptions intended to explain the technical solutions of the embodiments, and do not limit those technical solutions.

Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an" and "the" used herein may also include the plural. It should be further understood that the terms "include" and "comprise" used in the embodiments of this application mean that the corresponding features may be implemented as the presented features, information, data, steps, operations, elements and/or components, without excluding implementation as other features, information, data, steps, operations, elements, components and/or combinations thereof supported in the art. It should be understood that when an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or the two may be connected through an intermediate element. In addition, "connected" or "coupled" as used herein may include wireless connection or wireless coupling. The term "and/or" as used herein indicates at least one of the items it joins; for example, "A and/or B" may be implemented as "A", as "B", or as "A and B".

To make the purpose, technical solutions and advantages of this application clearer, the implementations of this application are described in further detail below with reference to the accompanying drawings.

First, several terms used in this application are introduced and explained:

Convergence: a term from economics and mathematics and an important tool for studying functions; it means gathering toward a point or approaching a certain value. Types of convergence include convergent sequences, convergence of functions, global convergence and local convergence.

Log model: a log is semi-structured data generated by specific code. Log models are the mainstream means of log analysis in the industry: they give a quick overview of the logs and can compress millions of logs into a few hundred log templates that a person can actually read. For example, the constant information of a log in the embodiments of this application is a log model.

Sequence (also called a dynamic series): a series formed by arranging the values of the same statistical indicator in the order of the times at which they occurred.

Root cause analysis: a structured problem-handling method used to find the root cause of a problem step by step and resolve it, rather than attending only to the symptoms. The root cause is the most basic cause of the problem of concern.

The technical solutions of the embodiments of this application, and the technical effects they produce, are explained below through several exemplary implementations. Note that the following implementations may refer to, borrow from, or be combined with one another; identical terms, similar features and similar implementation steps in different implementations are not described repeatedly.

Figure 1 shows a schematic flowchart of a method for determining the root cause of an anomaly. The method includes steps S110 to S140.

S110: For each log generated in real time within the statistical cycle, determine the generation time and the constant information of the log. If it is determined that no statistical period corresponds to the log, take the constant information of the log as target constant information, create a statistical period associated with the target constant information according to the generation time of the log, and collect the logs generated within the created statistical period to obtain a log set corresponding to the target constant information. The constant information of every log in the log set is the target constant information.

After the logs in the statistical cycle are obtained, they are preprocessed to clean out dirty logs, such as unimportant or invalid logs, and so reduce interference with the anomaly judgment.

Specifically, for each log generated in real time within the statistical cycle, whether a statistical period corresponding to the log exists is judged in order of generation time. A statistical period corresponds to a log when the generation time of the log falls within that statistical period and the constant information of the log is the target constant information of that statistical period.

Each statistical period has a preset duration.

In a possible implementation, determining that no statistical period corresponds to the log includes:

For the first log generated within the statistical cycle, determine that no statistical period corresponds to the log. For any later log generated within the statistical cycle, if none of the created statistical periods contains the generation time of the log, or the constant information of the log differs from the target constant information of every statistical period, determine that no statistical period corresponds to the log.
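The grouping performed in S110 can be sketched in Python as below. The sketch is illustrative only and makes assumptions beyond the text: each log arrives as a tuple of timestamp, constant information and raw text; a statistical period starts at the generation time of the log that creates it and lasts a hypothetical `period_seconds`; and a log matches a period only when both the time and the constant information agree. The names `LogSet` and `build_log_sets` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LogSet:
    constant_info: str   # target constant information of this log set
    start: float         # generation time of the log that opened the period
    end: float           # start + preset duration (exclusive)
    logs: list = field(default_factory=list)


def build_log_sets(logs, period_seconds=60.0):
    """Group (timestamp, constant_info, raw_text) records into log sets."""
    log_sets = []
    latest = {}  # constant_info -> most recently created LogSet
    # Process logs in order of generation time.
    for ts, const, raw in sorted(logs, key=lambda record: record[0]):
        current = latest.get(const)
        # No corresponding period: none exists for this constant
        # information, or the latest one no longer covers the timestamp.
        if current is None or not (current.start <= ts < current.end):
            current = LogSet(const, ts, ts + period_seconds)
            latest[const] = current
            log_sets.append(current)
        current.logs.append(raw)
    return log_sets
```

Because logs are handled in time order, only the most recently created period for a given constant information can still contain a new log, so one dictionary lookup per log is enough.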

S120: Determine at least one abnormal log set according to whether the target constant information of each log set is reference constant information. Reference constant information is the constant information of historical logs, and the abnormal log sets are arranged in the order of their statistical periods.

S130: Obtain a first abnormal log set chain from adjacent abnormal log sets. The interval between the statistical periods of adjacent abnormal log sets in the chain does not exceed the first duration, and the interval between the statistical periods of the first and last abnormal log sets in the chain does not exceed the second duration.

S140: Determine the root cause log in the statistical cycle according to the first abnormal log set chain.

Each first abnormal log set chain represents one abnormal event in the statistical cycle, so for an abnormal event occurring in the statistical cycle, its root cause log can be determined through the corresponding first abnormal log set chain.

Because constant information is a tool for classifying the logs generated in real time within the statistical cycle, how to determine it is a problem to be solved. The embodiments of this application therefore provide a possible implementation.

In S110, determining the constant information of the log includes:

Determining the constant feature information and the variable feature information in the log; replacing the variable feature information with a preset constant; and taking the resulting log as the constant information of the log.

In one example, the content of log 1 is as follows:

IDMM{

"220":{"state":true,"lastRunTime":"20220630000016","lastDealTime":"20220629212250"},

"221":{"state":true,"lastRunTime":"20220630000015","lastDealTime":"20220629203437"}

}

Based on the content of log 1, the constant information obtained for log 1 is as follows:

IDMM{

"XX":{"state":XX,"lastRunTime":"XX","lastDealTime":"XX"}

}
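The replacement of variable feature information can be illustrated with a small Python sketch. The text does not say how variable features are recognized, so the rule used here is an assumption made for illustration: quoted strings of digits, the literals true and false, and bare numbers are treated as variable and replaced by the preset constant "XX". The names `PLACEHOLDER` and `constant_info` are hypothetical.

```python
import re

PLACEHOLDER = "XX"  # preset constant, matching the example above


def constant_info(log_line: str) -> str:
    """Replace assumed variable features in one log line with the preset constant."""
    # Quoted runs of digits (IDs, timestamps) are assumed to be variable.
    text = re.sub(r'"\d+"', f'"{PLACEHOLDER}"', log_line)
    # Boolean literals are assumed to be variable.
    text = re.sub(r'\b(?:true|false)\b', PLACEHOLDER, text)
    # Bare numbers that are not part of a word or a quoted token.
    text = re.sub(r'(?<![\w"])\d+(?:\.\d+)?(?![\w"])', PLACEHOLDER, text)
    return text
```

Under this rule the entries for "220" and "221" in log 1 reduce to the same string, which is why they fall into one log model.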

如何从日志集中筛选出异常日志集,是进行异常判断的关键步骤。本申请实施例还提供了一种实现方式,针对每个日志集进行单独判断,从而获得至少一个异常日志集。其中,根据日志集相应的目标常量信息是否为参考常量信息确定至少一个异常日志集,包括:How to filter out the abnormal log set from the log set is a key step in abnormal judgment. The embodiment of the present application also provides an implementation method to perform independent judgment on each log set, thereby obtaining at least one abnormal log set. Among them, at least one abnormal log set is determined according to whether the corresponding target constant information of the log set is reference constant information, including:

对于每个日志集,若日志集相应的目标常量信息不为参考常量信息,将日志集确定为异常日志集。若日志集相应的目标常量信息为参考常量信息,则根据日志集的日志数量确定日志集为异常日志集。For each log set, if the corresponding target constant information of the log set is not the reference constant information, the log set is determined to be an abnormal log set. If the corresponding target constant information of the log set is reference constant information, the log set is determined to be an abnormal log set based on the number of logs in the log set.

对于统计周期中出现的目标常量信息而言,若是首次出现,则为新的常量信息,若不是首次出现,则为旧的常量信息。对于新的目标常量信息,其出现极大可能说明发生了异常,对于此类日志集可直接判定为异常日志集;对于旧的目标常量信息,若旧的目标常量信息相应的日志数量处于正常范围,则不是一种异常,只有日志数量超过正常范围,才能判定为异常。在这种情况下,可根据日志集的日志数量来判断是否发生异常。For the target constant information that appears in the statistical period, if it appears for the first time, it is the new constant information; if it does not appear for the first time, it is the old constant information. For new target constant information, its appearance is most likely to indicate that an exception has occurred. This type of log set can be directly determined as an abnormal log set; for old target constant information, if the number of logs corresponding to the old target constant information is within the normal range , it is not an anomaly. Only when the number of logs exceeds the normal range can it be determined to be an anomaly. In this case, you can determine whether an exception occurs based on the number of logs in the log set.

Optionally, each piece of reference constant information is associated with a first fluctuation threshold and a second fluctuation threshold, the first fluctuation threshold being greater than the second. Both thresholds can be determined from the historical logs corresponding to the reference constant information. The procedure for determining the two thresholds may follow the related art and, for brevity, is not described here.

Determining that a log set is an abnormal log set according to the number of logs in the log set includes the following steps Sa1 to Sa2.

Sa1: determine the reference constant information that is identical to the target constant information corresponding to the log set.

Sa2: if the number of logs in the log set is greater than the first fluctuation threshold associated with the determined reference constant information, or less than the second fluctuation threshold associated with it, determine that the log set is an abnormal log set.
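As a non-limiting illustration, the per-log-set judgment and steps Sa1 to Sa2 might be sketched as follows. The names `LogSet`, `is_abnormal`, and `thresholds` are hypothetical and do not appear in the application; the sketch assumes the reference constant information is held as a mapping from each template string to its (first, second) fluctuation thresholds.

```python
from typing import Dict, NamedTuple, Tuple


class LogSet(NamedTuple):
    constant_info: str  # target constant information of the log set
    period_start: int   # start of the statistical period (assumed: minutes since 0:00)
    count: int          # number of logs in the log set


def is_abnormal(log_set: LogSet, thresholds: Dict[str, Tuple[int, int]]) -> bool:
    """thresholds maps reference constant information to (first, second) thresholds."""
    # Target constant information that is not reference constant information
    # is "new", so the log set is abnormal outright.
    if log_set.constant_info not in thresholds:
        return True
    # Sa1: look up the matching reference constant information.
    first, second = thresholds[log_set.constant_info]
    # Sa2: abnormal if the count is above the first or below the second threshold.
    return log_set.count > first or log_set.count < second
```

Under these assumptions, a log set whose count lies between the two thresholds, inclusive, is treated as normal.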

Because there is at least one abnormal log set, how to link the abnormal log sets to one another is also a problem to be solved. To this end, the embodiments of the present application also provide a possible implementation, in which linking adjacent abnormal log sets to obtain the first abnormal log set chain includes the following steps Sb1 to Sb3.

Sb1: arrange the abnormal log sets in the chronological order of their statistical periods to obtain an abnormal log set sequence.

Specifically, the abnormal log sets are first arranged in the chronological order of their statistical periods to obtain an initial abnormal log set sequence, in which abnormal log sets that share the same statistical period occupy the same position. Next, the abnormal log sets occupying the same position are rearranged in descending order of log count. Finally, the rearranged initial sequence is taken as the abnormal log set sequence.

The abnormal log sets include sets that are root causes and sets that are not. In general, a root-cause abnormal log set precedes the non-root-cause abnormal log sets and is the reason they were generated. Arranging the abnormal log sets in the order of their statistical periods therefore lays the foundation for analyzing the root cause logs.
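The ordering in step Sb1 can be illustrated with a short sketch. The type `AbnormalLogSet` and the function `order_abnormal_sets` are hypothetical names, and representing a statistical period by its start time in minutes is an assumption made only for this example.

```python
from typing import List, NamedTuple


class AbnormalLogSet(NamedTuple):
    label: str          # identifier used only for this illustration
    period_start: int   # start of the statistical period (assumed: minutes since 0:00)
    count: int          # number of logs in the set


def order_abnormal_sets(sets: List[AbnormalLogSet]) -> List[AbnormalLogSet]:
    # Primary key: statistical period, earliest first.
    # Secondary key: log count, highest first, for sets sharing a period.
    return sorted(sets, key=lambda s: (s.period_start, -s.count))
```

A single sort with a composite key gives the same result as the two-pass description (order by period, then rearrange ties by count).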

Sb2: link adjacent abnormal log sets in the abnormal log set sequence according to a first duration to obtain at least one second abnormal log set chain; within a second abnormal log set chain, the interval between the statistical periods of adjacent abnormal log sets does not exceed the first duration.

In anomaly detection in the related art, the interval between two adjacent anomalies of the same abnormal event does not exceed a specific duration, for example 5 minutes. If the interval exceeds 5 minutes, the two adjacent anomalies can be regarded as uncorrelated and as not belonging to the same abnormal event. In the embodiments of the present application, the first duration can be understood as this specific duration.

Converging and aggregating the abnormal log sets by the first duration in effect tidies the fragmented abnormal log sets into independent, complete abnormal log set chains. In a chain formed this way, the information of each abnormal log set, such as its statistical period, log count, and corresponding target constant information, can be seen directly.
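Step Sb2 might be sketched as below. This is an illustration only: `link_by_gap` is a hypothetical name, each abnormal log set is reduced to the start time of its statistical period, and measuring the interval between start times is an assumption, since the application does not fix how the interval is measured.

```python
from typing import List


def link_by_gap(period_starts: List[int], first_duration: int) -> List[List[int]]:
    """Group an ordered sequence of period start times into chains.

    A new chain is started whenever the interval to the previous
    abnormal log set exceeds first_duration.
    """
    chains: List[List[int]] = []
    for start in period_starts:
        if chains and start - chains[-1][-1] <= first_duration:
            chains[-1].append(start)   # close enough: same chain
        else:
            chains.append([start])     # gap too large: new chain
    return chains
```

For instance, with a first duration of 5 minutes, start times 0, 3, 5, 20, 22 would fall into two chains, because the gap between 5 and 20 exceeds 5 minutes.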

Sb3: for each second abnormal log set chain, if the interval between the statistical periods of the first and last abnormal log sets in the chain does not exceed a second duration, determine the second abnormal log set chain to be a first abnormal log set chain; if the interval exceeds the second duration, determine at least one first abnormal log set chain from the second abnormal log set chain.

In anomaly detection in the related art, an abnormal event does not continue indefinitely; that is, its duration does not exceed a specific duration, for example 2 hours. If anomalies persist for more than 2 hours, those occurring after the 2-hour mark can be regarded as unrelated to the abnormal event. In the embodiments of the present application, the second duration can be understood as this specific duration.

Splitting the second abnormal log set chains obtained by convergence and aggregation according to the second duration yields abnormal log set chains that conform to the logic of abnormal events, which elevates the chains to the level of events and improves the reliability of the anomaly detection results.
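The application does not specify how a second abnormal log set chain that spans more than the second duration is divided. One possible reading, offered purely as an assumption, is a greedy split in which a new chain begins as soon as an abnormal log set lies more than the second duration after the first set of the current chain. The name `split_by_span` is hypothetical.

```python
from typing import List


def split_by_span(chain: List[int], second_duration: int) -> List[List[int]]:
    """Split one chain of ordered period start times so that, in every
    resulting chain, the last element is at most second_duration after
    the first element (greedy split; an assumed strategy)."""
    result: List[List[int]] = []
    for start in chain:
        if result and start - result[-1][0] <= second_duration:
            result[-1].append(start)   # still within the span of the current chain
        else:
            result.append([start])     # span exceeded: begin a new chain
    return result
```

A chain whose total span already fits within the second duration is returned unchanged as a single chain, matching the first branch of step Sb3.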

To illustrate the first abnormal log set chain, the present application also provides an example containing several first abnormal log set chains, as shown in Figure 2. The horizontal axis of Figure 2 is time and the vertical axis is the number of logs.

In this example, the statistical cycle runs from 0:00 to 24:00, and 16 abnormal log sets are determined in total. They are arranged in order and labeled 1 to 16.

In this example, three first abnormal log set chains are determined from the 16 abnormal log sets. Each chain represents one event, namely event 1 to event 3. The chain representing event 1 contains, in order, abnormal log sets 1, 2, 3, 4, 5; the chain representing event 2 contains abnormal log sets 6, 7, 8, 9, 10, 11, 12; and the chain representing event 3 contains abnormal log sets 13, 14, 15, 16.

How to mine the root cause logs of an anomaly from each first abnormal log set chain is likewise a key problem to be solved by the embodiments of the present application. To this end, the embodiments also provide a possible implementation in which step S140 further includes the following steps.

For each first abnormal log set chain:

Take the first abnormal log set of the first abnormal log set chain, together with any other abnormal log sets whose statistical period is the same as that of the first abnormal log set, as candidate log sets. In anomaly detection in the related art, the earliest generated logs are usually the root cause logs of an abnormal event. The abnormal log sets with the earliest statistical period in the chain are therefore selected as candidate log sets, and the root cause logs are determined from them.

Continuing the example of Figure 2, for the chain representing event 1, the abnormal log set labeled "1" is the candidate log set. For the chain representing event 2, the abnormal log sets labeled "6" and "7" are the candidate log sets.

If there is one candidate log set, take the logs in that candidate log set as root cause logs of the statistical cycle.

If there are multiple candidate log sets, determine, among them, the candidate log sets whose corresponding target constant information is non-reference constant information, and take the logs in the determined candidate log sets as root cause logs of the statistical cycle. Within the current statistical cycle, target constant information that is non-reference constant information can be understood as "new" constant information, and target constant information that is reference constant information can be understood as "old" constant information.

If the target constant information corresponding to at least two of the candidate log sets is reference constant information, determine, among those at least two candidate log sets, the candidate log set that satisfies a preset fluctuation condition, and take its logs as root cause logs of the statistical cycle. The preset fluctuation condition is that the log count of the candidate log set deviates the most from the first fluctuation threshold or the second fluctuation threshold.

By reasoning jointly over the statistical period of each abnormal log set in the chain, whether its target constant information is reference constant information, and how its log count fluctuates, the root cause logs of the abnormal event represented by the first abnormal log set chain are finally identified.
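The selection rules above could be sketched roughly as follows. All names (`LogSet`, `pick_root_cause`, `thresholds`) are hypothetical. Two points are assumptions rather than statements of the application: candidates with new constant information are given precedence over fluctuation-based selection when both kinds are present, and the fluctuation amount is taken as the distance by which the log count lies above the first threshold or below the second.

```python
from typing import Dict, List, NamedTuple, Tuple


class LogSet(NamedTuple):
    constant_info: str  # target constant information
    period_start: int   # start of the statistical period
    count: int          # number of logs in the set


def pick_root_cause(chain: List[LogSet],
                    thresholds: Dict[str, Tuple[int, int]]) -> List[LogSet]:
    """Return the root-cause log set(s) of one first abnormal log set chain."""
    earliest = min(s.period_start for s in chain)
    candidates = [s for s in chain if s.period_start == earliest]
    if len(candidates) == 1:
        return candidates
    # Several candidates: prefer those with non-reference ("new") constant information.
    new = [s for s in candidates if s.constant_info not in thresholds]
    if new:
        return new

    # All candidates are reference constant information: pick the largest fluctuation.
    def fluctuation(s: LogSet) -> int:
        first, second = thresholds[s.constant_info]
        return max(s.count - first, second - s.count)

    return [max(candidates, key=fluctuation)]
```

The function expects a non-empty chain whose members have already been judged abnormal.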

The method for determining the root cause of an anomaly provided by the embodiments of the present application can achieve accurate and efficient log root cause analysis in a variety of anomaly detection scenarios. To make the technical effect of the method clearer, the embodiments also provide an example that includes an anomaly inference apparatus, as shown in Figure 3.

In this example, the log model of a log is the constant information of the log in the above embodiments.

In this example, the anomaly inference apparatus includes four modules: a data access module, a model training module, a real-time anomaly detection module, and a real-time root cause inference module.

The functions of the data access module include preprocessing the logs generated in real time to obtain processable logs. On the one hand, the processable log data is stored to serve as historical log data for statistical cycles after the current one. On the other hand, the processable real-time logs and the historical logs are sent to the model training module.

The functions of the model training module include: receiving the processable real-time logs and historical logs; constructing log template features for the real-time logs to obtain their constant feature information and variable feature information; and training a log pattern model on that feature information to obtain the log models of the real-time logs. For each log model, time series feature statistics are computed to obtain at least one item of time series statistical data for the log model (an item of time series statistical data corresponds to the log set obtained by counting, within one statistical period, the logs corresponding to a piece of target constant information).
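As a rough illustration of separating constant from variable feature information, the sketch below replaces variable tokens with a preset constant so that logs of the same kind collapse to one template. How variables are recognized is not specified here; treating digit runs and dotted numbers (such as IP addresses) as variables, the placeholder "XX", and the name `to_constant_info` are all assumptions made for this example.

```python
import re

PLACEHOLDER = "XX"  # assumed preset constant
# Assumed notion of a "variable": a run of digits, optionally dot-separated.
_VARIABLE = re.compile(r"\b\d+(?:\.\d+)*\b")


def to_constant_info(log_line: str) -> str:
    """Replace variable tokens with the placeholder to obtain the template."""
    return _VARIABLE.sub(PLACEHOLDER, log_line)
```

Two log lines that differ only in their variable parts then map to the same constant information and are counted in the same log set.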

The functions of the model training module further include: constructing log template features for the historical logs to obtain their constant feature information and variable feature information; training a log pattern model on that feature information to obtain the log models of the historical logs; determining multiple reference log models; and analyzing the time series feature data of each reference log model and training anomaly thresholds for it, obtaining the maximum and minimum values of the reference log model's time series statistical data. The maximum value corresponds to the first fluctuation threshold in the above embodiments, and the minimum value corresponds to the second fluctuation threshold.
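A minimal sketch of deriving the two thresholds from historical data is given below. It assumes the historical logs have already been reduced to (constant information, log count per statistical period) pairs; the name `train_thresholds` is hypothetical, and real threshold training could use other statistics.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def train_thresholds(history: Iterable[Tuple[str, int]]) -> Dict[str, Tuple[int, int]]:
    """Map each reference log model to (maximum, minimum) historical count.

    The maximum plays the role of the first fluctuation threshold and the
    minimum plays the role of the second fluctuation threshold.
    """
    counts: Dict[str, List[int]] = defaultdict(list)
    for constant_info, count in history:
        counts[constant_info].append(count)
    return {info: (max(values), min(values)) for info, values in counts.items()}
```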

Further, the time series statistical data of each log model, and the maximum and minimum values associated with each log model, are sent to the real-time anomaly detection module.

The functions of the real-time anomaly detection module include: for each log model, matching the log model against the reference log models in the log model library. If the log model is a non-reference log model, new-model anomaly detection is applied; if the log model is a reference log model, log-volume anomaly detection is applied, using the maximum and minimum values of the log model's time series statistical data. Through these two kinds of detection, N abnormal items of time series statistical data are determined, referred to as anomaly 1, anomaly 2, anomaly 3, ..., anomaly N (each anomaly corresponds to an abnormal log set in the above embodiments).

Further, anomaly 1 to anomaly N are sent to the real-time root cause inference module.

The functions of the real-time root cause inference module include: performing event convergence on anomaly 1 to anomaly N to obtain event 1, event 2, ..., event N (each event corresponds to a first abnormal log set chain in the above embodiments), and performing root cause analysis on each event to obtain the anomalies that are its root causes. For example, for event 1, the root-cause anomalies are anomaly 1 and anomaly 5.

Further, the root causes of the events are summarized and taken as the root causes of the current statistical cycle.

In addition, the present application provides a detailed flow example of the operation of the real-time anomaly detection module and the real-time root cause inference module in the anomaly inference apparatus, as shown in Figure 4. The flow includes steps S1001 to S1010.

S1001: obtain the time series statistical data of the log models.

Here, all items of time series statistical data of all log models within the statistical cycle are obtained.

S1002: detect whether the time series statistical data of a log model is abnormal.

If an item of time series statistical data is abnormal, it is treated as an anomaly, and step S1003 is executed.

S1003: aggregate the anomalies to obtain complete initial events (an initial event corresponds to a second abnormal log set chain in the above embodiments).

S1004: converge the initial events to obtain independent events (such an event corresponds to a first abnormal log set chain).

When there are multiple events, steps S1005 to S1009 are executed for each event.

S1005: determine whether there is exactly one candidate anomaly.

For each event, the anomalies with the earliest occurrence time are taken as candidate anomalies (a candidate anomaly corresponds to a candidate log set in the above embodiments). There may be one candidate anomaly or several.

If there is one candidate anomaly, it is taken as the root cause of the event. If there are several, step S1006 is executed.

S1006: determine whether the log model corresponding to a candidate anomaly is a reference log model.

For any of the candidate anomalies, if its log model is a non-reference log model, the time series statistical data corresponding to the candidate anomaly is taken as the root cause of the event. If its log model is a reference log model, step S1007 is executed.

S1007: determine whether the log models of at least two candidate anomalies are reference log models.

If so, step S1008 is executed. If not, the flow proceeds to the root cause determination for the next event.

S1008: determine whether the data fluctuations of the at least two candidate anomalies exceed a preset range.

Each anomaly is an item of time series statistical data. Among the candidate anomalies, the one whose time series statistical data fluctuates the most is taken as the root cause of the event.

S1009: determine the root cause of the event.

S1010: output the root causes of the events within the statistical cycle.

Referring to Figure 5, the embodiments of the present application also provide an apparatus 500 for determining the root cause of an anomaly. The apparatus includes a statistics module 510, a first determination module 520, an aggregation and linking module 530, and a second determination module 540.

The statistics module 510 is configured to: for each log generated in real time within the statistical cycle, determine the generation time and constant information of the log; if it is determined that no statistical period corresponds to the log, take the constant information of the log as target constant information, create a statistical period associated with the target constant information according to the generation time of the log, and count the logs generated within the created statistical period to obtain the log set corresponding to the target constant information; the constant information of every log in the log set is the target constant information.

The first determination module 520 is configured to determine at least one abnormal log set according to whether the target constant information corresponding to a log set is reference constant information, the reference constant information being constant information of historical logs; the abnormal log sets are arranged in the order of their statistical periods.

The aggregation and linking module 530 is configured to obtain a first abnormal log set chain from adjacent abnormal log sets; in the first abnormal log set chain, the interval between the statistical periods of adjacent abnormal log sets does not exceed the first duration, and the interval between the statistical periods of the first and last abnormal log sets does not exceed the second duration.

The second determination module 540 is configured to determine the root cause logs in the statistical cycle according to the first abnormal log set chain.

The apparatus of the embodiments of the present application can execute the method provided by the embodiments, and the implementation principles are similar. The actions performed by the modules of the apparatus correspond to the steps of the method; for a detailed functional description of each module, refer to the description of the corresponding method above, which is not repeated here.
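By way of illustration only, the behavior of the statistics module 510 might be sketched as below. The function name `build_log_sets`, the dictionary layout, and the fixed `period_length` are assumptions; the sketch also assumes logs arrive in non-decreasing order of generation time, so only the most recently created statistical period for a given piece of constant information can still contain a new log.

```python
from typing import Dict, Iterable, List, Tuple


def build_log_sets(logs: Iterable[Tuple[int, str]], period_length: int) -> List[dict]:
    """logs: (generation_time, constant_info) pairs in generation order.

    Returns one dict per log set with keys info, start, end, count.
    """
    log_sets: List[dict] = []
    latest: Dict[str, dict] = {}  # most recent statistical period per constant info
    for generated_at, info in logs:
        current = latest.get(info)
        # No statistical period corresponds to this log: either none was
        # created for this constant info, or the latest one has ended.
        if current is None or not (current["start"] <= generated_at < current["end"]):
            current = {"info": info, "start": generated_at,
                       "end": generated_at + period_length, "count": 0}
            latest[info] = current
            log_sets.append(current)
        current["count"] += 1
    return log_sets
```

Each returned entry corresponds to one log set: all of its logs share the same target constant information and fall within the same statistical period.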

The embodiments of the present application provide an electronic device including a memory, a processor, and a computer program stored in the memory. The processor executes the computer program to implement the steps of a method for determining the root cause of an anomaly, which, compared with the prior art, achieves more accurate and efficient log root cause analysis.

Referring to Figure 6, the embodiments of the present application also provide a specific example of an electronic device. The electronic device 6000 shown in Figure 6 includes a processor 6001 and a memory 6003, which are connected, for example through a bus 6002. Optionally, the electronic device 6000 may further include a transceiver 6004, which may be used for data interaction between the electronic device and other electronic devices, such as sending and/or receiving data. It should be noted that in practical applications the transceiver 6004 is not limited to one, and the structure of the electronic device 6000 does not limit the embodiments of the present application.

The processor 6001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various exemplary logical blocks, modules, and circuits described in connection with the disclosure of the present application. The processor 6001 may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.

The bus 6002 may include a path that carries information between the above components. The bus 6002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 6002 may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is used in Figure 6, but this does not mean that there is only one bus or one type of bus.

The memory 6003 may be a ROM (Read Only Memory) or another type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or another type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store a computer program and can be read by a computer, which is not limited here.

The memory 6003 is used to store the computer program that executes the embodiments of the present application, and execution is controlled by the processor 6001. The processor 6001 is used to execute the computer program stored in the memory 6003 to implement the steps shown in the foregoing method embodiments.

The electronic device includes, but is not limited to, a server.

The embodiments of the present application provide a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the steps and corresponding content of the foregoing method embodiments can be implemented.

The embodiments of the present application also provide a computer program product including a computer program. When the computer program is executed by a processor, the steps and corresponding content of the foregoing method embodiments can be implemented.

It should be understood that although the flowcharts of the embodiments of the present application indicate the operation steps with arrows, the order in which these steps are carried out is not limited to the order indicated by the arrows. Unless explicitly stated herein, in some implementation scenarios of the embodiments, the steps in each flowchart may be executed in other orders as required. In addition, depending on the actual implementation scenario, some or all of the steps in each flowchart may include multiple sub-steps or multiple stages. Some or all of these sub-steps or stages may be executed at the same time, or each may be executed at a different time. Where the execution times differ, the execution order of these sub-steps or stages can be configured flexibly as required, and the embodiments of the present application do not limit this.

The above are only optional implementations of some implementation scenarios of the present application. It should be pointed out that, for those of ordinary skill in the art, other similar means of implementation based on the technical idea of the present application, adopted without departing from the technical concept of the solution of the present application, also fall within the protection scope of the embodiments of the present application.

Claims (10)

1. A method of determining an abnormal root cause, the method comprising:
determining the generation time and constant information of each log generated in real time in a statistical cycle; if it is determined that there is no statistical period corresponding to the log, taking the constant information of the log as target constant information, creating a statistical period associated with the target constant information according to the generation time of the log, and counting the logs generated in the created statistical period associated with the target constant information to obtain a log set corresponding to the target constant information; the constant information of each log in the log set is the target constant information;
determining at least one abnormal log set according to whether the target constant information corresponding to the log set is reference constant information, wherein the reference constant information is constant information of a history log; each abnormal log set is arranged according to the order of the statistical periods;
obtaining a first abnormal log set chain according to each adjacent abnormal log set; the interval between the corresponding statistical periods of adjacent abnormal log sets in the first abnormal log set chain does not exceed a first duration, and the interval between the corresponding statistical periods of the first and last abnormal log sets in the chain does not exceed a second duration;
and determining the root cause log in the statistical cycle according to the first abnormal log set chain.
2. The method of claim 1, wherein said determining constant information for the log comprises:
determining constant characteristic information and variable characteristic information in the log;
and replacing the variable characteristic information in the log with a preset constant, and determining the replaced log as constant information of the log.
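The replacement step of claim 2 can be illustrated with a minimal Python sketch. The regular expressions, the placeholder `<*>`, and the function name `constant_info` are assumptions made for illustration only; the claim does not specify how variable characteristic information is recognized or which preset constant is used.

```python
import re

# Hypothetical patterns for variable fields; the claim does not define them.
VARIABLE_PATTERNS = [
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"),  # IPv4 addresses
    re.compile(r"\b0x[0-9a-fA-F]+\b"),           # hexadecimal literals
    re.compile(r"\b\d+\b"),                      # plain integers
]

# Assumed preset constant that stands in for every variable field.
PRESET_CONSTANT = "<*>"


def constant_info(log_line: str) -> str:
    """Replace each variable field with the preset constant."""
    result = log_line
    for pattern in VARIABLE_PATTERNS:
        result = pattern.sub(PRESET_CONSTANT, result)
    return result
```

Under these assumptions, two logs that differ only in their variable fields map to the same constant information, which is what allows them to be counted in the same log set.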
3. The method of claim 1, wherein the determining that there is no statistical period corresponding to the log comprises:
for the first log generated in the statistical period, determining that no statistical period corresponding to the log exists;
and for the non-first log generated in the statistical period, if each created statistical period does not comprise the generation time of the log or the constant information of the log is different from the corresponding target constant information of each statistical period, determining that the statistical period corresponding to the log does not exist.
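The period lookup described in claims 1 and 3 might be sketched as follows. The fixed period length, the choice to start a new period at the generation time of the log that triggers it, and the names `StatPeriod` and `assign` are illustrative assumptions and are not taken from the claims.

```python
from dataclasses import dataclass, field

PERIOD_LENGTH = 60.0  # assumed length of a statistical period, in seconds


@dataclass
class StatPeriod:
    template: str                  # target constant information
    start: float                   # generation time of the log that created it
    end: float                     # start + PERIOD_LENGTH (exclusive)
    logs: list = field(default_factory=list)


def assign(periods: list, timestamp: float, template: str, raw: str) -> StatPeriod:
    """Add the log to a matching period, or create a new period for it."""
    for period in periods:
        # A period corresponds to the log only if it covers the generation
        # time AND carries the same constant information.
        if period.template == template and period.start <= timestamp < period.end:
            period.logs.append(raw)
            return period
    # No corresponding period exists (this also covers the very first log).
    created = StatPeriod(template, timestamp, timestamp + PERIOD_LENGTH, [raw])
    periods.append(created)
    return created
```

In this sketch each entry of `periods` is at the same time a statistical period and the log set collected for its target constant information.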
4. The method of claim 1, wherein determining at least one abnormal log set according to whether the target constant information corresponding to the log set is the reference constant information, comprises:
for each log set, if the corresponding target constant information of the log set is not the reference constant information, determining the log set as the abnormal log set;
and if the target constant information corresponding to the log set is the reference constant information, determining whether the log set is the abnormal log set according to the number of logs in the log set.
5. The method of claim 4, wherein each reference constant information is associated with a first fluctuation threshold and a second fluctuation threshold, the first fluctuation threshold being greater than the second fluctuation threshold;
the determining that the log set is the abnormal log set according to the log quantity of the log set includes:
determining the same reference constant information as the target constant information corresponding to the log set;
and if the number of the logs in the log set is larger than a first fluctuation threshold associated with the determined reference constant information, or if the number of the logs in the log set is smaller than a second fluctuation threshold associated with the determined reference constant information, determining the log set as the abnormal log set.
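The decision rule of claims 4 and 5 can be written as a short Python sketch. The `Baseline` type, the dictionary keyed by constant information, and the function name `is_abnormal` are assumptions for illustration; how the thresholds are derived from history logs is not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    upper: int  # first fluctuation threshold (the larger one)
    lower: int  # second fluctuation threshold (the smaller one)


def is_abnormal(template: str, count: int, baselines: dict) -> bool:
    """Return True if a log set with this constant information is abnormal."""
    baseline = baselines.get(template)
    if baseline is None:
        # Target constant information is not reference constant information.
        return True
    # Known constant information: abnormal only outside the threshold band.
    return count > baseline.upper or count < baseline.lower
```

A log set whose count lies between the two thresholds, inclusive, is treated as normal in this sketch, since the claim uses strict comparisons.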
6. The method of claim 1, wherein the obtaining a first abnormal log set chain according to adjacent abnormal log sets comprises:
arranging the abnormal log sets according to the time sequence of the corresponding statistical time period to obtain an abnormal log set sequence;
determining at least one second abnormal log set chain from the abnormal log set sequence according to the first duration; the interval duration of the corresponding statistical time periods of adjacent abnormal log sets in the second abnormal log set chain does not exceed the first duration;
for each second abnormal log set chain, if the interval duration of the corresponding statistical time periods of the first abnormal log set and the last abnormal log set in the second abnormal log set chain does not exceed the second duration, determining the second abnormal log set chain as the first abnormal log set chain; and if the interval duration exceeds the second duration, determining at least one first abnormal log set chain according to the second abnormal log set chain.
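The two-stage chaining of claim 6 might look like the following Python sketch. Each abnormal log set is represented only by the start time of its statistical time period, and the interval between two periods is taken as the difference of their start times; both are simplifying assumptions. The claim does not say how an over-long second chain is divided, so the greedy split from the chain's first element used in `split_by_span` is likewise an assumption.

```python
def split_by_gap(starts, max_gap):
    """Stage 1: cut the sorted sequence wherever adjacent sets are too far apart."""
    chains = []
    for s in sorted(starts):
        if chains and s - chains[-1][-1] <= max_gap:
            chains[-1].append(s)
        else:
            chains.append([s])
    return chains


def split_by_span(chain, max_span):
    """Stage 2: keep first-to-last distance within max_span (greedy, assumed)."""
    result = []
    for s in chain:
        if result and s - result[-1][0] <= max_span:
            result[-1].append(s)
        else:
            result.append([s])
    return result


def build_chains(starts, max_gap, max_span):
    """Return the first abnormal log set chains for the given period starts."""
    first_chains = []
    for second_chain in split_by_gap(starts, max_gap):
        first_chains.extend(split_by_span(second_chain, max_span))
    return first_chains
```

Here `max_gap` plays the role of the first duration and `max_span` the role of the second duration. A second chain that already fits within `max_span` passes through stage 2 unchanged.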
7. The method of claim 1, wherein said determining the root cause log in the statistical period according to the first abnormal log set chain comprises:
for each first abnormal log set chain, taking the first abnormal log set of the first abnormal log set chain and other abnormal log sets with the same statistical time period as the first abnormal log set as candidate log sets;
if the number of the candidate log sets is one, taking the logs in the candidate log set as root cause logs of the statistical period;
if the number of the candidate log sets is more than one, determining, among the multiple candidate log sets, each candidate log set whose corresponding target constant information is not reference constant information, and taking the logs in the determined candidate log sets as root cause logs of the statistical period;
if the target constant information corresponding to at least two candidate log sets among the multiple candidate log sets is the reference constant information, determining, among the at least two candidate log sets, the candidate log set that meets a preset fluctuation condition, and taking the logs in the determined candidate log set as root cause logs of the statistical period; the preset fluctuation condition is that the fluctuation of the number of logs in the candidate log set relative to the first fluctuation threshold or the second fluctuation threshold is the largest.
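One possible reading of the selection rule in claim 7 is sketched below in Python. The `LogSet` type, the use of `None` thresholds to mark constant information that is not reference constant information, and the absolute distance to the violated threshold as the fluctuation measure are all assumptions; the claim leaves the exact fluctuation measure open. The sketch also assumes every set in the chain has already been classified as abnormal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogSet:
    template: str                 # target constant information
    period_start: float           # start of the statistical time period
    logs: list
    upper: Optional[int] = None   # first fluctuation threshold, None if not reference
    lower: Optional[int] = None   # second fluctuation threshold, None if not reference


def deviation(log_set: LogSet) -> int:
    """Distance of the log count from the threshold it violates (assumed measure)."""
    count = len(log_set.logs)
    if count > log_set.upper:
        return count - log_set.upper
    return log_set.lower - count


def root_cause_logs(chain: list) -> list:
    """Pick root cause logs from a chain ordered by statistical time period."""
    first = chain[0]
    candidates = [s for s in chain if s.period_start == first.period_start]
    if len(candidates) == 1:
        return list(candidates[0].logs)
    # Prefer candidates whose constant information was never seen in history.
    unseen = [s for s in candidates if s.upper is None]
    if unseen:
        return [log for s in unseen for log in s.logs]
    # All candidates are known: take the one with the largest deviation.
    return list(max(candidates, key=deviation).logs)
```

The intuition behind starting from the head of the chain is that the earliest abnormal log set in a burst of anomalies is the most likely origin of the ones that follow.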
8. An apparatus for determining the root cause of an anomaly, the apparatus comprising:
the statistics module is used for determining the generation time and constant information of each log generated in real time in a statistical period; if it is determined that no statistical period corresponding to the log exists, taking the constant information of the log as target constant information, creating a statistical period associated with the target constant information according to the generation time of the log, and collecting the logs generated in the created statistical period associated with the target constant information to obtain a log set corresponding to the target constant information; the constant information of each log in the log set is the target constant information;
the first determining module is used for determining at least one abnormal log set according to whether the target constant information corresponding to the log set is reference constant information or not, wherein the reference constant information is constant information of a history log; each abnormal log set is arranged according to the sequence of the statistical time periods;
the aggregation link module is used for obtaining a first abnormal log set chain according to adjacent abnormal log sets; the interval duration of the corresponding statistical time periods of adjacent abnormal log sets in the first abnormal log set chain does not exceed a first duration, and the interval duration of the corresponding statistical time periods of the first abnormal log set and the last abnormal log set in the chain does not exceed a second duration;
and the second determining module is used for determining the root cause log in the statistical period according to the first abnormal log set chain.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory, characterized in that the processor executes the computer program to carry out the steps of the method according to any one of claims 1-7.
10. A computer readable storage medium having stored thereon a computer program, which when executed by a processor performs the steps of the method of determining the root cause of an anomaly of any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311236319.XA CN117170926A (en) 2023-09-22 2023-09-22 Methods, devices, equipment and storage media for determining the root cause of anomalies

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311236319.XA CN117170926A (en) 2023-09-22 2023-09-22 Methods, devices, equipment and storage media for determining the root cause of anomalies

Publications (1)

Publication Number Publication Date
CN117170926A true CN117170926A (en) 2023-12-05

Family

ID=88936091

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311236319.XA Pending CN117170926A (en) 2023-09-22 2023-09-22 Methods, devices, equipment and storage media for determining the root cause of anomalies

Country Status (1)

Country Link
CN (1) CN117170926A (en)

Similar Documents

Publication Publication Date Title
CN111461555B (en) Production line quality monitoring method, device and system
CN116450399B (en) Microservice system fault diagnosis and root cause location method
CN110388315A (en) Oil pump fault identification method, device and system based on multi-source information fusion
CN117034149A (en) Fault processing strategy determining method and device, electronic equipment and storage medium
CN113407428A (en) Reliability evaluation method and device of artificial intelligence system and computer equipment
CN108055152B (en) Anomaly detection method of communication network information system based on distributed service log
CN119557607B (en) Data tracing method and system based on big data and blockchain multidimensional features
CN117992416A (en) Software detection method, device, chip and computer readable storage medium
CN114881112A (en) System anomaly detection method, device, equipment and medium
CN114443437A (en) Alarm root cause output method, apparatus, device, medium, and program product
CN112363891B (en) A method for obtaining abnormal causes based on fine-grained event and KPIs analysis
CN115114124A (en) Host risk assessment method and assessment device
CN111858270A (en) A Fault Location Method for Interlocking System Based on Data Mining Algorithm
CN116302984B (en) A root cause analysis method, device and related equipment for test tasks
CN117149565A (en) State detection method, device, equipment and medium for key performance indexes of cloud platform
CN117170926A (en) Methods, devices, equipment and storage media for determining the root cause of anomalies
Zhu et al. A Performance Fault Diagnosis Method for SaaS Software Based on GBDT Algorithm.
CN117540718A (en) Intelligent inspection result statistical method based on document object model
CN117544482A (en) AI-based operation and maintenance fault determination methods, devices, equipment and storage media
CN117827928A (en) A database inspection method based on abnormal feature extraction
CN117574055A (en) Hydropower unit state monitoring data cleaning method and device and electronic equipment
CN111241145A (en) A method and device for self-healing rule mining based on big data
CN118394597B (en) Method, device, equipment and medium for detecting abnormality of indicator data in call chain log
CN116861204B (en) Intelligent manufacturing equipment data management system based on digital twinning
CN118519818B (en) Deep recursion network-based big data computer system fault detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination