CN117170926A - Method, device, equipment and storage medium for determining abnormal root cause - Google Patents


Info

Publication number
CN117170926A
Authority
CN
China
Prior art keywords
log
abnormal
constant information
determining
log set
Legal status
Pending
Application number
CN202311236319.XA
Other languages
Chinese (zh)
Inventor
冯鹏
欧阳晔
Current Assignee
Asiainfo Technologies China Inc
Original Assignee
Asiainfo Technologies China Inc
Application filed by Asiainfo Technologies China Inc
Priority to CN202311236319.XA
Publication of CN117170926A
Legal status: Pending (current)


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiments of the present application provide a method, a device, equipment and a storage medium for determining an abnormal root cause, relating to the field of log analysis. The method includes: counting the logs generated within a statistical period to obtain log sets, each corresponding to one piece of target constant information; determining at least one abnormal log set according to whether the target constant information corresponding to each log set is reference constant information; obtaining a first abnormal log set chain from adjacent abnormal log sets; and determining the root cause log in the statistical period according to the first abnormal log set chain. The embodiments enable more accurate and efficient log root cause analysis.

Description

Method, device, equipment and storage medium for determining abnormal root cause
Technical Field
The application relates to the technical field of operation and maintenance, in particular to a method, a device, equipment and a storage medium for determining an abnormal root cause.
Background
Methods for mining IT system logs and locating faults have been widely studied and implemented in the industry. The most traditional method is manual, step-by-step investigation based on operations and maintenance experience. At present, two methods are most common: intelligent analysis based on log modeling, and root cause localization based on RCA rules. However, both schemes have the following shortcomings:
1. Intelligent analysis based on log modeling has become common practice in the industry and achieves a certain hit rate. However, IT system log calls are complex and variable, so this scheme may still aggregate a large number of log models (a log model can be understood as the constant information of a log), leading to misjudgments and inaccurate localization of the root cause node. For example, changes in log models are closely tied to business activity, and sporadic increases or decreases in log volume are not necessarily real anomalies.
2. Root cause localization based on RCA rules mines the association between component anomalies mainly from the topological relationships between components. It therefore depends heavily on topology: if the topology is absent from the CMDB (Configuration Management Database) or is not updated in time, the accuracy of root cause localization is directly affected.
Disclosure of Invention
The embodiments of the present application provide a method, a device, equipment and a storage medium for determining an abnormal root cause, to solve at least one of the above technical problems.
In one aspect, an embodiment of the present application provides a method for determining an abnormal root cause, where the method includes:
for each log generated in real time within a statistical period, determining the generation time and constant information of the log; if no statistical time period corresponding to the log exists, taking the constant information of the log as target constant information, creating a statistical time period associated with the target constant information according to the generation time of the log, and counting the logs with the target constant information generated within the created statistical time period, to obtain a log set corresponding to the target constant information, where the constant information of every log in the log set is the target constant information. Determining at least one abnormal log set according to whether the target constant information corresponding to each log set is reference constant information, where reference constant information is the constant information of historical logs, and the abnormal log sets are arranged in the order of their statistical time periods. Obtaining a first abnormal log set chain from adjacent abnormal log sets, where the interval between the statistical time periods of adjacent abnormal log sets in the first abnormal log set chain does not exceed a first duration, and the interval between the statistical time periods of the first and last abnormal log sets in the chain does not exceed a second duration. Determining the root cause log in the statistical period according to the first abnormal log set chain.
Optionally, determining the constant information of the log includes:
determining the constant feature information and the variable feature information in the log; and replacing the variable feature information in the log with a preset constant, and taking the resulting log as the constant information of the log.
Optionally, determining that no statistical time period corresponding to the log exists includes:
for the first log generated in the statistical period, determining that no corresponding statistical time period exists; and for any subsequent log generated in the statistical period, if none of the created statistical time periods both contains the generation time of the log and has target constant information identical to the constant information of the log, determining that no corresponding statistical time period exists.
Optionally, determining at least one abnormal log set according to whether the target constant information corresponding to the log set is reference constant information includes:
for each log set, if the target constant information corresponding to the log set is not reference constant information, determining the log set as an abnormal log set; and if the target constant information corresponding to the log set is reference constant information, determining whether the log set is an abnormal log set according to the number of logs in the log set.
Optionally, each piece of reference constant information is associated with a first fluctuation threshold and a second fluctuation threshold, the first fluctuation threshold being greater than the second fluctuation threshold;
determining whether the log set is an abnormal log set according to the number of logs in the log set includes:
determining the reference constant information identical to the target constant information corresponding to the log set; and if the number of logs in the log set is greater than the first fluctuation threshold associated with that reference constant information, or less than the second fluctuation threshold associated with it, determining the log set as an abnormal log set.
Optionally, obtaining a first abnormal log set chain from adjacent abnormal log sets includes:
arranging the abnormal log sets in the chronological order of their statistical time periods to obtain an abnormal log set sequence; determining at least one second abnormal log set chain from the abnormal log set sequence according to the first duration, where the interval between the statistical time periods of adjacent abnormal log sets in the second abnormal log set chain does not exceed the first duration; and for each second abnormal log set chain, if the interval between the statistical time periods of the first and last abnormal log sets in the chain does not exceed the second duration, determining the second abnormal log set chain as a first abnormal log set chain, and if the interval exceeds the second duration, determining at least one first abnormal log set chain from the second abnormal log set chain.
Optionally, determining the root cause log in the statistical period according to the first abnormal log set chain includes:
for each first abnormal log set chain, taking the first abnormal log set of the chain, together with any other abnormal log sets in the same statistical time period as that first abnormal log set, as candidate log sets. If there is only one candidate log set, taking the logs in it as the root cause logs of the statistical period. If there are multiple candidate log sets, determining the candidate log sets whose target constant information is not reference constant information, and taking the logs in them as the root cause logs of the statistical period. If the target constant information of at least two of the candidate log sets is reference constant information, determining, among those candidate log sets, the one that meets a preset fluctuation condition, and taking its logs as the root cause logs of the statistical period, where the preset fluctuation condition is that the number of logs in the candidate log set deviates by the largest amount from the first fluctuation threshold or the second fluctuation threshold.
In another aspect, an embodiment of the present application provides a device for determining an abnormal root cause, the device including:
a statistics module, configured to determine, for each log generated in real time within a statistical period, the generation time and constant information of the log; if no statistical time period corresponding to the log exists, take the constant information of the log as target constant information, create a statistical time period associated with the target constant information according to the generation time of the log, and count the logs with the target constant information generated within the created statistical time period, to obtain a log set corresponding to the target constant information, where the constant information of every log in the log set is the target constant information;
a first determining module, configured to determine at least one abnormal log set according to whether the target constant information corresponding to each log set is reference constant information, where reference constant information is the constant information of historical logs, and the abnormal log sets are arranged in the order of their statistical time periods;
an aggregation and linking module, configured to obtain a first abnormal log set chain from adjacent abnormal log sets, where the interval between the statistical time periods of adjacent abnormal log sets in the chain does not exceed the first duration, and the interval between the statistical time periods of the first and last abnormal log sets in the chain does not exceed the second duration; and
a second determining module, configured to determine the root cause log in the statistical period according to the first abnormal log set chain.
An embodiment of the present application provides an electronic device, including a memory, a processor and a computer program stored on the memory, where the processor executes the computer program to implement the steps of the method for determining an abnormal root cause.
An embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method for determining an abnormal root cause.
The technical solutions provided by the embodiments of the present application have the following beneficial effects:
Constant information and statistical time periods are essentially tools for classifying logs. These classification tools are created by determining the constant information of each log and, when no statistical time period corresponding to the log is found, creating a statistical time period associated with the log's constant information according to the log's generation time. Using these two tools, the logs generated in real time in the statistical period are preliminarily classified into log sets, on the basis of which the root cause log set of an abnormal event can be analyzed.
Further, different anomaly judgments are applied to log sets corresponding to new and old target constant information. At least one abnormal log set is determined according to whether the target constant information is reference constant information: target constant information that is reference constant information is old, and target constant information that is not is new. Because the two kinds are judged by different anomaly criteria, this step applies the appropriate criterion to each log set.
The first duration and the second duration are, in essence, time windows that encode how abnormal events behave in real scenarios: for example, the first duration is the maximum interval between adjacent anomaly points within one abnormal event, and the second duration is the maximum duration of an abnormal event. Linking the abnormal log sets according to these two durations converges them, by time slicing, into first abnormal log set chains that each represent one abnormal event. The root cause log in the statistical period, that is, the root cause log of the abnormal event, can then be analyzed from each first abnormal log set chain.
In this way, massive volumes of logs generated in real time can be classified and counted, and accurate, efficient log root cause analysis can be performed on the resulting log sets.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings that are required to be used in the description of the embodiments of the present application will be briefly described below.
FIG. 1 is a flow chart of a method for determining an abnormal root cause according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a plurality of first abnormal log set chains according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an abnormality inference device according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a root cause localization process according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a device for determining an abnormal root cause according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the drawings. It should be understood that the embodiments described below with reference to the drawings are exemplary descriptions for explaining the technical solutions of the embodiments of the present application, and do not limit those technical solutions.
As understood by those skilled in the art, the singular forms "a", "an", "said" and "the" used herein may also include the plural forms, unless expressly stated otherwise. It will be further understood that the terms "comprises" and "comprising", when used in this specification, specify the presence of stated features, information, data, steps, operations, elements, and/or components, but do not preclude the presence or addition of other features, information, data, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein indicates at least one of the items it joins; e.g., "A and/or B" may be implemented as "A", as "B", or as "A and B".
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
First, several terms related to the present application are described and explained:
Convergence: a term from economics and mathematics and an important tool in the study of functions, meaning approaching a certain value at a point. Types of convergence include convergent sequences, function convergence, global convergence and local convergence.
Log model: a log is semi-structured data generated by specific code. Modeling is the mainstream method of log analysis in the industry: it helps users quickly grasp an overview of the logs by compressing millions of logs into hundreds of log templates that humans can read. For example, the constant information of a log in the embodiments of the present application is a log model.
Sequence (or dynamic series): the values of the same statistical indicator arranged in the chronological order in which they occur.
Root cause analysis: a structured problem-solving approach that progressively identifies and resolves the root cause of a problem, rather than focusing only on its symptoms. The root cause is the most fundamental cause of the problem of interest.
The technical solutions of the embodiments of the present application and technical effects produced by the technical solutions of the present application are described below by describing several exemplary embodiments. It should be noted that the following embodiments may be referred to, or combined with each other, and the description will not be repeated for the same terms, similar features, similar implementation steps, and the like in different embodiments.
FIG. 1 shows a flow diagram of a method of determining the root cause of an anomaly. The method comprises steps S110 to S140.
S110, for each log generated in real time within a statistical period, determining the generation time and constant information of the log; if no statistical time period corresponding to the log exists, taking the constant information of the log as target constant information, creating a statistical time period associated with the target constant information according to the generation time of the log, and counting the logs with the target constant information generated within the created statistical time period, to obtain a log set corresponding to the target constant information, where the constant information of every log in the log set is the target constant information.
After the logs in the statistical period are obtained, they are preprocessed to clean out unimportant, invalid or otherwise dirty logs, reducing interference with the anomaly judgment process.
Specifically, for each log generated in real time in the statistical period, whether a corresponding statistical time period exists is checked in order of generation time. A statistical time period corresponds to a log if the generation time of the log falls within that period and the constant information string of the log is the target constant information of that period.
Each statistical time period has a preset duration.
In one possible implementation, determining that no statistical time period corresponding to the log exists includes:
for the first log generated in the statistical period, it is determined that no corresponding statistical time period exists. For any subsequent log generated in the statistical period, if none of the created statistical time periods both contains the generation time of the log and has target constant information identical to the constant information of the log, it is determined that no corresponding statistical time period exists.
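The log-set statistics of S110 can be sketched in Python as follows. This is a minimal illustration under assumptions the source does not state: each log arrives as a hypothetical `(generation_time, constant_info, raw_text)` tuple, and a newly created statistical time period starts exactly at the generation time of the log that triggers it and lasts a fixed `period_length`. The names `LogSet` and `build_log_sets` are illustrative only.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple


@dataclass
class LogSet:
    """Logs sharing one target constant information inside one statistical time period."""
    target_constant: str
    start: datetime
    end: datetime  # exclusive end of the statistical time period
    logs: List[str] = field(default_factory=list)


def build_log_sets(
    logs: Iterable[Tuple[datetime, str, str]], period_length: timedelta
) -> List[LogSet]:
    """Group (generation_time, constant_info, raw_text) tuples into log sets."""
    log_sets: List[LogSet] = []
    latest: Dict[str, LogSet] = {}  # most recently created period per constant information
    # Process logs in order of generation time.
    for gen_time, constant, raw in sorted(logs, key=lambda item: item[0]):
        current = latest.get(constant)
        # No corresponding period: none exists yet for this constant information,
        # or the latest one does not contain this log's generation time.
        if current is None or not (current.start <= gen_time < current.end):
            current = LogSet(constant, gen_time, gen_time + period_length)
            latest[constant] = current
            log_sets.append(current)
        current.logs.append(raw)
    return log_sets
```

Because logs are processed in generation-time order, only the most recently created period for a given constant information can still contain the current log, so a dictionary lookup replaces a scan over every created period. If periods should instead align to a fixed grid, the start time could be floored to a multiple of `period_length`; that variant is equally an assumption.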
S120, determining at least one abnormal log set according to whether the target constant information corresponding to each log set is reference constant information, where reference constant information is the constant information of historical logs, and the abnormal log sets are arranged in the order of their statistical time periods.
S130, obtaining a first abnormal log set chain from adjacent abnormal log sets, where the interval between the statistical time periods of adjacent abnormal log sets in the chain does not exceed the first duration, and the interval between the statistical time periods of the first and last abnormal log sets in the chain does not exceed the second duration.
S140, determining root cause logs in the statistical period according to the first abnormal log set chain.
Each first abnormal log set chain characterizes one abnormal event in the statistical period, so the root cause log of an abnormal event occurring in the statistical period can be determined from the corresponding first abnormal log set chain.
Since constant information is the tool used to classify the logs generated in real time during a statistical period, how to determine it is a problem to be solved. To this end, an embodiment of the present application further provides a possible implementation.
Determining the constant information of the log in S110 specifically includes:
determining the constant feature information and the variable feature information in the log; and replacing the variable feature information in the log with a preset constant, and taking the resulting log as the constant information of the log.
In one example, the contents of log 1 are as follows:
IDMM{
"220":{"state":true,"lastRunTime":"20220630000016","lastDealTime":"20220629212250"},
"221":{"state":true,"lastRunTime":"20220630000015","lastDealTime":"20220629203437"}
}
Based on the content of log 1, the constant information of log 1 is as follows:
IDMM{
"XX":{"state":XX,"lastRunTime":"XX","lastDealTime":"XX"}
}
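The masking step (variable feature information replaced by the preset constant "XX") can be sketched with regular expressions. This is a minimal sketch, assuming that variable fields are quoted numbers, booleans and bare numbers; these rules and the `extract_constant_info` name are hypothetical, since the source does not say how variable feature information is recognized.

```python
import re

# Hypothetical rules for "variable feature information"; a real deployment would tune them.
QUOTED_NUMBER = re.compile(r'"\d+(?:\.\d+)?"')                 # e.g. "220", "20220630000016"
BOOLEAN = re.compile(r"\b(?:true|false)\b")                     # e.g. true / false
BARE_NUMBER = re.compile(r'(?<![\w"])\d+(?:\.\d+)?(?![\w"])')   # e.g. 42 outside quotes


def extract_constant_info(log: str, placeholder: str = "XX") -> str:
    """Replace variable fields with a preset constant and return the result."""
    masked = QUOTED_NUMBER.sub(f'"{placeholder}"', log)
    masked = BOOLEAN.sub(placeholder, masked)
    return BARE_NUMBER.sub(placeholder, masked)


entry = '"220":{"state":true,"lastRunTime":"20220630000016","lastDealTime":"20220629212250"}'
print(extract_constant_info(entry))
# "XX":{"state":XX,"lastRunTime":"XX","lastDealTime":"XX"}
```

Two log entries that differ only in IDs, timestamps or flags map to the same string, which is what allows the constant information to serve as the grouping key in S110.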
How to screen abnormal log sets out of the log sets is a key step in anomaly judgment. An embodiment of the present application further provides an implementation that judges each log set independently to obtain at least one abnormal log set. Determining at least one abnormal log set according to whether the target constant information corresponding to the log set is reference constant information includes:
for each log set, if the target constant information corresponding to the log set is not reference constant information, determining the log set as an abnormal log set; and if the target constant information corresponding to the log set is reference constant information, determining whether the log set is an abnormal log set according to the number of logs in the log set.
Target constant information appearing in the statistical period is new constant information if it appears for the first time, and old constant information otherwise. The appearance of new target constant information very likely indicates an anomaly, so a log set with new target constant information can be judged abnormal directly. For old target constant information, the log set is not abnormal as long as its number of logs is within the normal range, and is judged abnormal only when the number falls outside that range. In this case, whether an anomaly has occurred is determined according to the number of logs in the log set.
Optionally, each piece of reference constant information is associated with a first fluctuation threshold and a second fluctuation threshold, the first fluctuation threshold being greater than the second. Both thresholds can be determined from the historical logs corresponding to the reference constant information. The process of determining them may follow related technologies and, for brevity, is not described here.
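The source leaves the derivation of the two fluctuation thresholds to related technologies. One common choice, shown here purely as an assumption, places the bounds k population standard deviations above and below the mean of the historical per-period log counts for a given piece of reference constant information. The function name `fluctuation_thresholds` and the default `k=3.0` are illustrative.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def fluctuation_thresholds(history_counts: Sequence[int], k: float = 3.0) -> Tuple[float, float]:
    """Return (first_threshold, second_threshold) from historical per-period log counts.

    Assumed rule (not from the source): mean +/- k population standard deviations,
    with the lower bound clamped at zero because log counts cannot be negative.
    """
    mu = mean(history_counts)
    sigma = pstdev(history_counts)
    first_threshold = mu + k * sigma
    second_threshold = max(0.0, mu - k * sigma)
    return first_threshold, second_threshold
```

With this choice the first threshold is strictly greater than the second unless the historical counts are all zero.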
Determining whether the log set is an abnormal log set according to the number of logs in the log set includes steps Sa1 to Sa2.
Sa1, determining the reference constant information identical to the target constant information corresponding to the log set.
Sa2, if the number of the logs in the log set is greater than the first fluctuation threshold associated with the determined reference constant information, or if the number of the logs in the log set is less than the second fluctuation threshold associated with the determined reference constant information, determining the log set as an abnormal log set.
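The per-log-set decision (new target constant information is abnormal outright; old target constant information is abnormal only when the log count leaves the band between the second and first fluctuation thresholds) can be sketched as below. The `reference` mapping from reference constant information to its `(first_threshold, second_threshold)` pair is a hypothetical data layout, and `is_abnormal_log_set` is an illustrative name.

```python
from typing import Dict, Tuple


def is_abnormal_log_set(
    target_constant: str,
    log_count: int,
    reference: Dict[str, Tuple[float, float]],
) -> bool:
    """Judge one log set; `reference` maps reference constant information to
    its (first_threshold, second_threshold) pair (hypothetical layout)."""
    thresholds = reference.get(target_constant)
    if thresholds is None:
        # New target constant information (not reference constant information): abnormal.
        return True
    first_threshold, second_threshold = thresholds  # Sa1: the matching reference entry
    # Sa2: abnormal only if the count is above the first or below the second threshold.
    return log_count > first_threshold or log_count < second_threshold
```

A count exactly equal to either threshold is treated as normal here, matching the strict "greater than" and "less than" wording of Sa2.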
Since there may be multiple abnormal log sets, how to link them is also a problem to be solved. To this end, an embodiment of the present application further provides a possible implementation in which adjacent abnormal log sets are linked to obtain a first abnormal log set chain through steps Sb1 to Sb3.
Sb1, arranging the abnormal log sets in the chronological order of their statistical time periods to obtain an abnormal log set sequence.
Specifically, first, the abnormal log sets are arranged in the chronological order of their statistical time periods to obtain an initial abnormal log set sequence, in which abnormal log sets sharing the same statistical time period occupy the same position. Second, the abnormal log sets at the same position are ordered by number of logs from high to low. Finally, the resulting sequence is taken as the abnormal log set sequence.
The abnormal log sets include root cause abnormal log sets and non-root-cause abnormal log sets. A root cause abnormal log set generally precedes the non-root-cause abnormal log sets, because it is what causes them to be generated. Arranging the abnormal log sets in the order of their statistical time periods therefore lays the foundation for analyzing the root cause log.
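Step Sb1 amounts to a two-key sort. The sketch below assumes each abnormal log set is summarized by a hypothetical `AbnormalLogSet` record holding the start of its statistical time period (in minutes, an assumed unit), its number of logs and its target constant information; `order_abnormal_log_sets` is an illustrative name.

```python
from typing import List, NamedTuple


class AbnormalLogSet(NamedTuple):
    period_start: int      # start of its statistical time period, in minutes (assumed unit)
    log_count: int
    target_constant: str


def order_abnormal_log_sets(sets: List[AbnormalLogSet]) -> List[AbnormalLogSet]:
    """Sb1: earlier periods first; within one period, higher log counts first."""
    return sorted(sets, key=lambda s: (s.period_start, -s.log_count))
```

Negating the log count lets a single ascending sort express "time ascending, count descending".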
Sb2, linking adjacent abnormal log sets in the abnormal log set sequence according to the first duration to obtain at least one second abnormal log set chain, where the interval between the statistical time periods of adjacent abnormal log sets in the second abnormal log set chain does not exceed the first duration.
In related-art anomaly detection, the interval between two adjacent anomalies of the same event does not exceed a certain duration, such as 5 minutes. If the interval exceeds 5 minutes, the two adjacent anomalies are considered unrelated and not part of the same abnormal event. In the embodiments of the present application, the first duration can be understood as this duration.
Converging and aggregating the abnormal log sets according to the first duration organizes fragmented abnormal log sets into independent, complete abnormal log set chains. In each resulting chain, the information of every abnormal log set, such as its statistical time period, number of logs and target constant information, can be seen at a glance.
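Step Sb2 can be sketched as a single pass over the ordered sequence. For brevity the sketch assumes the sequence from Sb1 has been reduced to the start minute of each abnormal log set's statistical time period, and that "interval between statistical time periods" is measured start to start; both are assumptions, and `link_by_first_duration` is an illustrative name.

```python
from typing import List


def link_by_first_duration(period_starts: List[int], first_duration: int) -> List[List[int]]:
    """Sb2: split an ordered sequence into second abnormal log set chains.

    Adjacent items stay in the same chain while the gap between their
    statistical time periods (measured start to start) is <= first_duration.
    """
    chains: List[List[int]] = []
    for start in period_starts:
        if chains and start - chains[-1][-1] <= first_duration:
            chains[-1].append(start)
        else:
            chains.append([start])
    return chains
```

With `first_duration=5`, the sequence `[0, 3, 5, 20, 22, 40]` splits into `[[0, 3, 5], [20, 22], [40]]`. Abnormal log sets in the same statistical time period have a gap of zero, so they always land in the same chain.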
Sb3, for each second abnormal log set chain, if the interval between the statistical time periods of the first and last abnormal log sets in the chain does not exceed the second duration, determining the second abnormal log set chain as a first abnormal log set chain; and if the interval exceeds the second duration, determining at least one first abnormal log set chain from the second abnormal log set chain.
In related-art anomaly detection, an anomaly does not continue indefinitely: its duration does not exceed a specific duration, for example 2 hours. If it lasts longer than 2 hours, the anomalies after the 2-hour mark can be considered independent of the abnormal event.
Cutting the second abnormal log set chains obtained by convergence aggregation according to the second duration yields abnormal log set chains that conform to the logic of an abnormal event. This adds a time-span constraint to the chains and improves the reliability of the anomaly detection results.
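The source does not say how a second abnormal log set chain that spans more than the second duration is split. One plausible reading, assumed here, is a greedy cut that starts a new first chain whenever an item falls more than `second_duration` after the first item of the current piece. As in the Sb2 sketch, items are assumed to be period start minutes, and `cut_by_second_duration` is an illustrative name.

```python
from typing import List


def cut_by_second_duration(chain: List[int], second_duration: int) -> List[List[int]]:
    """Sb3 (assumed greedy split): keep each piece within second_duration of its first item."""
    if not chain:
        return []
    pieces: List[List[int]] = [[chain[0]]]
    for start in chain[1:]:
        if start - pieces[-1][0] <= second_duration:
            pieces[-1].append(start)
        else:
            pieces.append([start])
    return pieces
```

A chain that already fits within the second duration comes back unchanged as a single piece. Every piece keeps the adjacent-gap property from Sb2, because splitting only separates items and never brings non-adjacent items together.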
To learn about the first anomaly log set chain, the present application also provides an example of a plurality of first anomaly log set chains, as shown in FIG. 2. The horizontal axis of fig. 2 is time, and the vertical axis is log number.
In this example, the statistical period is 0:00-24:00, and a total of 16 abnormal log sets are determined, arranged in order and labeled 1 to 16.
From these 16 abnormal log sets, 3 first abnormal log set chains are determined, each characterizing one event (events 1 to 3). The chain characterizing event 1 contains abnormal log sets 1, 2, 3, 4, and 5 in sequence; the chain characterizing event 2 contains abnormal log sets 6, 7, 8, 9, 10, 11, and 12; and the chain characterizing event 3 contains abnormal log sets 13, 14, 15, and 16.
Mining the root cause logs of an anomaly from each first abnormal log set chain is another key problem addressed by the embodiments of the present application. To this end, the embodiments further provide a possible implementation in which step S140 includes the following steps.
For each first abnormal log set chain:
take the first abnormal log set in the chain, together with any other abnormal log sets that have the same statistical period, as candidate log sets. In related-art anomaly detection, the earliest log is usually the root cause log of an anomaly event. Therefore, the abnormal log sets with the earliest statistical period in the first abnormal log set chain are selected as candidate log sets, and the root cause log is determined from them.
Continuing with the example in FIG. 2: for the first abnormal log set chain characterizing event 1, the abnormal log set labeled "1" is the candidate log set; for the chain characterizing event 2, the abnormal log sets labeled "6" and "7" are the candidate log sets.
If there is only one candidate log set, the logs in it are taken as the root cause logs of the statistical period.
If there are multiple candidate log sets, the candidate log sets whose corresponding target constant information is non-reference constant information are determined, and the logs in them are taken as the root cause logs of the statistical period. Within the statistical period, target constant information that is reference constant information can be understood as "old" constant information, because it already appears in the history logs, while target constant information that is non-reference constant information can be understood as "new" constant information.
If the target constant information corresponding to at least two of the multiple candidate log sets is reference constant information, the candidate log set among those at least two that meets a preset fluctuation condition is determined, and its logs are taken as the root cause logs of the statistical period. The preset fluctuation condition is that the log count of the candidate log set fluctuates the most relative to the first fluctuation threshold or the second fluctuation threshold.
By jointly considering the statistical period of each abnormal log set in the first abnormal log set chain, whether its target constant information is reference constant information, and how its log count fluctuates, the method finally determines the root cause log of the anomaly event characterized by that chain.
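The candidate selection and the decision rules above can be sketched as follows. Assumptions: `AnomalyLogSet`, `fluctuation`, `root_cause_sets`, and the `thresholds` mapping (reference constant information to a pair of first and second fluctuation thresholds) are hypothetical; constant information absent from `thresholds` is treated as non-reference ("new"); and, because the text does not define the fluctuation amount precisely, it is taken here as the absolute distance by which the log count lies outside the band between the two thresholds. With several candidates, the sketch returns every non-reference candidate plus, if at least two candidates are reference, the one that fluctuates most.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AnomalyLogSet:
    """Hypothetical record of one abnormal log set in a first chain."""
    set_id: int
    period_start: datetime
    constant_info: str  # target constant information (log template)
    log_count: int


# reference constant information -> (first threshold, second threshold)
Thresholds = Dict[str, Tuple[int, int]]


def fluctuation(s: AnomalyLogSet, thresholds: Thresholds) -> int:
    """Assumed fluctuation amount: distance of the count outside [second, first]."""
    upper, lower = thresholds[s.constant_info]
    return max(s.log_count - upper, lower - s.log_count, 0)


def root_cause_sets(chain: List[AnomalyLogSet], thresholds: Thresholds) -> List[AnomalyLogSet]:
    # Candidates: every set sharing the earliest statistical period in the chain.
    earliest = min(s.period_start for s in chain)
    candidates = [s for s in chain if s.period_start == earliest]
    if len(candidates) == 1:
        return candidates
    # Non-reference ("new") constant information is taken directly as a root cause.
    roots = [s for s in candidates if s.constant_info not in thresholds]
    reference = [s for s in candidates if s.constant_info in thresholds]
    # With at least two reference candidates, keep the one that fluctuates most.
    if len(reference) >= 2:
        roots.append(max(reference, key=lambda s: fluctuation(s, thresholds)))
    return roots
```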
The method for determining an abnormal root cause enables accurate and efficient log root cause analysis in a variety of anomaly detection scenarios. To illustrate its technical effects more clearly, the embodiments of the present application also provide an example, which includes the anomaly inference device shown in FIG. 3.
In this example, the "log model" of a log corresponds to the constant information of the log in the above embodiments.
In this example, the anomaly inference device includes four modules: a data access module, a model training module, a real-time anomaly detection module, and a real-time root cause inference module.
The data access module preprocesses logs generated in real time into processable logs. On the one hand, the processable log data is stored as history log data for use in subsequent statistical periods; on the other hand, the processable real-time logs and the history logs are sent to the model training module.
The model training module receives the processable real-time logs and history logs. It constructs log template features for each real-time log to obtain the log's constant feature information and variable feature information, and trains a log model from this feature information to obtain the log model of the real-time log. It then computes time-series statistics for each log model to obtain at least one piece of time-series statistical data for that log model. (The time-series statistical data of a log model is equivalent to the log set obtained by counting, within a statistical period, the logs corresponding to the target constant information.)
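The split of a log into constant and variable feature information can be approximated with regular-expression masking, in the spirit of claim 2, where variable fields are replaced with a preset constant. This is a simplified stand-in rather than the trained log model described above: the patterns in `VARIABLE_PATTERNS`, the `<*>` placeholder, and the `constant_info` function name are all illustrative assumptions.

```python
import re

# Hypothetical masking rules; each regex marks one kind of variable field.
# Order matters: more specific patterns run before the generic number pattern.
VARIABLE_PATTERNS = [
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"),  # IPv4 address, optional port
    re.compile(r"\b0x[0-9a-fA-F]+\b"),                    # hexadecimal values
    re.compile(r"\b\d+(?:\.\d+)?\b"),                     # integers and decimals
]
PLACEHOLDER = "<*>"  # the preset constant that replaces variable fields


def constant_info(log_line: str) -> str:
    """Return the log with every variable field replaced by PLACEHOLDER."""
    text = log_line
    for pattern in VARIABLE_PATTERNS:
        text = pattern.sub(PLACEHOLDER, text)
    return text
```

Two logs that differ only in their variable fields map to the same constant information, which is what allows them to be counted in the same log set.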
The model training module also constructs log template features for the history logs to obtain their constant feature information and variable feature information, and trains log models from this feature information to obtain the log models of the history logs. It determines a plurality of reference log models, analyzes the time-series feature data of each reference log model, and trains that model's anomaly thresholds, namely the maximum and minimum values of its time-series statistical data. The maximum value corresponds to the first fluctuation threshold and the minimum value to the second fluctuation threshold in the above embodiments.
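Training the anomaly thresholds of the reference log models, as described, amounts to recording the extremes of each model's historical time-series statistics. A minimal sketch, assuming the history is available as hypothetical `(constant_info, log_count)` pairs, one per historical statistical period, and that every constant information seen in the history is reference constant information; `train_thresholds` is an illustrative name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def train_thresholds(history: Iterable[Tuple[str, int]]) -> Dict[str, Tuple[int, int]]:
    """Map each reference constant information to its (maximum, minimum)
    historical log count, i.e. (first fluctuation threshold, second fluctuation threshold)."""
    counts: Dict[str, List[int]] = defaultdict(list)
    for template, count in history:
        counts[template].append(count)
    return {template: (max(c), min(c)) for template, c in counts.items()}
```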
The time-series statistical data of each log model, together with the associated maximum and minimum values, is then sent to the real-time anomaly detection module.
The real-time anomaly detection module matches each log model against the reference log models in a log model library. If the log model is a non-reference log model, new-model anomaly detection is applied; if it is a reference log model, log-count anomaly detection is applied, using the maximum and minimum values of that model's time-series statistical data. Through these two types of anomaly detection, N anomalous pieces of time-series statistical data are identified, referred to as anomaly 1, anomaly 2, anomaly 3, ..., anomaly N (each anomaly corresponds to an abnormal log set in the above embodiments).
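The two detection paths can be sketched as a single pass over the current log sets. Assumptions: each log set is reduced to a hypothetical `(constant_info, log_count)` pair, `thresholds` maps reference constant information to the (maximum, minimum) historical counts, and the labels `"new"` and `"count"` are illustrative names for new-model detection and log-count detection respectively; `detect_anomalies` is likewise a hypothetical name.

```python
from typing import Dict, List, Tuple

# reference constant information -> (maximum, minimum) historical log count
Thresholds = Dict[str, Tuple[int, int]]


def detect_anomalies(
    log_sets: List[Tuple[str, int]], thresholds: Thresholds
) -> List[Tuple[str, int, str]]:
    """Return (constant_info, log_count, kind) for every abnormal log set."""
    anomalies: List[Tuple[str, int, str]] = []
    for template, count in log_sets:
        if template not in thresholds:
            # Non-reference log model: the template never appeared in history.
            anomalies.append((template, count, "new"))
            continue
        upper, lower = thresholds[template]
        # Reference log model: abnormal only if the count leaves [minimum, maximum].
        if count > upper or count < lower:
            anomalies.append((template, count, "count"))
    return anomalies
```

Counts equal to the maximum or minimum are treated as normal here, matching the strict "larger than" and "smaller than" comparisons in claim 5.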
Anomalies 1 to N are then sent to the real-time root cause inference module.
The real-time root cause inference module performs event convergence on anomalies 1 to N to obtain event 1, event 2, and so on (each event corresponds to a first abnormal log set chain in the above embodiments). Root cause analysis is then performed on each event to identify the anomalies that are its root causes; for example, the root causes of event 1 are anomaly 1 and anomaly 5.
Finally, the root causes of all events are aggregated as the root causes of the current statistical period.
The present application also provides a detailed flow example of how the real-time anomaly detection module and the real-time root cause inference module of the anomaly inference device operate, as shown in FIG. 4. The flow includes steps S1001 to S1010.
S1001, time sequence statistical data of a log model is acquired.
The time-series statistical data of all log models within the statistical period is acquired.
S1002, detecting whether time sequence statistical data of the log model is abnormal.
If a piece of time-series statistical data is abnormal, it is regarded as one anomaly, and step S1003 is executed.
S1003, aggregate the anomalies to obtain complete initial events (an initial event corresponds to a second abnormal log set chain in the above embodiments).
S1004, converging the initial event to obtain an independent event (the event corresponds to the first abnormal log set chain).
When there are a plurality of events, steps S1005 to S1009 are executed for each event.
S1005, determine whether there is exactly one candidate anomaly.
Here, for each event, the anomalies with the earliest occurrence time are taken as candidate anomalies (a candidate anomaly corresponds to a candidate log set in the above embodiments). There may be one or more candidate anomalies.
If there is one candidate anomaly, it is taken as the root cause of the event. If there are multiple candidate anomalies, step S1006 is performed.
S1006, determine whether the log model corresponding to each candidate anomaly is a reference log model.
For each of the multiple candidate anomalies, if its log model is a non-reference log model, the time-series statistical data corresponding to that candidate anomaly is taken as a root cause of the event. If its log model is a reference log model, step S1007 is performed.
S1007, determine whether the log models of at least two candidate anomalies are reference log models.
If yes, S1008 is performed. If not, the root cause judgment process of the next event is entered.
S1008, determine whether the data fluctuations of the at least two candidate anomalies exceed a preset range.
Each anomaly is a piece of time-series statistical data. Among these candidate anomalies, the one whose time-series statistical data fluctuates the most is taken as the root cause of the event.
S1009, the root cause of the event is determined.
S1010, outputting the root cause of each event in the statistical period.
Referring to FIG. 5, an embodiment of the present application further provides an apparatus 500 for determining an abnormal root cause. The apparatus includes a statistics module 510, a first determining module 520, an aggregation link module 530, and a second determining module 540.
The statistics module 510 is configured to determine, for each log generated in real time in a statistical period, the generation time and constant information of the log; if no statistical period corresponding to the log exists, take the constant information of the log as target constant information, create a statistical period associated with the target constant information according to the generation time of the log, and count the logs generated within the created statistical period associated with the target constant information to obtain a log set corresponding to the target constant information. The constant information of each log in the log set is the target constant information.
The first determining module 520 is configured to determine at least one abnormal log set according to whether the target constant information corresponding to each log set is reference constant information, where reference constant information is the constant information of history logs; the abnormal log sets are arranged in the order of their statistical periods.
The aggregation link module 530 is configured to obtain a first abnormal log set chain from adjacent abnormal log sets. In the first abnormal log set chain, the interval between the statistical periods of adjacent abnormal log sets does not exceed the first duration, and the interval between the statistical periods of the first and last abnormal log sets does not exceed the second duration.
The second determining module 540 is configured to determine the root cause log in the statistical period according to the first abnormal log set chain.
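The behaviour of the statistics module 510 can be sketched as below. Assumptions: every statistical period has a fixed length (one minute by default) starting at the generation time of the log that creates it; a log falls into an existing period only when that period's target constant information equals the log's constant information and the period contains its generation time; and `LogSet` and `build_log_sets` are hypothetical names. Logs are processed in generation-time order, as they would be when produced in real time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class LogSet:
    """Hypothetical log set: logs sharing one target constant information
    within one statistical period [period_start, period_end)."""
    constant_info: str
    period_start: datetime
    period_end: datetime
    log_count: int = 0


def build_log_sets(
    logs: List[Tuple[datetime, str]],
    period_length: timedelta = timedelta(minutes=1),
) -> List[LogSet]:
    """logs: (generation time, constant information) pairs."""
    sets: List[LogSet] = []
    for generated_at, info in sorted(logs, key=lambda item: item[0]):
        # Look for an existing period with the same target constant information
        # that contains this log's generation time.
        match: Optional[LogSet] = next(
            (s for s in sets
             if s.constant_info == info and s.period_start <= generated_at < s.period_end),
            None,
        )
        if match is None:
            # No corresponding period: create one starting at this log's time.
            match = LogSet(info, generated_at, generated_at + period_length)
            sets.append(match)
        match.log_count += 1
    return sets
```

The linear scan over existing sets keeps the sketch short; an index keyed by constant information would avoid it at higher log volumes.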
The apparatus of the embodiments of the present application can perform the method provided by the embodiments of the present application, with a similar implementation principle. The actions performed by each module of the apparatus correspond to the steps of the method; for detailed functional descriptions of each module, refer to the descriptions of the corresponding method above, which are not repeated here.
An embodiment of the present application provides an electronic device including a memory, a processor, and a computer program stored in the memory, where the processor executes the computer program to implement the steps of the method for determining an abnormal root cause. Compared with the prior art, this enables more accurate and efficient log root cause analysis.
Referring to FIG. 6, an embodiment of the present application further provides a specific example of an electronic device. The electronic device 6000 shown in FIG. 6 includes a processor 6001 and a memory 6003, the processor 6001 being coupled to the memory 6003, for example via a bus 6002. Optionally, the electronic device 6000 may further include a transceiver 6004, which may be used for data interaction between the electronic device and other electronic devices, such as transmitting and/or receiving data. It should be noted that, in practical applications, the number of transceivers 6004 is not limited to one, and the structure of the electronic device 6000 does not constitute a limitation on the embodiments of the present application.
The processor 6001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various exemplary logical blocks, modules, and circuits described in connection with this disclosure. The processor 6001 may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The bus 6002 may include a path for transferring information between the above components. The bus 6002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in FIG. 6, but this does not mean that there is only one bus or one type of bus.
The memory 6003 may be a ROM (Read Only Memory) or other type of static storage device capable of storing static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device capable of storing information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other computer-readable medium that can carry or store a computer program, without limitation.
The memory 6003 stores a computer program for executing the embodiments of the present application, whose execution is controlled by the processor 6001. The processor 6001 executes the computer program stored in the memory 6003 to implement the steps shown in the foregoing method embodiments.
The electronic device includes, but is not limited to, a server.
Embodiments of the present application provide a computer readable storage medium having a computer program stored thereon, which when executed by a processor, implements the steps of the foregoing method embodiments and corresponding content.
The embodiment of the application also provides a computer program product, which comprises a computer program, wherein the computer program can realize the steps and corresponding contents of the embodiment of the method when being executed by a processor.
It should be understood that, although various operation steps are indicated by arrows in the flowcharts of the embodiments of the present application, the order in which these steps are implemented is not limited to the order indicated by the arrows. In some implementations of embodiments of the application, the implementation steps in the flowcharts may be performed in other orders as desired, unless explicitly stated herein. Furthermore, some or all of the steps in the flowcharts may include multiple sub-steps or multiple stages based on the actual implementation scenario. Some or all of these sub-steps or phases may be performed at the same time, or each of these sub-steps or phases may be performed at different times, respectively. In the case of different execution time, the execution sequence of the sub-steps or stages can be flexibly configured according to the requirement, which is not limited by the embodiment of the present application.
The foregoing describes only optional implementations for some implementation scenarios of the present application. It should be noted that other similar implementations adopted by those skilled in the art based on the technical ideas of the present application, without departing from those technical ideas, also fall within the protection scope of the embodiments of the present application.

Claims (10)

1. A method of determining an abnormal root cause, the method comprising:
determining the generation time and constant information of each log generated in real time in a statistical period; if no statistical period corresponding to the log is determined, taking constant information of the log as target constant information, creating a statistical period associated with the target constant information according to the generation time of the log, and counting the log generated in the created statistical period associated with the target constant information to obtain a log set corresponding to the target constant information; the constant information of each log in the log set is the target constant information;
determining at least one abnormal log set according to whether the target constant information corresponding to the log set is reference constant information or not, wherein the reference constant information is constant information of a history log; each abnormal log set is arranged according to the sequence of the statistical time periods;
obtaining a first abnormal log set chain according to each adjacent abnormal log set; the interval duration of the corresponding statistical time periods of the adjacent abnormal log sets in the first abnormal log set chain does not exceed the first duration, and the interval duration of the corresponding statistical time periods of the first abnormal log set and the last abnormal log set does not exceed the second duration;
and determining the root cause log in the statistical period according to the first abnormal log set chain.
2. The method of claim 1, wherein said determining constant information for the log comprises:
determining constant characteristic information and variable characteristic information in the log;
and replacing the variable characteristic information in the log with a preset constant, and determining the replaced log as constant information of the log.
3. The method of claim 1, wherein the determining that there is no statistical period corresponding to the log comprises:
for the first log generated in the statistical period, determining that no statistical period corresponding to the log exists;
and for the non-first log generated in the statistical period, if each created statistical period does not comprise the generation time of the log or the constant information of the log is different from the corresponding target constant information of each statistical period, determining that the statistical period corresponding to the log does not exist.
4. The method of claim 1, wherein determining at least one abnormal log set according to whether the target constant information corresponding to the log set is the reference constant information, comprises:
for each log set, if the corresponding target constant information of the log set is not the reference constant information, determining the log set as the abnormal log set;
and if the target constant information corresponding to the log set is the reference constant information, determining that the log set is the abnormal log set according to the log quantity of the log set.
5. The method of claim 4, wherein each piece of reference constant information is associated with a first fluctuation threshold and a second fluctuation threshold, the first fluctuation threshold being greater than the second fluctuation threshold;
the determining that the log set is the abnormal log set according to the log quantity of the log set includes:
determining the same reference constant information as the target constant information corresponding to the log set;
and if the number of the logs in the log set is larger than a first fluctuation threshold associated with the determined reference constant information, or if the number of the logs in the log set is smaller than a second fluctuation threshold associated with the determined reference constant information, determining the log set as the abnormal log set.
6. The method of claim 1, wherein the obtaining a first chain of anomaly log sets from each adjacent anomaly log set comprises:
arranging the abnormal log sets according to the time sequence of the corresponding statistical time period to obtain an abnormal log set sequence;
determining at least one second abnormal log set chain from the abnormal log set sequence according to the first time length; the interval duration of the corresponding statistical time period of the adjacent abnormal log sets in the second abnormal log set chain does not exceed the first duration;
for each second abnormal log set chain, if the interval duration of the corresponding statistical time periods of the first abnormal log set and the last abnormal log set in the second abnormal log set chain is not longer than the second duration, determining the second abnormal log set chain as the first abnormal log set chain; and if the interval duration is longer than the second duration, determining at least one first abnormal log set chain according to the second abnormal log set chain.
7. The method of claim 1, wherein said determining a root log in the statistical period from the first chain of anomaly log sets comprises:
for each first abnormal log set chain, taking a first abnormal log set of the first abnormal log set chain and other abnormal log sets with the same statistical time period as the first abnormal log set as candidate log sets;
if the number of the candidate log sets is one, taking the logs in the candidate log set as root cause logs of the statistical period;
if the number of the candidate log sets is multiple, determining, among the multiple candidate log sets, the candidate log sets whose corresponding target constant information is non-reference constant information, and taking the logs in the determined candidate log sets as root cause logs of the statistical period;
if the target constant information corresponding to at least two of the plurality of candidate log sets is the reference constant information, determining, among the at least two candidate log sets, the candidate log set meeting a preset fluctuation condition, and taking the logs in the determined candidate log set as root cause logs of the statistical period; the preset fluctuation condition is that the log quantity of the logs in the candidate log set has the largest fluctuation relative to the first fluctuation threshold or the second fluctuation threshold.
8. An apparatus for determining the root cause of an anomaly, the apparatus comprising:
the statistics module is used for determining the generation time and constant information of each log generated in real time in a statistics period; if no statistical period corresponding to the log is determined, taking constant information of the log as target constant information, creating a statistical period associated with the target constant information according to the generation time of the log, and counting the log generated in the created statistical period associated with the target constant information to obtain a log set corresponding to the target constant information; the constant information of each log in the log set is the target constant information;
the first determining module is used for determining at least one abnormal log set according to whether the target constant information corresponding to the log set is reference constant information or not, wherein the reference constant information is constant information of a history log; each abnormal log set is arranged according to the sequence of the statistical time periods;
the aggregation link module is used for obtaining a first abnormal log set chain according to each adjacent abnormal log set; the interval duration of the corresponding statistical time periods of the adjacent abnormal log sets in the first abnormal log set chain does not exceed the first duration, and the interval duration of the corresponding statistical time periods of the first abnormal log set and the last abnormal log set does not exceed the second duration;
and the second determining module is used for determining the root cause log in the statistical period according to the first abnormal log set chain.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory, characterized in that the processor executes the computer program to carry out the steps of the method according to any one of claims 1-7.
10. A computer readable storage medium having stored thereon a computer program, which when executed by a processor performs the steps of the method of determining the root cause of an anomaly of any one of claims 1 to 7.
CN202311236319.XA 2023-09-22 2023-09-22 Method, device, equipment and storage medium for determining abnormal root cause Pending CN117170926A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311236319.XA CN117170926A (en) 2023-09-22 2023-09-22 Method, device, equipment and storage medium for determining abnormal root cause

Publications (1)

Publication Number Publication Date
CN117170926A (en) 2023-12-05

Family

ID=88936091

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311236319.XA Pending CN117170926A (en) 2023-09-22 2023-09-22 Method, device, equipment and storage medium for determining abnormal root cause

Country Status (1)

Country Link
CN (1) CN117170926A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination