WO2022237088A1 - Root cause location method, electronic device and storage medium - Google Patents

Root cause location method, electronic device and storage medium

Info

Publication number
WO2022237088A1
Authority
WO
WIPO (PCT)
Prior art keywords
root cause
abnormal
type
score
candidate root
Prior art date
Application number
PCT/CN2021/127331
Other languages
English (en)
French (fr)
Inventor
卢冠男
孙芮
莫林林
王雅琪
Original Assignee
深圳前海微众银行股份有限公司 (Shenzhen Qianhai WeBank Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2022237088A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02 Banking, e.g. interest calculation or account maintenance

Definitions

  • The present application relates to the field of computer technology, and in particular to a root cause location method, an electronic device, and a storage medium.
  • Embodiments of the present application provide a root cause location method, an electronic device, and a storage medium. They address the technical problem in the related art that the root cause of an abnormal event cannot be located accurately.
  • An embodiment of the present application provides a root cause location method, including:
  • determining a target root cause of the first abnormal event based on the first score determined for each candidate root cause in each preset anomaly type.
  • Determining the first score of each candidate root cause in each preset anomaly type includes:
  • obtaining the first score of each candidate root cause in each preset anomaly type.
  • The method further includes:
  • determining at least two candidate root causes of the first abnormal event based on the first preset correspondence between preset abnormal events and preset root causes; and determining the confidence of each of these candidate root causes based on the second preset correspondence between preset root causes and confidence. The first and second preset correspondences are determined from at least one of historical logs, historical alarm information, and version release records.
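  • The method further includes:
  • adjusting at least one of the following, based on the first anomaly type and the first candidate root cause determined by a preset rule engine: the predicted probabilities of the preset anomaly types, the confidence of the candidate root causes, and the first scores of the determined candidate root causes in the preset anomaly types.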
  • Adjusting the predicted probability of the preset anomaly type of the first abnormal event and the confidence of the candidate root cause of the first abnormal event includes the following:
  • if the predicted probability of the first anomaly type is less than a first preset probability, it is raised to the first preset probability.
  • If the at least two candidate root causes of the first abnormal event do not include the first candidate root cause, or if its confidence is less than a first preset confidence, its confidence is set to the first preset confidence.
  • The first preset probability indicates that the first abnormal event belongs to the first anomaly type. The first preset confidence indicates that the first candidate root cause currently exists for the first abnormal event.
  • Adjusting the first score of the determined candidate root cause in the preset anomaly type includes:
  • if the first score of the first candidate root cause in the first anomaly type is less than a preset score, raising that first score to at least the preset score. The preset score is used to screen for the target root cause.
  • An embodiment of the present application further provides an electronic device, including:
  • a prediction unit, configured to input the feature vector of the first abnormal event into the anomaly detection model and obtain the predicted probability of each of the multiple preset anomaly types for the first abnormal event;
  • a first determination unit, configured to determine the first score of each candidate root cause in each preset anomaly type. It uses the predicted probability of each preset anomaly type for the first abnormal event and the confidence of each of the at least two candidate root causes of that event; and
  • a second determination unit, configured to determine the target root cause of the first abnormal event from the determined first scores of the candidate root causes in the preset anomaly types.
  • An embodiment of the present application further provides an electronic device, including a processor and a memory. The memory is configured to store a computer program executable on the processor. When running the computer program, the processor is configured to perform the steps of any one of the above root cause location methods.
  • An embodiment of the present application further provides a storage medium storing a computer program. When executed by a processor, the computer program implements the steps of any one of the above root cause location methods.
  • In these embodiments, the trained anomaly detection model predicts the probability that the first abnormal event belongs to each preset anomaly type. A first score is then calculated for each candidate root cause in each preset anomaly type, from these predicted probabilities and from the confidence of each of the at least two candidate root causes of the first abnormal event. The target root cause of the first abnormal event is determined from the first scores, so the root cause that led to the first abnormal event can be located accurately.
  • FIG. 1 is a schematic flowchart of the root cause location method provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of the root cause location method provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a hardware composition structure of an electronic device provided by an embodiment of the present application.
  • FIG. 1 is a schematic flowchart of the root cause location method provided by an embodiment of the present application. The process is executed by an electronic device such as a terminal or a server.
  • The root cause location method includes:
  • Step 101: Input the feature vector of the first abnormal event into the anomaly detection model to obtain the predicted probability of each of the multiple preset anomaly types for the first abnormal event.
  • When the electronic device obtains the relevant data of the first abnormal event, it extracts the event's feature information from that data and determines the corresponding feature vector. It then inputs this feature vector into the trained anomaly detection model. The model outputs the predicted probability of each of the multiple preset anomaly types for the first abnormal event.
  • The relevant data of the first abnormal event includes the logs, alarm information, and version release records output within a preset time period before the first abnormal event occurred.
  • The feature information of the first abnormal event includes at least one of the following: feature information of abnormal indicators, interruption events, change operations, alarm events, and abnormal systems.
  • The feature information of external factors of the first abnormal event can be determined, for example that of change operations. Based on at least one of the logs and the alarm information, the feature information of internal factors can also be determined, such as that of abnormal indicators, interruption events, and abnormal systems. This enriches the feature information of the first abnormal event and improves the accuracy of the predicted probabilities output by the anomaly detection model.
  • The feature vector corresponding to the feature information of the first abnormal event is determined as follows:
  • once a feature vector has been determined for each of at least two kinds of feature information of the abnormal event, these feature vectors are fused to obtain the feature vector of the abnormal event.
  • Fusion reduces the dimension of the feature vector and improves the data processing efficiency of the anomaly detection model.
  • Fusing feature vectors means merging them.
  • Fusing the feature vectors of the determined feature information includes:
  • summing the first vectors of the abnormal indicators into a second vector, and concatenating the second vector with a third vector. The third vector corresponds to all feature information other than that of the abnormal indicators.
  • An abnormal indicator is a preset indicator that has triggered an alarm. Which preset indicators have triggered an alarm is determined from logs or alarm information.
  • The preset indicators include at least one of the following: business transaction volume, business success rate, latency, and the like.
  • The electronic device converts the feature information of each abnormal indicator of the first abnormal event into a first vector according to the preset hierarchical structure. It then sums all first vectors of the event to obtain the second vector, which is the feature vector covering all abnormal indicators. If the first abnormal event has only one abnormal indicator, the second vector equals that first vector.
  • The electronic device determines the third vector of the first abnormal event from at least one of the following: the feature information of interruption events, change operations, alarm events, and abnormal systems.
  • The second vector and the third vector of the first abnormal event are concatenated horizontally to obtain the feature vector of the first abnormal event.
  • The feature information of an abnormal indicator extracted by the electronic device includes the indicator's product identifier, scenario identifier, indicator type identifier, and anomaly type.
  • The preset hierarchical structure can be [product][scenario][indicator type][anomaly type]. A scenario is also called a function; examples are transfer, repayment, deposit, and loan. Indicator types include business transaction volume, business success rate, and latency. Anomaly types include sudden increase and sudden drop.
  • The encoding is based on four counts: the number of product types of the first abnormal event, the number of scenarios in each product type, the number of preset indicator types in each scenario, and the number of anomaly types.
  • For example, the first abnormal event comes from product A and product B.
  • Product A includes scenarios a and aa; product B includes scenarios b and bb. Scenarios a, aa, b, and bb each include four preset indicators.
  • The feature information of the first abnormal event indicates that the system success rate of scenario a of product A triggered an alarm.
  • The feature information of this system success rate of product A is one-hot encoded according to the preset hierarchical structure.
  • The resulting first vector is [1,0,1,0,0,0,0,1,0,0,0,0,0,0,0] or [1,0,1,0,0,0,0,0,1,0,0,0,0,0,0].
  • The first two digits of the first vector represent product A, and the third to sixth digits represent scenario a. In the last eight digits, "1,0,0,0,0,0,0,0" indicates a sudden increase in the system success rate and "0,1,0,0,0,0,0,0" indicates a sudden drop.
  • Once the first vectors of the first abnormal event are determined, they are summed element-wise to obtain the corresponding second vector.
  • The feature information of an interruption event indicates whether messages have been lost;
  • the third vector corresponding to this feature information is [0] or [1].
  • The feature information of the abnormal system indicates one of two things: whether there is a subsystem with the highest time consumption, or whether the deepest invoked subsystem has a failure log. The corresponding third vector is [0] or [1]. The location of the abnormal subsystem plays a crucial role in determining the final root cause.
  • The feature information of a change operation indicates whether a change operation record targets the determined abnormal subsystem; the corresponding third vector is [0] or [1].
  • The abnormal subsystem may be the real root cause of the preset abnormal event.
  • Alarm events include middleware alarm events and network alarm events.
  • The third vector corresponding to the feature information of an alarm event is [0] or [1].
  • The feature information of a middleware alarm event indicates whether there is a middleware alarm event of a preset level related to the abnormal subsystem.
  • The feature information of a network alarm event indicates whether there is a network alarm event of a preset level related to the abnormal subsystem.
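Below is a minimal sketch of how the third vector and the fused feature vector could be assembled. It assumes a fixed order for the binary features listed above and uses plain Python lists. The names `third_vector` and `fuse` are illustrative, as are the flag names and their order; none of them come from the source.

```python
from typing import Dict, List, Sequence

# Hypothetical fixed order for the binary features of the third vector.
THIRD_VECTOR_FEATURES = (
    "message_loss",                  # interruption event
    "abnormal_subsystem_found",      # abnormal system
    "change_on_abnormal_subsystem",  # change operation
    "middleware_alarm",              # middleware alarm event
    "network_alarm",                 # network alarm event
)


def third_vector(flags: Dict[str, bool]) -> List[int]:
    """Map each binary feature to 0 or 1 in a fixed order; absent flags count as 0."""
    return [1 if flags.get(name, False) else 0 for name in THIRD_VECTOR_FEATURES]


def fuse(first_vectors: Sequence[Sequence[int]], third: Sequence[int]) -> List[int]:
    """Sum the first vectors element-wise into the second vector, then append the third vector."""
    if not first_vectors:
        raise ValueError("at least one abnormal indicator is required")
    width = len(first_vectors[0])
    if any(len(v) != width for v in first_vectors):
        raise ValueError("all first vectors must have the same length")
    second = [sum(column) for column in zip(*first_vectors)]
    return second + list(third)  # horizontal concatenation


print(fuse([[1, 0, 1, 0], [1, 0, 0, 1]], third_vector({"message_loss": True, "network_alarm": True})))
# prints [2, 0, 1, 1, 1, 0, 0, 0, 1]
```

The first vectors are summed rather than OR-ed. As a result, a slot shared by several abnormal indicators (for example the same product) can exceed 1 in the second vector.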
  • The anomaly detection model consists of deep neural networks (DNN, Deep Neural Networks). It is trained on the first data of at least two preset abnormal events.
  • The first data of a preset abnormal event includes its feature vector and a calibration probability for each of the multiple preset anomaly types.
  • A preset abnormal event is an abnormal event monitored while the software system is running.
  • The feature vector of a preset abnormal event is determined from feature information extracted from at least one of historical logs, historical alarm information, and version release records.
  • The feature vector of a preset abnormal event is determined in a similar way to that of the first abnormal event described above.
  • The first data of a preset abnormal event may further include a weight value for that event.
  • The weight value of a preset abnormal event is 1 by default. In practice, a more recent preset abnormal event is of greater reference value for root cause location. Therefore, the later a preset abnormal event occurred, the greater its weight.
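The text does not state the network architecture, the loss function, or how the weight grows with recency. The sketch below makes two assumptions. First, the model has one sigmoid output per preset anomaly type, trained against the calibration probabilities with a sample-weighted binary cross-entropy. Second, the recency weight is linear. Under those assumptions, the weighting and the loss could look like this (`recency_weights` and `weighted_bce` are hypothetical names):

```python
import math
from typing import List, Sequence


def recency_weights(timestamps: Sequence[float], alpha: float = 1.0) -> List[float]:
    """Assumed linear scheme: the oldest event weighs 1.0, the newest 1.0 + alpha."""
    t_min, t_max = min(timestamps), max(timestamps)
    span = t_max - t_min
    if span == 0:
        return [1.0] * len(timestamps)  # simultaneous events keep the default weight of 1
    return [1.0 + alpha * (t - t_min) / span for t in timestamps]


def weighted_bce(
    predicted: Sequence[Sequence[float]],
    calibrated: Sequence[Sequence[float]],
    weights: Sequence[float],
    eps: float = 1e-7,
) -> float:
    """Sample-weighted binary cross-entropy averaged over (event, anomaly type) pairs.

    predicted[i][k]  : model output for preset abnormal event i and anomaly type k
    calibrated[i][k] : calibration probability used as the soft label
    weights[i]       : weight value of preset abnormal event i
    """
    total, norm = 0.0, 0.0
    for p_row, y_row, w in zip(predicted, calibrated, weights):
        for p, y in zip(p_row, y_row):
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            total -= w * (y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
            norm += w
    return total / norm


weights = recency_weights([0.0, 5.0, 10.0])  # [1.0, 1.5, 2.0]
loss = weighted_bce([[0.9, 0.1], [0.6, 0.3], [0.2, 0.7]],
                    [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], weights)
print(round(loss, 4))
```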
  • Step 102: Determine the first score of each candidate root cause in each preset anomaly type. The score is based on two inputs: the predicted probability of each preset anomaly type for the first abnormal event, and the confidence of each of the at least two candidate root causes of that event.
  • The electronic device first determines, from the preset root cause set, the confidence of each of the at least two candidate root causes of the first abnormal event. It then determines the first score of each candidate root cause in each preset anomaly type from the predicted probability of each preset anomaly type and the determined confidences.
  • The electronic device may take the product of the predicted probability of a preset anomaly type and the confidence of a candidate root cause as the first score of that candidate root cause in that preset anomaly type. Confidence and predicted probability are both values between 0 and 1.
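Below is a minimal sketch of this scoring rule. It assumes that each candidate root cause is scored in the preset anomaly type associated with it, for example through the correspondence between preset root causes and anomaly types described below. The name `first_scores` and the sample labels are illustrative only.

```python
from typing import Dict, Mapping, Sequence, Tuple


def first_scores(
    type_probs: Mapping[str, float],
    candidates: Sequence[Tuple[str, str, float]],
) -> Dict[Tuple[str, str], float]:
    """Return {(root_cause, anomaly_type): predicted probability * confidence}.

    type_probs maps each preset anomaly type to the model's predicted probability;
    candidates holds (root_cause, anomaly_type, confidence) triples.
    """
    scores = {}
    for cause, atype, conf in candidates:
        prob = type_probs.get(atype, 0.0)  # a type the model did not score counts as 0
        if not (0.0 <= prob <= 1.0 and 0.0 <= conf <= 1.0):
            raise ValueError("probabilities and confidences must lie in [0, 1]")
        scores[(cause, atype)] = prob * conf
    return scores


scores = first_scores(
    {"internal_program_exception": 0.5, "middleware_exception": 0.25},
    [("error_log_burst", "internal_program_exception", 0.8),
     ("database_exception", "middleware_exception", 0.9)],
)
print(scores)
```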
  • The preset root cause set includes a first preset correspondence between preset abnormal events and preset root causes. It also includes a second preset correspondence between preset root causes and confidence.
  • Determining the confidence of each of the at least two candidate root causes of the first abnormal event includes:
  • determining at least two candidate root causes of the first abnormal event based on the first preset correspondence between preset abnormal events and preset root causes; and determining the confidence of each of these candidate root causes based on the second preset correspondence between preset root causes and confidence. The first and second preset correspondences are determined from at least one of historical logs, historical alarm information, and historical version release records.
  • The electronic device obtains both preset correspondences from the preset root cause set. It determines at least two candidate root causes of the first abnormal event from the first preset correspondence. It then determines the confidence of each of these candidate root causes from the second preset correspondence.
  • The confidence is a value greater than or equal to 0 and less than or equal to 1.
  • The first and second preset correspondences are determined from at least one of historical logs, historical alarm information, and version release records, and are stored in the electronic device.
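The two correspondences amount to lookup tables. The sketch below assumes the first abnormal event can be matched to a preset abnormal event through some key. The parameter `event_key`, the function name `candidate_confidences`, and the table contents are all hypothetical.

```python
from typing import Dict, Mapping, Sequence


def candidate_confidences(
    event_key: str,
    event_to_causes: Mapping[str, Sequence[str]],
    cause_confidence: Mapping[str, float],
) -> Dict[str, float]:
    """Candidate root causes of an event and their confidence.

    event_to_causes plays the role of the first preset correspondence
    (preset abnormal event -> preset root causes); cause_confidence plays the
    role of the second (preset root cause -> confidence in [0, 1]).
    """
    result = {}
    for cause in event_to_causes.get(event_key, ()):
        conf = cause_confidence[cause]  # KeyError if a root cause has no recorded confidence
        if not 0.0 <= conf <= 1.0:
            raise ValueError(f"confidence of {cause!r} must lie in [0, 1], got {conf}")
        result[cause] = conf
    return result


EVENT_TO_CAUSES = {"payment_success_rate_drop": ["database_exception", "version_change_C"]}
CAUSE_CONFIDENCE = {"database_exception": 0.8, "version_change_C": 0.6}
print(candidate_confidences("payment_success_rate_drop", EVENT_TO_CAUSES, CAUSE_CONFIDENCE))
# prints {'database_exception': 0.8, 'version_change_C': 0.6}
```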
  • From at least one of historical logs, historical alarm information, and version release records, the electronic device determines three things: the preset root causes, the anomaly type of each preset root cause, and the preset confidence of each preset root cause. It establishes the first preset correspondence from the preset root causes and their anomaly types, and the second preset correspondence from the preset root causes and their preset confidence. Specifically:
  • when the electronic device determines a first preset root cause from an error log in the historical logs, it sets the preset anomaly type of that root cause to internal program exception.
  • The confidence S(x) of the first preset root cause is calculated by a formula in which x is the number of times the error log appears while the system is in an abnormal state. S(x) is greater than 0.5 and less than or equal to 1, and it increases as x increases.
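The text gives only the properties of S(x), not its formula. For x ≥ 1, the logistic sigmoid has exactly these properties. The sketch below uses it as an assumption, not as the patent's formula; `error_log_confidence` is a hypothetical name.

```python
import math


def error_log_confidence(x: int) -> float:
    """Assumed S(x): logistic sigmoid of the error-log count x.

    For x >= 1 the value is above 0.5, never exceeds 1 and grows with x,
    which are the properties stated for S(x); the actual formula may differ.
    """
    if x < 1:
        raise ValueError("x counts error-log occurrences and must be at least 1")
    return 1.0 / (1.0 + math.exp(-x))


for count in (1, 3, 10):
    print(count, round(error_log_confidence(count), 4))
```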
  • When the electronic device determines a second preset root cause from an alarm event in the historical alarm information, it determines that root cause's preset anomaly type from the alarm category of the alarm event. It determines the confidence of the second preset root cause from four values: the preset alarm level of the alarm event, the number of alarms triggered by the alarm event, the total number of historical alarms, and the average number of alarms per day.
  • For example, suppose a middleware alarm event indicates a database exception. The database exception is then the second preset root cause, and its preset anomaly type is middleware exception.
  • Similarly, suppose a network alarm event indicates a network device exception. The network device exception is then the second preset root cause, and its preset anomaly type is application host exception.
  • The confidence is $C = \max(h, f)/g$, where:
  • h is the preset alarm level of the alarm event corresponding to the second preset root cause, expressed as a value between 0 and 1;
  • g is the average number of alarms per day, determined from the total number of historical alarms within a preset period (for example, one month).
  • The electronic device determines a third preset root cause from the abnormal subsystem and the version release records. The third preset root cause represents a version change, and its preset anomaly type is application version release.
  • For example, the payment scenario of a product passes through 5 subsystems.
  • From the failure logs or the most time-consuming subsystem, subsystem C is identified as the abnormal subsystem.
  • The first version release record within the preset duration before the output time is found in the version release records. The confidence of the third preset root cause is determined from this first version release record and the identifier of the abnormal subsystem.
  • Three third preset root causes are determined: a version change of subsystem C, a version change of subsystem D, and a version change of subsystem E.
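The time-window lookup in this example can be sketched as follows. The text does not say why subsystems D and E are considered together with the abnormal subsystem C, nor how the confidence is computed. The sketch therefore takes the set of subsystems to inspect as an input. For each of them, it returns only the latest release inside the preset duration before the output time. `ReleaseRecord`, `recent_releases`, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable


@dataclass(frozen=True)
class ReleaseRecord:
    subsystem: str
    released_at: datetime


def recent_releases(
    records: Iterable[ReleaseRecord],
    output_time: datetime,
    window: timedelta,
    subsystems: Iterable[str],
) -> Dict[str, ReleaseRecord]:
    """Latest release of each inspected subsystem within [output_time - window, output_time]."""
    wanted = set(subsystems)
    start = output_time - window
    latest: Dict[str, ReleaseRecord] = {}
    for rec in records:
        if rec.subsystem in wanted and start <= rec.released_at <= output_time:
            current = latest.get(rec.subsystem)
            if current is None or rec.released_at > current.released_at:
                latest[rec.subsystem] = rec
    return latest


alarm_time = datetime(2021, 5, 1, 12, 0)
history = [
    ReleaseRecord("B", datetime(2021, 5, 1, 9, 0)),   # not inspected
    ReleaseRecord("C", datetime(2021, 4, 20, 9, 0)),  # outside the window
    ReleaseRecord("C", datetime(2021, 5, 1, 10, 0)),
    ReleaseRecord("D", datetime(2021, 5, 1, 11, 30)),
    ReleaseRecord("E", datetime(2021, 4, 30, 18, 0)),
]
hits = recent_releases(history, alarm_time, timedelta(hours=24), ["C", "D", "E"])
print(sorted(hits))  # prints ['C', 'D', 'E']: version changes of C, D and E become candidates
```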
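  • Determining the first score of each candidate root cause in each preset anomaly type includes:
  • obtaining the first score of each candidate root cause in each preset anomaly type;
  • where w3 denotes the third preset weight, and w1, w2, and w3 sum to 2.
  • The method further includes:
  • adjusting at least one of the following, based on the first anomaly type and the first candidate root cause determined by a preset rule engine: the predicted probabilities of the preset anomaly types, the confidence of the candidate root causes, and the first scores of the determined candidate root causes in the preset anomaly types.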
  • The electronic device obtains the logs, alarm information, and version release records that correspond to the occurrence time of the first abnormal event. It uses a preset rule engine to analyze at least one of them, which yields the first anomaly type and the first candidate root cause of the first abnormal event.
  • The preset rule engine is a JSON-based rule engine.
  • The rule set of the rule engine uses the following JSON format:
  • Judgment condition: composed of multiple sub-conditions joined by & (and);
  • Sub-condition name: also called the name of the function to be parsed;
  • Name: the name of the action, also called the name of the function to be executed;
  • Action: the specific content of the action, passed as a parameter to the function named by name in then.
  • For example, the business success rate may drop because of insufficient user balance.
  • Suppose the preset rule engine finds that the log contains information indicating insufficient balance and an abnormal success rate. The electronic device then takes normal business failure as the first anomaly type of the first abnormal event, and insufficient balance as its first candidate root cause. In this case, the predicted probability of normal business failure is 1 and the confidence of insufficient balance is set to 1.
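The JSON rule format and the insufficient-balance example can be illustrated with a small interpreter. Two parts come from the text: the `&` (and) combination of named sub-conditions, and the `then` part with a function `name` and an `action` parameter. Everything else is an assumption made for the sketch, including the key `when`, the function names, and the event fields.

```python
import json
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]

# Hypothetical sub-condition functions, looked up by the sub-condition name used in a rule.
CONDITIONS: Dict[str, Callable[[Event], bool]] = {
    "log_has_insufficient_balance": lambda ev: "insufficient balance" in ev.get("log", ""),
    "success_rate_abnormal": lambda ev: bool(ev.get("success_rate_alarm", False)),
}


def mark_result(event: Event, action: Dict[str, str]) -> Dict[str, Any]:
    """Hypothetical action: report an anomaly type and root cause the rule treats as certain."""
    # event is unused here but is available to richer actions.
    return {
        "anomaly_type": action["anomaly_type"], "probability": 1.0,
        "root_cause": action["root_cause"], "confidence": 1.0,
    }


ACTIONS: Dict[str, Callable[[Event, Dict[str, str]], Dict[str, Any]]] = {"mark_result": mark_result}

# Assumed rule layout: "when" joins sub-condition names with "&"; "then" holds
# the name of the function to run and the action passed to it as a parameter.
RULES_JSON = """
[
  {
    "when": "log_has_insufficient_balance & success_rate_abnormal",
    "then": {
      "name": "mark_result",
      "action": {"anomaly_type": "normal_business_failure",
                 "root_cause": "insufficient_balance"}
    }
  }
]
"""


def run_rules(rules: List[Dict[str, Any]], event: Event) -> List[Dict[str, Any]]:
    """Run every rule whose sub-conditions all hold and collect the action results."""
    results = []
    for rule in rules:
        names = [part.strip() for part in rule["when"].split("&")]
        if all(CONDITIONS[name](event) for name in names):  # "&" means every sub-condition must hold
            then = rule["then"]
            results.append(ACTIONS[then["name"]](event, then["action"]))
    return results


event = {"log": "pay failed: insufficient balance", "success_rate_alarm": True}
print(run_rules(json.loads(RULES_JSON), event))
```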
  • When the electronic device obtains the logs, alarm information, and version release records for the occurrence time of the first abnormal event, it can also determine the event's feature information. It then uses the preset rule engine to analyze that feature information, which yields the first anomaly type and the first candidate root cause of the first abnormal event.
  • The feature information of the first abnormal event is determined in a similar way to that of a preset abnormal event described above, which is not repeated here.
  • For example, the preset rule engine determines that the first anomaly type of the first abnormal event is normal business failure. If the feature information of the event also indicates insufficient balance, insufficient balance becomes the first candidate root cause of the first abnormal event.
  • As another example, the preset rule engine determines that a database exception is the first candidate root cause of the first abnormal event.
  • Based on the first anomaly type and the first candidate root cause, the electronic device can adjust at least one of the following values determined above: the predicted probabilities of the preset anomaly types, the confidence of the candidate root causes, and the first scores of the candidate root causes. The adjustment can happen in one of two ways:
  • after the first scores are calculated: the electronic device adjusts at least one of the predicted probabilities and the confidences, then calculates new first scores; or
  • before the first scores are calculated: the electronic device adjusts at least one of the predicted probabilities and the confidences first, then calculates the first scores from the adjusted values.
  • The target root cause is then determined from the new first scores. A simpler method would directly replace the target root cause with the first candidate root cause; this approach determines the target root cause more accurately.
  • Adjusting the predicted probability of the preset anomaly type of the first abnormal event and the confidence of its candidate root cause covers the following cases:
  • the predicted probability of the first anomaly type is less than the first preset probability; or
  • the at least two candidate root causes of the first abnormal event do not include the first candidate root cause, or the confidence of the first candidate root cause is less than the first preset confidence.
  • The first preset probability indicates that the first abnormal event belongs to the first anomaly type. The first preset confidence indicates that the first candidate root cause currently exists for the first abnormal event.
  • If the predicted probability of the first anomaly type among the multiple preset anomaly types of the first abnormal event is less than the first preset probability, it is adjusted to the first preset probability.
  • The confidence of the first candidate root cause is adjusted to the first preset confidence in either of two cases: the at least two candidate root causes of the first abnormal event do not include it, or its confidence among them is less than the first preset confidence.
  • For example, suppose the preset rule engine determines that the predicted probability of anomaly category 1 is 1, but the anomaly detection model gives it a predicted probability less than 1. The predicted probability of anomaly category 1 is then adjusted to 1.
  • Here the first preset probability is 1.
  • Likewise, suppose the preset rule engine determines that the confidence of root cause 1 is 1. If the at least two candidate root causes of the first abnormal event do not include root cause 1, root cause 1 is added as a candidate root cause of the event. If its confidence is less than 1, it is adjusted to 1. Here the first preset confidence is 1.
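The two override rules can be sketched as follows. The sketch assumes that probabilities and confidences are kept in plain dictionaries keyed by anomaly type and root cause. `apply_rule_overrides` is a hypothetical name, and both preset values default to 1 as in the example.

```python
from typing import Dict, Mapping, Tuple


def apply_rule_overrides(
    type_probs: Mapping[str, float],
    confidences: Mapping[str, float],
    rule_type: str,
    rule_cause: str,
    first_preset_probability: float = 1.0,
    first_preset_confidence: float = 1.0,
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Raise the rule engine's anomaly type and root cause to the preset floors.

    Returns new dicts; the inputs are left unchanged.
    """
    probs = dict(type_probs)
    confs = dict(confidences)
    # Predicted probability of the first anomaly type below the first preset probability.
    if rule_type not in probs or probs[rule_type] < first_preset_probability:
        probs[rule_type] = first_preset_probability
    # First candidate root cause missing, or its confidence below the first preset confidence.
    if rule_cause not in confs or confs[rule_cause] < first_preset_confidence:
        confs[rule_cause] = first_preset_confidence
    return probs, confs


probs, confs = apply_rule_overrides(
    {"normal_business_failure": 0.4, "middleware_exception": 0.7},
    {"database_exception": 0.8},
    rule_type="normal_business_failure",
    rule_cause="insufficient_balance",
)
print(probs)  # prints {'normal_business_failure': 1.0, 'middleware_exception': 0.7}
print(confs)  # prints {'database_exception': 0.8, 'insufficient_balance': 1.0}
```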
  • Adjusting the first score of the determined candidate root cause in the preset anomaly type includes:
  • if the first score of the first candidate root cause in the first anomaly type is less than the preset score, raising that first score to at least the preset score. The preset score is used to screen for the target root cause.
  • The electronic device looks up the first score of the first candidate root cause in the first anomaly type. If that score is less than the preset score, it is raised to at least the preset score.
  • The preset score depends on how the target root cause is selected. If the candidate root cause with the highest score is taken as the target root cause, the preset score may be 1. If the candidate root causes whose first score exceeds a preset threshold are taken as target root causes, the preset score equals that threshold.
  • The first anomaly type and the first candidate root cause are determined by the preset rule engine, so they are known to be present. The electronic device therefore raises the first score of the first candidate root cause in the first anomaly type to at least the preset score. This ensures that the first candidate root cause is selected as one of the target root causes of the first abnormal event.
  • For example, suppose the preset rule engine determines that root cause 1 and anomaly category 1 are certainly present, but the calculated first score of root cause 1 in anomaly category 1 is not 1. That first score is then adjusted to 1.
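Below is a sketch of the score floor and the two selection modes described above. `floor_rule_score` and `select_targets` are hypothetical names. Keying scores by (root cause, anomaly type) pairs is also an assumption.

```python
from typing import Dict, Hashable, List, Mapping, Optional


def floor_rule_score(
    scores: Mapping[Hashable, float], rule_key: Hashable, preset_score: float
) -> Dict[Hashable, float]:
    """Raise an existing first score for the rule engine's (root cause, anomaly type) to preset_score."""
    adjusted = dict(scores)
    if rule_key in adjusted and adjusted[rule_key] < preset_score:
        adjusted[rule_key] = preset_score
    return adjusted


def select_targets(scores: Mapping[Hashable, float], threshold: Optional[float] = None) -> List[Hashable]:
    """Highest-scoring candidates when threshold is None, otherwise every candidate scoring >= threshold."""
    if not scores:
        return []
    if threshold is None:
        best = max(scores.values())
        return [key for key, value in scores.items() if value == best]
    return [key for key, value in scores.items() if value >= threshold]


rule_key = ("insufficient_balance", "normal_business_failure")
scores = {rule_key: 0.4, ("database_exception", "middleware_exception"): 0.56}

# Highest-score mode: preset score 1.
print(select_targets(floor_rule_score(scores, rule_key, 1.0)))
# prints [('insufficient_balance', 'normal_business_failure')]

# Threshold mode: preset score equals the threshold.
print(select_targets(floor_rule_score(scores, rule_key, 0.5), threshold=0.5))
# prints [('insufficient_balance', 'normal_business_failure'), ('database_exception', 'middleware_exception')]
```

The threshold mode uses `>=` so that a score raised exactly to the threshold is selected. With a strict "greater than" comparison, the preset score would have to be set slightly above the threshold.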
  • the electronic device can also analyze the candidate root cause corresponding to the first abnormal event through the set rule engine, and adjust the first score according to the analysis result, so that the Targeting the root cause is more accurate. For example, when it is determined by the set rule engine that the candidate root cause corresponding to the first abnormal event includes database abnormality, the first score corresponding to the database abnormality is increased.
  • adjusting the first score of the determined candidate root cause in the set anomaly type includes:
  • the electronic device calculates the second score of the first candidate root cause in the first anomaly type based on the first set probability and the first set confidence; if the determined first scores include a first score of the first candidate root cause in the first anomaly type, that first score is updated to the second score; if they do not, the second score is taken as the first score of the first candidate root cause in the first anomaly type.
  • this makes the target root cause determined from the adjusted first scores more accurate.
  • the method for calculating the second score is similar to the method for calculating the first score, and will not be repeated here.
  • the first anomaly type and the first candidate root cause determined by the set rule engine are used to adjust at least one of the predicted probabilities of the set anomaly types, the confidences of the candidate root causes of the first abnormal event, and the first scores of the candidate root causes; the target root cause is determined from the adjusted first scores rather than by directly adjusting the target root cause of the first abnormal event, which improves the accuracy of the determined target root cause.
  • Step 103 Determine the target root cause corresponding to the first abnormal event based on the first score corresponding to the determined candidate root cause in each set abnormal type.
  • the electronic device may determine the candidate root cause with the highest first score as the target root cause of the first abnormal event, or determine the candidate root causes whose first scores are greater than a set threshold as the target root causes of the first abnormal event.
  • the electronic device may also sort the first scores of the determined candidate root causes in each set anomaly type and determine the target root cause of the first abnormal event from the sorted first scores.
  • the anomaly detection model predicts the probability of each of the multiple set anomaly types corresponding to the first abnormal event; the first score of each candidate root cause in each set anomaly type is determined from these predicted probabilities and the confidence of each of the at least two candidate root causes of the first abnormal event; and the target root cause of the first abnormal event is determined from the first scores. The target root cause can thus be located accurately, improving the accuracy of the determined target root cause.
  • Fig. 2 is a schematic diagram of the root cause location method provided by an application embodiment of the present application. As shown in Fig. 2, the root cause location method includes:
  • Step 201 Input the feature vector corresponding to the first abnormal event into the anomaly detection model to obtain the predicted probability of each of the multiple set anomaly types corresponding to the first abnormal event.
  • Step 202 Based on the predicted probability corresponding to each set abnormal type corresponding to the first abnormal event, and based on the confidence corresponding to each of the at least two candidate root causes corresponding to the first abnormal event, A first score corresponding to each candidate root cause in each set abnormal type is determined.
  • Step 203 Using a set rule engine to analyze at least one of logs, alarm information, and version release records to obtain a first anomaly type and a first candidate root cause corresponding to the first anomaly event.
  • Step 204 Based on the determined first abnormality type and first candidate root cause, adjust at least one of the following:
  • the first score of the determined candidate root cause in the set anomaly type.
  • Step 205 Determine the target root cause corresponding to the first abnormal event based on the first score corresponding to the determined candidate root cause in each set abnormal type.
  • another embodiment of the present application also provides an electronic device, as shown in Figure 3, the electronic device includes:
  • the prediction unit 31 is configured to input the feature vector corresponding to the first abnormal event into the anomaly detection model to obtain the predicted probability of each of the multiple set anomaly types corresponding to the first abnormal event;
  • the first determining unit 32 is configured to determine, based on the predicted probability of each set anomaly type corresponding to the first abnormal event and on the confidence of each of the at least two candidate root causes corresponding to the first abnormal event, the first score of each candidate root cause in each set anomaly type;
  • the second determining unit 33 is configured to determine the target root cause corresponding to the first abnormal event based on the first score corresponding to the determined candidate root cause in each set abnormal type.
  • the first determining unit 32 is configured to:
  • determine the first score of each candidate root cause in each set anomaly type.
  • the electronic device also includes:
  • the third determination unit is configured to determine at least two candidate root causes corresponding to the first abnormal event based on the first set correspondence between the set abnormal event and the set root cause;
  • the fourth determination unit is configured to determine the confidence of each of the at least two candidate root causes corresponding to the first abnormal event based on the second set correspondence between set root causes and confidences; wherein the first set correspondence and the second set correspondence are determined based on at least one of historical logs, historical alarm information, and version release records.
  • the electronic device also includes:
  • the analysis unit is configured to use a set rule engine to analyze at least one of logs, alarm information, and version release records, to obtain a first abnormal type and a first candidate root cause corresponding to the first abnormal event;
  • An adjustment unit configured to adjust at least one of the following based on the determined first abnormality type and the first candidate root cause:
  • the first score of the determined candidate root cause in the set anomaly type.
  • the adjustment unit is configured to:
  • the predicted probability of the first anomaly type is less than the first set probability; the at least two candidate root causes corresponding to the first abnormal event do not include the first candidate root cause, or the confidence of the first candidate root cause is less than the first set confidence; the first set probability indicates that the first abnormal event belongs to the first anomaly type; the first set confidence indicates that the first candidate root cause currently exists.
  • the adjustment unit is configured to:
  • when the first score of the first candidate root cause in the first anomaly type is less than the set score, that first score is adjusted to be greater than or equal to the set score; the set score is used to screen for the target root cause.
  • the above units can be implemented by a processor in the electronic device, such as a central processing unit (CPU), a digital signal processor (DSP), a microcontroller unit (MCU), or a field-programmable gate array (FPGA).
  • CPU central processing unit
  • DSP Digital Signal Processor
  • MCU Microcontroller Unit
  • FPGA Field-Programmable Gate Array
  • when the electronic device provided in the above embodiment performs root cause location, the division into the above program modules is only an example; in practice, the above processing can be assigned to different program modules as needed, that is, the internal structure of the device is divided into different program modules to complete all or part of the processing described above.
  • the electronic device provided in the above embodiment and the root cause location method embodiments share the same concept; for the specific implementation, see the method embodiments, which are not repeated here.
  • FIG. 4 is a schematic diagram of the hardware composition structure of the electronic device of the embodiment of the present application. As shown in FIG. 4, the electronic device 4 includes:
  • Communication interface 41 capable of exchanging information with other devices such as network devices;
  • the processor 42 is connected to the communication interface 41 to exchange information with other devices, and is configured to execute, when running the computer program, the root cause location method provided by one or more of the technical solutions above; the computer program is stored in the memory 43.
  • bus system 44 is configured to enable connection communication between these components.
  • in addition to a data bus, bus system 44 also includes a power bus, a control bus, and a status signal bus.
  • the various buses are labeled as bus system 44 in FIG. 4 .
  • the memory 43 in the embodiment of the present application is used to store various types of data to support the operation of the electronic device 4 .
  • Examples of such data include: any computer program for operating on the electronic device 4 .
  • the memory 43 may be a volatile memory or a non-volatile memory, and may also include both volatile and non-volatile memories.
  • the non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), ferroelectric random access memory (FRAM), flash memory, magnetic surface memory, an optical disc, or compact disc read-only memory (CD-ROM); magnetic surface memory can be disk storage or tape storage.
  • the volatile memory may be random access memory (RAM, Random Access Memory), which is used as an external cache.
  • RAM random access memory
  • many forms of RAM are available, such as static random access memory (SRAM), synchronous static random access memory (SSRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDRSDRAM), enhanced synchronous dynamic random access memory (ESDRAM), SyncLink dynamic random access memory (SLDRAM), and direct Rambus random access memory (DRRAM).
  • SRAM Static Random Access Memory
  • SSRAM Synchronous Static Random Access Memory
  • DRAM Dynamic Random Access Memory
  • SDRAM Synchronous Dynamic Random Access Memory
  • the methods disclosed in the foregoing embodiments of the present application may be applied to the processor 42 or implemented by the processor 42 .
  • the processor 42 may be an integrated circuit chip with signal processing capability. In the implementation process, each step of the above method can be completed by an integrated logic circuit of hardware in the processor 42 or instructions in the form of software.
  • the aforementioned processor 42 may be a general-purpose processor, DSP, or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and the like.
  • the processor 42 may implement or execute various methods, steps, and logic block diagrams disclosed in the embodiments of the present application.
  • a general purpose processor may be a microprocessor or any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor.
  • the software module may be located in a storage medium, and the storage medium is located in the memory 43, and the processor 42 reads the program in the memory 43, and completes the steps of the foregoing method in combination with its hardware.
  • when the processor 42 executes the program, it implements the corresponding processes implemented by the terminal in each method of the embodiments of the present application; for brevity, details are not repeated here.
  • the embodiment of the present application also provides a storage medium, namely a computer storage medium, specifically a computer-readable storage medium, for example, the memory 43 storing a computer program, which can be executed by the processor 42 of the terminal to complete the steps of the foregoing method.
  • the computer-readable storage medium can be memories such as FRAM, ROM, PROM, EPROM, EEPROM, Flash Memory, magnetic surface memory, optical disk, or CD-ROM.
  • the disclosed device, terminal and method may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division.
  • the coupling, direct coupling, or communication connection between the components shown or discussed may be implemented through some interfaces; the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
  • the units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed to multiple network units; Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • the functional units in the embodiments of the present application may all be integrated into one processing unit, each unit may serve separately as one unit, or two or more units may be integrated into one unit; the integrated unit can be implemented in the form of hardware, or in the form of hardware plus software functional units.
  • if the above integrated units of the present application are implemented as software function modules and sold or used as independent products, they can also be stored in a computer-readable storage medium.
  • the technical solution of the embodiments of the present application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the embodiments of the present application.
  • the aforementioned storage medium includes: various media capable of storing program codes such as removable storage devices, ROM, RAM, magnetic disks or optical disks.
  • the term "and/or" in the embodiments of the present application describes only an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone.
  • the term "at least one" herein means any one of a plurality, or any combination of at least two of a plurality; for example, including at least one of A, B, and C may mean including any one or more elements selected from the set consisting of A, B, and C.

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Engineering & Computer Science (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Debugging And Monitoring (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

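A root cause location method, an electronic device, and a storage medium. The root cause location method includes: inputting a feature vector corresponding to a first abnormal event into an anomaly detection model to obtain a predicted probability for each of multiple set anomaly types corresponding to the first abnormal event (101); determining, based on the predicted probability of each set anomaly type corresponding to the first abnormal event and on the confidence of each of at least two candidate root causes corresponding to the first abnormal event, a first score of each candidate root cause in each set anomaly type (102); and determining, based on the first scores of the determined candidate root causes in each set anomaly type, a target root cause corresponding to the first abnormal event (103).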

Description

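Root cause location method, electronic device, and storage medium
Cross-Reference to Related Applications
This application is based on, and claims priority to, Chinese Patent Application No. 202110517260.6, filed on May 12, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to a root cause location method, an electronic device, and a storage medium.
Background
With the development of computer technology, more and more technologies (for example, big data) are being applied in the financial field, and the traditional financial industry is gradually shifting toward financial technology (fintech). However, because of the security and real-time requirements of the financial industry, fintech also places higher demands on technology. In the fintech field, when an abnormal event occurs in a system and there are multiple candidate root causes that may have caused it, the relevant personnel usually check the candidate root causes one by one based on their troubleshooting experience to identify the target root cause of the abnormal event; however, a target root cause identified from troubleshooting experience may be inaccurate.
Summary
To address this, embodiments of the present application provide a root cause location method, an electronic device, and a storage medium, to solve the problem in the related art that the root cause of an abnormal event cannot be located accurately.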
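An embodiment of the present application provides a root cause location method, including:
inputting a feature vector corresponding to a first abnormal event into an anomaly detection model to obtain a predicted probability for each of multiple set anomaly types corresponding to the first abnormal event;
determining, based on the predicted probability of each set anomaly type corresponding to the first abnormal event and on the confidence of each of at least two candidate root causes corresponding to the first abnormal event, a first score of each candidate root cause in each set anomaly type;
determining, based on the first scores of the determined candidate root causes in each set anomaly type, a target root cause corresponding to the first abnormal event.
In the above solution, determining the first score of each candidate root cause in each set anomaly type includes:
determining the first score of each candidate root cause in each set anomaly type based on a first set weight corresponding to the predicted probability, a second set weight corresponding to the confidence, the predicted probability of each set anomaly type, and the confidence of each candidate root cause.
In the above solution, the method further includes:
determining the at least two candidate root causes corresponding to the first abnormal event based on a first set correspondence between set abnormal events and set root causes;
determining the confidence of each of the at least two candidate root causes corresponding to the first abnormal event based on a second set correspondence between set root causes and confidences; wherein the first set correspondence and the second set correspondence are determined based on at least one of historical logs, historical alarm information, and version release records.
In the above solution, the method further includes:
analyzing at least one of logs, alarm information, and version release records by using a set rule engine to obtain a first anomaly type and a first candidate root cause corresponding to the first abnormal event;
adjusting, based on the determined first anomaly type and first candidate root cause, at least one of the following:
the predicted probability of the set anomaly type corresponding to the first abnormal event;
the confidence of the candidate root cause corresponding to the first abnormal event;
the first score of the determined candidate root cause in the set anomaly type.
In the above solution, adjusting the predicted probability of the set anomaly type corresponding to the first abnormal event and the confidence of the candidate root cause corresponding to the first abnormal event includes:
adjusting the predicted probability of the first anomaly type to a first set probability, and adjusting the confidence of the first candidate root cause to a first set confidence; wherein
the predicted probability of the first anomaly type is less than the first set probability; the at least two candidate root causes corresponding to the first abnormal event do not include the first candidate root cause, or the confidence of the first candidate root cause is less than the first set confidence; the first set probability indicates that the first abnormal event belongs to the first anomaly type; and the first set confidence indicates that the first candidate root cause currently exists.
In the above solution, adjusting the first score of the determined candidate root cause in the set anomaly type includes:
calculating, based on the first set probability and the first set confidence, a second score of the first candidate root cause in the first anomaly type, and updating the first scores of the determined candidate root causes in the set anomaly types based on the second score.
In the above solution, adjusting the first score of the determined candidate root cause in the set anomaly type includes:
when the first score of the first candidate root cause in the first anomaly type is less than a set score, adjusting that first score to be greater than or equal to the set score; the set score is used to screen for the target root cause.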
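An embodiment of the present application further provides an electronic device, including:
a prediction unit, configured to input a feature vector corresponding to a first abnormal event into an anomaly detection model to obtain a predicted probability for each of multiple set anomaly types corresponding to the first abnormal event;
a first determining unit, configured to determine, based on the predicted probability of each set anomaly type corresponding to the first abnormal event and on the confidence of each of at least two candidate root causes corresponding to the first abnormal event, a first score of each candidate root cause in each set anomaly type;
a second determining unit, configured to determine, based on the first scores of the determined candidate root causes in each set anomaly type, a target root cause corresponding to the first abnormal event.
An embodiment of the present application further provides an electronic device, including a processor and a memory configured to store a computer program executable on the processor,
wherein the processor is configured to perform the steps of any of the above root cause location methods when running the computer program.
An embodiment of the present application further provides a storage medium storing a computer program which, when executed by a processor, implements the steps of any of the above root cause location methods.
In the embodiments of the present application, a trained anomaly detection model predicts the probability that the first abnormal event belongs to each set anomaly type. A first score of each candidate root cause in each set anomaly type is then calculated from these predicted probabilities and from the confidence of each of the at least two candidate root causes of the first abnormal event. Finally, the target root cause of the first abnormal event is determined from the first scores. In this way, the target root cause that caused the first abnormal event can be determined and located accurately.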
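Brief Description of the Drawings
Fig. 1 is a schematic flowchart of an implementation of the root cause location method provided by an embodiment of the present application;
Fig. 2 is a schematic diagram of the root cause location method provided by an application embodiment of the present application;
Fig. 3 is a schematic structural diagram of the electronic device provided by an embodiment of the present application;
Fig. 4 is a schematic diagram of the hardware structure of the electronic device provided by an embodiment of the present application.
Detailed Description
The technical solutions of the present application are further elaborated below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a schematic flowchart of an implementation of the root cause location method provided by an embodiment of the present application; the flow is executed by an electronic device such as a terminal or a server. As shown in Fig. 1, the root cause location method includes: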
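Step 101: Input the feature vector corresponding to the first abnormal event into the anomaly detection model to obtain the predicted probability of each of the multiple set anomaly types corresponding to the first abnormal event.
Here, after obtaining data related to the first abnormal event, the electronic device extracts the feature information of the first abnormal event from that data and determines the corresponding feature vector. It then inputs this feature vector into the trained anomaly detection model, which outputs the predicted probability of each of the multiple set anomaly types corresponding to the first abnormal event. The data related to the first abnormal event include the logs and alarm information output when the first abnormal event occurred, and the version release records within a set period before the first abnormal event occurred.
In practice, the feature information of the first abnormal event includes at least one of the following: feature information of abnormal metrics, of interruption events, of change operations, of alarm events, and of abnormal systems. Feature information on external factors of the first abnormal event, such as change operations, can be determined from the version release records. Feature information on internal factors, such as abnormal metrics, interruption events, and abnormal systems, can be determined from at least one of the logs and the alarm information. This enriches the feature information of the first abnormal event and can improve the accuracy of the predicted probabilities output by the anomaly detection model.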
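When the feature information of the first abnormal event includes at least two kinds, the corresponding feature vector is determined as follows:
once the feature vector of each of the at least two kinds of feature information of the first abnormal event has been determined, these feature vectors are fused to obtain the feature vector of the first abnormal event. This reduces the dimensionality of the feature vector and improves the data processing efficiency of the anomaly detection model. In practice, fusing feature vectors means merging them.
In some embodiments, fusing the determined feature vectors includes:
converting the feature information of each abnormal metric of the first abnormal event into a corresponding first vector;
summing all first vectors of the first abnormal event to obtain a second vector;
horizontally concatenating the second vector and a third vector to obtain the feature vector of the first abnormal event, where the third vector represents the feature information other than that of the abnormal metrics.
Here, an abnormal metric is a set metric that has triggered an alarm; which set metrics triggered alarms is determined from the logs or alarm information. In practice, the set metrics include at least one of: business transaction volume, business success rate, and latency.
When there are at least two abnormal metrics, the electronic device converts the feature information of each abnormal metric of the first abnormal event into a corresponding first vector according to a set hierarchy, and sums all first vectors to obtain the second vector, which is the feature vector for all abnormal metrics. Note that when the first abnormal event has only one abnormal metric, the first vector equals the second vector.
The electronic device determines the third vector of the first abnormal event based on at least one of the feature information of interruption events, change operations, alarm events, and abnormal systems, and horizontally concatenates the second vector and the third vector of the first abnormal event to obtain its feature vector.
In practice, the feature information of an abnormal metric extracted by the electronic device includes the product identifier, scenario identifier, metric type identifier, and anomaly type of that metric. The set hierarchy can be [product][scenario][set metric type][anomaly type]. A scenario, also called a function, is for example a transfer, repayment, deposit, or loan; the set metric types include business transaction volume, business success rate, and latency; and the anomaly types include sudden increase and sudden decrease.
In practice, the electronic device determines the length of the first vector from a first number (the number of product types of the first abnormal event), a second number (the number of scenarios per product type), a third number (the number of set metric types per scenario), and a fourth number (the number of anomaly types): length of the first vector = first number + first number × second number + third number × fourth number.
For example, suppose the first abnormal event comes from product A and product B, product A includes scenarios a and aa, and product B includes scenarios b and bb. Scenarios a, aa, b, and bb each include 4 set metric types: current success rate, system success rate, transaction volume, and latency. The first vector of each abnormal metric then has 2 + 2×2 + 4×2 = 14 bits.
For instance, when the feature information of the first abnormal event indicates that the system success rate of scenario a of product A triggered an alarm, that metric's feature information is one-hot encoded according to the set hierarchy. This gives the first vector [1,0,1,0,0,0,1,0,0,0,0,0,0,0] or [1,0,1,0,0,0,0,1,0,0,0,0,0,0]. The first two bits represent product A and bits 3 to 6 represent scenario a. In the last 8 bits, "1,0,0,0,0,0,0,0" represents a sudden increase in the system success rate and "0,1,0,0,0,0,0,0" represents a sudden decrease.
Note that once all first vectors of the first abnormal event have been determined, they are summed bitwise to obtain the second vector.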
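The [product][scenario][metric type][anomaly type] one-hot layout and the bitwise sum can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation. The function names and 0-based index arguments are hypothetical. The sketch assumes every product has the same number of scenarios, as the bit-count formula does. It also assumes the last segment is ordered metric by metric, with sudden increase before sudden decrease. Placing system success rate at metric index 0 reproduces the 14-bit example vectors.

```python
from typing import List, Sequence


def first_vector_length(n_products: int, n_scenarios: int,
                        n_metric_types: int, n_anomaly_types: int) -> int:
    """Length = N1 + N1*N2 + N3*N4, following the bit-count formula."""
    return n_products + n_products * n_scenarios + n_metric_types * n_anomaly_types


def encode_abnormal_metric(product: int, scenario: int, metric_type: int,
                           anomaly_type: int, n_products: int, n_scenarios: int,
                           n_metric_types: int, n_anomaly_types: int) -> List[int]:
    """One-hot encode one abnormal metric (hypothetical API, 0-based indices).

    `scenario` is the index of the scenario within its own product.
    """
    vec = [0] * first_vector_length(n_products, n_scenarios,
                                    n_metric_types, n_anomaly_types)
    # Segment 1: which product.
    vec[product] = 1
    # Segment 2: which scenario, laid out product by product.
    vec[n_products + product * n_scenarios + scenario] = 1
    # Segment 3: which (metric type, anomaly type) pair (assumed ordering).
    offset = n_products + n_products * n_scenarios
    vec[offset + metric_type * n_anomaly_types + anomaly_type] = 1
    return vec


def second_vector(first_vectors: Sequence[Sequence[int]]) -> List[int]:
    """Position-wise (bitwise) sum of all first vectors of one abnormal event."""
    return [sum(bits) for bits in zip(*first_vectors)]


if __name__ == "__main__":
    # Example shape from the text: 2 products x 2 scenarios, 4 metric types x 2 anomaly types.
    shape = dict(n_products=2, n_scenarios=2, n_metric_types=4, n_anomaly_types=2)
    surge = encode_abnormal_metric(0, 0, 0, 0, **shape)  # product A, scenario a, increase
    drop = encode_abnormal_metric(0, 0, 0, 1, **shape)   # same metric, sudden decrease
    print(surge)
    print(drop)
    print(second_vector([surge, encode_abnormal_metric(0, 1, 3, 0, **shape)]))
```

Summing rather than concatenating keeps the second vector at a fixed length no matter how many metrics alarm. The trade-off is that a count above 1 records only that several alarming metrics share a position, not which metric pairs with which scenario.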
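The feature information of an interruption event indicates whether a message has been lost; its third-vector component is [0] or [1]. A message loss indicates that an interruption event has occurred, and that the internal function calls have not gone wrong.
The feature information of an abnormal system indicates whether there is a subsystem with the highest time consumption, or whether there is a deepest-called subsystem that has failure logs; its third-vector component is [0] or [1]. Note that locating the abnormal subsystem is crucial for the final root cause determination.
The feature information of a change operation indicates whether a change operation record targets the identified abnormal subsystem; its third-vector component is [0] or [1]. When this component indicates that a change operation record targets the identified abnormal subsystem, the abnormal subsystem may be the true root cause of the abnormal event.
Alarm events include middleware alarm events and network alarm events, and the third-vector component for alarm event feature information is [0] or [1]. The feature information of middleware alarm events indicates whether there is a middleware alarm event of a set level related to the abnormal subsystem. The feature information of network alarm events indicates whether there is a network alarm event of a set level related to the abnormal subsystem.
Note that a middleware alarm event of the set level related to the abnormal subsystem can cause increased latency or a decreased success rate, whereas a network alarm event of the set level related to the abnormal subsystem can cause anomalies in multiple set metrics.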
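The third vector and the horizontal concatenation can be sketched as below, under stated assumptions. The text gives each kind of feature information a [0] or [1] component but does not fix their order. It also describes a single alarm-event component while listing middleware and network alarms separately. This sketch assumes the field order shown and one bit per alarm kind, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class EventFlags:
    """Binary facts behind the third vector (hypothetical field names)."""
    message_lost: bool              # interruption event: was a message lost?
    abnormal_subsystem_found: bool  # abnormal system: slowest or deepest failing subsystem found
    change_targets_subsystem: bool  # change operation: does a change record target it?
    middleware_alarm: bool          # set-level middleware alarm related to that subsystem
    network_alarm: bool             # set-level network alarm related to that subsystem


def third_vector(flags: EventFlags) -> List[int]:
    """Encode each flag as 0 or 1, in the assumed field order above."""
    return [int(flags.message_lost), int(flags.abnormal_subsystem_found),
            int(flags.change_targets_subsystem), int(flags.middleware_alarm),
            int(flags.network_alarm)]


def event_feature_vector(second: Sequence[int], flags: EventFlags) -> List[int]:
    """Horizontally concatenate the second vector and the third vector."""
    return list(second) + third_vector(flags)


if __name__ == "__main__":
    flags = EventFlags(message_lost=False, abnormal_subsystem_found=True,
                       change_targets_subsystem=True, middleware_alarm=False,
                       network_alarm=False)
    print(event_feature_vector([1, 0, 1, 0], flags))
```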
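Note that the anomaly detection model is built from a deep neural network (DNN) and is trained on first data corresponding to at least two set abnormal events. The first data of a set abnormal event include its feature vector and a labeled probability for each of the multiple set anomaly types of that event. A set abnormal event is an abnormal event monitored while a software system is running. Its feature vector is determined from feature information extracted from at least one of historical logs, historical alarm information, and version release records. This is done in the same way as for the first abnormal event, described above.
When at least two set abnormal events correspond to the same product, or have the same feature information, the first data may also include a weight for each set abnormal event; correspondingly, the weight of the first abnormal event defaults to 1. In practice, more recent set abnormal events are more informative for root cause location, so the later a set abnormal event occurred, the larger its weight.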
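Step 102: Based on the predicted probability of each set anomaly type corresponding to the first abnormal event, and on the confidence of each of the at least two candidate root causes corresponding to the first abnormal event, determine the first score of each candidate root cause in each set anomaly type.
The electronic device determines, from a set root-cause collection, the confidence of each of the at least two candidate root causes of the first abnormal event. It then determines the first score of each candidate root cause in each set anomaly type from the predicted probability of each set anomaly type and these confidences. In practice, the electronic device may take the product of a set anomaly type's predicted probability and a candidate root cause's confidence as that candidate's first score in that set anomaly type. Both confidence and predicted probability are values between 0 and 1.
The set root-cause collection includes the first set correspondence between set abnormal events and set root causes, and the second set correspondence between set root causes and confidences.
To determine the confidences of candidate root causes quickly and accurately, in practice, the method for determining the confidence of each of the at least two candidate root causes of the first abnormal event includes:
determining the at least two candidate root causes of the first abnormal event based on the first set correspondence between set abnormal events and set root causes;
determining the confidence of each of the at least two candidate root causes of the first abnormal event based on the second set correspondence between set root causes and confidences; wherein the first set correspondence and the second set correspondence are determined based on at least one of historical logs, historical alarm information, and historical version release records.
Here, the electronic device obtains from the set root-cause collection both the first set correspondence (set abnormal events to set root causes) and the second set correspondence (set root causes to confidences). It determines the at least two candidate root causes of the first abnormal event from the first set correspondence, and the confidence of each of them from the second. A confidence is a value greater than or equal to 0 and less than or equal to 1.
The first set correspondence and the second set correspondence are determined based on at least one of historical logs, historical alarm information, and version release records, and are stored in the electronic device.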
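The two lookups can be sketched with plain dictionaries. This is a minimal sketch under stated assumptions. The event key, root-cause names, and confidence values are hypothetical placeholders, not data from the application. Giving a candidate missing from the second correspondence a confidence of 0.0 is also an assumption.

```python
from typing import Dict, List, Mapping, Sequence, Tuple

# Hypothetical stand-ins for the two stored correspondences; in the text both are
# built offline from historical logs, alarm information, and version release records.
FIRST_SET_CORRESPONDENCE: Dict[str, List[str]] = {
    # set abnormal event -> set root causes
    "payment_success_rate_drop": ["database_anomaly", "network_device_anomaly",
                                  "subsystem_C_version_change"],
}
SECOND_SET_CORRESPONDENCE: Dict[str, float] = {
    # set root cause -> confidence in [0, 1]
    "database_anomaly": 0.8,
    "network_device_anomaly": 0.4,
    "subsystem_C_version_change": 1.0,
}


def candidate_root_causes(
    event: str,
    first: Mapping[str, Sequence[str]] = FIRST_SET_CORRESPONDENCE,
    second: Mapping[str, float] = SECOND_SET_CORRESPONDENCE,
) -> List[Tuple[str, float]]:
    """Look up an event's candidate root causes and attach each one's confidence.

    The mappings are only read, never mutated. A candidate missing from
    `second` gets confidence 0.0 (an assumption).
    """
    return [(cause, second.get(cause, 0.0)) for cause in first.get(event, [])]


if __name__ == "__main__":
    for cause, confidence in candidate_root_causes("payment_success_rate_drop"):
        print(cause, confidence)
```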
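In practice, the electronic device determines the following from at least one of historical logs, historical alarm information, and version release records: set root causes, the anomaly type of each set root cause, and the set confidence of each set root cause. It establishes the first set correspondence from the set root causes and their anomaly types, and the second set correspondence from the set root causes and their set confidences. Specifically:
When the electronic device determines a first set root cause from error logs in the historical logs, it sets the anomaly type of the first set root cause to internal program exception, and calculates the confidence of the first set root cause with the following formula:
[Formula image: PCTCN2021127331-appb-000001]
where x is the number of times the error log appears while the system is in an abnormal state; S(x) is greater than 0.5 and less than or equal to 1, and the larger x is, the larger the confidence S(x).
When the electronic device determines a second set root cause from alarm events in the historical alarm information, it derives the anomaly type of the second set root cause from the alarm category of the alarm event. It determines the confidence of the second set root cause from four inputs: the set alarm level of the alarm event, the number of alarms raised by the alarm event, the total number of historical alarms, and the average number of alarms per day.
For example:
- when the historical alarm information contains a middleware alarm event indicating a database anomaly, the database anomaly is determined as a second set root cause, with anomaly type middleware anomaly;
- when it contains a network alarm event indicating a network device anomaly, the network device anomaly is determined as a second set root cause, with anomaly type network anomaly;
- when it indicates a host CPU alarm, a host CPU anomaly is determined as a second set root cause, with anomaly type application host anomaly.
In practice, the confidence C = MAX(h, f)/g, where:
- h is the set alarm level of the alarm event corresponding to the second set root cause, expressed as a value between 0 and 1;
- f is the number of alarms raised by that alarm event divided by the total number of historical alarms;
- g is the average number of alarms per day, determined from the total number of historical alarms within a set period (for example, one month).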
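The alarm-based confidence can be written out as a small function. This is a sketch of C = MAX(h, f)/g only. The 30-day default for "one month" and the function names are assumptions. The text does not say whether C is clipped to [0, 1]; with g below 1, C can exceed 1, and this sketch does not clip.

```python
def average_alarms_per_day(total_alarms_in_period: int, days: int = 30) -> float:
    """g: average alarms per day over a set period (30 days is an assumed 'one month')."""
    if days <= 0:
        raise ValueError("days must be positive")
    return total_alarms_in_period / days


def alarm_confidence(h: float, alarms_from_event: int, total_alarms: int,
                     g: float) -> float:
    """C = MAX(h, f) / g, with f = alarms_from_event / total_alarms.

    h: set alarm level in [0, 1]; g: average alarms per day. No clipping is applied.
    """
    if total_alarms <= 0 or g <= 0:
        raise ValueError("total_alarms and g must be positive")
    f = alarms_from_event / total_alarms
    return max(h, f) / g


if __name__ == "__main__":
    g = average_alarms_per_day(60)            # 60 alarms over 30 days -> 2.0 per day
    print(alarm_confidence(0.6, 30, 60, g))   # max(0.6, 0.5) / 2.0
```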
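The electronic device determines a third set root cause based on the abnormal subsystem and the version release records. The third set root cause represents a version change, and its anomaly type is application version release. The confidence of the third set root cause is 1/d, where d represents the distance between the version release record and the abnormal subsystem along the business call chain. The following example shows how this confidence is determined:
Suppose the payment scenario of a product passes through 5 subsystems, and the historical logs show that the failure logs (or the high-latency subsystem) are concentrated on subsystem C, so C is the abnormal subsystem. Using the output time of the failure logs, the electronic device selects from the version release records the first version release records within a set period before that time. It then determines the confidence of each third set root cause from these records and the identifier of the abnormal subsystem. When the first version release records show that subsystems C, D, and E all underwent version changes, three third set root causes are determined: subsystem C version change, subsystem D version change, and subsystem E version change.
With the business call chain subsystem A → subsystem B → subsystem C → subsystem D → subsystem E, the confidence of the subsystem C version change is 1/1, that of the subsystem D version change is 1/2, and that of the subsystem E version change is 1/3.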
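The 1/d rule and its worked example can be reproduced as follows. This is a sketch under two assumptions. First, d is taken as the hop count along the call chain plus one, which is what makes the abnormal subsystem itself score 1/1 in the example. Second, using the absolute hop count for upstream subsystems is a guess, since the example only covers downstream ones. Changed subsystems that are not on the chain are skipped.

```python
from typing import Dict, Sequence


def version_change_confidences(call_chain: Sequence[str], abnormal: str,
                               changed: Sequence[str]) -> Dict[str, float]:
    """Confidence 1/d for each subsystem that had a version change.

    d = (hops between the changed and the abnormal subsystem) + 1 (assumption).
    """
    position = {name: i for i, name in enumerate(call_chain)}
    a = position[abnormal]
    return {s: 1.0 / (abs(position[s] - a) + 1) for s in changed if s in position}


if __name__ == "__main__":
    chain = ["A", "B", "C", "D", "E"]
    print(version_change_confidences(chain, "C", ["C", "D", "E"]))
```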
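A candidate is the true root cause of an abnormal event only when both the predicted probability of the set anomaly type and the candidate root cause's confidence in that anomaly type are relatively high. Therefore, to determine the target root cause accurately, in some embodiments, determining the first score of each candidate root cause in each set anomaly type includes:
determining the first score of each candidate root cause in each set anomaly type based on the first set weight corresponding to the predicted probability, the second set weight corresponding to the confidence, the predicted probability of each set anomaly type, and the confidence of each candidate root cause.
Here, the electronic device calculates the first score with the formula Match score = (w1×G + G×K + w2×K)/2, where G is the predicted probability of the set anomaly type, K is the confidence of the candidate root cause, w1 is the first set weight, and w2 is the second set weight, with w1 + w2 = 1. In practice, w1 and w2 are both 0.5.
Note that in some embodiments, the electronic device may also calculate the first score as Match score = (w1×G + w3×G×K + w2×K)/2, where w3 is a third set weight and w1 + w2 + w3 = 2.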
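Both Match score variants can be expressed as one function, since the first is the second with w3 = 1. Below is a short sketch; the function name and the weight check are illustrative choices. Because the weights sum to 2 and G, K lie in [0, 1], the score stays in [0, 1] (for nonnegative weights) and reaches 1 when G = K = 1. Unlike the pure product G×K, it still gives partial credit when only one factor is high: with the default weights, G = 1 and K = 0 score 0.25.

```python
import math


def match_score(g: float, k: float, w1: float = 0.5, w2: float = 0.5,
                w3: float = 1.0) -> float:
    """Match score = (w1*G + w3*G*K + w2*K) / 2.

    With the default w3 = 1 this is the first formula (w1 + w2 = 1); the general
    form requires w1 + w2 + w3 = 2.
    """
    if not math.isclose(w1 + w2 + w3, 2.0):
        raise ValueError("weights must satisfy w1 + w2 + w3 = 2")
    return (w1 * g + w3 * g * k + w2 * k) / 2


if __name__ == "__main__":
    # G = predicted probability of a set anomaly type, K = candidate confidence.
    print(match_score(0.8, 0.5))
    print(match_score(0.8, 0.5, w1=0.4, w2=0.4, w3=1.2))
```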
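To determine the target root cause more accurately, in some embodiments, the method further includes:
analyzing at least one of logs, alarm information, and version release records by using a set rule engine to obtain a first anomaly type and a first candidate root cause corresponding to the first abnormal event;
adjusting, based on the determined first anomaly type and first candidate root cause, at least one of the following:
the predicted probability of the set anomaly type corresponding to the first abnormal event;
the confidence of the candidate root cause corresponding to the first abnormal event;
the first score of the determined candidate root cause in the set anomaly type.
Here, the electronic device obtains the logs, alarm information, and version release records corresponding to the occurrence time of the first abnormal event. It analyzes at least one of them with the set rule engine to obtain the first anomaly type and the first candidate root cause of the first abnormal event.
In practice, the set rule engine is a JSON-based rule engine, with rules in the following JSON format:
Rule: the rule name;
When: the condition, composed of multiple sub-conditions combined with & (AND);
Name: the sub-condition name, also called the name of the parsing function;
Filter: the comparison action, such as greater than, equal to, less than, contains, does not contain, or occurrence time period;
Values: the values set for the given Filter, used in the comparison;
Then: the action to execute;
Name: the action name, also called the name of the function to execute;
Action: the specific content of the action, passed as a parameter to the function named by Then's Name.
For example, when a bank proactively reminds users to repay and performs batch deductions, the business success rate may drop because users have insufficient balances. Suppose the set rule engine finds that the logs contain information indicating insufficient balance and an abnormal success rate. The electronic device then determines normal business failure as the first anomaly type of the first abnormal event and insufficient balance as its first candidate root cause. In this case, the predicted probability of normal business failure is set to 1 and the confidence of insufficient balance is set to 1.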
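The rule format and the batch-deduction example can be sketched as a tiny evaluator. This is an illustration only, not the applicant's engine. Two representation choices are assumptions: When and Then are lists of objects, and the rule fires only if all sub-conditions hold. The parsing function names, action names, log marker string, and context fields are all hypothetical. The occurrence-time-period filter is omitted.

```python
import json
from typing import Any, Callable, Dict, List

# A hypothetical rule for the batch-deduction example, using the keys described above.
RULE_JSON = """
{
  "Rule": "insufficient_balance_on_batch_deduction",
  "When": [
    {"Name": "log_text", "Filter": "contains", "Values": ["INSUFFICIENT_BALANCE"]},
    {"Name": "success_rate_abnormal", "Filter": "equal", "Values": [true]}
  ],
  "Then": [
    {"Name": "set_first_anomaly_type", "Action": "normal_business_failure"},
    {"Name": "set_first_candidate_root_cause", "Action": "insufficient_balance"}
  ]
}
"""

# Parsing functions named by When.Name: each extracts one value from the event context.
PARSERS: Dict[str, Callable[[Dict[str, Any]], Any]] = {
    "log_text": lambda ctx: "\n".join(ctx["logs"]),
    "success_rate_abnormal": lambda ctx: ctx["success_rate_abnormal"],
}

# Comparison actions named by Filter (time-period filter omitted).
FILTERS: Dict[str, Callable[[Any, List[Any]], bool]] = {
    "greater": lambda value, values: value > values[0],
    "equal": lambda value, values: value == values[0],
    "less": lambda value, values: value < values[0],
    "contains": lambda value, values: values[0] in value,
    "not_contains": lambda value, values: values[0] not in value,
}


def set_first_anomaly_type(out: Dict[str, str], arg: str) -> None:
    out["first_anomaly_type"] = arg


def set_first_candidate_root_cause(out: Dict[str, str], arg: str) -> None:
    out["first_candidate_root_cause"] = arg


# Action functions named by Then.Name; the Action value is passed as their argument.
ACTIONS: Dict[str, Callable[[Dict[str, str], str], None]] = {
    "set_first_anomaly_type": set_first_anomaly_type,
    "set_first_candidate_root_cause": set_first_candidate_root_cause,
}


def apply_rule(rule: Dict[str, Any], ctx: Dict[str, Any]) -> Dict[str, str]:
    """Run every Then action if all When sub-conditions hold (AND semantics)."""
    out: Dict[str, str] = {}
    for cond in rule["When"]:
        value = PARSERS[cond["Name"]](ctx)
        if not FILTERS[cond["Filter"]](value, cond["Values"]):
            return out  # a sub-condition failed, so the rule does not fire
    for action in rule["Then"]:
        ACTIONS[action["Name"]](out, action["Action"])
    return out


if __name__ == "__main__":
    rule = json.loads(RULE_JSON)
    ctx = {"logs": ["batch deduction failed: INSUFFICIENT_BALANCE"],
           "success_rate_abnormal": True}
    print(apply_rule(rule, ctx))
```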
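In some embodiments, after obtaining the logs, alarm information, and version release records corresponding to the occurrence time of the first abnormal event, the electronic device may also determine the feature information of the first abnormal event. It then analyzes that feature information with the set rule engine to obtain the first anomaly type and the first candidate root cause. The feature information is determined in the same way as described above for set abnormal events, which is not repeated here.
In practice, when the feature information of the first abnormal event indicates a sudden increase in transaction volume and repeated transaction failures for the same user, the set rule engine determines that the first anomaly type is normal business failure; when the feature information also indicates insufficient balance, insufficient balance is determined as the first candidate root cause.
When the feature information of the first abnormal event indicates a database anomaly, the set rule engine determines the database anomaly as the first candidate root cause of the first abnormal event.
Because the first anomaly type and first candidate root cause determined by the set rule engine actually exist, the electronic device can use them to adjust at least one of the following: the predicted probabilities of the set anomaly types, the confidences of the candidate root causes, and the first scores of the candidate root causes. The adjustment of predicted probabilities and confidences can happen at either of two points:
- after the first scores are computed, followed by computing new first scores;
- before the first scores are computed, which are then computed from the adjusted predicted probabilities and/or confidences.
Once new first scores are determined, the target root cause is determined from them. Compared with a method that adjusts the target root cause directly using the first candidate root cause, this makes the determined target root cause more accurate.
To determine the target root cause more accurately, in some embodiments, adjusting the predicted probability of the set anomaly type corresponding to the first abnormal event and the confidence of the candidate root cause corresponding to the first abnormal event includes:
adjusting the predicted probability of the first anomaly type to the first set probability, and adjusting the confidence of the first candidate root cause to the first set confidence; wherein
the predicted probability of the first anomaly type is less than the first set probability; the at least two candidate root causes corresponding to the first abnormal event do not include the first candidate root cause, or the confidence of the first candidate root cause is less than the first set confidence; the first set probability indicates that the first abnormal event belongs to the first anomaly type; and the first set confidence indicates that the first candidate root cause currently exists.
Here, when the predicted probability of the first anomaly type among the multiple set anomaly types of the first abnormal event is less than the first set probability, it is adjusted to the first set probability.
When the at least two candidate root causes of the first abnormal event do not include the first candidate root cause, or when the confidence of the first candidate root cause among them is less than the first set confidence, the confidence of the first candidate root cause is adjusted to the first set confidence.
For example, when the set rule engine determines that anomaly category 1 necessarily exists at present, the predicted probability of anomaly category 1 is 1. If the anomaly detection model gives a predicted probability of less than 1 for anomaly category 1, it is adjusted to 1; in this case, the first set probability is 1. Likewise, when the set rule engine determines that root cause 1 necessarily exists at present, the confidence of root cause 1 is 1. If the at least two candidate root causes of the first abnormal event do not include root cause 1, root cause 1 is added as a candidate root cause of the first abnormal event. If the confidence of root cause 1 among those candidates is less than 1, it is adjusted to 1; in this case, the first set confidence is 1.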
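The probability and confidence adjustment can be sketched as a pure function that returns adjusted copies. This is a minimal sketch; the function name and the dictionary representation (anomaly type to probability, root cause to confidence) are assumptions. Defaults of 1.0 match the example, where both the first set probability and the first set confidence are 1.

```python
from typing import Dict, Tuple


def apply_rule_findings(
    pred_probs: Dict[str, float],
    confidences: Dict[str, float],
    first_anomaly_type: str,
    first_root_cause: str,
    set_probability: float = 1.0,
    set_confidence: float = 1.0,
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return copies of the model outputs adjusted by the rule engine's findings.

    pred_probs: set anomaly type -> predicted probability (from the model).
    confidences: candidate root cause -> confidence.
    """
    probs = dict(pred_probs)
    confs = dict(confidences)
    # Raise the first anomaly type's probability if it is below the first set probability.
    if probs.get(first_anomaly_type, 0.0) < set_probability:
        probs[first_anomaly_type] = set_probability
    # Add the first candidate root cause if absent, or raise a low confidence.
    if first_root_cause not in confs or confs[first_root_cause] < set_confidence:
        confs[first_root_cause] = set_confidence
    return probs, confs


if __name__ == "__main__":
    p, c = apply_rule_findings({"normal_business_failure": 0.4}, {"database_anomaly": 0.8},
                               "normal_business_failure", "insufficient_balance")
    print(p, c)
```

Returning copies keeps the raw model outputs available, which is convenient when the scores are recomputed after the adjustment.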
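To determine the target root cause more accurately, in some embodiments, adjusting the first score of the determined candidate root cause in the set anomaly type includes:
when the first score of the first candidate root cause in the first anomaly type is less than a set score, adjusting that first score to be greater than or equal to the set score; the set score is used to screen for the target root cause.
Here, the electronic device searches the determined first scores for the first score of the first candidate root cause in the first anomaly type, and if the score found is less than the set score, adjusts it to be greater than or equal to the set score.
When the electronic device determines the candidate root cause with the highest score as the target root cause, the set score may be 1. When it determines the candidate root causes whose first scores are greater than a set threshold as the target root causes, the set score equals that threshold. Because the first anomaly type and the first candidate root cause are determined by the set rule engine, they are known to exist at present. The electronic device therefore adjusts the first score of the first candidate root cause in the first anomaly type to be greater than or equal to the set score, so that the first candidate root cause is determined as one of the target root causes of the first abnormal event.
For example, when the set rule engine determines that root cause 1 and anomaly category 1 necessarily exist at present, and the calculated first score of root cause 1 in anomaly category 1 is not equal to 1, that first score is adjusted to 1.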
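The set-score floor can be sketched as follows. It is a minimal illustration that assumes first scores are stored in a dictionary keyed by (candidate root cause, set anomaly type); the names are hypothetical. Following the search-then-adjust description, only an existing entry is raised, here to exactly the set score, which the "greater than or equal to" wording allows. Pass 1 as `set_score` for highest-score selection, or the threshold for threshold selection.

```python
from typing import Dict, Tuple

ScoreKey = Tuple[str, str]  # (candidate root cause, set anomaly type)


def enforce_set_score(scores: Dict[ScoreKey, float], root_cause: str,
                      anomaly_type: str, set_score: float) -> Dict[ScoreKey, float]:
    """Return a copy in which the rule-confirmed pair scores at least `set_score`.

    Only an existing entry is raised; a missing entry is left unchanged.
    """
    out = dict(scores)
    key = (root_cause, anomaly_type)
    if key in out and out[key] < set_score:
        out[key] = set_score
    return out


if __name__ == "__main__":
    scores = {("root_cause_1", "category_1"): 0.72, ("database_anomaly", "middleware"): 0.9}
    print(enforce_set_score(scores, "root_cause_1", "category_1", 1.0))
```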
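In some embodiments, the electronic device may also analyze the candidate root causes of the first abnormal event with the set rule engine and adjust the first scores according to the result, so that the target root cause determined from the adjusted first scores is more accurate. For example, when the set rule engine determines that the candidate root causes of the first abnormal event include a database anomaly, the first score of the database anomaly is increased.
The following covers the case in which the determined first scores contain no first score of the first candidate root cause in the first anomaly type. In some embodiments, after the predicted probability of the first anomaly type has been adjusted to the first set probability and the confidence of the first candidate root cause has been adjusted to the first set confidence, adjusting the first score of the determined candidate root cause in the set anomaly type includes:
calculating, based on the first set probability and the first set confidence, a second score of the first candidate root cause in the first anomaly type, and updating the first scores of the determined candidate root causes in the set anomaly types based on the second score.
Here, the electronic device calculates the second score of the first candidate root cause in the first anomaly type from the first set probability and the first set confidence. If the determined first scores include a first score of the first candidate root cause in the first anomaly type, that first score is updated to the second score; if they do not, the second score is taken as the first score of the first candidate root cause in the first anomaly type. This makes the target root cause determined from the adjusted first scores more accurate. The second score is calculated in a similar way to the first score, which is not repeated here.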
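The second-score update is an upsert: overwrite the entry if present, insert it otherwise. The sketch below assumes the second score uses the first Match score formula with w1 = w2 = 0.5 and that scores are keyed by (candidate root cause, set anomaly type). The text says only that the calculation is similar, so the formula choice and all names are assumptions. With both set values at their example value of 1, the second score is (0.5 + 1 + 0.5)/2 = 1.

```python
from typing import Dict, Tuple

ScoreKey = Tuple[str, str]  # (candidate root cause, set anomaly type)


def match_score(g: float, k: float, w1: float = 0.5, w2: float = 0.5) -> float:
    """First-score formula from the text: (w1*G + G*K + w2*K) / 2."""
    return (w1 * g + g * k + w2 * k) / 2


def update_with_second_score(scores: Dict[ScoreKey, float], root_cause: str,
                             anomaly_type: str, set_probability: float = 1.0,
                             set_confidence: float = 1.0) -> Dict[ScoreKey, float]:
    """Return a copy with the rule-confirmed pair's score set to the second score."""
    out = dict(scores)
    # Overwrites an existing first score, or inserts one that was never computed.
    out[(root_cause, anomaly_type)] = match_score(set_probability, set_confidence)
    return out


if __name__ == "__main__":
    scores = {("database_anomaly", "middleware_anomaly"): 0.81}
    print(update_with_second_score(scores, "insufficient_balance", "normal_business_failure"))
```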
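In this embodiment, the first anomaly type and first candidate root cause determined by the set rule engine are used to adjust at least one of the following: the predicted probabilities of the set anomaly types, the confidences of the candidate root causes of the first abnormal event, and the first scores of the candidate root causes. The target root cause is then determined from the adjusted first scores rather than by adjusting it directly, which improves the accuracy of the determined target root cause.
Step 103: Based on the first scores of the determined candidate root causes in each set anomaly type, determine the target root cause corresponding to the first abnormal event.
Here, the electronic device may determine the candidate root cause with the highest first score as the target root cause of the first abnormal event, or determine the candidate root causes whose first scores are greater than a set threshold as the target root causes of the first abnormal event.
In practice, the electronic device may also sort the first scores of the determined candidate root causes in each set anomaly type and determine the target root cause of the first abnormal event from the sorted first scores.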
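Step 103's two selection modes and the sorting variant can be sketched together. This is a hedged illustration; the function names and the (root cause, anomaly type) keying are assumptions. Two design choices are worth noting. First, a root cause that scores in several anomaly types is reported once, at its best rank. Second, the threshold test uses >= rather than the text's "greater than", so that a score raised to exactly the set score by the adjustment step is still selected; treat that boundary as an assumption.

```python
from typing import Dict, List, Optional, Tuple

ScoreKey = Tuple[str, str]  # (candidate root cause, set anomaly type)


def rank_scores(scores: Dict[ScoreKey, float]) -> List[Tuple[ScoreKey, float]]:
    """Sort (root cause, anomaly type) pairs by first score, highest first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def target_root_causes(scores: Dict[ScoreKey, float],
                       threshold: Optional[float] = None) -> List[str]:
    """Pick target root causes from first scores.

    threshold=None: the single root cause with the highest first score.
    Otherwise: every root cause whose best first score is >= threshold, best first.
    """
    ranked = rank_scores(scores)
    if not ranked:
        return []
    if threshold is None:
        return [ranked[0][0][0]]
    targets: List[str] = []
    for (cause, _anomaly_type), score in ranked:
        if score >= threshold and cause not in targets:
            targets.append(cause)
    return targets


if __name__ == "__main__":
    scores = {("database_anomaly", "middleware_anomaly"): 0.81,
              ("subsystem_C_version_change", "application_version_release"): 0.75,
              ("network_device_anomaly", "network_anomaly"): 0.4}
    print(target_root_causes(scores))
    print(target_root_causes(scores, threshold=0.7))
```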
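In this embodiment, the anomaly detection model predicts the probability of each of the multiple set anomaly types of the first abnormal event. The first score of each candidate root cause in each set anomaly type is determined from these predicted probabilities and the confidence of each of the at least two candidate root causes of the first abnormal event. The target root cause of the first abnormal event is then determined from the first scores. The target root cause can thus be located accurately, improving the accuracy of the determined target root cause.
Fig. 2 is a schematic diagram of the root cause location method provided by an application embodiment of the present application. As shown in Fig. 2, the root cause location method includes:
Step 201: Input the feature vector corresponding to the first abnormal event into the anomaly detection model to obtain the predicted probability of each of the multiple set anomaly types corresponding to the first abnormal event.
Step 202: Based on the predicted probability of each set anomaly type corresponding to the first abnormal event, and on the confidence of each of the at least two candidate root causes corresponding to the first abnormal event, determine the first score of each candidate root cause in each set anomaly type.
Step 203: Analyze at least one of logs, alarm information, and version release records by using the set rule engine to obtain the first anomaly type and the first candidate root cause corresponding to the first abnormal event.
Step 204: Based on the determined first anomaly type and first candidate root cause, adjust at least one of the following:
the predicted probability of the set anomaly type corresponding to the first abnormal event;
the confidence of the candidate root cause corresponding to the first abnormal event;
the first score of the determined candidate root cause in the set anomaly type.
Step 205: Based on the first scores of the determined candidate root causes in each set anomaly type, determine the target root cause corresponding to the first abnormal event.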
为实现本申请实施例的根因定位方法,本申请另一实施例还提供了一 种电子设备,如图3所示,该电子设备包括:
预测单元31,配置为将第一异常事件对应的特征向量输入至异常检测模型,得到所述第一异常事件对应的多种设定异常类型中每种设定异常类型对应的预测概率;
第一确定单元32,配置为基于所述第一异常事件对应的每种设定异常类型对应的预测概率,以及基于所述第一异常事件对应的至少两个候选根因中每个候选根因对应的置信度,确定出每个候选根因在每个设定异常类型中对应的第一得分;
第二确定单元33,配置为基于确定出的候选根因在每个设定异常类型中对应的第一得分,确定出所述第一异常事件对应的目标根因。
在一些实施例中,第一确定单元32配置为:
基于预测概率对应的第一设定权重、置信度对应的第二设定权重、每种设定异常类型对应的预测概率以及每个候选根因对应的置信度,确定出每个候选根因在每个设定异常类型中对应的第一得分。
在一些实施例中,该电子设备还包括:
第三确定单元,配置为基于设定异常事件与设定根因之间的第一设定对应关系,确定出所述第一异常事件对应的至少两个候选根因;
第四确定单元,配置为基于设定根因与置信度之间的第二设定对应关系,确定出所述第一异常事件对应的至少两个候选根因中每个候选根因对应的置信度;其中,所述第一设定对应关系和所述第二设定关系基于历史日志、历史告警信息和版本发布记录中的至少之一确定出。
在一些实施例中,该电子设备还包括:
分析单元,配置为采用设定的规则引擎对日志、告警信息和版本发布记录中的至少之一进行分析,得到所述第一异常事件对应的第一异常类型和第一候选根因;
调整单元,配置为基于确定出的第一异常类型和第一候选根因,调整以下至少一项:
所述第一异常事件对应的设定异常类型对应的预测概率;
所述第一异常事件对应的候选根因的置信度;
确定出的候选根因在设定异常类型中对应的第一得分。
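In some embodiments, the adjustment unit is configured to:
adjust the predicted probability of the first anomaly type to a first preset probability, and adjust the confidence of the first candidate root cause to a first preset confidence; wherein
the predicted probability of the first anomaly type is less than the first preset probability; the at least two candidate root causes of the first abnormal event do not include the first candidate root cause, or the confidence of the first candidate root cause is less than the first preset confidence; the first preset probability indicates that the first abnormal event belongs to the first anomaly type; and the first preset confidence indicates that the first candidate root cause is currently present.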
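The adjustment in this embodiment could be sketched as follows. The preset values of 1.0 and the name `apply_rule_override` are hypothetical, and the sketch reads the two conditions independently: the probability is raised only if it is below the first preset probability, and the confidence is set only if the root cause is missing from the candidates or its confidence is below the first preset confidence.

```python
from typing import Dict, Tuple


def apply_rule_override(
    probs: Dict[str, float],
    confs: Dict[str, float],
    anomaly_type: str,
    root_cause: str,
    preset_prob: float = 1.0,
    preset_conf: float = 1.0,
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Raise the rule engine's anomaly type and root cause to the preset values."""
    probs, confs = dict(probs), dict(confs)  # work on copies
    # Condition 1: the model predicted less than the first preset probability.
    if probs.get(anomaly_type, 0.0) < preset_prob:
        probs[anomaly_type] = preset_prob
    # Condition 2: the root cause is not a candidate yet, or its confidence is too low.
    if root_cause not in confs or confs[root_cause] < preset_conf:
        confs[root_cause] = preset_conf
    return probs, confs


probs, confs = apply_rule_override(
    {"latency": 0.7, "error_rate": 0.2},
    {"db_slow": 0.6, "net_jitter": 0.4},
    anomaly_type="error_rate",
    root_cause="bad_release",
)
print(probs)  # {'latency': 0.7, 'error_rate': 1.0}
print(confs)  # {'db_slow': 0.6, 'net_jitter': 0.4, 'bad_release': 1.0}
```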
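In some embodiments, the adjustment unit is configured to:
calculate, based on the first preset probability and the first preset confidence, a second score of the first candidate root cause in the first anomaly type, and update the first scores of the determined candidate root causes in the preset anomaly types based on the second score.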
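In some embodiments, the adjustment unit is configured to:
when the first score of the first candidate root cause in the first anomaly type is less than a preset score, adjust that first score to be greater than or equal to the preset score, the preset score being used to screen for the target root cause.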
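The score-floor variant can be sketched as below, assuming the preset score is applied as a `>=` cutoff when screening for target root causes. The names `enforce_score_floor` and `screen`, the 0.6 cutoff and the handling of a missing pair are hypothetical.

```python
from typing import Dict, List, Tuple

Scores = Dict[Tuple[str, str], float]  # (candidate root cause, anomaly type) -> first score


def enforce_score_floor(scores: Scores, root_cause: str, anomaly_type: str, preset_score: float) -> Scores:
    """Lift the rule-engine pair's first score to the preset score if it falls short."""
    updated = dict(scores)
    key = (root_cause, anomaly_type)
    # A missing pair is treated as scoring below the floor (an assumption).
    if updated.get(key, float("-inf")) < preset_score:
        updated[key] = preset_score
    return updated


def screen(scores: Scores, preset_score: float) -> List[str]:
    """Keep candidates whose first score reaches the preset score."""
    return sorted({rc for (rc, _t), s in scores.items() if s >= preset_score})


scores = {("db_slow", "latency"): 0.8, ("bad_release", "error_rate"): 0.3}
print(screen(scores, 0.6))  # ['db_slow']
lifted = enforce_score_floor(scores, "bad_release", "error_rate", 0.6)
print(screen(lifted, 0.6))  # ['bad_release', 'db_slow']
```

Lifting the score to exactly the preset score guarantees the rule-engine candidate survives screening without pushing it above candidates the model already rates highly.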
In practice, each of the above units may be implemented by a processor in the electronic device, such as a central processing unit (CPU), a digital signal processor (DSP), a microcontroller unit (MCU) or a field-programmable gate array (FPGA).
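It should be noted that, when the electronic device provided in the above embodiments performs root cause localization, the division into the program modules above is only an example. In practice, the processing may be assigned to different program modules as needed; that is, the internal structure of the device may be divided into different program modules to complete all or part of the processing described above. In addition, the electronic device provided in the above embodiments and the root cause localization method embodiments share the same concept; for the specific implementation, see the method embodiments, which are not repeated here.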
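Based on the hardware implementation of the above program modules, and to implement the method of the embodiments of the present application, an embodiment of the present application further provides an electronic device. FIG. 4 is a schematic diagram of the hardware structure of the electronic device according to an embodiment of the present application. As shown in FIG. 4, the electronic device 4 includes:
a communication interface 41, capable of exchanging information with other devices such as network devices;
a processor 42, connected to the communication interface 41 to exchange information with other devices, and configured to execute, when running a computer program, the root cause localization method provided by one or more of the above technical solutions for the electronic device. The computer program is stored in a memory 43.
Of course, in practice, the components of the electronic device 4 are coupled together through a bus system 44. It can be understood that the bus system 44 is configured to implement connection and communication between these components. In addition to a data bus, the bus system 44 also includes a power bus, a control bus and a status signal bus. However, for clarity, all the buses are labeled as the bus system 44 in FIG. 4.
The memory 43 in the embodiments of the present application is used to store various types of data to support the operation of the electronic device 4. Examples of such data include any computer program to be run on the electronic device 4.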
It can be understood that the memory 43 may be a volatile memory or a non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a ferroelectric random access memory (FRAM), a flash memory, a magnetic surface memory, an optical disc, or a compact disc read-only memory (CD-ROM); the magnetic surface memory may be a magnetic disk memory or a magnetic tape memory. The volatile memory may be a random access memory (RAM), used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), synchronous static random access memory (SSRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), SyncLink dynamic random access memory (SLDRAM) and Direct Rambus random access memory (DRRAM). The memory 43 described in the embodiments of the present application is intended to include, but is not limited to, these and any other suitable types of memory.
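The methods disclosed in the above embodiments of the present application may be applied to, or implemented by, the processor 42. The processor 42 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above methods may be completed by an integrated logic circuit of hardware in the processor 42 or by instructions in the form of software. The processor 42 may be a general-purpose processor, a DSP, another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor 42 may implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of the present application may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may reside in a storage medium located in the memory 43; the processor 42 reads the program in the memory 43 and completes the steps of the foregoing methods in combination with its hardware.
Optionally, when executing the program, the processor 42 implements the corresponding procedures implemented by the terminal in the methods of the embodiments of the present application; for brevity, these are not repeated here.
In an exemplary embodiment, an embodiment of the present application further provides a storage medium, namely a computer storage medium, specifically a computer-readable storage medium, for example the memory 43 storing the computer program, which can be executed by the processor 42 of the terminal to complete the steps of the foregoing method. The computer-readable storage medium may be a memory such as an FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disc or CD-ROM.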
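In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, terminal and method may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is merely a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may all be integrated into one processing unit, or each unit may serve as a separate unit, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
A person of ordinary skill in the art can understand that all or some of the steps of the above method embodiments may be completed by hardware instructed by a program. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media capable of storing program code, such as a removable storage device, a ROM, a RAM, a magnetic disk or an optical disc.
Alternatively, if the integrated unit of the present application is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence, or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to execute all or part of the methods described in the embodiments of the present application. The storage medium includes various media capable of storing program code, such as a removable storage device, a ROM, a RAM, a magnetic disk or an optical disc.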
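It should be noted that terms such as "first" and "second" are used to distinguish similar objects and do not necessarily describe a specific order or sequence.
In addition, the technical solutions described in the embodiments of the present application may be combined arbitrarily provided they do not conflict.
It should be noted that the term "and/or" in the embodiments of the present application merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone. In addition, the term "at least one" herein means any one of multiple items, or any combination of at least two of multiple items; for example, including at least one of A, B and C may mean including any one or more elements selected from the set consisting of A, B and C.
The above are merely specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.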

Claims (16)

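  1. A root cause localization method, comprising:
    inputting a feature vector of a first abnormal event into an anomaly detection model to obtain a predicted probability for each of multiple preset anomaly types corresponding to the first abnormal event;
    determining a first score of each candidate root cause in each preset anomaly type based on the predicted probability of each preset anomaly type corresponding to the first abnormal event and on a confidence of each of at least two candidate root causes corresponding to the first abnormal event;
    determining a target root cause corresponding to the first abnormal event based on the determined first scores of the candidate root causes in each preset anomaly type.
  2. The method according to claim 1, wherein determining the first score of each candidate root cause in each preset anomaly type comprises:
    determining the first score of each candidate root cause in each preset anomaly type based on a first preset weight corresponding to the predicted probability, a second preset weight corresponding to the confidence, the predicted probability of each preset anomaly type and the confidence of each candidate root cause.
  3. The method according to claim 1, further comprising:
    determining the at least two candidate root causes corresponding to the first abnormal event based on a first preset correspondence between preset abnormal events and preset root causes;
    determining the confidence of each of the at least two candidate root causes corresponding to the first abnormal event based on a second preset correspondence between preset root causes and confidences; wherein the first preset correspondence and the second preset correspondence are determined based on at least one of historical logs, historical alarm information and version release records.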
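  4. The method according to any one of claims 1 to 3, further comprising:
    analyzing at least one of logs, alarm information and version release records by using a preset rule engine, to obtain a first anomaly type and a first candidate root cause corresponding to the first abnormal event;
    adjusting, based on the determined first anomaly type and first candidate root cause, at least one of the following:
    the predicted probability of a preset anomaly type corresponding to the first abnormal event;
    the confidence of a candidate root cause corresponding to the first abnormal event;
    the first score of a determined candidate root cause in a preset anomaly type.
  5. The method according to claim 4, wherein adjusting the predicted probability of the preset anomaly type corresponding to the first abnormal event and the confidence of the candidate root cause corresponding to the first abnormal event comprises:
    adjusting the predicted probability of the first anomaly type to a first preset probability, and adjusting the confidence of the first candidate root cause to a first preset confidence; wherein
    the predicted probability of the first anomaly type is less than the first preset probability; the at least two candidate root causes corresponding to the first abnormal event do not include the first candidate root cause, or the confidence of the first candidate root cause is less than the first preset confidence; the first preset probability indicates that the first abnormal event belongs to the first anomaly type; and the first preset confidence indicates that the first candidate root cause is currently present.
  6. The method according to claim 5, wherein adjusting the first score of the determined candidate root cause in the preset anomaly type comprises:
    calculating, based on the first preset probability and the first preset confidence, a second score of the first candidate root cause in the first anomaly type, and updating, based on the second score, the first scores of the determined candidate root causes in the preset anomaly types.
  7. The method according to claim 4, wherein adjusting the first score of the determined candidate root cause in the preset anomaly type comprises:
    when the first score of the first candidate root cause in the first anomaly type is less than a preset score, adjusting the first score of the first candidate root cause in the first anomaly type to be greater than or equal to the preset score, wherein the preset score is used to screen for the target root cause.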
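  8. An electronic device, comprising:
    a prediction unit, configured to input a feature vector of a first abnormal event into an anomaly detection model to obtain a predicted probability for each of multiple preset anomaly types corresponding to the first abnormal event;
    a first determination unit, configured to determine a first score of each candidate root cause in each preset anomaly type based on the predicted probability of each preset anomaly type corresponding to the first abnormal event and on a confidence of each of at least two candidate root causes corresponding to the first abnormal event;
    a second determination unit, configured to determine a target root cause corresponding to the first abnormal event based on the determined first scores of the candidate root causes in each preset anomaly type.
  9. The electronic device according to claim 8, wherein the first determination unit is configured to:
    determine the first score of each candidate root cause in each preset anomaly type based on a first preset weight corresponding to the predicted probability, a second preset weight corresponding to the confidence, the predicted probability of each preset anomaly type and the confidence of each candidate root cause.
  10. The electronic device according to claim 8, further comprising:
    a third determination unit, configured to determine the at least two candidate root causes corresponding to the first abnormal event based on a first preset correspondence between preset abnormal events and preset root causes;
    a fourth determination unit, configured to determine the confidence of each of the at least two candidate root causes corresponding to the first abnormal event based on a second preset correspondence between preset root causes and confidences; wherein the first preset correspondence and the second preset correspondence are determined based on at least one of historical logs, historical alarm information and version release records.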
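  11. The electronic device according to any one of claims 8 to 10, further comprising:
    an analysis unit, configured to analyze at least one of logs, alarm information and version release records by using a preset rule engine, to obtain a first anomaly type and a first candidate root cause corresponding to the first abnormal event;
    an adjustment unit, configured to adjust, based on the determined first anomaly type and first candidate root cause, at least one of the following:
    the predicted probability of a preset anomaly type corresponding to the first abnormal event;
    the confidence of a candidate root cause corresponding to the first abnormal event;
    the first score of a determined candidate root cause in a preset anomaly type.
  12. The electronic device according to claim 11, wherein the adjustment unit is configured to:
    adjust the predicted probability of the first anomaly type to a first preset probability, and adjust the confidence of the first candidate root cause to a first preset confidence; wherein
    the predicted probability of the first anomaly type is less than the first preset probability; the at least two candidate root causes corresponding to the first abnormal event do not include the first candidate root cause, or the confidence of the first candidate root cause is less than the first preset confidence; the first preset probability indicates that the first abnormal event belongs to the first anomaly type; and the first preset confidence indicates that the first candidate root cause is currently present.
  13. The electronic device according to claim 12, wherein the adjustment unit is configured to:
    calculate, based on the first preset probability and the first preset confidence, a second score of the first candidate root cause in the first anomaly type, and update, based on the second score, the first scores of the determined candidate root causes in the preset anomaly types.
  14. The electronic device according to claim 11, wherein the adjustment unit is configured to:
    when the first score of the first candidate root cause in the first anomaly type is less than a preset score, adjust the first score of the first candidate root cause in the first anomaly type to be greater than or equal to the preset score, wherein the preset score is used to screen for the target root cause.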
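  15. An electronic device, comprising a processor and a memory configured to store a computer program executable on the processor, wherein the processor is configured to perform, when running the computer program, the steps of the method according to any one of claims 1 to 7.
  16. A storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.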
PCT/CN2021/127331 2021-05-12 2021-10-29 Root cause localization method, electronic device and storage medium WO2022237088A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110517260.6A CN113298638B (zh) 2021-05-12 2021-05-12 Root cause localization method, electronic device and storage medium
CN202110517260.6 2021-05-12

Publications (1)

Publication Number Publication Date
WO2022237088A1 true WO2022237088A1 (zh) 2022-11-17

Family

ID=77321678

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/127331 WO2022237088A1 (zh) 2021-05-12 2021-10-29 Root cause localization method, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN113298638B (zh)
WO (1) WO2022237088A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
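CN113298638B (zh) * 2021-05-12 2023-07-14 深圳前海微众银行股份有限公司 Root cause localization method, electronic device and storage medium
CN114978877B (zh) * 2022-05-13 2024-04-05 京东科技信息技术有限公司 Anomaly handling method and apparatus, electronic device and computer-readable medium
CN115729796B (zh) * 2022-12-23 2023-10-10 中软国际科技服务有限公司 Artificial-intelligence-based abnormal operation analysis method and big data application system
CN116561090B (zh) * 2023-03-27 2024-09-24 广州迪澳基因科技有限公司 Abnormality alarm method and system applied to a nucleic acid detector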

Citations (5)

Publication number Priority date Publication date Assignee Title
CN110222936A (zh) * 2019-05-09 2019-09-10 阿里巴巴集团控股有限公司 Root cause localization method and system for a business scenario, and electronic device
US20190384275A1 (en) * 2016-12-28 2019-12-19 Mitsubishi Hitachi Power Systems, Ltd. Diagnostic device, diagnostic method, and program
CN112087334A (zh) * 2020-09-09 2020-12-15 中移(杭州)信息技术有限公司 Alarm root cause analysis method, electronic device and storage medium
CN112152852A (zh) * 2020-09-23 2020-12-29 创新奇智(北京)科技有限公司 Root cause analysis method, apparatus, device and computer storage medium
CN113298638A (zh) * 2021-05-12 2021-08-24 深圳前海微众银行股份有限公司 Root cause localization method, electronic device and storage medium

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN111158977B (zh) * 2019-12-12 2023-07-11 深圳前海微众银行股份有限公司 Method and apparatus for locating the root cause of an abnormal event

Cited By (3)

Publication number Priority date Publication date Assignee Title
WO2024139255A1 (zh) * 2022-12-28 2024-07-04 支付宝(杭州)信息技术有限公司 Root cause localization method, apparatus, device and readable medium
CN117389230A (zh) * 2023-11-16 2024-01-12 广州中健中医药科技有限公司 Production control method and system for antihypertensive traditional Chinese medicine extract
CN117389230B (zh) * 2023-11-16 2024-06-07 广州中健中医药科技有限公司 Production control method and system for antihypertensive traditional Chinese medicine extract

Also Published As

Publication number Publication date
CN113298638B (zh) 2023-07-14
CN113298638A (zh) 2021-08-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21941653

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 22.03.2024)