CN112905479A - Cloud platform based alarm accident root cause optimal path determination method and system - Google Patents

Cloud platform based alarm accident root cause optimal path determination method and system

Info

Publication number
CN112905479A
Authority
CN
China
Prior art keywords
alarm
module
accident
alarm accident
root cause
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110287133.1A
Other languages
Chinese (zh)
Other versions
CN112905479B (en)
Inventor
苏君福
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Icsoc Beijing Communication Technology Co ltd
Original Assignee
Icsoc Beijing Communication Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Icsoc Beijing Communication Technology Co ltd filed Critical Icsoc Beijing Communication Technology Co ltd
Priority to CN202110287133.1A priority Critical patent/CN112905479B/en
Publication of CN112905479A publication Critical patent/CN112905479A/en
Application granted granted Critical
Publication of CN112905479B publication Critical patent/CN112905479B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/362 Software debugging
    • G06F11/366 Software debugging using diagnostics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3452 Performance evaluation by statistical analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3471 Address tracing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Alarm Systems (AREA)

Abstract

The application provides a cloud platform-based method and system for determining the optimal path to the root cause of an alarm accident. The method comprises: obtaining related alarm information of the module in which the alarm accident occurred (the current alarm accident module), the related alarm information including the time point at which the alarm accident occurred; acquiring, according to that time point, information on the alarm events of all alarm accident modules within a preset time before the time point; analyzing that information to obtain associated alarm accident module information; obtaining, according to the associated alarm accident module information, the association dispersion between each associated alarm accident module and the current alarm accident module; and determining the optimal path to the root cause of the alarm accident according to the association dispersion. Because the alarm modules are analyzed through the association dispersion, the efficiency of determining the fault path is improved.

Description

Cloud platform based alarm accident root cause optimal path determination method and system
Technical Field
The invention belongs to the technical field of internet monitoring, and particularly relates to a cloud platform-based alarm accident root cause optimal path determination method and system.
Background
In recent years, with the rapid development of internet technology, the scale of network service systems and the complexity of the relationships between their internal modules have increased, making service failures harder to diagnose. In the large and complex network environments of cloud computing, it is important to discover application faults promptly, before they affect customers. An optimal fault path therefore needs to be found after a service fault occurs; following this path improves troubleshooting efficiency and the user experience.
In the related art, once a problem is found, modules are checked manually one by one; that is, the optimal fault path cannot be determined.
This troubleshooting process is complex and costly in labor and time; some fault diagnoses take too long, making it difficult to diagnose the fault and stop the damage promptly and effectively.
Disclosure of Invention
In order to solve the technical problems, the invention provides a cloud platform-based alarm accident root cause optimal path determination method and system.
The specific technical scheme of the invention is as follows:
a cloud platform based alarm accident root cause optimal path determination method comprises the following steps:
acquiring related alarm information of the alarm accident occurrence module, wherein the related alarm information comprises the time point of the alarm accident occurrence;
acquiring information of alarm events of all alarm accident modules within preset time before the time point according to the time point of the alarm accident;
analyzing the information of the alarm events of all alarm accident modules within the preset time before the time point to obtain the information of the associated alarm accident modules;
obtaining the association dispersion between the associated alarm accident module and the current alarm accident module according to the associated alarm accident module information;
and determining the optimal path to the root cause of the alarm accident according to the association dispersion.
In an optional embodiment, obtaining the association dispersion between the associated alarm accident module and the current alarm accident module according to the associated alarm accident module information includes:
acquiring the correlation degree between each associated alarm accident module and the current alarm accident module;
obtaining a correlation coefficient between each associated alarm accident module and the current alarm accident module according to the correlation degree;
and obtaining the association dispersion between the associated alarm accident module and the current alarm accident module according to the correlation coefficient.
In an optional embodiment, obtaining the association dispersion between the associated alarm accident module and the current alarm accident module according to the correlation coefficient includes:
taking the absolute value of the difference between the correlation coefficient of the associated alarm accident module and the correlation coefficient of the current alarm accident module as the association dispersion between the two modules.
In an optional embodiment, obtaining a correlation coefficient between each associated alarm accident module and the current alarm accident module according to the correlation degree includes:
acquiring the frequency with which each associated alarm accident module is called within the preset time;
and obtaining, from that call frequency, the mean square deviation coefficient of the calls to each associated alarm accident module within the preset time, and taking the mean square deviation coefficient as the correlation coefficient between that associated alarm accident module and the current alarm accident module.
In an optional embodiment, obtaining the correlation degree between each associated alarm accident module and the current alarm accident module includes:
acquiring the change state of each associated alarm accident module and of the current alarm accident module with respect to preset items;
acquiring the time sequence of each associated alarm accident module and the current alarm accident module;
and obtaining the correlation degree between each associated alarm accident module and the current alarm accident module according to those change states and that time sequence.
In an optional embodiment, analyzing the information on the alarm events of all alarm accident modules within the preset time before the time point to obtain associated alarm accident module information includes:
acquiring the alarm logs of all alarm accident modules within the preset time before the time point, the alarm accident modules themselves, and the access volumes of the alarm accident modules;
and collecting, from the alarm logs, the alarm accident modules, and the access volumes, the information related to the current alarm accident module to obtain the associated alarm accident module information.
In an optional embodiment, determining the optimal path to the root cause of the alarm accident according to the association dispersion includes:
taking the current alarm accident module as the central point, acquiring the first-layer associated alarm accident modules within a first preset correlation degree and a first preset correlation coefficient range;
taking the current alarm accident module as the central point, acquiring, from the associated alarm accident modules acquired in the previous layer, the associated alarm accident modules within a second preset correlation degree and a target preset correlation coefficient range, wherein the first preset correlation degree is greater than the second preset correlation degree;
and determining the optimal path to the root cause of the alarm accident according to the association dispersion of the associated alarm accident modules within the target correlation coefficient range.
In an optional embodiment, after determining the optimal path to the root cause of the alarm accident according to the association dispersion, the method further comprises:
searching for the alarm root cause along the optimal path, and taking a processing action according to the alarm root cause.
In another aspect, a cloud platform-based alarm accident root cause optimal path determination system is provided, which includes: a memory, a processor and a computer program stored on the memory, the processor executing the computer program to implement any of the methods described above.
In yet another aspect, a computer-readable storage medium having computer-executable instructions stored thereon for performing any of the above methods when executed by a processor is provided.
The invention has the following beneficial effects:
The method provided by the embodiments of the application is time-based: only the alarm accident modules that alarmed before the time point of the current alarm accident are analyzed, which avoids spending time on modules that alarmed after that time point and improves analysis efficiency. In addition, by obtaining the association dispersion and analyzing the alarm modules through it, the efficiency of determining the fault path is improved.
Drawings
Fig. 1 is a schematic flow chart of a cloud platform-based alarm accident root cause optimal path determination method according to an embodiment of the present application;
Fig. 2 is a timing diagram illustrating a product service failure alarm according to an embodiment of the present application;
Fig. 3 is a simplified schematic diagram of S105 according to an embodiment of the present application.
Detailed Description
The present invention will be described in further detail with reference to the following examples and drawings.
The path determination algorithms currently on the market for cloud platform alarm accident root causes are relatively simplistic: they do not dynamically analyze and track the association dispersion of alarm events between interfaces, which distorts the analysis of alarm associations between modules and biases root cause localization. In view of this, the embodiments of the present application provide a cloud platform-based method and system for determining the optimal path to the root cause of an alarm accident, aiming to solve the above technical problems.
In one aspect, an embodiment of the present application provides a cloud platform-based alarm accident root cause optimal path determination method. Referring to Fig. 1, which is a schematic flow chart of the method, the method comprises the following steps:
S101, obtaining related alarm information of the module in which the alarm accident occurred, wherein the related alarm information comprises the time point of the alarm accident.
S102, acquiring information on the alarm events of all alarm accident modules within a preset time before the time point according to the time point of the alarm accident.
S103, analyzing the information on the alarm events of all alarm accident modules within the preset time before the time point to obtain the associated alarm accident module information.
S104, obtaining the association dispersion between the associated alarm accident module and the current alarm accident module according to the associated alarm accident module information.
S105, determining the optimal path to the root cause of the alarm accident according to the association dispersion.
In the method provided by the embodiments of the application, the related alarm information of the module in which the alarm accident occurred is obtained, including the time point of the alarm accident. Because a cause precedes its effect, the information on the alarm events of all alarm accident modules within the preset time before that time point is acquired, so that all of these modules can be compared with the current alarm accident module. This information is analyzed, and the alarm modules associated with the current alarm accident module are marked to obtain the associated alarm accident module information. The association dispersion between each associated alarm accident module and the current alarm accident module is then obtained from that information, and the optimal path to the root cause of the alarm accident is determined according to the association dispersion. The method is time-based: only the alarm accident modules that alarmed before the time point of the current alarm accident are analyzed, which avoids spending time on modules that alarmed after that time point and improves analysis efficiency. By obtaining the association dispersion and analyzing the alarm modules through it, the efficiency of determining the fault path is improved.
The methods provided herein are further explained and illustrated below by way of optional examples.
S101, obtaining related alarm information of the module with the alarm accident, wherein the related alarm information comprises the time point of the alarm accident.
It should be noted that the alarms that can trigger a service alarm include not only alarms of the interfaces associated with the service but also alarms of the hosts associated with the service. Alarm accident modules therefore include interface modules and host modules.
That is to say, each alarm event has a certain association with some modules in which alarm accidents have already occurred, and any of those modules may have caused the current alarm. As an example, if module A cannot be accessed normally, analyzing the alarm data of the accident module and of the other alarm modules and comparing it against the association library of all alarm accident modules reveals which module caused the abnormal access to module A.
The related alarm information of the module in which the alarm accident occurred is of several kinds, such as the time point of the alarm and the content of the alarmed service. By acquiring the alarm time point, the embodiments of the application can, because a cause precedes its effect, check only the information of the alarm accident modules before that time point rather than the information of all alarm accident modules, which improves the efficiency of fault querying.
S102, acquiring information of alarm events of all alarm accident modules within preset time before the time point according to the time point of the alarm accident.
It can be understood that many interfaces or hosts may fail while the service process is running; for a specific alarm, only the information of the modules that alarmed beforehand and could have caused the service process to alarm needs to be checked.
As an example, referring to Fig. 2, which is a timing diagram of a product service failure alarm provided in an embodiment of the present application, the product service alarm occurs at 9:25, and the information of all alarm modules between 9:05 and 9:25 is analyzed, that is, the alarm events and service request data of all interfaces within those 20 minutes. Each time point in Fig. 2 corresponds to an alarm accident; in the embodiments of the present application, only the alarm accident events within the preset time are analyzed, and those outside the preset time are not processed.
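The time-window selection of S102 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the `AlarmEvent` record, its field names, and the `events_in_window` helper are hypothetical, and the calendar date is arbitrary. Only the 20-minute window and the 9:25 alarm time come from the example above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AlarmEvent:
    module: str          # hypothetical module identifier
    timestamp: datetime  # when the alarm event was raised


def events_in_window(events, alarm_time, window=timedelta(minutes=20)):
    """Keep only events raised within `window` before (or at) `alarm_time`."""
    start = alarm_time - window
    return [e for e in events if start <= e.timestamp <= alarm_time]


alarm_time = datetime(2021, 1, 1, 9, 25)  # arbitrary date, 9:25 alarm
events = [
    AlarmEvent("B", datetime(2021, 1, 1, 8, 50)),  # before the window: ignored
    AlarmEvent("C", datetime(2021, 1, 1, 9, 10)),  # inside the window: kept
    AlarmEvent("D", datetime(2021, 1, 1, 9, 30)),  # after the alarm: ignored
]
selected = events_in_window(events, alarm_time)
print([e.module for e in selected])  # ['C']
```

Events after the alarm time are discarded along with those that are too old, which reflects the point that only preceding alarms can be causes.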
It should be noted that the information of the alarm events occurring in all the alarm accident modules within the preset time before the time point, which is acquired in the embodiment of the present application, includes the contents of the log, the technical resources, the access amount, and the like of the alarm accident module.
S103, analyzing the information of the alarm events of all alarm accident modules within the preset time before the time point to obtain the information of the associated alarm accident modules.
It can be understood that within the preset time, many alarm accident modules have alarm events, but if the alarm events are not related to the alarm of the alarm accident module, the alarm accident will not be affected. Therefore, the information of the alarm events of all the alarm accident modules within the preset time before the time point is analyzed, the alarm accident modules which are not related to the alarm accident module at this time are removed, and the related alarm accident module information related to the alarm accident module at this time is obtained.
In an optional embodiment, S103 includes S1031 to S1032.
S1031, acquiring the alarm logs of all alarm accident modules within the preset time before the time point, the alarm accident modules themselves, and the access volumes of the alarm accident modules.
The associated alarm accident module information may include the alarm logs and access volumes of all alarm accident modules within the preset time before the time point, the identity information of the associated alarm accident modules, the time at which each fault alarm occurred, and the connection relationship with the current alarm accident module, the connection relationship including direct and indirect connection relationships.
S1032, collecting, from the alarm logs, the alarm accident modules, and the access volumes, the information related to the current alarm accident module to obtain the associated alarm accident module information.
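One way to picture the filtering in S103 is as a reachability check over the connection relationships. The sketch below assumes that the connection relationships are available as a mapping from each module to the modules it is directly connected to; that mapping, the module names, and the `associated_modules` helper are hypothetical. A breadth-first search then covers both direct and indirect connections.

```python
from collections import deque


def associated_modules(current, alarmed, connections):
    """Return alarmed modules connected to `current` directly or indirectly.

    `connections` maps a module to the modules it is directly connected to
    (an assumed representation of the connection relationships).
    """
    seen = {current}
    queue = deque([current])
    while queue:
        node = queue.popleft()
        for neighbor in connections.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    # Keep modules that alarmed in the window and are connected to `current`.
    return sorted(m for m in alarmed if m in seen and m != current)


connections = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "X": ["Y"]}
alarmed = {"B", "C", "D", "E", "Y"}  # modules that alarmed in the preset time
print(associated_modules("A", alarmed, connections))  # ['B', 'C', 'D', 'E']
```

Module Y alarmed in the window but has no connection to A, so it is removed as unrelated.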
S104, obtaining the association dispersion between the associated alarm accident module and the current alarm accident module according to the associated alarm accident module information.
In an optional embodiment, S104 includes S1041 to S1043.
S1041, obtaining the correlation degree between each associated alarm accident module and the current alarm accident module.
It can be understood that an associated alarm accident module can cause the current alarm accident module to alarm only if it is correlated with the current alarm accident module.
In an alternative embodiment, S1041 comprises:
acquiring the change state of each associated alarm accident module and of the current alarm accident module with respect to preset items.
The preset items may include the load of the alarm accident module, its CPU occupancy rate, and the like. As an example, when alarm accident module A fails and alarms, if the load of A increases and the load of B increases accordingly, this may indicate a correlation between alarm accident modules A and B; that is, an increase or decrease in the load of A may affect the load of B, or vice versa.
Acquiring the time sequence of each associated alarm accident module and the current alarm accident module.
It can be understood that when an alarm accident module fails, it causes its associated alarm accident modules to alarm within a short period. That is to say, an alarm accident module whose alarm time is closer to that of the current alarm accident module is more likely to be the cause of the current alarm. Therefore, by acquiring the time sequence of each associated alarm accident module and the current alarm accident module, the alarm time of each associated alarm module can be obtained, and the closer that time is to the alarm time of the current alarm accident module, the greater the module's correlation degree with the current alarm fault.
Obtaining the correlation degree between each associated alarm accident module and the current alarm accident module according to those change states with respect to the preset items and that time sequence.
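The text does not give a formula for combining the change state and the time sequence into a correlation degree. Purely as an illustration, the sketch below assumes one possible combination: the fraction of sampling steps in which two load series move in the same direction, multiplied by a linear time-proximity weight. The function names, the load values, and the product rule are all assumptions, not the method of the application.

```python
def co_movement(series_a, series_b):
    """Fraction of consecutive samples where both series move the same way."""
    deltas_a = [y - x for x, y in zip(series_a, series_a[1:])]
    deltas_b = [y - x for x, y in zip(series_b, series_b[1:])]
    same = sum(1 for da, db in zip(deltas_a, deltas_b) if da * db > 0)
    return same / len(deltas_a) if deltas_a else 0.0


def time_proximity(gap_minutes, window_minutes):
    """1.0 when the alarms coincide, falling linearly to 0.0 at the window edge."""
    return max(0.0, 1.0 - gap_minutes / window_minutes)


def correlation_degree(series_a, series_b, gap_minutes, window_minutes=20):
    # Assumed combination rule: co-movement weighted by time proximity.
    return co_movement(series_a, series_b) * time_proximity(gap_minutes, window_minutes)


load_a = [10, 20, 35, 50]  # hypothetical load of module A before the alarm
load_b = [5, 9, 14, 30]    # hypothetical load of module B, rising in step with A
print(correlation_degree(load_a, load_b, gap_minutes=5))  # 0.75
```

Under this assumed rule, a module whose load tracks the current module and whose alarm is close in time scores near 1, while a module that alarmed at the edge of the window scores near 0.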
S1042, obtaining the correlation coefficient between each associated alarm accident module and the current alarm accident module according to the correlation degree.
In an optional embodiment, S1042 includes:
acquiring the frequency with which each associated alarm accident module is called within the preset time.
It can be understood that the more frequently an associated alarm accident module is called within the preset time, the greater its access volume and the greater the probability of an alarm occurring in the current alarm accident module.
In the embodiments of the application, the frequency with which each associated alarm accident module is called within the preset time is acquired, the mean square deviation coefficient of those calls within the preset time is obtained from that frequency, and the mean square deviation coefficient is taken as the correlation coefficient between that associated alarm accident module and the current alarm accident module.
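The worked example later in this section writes the mean square deviation as STDEVP, the population standard deviation. Assuming the called frequency is sampled as a list of call counts per interval, the coefficient can be sketched with the Python standard library; the sample counts and the `correlation_coefficient` name are hypothetical.

```python
from statistics import pstdev


def correlation_coefficient(call_counts):
    """Population standard deviation of per-interval call counts (STDEVP)."""
    return pstdev(call_counts)


# Hypothetical calls per sampling interval within the preset time.
call_counts = {
    "A": [2, 4, 4, 4, 5, 5, 7, 9],
    "B": [3, 3, 3, 3, 3, 3, 3, 3],
}
coefficients = {m: correlation_coefficient(c) for m, c in call_counts.items()}
print(coefficients)  # {'A': 2.0, 'B': 0.0}
```

A module that is called at a steady rate has a coefficient of 0, while bursty call patterns give larger values.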
S1043, obtaining the association dispersion between the associated alarm accident module and the current alarm accident module according to the correlation coefficient.
In an optional embodiment, S1043 includes: taking the absolute value of the difference between the correlation coefficient of the associated alarm accident module and the correlation coefficient of the current alarm accident module as the association dispersion between the two modules.
As an example, alarm accident module A raises an alarm, and alarm accident modules B and C are both associated alarm accident modules. The frequency with which each of the three modules is called within the preset time is obtained by sampling; for example, the preset time may be 9:10 to 9:25, denoted 10-25, that is, an interval of 15 minutes. The correlation coefficients of alarm accident modules A, B and C are recorded as $\alpha_A = \mathrm{CHG}[A, 10\text{-}25]$, $\alpha_B = \mathrm{CHG}[B, 10\text{-}25]$ and $\alpha_C = \mathrm{CHG}[C, 10\text{-}25]$, respectively. Taking module A as an example, CHG denotes the correlation coefficient, the A in brackets denotes alarm accident module A, and 10-25 denotes the period 9:10 to 9:25, that is, the 15-minute interval; the notation for modules B and C is analogous and is not repeated here. The mean square deviations of the three modules are obtained as $\mathrm{STDEVP}[\alpha_A] = 2.4$, $\mathrm{STDEVP}[\alpha_B] = 2.6$ and $\mathrm{STDEVP}[\alpha_C] = 3.4$. The dispersion from module B to module A is then $Q[A, B] = |\mathrm{STDEVP}[\alpha_B] - \mathrm{STDEVP}[\alpha_A]| = 0.2$, and the dispersion from module C to module A is $Q[A, C] = |\mathrm{STDEVP}[\alpha_C] - \mathrm{STDEVP}[\alpha_A]| = 1.0$. That is, in the above embodiment, the absolute value of the difference between the correlation coefficient of the associated alarm accident module and the correlation coefficient of the current alarm accident module is the association dispersion between the two modules.
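The dispersion calculation can be checked with a few lines of code that apply the absolute-difference definition to the three STDEVP values quoted above. The function and variable names are illustrative only, and results are rounded to one decimal place to hide floating-point noise.

```python
def association_dispersion(coeff_associated, coeff_current):
    """Absolute difference between two correlation coefficients."""
    return abs(coeff_associated - coeff_current)


# STDEVP values for modules A, B and C from the example above.
stdevp = {"A": 2.4, "B": 2.6, "C": 3.4}

q_ab = association_dispersion(stdevp["B"], stdevp["A"])
q_ac = association_dispersion(stdevp["C"], stdevp["A"])
print(round(q_ab, 1), round(q_ac, 1))  # 0.2 1.0
```

Because the definition uses an absolute value, the dispersion is symmetric: swapping the two arguments gives the same result.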
S105, determining the optimal path to the root cause of the alarm accident according to the association dispersion.
In an optional embodiment, S105 includes S1051 to S1053.
Referring to Fig. 3, Fig. 3 is a simplified schematic diagram of S105 according to an embodiment of the present application.
S1051, taking the current alarm accident module as the central point, acquiring the first-layer associated alarm accident modules within the first preset correlation degree and the first preset correlation coefficient range.
It should be noted that, as can be seen from Fig. 3, many alarm accident modules are associated with alarm accident module A; analyzing every one of them from beginning to end each time would increase the analysis time and reduce efficiency. Therefore, in the embodiments of the application, the first-layer associated alarm accident modules within the first preset correlation degree and the first preset correlation coefficient range are acquired, those outside the first preset correlation degree range are removed, and only the associated alarm accident modules within the first preset correlation degree range are analyzed further. This reduces the workload of path analysis and improves its efficiency. It should also be noted that the associated alarm accident modules in each layer have different correlation degrees with the alarm accident module under examination; analyzing only the first-layer modules within the first preset correlation degree eliminates the modules with a low correlation degree to the current alarm accident module.
S1052, taking the current alarm accident module as the central point, acquiring, from the associated alarm accident modules acquired in the previous layer, the associated alarm accident modules within a second preset correlation degree and a target preset correlation coefficient range, wherein the first preset correlation degree is greater than the second preset correlation degree.
It can be understood that a single round of path analysis may not find the optimal path. In that case, the first-layer associated alarm accident modules that satisfy the condition, that is, those within the first preset correlation degree, are analyzed again, and this is repeated until the associated alarm accident modules fall within the target correlation coefficient range, at which point the path is the optimal path determined by the embodiments of the application.
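The layer-by-layer pruning of S1051 and S1052 can be sketched as follows. The sketch makes several assumptions that the text leaves open: "within a preset correlation degree" is read as "at least that degree", each layer is expanded from the modules kept in the previous layer, and all thresholds, ranges, and per-module values are hypothetical.

```python
def within_presets(module, min_degree, coeff_range):
    """True if the module's degree and coefficient satisfy the layer presets."""
    low, high = coeff_range
    return module["degree"] >= min_degree and low <= module["coeff"] <= high


def layered_candidates(center, children, stats, presets):
    """Walk outward from `center`, one preset per layer, pruning as we go."""
    frontier, layers = [center], []
    for min_degree, coeff_range in presets:
        kept = [
            child
            for parent in frontier
            for child in children.get(parent, [])
            if within_presets(stats[child], min_degree, coeff_range)
        ]
        if not kept:
            break
        layers.append(kept)
        frontier = kept
    return layers


children = {"A": ["B", "C", "F"], "B": ["D"], "C": ["E"], "F": ["G"]}
stats = {
    "B": {"degree": 0.9, "coeff": 2.6},
    "C": {"degree": 0.85, "coeff": 3.4},
    "F": {"degree": 0.3, "coeff": 2.5},  # pruned in layer 1: degree too low
    "D": {"degree": 0.7, "coeff": 2.8},
    "E": {"degree": 0.65, "coeff": 3.0},
    "G": {"degree": 0.9, "coeff": 2.5},  # never visited: parent F was pruned
}
presets = [(0.8, (2.0, 4.0)), (0.6, (2.0, 4.0))]  # first degree > second degree
print(layered_candidates("A", children, stats, presets))  # [['B', 'C'], ['D', 'E']]
```

Pruning F in the first layer means its downstream module G is never examined, which is where the reduction in path-analysis workload comes from.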
S1053, determining the optimal path to the root cause of the alarm accident according to the association dispersion of the associated alarm accident modules within the target correlation coefficient range.
As an example, the modules associated with alarm accident module A are alarm accident modules B and C, and the modules associated with B and C are alarm accident modules D and E, respectively. The association dispersion between D and B is 2, and the association dispersion between C and E is 1. There are then two critical paths to alarm accident module A: D -> B -> A, with an association dispersion of 4, and E -> C -> A, with an association dispersion of 2. E -> C -> A is the optimal path because its association dispersion is the lowest, so the root cause of the problem is determined: the root cause of accident A is module E.
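The path selection in this example can be reproduced with a short sketch that sums the association dispersion along each critical path and picks the smallest total. The dispersions D-B = 2 and C-E = 1 and the path totals 4 and 2 come from the example; the values B-A = 2 and C-A = 1 are not stated and are assumed here only because they make the totals agree. The `critical_paths` helper is hypothetical.

```python
def critical_paths(center, edges):
    """Yield (total_dispersion, path) for each path from an outermost module to `center`.

    `edges[m]` maps module m to its associated (upstream) modules and the
    association dispersion of each link.
    """
    def walk(node, total, path):
        upstream = edges.get(node, {})
        if not upstream:
            yield total, path
            return
        for nxt, dispersion in upstream.items():
            if nxt not in path:  # guard against cycles
                yield from walk(nxt, total + dispersion, [nxt] + path)

    yield from walk(center, 0, [center])


edges = {
    "A": {"B": 2, "C": 1},  # assumed values, chosen to match the stated totals
    "B": {"D": 2},          # from the example
    "C": {"E": 1},          # from the example
}
paths = sorted(critical_paths("A", edges))
best_total, best_path = paths[0]
print(paths)         # [(2, ['E', 'C', 'A']), (4, ['D', 'B', 'A'])]
print(best_path[0])  # E
```

The first element of the lowest-dispersion path is reported as the root cause, matching the conclusion that the root cause of accident A is module E.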
In an optional embodiment, after determining the optimal path to the root cause of the alarm accident according to the association dispersion, the method further comprises:
searching for the alarm root cause along the optimal path, and taking a processing action according to the alarm root cause.
In another aspect, a cloud platform-based alarm accident root cause optimal path determination system is further provided, comprising: a memory, a processor, and a computer program stored on the memory, the processor executing the computer program to implement any of the methods described above.
In an optional embodiment, the cloud platform-based alarm incident root cause optimal path determination system provided in the embodiment of the present application further includes:
an event analysis module, configured to analyze, in time order, the alarm events of all modules within a period of time before the alarm accident occurred in the alarm accident module, and to store in the database the names of the alarming modules, the alarm content, and the alarm accident data information that occurred in that period;
a statistical label module, configured to compare the analysis result of the event analysis module against the association relation database table of the alarm accident module, establish labels of alarm association dispersion between the current alarm accident module and the other accident modules, and determine the association dispersion of the alarm association relation from each module to the current alarm accident module;
and an optimal path module, configured to calculate the critical paths from each alarm accident module to the current alarm accident module and select the shortest of these critical paths as the optimal path, thereby determining the alarm accident module having the highest association relation with the current alarm accident module, namely the optimal path root cause of the accident.
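The first step performed by the event analysis module, collecting the alarm events of a period before the accident in time order, can be sketched as follows. This is a minimal illustration, not the patented implementation: the `AlarmEvent` record, the `events_before` function, the inclusive window bounds, and the sample events are all assumptions made for the sketch, and the database storage step is omitted.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class AlarmEvent:
    module: str     # name of the module that raised the alarm
    content: str    # alarm content
    time: datetime  # when the alarm occurred


def events_before(events, accident_time, window):
    """Return alarm events within [accident_time - window, accident_time], oldest first."""
    start = accident_time - window
    selected = [e for e in events if start <= e.time <= accident_time]
    return sorted(selected, key=lambda e: e.time)


# Made-up sample data, for illustration only.
accident_time = datetime(2021, 3, 17, 12, 0)
events = [
    AlarmEvent("C", "timeout", datetime(2021, 3, 17, 11, 58)),
    AlarmEvent("E", "disk full", datetime(2021, 3, 17, 11, 55)),
    AlarmEvent("B", "high latency", datetime(2021, 3, 17, 10, 0)),  # outside the window
]
recent = events_before(events, accident_time, timedelta(minutes=10))
print([e.module for e in recent])
```

With a 10-minute window, only the events from modules E and C are retained, ordered by time; these would then be passed to the statistical label module.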
In yet another aspect, a computer-readable storage medium is provided, having computer-executable instructions stored therein, the computer-executable instructions being executable by a processor to implement the service failure monitoring method of any one of the above.
The above examples describe only some embodiments of the present invention. Although the description is relatively specific and detailed, it should not be construed as limiting the scope of the present invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of protection of the present invention. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (9)

1. A cloud platform based alarm accident root cause optimal path determination method is characterized by comprising the following steps:
acquiring related alarm information of an alarm accident occurrence module, wherein the related alarm information comprises the time point of the occurrence of the alarm accident;
acquiring information of alarm events of all alarm accident modules within preset time before the time point according to the time point of the alarm accident;
analyzing the information of the alarm events of all the alarm accident modules within the preset time before the time point to obtain the information of the associated alarm accident modules;
obtaining the association dispersion of the association alarm accident module and the alarm accident module according to the association alarm accident module information;
and determining the optimal path of the root cause of the alarm accident according to the associated dispersion.
2. The cloud platform based alarm accident root cause optimal path determination method according to claim 1, wherein the obtaining of the association dispersion between the associated alarm accident module and the current alarm accident module according to the associated alarm accident module information comprises:
acquiring the correlation degree between each related alarm accident module and the alarm accident module;
obtaining a correlation coefficient between each associated alarm accident module and the current alarm accident module according to the correlation degree;
and obtaining the association dispersion of the association alarm accident module and the alarm accident module according to the correlation coefficient.
3. The cloud platform based alarm accident root cause optimal path determination method according to claim 2, wherein the obtaining of the association dispersion of the associated alarm accident module and the current alarm accident module according to the correlation coefficient comprises:
taking the absolute value of the difference between the correlation coefficient of the associated alarm accident module and the correlation coefficient of the current alarm accident module as the association dispersion of the current alarm accident module.
4. The cloud platform based alarm accident root cause optimal path determination method according to claim 2, wherein the obtaining of the correlation coefficient between each associated alarm accident module and the current alarm accident module according to the correlation degree comprises:
acquiring the frequency with which each associated alarm accident module is called within the preset time;
and obtaining, from the frequency with which each associated alarm accident module is called within the preset time, the mean square error coefficient of the calls to each associated alarm accident module within the preset time, and taking the mean square error coefficient as the correlation coefficient between each associated alarm accident module and the current alarm accident module.
5. The cloud platform alarm incident root cause optimal path determination method according to claim 2, wherein the obtaining the correlation degree between each associated alarm incident module and the current alarm incident module comprises:
acquiring the change state of each associated alarm accident module and the current alarm accident module to the preset items;
acquiring a time sequence of each associated alarm accident module and the alarm accident module;
and obtaining the correlation degree of each associated alarm accident module and the current alarm accident module according to the change state of each associated alarm accident module and the current alarm accident module to the preset items and the time sequence of each associated alarm accident module and the current alarm accident module.
6. The cloud platform-based alarm incident root cause optimal path determination method according to claim 1, wherein the analyzing information of alarm events occurring in all alarm incident modules within a preset time before the time point to obtain associated alarm incident module information comprises:
acquiring alarm logs of all alarm accident modules within a preset time before the time sequence, the alarm accident modules and the access amount of the alarm accident modules;
and collecting the alarm log, the alarm accident module and the access amount of the alarm accident module and the alarm accident module information related to the alarm accident module to obtain the related alarm accident module information.
7. The cloud platform alarm incident root cause optimal path determination method according to claim 2, wherein determining an alarm incident root cause optimal path according to the associated dispersion comprises:
taking the alarm accident module as a central point, and acquiring a related alarm accident module of a first layer within a first preset correlation degree and a first preset correlation coefficient range;
taking the alarm accident module as a central point, acquiring an associated alarm accident module within a second preset correlation degree and a target preset correlation coefficient range from an associated alarm accident module acquired in a previous layer, wherein the first preset correlation degree is greater than the second preset correlation degree;
and determining the alarm accident root cause optimal path according to the associated dispersion of the associated alarm accident module within the target correlation coefficient range.
8. The cloud platform alarm incident root cause optimal path determination method of claim 1, wherein after determining an alarm incident root cause optimal path according to the associated dispersion, the method further comprises:
and searching an alarm root cause according to the optimal path of the alarm accident root cause, and performing a processing action according to the alarm root cause.
9. A cloud platform based alarm incident root cause optimal path determination system, the system comprising: memory, processor and computer program stored on the memory, characterized in that the processor executes the computer program to implement the method of any of claims 1-8.
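The computation recited in claims 3 and 4 can be illustrated with a minimal sketch. The claims do not define the "mean square error coefficient"; the sketch assumes it is the coefficient of variation (population standard deviation divided by the mean) of per-interval call counts within the preset time. That interpretation, the function names, and the sample counts are assumptions and are not taken from the patent.

```python
from statistics import mean, pstdev


def correlation_coefficient(call_counts):
    """Coefficient of variation of per-interval call counts.

    ASSUMPTION: stands in for the "mean square error coefficient" of claim 4.
    """
    avg = mean(call_counts)
    if avg == 0:
        return 0.0  # module was never called in the window; treat variation as zero
    return pstdev(call_counts) / avg


def association_dispersion(associated_counts, current_counts):
    """Absolute difference of the two correlation coefficients (claim 3)."""
    return abs(
        correlation_coefficient(associated_counts)
        - correlation_coefficient(current_counts)
    )


# Made-up call counts per interval, for illustration only.
associated = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, population std 2
current = [5, 5, 5, 5]                  # constant, so the coefficient is 0
print(association_dispersion(associated, current))
```

A lower result indicates that the associated module's call pattern varies about as much as that of the current alarm accident module, which under this reading corresponds to a closer association.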
CN202110287133.1A 2021-03-17 2021-03-17 Cloud platform-based method and system for determining optimal path of alarm accident root cause Active CN112905479B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110287133.1A CN112905479B (en) 2021-03-17 2021-03-17 Cloud platform-based method and system for determining optimal path of alarm accident root cause

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110287133.1A CN112905479B (en) 2021-03-17 2021-03-17 Cloud platform-based method and system for determining optimal path of alarm accident root cause

Publications (2)

Publication Number Publication Date
CN112905479A true CN112905479A (en) 2021-06-04
CN112905479B CN112905479B (en) 2024-05-10

Family

ID=76105310

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110287133.1A Active CN112905479B (en) 2021-03-17 2021-03-17 Cloud platform-based method and system for determining optimal path of alarm accident root cause

Country Status (1)

Country Link
CN (1) CN112905479B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102546205A (en) * 2010-12-20 2012-07-04 中国移动通信集团公司 Method and device for generating fault relation and determining fault
CN106789138A (en) * 2015-11-23 2017-05-31 中国移动通信集团广西有限公司 A kind of method and device of network alarm association analysis
KR20190096706A (en) * 2018-02-09 2019-08-20 주식회사 케이티 Method and Apparatus for Monitoring Abnormal of System through Service Relevance Tracking
CN112052151A (en) * 2020-10-09 2020-12-08 腾讯科技(深圳)有限公司 Fault root cause analysis method, device, equipment and storage medium
CN112152852A (en) * 2020-09-23 2020-12-29 创新奇智(北京)科技有限公司 Root cause analysis method, device, equipment and computer storage medium

Also Published As

Publication number Publication date
CN112905479B (en) 2024-05-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant