CN112506763A - Automatic positioning method and device for database system fault root - Google Patents

Info

Publication number
CN112506763A
Authority
CN
China
Prior art keywords
monitoring index
monitoring
abnormal
relation
root cause
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011372173.8A
Other languages
Chinese (zh)
Inventor
裴丹
刘平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202011372173.8A priority Critical patent/CN112506763A/en
Publication of CN112506763A publication Critical patent/CN112506763A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Preventing errors by testing or debugging software
    • G06F11/3668Software testing
    • G06F11/3672Test management
    • G06F11/3688Test management for test execution, e.g. scheduling of test suites

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application provides a method and a device for automatically positioning a fault root cause of a database system, and relates to the technical field of data processing, wherein the method comprises the following steps: carrying out anomaly detection on all monitoring indexes of an anomaly database to obtain anomaly monitoring index data; constructing a monitoring index relation graph according to the abnormal monitoring index data; analyzing and sequencing each monitoring index in the monitoring index relation graph according to a preset algorithm; and determining a root cause monitoring index according to the sequencing result. Therefore, the problem of automatic positioning of the root cause monitoring index of the fault system is solved, and the efficiency of positioning the root cause monitoring index is improved, so that the fault system can be quickly recovered to be normal.

Description

Automatic positioning method and device for database system fault root
Technical Field
The application relates to the technical field of data processing, in particular to a method and a device for automatically positioning a fault root cause of a database system.
Background
When a computer system breaks down, operation and maintenance personnel need to rapidly analyze a large number of abnormal monitoring indexes on the abnormal machine, locate the root cause indexes, and then take measures to stop the damage. In an actual production environment, analyzing this large number of abnormal monitoring indexes depends on manual work by operation and maintenance personnel, so locating the root cause index takes a long time and the system remains in a fault state for a long time.
Disclosure of Invention
The present application is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, a first objective of the present application is to provide an automatic location method for a root cause of a fault in a database system, which solves the problem of automatic location of a root cause monitoring index of a fault system, and improves the efficiency of locating the root cause monitoring index, so that the fault system can be quickly recovered to normal.
The second purpose of the present application is to provide an automatic database system fault root cause locating device.
To achieve the above object, an embodiment of a first aspect of the present application provides a method for automatically locating a failure root cause of a database system, including:
carrying out anomaly detection on all monitoring indexes of an anomaly database to obtain anomaly monitoring index data;
constructing a monitoring index relation graph according to the abnormal monitoring index data;
analyzing and sequencing each monitoring index in the monitoring index relation graph according to a preset algorithm;
and determining a root cause monitoring index according to the sequencing result.
According to the automatic positioning method for the database system fault root, all monitoring indexes of an abnormal database are subjected to abnormal detection, and abnormal monitoring index data are obtained; constructing a monitoring index relation graph according to the abnormal monitoring index data; analyzing and sequencing each monitoring index in the monitoring index relation graph according to a preset algorithm; and determining a root cause monitoring index according to the sequencing result. Therefore, the problem of automatic positioning of the root cause monitoring index of the fault system is solved, and the efficiency of positioning the root cause monitoring index is improved, so that the fault system can be quickly recovered to be normal.
In an embodiment of the present application, the performing anomaly detection on all monitoring indexes of an anomaly database to obtain anomaly monitoring index data includes:
acquiring a system abnormal time period of each monitoring index from a time sequence database, and acquiring all monitoring index data corresponding to the system abnormal time period and a time period with a preset difference from the system abnormal time period;
and analyzing all monitoring index data by using a clustering-based robust anomaly detection algorithm to obtain the anomaly monitoring index data.
In an embodiment of the present application, the constructing a monitoring index relation graph according to the abnormal monitoring index data includes:
acquiring the relation between each abnormal monitoring index in the abnormal monitoring index data;
and constructing the monitoring index relation graph by taking each abnormal monitoring index as a point and taking the relation between each abnormal monitoring index as a side.
In an embodiment of the application, the analyzing and sorting the monitoring indexes in the monitoring index relation graph according to a preset algorithm includes:
analyzing the monitoring index relation graph by a positioning algorithm based on a weight type webpage access evaluation method to obtain the weight of a relation edge between each abnormal monitoring index in the monitoring index relation graph;
and sequencing the monitoring indexes according to the weight of the relation edge between the abnormal monitoring indexes.
In an embodiment of the present application, the method for automatically locating a database system fault root further includes:
and sending the root cause monitoring index to target equipment for display.
In order to achieve the above object, a second aspect of the present application provides an automatic database system fault root cause locating device, including:
the acquisition module is used for carrying out abnormity detection on all monitoring indexes of the abnormity database and acquiring abnormity monitoring index data;
the construction module is used for constructing a monitoring index relation graph according to the abnormal monitoring index data;
the analysis module is used for analyzing and sequencing all monitoring indexes in the monitoring index relation graph according to a preset algorithm;
and the determining module is used for determining the root cause monitoring index according to the sequencing result.
According to the automatic positioning device for the database system fault root cause, all monitoring indexes of an abnormal database are subjected to abnormal detection, and abnormal monitoring index data are obtained; constructing a monitoring index relation graph according to the abnormal monitoring index data; analyzing and sequencing each monitoring index in the monitoring index relation graph according to a preset algorithm; and determining a root cause monitoring index according to the sequencing result. Therefore, the problem of automatic positioning of the root cause monitoring index of the fault system is solved, and the efficiency of positioning the root cause monitoring index is improved, so that the fault system can be quickly recovered to be normal.
In an embodiment of the present application, the obtaining module is configured to:
acquiring a system abnormal time period of each monitoring index from a time sequence database, and acquiring all monitoring index data corresponding to the system abnormal time period and a time period with a preset difference from the system abnormal time period;
and analyzing all monitoring index data by using a clustering-based robust anomaly detection algorithm to obtain the anomaly monitoring index data.
In one embodiment of the present application, the building module is configured to:
acquiring the relation between each abnormal monitoring index in the abnormal monitoring index data;
and constructing the monitoring index relation graph by taking each abnormal monitoring index as a point and taking the relation between each abnormal monitoring index as a side.
In an embodiment of the application, the analysis module is specifically configured to:
analyzing the monitoring index relation graph by a positioning algorithm based on a weight type webpage access evaluation method to obtain the weight of a relation edge between each abnormal monitoring index in the monitoring index relation graph;
and sequencing the monitoring indexes according to the weight of the relation edge between the abnormal monitoring indexes.
In an embodiment of the present application, the apparatus further includes:
and the sending module is used for sending the root cause monitoring index to target equipment for displaying.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic flowchart illustrating a method for automatically locating a failure root cause of a database system according to an embodiment of the present application;
FIG. 2 is an example of a smoothed noise segment according to an embodiment of the present application;
FIG. 3 is an example of a smoothing segment according to an embodiment of the present application;
FIG. 4 is a smooth processing procedure of a real anomaly monitoring index according to an embodiment of the present application;
FIG. 5 is a weighted undirected monitoring index relationship diagram according to an embodiment of the present application;
FIG. 6 is a diagram illustrating an exemplary structure of an automatic database system fault root cause location system according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an automatic database system fault root cause locating device according to an embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
The database system fault root automatic positioning method and device according to the embodiment of the application are described below with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of a method for automatically locating a failure root cause of a database system according to an embodiment of the present application.
As shown in fig. 1, the automatic database system fault root locating method includes the following steps:
step 101, performing anomaly detection on all monitoring indexes of an anomaly database to obtain anomaly monitoring index data.
In the embodiment of the application, the system abnormal time period of each monitoring index is obtained from a time sequence database, and all monitoring index data corresponding to the system abnormal time period and the time period with the preset difference value from the system abnormal time period are obtained; and analyzing all monitoring index data by using a clustering-based robust anomaly detection algorithm to obtain anomaly monitoring index data.
In the embodiment of the application, anomaly detection is performed on all monitoring indexes of the abnormal database. Historical data of the monitoring indexes (such as CPU utilization rate, disk utilization rate and the like) are stored in a time sequence database; when anomaly detection is performed, the time sequence database is queried to obtain the data of each monitoring index in the vicinity of the system abnormal time period for analysis. In an actual production environment, even if the system is in a normal state, the monitoring index data of the system may contain noise and irregular fluctuation.
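As a minimal sketch of the window selection described above, the snippet below takes the system abnormal time period, extends it backwards by a preset offset, and keeps only the samples that fall inside the resulting analysis window. The function names, the in-memory list standing in for the time sequence database, and the 30-minute offset are all illustrative assumptions, not details from the application.

```python
from datetime import datetime, timedelta

def analysis_window(abnormal_start, abnormal_end, lookback):
    """Extend the abnormal period backwards by a preset offset (hypothetical helper)."""
    return abnormal_start - lookback, abnormal_end

def select_points(series, window):
    """Keep the (timestamp, value) samples that fall inside the window, inclusive."""
    start, end = window
    return [(ts, value) for ts, value in series if start <= ts <= end]

# Hypothetical CPU-utilization samples, one per minute, standing in for a
# query against the time sequence database.
t0 = datetime(2020, 11, 30, 12, 0)
cpu = [(t0 + timedelta(minutes=i), 40.0 + i) for i in range(60)]

window = analysis_window(
    abnormal_start=t0 + timedelta(minutes=50),
    abnormal_end=t0 + timedelta(minutes=59),
    lookback=timedelta(minutes=30),  # assumed preset difference
)
points = select_points(cpu, window)
```

With this layout the right end of the window coincides with the abnormal period, which matches the assumption the later smoothing steps rely on.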
Therefore, when the system is abnormal, noise, normal fluctuation, and abnormal fluctuation in the monitoring data are mixed together, which affects the performance of the abnormality detection. Therefore, the method designs a robust anomaly detection algorithm based on clustering, and the algorithm can effectively detect anomalies of monitoring data mixed with noise, normal fluctuation and abnormal fluctuation.
In the embodiment of the application, the cluster-based robust anomaly detection algorithm mainly divides the monitoring index into different segments through clustering, distinguishes a noise data segment and an anomaly fluctuation segment, and then carries out anomaly monitoring based on the cluster-based smoothing algorithm.
Specifically, the smoothing algorithm is a loop algorithm, and some of the noise segments are smoothed in each loop. The algorithm loops several times until all noise segments are smoothed. The input of the algorithm is the raw data $x_i$ of the monitoring index, and the output is the monitoring index data after smoothing (its symbol appears as formula image BDA0002806472650000041 in the original publication).
In each cycle, the monitoring index data are first clustered into two classes, for example a normal class and an abnormal class, by a Gaussian mixture model; accordingly, the number of clusters of the Gaussian mixture model is set to two. The monitoring index data are then divided into different segments according to the clustering result.
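The segmentation step can be sketched as follows, assuming the per-point class labels (0 or 1) have already been produced by a two-component Gaussian mixture model, which is not shown here. The function name and the `(start, end, label)` tuple layout are illustrative choices.

```python
from itertools import groupby

def split_into_segments(labels):
    """Group consecutive equal labels into (start, end, label) runs; end is exclusive."""
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((start, start + length, label))
        start += length
    return segments

# Hypothetical cluster labels for 11 consecutive data points.
labels = [0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1]
segments = split_into_segments(labels)
```

Each run of identical labels becomes one segment, so a single point that the clustering assigns to the other class shows up as a segment of length one.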
In the embodiment of the present application, $s_j$ denotes the $j$-th segment and $|s_j|$ denotes the length of $s_j$. A noise segment is shorter than its neighbors: when the length of $s_j$ is less than the lengths of both $s_{j-1}$ and $s_{j+1}$, that is, $\min\{|s_{j-1}|, |s_{j+1}|\} > |s_j|$, $s_j$ is a noise segment and is smoothed using data from its neighbor. As shown in FIG. 2, if $s_j$ is a noise segment, its data are replaced with data randomly sampled from $s_{j+1}$.
The reasons for sampling from $s_{j+1}$ are as follows. Suppose the monitoring index is divided into $k$ segments. If the left end of the analysis window happens to fall in a noise region, $s_0$ is a noise segment; if $s_1$ is then also a noise segment, $s_1$ cannot be smoothed using data from the noise segment $s_0$. Further, since the right end of the analysis window lies in the system abnormal time period, the last segment $s_{k-1}$ may be an abnormal segment. If $s_{k-2}$ is a noise segment, smoothing $s_{k-2}$ with data from the abnormal segment $s_{k-1}$ does not affect the result of the anomaly detection. Thus, the noise segment $s_j$ is smoothed using data randomly sampled from segment $s_{j+1}$.
Finally, $s_0$ and $s_{k-1}$ require separate treatment. First, if $s_0$ belongs to the same class as the noise segments that have already been smoothed, then $s_0$ is also a noise segment, and it is smoothed using data randomly sampled from $s_1$. If no noise segment has been smoothed, $s_0$ is smoothed using data randomly sampled from $s_1$ when the length of $s_1$ is greater than that of $s_0$.
FIG. 3 shows an example of smoothing segment $s_0$. Because $s_{k-1}$ lies in the system abnormal time period, $s_{k-1}$ is not smoothed. FIG. 4 shows the smoothing process for a real abnormal monitoring index. It can be seen that after two smoothing passes, all the noise data are smoothed out and the abnormal data are preserved.
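One smoothing pass over the interior segments might look like the sketch below. It assumes per-point class labels are already available, treats a segment as noise when it is strictly shorter than both neighbors, and overwrites it with values sampled at random from the following segment. The separate handling of the first and last segments is deliberately left out, and all names are hypothetical.

```python
import random
from itertools import groupby

def runs(labels):
    """Return (start, end, label) runs of equal labels; end is exclusive."""
    out, start = [], 0
    for label, group in groupby(labels):
        n = len(list(group))
        out.append((start, start + n, label))
        start += n
    return out

def smooth_once(values, labels, rng=random):
    """Replace each interior noise segment with samples from its right neighbor."""
    values, labels = list(values), list(labels)
    segs = runs(labels)
    # Only interior segments 1 .. k-2 are considered in this sketch.
    for j in range(1, len(segs) - 1):
        prev_start, prev_end, _ = segs[j - 1]
        start, end, _ = segs[j]
        next_start, next_end, next_label = segs[j + 1]
        length = end - start
        if length < min(prev_end - prev_start, next_end - next_start):
            donor = values[next_start:next_end]
            values[start:end] = rng.choices(donor, k=length)
            labels[start:end] = [next_label] * length
    return values, labels

# A one-point spike (noise) inside a normal region, followed by a real shift.
values = [1, 1, 1, 1, 9, 1, 1, 1, 1, 9, 9, 9, 9, 9]
labels = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1]
smoothed_values, smoothed_labels = smooth_once(values, labels)
```

In this example the single-point spike is absorbed into the surrounding normal data, while the final five-point segment is left untouched. A full implementation would repeat clustering and smoothing until no noise segment remains.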
After the smoothing processing, the monitoring index is finally divided into a plurality of segments, with a fluctuation between every two adjacent segments. The fluctuation closest to the time of the system fault is the one that may be associated with the fault. Thus, the present application is concerned only with the fluctuation between $s_{k-2}$ and $s_{k-1}$, and the degree of abnormality of this fluctuation is measured by z-score.
First, the z-score of each data point $x$ in $s_{k-1}$ is calculated from the mean and the standard deviation (std) of the data in $s_{k-2}$:
$$z(x) = \frac{x - \mathrm{mean}(s_{k-2})}{\mathrm{std}(s_{k-2})}$$
The mean of the z-scores of all data points in $s_{k-1}$ is used as the z-score of $s_{k-1}$. Then, the 3-sigma rule is used to judge whether the fluctuation between $s_{k-2}$ and $s_{k-1}$ is an abnormal fluctuation: if the mean z-score of all data points in $s_{k-1}$ exceeds three standard deviations, the fluctuation is an abnormal fluctuation, and the corresponding monitoring index is an abnormal monitoring index.
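The z-score check on the last two segments can be sketched as below. The use of the population standard deviation, the absolute value in the comparison, and the error raised for a constant reference segment are assumptions made for illustration; the application does not specify them.

```python
from statistics import fmean, pstdev

def segment_z_score(reference, last):
    """Mean z-score of `last`, standardized by the mean and std of `reference`."""
    mu = fmean(reference)
    sigma = pstdev(reference)  # population standard deviation (assumption)
    if sigma == 0:
        raise ValueError("reference segment is constant; z-score is undefined")
    return fmean([(x - mu) / sigma for x in last])

def is_abnormal_fluctuation(reference, last, threshold=3.0):
    """Apply the 3-sigma rule to the mean z-score (absolute value is an assumption)."""
    return abs(segment_z_score(reference, last)) > threshold

# Hypothetical data: s_{k-2} hovers around 10, s_{k-1} either jumps or stays put.
s_prev = [10, 11, 9, 10, 10, 11, 9, 10]
jumped = [20, 21, 19]
steady = [10, 10.5, 9.5]

jump_flag = is_abnormal_fluctuation(s_prev, jumped)
steady_flag = is_abnormal_fluctuation(s_prev, steady)
```

Taking the absolute value means that a sharp drop is flagged in the same way as a sharp rise; whether that is desired depends on the monitoring index.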
Step 102, constructing a monitoring index relation graph according to the abnormal monitoring index data.
In the embodiment of the application, the relation among all the abnormal monitoring indexes in the abnormal monitoring index data is obtained, and a monitoring index relation graph is constructed by taking all the abnormal monitoring indexes as points and taking the relation among all the abnormal monitoring indexes as sides.
In the embodiment of the application, the dependency graph construction algorithm can automatically construct a weighted undirected dependency graph to accurately represent the dependency relationship between abnormal monitoring indexes, and the constructed monitoring index relationship graph between the monitoring indexes is the core of root cause positioning.
Because existing automatic dependency graph construction algorithms can infer wrong dependency relationships, the application provides a Weighted Undirected Dependency Graph (WUDG), namely the monitoring index relation graph, which can represent the dependency relationships between monitoring indexes more accurately. The core idea of constructing the weighted undirected dependency graph is as follows: if a dependency exists between two monitoring indexes, the two monitoring indexes are not independent. The weighted undirected dependency graph is therefore based only on whether a dependency exists between monitoring indexes (an undirected graph), and the direction of the dependency (a directed graph) does not need to be inferred. Determining whether a dependency exists can be more accurate than inferring the direction of the dependency.
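To make the undirected structure concrete, the sketch below enumerates the candidate edges of a fully connected graph over a set of abnormal monitoring indexes. Each unordered pair appears exactly once, which is what distinguishes this representation from a directed graph. The metric names are hypothetical.

```python
from itertools import combinations

def fully_connected_edges(metrics):
    """Return every unordered pair of metrics once, in a stable sorted order."""
    return list(combinations(sorted(metrics), 2))

# Hypothetical abnormal monitoring indexes.
abnormal_metrics = ["cpu_util", "disk_io", "qps", "response_time"]
edges = fully_connected_edges(abnormal_metrics)
```

For n abnormal indexes this yields n(n-1)/2 candidate edges, each of which would then be given a weight by an independence test.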
Firstly, a fully connected graph among all abnormal monitoring indexes is constructed, in which nodes represent the monitoring indexes and edges represent the dependency relationships among them. The strength of each dependency relationship is then calculated by performing an independence test on the two monitoring indexes joined by the edge. For example, the independence between two monitoring indexes X and Y can be evaluated by the Fisher-Z algorithm. The Fisher-Z algorithm is an independence test based on the Pearson correlation coefficient that combines the Fisher-Z transformation with the partial correlation coefficient: the Fisher-Z transformation is used to evaluate the overall correlation, and the partial correlation coefficient is used to evaluate the effect of other nodes. The Fisher-Z test between X and Y can be expressed as
[Formula image BDA0002806472650000061 in the original publication]
where m denotes the number of monitoring index data points and r denotes the partial correlation coefficient between X and Y.
Specifically, the strength of the dependency relationship between two abnormal monitoring indexes can be measured by the p-value of the null hypothesis in the Fisher-Z test. A larger p-value indicates a weaker dependency relationship between the two abnormal monitoring indexes, and conversely a smaller p-value indicates a stronger dependency relationship. Therefore, 1/p can be taken as the weight of an edge to finally generate the weighted undirected dependency graph. FIG. 5 shows an example of a monitoring index relation graph, where $p_{mn}$ represents the p-value of the Fisher-Z test between monitoring indexes n and m.
Step 103, analyzing and sequencing each monitoring index in the monitoring index relation graph according to a preset algorithm.
Step 104, determining a root cause monitoring index according to the sequencing result.
In the embodiment of the application, the monitoring index relational graph is analyzed by a positioning algorithm based on a weighted web access evaluation method, the weight of the relation edge between abnormal monitoring indexes in the monitoring index relational graph is obtained, and the monitoring indexes are sorted according to the weight of the relation edge between the abnormal monitoring indexes.
In the embodiment of the application, the root cause related indexes are positioned based on the constructed monitoring index relation graph, and after the abnormal monitoring indexes are analyzed, the weighted undirected monitoring index relation graph among the abnormal monitoring indexes is generated.
It can be understood that a weighted undirected dependency graph between abnormal monitoring indexes, i.e. a monitoring index relation graph, is generated, and the graph contains root cause related indexes and symptom indexes. Therefore, in the application, it is assumed that the root cause related indexes are indexes with the largest influence in the Weighted undirected monitoring index relation graph, and a Weighted web access evaluation method (Weighted PageRank) can measure the influence of the nodes in the Weighted undirected graph. Therefore, the method designs a positioning algorithm based on a Weighted webpage access evaluation method (Weighted PageRank) to analyze the monitoring index relation graph and finally outputs a possible ranking result of the root cause related indexes, namely a diagnosis result of the root cause monitoring index automatic positioning technology.
Specifically, the weighted web page access evaluation method can measure the influence of the nodes in the weighted undirected graph. For a monitoring index u, the score of u is calculated based on the weighted web page access evaluation method as
[Formula image BDA0002806472650000062 in the original publication]
where B(u) represents the set of nodes directly connected with u (the abnormal monitoring indexes that have a dependency relationship with u), the weight term represents the weight of the edge between node u and node v, and d is a constant that is set to 0.85. All the abnormal monitoring indexes are sorted by the calculated score, and the abnormal monitoring indexes ranked at the front are the possible root cause related indexes.
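A common form of Weighted PageRank on an undirected graph is sketched below. Since the scoring formula in the original is only available as an image, the update rule used here is an assumption: score(u) = (1 - d) + d * sum over neighbors v of (w_uv / total weight of v's edges) * score(v), iterated a fixed number of times. The metric names and edge weights are hypothetical.

```python
def weighted_pagerank(edges, d=0.85, iterations=100):
    """Iterate a weighted PageRank update on an undirected weighted graph.

    `edges` maps an unordered pair (u, v) to its weight, e.g. 1/p.
    """
    neighbors = {}
    for (u, v), w in edges.items():
        neighbors.setdefault(u, {})[v] = w
        neighbors.setdefault(v, {})[u] = w
    # Total edge weight attached to each node, used to normalize contributions.
    strength = {u: sum(nb.values()) for u, nb in neighbors.items()}
    score = {u: 1.0 for u in neighbors}
    for _ in range(iterations):
        score = {
            u: (1 - d) + d * sum(w / strength[v] * score[v]
                                 for v, w in neighbors[u].items())
            for u in neighbors
        }
    return score

# Hypothetical weighted undirected dependency graph.
graph = {("cpu", "qps"): 10.0, ("cpu", "io"): 8.0, ("qps", "io"): 1.0}
scores = weighted_pagerank(graph)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Here `cpu` is attached to the two heaviest edges and ends up first in the ranking, so it would be reported as the most likely root cause related index.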
In the embodiment of the application, the root cause monitoring index is sent to the target device to be displayed, so that the root cause monitoring index can be rapidly known, system faults are processed, and a fault system can be rapidly recovered to be normal.
Specifically, as shown in FIG. 6, Web services require an underlying database to support their critical business and real-time applications. The automatic root cause monitoring index positioning technology is triggered when the performance of the database system becomes abnormal, for example when the response time of the database suddenly increases. It then performs anomaly detection, constructs the monitoring index relation graph, and analyzes the graph to locate the root cause monitoring indexes. After the analysis, operation and maintenance personnel can rapidly take loss-stopping measures based on the diagnosis result so that the system recovers to normal as soon as possible. Common loss-stopping measures include SQL (database language) flow control, SQL optimization, system capacity expansion, and the like.
With the rapid development of cloud services, performance monitoring and root cause analysis of the underlying database clusters that support them face greater and greater challenges. For a large-scale underlying database cluster supporting cloud services, hundreds of database anomalies per day make manual anomaly diagnosis impossible. The system of the present application can automatically diagnose performance anomalies of an online system. When the system is abnormal, the root cause related indexes can be quickly located, so that operation and maintenance personnel can rapidly analyze the problem and take loss-stopping measures, and the system can be recovered to normal in time. Operation and maintenance personnel can also focus on the root cause related indexes located by the algorithm, which greatly reduces the influence of alarm storms.
According to the automatic positioning method for the database system fault root, all monitoring indexes of an abnormal database are subjected to abnormal detection, and abnormal monitoring index data are obtained; constructing a monitoring index relation graph according to the abnormal monitoring index data; analyzing and sequencing each monitoring index in the monitoring index relation graph according to a preset algorithm; and determining a root cause monitoring index according to the sequencing result. Therefore, the problem of automatic positioning of the root cause monitoring index of the fault system is solved, and the efficiency of positioning the root cause monitoring index is improved, so that the fault system can be quickly recovered to be normal.
In order to implement the above embodiments, the present application further provides an automatic positioning device for a database system fault root cause.
Fig. 7 is a schematic structural diagram of an automatic positioning device for a database system fault root cause according to an embodiment of the present application.
As shown in fig. 7, the automatic database system fault root cause locating device includes: an acquisition module 210, a construction module 220, an analysis module 230, and a determination module 240.
The obtaining module 210 is configured to perform anomaly detection on all monitoring indexes of the anomaly database, and obtain anomaly monitoring index data.
The constructing module 220 is configured to construct a monitoring index relation graph according to the abnormal monitoring index data.
And the analysis module 230 is configured to analyze and sort the monitoring indexes in the monitoring index relation graph according to a preset algorithm.
And a determining module 240, configured to determine the root cause monitoring indicator according to the sorting result.
In an embodiment of the present application, the obtaining module 210 is configured to: acquiring a system abnormal time period of each monitoring index from a time sequence database, and acquiring all monitoring index data corresponding to the system abnormal time period and a time period with a preset difference from the system abnormal time period; and analyzing all monitoring index data by using a clustering-based robust anomaly detection algorithm to obtain the anomaly monitoring index data.
In one embodiment of the present application, a build module 220 is configured to: acquiring the relation between each abnormal monitoring index in the abnormal monitoring index data; and constructing the monitoring index relation graph by taking each abnormal monitoring index as a point and taking the relation between each abnormal monitoring index as a side.
In an embodiment of the present application, the analysis module 230 is specifically configured to: analyzing the monitoring index relation graph by a positioning algorithm based on a weight type webpage access evaluation method to obtain the weight of a relation edge between each abnormal monitoring index in the monitoring index relation graph; and sequencing the monitoring indexes according to the weight of the relation edge between the abnormal monitoring indexes.
In an embodiment of the present application, the apparatus further includes: and the sending module is used for sending the root cause monitoring index to target equipment for displaying.
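The module structure described above can be sketched as a small pipeline in which each module is an injected callable. The class name, field names, and the stub callables in the usage example are all hypothetical placeholders used only to show how the modules are chained; they are not the detection, construction, or ranking algorithms of the application.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Graph = Dict[Tuple[str, str], float]

@dataclass
class RootCauseLocator:
    detect: Callable[[Dict[str, List[float]]], List[str]]  # acquisition module
    build_graph: Callable[[List[str]], Graph]               # construction module
    rank: Callable[[Graph], List[str]]                      # analysis module
    send: Callable[[List[str]], None]                       # sending module
    top_n: int = 1

    def run(self, metrics: Dict[str, List[float]]) -> List[str]:
        abnormal = self.detect(metrics)
        graph = self.build_graph(abnormal)
        ranking = self.rank(graph)
        root_causes = ranking[: self.top_n]  # determining module
        self.send(root_causes)
        return root_causes

# Stub callables, for illustration only.
sent: List[List[str]] = []
locator = RootCauseLocator(
    detect=lambda m: [name for name, vals in m.items() if max(vals) > 90],
    build_graph=lambda names: {(a, b): 1.0 for a in names for b in names if a < b},
    rank=lambda g: sorted({node for edge in g for node in edge}),
    send=sent.append,
)
result = locator.run({"cpu": [50, 95], "io": [10, 12], "qps": [80, 99]})
```

Keeping each module behind a callable makes it straightforward to swap in a different detector or ranking method without changing the surrounding flow.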
According to the automatic positioning device for the database system fault root cause, all monitoring indexes of an abnormal database are subjected to abnormal detection, and abnormal monitoring index data are obtained; constructing a monitoring index relation graph according to the abnormal monitoring index data; analyzing and sequencing each monitoring index in the monitoring index relation graph according to a preset algorithm; and determining a root cause monitoring index according to the sequencing result. Therefore, the problem of automatic positioning of the root cause monitoring index of the fault system is solved, and the efficiency of positioning the root cause monitoring index is improved, so that the fault system can be quickly recovered to be normal.
It should be noted that the foregoing explanation of the embodiment of the method for automatically locating a database system fault root cause is also applicable to the apparatus for automatically locating a database system fault root cause of the embodiment, and is not repeated here.
In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, such terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the various embodiments or examples described in this specification, and the features of different embodiments or examples, provided they do not contradict each other.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (10)

1. A method for automatic positioning of a database system fault root cause, characterized by comprising the following steps:
performing anomaly detection on all monitoring indexes of an abnormal database to obtain abnormal monitoring index data;
constructing a monitoring index relation graph according to the abnormal monitoring index data;
analyzing and ranking each monitoring index in the monitoring index relation graph according to a preset algorithm;
and determining a root cause monitoring index according to the ranking result.
2. The method of claim 1, wherein performing anomaly detection on all monitoring indexes of the abnormal database to obtain abnormal monitoring index data comprises:
acquiring, from a time series database, a system abnormal time period of each monitoring index, and acquiring all monitoring index data corresponding to the system abnormal time period and to a time period that differs from the system abnormal time period by a preset interval;
and analyzing all the monitoring index data with a clustering-based robust anomaly detection algorithm to obtain the abnormal monitoring index data.
3. The method of claim 1, wherein constructing the monitoring index relation graph according to the abnormal monitoring index data comprises:
acquiring the relations between the abnormal monitoring indexes in the abnormal monitoring index data;
and constructing the monitoring index relation graph by taking each abnormal monitoring index as a node and each relation between abnormal monitoring indexes as an edge.
4. The method of claim 1, wherein analyzing and ranking each monitoring index in the monitoring index relation graph according to a preset algorithm comprises:
analyzing the monitoring index relation graph with a positioning algorithm based on a weight type webpage access evaluation method to obtain the weight of the relation edge between each pair of abnormal monitoring indexes in the monitoring index relation graph;
and ranking the monitoring indexes according to the weights of the relation edges between the abnormal monitoring indexes.
5. The method of claim 1, further comprising:
sending the root cause monitoring index to a target device for display.
6. An automatic positioning device for a database system fault root cause, characterized by comprising:
an acquisition module configured to perform anomaly detection on all monitoring indexes of an abnormal database to obtain abnormal monitoring index data;
a construction module configured to construct a monitoring index relation graph according to the abnormal monitoring index data;
an analysis module configured to analyze and rank each monitoring index in the monitoring index relation graph according to a preset algorithm;
and a determining module configured to determine a root cause monitoring index according to the ranking result.
7. The apparatus of claim 6, wherein the acquisition module is specifically configured to:
acquire, from a time series database, a system abnormal time period of each monitoring index, and acquire all monitoring index data corresponding to the system abnormal time period and to a time period that differs from the system abnormal time period by a preset interval;
and analyze all the monitoring index data with a clustering-based robust anomaly detection algorithm to obtain the abnormal monitoring index data.
8. The apparatus of claim 6, wherein the construction module is specifically configured to:
acquire the relations between the abnormal monitoring indexes in the abnormal monitoring index data;
and construct the monitoring index relation graph by taking each abnormal monitoring index as a node and each relation between abnormal monitoring indexes as an edge.
9. The apparatus of claim 6, wherein the analysis module is specifically configured to:
analyze the monitoring index relation graph with a positioning algorithm based on a weight type webpage access evaluation method to obtain the weight of the relation edge between each pair of abnormal monitoring indexes in the monitoring index relation graph;
and rank the monitoring indexes according to the weights of the relation edges between the abnormal monitoring indexes.
10. The apparatus of claim 6, further comprising:
a sending module configured to send the root cause monitoring index to a target device for display.
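
As an illustration of the data acquisition and detection step recited in claims 2 and 7, the following Python sketch compares each monitoring index during the system abnormal time period against a reference window taken just before it. It rests on several assumptions: the "time period with a preset difference" is read as a lookback window of fixed length immediately preceding the abnormal period, and a median/MAD shift test is swapped in for the clustering-based robust anomaly detection algorithm, which the text does not specify. The function names, threshold, and sample data are hypothetical.

```python
from statistics import median


def split_windows(samples, anomaly_start, anomaly_end, lookback):
    """Split (timestamp, value) samples into reference and abnormal windows."""
    reference = [v for t, v in samples
                 if anomaly_start - lookback <= t < anomaly_start]
    abnormal = [v for t, v in samples if anomaly_start <= t <= anomaly_end]
    return reference, abnormal


def is_abnormal(reference, abnormal, threshold=3.0):
    """Flag an index whose median shifts far from its reference median.

    The shift is measured in units of the scaled median absolute deviation
    (MAD), a robust stand-in for the standard deviation.
    """
    if not reference or not abnormal:
        return False
    center = median(reference)
    mad = median(abs(v - center) for v in reference)
    scale = 1.4826 * mad
    if scale == 0:
        scale = 1e-9  # avoid division by zero for a flat reference window
    return abs(median(abnormal) - center) / scale > threshold


def abnormal_indexes(series_by_index, anomaly_start, anomaly_end, lookback):
    """Return the names of the monitoring indexes flagged as abnormal."""
    flagged = []
    for name, samples in series_by_index.items():
        reference, abnormal = split_windows(
            samples, anomaly_start, anomaly_end, lookback)
        if is_abnormal(reference, abnormal):
            flagged.append(name)
    return flagged


# Hypothetical samples: cpu_usage jumps at t=5, conn_count stays flat.
series = {
    "cpu_usage": [(0, 10), (1, 11), (2, 10), (3, 12), (4, 11),
                  (5, 40), (6, 42), (7, 41)],
    "conn_count": [(0, 5), (1, 5), (2, 6), (3, 5), (4, 5),
                   (5, 5), (6, 5), (7, 5)],
}
print(abnormal_indexes(series, anomaly_start=5, anomaly_end=7, lookback=5))
```

Only the flagged indexes would then be passed on as abnormal monitoring index data for relation graph construction.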
CN202011372173.8A 2020-11-30 2020-11-30 Automatic positioning method and device for database system fault root Pending CN112506763A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011372173.8A CN112506763A (en) 2020-11-30 2020-11-30 Automatic positioning method and device for database system fault root

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011372173.8A CN112506763A (en) 2020-11-30 2020-11-30 Automatic positioning method and device for database system fault root

Publications (1)

Publication Number Publication Date
CN112506763A true CN112506763A (en) 2021-03-16

Family

ID=74967789

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011372173.8A Pending CN112506763A (en) 2020-11-30 2020-11-30 Automatic positioning method and device for database system fault root

Country Status (1)

Country Link
CN (1) CN112506763A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102938708A (en) * 2012-11-05 2013-02-20 国网电力科学研究院 Alarm transmission mode based alarm correlation analysis system and analysis method thereof
US20160080476A1 (en) * 2014-08-11 2016-03-17 Systems & Technology Research, Llc Meme discovery system
US20160350294A1 (en) * 2015-05-31 2016-12-01 Thomson Reuters Global Resources Method and system for peer detection
US20180196835A1 (en) * 2017-01-06 2018-07-12 International Business Machines Corporation Root cause analysis of performance problems
CN108923952A (en) * 2018-05-31 2018-11-30 北京百度网讯科技有限公司 Method for diagnosing faults, equipment and storage medium based on service monitoring index
CN109992440A (en) * 2019-04-02 2019-07-09 北京睿至大数据有限公司 A kind of IT root accident analysis recognition methods of knowledge based map and machine learning
CN110493025A (en) * 2018-05-15 2019-11-22 中国移动通信集团浙江有限公司 It is a kind of based on the failure root of multilayer digraph because of the method and device of diagnosis
CN110888755A (en) * 2019-11-15 2020-03-17 亚信科技(中国)有限公司 Method and device for searching abnormal root node of micro-service system
CN111459695A (en) * 2020-03-12 2020-07-28 平安科技(深圳)有限公司 Root cause positioning method and device, computer equipment and storage medium

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
MINGHUA MA et al.: "Diagnosing root causes of intermittent slow queries in cloud databases", 《PROCEEDINGS OF THE VLDB ENDOWMENT》 *
PING LIU et al.: "FluxInfer: Automatic Diagnosis of Performance Anomaly for Online Database System", 《2020 IEEE 39TH INTERNATIONAL PERFORMANCE COMPUTING AND COMMUNICATIONS CONFERENCE (IPCCC)》 *
PING LIU et al.: "FluxRank: A Widely-Deployable Framework to Automatically Localizing Root Cause Machines for Software Service Failure Mitigation", 《2019 IEEE 30TH INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING (ISSRE)》 *
YUAN MENG et al.: "Localizing Failure Root Causes in a Microservice through Causality Inference", 《2020 IEEE/ACM 28TH INTERNATIONAL SYMPOSIUM ON QUALITY OF SERVICE (IWQOS)》 *
LI HAIBIN (李海斌) et al.: "An Unsupervised Anomaly Detection Method for Database User Behavior" (一种无监督的数据库用户行为异常检测方法), 《Journal of Chinese Computer Systems》 (小型微型计算机系统) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113342889A (en) * 2021-06-03 2021-09-03 中国工商银行股份有限公司 Distributed database management method, device, equipment and medium
CN113505044A (en) * 2021-09-09 2021-10-15 格创东智(深圳)科技有限公司 Database warning method, device, equipment and storage medium
CN114385451A (en) * 2022-01-11 2022-04-22 上海鹤优信息科技有限公司 Fault root cause analysis method
CN114553675A (en) * 2022-03-24 2022-05-27 中国联合网络通信集团有限公司 Fault network element processing method, device and storage medium
CN114553675B (en) * 2022-03-24 2023-05-09 中国联合网络通信集团有限公司 Fault network element processing method, device and storage medium
CN117389230A (en) * 2023-11-16 2024-01-12 广州中健中医药科技有限公司 Antihypertensive traditional Chinese medicine extract production control method and system
CN117389230B (en) * 2023-11-16 2024-06-07 广州中健中医药科技有限公司 Antihypertensive traditional Chinese medicine extract production control method and system

Similar Documents

Publication Publication Date Title
CN112506763A (en) Automatic positioning method and device for database system fault root
US10852357B2 (en) System and method for UPS battery monitoring and data analysis
KR102141391B1 (en) Failure data management method based on cluster estimation
US10373065B2 (en) Generating database cluster health alerts using machine learning
US7693982B2 (en) Automated diagnosis and forecasting of service level objective states
US20210397175A1 (en) Abnormality detection device, abnormality detection method, and program
US20200143292A1 (en) Signature enhancement for deviation measurement-based classification of a detected anomaly in an industrial asset
CN116450399B (en) Fault diagnosis and root cause positioning method for micro service system
CN117312997B (en) Intelligent diagnosis method and system for power management system
JP6714498B2 (en) Equipment diagnosis device and equipment diagnosis method
JP6540531B2 (en) Monitoring device and control method of monitoring device
JP6521096B2 (en) Display method, display device, and program
US12079070B2 (en) Alert similarity and label transfer
JP2020071624A (en) Abnormality diagnosing apparatus, abnormality diagnosing method and program
CN107391335B (en) Method and equipment for checking health state of cluster
KR102059112B1 (en) IoT STREAM DATA QUALITY MEASUREMENT INDICATORS AND PROFILING METHOD FOR INTERNET OF THINGS AND SYSTEM THEREFORE
CN112581719B (en) Semiconductor packaging process early warning method and device based on time sequence generation countermeasure network
JPH08234832A (en) Device and method for monitoring and diagnostic plant
JP6777142B2 (en) System analyzer, system analysis method, and program
EP2135144B1 (en) Machine condition monitoring using pattern rules
CN114202009A (en) Medical equipment performance index abnormity detection method and device based on PU learning
US11378944B2 (en) System analysis method, system analysis apparatus, and program
JP6898607B2 (en) Abnormality sign detection system and abnormality sign detection method
CN109990803A (en) The method, apparatus of method, apparatus and the sensor processing of detection system exception
US11339763B2 (en) Method for windmill farm monitoring

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210316