CN112506763A - Automatic positioning method and device for database system fault root - Google Patents
- Publication number
- CN112506763A (application CN202011372173.8A)
- Authority
- CN
- China
- Prior art keywords
- monitoring index
- monitoring
- abnormal
- relation
- root cause
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3668—Software testing
- G06F11/3672—Test management
- G06F11/3688—Test management for test execution, e.g. scheduling of test suites
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
Abstract
The application provides a method and a device for automatically positioning a fault root cause of a database system, and relates to the technical field of data processing, wherein the method comprises the following steps: carrying out anomaly detection on all monitoring indexes of an anomaly database to obtain anomaly monitoring index data; constructing a monitoring index relation graph according to the abnormal monitoring index data; analyzing and sequencing each monitoring index in the monitoring index relation graph according to a preset algorithm; and determining a root cause monitoring index according to the sequencing result. Therefore, the problem of automatic positioning of the root cause monitoring index of the fault system is solved, and the efficiency of positioning the root cause monitoring index is improved, so that the fault system can be quickly recovered to be normal.
Description
Technical Field
The application relates to the technical field of data processing, in particular to a method and a device for automatically positioning a fault root cause of a database system.
Background
When a computer system fails, operation and maintenance personnel need to rapidly analyze a large number of abnormal monitoring indexes on the abnormal machine, locate the root cause indexes, and then take measures to stop the damage. In an actual production environment, because analyzing this large number of abnormal monitoring indexes depends on manual work by operation and maintenance personnel, locating the root cause index takes a long time, and the system remains in a fault state for a long time.
Disclosure of Invention
The present application is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, a first objective of the present application is to provide an automatic location method for a root cause of a fault in a database system, which solves the problem of automatic location of a root cause monitoring index of a fault system, and improves the efficiency of locating the root cause monitoring index, so that the fault system can be quickly recovered to normal.
The second purpose of the present application is to provide an automatic database system fault root cause locating device.
To achieve the above object, an embodiment of a first aspect of the present application provides a method for automatically locating a failure root cause of a database system, including:
carrying out anomaly detection on all monitoring indexes of an anomaly database to obtain anomaly monitoring index data;
constructing a monitoring index relation graph according to the abnormal monitoring index data;
analyzing and sequencing each monitoring index in the monitoring index relation graph according to a preset algorithm;
and determining a root cause monitoring index according to the sequencing result.
According to the automatic positioning method for the database system fault root, all monitoring indexes of an abnormal database are subjected to abnormal detection, and abnormal monitoring index data are obtained; constructing a monitoring index relation graph according to the abnormal monitoring index data; analyzing and sequencing each monitoring index in the monitoring index relation graph according to a preset algorithm; and determining a root cause monitoring index according to the sequencing result. Therefore, the problem of automatic positioning of the root cause monitoring index of the fault system is solved, and the efficiency of positioning the root cause monitoring index is improved, so that the fault system can be quickly recovered to be normal.
In an embodiment of the present application, the performing anomaly detection on all monitoring indexes of an anomaly database to obtain anomaly monitoring index data includes:
acquiring a system abnormal time period of each monitoring index from a time sequence database, and acquiring all monitoring index data corresponding to the system abnormal time period and a time period with a preset difference from the system abnormal time period;
and analyzing all monitoring index data by using a clustering-based robust anomaly detection algorithm to obtain the anomaly monitoring index data.
In an embodiment of the present application, the constructing a monitoring index relation graph according to the abnormal monitoring index data includes:
acquiring the relation between each abnormal monitoring index in the abnormal monitoring index data;
and constructing the monitoring index relation graph by taking each abnormal monitoring index as a point and taking the relation between each abnormal monitoring index as a side.
In an embodiment of the application, the analyzing and sorting the monitoring indexes in the monitoring index relation graph according to a preset algorithm includes:
analyzing the monitoring index relation graph by a positioning algorithm based on a weight type webpage access evaluation method to obtain the weight of a relation edge between each abnormal monitoring index in the monitoring index relation graph;
and sequencing the monitoring indexes according to the weight of the relation edge between the abnormal monitoring indexes.
In an embodiment of the present application, the method for automatically locating a database system fault root further includes:
and sending the root cause monitoring index to target equipment for display.
In order to achieve the above object, a second aspect of the present application provides an automatic database system fault root cause locating device, including:
the acquisition module is used for carrying out abnormity detection on all monitoring indexes of the abnormity database and acquiring abnormity monitoring index data;
the construction module is used for constructing a monitoring index relation graph according to the abnormal monitoring index data;
the analysis module is used for analyzing and sequencing all monitoring indexes in the monitoring index relation graph according to a preset algorithm;
and the determining module is used for determining the root cause monitoring index according to the sequencing result.
According to the automatic positioning device for the database system fault root cause, all monitoring indexes of an abnormal database are subjected to abnormal detection, and abnormal monitoring index data are obtained; constructing a monitoring index relation graph according to the abnormal monitoring index data; analyzing and sequencing each monitoring index in the monitoring index relation graph according to a preset algorithm; and determining a root cause monitoring index according to the sequencing result. Therefore, the problem of automatic positioning of the root cause monitoring index of the fault system is solved, and the efficiency of positioning the root cause monitoring index is improved, so that the fault system can be quickly recovered to be normal.
In an embodiment of the present application, the obtaining module is configured to:
acquiring a system abnormal time period of each monitoring index from a time sequence database, and acquiring all monitoring index data corresponding to the system abnormal time period and a time period with a preset difference from the system abnormal time period;
and analyzing all monitoring index data by using a clustering-based robust anomaly detection algorithm to obtain the anomaly monitoring index data.
In one embodiment of the present application, the building module is configured to:
acquiring the relation between each abnormal monitoring index in the abnormal monitoring index data;
and constructing the monitoring index relation graph by taking each abnormal monitoring index as a point and taking the relation between each abnormal monitoring index as a side.
In an embodiment of the application, the analysis module is specifically configured to:
analyzing the monitoring index relation graph by a positioning algorithm based on a weight type webpage access evaluation method to obtain the weight of a relation edge between each abnormal monitoring index in the monitoring index relation graph;
and sequencing the monitoring indexes according to the weight of the relation edge between the abnormal monitoring indexes.
In an embodiment of the present application, the apparatus further includes:
and the sending module is used for sending the root cause monitoring index to target equipment for displaying.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic flowchart illustrating a method for automatically locating a failure root cause of a database system according to an embodiment of the present application;
FIG. 2 is an example of a smoothed noise segment according to an embodiment of the present application;
FIG. 3 is an example of a smoothing segment according to an embodiment of the present application;
FIG. 4 is a smooth processing procedure of a real anomaly monitoring index according to an embodiment of the present application;
FIG. 5 is a weighted undirected monitoring index relationship diagram according to an embodiment of the present application;
FIG. 6 is a diagram illustrating an exemplary structure of an automatic database system fault root cause location system according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an automatic database system fault root cause locating device according to an embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
The database system fault root automatic positioning method and device according to the embodiment of the application are described below with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of a method for automatically locating a failure root cause of a database system according to an embodiment of the present application.
As shown in FIG. 1, the automatic database system fault root cause locating method includes the following steps:
Step 101, carrying out anomaly detection on all monitoring indexes of an abnormal database to obtain abnormal monitoring index data.
In the embodiment of the application, the system abnormal time period of each monitoring index is obtained from a time-series database, and all monitoring index data corresponding to the system abnormal time period and to a time period that differs from it by a preset amount are obtained; all of the monitoring index data are then analyzed with a clustering-based robust anomaly detection algorithm to obtain the abnormal monitoring index data.
In the embodiment of the application, anomaly detection is performed on all monitoring indexes of the abnormal database. Historical data of the monitoring indexes (such as CPU utilization and disk utilization) are stored in a time-series database; when anomaly detection is performed, the time-series database is queried to obtain the data of each monitoring index around the system abnormal time period for analysis. In an actual production environment, however, the monitoring index data contain noise and irregular fluctuation even when the system is in a normal state.
Therefore, when the system is abnormal, noise, normal fluctuation, and abnormal fluctuation are mixed together in the monitoring data, which degrades anomaly detection performance. For this reason, the application designs a clustering-based robust anomaly detection algorithm that can effectively detect anomalies in monitoring data in which noise, normal fluctuation, and abnormal fluctuation are mixed.
In the embodiment of the application, the clustering-based robust anomaly detection algorithm divides each monitoring index into segments through clustering, distinguishes noise segments from abnormal fluctuation segments, smooths the noise with a clustering-based smoothing algorithm, and then performs anomaly detection on the smoothed data.
Specifically, the smoothing algorithm is iterative: each loop smooths a part of the noise segments, and the algorithm loops until all noise segments have been smoothed. The input of the algorithm is the raw data $x_i$ of a monitoring index, and the output is the monitoring index data after smoothing.
In each loop, the monitoring index data are first clustered into two classes, for example a normal class and an abnormal class, using a Gaussian mixture model; the number of clusters of the Gaussian mixture model is therefore set to two. The monitoring index data are then divided into segments according to the clustering result, with each run of consecutive data points that belong to the same class forming one segment.
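The clustering and segmentation step can be illustrated with a minimal sketch. It assumes a one-dimensional monitoring index and fits the two-component Gaussian mixture with a plain EM loop written for illustration; the function names `two_class_labels` and `split_into_segments`, the initialization, and the iteration count are hypothetical choices, not details given in the application.

```python
import math

def two_class_labels(xs, iters=50):
    """Fit a 1-D, two-component Gaussian mixture with plain EM and
    return a 0/1 class label for every data point (illustrative only)."""
    lo, hi = min(xs), max(xs)
    mu = [lo, hi]                                # assumption: start the means at the extremes
    var = [max((hi - lo) ** 2 / 4, 1e-6)] * 2    # shared initial variance, floored
    pi = [0.5, 0.5]                              # mixing proportions
    resp = [0.5] * len(xs)                       # P(class 1 | x) for each point
    for _ in range(iters):
        # E-step: responsibility of component 1 for every point
        for i, x in enumerate(xs):
            p = [pi[k] * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                 / math.sqrt(2 * math.pi * var[k]) for k in range(2)]
            total = p[0] + p[1]
            resp[i] = p[1] / total if total > 0 else 0.5
        # M-step: re-estimate mean, variance and proportion of each component
        for k in range(2):
            w = [r if k == 1 else 1 - r for r in resp]
            n_k = sum(w)
            if n_k < 1e-9:
                continue
            mu[k] = sum(wi * x for wi, x in zip(w, xs)) / n_k
            var[k] = max(sum(wi * (x - mu[k]) ** 2 for wi, x in zip(w, xs)) / n_k, 1e-6)
            pi[k] = n_k / len(xs)
    return [1 if r > 0.5 else 0 for r in resp]

def split_into_segments(labels):
    """Group consecutive equal labels into (start, end_exclusive, label) segments."""
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start, i, labels[start]))
            start = i
    return segments
```

In practice a library implementation of the Gaussian mixture model would normally replace the hand-written EM loop; only the two-cluster setting and the run-based segmentation are taken from the description.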
In the embodiment of the present application, $s_j$ denotes the j-th segment and $|s_j|$ denotes the length of $s_j$. A noise segment is shorter than its neighbors, so when the length of $s_j$ is less than the lengths of both $s_{j-1}$ and $s_{j+1}$, that is, when $\min\{|s_{j-1}|, |s_{j+1}|\} > |s_j|$, $s_j$ is a noise segment and is smoothed using data from a neighboring segment. As shown in FIG. 2, if $s_j$ is a noise segment, it is replaced with data randomly sampled from $s_{j+1}$.
The reasons for sampling from $s_{j+1}$ are as follows. Suppose the monitoring index is divided into k segments. If the left end of the analysis window happens to fall in a noise region, $s_0$ is a noise segment; if $s_1$ is then also a noise segment, $s_1$ cannot be smoothed using data from the noise segment $s_0$. In addition, since the right end of the analysis window lies in the system abnormal time period, the last segment $s_{k-1}$ may be an abnormal segment. If $s_{k-2}$ is a noise segment, smoothing $s_{k-2}$ with data from the abnormal segment $s_{k-1}$ does not affect the result of the anomaly detection. Therefore, a noise segment $s_j$ is smoothed with data randomly sampled from $s_{j+1}$.
Finally, $s_0$ and $s_{k-1}$ require separate treatment. If $s_0$ belongs to the same class as the noise segments that have already been smoothed, then $s_0$ is also a noise segment, and it is smoothed with data randomly sampled from $s_1$. If no noise segment has been smoothed, $s_0$ is smoothed with data randomly sampled from $s_1$ when the length of $s_1$ is greater than the length of $s_0$.
FIG. 3 shows an example of smoothing segment $s_0$. Because $s_{k-1}$ lies in the system abnormal time period, $s_{k-1}$ is not smoothed. FIG. 4 shows the smoothing process for a real abnormal monitoring index: after two smoothing passes, all the noise data are smoothed out and the abnormal data are preserved.
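The noise rule for interior segments can be sketched as a single smoothing pass. This is a simplified illustration under stated assumptions: the segments are already available as lists of values, the function name `smooth_interior_noise` and the seeded random generator are hypothetical, and the separate handling of the first and last segments and the re-clustering between passes are omitted.

```python
import random

def smooth_interior_noise(segments, rng=None):
    """One smoothing pass over interior segments.

    `segments` is a list of lists of floats representing s_0 .. s_{k-1}.
    A segment s_j (0 < j < k-1) is treated as noise when it is shorter than
    both neighbours, and is replaced by values randomly sampled from s_{j+1}.
    """
    rng = rng or random.Random(0)          # assumption: fixed seed for repeatability
    smoothed = [list(s) for s in segments]
    for j in range(1, len(segments) - 1):  # s_0 and s_{k-1} are handled separately
        shorter_neighbour = min(len(segments[j - 1]), len(segments[j + 1]))
        if shorter_neighbour > len(segments[j]):
            # Sample with replacement from the right neighbour, keeping the length.
            smoothed[j] = [rng.choice(segments[j + 1]) for _ in segments[j]]
    return smoothed
```

Two adjacent interior segments can never both satisfy the noise condition, because each would have to be strictly shorter than the other, so the right neighbour that is sampled from is never itself a noise segment detected in the same pass.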
After the smoothing processing, the monitoring index is divided into several segments, with a fluctuation between each pair of adjacent segments. Only the fluctuation closest to the time of the system fault is likely to be related to the fault. The present application therefore considers only the fluctuation between $s_{k-2}$ and $s_{k-1}$, and measures its degree of abnormality with the z-score.
First, the z-score of each data point $x$ in $s_{k-1}$ is calculated from the mean and the standard deviation (std) of the data in $s_{k-2}$: $z(x) = \frac{x - \mathrm{mean}(s_{k-2})}{\mathrm{std}(s_{k-2})}$. The mean of the z-scores of all data points in $s_{k-1}$ is used as the z-score of $s_{k-1}$. Then the 3-sigma rule is used to judge whether the fluctuation between $s_{k-2}$ and $s_{k-1}$ is abnormal: if the z-score of $s_{k-1}$ exceeds 3, that is, the data in $s_{k-1}$ deviate on average from the mean of $s_{k-2}$ by more than three standard deviations, the fluctuation is an abnormal fluctuation and the corresponding monitoring index is an abnormal monitoring index.
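The z-score check can be written in a few lines. The sketch below makes several assumptions that the description does not state: it uses the population standard deviation, it compares the absolute value of the mean z-score with the threshold so that sudden drops are flagged as well as sudden rises, and it treats a perfectly flat reference segment as a special case. The function name `is_abnormal_fluctuation` is hypothetical.

```python
import statistics

def is_abnormal_fluctuation(prev_segment, last_segment, threshold=3.0):
    """Apply the 3-sigma rule to the fluctuation between s_{k-2} and s_{k-1}.

    prev_segment: data of s_{k-2} (reference segment)
    last_segment: data of s_{k-1} (segment closest to the fault time)
    """
    mean = statistics.fmean(prev_segment)
    std = statistics.pstdev(prev_segment)   # assumption: population standard deviation
    if std == 0:
        # Assumption: with a flat reference, any change counts as abnormal.
        return any(x != mean for x in last_segment)
    z_scores = [(x - mean) / std for x in last_segment]
    # Assumption: use the magnitude so that drops are detected too.
    return abs(statistics.fmean(z_scores)) > threshold
```

For example, a reference segment that hovers around 10 with a standard deviation below 1 makes a final segment near 20 abnormal, while a final segment that stays near 10 is not flagged.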
Step 102, constructing a monitoring index relation graph according to the abnormal monitoring index data.
In the embodiment of the application, the relation among all the abnormal monitoring indexes in the abnormal monitoring index data is obtained, and a monitoring index relation graph is constructed by taking all the abnormal monitoring indexes as points and taking the relation among all the abnormal monitoring indexes as sides.
In the embodiment of the application, the dependency graph construction algorithm can automatically construct a weighted undirected dependency graph to accurately represent the dependency relationship between abnormal monitoring indexes, and the constructed monitoring index relationship graph between the monitoring indexes is the core of root cause positioning.
Because existing automatic dependency graph construction algorithms can infer wrong dependency relationships, the application proposes a weighted undirected dependency graph (WUDG), namely the monitoring index relation graph, which represents the dependency relationships between monitoring indexes more accurately. The core idea is as follows: if a dependency exists between two monitoring indexes, the two indexes are not independent. The weighted undirected dependency graph therefore only records whether a dependency exists between monitoring indexes (an undirected graph) and does not infer the direction of the dependency (a directed graph), because determining whether a dependency exists can be done more accurately than inferring its direction.
First, a fully connected graph over all abnormal monitoring indexes is constructed, in which nodes represent monitoring indexes and edges represent dependency relationships between them. The strength of each dependency is then calculated by performing an independence test on the two monitoring indexes joined by the edge; for example, the independence of two monitoring indexes X and Y is evaluated with the Fisher-Z test. The Fisher-Z test is an independence test based on the Pearson correlation coefficient that combines the Fisher-Z transformation with the partial correlation coefficient: the Fisher-Z transformation is used to evaluate the overall correlation, and the partial correlation coefficient is used to account for the effect of other nodes. In its standard form, the Fisher-Z statistic between X and Y can be expressed as $Z(X, Y) = \frac{\sqrt{m - 3}}{2} \ln\frac{1 + r}{1 - r}$, where m denotes the number of monitoring index data points and r denotes the partial correlation coefficient between X and Y.
Specifically, the strength of the dependency between two abnormal monitoring indexes can be measured by the p-value of the null hypothesis (that the two indexes are independent) in the Fisher-Z test: a larger p-value indicates a weaker dependency between the two indexes, and a smaller p-value indicates a stronger dependency. Therefore, 1/p can be taken as the weight of an edge, which finally yields the weighted undirected dependency graph. FIG. 5 shows an example of such a monitoring index relation graph, where $p_{mn}$ denotes the p-value of the Fisher-Z test between monitoring indexes n and m.
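The edge-weight computation can be sketched as follows. This is a simplified illustration, not the procedure of the application: it uses the plain Pearson correlation between the two indexes instead of a partial correlation conditioned on other nodes, it takes a two-sided p-value under a standard normal null distribution, and it clamps the correlation and the p-value to avoid division by zero. The function names `pearson`, `fisher_z_p_value`, and `edge_weight` are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def fisher_z_p_value(xs, ys):
    """Two-sided p-value of the Fisher-Z independence test (needs more than 3 points)."""
    m = len(xs)
    r = max(min(pearson(xs, ys), 0.999999), -0.999999)   # keep the logarithm finite
    z = math.sqrt(m - 3) / 2 * math.log((1 + r) / (1 - r))
    # P(|N(0, 1)| >= |z|) expressed with the complementary error function
    return math.erfc(abs(z) / math.sqrt(2))

def edge_weight(xs, ys, eps=1e-300):
    """Edge weight 1/p; eps guards against p underflowing to zero (assumption)."""
    return 1.0 / max(fisher_z_p_value(xs, ys), eps)
```

With this weighting, two indexes that move together receive a very large edge weight, while a weakly related pair receives a weight close to 1.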
Step 103, analyzing and sorting each monitoring index in the monitoring index relation graph according to a preset algorithm.
Step 104, determining a root cause monitoring index according to the sorting result.
In the embodiment of the application, the monitoring index relational graph is analyzed by a positioning algorithm based on a weighted web access evaluation method, the weight of the relation edge between abnormal monitoring indexes in the monitoring index relational graph is obtained, and the monitoring indexes are sorted according to the weight of the relation edge between the abnormal monitoring indexes.
In the embodiment of the application, the root cause related indexes are positioned based on the constructed monitoring index relation graph, and after the abnormal monitoring indexes are analyzed, the weighted undirected monitoring index relation graph among the abnormal monitoring indexes is generated.
It can be understood that a weighted undirected dependency graph between abnormal monitoring indexes, i.e. a monitoring index relation graph, is generated, and the graph contains root cause related indexes and symptom indexes. Therefore, in the application, it is assumed that the root cause related indexes are indexes with the largest influence in the Weighted undirected monitoring index relation graph, and a Weighted web access evaluation method (Weighted PageRank) can measure the influence of the nodes in the Weighted undirected graph. Therefore, the method designs a positioning algorithm based on a Weighted webpage access evaluation method (Weighted PageRank) to analyze the monitoring index relation graph and finally outputs a possible ranking result of the root cause related indexes, namely a diagnosis result of the root cause monitoring index automatic positioning technology.
Specifically, the Weighted PageRank method measures the influence of a node in the weighted undirected graph. For a monitoring index u, the score of u is calculated with Weighted PageRank from three quantities: B(u), the set of nodes directly connected to u (the abnormal monitoring indexes that have a dependency relationship with u); the weight of the edge between node u and each node v in B(u); and a constant d, which is set to 0.85. All abnormal monitoring indexes are then sorted by the calculated score, and the indexes ranked at the front are the likely root cause related indexes.
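One common formulation of Weighted PageRank on an undirected weighted graph is sketched below as an illustration. The exact score formula of the application is not reproduced here; in particular, normalizing each neighbor's contribution by that neighbor's total edge weight, the fixed iteration count, and the initial score of 1.0 are assumptions, and the function name `weighted_pagerank` and the index names in the usage example are hypothetical.

```python
def weighted_pagerank(edge_weights, d=0.85, iters=100):
    """Rank nodes of an undirected weighted graph.

    edge_weights: dict mapping (u, v) pairs to a positive weight, e.g. 1/p.
    Returns (ranking, scores), with the most influential node first.
    """
    # Build a symmetric adjacency map: B(u) is the key set of adj[u].
    adj = {}
    for (u, v), w in edge_weights.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    # Total edge weight of each node, used to normalise its contributions.
    strength = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    scores = {u: 1.0 for u in adj}
    for _ in range(iters):
        scores = {
            u: (1 - d) + d * sum(scores[v] * w / strength[v]
                                 for v, w in adj[u].items())
            for u in adj
        }
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores
```

A small usage example with made-up weights: calling `weighted_pagerank({("cpu", "io"): 50.0, ("cpu", "qps"): 40.0, ("io", "qps"): 1.0, ("cpu", "conn"): 30.0})` ranks `cpu` first, because it carries the strongest dependencies to every other index.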
In the embodiment of the application, the root cause monitoring index is sent to the target device to be displayed, so that the root cause monitoring index can be rapidly known, system faults are processed, and a fault system can be rapidly recovered to be normal.
Specifically, as shown in FIG. 6, Web services rely on an underlying database to support their critical business and real-time applications. The automatic root cause monitoring index locating technology is triggered when the performance of the database system becomes abnormal, for example when the response time of the database suddenly increases. It then performs anomaly detection, constructs the monitoring index relation graph, and carries out root cause localization analysis. After the analysis, operation and maintenance personnel can quickly take loss-stopping measures based on the diagnosis result so that the system recovers as soon as possible. Common loss-stopping measures include SQL flow control, SQL optimization, and system capacity expansion.
With the rapid development of cloud services, performance monitoring and root cause analysis of the underlying database clusters that support them face growing challenges. For a large-scale underlying database cluster supporting cloud services, hundreds of database anomalies per day make manual diagnosis impractical. The present system automatically diagnoses performance anomalies of the online system: when the system becomes abnormal, the root cause related indexes can be located quickly, so that operation and maintenance personnel can analyze them and take loss-stopping measures promptly and the system can recover in time. Operation and maintenance personnel can also focus on the root cause related indexes located by the algorithm, which greatly reduces the impact of alarm storms.
According to the automatic positioning method for the database system fault root, all monitoring indexes of an abnormal database are subjected to abnormal detection, and abnormal monitoring index data are obtained; constructing a monitoring index relation graph according to the abnormal monitoring index data; analyzing and sequencing each monitoring index in the monitoring index relation graph according to a preset algorithm; and determining a root cause monitoring index according to the sequencing result. Therefore, the problem of automatic positioning of the root cause monitoring index of the fault system is solved, and the efficiency of positioning the root cause monitoring index is improved, so that the fault system can be quickly recovered to be normal.
In order to implement the above embodiments, the present application further provides an automatic positioning device for a database system fault root cause.
Fig. 7 is a schematic structural diagram of an automatic positioning device for a database system fault root cause according to an embodiment of the present application.
As shown in fig. 7, the automatic database system fault root cause locating device includes: an acquisition module 210, a construction module 220, an analysis module 230, and a determination module 240.
The obtaining module 210 is configured to perform anomaly detection on all monitoring indexes of the anomaly database, and obtain anomaly monitoring index data.
The constructing module 220 is configured to construct a monitoring index relation graph according to the abnormal monitoring index data.
And the analysis module 230 is configured to analyze and sort the monitoring indexes in the monitoring index relation graph according to a preset algorithm.
And a determining module 240, configured to determine the root cause monitoring indicator according to the sorting result.
In an embodiment of the present application, the obtaining module 210 is configured to: acquiring a system abnormal time period of each monitoring index from a time sequence database, and acquiring all monitoring index data corresponding to the system abnormal time period and a time period with a preset difference from the system abnormal time period; and analyzing all monitoring index data by using a clustering-based robust anomaly detection algorithm to obtain the anomaly monitoring index data.
In one embodiment of the present application, a build module 220 is configured to: acquiring the relation between each abnormal monitoring index in the abnormal monitoring index data; and constructing the monitoring index relation graph by taking each abnormal monitoring index as a point and taking the relation between each abnormal monitoring index as a side.
In an embodiment of the present application, the analysis module 230 is specifically configured to: analyzing the monitoring index relation graph by a positioning algorithm based on a weight type webpage access evaluation method to obtain the weight of a relation edge between each abnormal monitoring index in the monitoring index relation graph; and sequencing the monitoring indexes according to the weight of the relation edge between the abnormal monitoring indexes.
In an embodiment of the present application, the apparatus further includes: and the sending module is used for sending the root cause monitoring index to target equipment for displaying.
According to the automatic positioning device for the database system fault root cause, all monitoring indexes of an abnormal database are subjected to abnormal detection, and abnormal monitoring index data are obtained; constructing a monitoring index relation graph according to the abnormal monitoring index data; analyzing and sequencing each monitoring index in the monitoring index relation graph according to a preset algorithm; and determining a root cause monitoring index according to the sequencing result. Therefore, the problem of automatic positioning of the root cause monitoring index of the fault system is solved, and the efficiency of positioning the root cause monitoring index is improved, so that the fault system can be quickly recovered to be normal.
It should be noted that the foregoing explanation of the embodiment of the method for automatically locating a database system fault root cause is also applicable to the apparatus for automatically locating a database system fault root cause of the embodiment, and is not repeated here.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.
Claims (10)
1. A database system fault root cause automatic positioning method is characterized by comprising the following steps:
carrying out anomaly detection on all monitoring indexes of an anomaly database to obtain anomaly monitoring index data;
constructing a monitoring index relation graph according to the abnormal monitoring index data;
analyzing and sequencing each monitoring index in the monitoring index relation graph according to a preset algorithm;
and determining a root cause monitoring index according to the sequencing result.
2. The method of claim 1, wherein the performing anomaly detection on all monitoring indexes of an anomaly database to obtain anomaly monitoring index data comprises:
acquiring a system abnormal time period of each monitoring index from a time-series database, and acquiring all monitoring index data corresponding to the system abnormal time period and to a time period with a preset difference from the system abnormal time period;
and analyzing all monitoring index data by using a clustering-based robust anomaly detection algorithm to obtain the anomaly monitoring index data.
3. The method of claim 1, wherein the constructing a monitoring index relationship graph according to the abnormal monitoring index data comprises:
acquiring the relation between each abnormal monitoring index in the abnormal monitoring index data;
and constructing the monitoring index relation graph by taking each abnormal monitoring index as a node and taking the relation between each abnormal monitoring index as an edge.
4. The method of claim 1, wherein the analyzing and sequencing the monitoring indexes in the monitoring index relation graph according to a preset algorithm comprises:
analyzing the monitoring index relation graph by a positioning algorithm based on a weight type webpage access evaluation method to obtain the weight of a relation edge between each abnormal monitoring index in the monitoring index relation graph;
and sequencing the monitoring indexes according to the weight of the relation edge between the abnormal monitoring indexes.
5. The method of claim 1, further comprising:
and sending the root cause monitoring index to target equipment for display.
6. An automatic positioning device for a fault root cause of a database system is characterized by comprising:
the acquisition module is used for carrying out anomaly detection on all monitoring indexes of the anomaly database to obtain anomaly monitoring index data;
the construction module is used for constructing a monitoring index relation graph according to the abnormal monitoring index data;
the analysis module is used for analyzing and sequencing all monitoring indexes in the monitoring index relation graph according to a preset algorithm;
and the determining module is used for determining the root cause monitoring index according to the sequencing result.
7. The apparatus of claim 6, wherein the acquisition module is specifically configured to:
acquiring a system abnormal time period of each monitoring index from a time-series database, and acquiring all monitoring index data corresponding to the system abnormal time period and to a time period with a preset difference from the system abnormal time period;
and analyzing all monitoring index data by using a clustering-based robust anomaly detection algorithm to obtain the anomaly monitoring index data.
8. The apparatus of claim 6, wherein the construction module is specifically configured to:
acquiring the relation between each abnormal monitoring index in the abnormal monitoring index data;
and constructing the monitoring index relation graph by taking each abnormal monitoring index as a node and taking the relation between each abnormal monitoring index as an edge.
9. The apparatus of claim 6, wherein the analysis module is specifically configured to:
analyzing the monitoring index relation graph by a positioning algorithm based on a weight type webpage access evaluation method to obtain the weight of a relation edge between each abnormal monitoring index in the monitoring index relation graph;
and sequencing the monitoring indexes according to the weight of the relation edge between the abnormal monitoring indexes.
10. The apparatus of claim 6, further comprising:
and the sending module is used for sending the root cause monitoring index to target equipment for displaying.
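The steps of claims 1 to 4 can be illustrated with a minimal Python sketch. It is not the applicant's implementation, and the claims leave most details open, so several parts are assumptions: the median/MAD detector is a simple stand-in for the "clustering-based robust anomaly detection algorithm" of claim 2; the "weight type webpage access evaluation method" of claim 4 is read here as a weighted PageRank, which is an interpretation of the translated wording; and the metric names, the `relations` dictionary, and the convention that an edge `(a, b)` means "a is influenced by b" are all hypothetical.

```python
from statistics import median


def detect_abnormal(series, baseline_len, threshold=3.0):
    """Return metrics whose fault-window values deviate from their baseline.

    Stand-in detector using a median/MAD score; it is NOT the
    clustering-based robust algorithm named in claim 2.
    """
    abnormal = []
    for name, values in series.items():
        baseline, window = values[:baseline_len], values[baseline_len:]
        center = median(baseline)
        # A perfectly flat baseline gets a tiny scale, so any change counts.
        scale = median(abs(v - center) for v in baseline) or 1e-9
        if any(abs(v - center) / scale > threshold for v in window):
            abnormal.append(name)
    return abnormal


def build_graph(abnormal, relations):
    """Keep only positive-weight relations whose endpoints are both abnormal."""
    keep = set(abnormal)
    return {(a, b): w for (a, b), w in relations.items()
            if a in keep and b in keep and w > 0}


def weighted_pagerank(nodes, edges, damping=0.85, iterations=100):
    """Power iteration for PageRank with weighted out-edges."""
    n = len(nodes)
    if n == 0:
        return {}
    out_weight = dict.fromkeys(nodes, 0.0)
    for (src, _dst), w in edges.items():
        out_weight[src] += w
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iterations):
        # Nodes without out-edges spread their score evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0.0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = dict.fromkeys(nodes, base)
        for (src, dst), w in edges.items():
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        rank = new_rank
    return rank


# Hypothetical data: 5 baseline samples followed by 3 fault-window samples.
series = {
    "disk_io_wait": [1, 2, 1, 3, 2, 9, 10, 9],
    "cpu_usage": [20, 21, 20, 22, 21, 60, 65, 70],
    "slow_queries": [2, 4, 3, 2, 4, 40, 42, 45],
    "connections": [100, 102, 98, 101, 99, 101, 100, 99],
}
# Hypothetical relations: (a, b) -> w means "a is influenced by b".
relations = {
    ("slow_queries", "cpu_usage"): 0.6,
    ("slow_queries", "disk_io_wait"): 0.9,
    ("cpu_usage", "disk_io_wait"): 0.8,
    ("connections", "slow_queries"): 0.5,
}

abnormal = detect_abnormal(series, baseline_len=5)   # claim 2 (stand-in)
edges = build_graph(abnormal, relations)             # claim 3
scores = weighted_pagerank(abnormal, edges)          # claim 4 (assumed reading)
ranking = sorted(scores, key=scores.get, reverse=True)
root_cause = ranking[0]                              # last step of claim 1
print(ranking)  # disk_io_wait ranks first for this data
```

Because every edge points from an affected metric toward the metric assumed to influence it, score accumulates on metrics that many abnormal metrics depend on, which is why the top-ranked node is taken as the root cause candidate. With the opposite edge convention the ranking would have to be read in reverse.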
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011372173.8A CN112506763A (en) | 2020-11-30 | 2020-11-30 | Automatic positioning method and device for database system fault root |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011372173.8A CN112506763A (en) | 2020-11-30 | 2020-11-30 | Automatic positioning method and device for database system fault root |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112506763A (en) | 2021-03-16 |
Family
ID=74967789
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011372173.8A Pending CN112506763A (en) | Automatic positioning method and device for database system fault root | 2020-11-30 | 2020-11-30 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112506763A (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102938708A (en) * | 2012-11-05 | 2013-02-20 | 国网电力科学研究院 | Alarm transmission mode based alarm correlation analysis system and analysis method thereof |
US20160080476A1 (en) * | 2014-08-11 | 2016-03-17 | Systems & Technology Research, Llc | Meme discovery system |
US20160350294A1 (en) * | 2015-05-31 | 2016-12-01 | Thomson Reuters Global Resources | Method and system for peer detection |
US20180196835A1 (en) * | 2017-01-06 | 2018-07-12 | International Business Machines Corporation | Root cause analysis of performance problems |
CN110493025A (en) * | 2018-05-15 | 2019-11-22 | 中国移动通信集团浙江有限公司 | It is a kind of based on the failure root of multilayer digraph because of the method and device of diagnosis |
CN108923952A (en) * | 2018-05-31 | 2018-11-30 | 北京百度网讯科技有限公司 | Method for diagnosing faults, equipment and storage medium based on service monitoring index |
CN109992440A (en) * | 2019-04-02 | 2019-07-09 | 北京睿至大数据有限公司 | A kind of IT root accident analysis recognition methods of knowledge based map and machine learning |
CN110888755A (en) * | 2019-11-15 | 2020-03-17 | 亚信科技(中国)有限公司 | Method and device for searching abnormal root node of micro-service system |
CN111459695A (en) * | 2020-03-12 | 2020-07-28 | 平安科技(深圳)有限公司 | Root cause positioning method and device, computer equipment and storage medium |
Non-Patent Citations (5)
Title |
---|
MINGHUA MA et al.: ""Diagnosing root causes of intermittent slow queries in cloud databases"", 《PROCEEDINGS OF THE VLDB ENDOWMENT》 * |
PING LIU et al.: ""FluxInfer: Automatic Diagnosis of Performance Anomaly for Online Database System"", 《2020 IEEE 39TH INTERNATIONAL PERFORMANCE COMPUTING AND COMMUNICATIONS CONFERENCE (IPCCC)》 * |
PING LIU et al.: ""FluxRank: A Widely-Deployable Framework to Automatically Localizing Root Cause Machines for Software Service Failure Mitigation"", 《2019 IEEE 30TH INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING (ISSRE)》 * |
YUAN MENG et al.: ""Localizing Failure Root Causes in a Microservice through Causality Inference"", 《2020 IEEE/ACM 28TH INTERNATIONAL SYMPOSIUM ON QUALITY OF SERVICE (IWQOS)》 * |
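LI HAIBIN (李海斌) et al.: ""An unsupervised method for detecting anomalous database user behavior"", 《Journal of Chinese Computer Systems (小型微型计算机系统)》 * |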
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113342889A (en) * | 2021-06-03 | 2021-09-03 | 中国工商银行股份有限公司 | Distributed database management method, device, equipment and medium |
CN113505044A (en) * | 2021-09-09 | 2021-10-15 | 格创东智(深圳)科技有限公司 | Database warning method, device, equipment and storage medium |
CN114385451A (en) * | 2022-01-11 | 2022-04-22 | 上海鹤优信息科技有限公司 | Fault root cause analysis method |
CN114553675A (en) * | 2022-03-24 | 2022-05-27 | 中国联合网络通信集团有限公司 | Fault network element processing method, device and storage medium |
CN114553675B (en) * | 2022-03-24 | 2023-05-09 | 中国联合网络通信集团有限公司 | Fault network element processing method, device and storage medium |
CN117389230A (en) * | 2023-11-16 | 2024-01-12 | 广州中健中医药科技有限公司 | Antihypertensive traditional Chinese medicine extract production control method and system |
CN117389230B (en) * | 2023-11-16 | 2024-06-07 | 广州中健中医药科技有限公司 | Antihypertensive traditional Chinese medicine extract production control method and system |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN112506763A (en) | Automatic positioning method and device for database system fault root | |
US10852357B2 (en) | System and method for UPS battery monitoring and data analysis | |
KR102141391B1 (en) | Failure data management method based on cluster estimation | |
US10373065B2 (en) | Generating database cluster health alerts using machine learning | |
US7693982B2 (en) | Automated diagnosis and forecasting of service level objective states | |
US20210397175A1 (en) | Abnormality detection device, abnormality detection method, and program | |
US20200143292A1 (en) | Signature enhancement for deviation measurement-based classification of a detected anomaly in an industrial asset | |
CN116450399B (en) | Fault diagnosis and root cause positioning method for micro service system | |
CN117312997B (en) | Intelligent diagnosis method and system for power management system | |
JP6714498B2 (en) | Equipment diagnosis device and equipment diagnosis method | |
JP6540531B2 (en) | Monitoring device and control method of monitoring device | |
JP6521096B2 (en) | Display method, display device, and program | |
US12079070B2 (en) | Alert similarity and label transfer | |
JP2020071624A (en) | Abnormality diagnosing apparatus, abnormality diagnosing method and program | |
CN107391335B (en) | Method and equipment for checking health state of cluster | |
KR102059112B1 (en) | IoT STREAM DATA QUALITY MEASUREMENT INDICATORS AND PROFILING METHOD FOR INTERNET OF THINGS AND SYSTEM THEREFORE | |
CN112581719B (en) | Semiconductor packaging process early warning method and device based on time sequence generation countermeasure network | |
JPH08234832A (en) | Device and method for monitoring and diagnostic plant | |
JP6777142B2 (en) | System analyzer, system analysis method, and program | |
EP2135144B1 (en) | Machine condition monitoring using pattern rules | |
CN114202009A (en) | Medical equipment performance index abnormity detection method and device based on PU learning | |
US11378944B2 (en) | System analysis method, system analysis apparatus, and program | |
JP6898607B2 (en) | Abnormality sign detection system and abnormality sign detection method | |
CN109990803A (en) | The method, apparatus of method, apparatus and the sensor processing of detection system exception | |
US11339763B2 (en) | Method for windmill farm monitoring | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20210316 |