CN115964206A - Database operation and maintenance method and device - Google Patents

Publication number
CN115964206A
Authority
CN
China
Prior art keywords
root
repairing
root cause
abnormal
mode
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111189777.3A
Other languages
Chinese (zh)
Inventor
王天庆
李士福
李坤
刘陆洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202111189777.3A
Priority to PCT/CN2022/122240 (WO2023061227A1)
Publication of CN115964206A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a database operation and maintenance method and device. The method comprises: acquiring an abnormal index set of a database, wherein the value of each abnormal index in the abnormal index set lies outside the preset value range corresponding to that abnormal index; searching, based on a graph structure, for at least one root cause having a mapping relation with each abnormal index to obtain a root cause set, wherein the graph structure comprises the mapping relation between abnormal indexes and root causes; and searching, based on the graph structure, for at least one repairing mode having a mapping relation with a first root cause in the root cause set, and repairing the database by using the at least one repairing mode until all abnormal indexes in the abnormal index set are in a normal state, wherein the graph structure further comprises the mapping relation between root causes and repairing modes. With the method and device, the automatic operation and maintenance process of the database depends little on historical operation and maintenance data, and database faults can be repaired quickly after the scene is switched; that is, the adaptability is strong.

Description

Database operation and maintenance method and device
Technical Field
The application relates to the technical field of autonomous databases, in particular to a database operation and maintenance method and device.
Background
Data is among the most strategic assets of any business and of public security. Moving comprehensively to the cloud is the trend in the field of information technology, and following cloud computing for big data, moving databases to the cloud is the future development direction of databases. With the development of the information era, the value and accessibility of database information have increased, and the demands on the security, practicality, and reliability of databases have become increasingly strict. Monitoring, operating, and maintaining a database usually requires a large amount of manpower and material resources, and huge losses can result if data is lost or damaged through improper handling or manual misoperation after a downtime event. During operation and maintenance, it is difficult for a Database Administrator (DBA) to fully grasp the states of a massive number of nodes. It is therefore very important to design an automatic operation and maintenance system that can automatically identify database faults, automatically analyze their causes, and automatically repair them.
The prior art is mainly an automatic operation and maintenance method based on artificial intelligence, that is, the automatic operation and maintenance model is trained by collecting historical operation and maintenance data, and then the database is automatically operated and maintained based on the automatic operation and maintenance model.
However, the above automatic operation and maintenance method based on artificial intelligence highly depends on the historical operation and maintenance data set, and after the scene is switched, the cold start problem cannot be solved, that is, the automatic operation and maintenance method has poor adaptability.
Disclosure of Invention
The embodiment of the application provides a database operation and maintenance method and device, so that the dependency of the automatic operation and maintenance process of a database on historical operation and maintenance data is small, and after scenes are switched, the database fault can be quickly repaired, namely the adaptability is strong.
In a first aspect, the present application provides a database operation and maintenance method, including: acquiring an abnormal index set of a database, wherein the value of each abnormal index in the abnormal index set is positioned outside a preset value range corresponding to each abnormal index; searching at least one root cause having a mapping relation with each abnormal index based on a graph structure to obtain a root cause set, wherein the graph structure comprises the mapping relation between the abnormal indexes and the root causes; and searching at least one repairing mode having a mapping relation with a first root cause in the root cause set based on the graph structure, repairing the database by using the at least one repairing mode until all abnormal indexes in the abnormal index set are in a normal state, wherein the graph structure further comprises the mapping relation between the root causes and the repairing modes.
The abnormal condition of an abnormal index may be a high abnormality or a low abnormality. A high abnormality means that the value of the abnormal index is greater than the larger of the two endpoints of the preset interval corresponding to the abnormal index; a low abnormality means that the value of the abnormal index is smaller than the smaller of the two endpoints of that interval.
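The high/low distinction above can be illustrated with a minimal sketch. The function name `classify_abnormality` and the choice to treat values exactly on an endpoint as normal are assumptions made for illustration; the application only states that an abnormal value lies outside the preset range.

```python
from typing import Optional


def classify_abnormality(value: float, endpoint_a: float, endpoint_b: float) -> Optional[str]:
    """Return "high", "low", or None when the value lies inside the preset interval.

    Assumption: a value equal to an endpoint counts as normal.
    """
    lower, upper = min(endpoint_a, endpoint_b), max(endpoint_a, endpoint_b)
    if value > upper:
        return "high"   # greater than the larger endpoint
    if value < lower:
        return "low"    # smaller than the smaller endpoint
    return None         # within the preset range, i.e. the normal state


print(classify_abnormality(0.95, 0.2, 0.8))
print(classify_abnormality(0.10, 0.2, 0.8))
print(classify_abnormality(0.50, 0.2, 0.8))
```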
The first root cause may be any root cause in the root cause set.
In view of technical effects, the embodiment of the present application constructs the mapping relationship between the abnormal index and the root cause and the mapping relationship between the root cause and the repairing manner based on the graph structure, so that the repairing manner for the abnormal database (or referred to as a failure database) can be directly searched based on the graph structure. Meanwhile, since historical operation and maintenance data are not needed, the influence of unsuitable historical operation and maintenance data on the current repair process of the database can be avoided; in addition, in the actual operation and maintenance process, historical operation and maintenance data of different databases are difficult to obtain (required for confidentiality), so that the embodiment of the application has better universality. In addition, after the operation scene of the database is switched, the embodiment of the application can also directly search out a corresponding repairing mode based on the graph structure and quickly realize fault repairing, so that the cold start problem after the scene is switched in the prior art can be effectively avoided.
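As a rough illustration of how such a graph structure might be looked up, the sketch below stores the two mapping relationships as adjacency lists. The index names, root causes, and repairing modes are taken from the examples listed in this application, but the specific edges between them, the dictionary names, and the function `candidate_repairs` are hypothetical.

```python
from typing import Dict, List

# Hypothetical edges: abnormal index -> root causes.
INDEX_TO_ROOT_CAUSES: Dict[str, List[str]] = {
    "cpu_usage": ["excessive traffic", "improper indexing"],
    "response_time": ["improper indexing", "insufficient resources"],
}

# Hypothetical edges: root cause -> repairing modes.
ROOT_CAUSE_TO_REPAIRS: Dict[str, List[str]] = {
    "excessive traffic": ["execute current limiting"],
    "improper indexing": ["invoke SQL optimization"],
    "insufficient resources": ["restart database", "call administrator"],
}


def candidate_repairs(abnormal_index: str) -> Dict[str, List[str]]:
    """Follow index -> root cause -> repairing mode edges for one abnormal index."""
    return {
        cause: list(ROOT_CAUSE_TO_REPAIRS.get(cause, []))
        for cause in INDEX_TO_ROOT_CAUSES.get(abnormal_index, [])
    }


print(candidate_repairs("cpu_usage"))
```

Because the lookup only reads the graph, no historical operation and maintenance data is consulted at repair time, which is the property the paragraph above emphasizes.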
In a possible implementation manner, the searching, based on the graph structure, for at least one root cause having a mapping relationship with each abnormal index to obtain a root cause set includes: searching, based on the graph structure, for at least one root cause having a mapping relation with each abnormal index to obtain P root causes, wherein P is a positive integer; deduplicating the P root causes to obtain Q root causes, and sorting the Q root causes based on the weight coefficient of each of the Q root causes to obtain the root cause set; wherein Q is a positive integer less than or equal to P.
The weight coefficient corresponding to each root cause in the root cause set can represent the likelihood that the root cause is what made the indexes in the abnormal index set abnormal. For example, a larger weight coefficient may indicate a higher likelihood that the root cause is responsible for the abnormality; alternatively, under the opposite convention, a larger weight coefficient may indicate a lower likelihood.
In view of technical effects, the root cause set is obtained by sorting based on the weight coefficients of the Q root causes, so that root causes most likely to cause the indexes in the abnormal index set to be abnormal can be quickly determined based on the root cause set in the follow-up process, that is, the database fault is repaired as soon as possible, and the automatic operation and maintenance performance is improved.
It should be understood that when K abnormal indexes in the abnormal index set each have a mapping relationship with the same root cause, that root cause corresponds to K weight coefficients, which may be the same as or different from one another. In this case, the P root causes include K copies of the same root cause, each corresponding to one of the K weight coefficients, where K is an integer greater than or equal to 2.
In a possible embodiment, each of the P root causes and each of the Q root causes corresponds to a weight coefficient; the weight coefficient of the i-th root cause in the Q root causes is equal to the sum of all the weight coefficients of the i-th root cause in the P root causes, where i = 1, ..., Q.
In view of technical effects, because P root causes may include the same root cause, in the embodiment of the present application, the weight coefficients corresponding to the same root cause in the P root causes are added to obtain the weight coefficients of the same root cause in Q root causes, and in this way, the importance degree of the same root cause is improved, so that a root cause with a high importance degree can be selected from a root cause set through the weight coefficients in the following process, so as to repair a database, and improve automatic operation and maintenance performance.
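The construction of the root cause set described above (collect P root causes, merge duplicates by summing their weight coefficients, then sort the Q results) can be sketched as follows. The graph contents and the function name `build_root_cause_set` are hypothetical, and the sketch assumes the convention that a larger weight coefficient means a more likely root cause.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical weighted edges: abnormal index -> {root cause: weight coefficient}.
GRAPH: Dict[str, Dict[str, float]] = {
    "cpu_usage": {"excessive traffic": 0.5, "improper indexing": 0.25},
    "response_time": {"improper indexing": 0.5, "insufficient resources": 0.125},
}


def build_root_cause_set(abnormal_indexes: List[str],
                         graph: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    # Step 1: collect the P (root cause, weight) pairs, duplicates included.
    p_causes = [
        (cause, weight)
        for index in abnormal_indexes
        for cause, weight in graph.get(index, {}).items()
    ]
    # Step 2: deduplicate; identical root causes have their weights summed.
    summed: Dict[str, float] = defaultdict(float)
    for cause, weight in p_causes:
        summed[cause] += weight
    # Step 3: sort the Q root causes, largest weight first (assumed convention).
    return sorted(summed.items(), key=lambda item: item[1], reverse=True)


root_cause_set = build_root_cause_set(["cpu_usage", "response_time"], GRAPH)
print(root_cause_set)
```

In this example P = 4 and Q = 3: "improper indexing" is reached from both abnormal indexes, so its two weights (0.25 and 0.5) are summed and it moves to the front of the set.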
In a possible embodiment, the repairing the database by using the at least one repairing mode until all the abnormal indexes in the abnormal index set are in a normal state includes: when the database has been repaired by using the at least one repairing mode and not all abnormal indexes in the abnormal index set are in the normal state, searching, based on the graph structure, for at least one repairing mode having a mapping relation with a second root cause, and repairing the database by using the at least one repairing mode having a mapping relation with the second root cause, wherein the second root cause is the next root cause after the first root cause in the root cause set.
In view of technical effects, the embodiment of the present application may sequentially select a repairing method having a mapping relationship with each root cause according to the sequence of each root cause in the root cause set to repair until each abnormal index is in a normal state. The method can quickly determine the root cause of the abnormal indexes in the abnormal index set, and improves the speed of fault repair.
In a possible embodiment, the searching for at least one repairing manner having a mapping relationship with a first root cause in the root cause set and repairing the database by using the at least one repairing manner includes: searching at least one repairing mode having a mapping relation with the first root cause to obtain a repairing mode set, wherein each repairing mode in the at least one repairing mode having the mapping relation with the first root cause corresponds to a weight coefficient, and in the repairing mode set, the weight coefficient corresponding to the repairing mode ranked in the front is larger than or equal to the weight coefficient corresponding to the repairing mode ranked in the rear; after the database is repaired by using a first repairing mode in the repairing mode set, if all abnormal indexes in the abnormal index set are in a normal state, stopping the repairing process, and if all abnormal indexes in the abnormal index set are not in the normal state, repairing the database by using a second repairing mode, wherein the second repairing mode is a next repairing mode arranged after the first repairing mode in the repairing mode set.
The weight coefficient corresponding to each repair mode can represent the possibility that the abnormal indexes in the abnormal index set recover to a normal state after the database is repaired by adopting the repair mode.
It should be noted that in the first repair of the database after determining the root cause set, the first root cause may be the first root cause in the root cause set, i.e., the root cause ordered first.
From the technical effect, according to the embodiment of the application, the database is subjected to fault repairing by sequentially selecting the corresponding repairing modes according to the weight coefficients corresponding to the repairing modes, so that the correct repairing mode can be quickly found out, and the fault repairing speed is increased.
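The ordered traversal described above (root causes in set order, and for each root cause its repairing modes in descending weight order, stopping as soon as every abnormal index is normal) might look like the sketch below. The callbacks `apply_repair` and `all_normal`, the function name `repair_until_normal`, and the simulated database are all hypothetical stand-ins for real repair actions and index checks.

```python
from typing import Callable, Dict, List, Optional, Tuple


def repair_until_normal(
    root_cause_set: List[str],
    repair_graph: Dict[str, Dict[str, float]],
    apply_repair: Callable[[str], None],
    all_normal: Callable[[], bool],
) -> Optional[Tuple[str, str]]:
    """Try repairing modes in order; return the (root cause, mode) that worked."""
    for cause in root_cause_set:  # first root cause, then second, and so on
        ranked = sorted(repair_graph.get(cause, {}).items(),
                        key=lambda item: item[1], reverse=True)
        for mode, _weight in ranked:  # larger weight coefficient is tried first
            apply_repair(mode)
            if all_normal():
                return cause, mode
    return None  # nothing in the graph fixed the fault


# Simulated database: only SQL optimization clears the abnormality.
attempted: List[str] = []
state = {"healthy": False}


def apply_repair(mode: str) -> None:
    attempted.append(mode)
    if mode == "invoke SQL optimization":
        state["healthy"] = True


result = repair_until_normal(
    ["excessive traffic", "improper indexing"],
    {
        "excessive traffic": {"execute current limiting": 0.75, "call administrator": 0.25},
        "improper indexing": {"invoke SQL optimization": 0.5},
    },
    apply_repair,
    lambda: state["healthy"],
)
print(result)
print(attempted)
```

Returning the pair that succeeded is a design choice of this sketch; it makes the later weight update straightforward because the successful root cause and repairing mode are already known.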
In a possible implementation manner, the normal state means that the value of the abnormal index is within a preset value range corresponding to the abnormal index.
In one possible embodiment, the method further comprises: determining a third root cause and a third repairing mode, where the third repairing mode is one of at least one repairing mode having a mapping relationship with the third root cause in the graph structure, and after the database is repaired by using the third repairing mode, all abnormal indexes in the abnormal index set are in the normal state; and updating the weight coefficient corresponding to the third repairing mode, and updating the weight coefficient corresponding to the third root cause in the P root causes.
The advantage is that, after the database fault is repaired, the repairing mode that fixed the fault and its corresponding root cause are determined, and the weight coefficients corresponding to the third repairing mode and the third root cause in the graph structure are then updated, so that when the same abnormal condition occurs later, the correct root cause and the correct repairing mode can be located quickly and the database can be repaired quickly.
In a possible implementation manner, the updating the weight coefficient corresponding to the third repairing manner and the updating the weight coefficient corresponding to the third root cause of the P root causes includes: and increasing the weight coefficient corresponding to the third repair mode, and increasing the weight coefficient corresponding to the third root cause in the P root causes.
In terms of technical effect, the weight coefficients of the third repairing mode and the third root cause are increased, so that when the same abnormal condition later occurs in the database and the root causes in the root cause set are sorted in descending order of weight coefficient, the correct root cause and the correct repairing mode can be located quickly and the database can be repaired quickly.
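A minimal sketch of the weight update follows. The application only states that the weight coefficients are increased; the additive rule, the step size of 0.125, the function name `reinforce_weights`, and the graph contents are assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical weighted edges of the graph structure.
index_to_cause: Dict[str, Dict[str, float]] = {
    "cpu_usage": {"excessive traffic": 0.5, "improper indexing": 0.25},
}
cause_to_repair: Dict[str, Dict[str, float]] = {
    "improper indexing": {"invoke SQL optimization": 0.5},
}


def reinforce_weights(
    index_to_cause: Dict[str, Dict[str, float]],
    cause_to_repair: Dict[str, Dict[str, float]],
    abnormal_indexes: List[str],
    cause: str,
    mode: str,
    step: float = 0.125,  # assumed increment; not specified by the application
) -> None:
    """Increase the weights of the root cause and repairing mode that fixed the fault."""
    for index in abnormal_indexes:
        if cause in index_to_cause.get(index, {}):
            index_to_cause[index][cause] += step  # weight among the P root causes
    cause_to_repair[cause][mode] += step          # weight of the successful mode


reinforce_weights(index_to_cause, cause_to_repair,
                  ["cpu_usage"], "improper indexing", "invoke SQL optimization")
print(index_to_cause["cpu_usage"]["improper indexing"])
print(cause_to_repair["improper indexing"]["invoke SQL optimization"])
```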
In a possible implementation manner, the abnormal index in the abnormal index set includes at least one of a number of transactions per second TPS, a number of queries per second QPS, a response time, a central processing unit usage rate, a memory usage rate, or a number of read/write operations per second IOPS.
In a possible embodiment, the root cause in the root cause set comprises at least one of incorrect parameters, excessive traffic, improper indexing, or insufficient resources; the repair modes in the repair mode set comprise at least one of restarting a database, executing a current limiting operation, invoking Structured Query Language (SQL) optimization or calling an administrator.
In a second aspect, an embodiment of the present application provides a database operation and maintenance device, where the device includes: an acquisition unit, configured to acquire an abnormal index set of a database, wherein the value of each abnormal index in the abnormal index set lies outside the preset value range corresponding to that abnormal index; a searching unit, configured to search, based on a graph structure, for at least one root cause having a mapping relation with each abnormal index to obtain a root cause set, the graph structure comprising the mapping relation between abnormal indexes and root causes; the searching unit being further configured to search, based on the graph structure, for at least one repairing mode having a mapping relation with a first root cause in the root cause set; and a repairing unit, configured to repair the database by using the at least one repairing mode until all the abnormal indexes in the abnormal index set are in a normal state, the graph structure further comprising the mapping relation between root causes and repairing modes.
In a possible implementation manner, in the aspect of searching, based on the graph structure, for at least one root cause having a mapping relationship with each abnormal index to obtain a root cause set, the searching unit is specifically configured to: search, based on the graph structure, for at least one root cause having a mapping relation with each abnormal index to obtain P root causes, wherein P is a positive integer; deduplicate the P root causes to obtain Q root causes, and sort the Q root causes based on the weight coefficient of each of the Q root causes to obtain the root cause set; wherein Q is a positive integer less than or equal to P.
In one possible embodiment, each of the P root causes and each of the Q root causes corresponds to a weight coefficient; the weight coefficient of the i-th root cause in the Q root causes is equal to the sum of all the weight coefficients of the i-th root cause in the P root causes, where i = 1, ..., Q.
In a possible implementation manner, in terms of repairing the database by using the at least one repairing mode until all the abnormal indexes in the abnormal index set are in a normal state, the searching unit is specifically configured to: search, based on the graph structure, for at least one repairing mode having a mapping relation with a second root cause when the database has been repaired by the at least one repairing mode and not all abnormal indexes in the abnormal index set are in the normal state; and the repairing unit is specifically configured to: repair the database using the at least one repairing mode having a mapping relationship with the second root cause, the second root cause being the next root cause after the first root cause in the root cause set.
In a possible embodiment, in the aspect of searching for at least one repairing manner having a mapping relationship with a first root cause in the root cause set, the searching unit is specifically configured to: searching at least one repairing mode having a mapping relation with the first root cause to obtain a repairing mode set, wherein each repairing mode in the at least one repairing mode having the mapping relation with the first root cause corresponds to a weight coefficient, and in the repairing mode set, the weight coefficient corresponding to the repairing mode ranked in the front is larger than or equal to the weight coefficient corresponding to the repairing mode ranked in the rear; in the aspect of repairing the database by using the at least one repair manner, the repair unit is specifically configured to: after the database is repaired by using a first repairing mode in the repairing mode set, if all abnormal indexes in the abnormal index set are in a normal state, stopping the repairing process, and if all abnormal indexes in the abnormal index set are not in the normal state, repairing the database by using a second repairing mode, wherein the second repairing mode is a next repairing mode arranged after the first repairing mode in the repairing mode set.
In a possible implementation manner, the normal state means that the value of the abnormal index is within a preset value range corresponding to the abnormal index.
In a possible embodiment, the apparatus further comprises: a determining unit, configured to determine a third root cause and a third repairing mode, where the third repairing mode is one of at least one repairing mode having a mapping relationship with the third root cause, and after the database is repaired by using the third repairing mode, all abnormal indexes in the abnormal index set are in the normal state; and an updating unit, configured to update the weight coefficient corresponding to the third repairing mode and to update the weight coefficient corresponding to the third root cause in the P root causes.
In a possible implementation, the updating unit is specifically configured to: and increasing the weight coefficient corresponding to the third repair mode, and increasing the weight coefficient corresponding to the third root cause in the P root causes.
In a possible implementation manner, the abnormal index in the abnormal index set includes at least one of a number of transactions per second TPS, a number of queries per second QPS, a response time, a central processing unit usage rate, a memory usage rate, or a number of read/write operations per second IOPS.
In a possible embodiment, the root cause in the root cause set comprises at least one of incorrect parameters, excessive traffic, improper indexing, or insufficient resources; the repair modes in the repair mode set comprise at least one of restarting a database, executing a current limiting operation, invoking Structured Query Language (SQL) optimization or calling an administrator.
In a third aspect, an embodiment of the present application provides a chip system, where the chip system includes at least one processor, a memory, and an interface circuit, where the memory, the interface circuit, and the at least one processor are interconnected by a line, and the memory stores instructions; the method of any of the above first aspects is implemented when the instructions are executed by the at least one processor.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having stored therein program instructions, which when executed on one or more processors, implement the method of any one of the above first aspects.
In a fifth aspect, the present application provides a computer program product, when the computer program product runs on a computer device, the method of any one of the above first aspects is implemented.
Drawings
The drawings used in the embodiments of the present application are described below.
FIG. 1 is a diagram of a system architecture according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of a database operation and maintenance method in an embodiment of the present application;
FIG. 3 is a diagram illustrating a mapping relationship between an anomaly indicator and a root cause in a graph structure according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a process for constructing a root cause set in an embodiment of the present application;
FIG. 5 is a diagram illustrating a mapping relationship between a root cause and a repair method in a graph structure according to an embodiment of the present application;
FIG. 6 is an exemplary diagram of a mapping relationship between a root cause and a repair manner in an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a database operation and maintenance device in an embodiment of the present application;
FIG. 8 is a hardware structure diagram of a computer device in an embodiment of the present application.
Detailed Description
The embodiments of the present application will be described below with reference to the drawings. The terms "first," "second," "third," and "fourth," etc. in the description and claims of this application and in the accompanying drawings are used for distinguishing between different elements and not for describing a particular sequential order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus. Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The relevant terms in the examples of the present application are explained first:
(1) Transaction Processing (TP): the main application of traditional relational databases, chiefly basic, everyday transaction processing such as banking transactions.
(2) Analytical Processing (AP): the main application of data warehouses; it supports complex analysis operations, emphasizes decision support, and provides intuitive and understandable query results.
(3) Hybrid Transaction/Analytical Processing (HTAP): a new database application architecture that can process AP workloads and TP workloads simultaneously and better matches actual business requirements.
(4) Autonomous Database: a cloud database management solution with automatic repairing, upgrading, and tuning functions. It can automatically perform all routine database maintenance work while the system is running, without any manual intervention in the whole process. An autonomous database cloud has self-driving, self-securing, and self-repairing capabilities, and can effectively reduce manual database management work and human error.
(5) Graph: also referred to as a graph structure; a complex, non-linear discrete structure made up of vertices and edges connecting the vertices. In computer science, a graph is one of the most flexible data structures. In a graph structure, each element may have zero or more predecessors and zero or more successors, i.e., the relationship between the elements is arbitrary.
Referring to FIG. 1, FIG. 1 is a schematic diagram of a system architecture according to an embodiment of the present application. As shown in FIG. 1, the system architecture 100 includes a client device 110 and an execution device 120, the execution device 120 including an I/O interface 121, a data collection device 122, and a computation module 123.
The client device 110 may comprise one or more databases, and the data collection device 122 is configured to monitor the one or more databases; in particular, it may collect state information of the one or more databases through the I/O interface 121. The state information of a database may include a running log, an audit log, the Structured Query Language (SQL) flow of the database, index feature data (such as Queries Per Second (QPS) and Transactions Per Second (TPS)), and the like.
The execution device 120 may receive data input by the client device 110 through the I/O interface 121 and the data collection device 122, and then perform a relevant calculation process by using the calculation module 123 to obtain a corresponding processing result. For example, the collected database state information is analyzed to determine a root cause of a failure in the database and a corresponding repair method, and the repair method is sent to the client device 110 through the I/O interface 121, so as to repair the failed database.
The computing module 123 may be a Processing Unit such as a Central Processing Unit (CPU), and may be a single-core or multi-core processor in terms of hardware, which is not limited in this application.
The execution device 120 may be any feasible computer device, such as a mobile phone terminal, a tablet computer, a notebook computer, an Augmented Reality (AR)/Virtual Reality (VR) device, or a vehicle-mounted terminal, and may also be a server or the cloud.
It should be understood that FIG. 1 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship between the devices, modules, and the like shown in the figure does not constitute any limitation. For example, in FIG. 1 the data collection device 122 is internal to the execution device 120; in other cases, the data collection device 122 may be disposed outside the execution device 120.
Referring to FIG. 2, FIG. 2 is a schematic flowchart illustrating a database operation and maintenance method according to an embodiment of the present application. As shown in FIG. 2, the method 200 includes step S210, step S220, and step S230.
Step S210: Acquire an abnormal index set of the database, wherein the value of each abnormal index in the abnormal index set lies outside the preset value range corresponding to that abnormal index.
Specifically, database state information is obtained periodically; it includes but is not limited to: running logs, audit logs, SQL flow, and index feature data (such as queries per second (QPS) and transactions per second (TPS)).
The time period for acquiring the database state information may be set according to an actual scene, which is not limited in the present application.
Further, after the database state information is obtained, it is preprocessed. The preprocessing may include data extraction, cleaning, and standardization of the state information. Specifically, a plurality of indexes (metrics) representing the state of the database may be extracted from the state information; indexes that lack a corresponding value or whose value format is incorrect are then deleted; and the value of each remaining index is normalized to obtain a first index set.
Optionally, the normalization may map the value of each index remaining after cleaning into the range [0,1]; the present application does not limit the value range used for normalization.
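The cleaning and normalization described above can be illustrated with a minimal sketch. The function name `preprocess`, the input shape (a dict mapping each index name to a list of samples), and the choice of min-max scaling are assumptions made for illustration; the text only requires that unusable indexes be dropped and that the remaining values be mapped into a range such as [0,1].

```python
import math


def preprocess(raw_metrics):
    """Drop unusable indexes, then min-max scale each remaining series to [0, 1].

    raw_metrics: dict mapping an index name to a list of raw samples
    (a hypothetical input shape; the text does not fix one).
    """
    first_index_set = {}
    for name, samples in raw_metrics.items():
        # Cleaning: skip indexes that have no value at all.
        if not samples:
            continue
        # Cleaning: skip indexes whose samples are not finite numbers.
        if not all(
            isinstance(v, (int, float))
            and not isinstance(v, bool)
            and math.isfinite(v)
            for v in samples
        ):
            continue
        lo, hi = min(samples), max(samples)
        span = hi - lo
        # A constant series has no spread; map it to 0.0 by convention here.
        first_index_set[name] = [
            (v - lo) / span if span else 0.0 for v in samples
        ]
    return first_index_set


result = preprocess({"qps": [100, 200, 300], "tps": [], "cpu_usage": ["n/a", 0.5]})
```

In this sketch, `tps` is dropped because it has no value and `cpu_usage` is dropped because one sample is malformed, so only `qps` reaches the first index set.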
The plurality of indexes may include: TPS, QPS, response time, central processing unit usage (CPU usage), memory usage, and the number of read/write operations per second (Input/Output Operations Per Second, IOPS).
Alternatively, the value of each index in the first index set may be a time series of indexes in a period of time, that is, the value corresponding to one index may be different at different time nodes.
Further, each index in the first index set may be subjected to abnormality detection by using an abnormality detection algorithm and/or a preset rule.
Specifically, whether values on different time nodes meet the same trend in the time sequence corresponding to each index in the first index set can be judged through an anomaly detection algorithm. Wherein, the detailed process about the abnormality detection is not described in detail in the present application.
The anomaly detection algorithm may be a time series prediction method, a statistical method, or the like, which is not limited in the present application. Further, the anomaly detector used in the anomaly detection algorithm may be, for example, a 3-sigma detector or a box plot detector, which is not limited in the present application.
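As one possible reading of the 3-sigma detector mentioned above, the following sketch flags the time nodes whose values deviate from the series mean by more than three standard deviations. The function name `three_sigma_outliers` and the use of the population standard deviation over the whole series are assumptions; the application does not describe the detector in detail.

```python
from statistics import mean, pstdev


def three_sigma_outliers(series, k=3.0):
    """Return the positions of points farther than k standard deviations from the mean."""
    mu = mean(series)
    sigma = pstdev(series)  # population standard deviation of the series
    if sigma == 0:
        # A constant series has no outliers under this rule.
        return []
    return [i for i, v in enumerate(series) if abs(v - mu) > k * sigma]


# Twenty steady samples followed by one spike (illustrative data only).
cpu_series = [0.5] * 20 + [0.95]
outliers = three_sigma_outliers(cpu_series)
```

Note that with very short series a single spike may never exceed three standard deviations, so a window length has to be chosen with this in mind.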
Optionally, the preset rule may include: when the time for which the CPU usage exceeds R1 reaches T1, identifying the CPU usage as an abnormal index; when the time for which the memory usage exceeds R2 reaches T2, identifying the memory usage as an abnormal index; when the time for which the IOPS exceeds R3 reaches T3, identifying the IOPS as an abnormal index; and so on. R1, R2, R3, T1, T2 and T3 are positive numbers.
The skilled person in the art may also set a corresponding abnormal state identification rule for other indexes, which is not listed one by one in the present application.
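A preset rule of the form "the index exceeds R for a time reaching T" can be sketched as follows. The function name `exceeds_for_duration`, the sample shape (timestamp in seconds, value), and the interpretation of the time as one continuous stretch rather than a cumulative total are assumptions; the text does not say which is intended.

```python
def exceeds_for_duration(samples, threshold, min_duration):
    """Return True if the value stays above threshold for at least min_duration seconds.

    samples: list of (timestamp_seconds, value) pairs in time order.
    The duration is measured over one continuous stretch (an assumption).
    """
    run_start = None
    for ts, value in samples:
        if value > threshold:
            if run_start is None:
                run_start = ts  # a new stretch above the threshold begins
            if ts - run_start >= min_duration:
                return True
        else:
            run_start = None  # the stretch is broken
    return False


# Illustrative CPU usage samples; R1 = 0.9 and T1 = 20 s are hypothetical values.
cpu = [(0, 0.5), (10, 0.95), (20, 0.96), (30, 0.97), (40, 0.6)]
cpu_is_abnormal = exceeds_for_duration(cpu, threshold=0.9, min_duration=20)
```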
Further, after the abnormality detection, the abnormal index set is generated from all the indexes in the first index set in which an abnormality was detected. The value of each abnormal index in the abnormal index set is the value of that index at an abnormal point on the corresponding time series.
In a possible implementation manner, the abnormal indexes in the abnormal index set include at least one of a number of transactions per second TPS, a number of queries per second QPS, a response time, a central processing unit usage rate, a memory usage rate, or a number of read/write operations per second IOPS.
Each abnormal index in the abnormal index set corresponds to a preset value range, and the value of each abnormal index is positioned outside the value range corresponding to the abnormal index.
Optionally, the value of the abnormal indicator is not equal to any one of the two endpoint values of the value range corresponding to the abnormal indicator.
In summary, it can be understood that the abnormal condition of each abnormal index in the abnormal index set can be divided into two conditions, namely a high abnormal condition and a low abnormal condition:
(1) High abnormality
In the abnormal index set, when the value of one abnormal index is larger than the larger value of the two end points of the value range corresponding to the abnormal index, the abnormal index is a high abnormal index.
(2) Low anomaly
In the abnormal index set, when the value of one abnormal index is smaller than the smaller value of the two end points of the value section corresponding to the abnormal index, the abnormal index is a low abnormal index.
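The two cases above amount to a comparison against the endpoints of the value range. A minimal sketch, in which the function name `classify_anomaly` and the string labels are assumptions, and in which values equal to an endpoint are treated as not abnormal:

```python
def classify_anomaly(value, value_range):
    """Return 'high', 'low', or None for a value relative to its preset range."""
    lower, upper = min(value_range), max(value_range)
    if value > upper:
        return "high"  # above the larger endpoint: high abnormality
    if value < lower:
        return "low"   # below the smaller endpoint: low abnormality
    return None        # inside the range, endpoints included: not abnormal


condition = classify_anomaly(0.6, (0.35, 0.55))
```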
Step S220: Search, based on a graph structure, for at least one root cause having a mapping relation with each abnormal index to obtain a root cause set, wherein the graph structure includes the mapping relation between abnormal indexes and root causes.
The root causes contained in the root cause set are all different, and each root cause in the root cause set corresponds to one weight coefficient. The weight coefficient characterizes the likelihood that the corresponding root cause causes the database to produce the abnormal condition characterized by the abnormal index set. For example, a larger weight coefficient may indicate a higher likelihood that the corresponding root cause produced the abnormal condition; alternatively, under the opposite convention, a larger weight coefficient may indicate a lower likelihood.
In a possible implementation manner, the searching for at least one root cause having a mapping relation with each index based on the value of each index to obtain a root cause set includes: searching, based on the graph structure, for at least one root cause having a mapping relation with each abnormal index to obtain P root causes, wherein P is a positive integer; deduplicating the P root causes to obtain Q root causes; and sorting the Q root causes based on the weight coefficient of each of the Q root causes to obtain the root cause set, wherein Q is a positive integer less than or equal to P.
In the above abnormal index set, any two different abnormal indexes may correspond to partially identical root causes, completely identical root causes, or completely different root causes.
The following describes a process of searching out at least one root cause having a mapping relationship with a first abnormal index in an abnormal index set by taking the first abnormal index as an example:
Specifically, the abnormal condition of the first abnormal index is first determined based on the value of the first abnormal index and the value range corresponding to the first abnormal index, that is, it is determined whether the first abnormal index is in the high abnormal condition or the low abnormal condition. Then, K root causes having a mapping relation with the first abnormal index are searched out from the graph structure based on the abnormal condition of the first abnormal index. The K root causes respectively describe K types of causes of the abnormal condition of the first abnormal index (that is, of its value lying outside the corresponding value range). In other words, the first abnormal index has K mapping relations, one to each of the K root causes. Each of the K root causes corresponds to one weight coefficient, that is, each of the K mapping relations corresponds to one weight coefficient, which represents the likelihood that the corresponding root cause caused the abnormality in the value of the first abnormal index. K is a positive integer less than or equal to P.
For example, if the weight coefficient corresponding to one of the K root causes is 0.5, the likelihood that this root cause produced the abnormality in the value of the first abnormal index is 50%.
It should be understood that the first abnormality index has different root causes in the high abnormality case and the low abnormality case.
The root causes having a mapping relation with each abnormal index in the abnormal index set are searched out in the same way as the K root causes having a mapping relation with the first abnormal index, which yields the P root causes.
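The lookup described above can be sketched with a dictionary keyed by the pair (abnormal index, abnormal condition). The name `GRAPH`, the function `search_root_causes`, and every index name, root cause name, and weight below are hypothetical; the application does not prescribe a storage format for the graph structure.

```python
# Hypothetical fragment of the graph structure:
# (abnormal index, abnormal condition) -> list of (root cause, weight coefficient).
GRAPH = {
    ("cpu_usage", "high"): [("excessive traffic", 0.7), ("improper index", 0.3)],
    ("cpu_usage", "low"): [("incorrect parameters", 1.0)],
    ("qps", "low"): [("insufficient resources", 0.6), ("improper index", 0.4)],
}


def search_root_causes(anomalies, graph):
    """Collect the P (root cause, weight) pairs for a set of abnormal indexes.

    anomalies: dict mapping each abnormal index to 'high' or 'low'.
    Repeated root causes are kept, since deduplication is a later step.
    """
    found = []
    for index, condition in anomalies.items():
        found.extend(graph.get((index, condition), []))
    return found


p_root_causes = search_root_causes({"cpu_usage": "high", "qps": "low"}, GRAPH)
```

Here P is 4, and "improper index" occurs twice because two abnormal indexes map to it.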
In terms of technical effect, because the root cause set is obtained by sorting the Q root causes by their weight coefficients, the root causes most likely to have caused the abnormal indexes in the abnormal index set can subsequently be determined quickly from the root cause set, so that the database fault is repaired as soon as possible and the automatic operation and maintenance performance is improved.
In one possible embodiment, each of the P root causes and each of the Q root causes corresponds to one weight coefficient; the weight coefficient of the i-th root cause among the Q root causes is equal to the sum of all the weight coefficients of that root cause among the P root causes, where i = 1, ..., Q.
Each of the P root causes corresponds to one weight coefficient, that is, the P root causes correspond to P weight coefficients respectively.
Optionally, the P root causes may include repeated occurrences of the same root cause, and the weight coefficients corresponding to those occurrences may be different.
For example, when the first abnormal index and the second abnormal index both have a mapping relation with a third root cause, the third root cause corresponds to two weight coefficients, namely a first weight coefficient and a second weight coefficient; the first weight coefficient is the weight coefficient of the mapping relation between the first abnormal index and the third root cause, and the second weight coefficient is the weight coefficient of the mapping relation between the second abnormal index and the third root cause. In this case, the third root cause appears twice among the P root causes, once with the first weight coefficient and once with the second weight coefficient. The first weight coefficient and the second weight coefficient may be different.
The deduplicating of the P root causes to obtain Q root causes specifically includes: summing all the weight coefficients corresponding to the same root cause among the P root causes, and using the resulting sum as the weight coefficient of that root cause among the Q root causes. That is, the weight coefficient of the i-th root cause among the Q root causes is equal to the sum of the weight coefficients of all the root causes among the P root causes that are identical to the i-th root cause.
Optionally, the sorting the Q root causes based on the weight coefficient of each of the Q root causes to obtain the root cause set specifically includes: sorting the Q root causes in descending order of weight coefficient to obtain the root cause set; or sorting the Q root causes in ascending order of weight coefficient to obtain the root cause set.
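The deduplication, summation, and sorting described above can be sketched as a single function. The function name `build_root_cause_set`, the pair-list input shape, and the sample weights are assumptions for illustration.

```python
from collections import defaultdict


def build_root_cause_set(p_root_causes, descending=True):
    """Merge repeated root causes by summing their weights, then sort by weight.

    p_root_causes: list of (root cause, weight coefficient) pairs, repeats allowed.
    Returns a list of (root cause, summed weight) pairs with Q entries.
    """
    totals = defaultdict(float)
    for cause, weight in p_root_causes:
        totals[cause] += weight  # occurrences of the same root cause are summed
    return sorted(totals.items(), key=lambda item: item[1], reverse=descending)


# The "third root cause" is reached from two abnormal indexes with different weights
# (the weights are hypothetical).
ranked = build_root_cause_set(
    [("root cause 3", 0.2), ("root cause 5", 0.3), ("root cause 3", 0.25)]
)
```

The `descending` flag covers both orderings that the text allows.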
In one possible embodiment, the root cause of the set of root causes includes at least one of incorrect parameters, excessive traffic, improper indexing, or insufficient resources.
Specifically, the root causes in the graph structure or in the root cause set may include sub-root causes and combined root causes formed by combining different sub-root causes. A sub-root cause is a minimal root cause that cannot be subdivided further and contains only one type of root cause; for example, a sub-root cause may be incorrect parameters, excessive traffic, improper index, insufficient resources, or slow SQL, which are not listed exhaustively in the present application. A combined root cause is obtained by combining one or more sub-root causes; for example, a combined root cause may be incorrect parameters + excessive traffic, improper index + slow SQL, insufficient resources + slow SQL, and the like, which are not listed exhaustively in the present application.
Optionally, after the P root causes are deduplicated to obtain Q root causes, the Q root causes may be manually checked to remove incorrect root causes from the Q root causes, that is, to delete root causes that do not cause abnormal conditions represented by the abnormal index set from the Q root causes.
For example, when the root cause "improper index" among the Q root causes is checked manually, the execution plan of an SQL statement can be obtained to detect whether an index scan is performed and whether the index scan is actually effective; when the index scan is effective, the root cause "improper index" is deleted from the Q root causes, and when the index scan is ineffective, it is retained.
In terms of technical effect, because the P root causes may include repeated occurrences of the same root cause, in the embodiment of the present application the weight coefficients corresponding to the same root cause among the P root causes are added to obtain the weight coefficient of that root cause among the Q root causes. This raises the importance of a root cause that is reached from several abnormal indexes, so that root causes of high importance can subsequently be selected from the root cause set by weight coefficient to repair the database, which improves the automatic operation and maintenance performance.
Referring to fig. 3, fig. 3 is a schematic diagram illustrating a mapping relationship between an abnormal indicator and a root cause in a graph structure according to an embodiment of the present application. The mapping relationship shown in fig. 3 may be a part of the graph structure in the embodiments of the present application. As shown in fig. 3, the mapping relationship includes E abnormality indexes: abnormality index 1, abnormality index 2, ..., abnormality index E; and F root causes: root cause 1, root cause 2, root cause 3, ..., root cause F. E and F are positive integers.
Each abnormality index corresponds to two kinds of abnormal conditions: namely a low exception and a high exception. Each abnormal index can correspond to different root causes under different abnormal conditions. Namely, the graph structure comprises: the mapping relation between each abnormal index and at least one root factor when the abnormal index is in low abnormality and at least one corresponding weight coefficient; and the mapping relation between each abnormal index and at least one root factor and at least one corresponding weight coefficient when the abnormal index is in high abnormality.
As shown in fig. 3, in the case of a high abnormality, the abnormality indicator 1 has a mapping relationship with the root cause 1 and the root cause 3, and at this time, the weight coefficient corresponding to the root cause 1 is d1, the weight coefficient corresponding to the root cause 3 is d2, and d1+ d2=1. In the low anomaly case, the anomaly index 1 has a mapping relationship with the root cause 2 and the root cause F, and in this case, the weight coefficient corresponding to the root cause 2 is e1, the weight coefficient corresponding to the root cause F is e2, and e1+ e2=1. In the case of a low abnormality, the abnormality index 2 has a mapping relationship with the root cause 1 and the root cause 3, and in this case, the weight coefficient corresponding to the root cause 1 is f1, the weight coefficient corresponding to the root cause 3 is f2, and f1+ f2=1.
For simplicity, the root causes mapped to abnormality index 2 in the high abnormality case, and to abnormality index E in the high abnormality case and in the low abnormality case, are not shown.
Referring to fig. 4, fig. 4 is a schematic diagram illustrating a root cause set constructing process according to an embodiment of the present application. The mapping relationship between the abnormality index and the root cause and the weight coefficient corresponding to the root cause in fig. 4 are obtained by searching based on the graph structure shown in fig. 3.
As shown in fig. 4, the abnormal index set includes four abnormal indexes: abnormality index 1, abnormality index 2, abnormality index 3, and abnormality index 4. Among them, the abnormality index 1 shows a high abnormality, the abnormality index 2 shows a low abnormality, the abnormality index 3 shows a low abnormality, and the abnormality index 4 shows a high abnormality.
In the case of a high anomaly, the anomaly index 1 has a mapping relationship with the root cause 1 and the root cause 3, and in this case, the weight coefficient corresponding to the root cause 1 is 0.4, and the weight coefficient corresponding to the root cause 3 is 0.6. In the case of a low anomaly, the anomaly index 2 has a mapping relationship with the root cause 1, the root cause 2, and the root cause 4, and in this case, the weight coefficient corresponding to the root cause 1 is 0.1, the weight coefficient corresponding to the root cause 2 is 0.3, and the weight coefficient corresponding to the root cause 4 is 0.6. In the case of a low anomaly, the anomaly index 3 has a mapping relationship with the root cause 1 and the root cause 3, and in this case, the weight coefficient corresponding to the root cause 1 is 0.5, and the weight coefficient corresponding to the root cause 3 is 0.5. In the case of a high anomaly, the anomaly index 4 has a mapping relationship with the root cause 2 and the root cause 4, and in this case, the weight coefficient corresponding to the root cause 2 is 0.3, and the weight coefficient corresponding to the root cause 4 is 0.7. It can be seen that, in the abnormal index set, the sum of the weight coefficients corresponding to all the root factors having a mapping relationship with each abnormal index is 1.
Based on the graph structure, after the root cause having the mapping relation with each abnormal index in the abnormal index set is searched, 9 root causes are obtained, namely P is equal to 9 at this time. Among the 9 root causes obtained by the search, the number of occurrences of root cause 1 is 3, the number of occurrences of root cause 2 is 2, the number of occurrences of root cause 3 is 2, and the number of occurrences of root cause 4 is 2.
After the 9 root causes are obtained, they are deduplicated to obtain four root causes: root cause 1, root cause 2, root cause 3 and root cause 4, that is, Q is equal to 4 in this case. Summing the weight coefficients corresponding to the same root cause among the 9 root causes gives weight coefficients of 1.0, 0.6, 1.1 and 1.3 for root cause 1, root cause 2, root cause 3 and root cause 4, respectively. The four root causes are sorted by the weight coefficients obtained after summation to obtain the root cause set shown in fig. 4.
The sorting according to the weight coefficients from large to small in fig. 4 is only an example given in the present application, and the present application does not limit this.
It should be understood that fig. 4 is only an example of constructing a root cause set according to an embodiment of the present application, and the number of abnormal indicators in the abnormal indicator set, the number of root causes in the root cause set, the mapping relationship between the abnormal indicators and the root causes, and the corresponding weight coefficients given in fig. 4 do not limit the embodiment of the present application.
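The numbers in the fig. 4 example can be checked with a short script. The mappings and weights are taken from the description of fig. 4 above; the variable names and the dictionary layout are assumptions.

```python
from collections import Counter

# Mappings from the fig. 4 example: (abnormality index, condition) -> {root cause: weight}.
FIG4_MAPPINGS = {
    ("abnormality index 1", "high"): {"root cause 1": 0.4, "root cause 3": 0.6},
    ("abnormality index 2", "low"): {
        "root cause 1": 0.1,
        "root cause 2": 0.3,
        "root cause 4": 0.6,
    },
    ("abnormality index 3", "low"): {"root cause 1": 0.5, "root cause 3": 0.5},
    ("abnormality index 4", "high"): {"root cause 2": 0.3, "root cause 4": 0.7},
}

occurrences = Counter()  # how often each root cause is found
summed = Counter()       # summed weight coefficient per root cause
for weights in FIG4_MAPPINGS.values():
    for cause, weight in weights.items():
        occurrences[cause] += 1
        summed[cause] += weight

P = sum(occurrences.values())  # root causes found before deduplication
Q = len(summed)                # distinct root causes after deduplication
ranking = [
    cause
    for cause, _ in sorted(summed.items(), key=lambda kv: kv[1], reverse=True)
]
```

With descending order, the ranking is root cause 4, root cause 3, root cause 1, root cause 2, matching the summed weights 1.3, 1.1, 1.0 and 0.6 up to floating-point rounding.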
Step S230: Search, based on the graph structure, for at least one repair mode having a mapping relation with a first root cause in the root cause set, and repair the database using the at least one repair mode until all the abnormal indexes in the abnormal index set are in a normal state, wherein the graph structure further includes the mapping relation between root causes and repair modes.
Specifically, sequential searching is carried out according to the sequence of the root causes in the root cause set, at least one repairing mode having a mapping relation with one root cause in the graph structure is searched each time, and the database is repaired by using the at least one repairing mode having the mapping relation with the root cause. After each repair, acquiring the state information of the database according to the steps in the previous embodiment, judging whether all the abnormal indexes in the abnormal index set are in a normal state at the moment through the state information, if so, stopping the fault repair process of the database, and indicating that the fault of the database is repaired; if not, the next repair is carried out.
The first root cause may be any root cause in the root cause set.
Optionally, after the root cause set is obtained, in the process of repairing the database for the first time, the first root cause may be the first root cause in the root cause set, that is, the root cause sorted at the first position.
Optionally, for each abnormal index, the normal state indicates that the value of the abnormal index is within a preset value range corresponding to the abnormal index.
For example, if the value range corresponding to the first abnormal index is [0.35,0.55], and the value of the first abnormal index in the abnormal index set is 0.6, the first abnormal index is in a high abnormal state. After the database is repaired once, the values of the time nodes on the time series corresponding to the first abnormal index are all located in [0.35,0.55], and the first abnormal index is in a normal state.
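The normal-state check in the example above can be sketched as follows. The function names `is_normal` and `all_normal` and the dictionary inputs are assumptions; endpoint values are treated as normal, consistent with the definition of high and low abnormality.

```python
def is_normal(series, value_range):
    """True if every value on the time series lies within the preset range."""
    lower, upper = min(value_range), max(value_range)
    return all(lower <= v <= upper for v in series)


def all_normal(current_series, value_ranges, abnormal_indexes):
    """True if every index in the abnormal index set is now in a normal state."""
    return all(
        is_normal(current_series[name], value_ranges[name])
        for name in abnormal_indexes
    )


ranges = {"first abnormal index": (0.35, 0.55)}
before_repair = {"first abnormal index": [0.5, 0.6]}
after_repair = {"first abnormal index": [0.4, 0.5, 0.55]}
repaired = all_normal(after_repair, ranges, ["first abnormal index"])
```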
In a possible embodiment, the repairing the database by using the at least one repair mode until all the abnormal indexes in the abnormal index set are in a normal state includes: when, after the database has been repaired using the at least one repair mode, not all the abnormal indexes in the abnormal index set are in the normal state, searching, based on the graph structure, for at least one repair mode having a mapping relation with a second root cause, and repairing the database using the at least one repair mode having a mapping relation with the second root cause, wherein the second root cause is the next root cause after the first root cause in the root cause set.
The first root cause and the second root cause are two adjacent root causes in the root cause set, and the first root cause is arranged before the second root cause.
Specifically, after the at least one repair mode having a mapping relation with the first root cause is searched out, the database is repaired using the at least one repair mode in sequence, one repair per repair mode. If, during this process, all the abnormal indexes in the abnormal index set come to be in a normal state, the fault repair process is stopped, which indicates that the database fault represented by the abnormal index set has been repaired. If, after the database has been repaired using each of the at least one repair mode in sequence, not all the abnormal indexes in the abnormal index set are in a normal state, at least one repair mode having a mapping relation with the second root cause is searched out and the database is repaired using these repair modes in sequence.
Here, "not all in a normal state" means that, after a repair, some of the abnormal indexes in the abnormal index set are in a normal state and others are not.
In terms of technical effect, according to the embodiment of the present application, the repair modes having a mapping relation with each root cause can be selected in turn, following the order of the root causes in the root cause set, until every abnormal index is in a normal state. In this way, the root cause of the abnormal indexes in the abnormal index set can be determined quickly, which increases the speed of fault repair.
In a possible embodiment, the searching for at least one repairing manner having a mapping relationship with a first root cause in the root cause set and repairing the database by using the at least one repairing manner includes: searching at least one repairing mode having a mapping relation with the first root cause to obtain a repairing mode set, wherein each repairing mode in the at least one repairing mode having the mapping relation with the first root cause corresponds to a weight coefficient, and in the repairing mode set, the weight coefficient corresponding to the repairing mode ranked in the front is larger than or equal to the weight coefficient corresponding to the repairing mode ranked in the rear; after the database is repaired by using a first repairing mode in the repairing mode set, if all abnormal indexes in the abnormal index set are in a normal state, stopping the repairing process, and if all abnormal indexes in the abnormal index set are not in the normal state, repairing the database by using a second repairing mode, wherein the second repairing mode is a next repairing mode arranged after the first repairing mode in the repairing mode set.
In the graph structure, each root cause has a mapping relation with at least one repair mode, and each of the at least one repair mode corresponds to one weight coefficient. The weight coefficient corresponding to each repair mode represents the likelihood that the database fault caused by the corresponding root cause can be repaired using that repair mode.
Optionally, when the weight coefficient corresponding to the repair method is larger, it indicates that the probability that the database fault caused by the corresponding root factor can be repaired by using the repair method corresponding to the weight coefficient is higher.
Specifically, the searching for at least one repair mode having a mapping relationship with the first root cause to obtain a repair mode set includes: searching at least one repairing mode having a mapping relation with the first root cause from the graph structure, and one weight coefficient corresponding to each repairing mode in the at least one repairing mode; and then sorting the at least one repairing mode based on the size of the weight coefficient to obtain the repairing mode set.
Optionally, the at least one repair method having a mapping relationship with the first root cause may be sorted in the order from large to small of the weight coefficient, so as to obtain the repair method set.
The process of performing fault repair on the database by using the repair modes in the repair mode set specifically comprises the following steps: repairing the database by using a first repairing mode in the repairing mode set, after the repairing is completed, acquiring state information of the database according to the steps in the embodiment, judging whether all abnormal indexes in the abnormal index set are in a normal state at the moment according to the state information, and if so, stopping a fault repairing process of the database to indicate that the fault of the database is repaired; if not, the database is repaired next time by using a second repairing mode which is sequenced after the first repairing mode, after the repairing is completed by using the second repairing mode, the state information of the database is also obtained, and whether all the abnormal indexes in the abnormal index set are in a normal state at the moment is judged.
Sequentially adopting each repairing mode to repair the database according to the sequence of each mode in the repairing mode set, and judging whether all abnormal indexes in the abnormal index set are in a normal state after each repairing; and stopping the database repairing process until all the abnormal indexes in the abnormal index set are detected to be in the normal state.
The first repair mode and the second repair mode may be any two adjacent repair modes in a repair mode set, and the first repair mode is ordered before the second repair mode.
It should be noted that, when the database is repaired for the first time using the repair mode set, the first repair mode is the repair mode ranked first in the repair mode set, and the second repair mode is the repair mode ranked second in the repair mode set.
It should be understood that the process of repairing the database by using the repairing manner having the mapping relationship with other root causes in the root cause set is the same as the process of repairing the database by using at least one repairing manner corresponding to the first root cause, and is not described herein again.
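The overall repair procedure, which iterates over the root causes in order and, for each root cause, over its repair modes in descending order of weight coefficient, can be sketched as two nested loops. The function `repair_database`, its two callback parameters, and all root cause names, repair mode names, and weights are hypothetical; in a real system the callbacks would execute a repair and re-acquire the database state information.

```python
def repair_database(root_cause_set, repair_graph, apply_repair, all_indexes_normal):
    """Try repair modes until all abnormal indexes are normal.

    root_cause_set: root causes in their ranked order.
    repair_graph: dict mapping a root cause to a list of (repair mode, weight).
    apply_repair, all_indexes_normal: caller-supplied callables (hypothetical hooks).
    Returns (list of repair modes applied, whether the fault was repaired).
    """
    applied = []
    for root_cause in root_cause_set:
        # Build the repair mode set: larger weight coefficient first.
        modes = sorted(
            repair_graph.get(root_cause, []), key=lambda m: m[1], reverse=True
        )
        for mode, _weight in modes:
            apply_repair(mode)
            applied.append(mode)
            # Re-check the database state after every single repair.
            if all_indexes_normal():
                return applied, True
    return applied, False


# Simulated run: only "repair 3" clears the fault (illustrative data).
graph = {
    "root cause A": [("repair 1", 0.3), ("repair 2", 0.7)],
    "root cause B": [("repair 3", 0.6), ("repair 4", 0.4)],
}
state = {"healthy": False}


def simulated_repair(mode):
    if mode == "repair 3":
        state["healthy"] = True


applied, fixed = repair_database(
    ["root cause A", "root cause B"], graph, simulated_repair, lambda: state["healthy"]
)
```

In the simulated run, both repair modes of root cause A are tried without success before the first repair mode of root cause B clears the fault, at which point the loop stops.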
In terms of technical effect, according to the embodiment of the present application, the repair modes are selected in turn according to their weight coefficients to repair the database fault, so that the correct repair mode can be found quickly and the speed of fault repair is increased.
Referring to fig. 5, fig. 5 is a schematic diagram illustrating a mapping relationship between a root cause and a repair mode in a graph structure according to an embodiment of the present application. As shown in fig. 5, the graph structure includes M root causes: root cause 1, root cause 2, ..., root cause M; and N repair modes: repair mode 1, repair mode 2, repair mode 3, ..., repair mode N. M and N are positive integers.
In the graph structure shown in fig. 5, any root cause has a mapping relationship with at least one repair manner, the at least one repair manner constitutes a repair manner set corresponding to the any root cause, each repair manner in the at least one repair manner corresponds to one weight coefficient, and the sum of the weight coefficients corresponding to each repair manner in the at least one repair manner is 1.
The root cause 1 and the repair modes 1 and 3 have a mapping relationship, that is, the repair modes 1 and 3 constitute a repair mode set corresponding to the root cause 1, at this time, the weight coefficients corresponding to the repair modes 1 and 3 are a1 and a2, respectively, and a1+ a2=1. The root cause 2 has a mapping relationship with the repair method 1, the repair method 2, and the repair method 3, that is, the repair method 1, the repair method 2, and the repair method 3 constitute a repair method set corresponding to the root cause 2, and at this time, the weight coefficients corresponding to the repair method 1, the repair method 2, and the repair method 3 are b1, b2, and b3, respectively, and b1+ b2+ b3=1. The root cause M has a mapping relationship with the repair method 2 and the repair method N, that is, the repair method 2 and the repair method N constitute a repair method set corresponding to the root cause M, at this time, the weight coefficients corresponding to the repair method 2 and the repair method N are c1 and c2, respectively, and c1+ c2=1.
In one possible embodiment, the repair style in the set of repair styles includes at least one of restarting a database, performing a throttling operation, invoking a Structured Query Language (SQL) optimization, or calling an administrator.
Specifically, the graph structure or the repair mode set may include sub-repair modes and combined repair modes obtained by freely combining the sub-repair modes. A sub-repair mode is a repair mode that cannot be subdivided further, that is, a repair mode that performs only one operation; for example, a sub-repair mode may be restarting the database, performing a throttling operation, invoking SQL optimization, or calling an administrator, which are not listed exhaustively herein. A combined repair mode is a repair mode obtained by combining one or more sub-repair modes; for example, a combined repair mode may be restarting the database + performing a throttling operation, performing a throttling operation + invoking SQL optimization, restarting the database + performing a throttling operation + invoking SQL optimization, or the like, which are not listed exhaustively herein.
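One way to represent combined repair modes is to compose sub-repair modes into a single callable. This is a sketch only: the three sub-repair functions are placeholders that record their name instead of touching a database, and running the sub-repair modes in the order given is an assumption, since the text does not specify an execution order.

```python
def restart_database(log):
    log.append("restart database")  # placeholder for the real operation


def perform_throttling(log):
    log.append("perform throttling")  # placeholder for the real operation


def invoke_sql_optimization(log):
    log.append("invoke SQL optimization")  # placeholder for the real operation


def combine(*sub_repairs):
    """Build a combined repair mode that runs each sub-repair mode in order."""
    def combined(log):
        for step in sub_repairs:
            step(log)
    return combined


# "restarting the database + performing a throttling operation"
restart_then_throttle = combine(restart_database, perform_throttling)
log = []
restart_then_throttle(log)
```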
Referring to fig. 6, fig. 6 is an exemplary diagram of a mapping relationship between a root cause and a repair manner in an embodiment of the present application. As shown in fig. 6, the root cause set includes three root causes: incorrect parameters, insufficient resources, and slow SQL + improper indexing.
In the graph structure, each of the three root causes has a mapping relationship with at least one repair method. The root cause "incorrect parameters" has a mapping relationship with three repair methods: restarting the database, with a weight coefficient of 0.3; invoking parameter optimization, with a weight coefficient of 0.6; and calling an administrator, with a weight coefficient of 0.1. The root cause "insufficient resources" has a mapping relationship with three repair methods: restarting the database, with a weight coefficient of 0.2; performing a current limiting operation, with a weight coefficient of 0.7; and calling an administrator, with a weight coefficient of 0.1. The root cause "slow SQL + improper index" has a mapping relationship with four repair methods: invoking index recommendation, with a weight coefficient of 0.2; invoking index recommendation + invoking SQL optimization, with a weight coefficient of 0.5; invoking SQL optimization, with a weight coefficient of 0.2; and calling an administrator, with a weight coefficient of 0.1.
It should be understood that fig. 6 merely gives a specific example between the root cause and the repair manner in the graph structure in the embodiment of the present application, and does not limit the mapping relationship between the root cause and the repair manner in the graph structure in this scheme.
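The fig. 6 example can be written down as a small dictionary and ranked by weight coefficient. This is a minimal sketch: the weights are those stated in the text, while the names `FIG6` and `ranked_repairs` and the tie-breaking behaviour (equal weights keep their listed order) are assumptions.

```python
# Root cause -> {repair method: weight coefficient}, as described for fig. 6.
FIG6 = {
    "incorrect parameters": {
        "restart database": 0.3,
        "invoke parameter optimization": 0.6,
        "call administrator": 0.1,
    },
    "insufficient resources": {
        "restart database": 0.2,
        "perform current limiting": 0.7,
        "call administrator": 0.1,
    },
    "slow SQL + improper index": {
        "invoke index recommendation": 0.2,
        "invoke index recommendation + invoke SQL optimization": 0.5,
        "invoke SQL optimization": 0.2,
        "call administrator": 0.1,
    },
}


def ranked_repairs(root_cause):
    """Return the repair methods of a root cause, highest weight first."""
    weights = FIG6[root_cause]
    # sorted() is stable, so methods with equal weights keep their listed order.
    return sorted(weights, key=weights.get, reverse=True)
```

For "incorrect parameters", for instance, invoking parameter optimization would be tried before restarting the database, and calling an administrator would come last.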
In one possible embodiment, the method further comprises: determining a third root cause and a third repair method, where the third repair method is one of at least one repair method having a mapping relationship with the third root cause, and after the database is repaired by using the third repair method, all abnormal indexes in the abnormal index set are in the normal state; and updating the weight coefficient corresponding to the third repair method, and updating the weight coefficient corresponding to the third root cause in the P root causes.
Specifically, after a fault has been repaired, when it is detected that all indexes in the abnormal index set are in the normal state, the repair method used for that repair is taken as the third repair method. The root cause in the root cause set that corresponds to the repair method set to which the third repair method belongs is taken as the third root cause; that is, the third root cause is regarded as the cause of the abnormality represented by the abnormal index set of the database this time, and the database fault caused by the third root cause can be repaired by using the third repair method.
Optionally, after the third root cause and the third repair method are determined, the weight coefficients corresponding to some or all of the repair methods in the repair method set corresponding to the third root cause may be updated, so that after the weight coefficient corresponding to the third repair method is updated, the sum of the weight coefficients corresponding to all the repair methods in that repair method set is still 1.
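One way to keep the weight coefficients of a repair method set summing to 1 after an update is to raise the successful method's weight and then renormalise. The sketch below is an assumption: the text does not specify an update rule, and the function name `reinforce` and the additive `step` are hypothetical.

```python
def reinforce(weights, chosen, step=0.1):
    """Raise the weight of `chosen` by `step`, then rescale so the sum is 1.

    The additive step followed by renormalisation is an illustrative rule,
    not one given in the description.
    """
    if chosen not in weights:
        raise KeyError(chosen)
    bumped = dict(weights)
    bumped[chosen] += step
    total = sum(bumped.values())
    return {name: weight / total for name, weight in bumped.items()}


before = {"restart database": 0.2, "perform current limiting": 0.7, "call administrator": 0.1}
after = reinforce(before, "restart database")
```

After renormalisation the chosen method's share grows while the other methods' shares shrink proportionally.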
When the occurrence frequency of the third root cause in the P root causes is greater than 1, the weight coefficient corresponding to each third root cause needs to be updated.
To sum up, after the database fault is repaired, the fault repair path for the abnormality represented by the abnormal index set in the current scenario can be determined: abnormal index set, third root cause, third repair method.
The following describes the update process of the weight coefficient corresponding to the third root cause by taking fig. 4 as an example:
Assume that the third root cause is root cause 1 in fig. 4. As can be seen from fig. 4, abnormal index 1, abnormal index 2 and abnormal index 4 all have a mapping relationship with root cause 1. Root cause 1 therefore corresponds to three weight coefficients, and updating the weight coefficients corresponding to root cause 1 may include three aspects:
(1) Updating the weight coefficients corresponding to all the root causes having a mapping relationship with abnormal index 1, that is, updating the weight coefficient 0.4 corresponding to root cause 1 in the mapping relationship between abnormal index 1 and root cause 1; to ensure that the sum of the weight coefficients corresponding to all the root causes having a mapping relationship with abnormal index 1 remains 1, the weight coefficient 0.6 corresponding to root cause 3 in the mapping relationship between abnormal index 1 and root cause 3 needs to be updated synchronously.
(2) Updating the weight coefficients corresponding to all the root causes having a mapping relationship with abnormal index 2, that is, updating the weight coefficient 0.1 corresponding to root cause 1 in the mapping relationship between abnormal index 2 and root cause 1, and synchronously updating the weight coefficient 0.3 corresponding to root cause 2 and the weight coefficient 0.6 corresponding to root cause 4 in their respective mapping relationships with abnormal index 2.
(3) Updating the weight coefficients corresponding to all the root causes having a mapping relationship with abnormal index 3, that is, updating the weight coefficient 0.5 corresponding to root cause 1 in the mapping relationship between abnormal index 3 and root cause 1, and synchronously updating the weight coefficient 0.5 corresponding to root cause 3 in the mapping relationship between abnormal index 3 and root cause 3.
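The synchronous update across every abnormal index that maps to the third root cause might be sketched as follows. The weights are the ones quoted in items (1) to (3) above, with index labels taken as written there; the names `FIG4_EDGES` and `update_root_cause` and the additive-step-plus-renormalisation rule are assumptions, since no concrete update formula is given.

```python
# Abnormal index -> {root cause: weight coefficient}, using the values quoted
# in items (1) to (3); index labels follow the text as written.
FIG4_EDGES = {
    "abnormal index 1": {"root cause 1": 0.4, "root cause 3": 0.6},
    "abnormal index 2": {"root cause 1": 0.1, "root cause 2": 0.3, "root cause 4": 0.6},
    "abnormal index 3": {"root cause 1": 0.5, "root cause 3": 0.5},
}


def update_root_cause(edges, root_cause, step=0.1):
    """Raise `root_cause` on every index that maps to it, keeping each sum at 1."""
    updated = {}
    for index, weights in edges.items():
        if root_cause not in weights:
            updated[index] = dict(weights)  # untouched: no edge to this root cause
            continue
        raised = {
            rc: w + (step if rc == root_cause else 0.0) for rc, w in weights.items()
        }
        total = sum(raised.values())
        updated[index] = {rc: w / total for rc, w in raised.items()}
    return updated


NEW_EDGES = update_root_cause(FIG4_EDGES, "root cause 1")
```

Each index keeps a normalised distribution over its root causes, so the sibling root causes are adjusted together with root cause 1, as the text requires.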
In a possible embodiment, the updating the weight coefficient corresponding to the third repairing manner and the updating the weight coefficient corresponding to the third root cause of the P root causes includes: and increasing the weight coefficient corresponding to the third repair mode, and increasing the weight coefficient corresponding to the third root cause in the P root causes.
Optionally, the updating of the weight coefficient corresponding to the third repairing manner may be increasing or decreasing the weight coefficient corresponding to the third repairing manner, which is not limited in this application. The above updating of the weight coefficient corresponding to the third root cause in the P root causes may be increasing or decreasing the weight coefficient corresponding to all the third root causes in the P root causes, which is not limited in this application.
In terms of technical effects, the embodiments of the present application construct, based on the graph structure, the mapping relationship between abnormal indexes and root causes and the mapping relationship between root causes and repair methods, so that a repair method for an abnormal database (also called a faulty database) can be searched out directly from the graph structure. Compared with the AI operation and maintenance approach in the prior art, the embodiments of the present application do not need historical operation and maintenance data, that is, they depend little on historical data. Because historical operation and maintenance data are not needed, the influence of unsuitable historical operation and maintenance data on the current repair process of the database can be avoided; moreover, in actual operation and maintenance, historical operation and maintenance data of different databases are difficult to obtain (for confidentiality reasons), so the embodiments of the present application have better universality. In addition, after the operating scenario of the database is switched, the embodiments of the present application can still search out a corresponding repair method directly from the graph structure and quickly repair the fault, so that the cold start problem that occurs after a scenario switch in the prior art can be effectively avoided.
Please refer to fig. 7, fig. 7 is a schematic structural diagram of a database operation and maintenance apparatus in an embodiment of the present application. As shown in fig. 7, the database operation and maintenance device 700 includes an obtaining unit 701, a searching unit 702, and a repairing unit 703.
The obtaining unit 701 is configured to obtain an abnormal index set of a database, where the value of each abnormal index in the abnormal index set is outside the preset value range corresponding to that abnormal index. The searching unit 702 is configured to search out, based on a graph structure, at least one root cause having a mapping relationship with each abnormal index, so as to obtain a root cause set, where the graph structure includes the mapping relationship between abnormal indexes and root causes; the searching unit 702 is further configured to search out, based on the graph structure, at least one repair method having a mapping relationship with a first root cause in the root cause set, where the graph structure further includes the mapping relationship between root causes and repair methods. The repairing unit 703 is configured to repair the database by using the at least one repair method until all the abnormal indexes in the abnormal index set are in a normal state.
In a possible implementation manner, in the aspect of searching out, based on the graph structure, at least one root cause having a mapping relationship with each abnormal index so as to obtain a root cause set, the searching unit 702 is specifically configured to: search out, based on the graph structure, at least one root cause having a mapping relationship with each abnormal index to obtain P root causes, where P is a positive integer; and deduplicate the P root causes to obtain Q root causes, and sort the Q root causes based on the weight coefficient of each of the Q root causes to obtain the root cause set, where Q is a positive integer less than or equal to P.
In one possible embodiment, each of the P root causes and each of the Q root causes corresponds to a weight coefficient; the weight coefficient of the i-th root cause in the Q root causes is equal to the sum of all the weight coefficients of the i-th root cause in the P root causes, where i = 1, …, Q.
In a possible implementation manner, in the aspect of repairing the database by using the at least one repair method until all the abnormal indexes in the abnormal index set are in a normal state, the searching unit 702 is specifically configured to: when the database has been repaired by using the at least one repair method and not all abnormal indexes in the abnormal index set are in the normal state, search out, based on the graph structure, at least one repair method having a mapping relationship with a second root cause; and the repairing unit 703 is specifically configured to: repair the database by using the at least one repair method having a mapping relationship with the second root cause, where the second root cause is the next root cause ranked after the first root cause in the root cause set.
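The step from P root causes to Q deduplicated, ranked root causes can be illustrated with a short aggregation. In this sketch the function name `rank_root_causes`, the sample `hits`, and the descending sort order are assumptions; the text only says that the weight of each deduplicated root cause is the sum of its weights among the P root causes and that the Q root causes are sorted by weight.

```python
from collections import defaultdict


def rank_root_causes(hits):
    """Deduplicate (root_cause, weight) pairs and rank them by summed weight.

    `hits` holds the P root causes found for all abnormal indexes; the result
    holds the Q distinct root causes, highest total weight first (assumed order).
    """
    totals = defaultdict(float)
    for root_cause, weight in hits:
        totals[root_cause] += weight
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical search result for two abnormal indexes (P = 5).
hits = [
    ("root cause 1", 0.4), ("root cause 3", 0.6),                          # from index 1
    ("root cause 1", 0.1), ("root cause 2", 0.3), ("root cause 4", 0.6),  # from index 2
]
ranked = rank_root_causes(hits)
```

Here root cause 1 appears twice among the P = 5 hits, so Q = 4 and its aggregated weight is 0.4 + 0.1.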
In a possible implementation manner, in the aspect of searching out at least one repairing manner having a mapping relationship with a first root cause in the root cause set, the searching unit 702 is specifically configured to: searching at least one repairing mode having a mapping relation with the first root cause to obtain a repairing mode set, wherein each repairing mode in the at least one repairing mode having the mapping relation with the first root cause corresponds to a weight coefficient, and in the repairing mode set, the weight coefficient corresponding to the repairing mode ranked in the front is larger than or equal to the weight coefficient corresponding to the repairing mode ranked in the rear; in the aspect of repairing the database by using the at least one repair manner, the repair unit 703 is specifically configured to: after the database is repaired by using a first repair mode in the repair mode set, if all abnormal indexes in the abnormal index set are in a normal state, stopping the repair process, and if all abnormal indexes in the abnormal index set are not in the normal state, repairing the database by using a second repair mode, wherein the second repair mode is the next repair mode arranged after the first repair mode in the repair mode set.
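The try-in-order behaviour described above (repair methods in ranked order for the first root cause, then moving on to the next root cause if the indexes are still abnormal) can be sketched as a nested loop. All names here (`repair_until_normal`, `apply_repair`, `all_normal`, and the simulated fault) are hypothetical and only serve to make the control flow concrete.

```python
def repair_until_normal(root_causes, repairs_for, apply_repair, all_normal):
    """Try ranked repair methods per ranked root cause until all indexes recover.

    Returns the (root cause, repair method) pair that succeeded, or None if
    every candidate was exhausted.
    """
    for root_cause in root_causes:
        for method in repairs_for(root_cause):
            apply_repair(method)
            if all_normal():
                return root_cause, method
    return None


# Simulated scenario: only current limiting clears the fault.
ranked = {
    "incorrect parameters": ["invoke parameter optimization", "restart database"],
    "insufficient resources": ["perform current limiting", "restart database"],
}
attempts = []
state = {"healthy": False}


def apply_repair(method):
    attempts.append(method)
    if method == "perform current limiting":
        state["healthy"] = True


result = repair_until_normal(
    list(ranked), ranked.__getitem__, apply_repair, lambda: state["healthy"]
)
```

The returned pair corresponds to what the description calls the third root cause and the third repair method, which could then feed a weight update.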
In a possible implementation manner, the normal state means that the value of the abnormal index is within a preset value range corresponding to the abnormal index.
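A range check of this kind could be used both to build the abnormal index set and to decide when the repair process may stop. The sketch below is an assumption in its details: the metric names, the preset ranges, and the inclusive bounds are illustrative and not taken from the text.

```python
def abnormal_index_set(metrics, ranges):
    """Return the names of indexes whose value lies outside its preset range.

    `ranges` maps an index name to (low, high); bounds are treated as
    inclusive, which is an assumption.
    """
    return {
        name
        for name, value in metrics.items()
        if not (ranges[name][0] <= value <= ranges[name][1])
    }


def all_normal(metrics, ranges):
    """All indexes are in the normal state when none is out of range."""
    return not abnormal_index_set(metrics, ranges)


# Hypothetical readings and preset value ranges.
ranges = {"cpu_usage": (0.0, 0.8), "tps": (500, 5000), "response_time_ms": (0, 200)}
metrics = {"cpu_usage": 0.95, "tps": 1200, "response_time_ms": 350}
abnormal = abnormal_index_set(metrics, ranges)
```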
In one possible embodiment, the apparatus further comprises: a determining unit, configured to determine a third root cause and a third repair method, where the third repair method is one of at least one repair method having a mapping relationship with the third root cause, and after the database is repaired by using the third repair method, all abnormal indexes in the abnormal index set are in the normal state; and an updating unit, configured to update the weight coefficient corresponding to the third repair method and update the weight coefficient corresponding to the third root cause in the P root causes.
In a possible implementation, the updating unit is specifically configured to: and increasing the weight coefficient corresponding to the third repair mode, and increasing the weight coefficient corresponding to the third root cause in the P root causes.
In a possible implementation manner, the abnormal index in the abnormal index set includes at least one of a number of transactions per second TPS, a number of queries per second QPS, a response time, a central processing unit usage rate, a memory usage rate, or a number of read/write operations per second IOPS.
In a possible embodiment, the root cause in the root cause set comprises at least one of incorrect parameters, excessive traffic, improper indexing, or insufficient resources; the repair modes in the repair mode set comprise at least one of restarting a database, executing a current limiting operation, invoking Structured Query Language (SQL) optimization or calling an administrator.
Referring to fig. 8, fig. 8 is a schematic diagram of a hardware structure of a computer device according to an embodiment of the present application. As shown in fig. 8, the computer device 800 includes a processor 801, a memory 802, an interface circuit 803, and a bus 804. The processor 801, the memory 802, and the interface circuit 803 perform data transmission via the bus 804.
The computer device may be any feasible terminal device or server, for example, a mobile phone, a tablet computer, a notebook computer, an augmented reality (AR)/virtual reality (VR) device, a vehicle-mounted terminal, or a cloud terminal, which is not limited in this application.
A memory 802 for storing computer program instructions; a processor 801 for retrieving program instructions from memory 802 to perform the steps of: acquiring an abnormal index set of a database, wherein the value of each abnormal index in the abnormal index set is positioned outside a preset value range corresponding to each abnormal index; searching at least one root cause having a mapping relation with each abnormal index based on a graph structure to obtain a root cause set, wherein the graph structure comprises the mapping relation between the abnormal indexes and the root causes; and searching at least one repairing mode having a mapping relation with a first root cause in the root cause set based on the graph structure, repairing the database by using the at least one repairing mode until all abnormal indexes in the abnormal index set are in a normal state, wherein the graph structure further comprises the mapping relation between the root causes and the repairing modes.
Specifically, in this embodiment, for a specific operation process of the processor 801 and the memory 802 on the computer device 800, reference may be made to a corresponding process in the foregoing method embodiment 200, which is not described herein again.
An embodiment of the present application provides a chip system, which comprises at least one processor, a memory and an interface circuit, wherein the memory, the interface circuit and the at least one processor are interconnected through lines, and instructions are stored in the memory; when the instructions are executed by the processor, part or all of the steps described in the method embodiment of fig. 2 are implemented.
Embodiments of the present application provide a computer-readable storage medium, in which program instructions are stored, and when the program instructions are executed on one or more processors, some or all of the steps described in the method embodiment of fig. 2 are implemented.
The present application provides a computer program product, and when the computer program product runs on a computer device, part or all of the steps described in the method embodiment of fig. 2 above are implemented.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments. It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art will recognize that the embodiments described in this specification are preferred embodiments and that acts or modules referred to are not necessarily required for this application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative: the division of the above-described units is only one type of logical functional division, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present application.

Claims (23)

1. A database operation and maintenance method, the method comprising:
acquiring an abnormal index set of a database, wherein the value of each abnormal index in the abnormal index set is positioned outside a preset value range corresponding to each abnormal index;
searching at least one root cause having a mapping relation with each abnormal index based on a graph structure to obtain a root cause set, wherein the graph structure comprises the mapping relation between the abnormal indexes and the root causes;
and searching at least one repairing mode having a mapping relation with a first root cause in the root cause set based on the graph structure, repairing the database by using the at least one repairing mode until all abnormal indexes in the abnormal index set are in a normal state, wherein the graph structure further comprises the mapping relation between the root causes and the repairing modes.
2. The method of claim 1, wherein the searching for at least one root cause having a mapping relationship with each index based on the value of each index to obtain a root cause set comprises:
searching at least one root cause having a mapping relation with each abnormal index based on the graph structure to obtain P root causes, wherein P is a positive integer;
deduplicating the P root causes to obtain Q root causes, and sorting the Q root causes based on the weight coefficient of each root cause in the Q root causes to obtain the root cause set;
wherein Q is a positive integer less than or equal to P.
3. The method of claim 2,
each root cause of the P root causes and the Q root causes corresponds to a weight coefficient;
the weight coefficient of the i-th root cause in the Q root causes is equal to the sum of all the weight coefficients of the i-th root cause in the P root causes, where i = 1, …, Q.
4. The method according to any one of claims 1 to 3, wherein the repairing the database by using the at least one repairing manner until all abnormal indexes in the abnormal index set are in a normal state comprises:
when the database has been repaired by using the at least one repairing mode and not all abnormal indexes in the abnormal index set are in the normal state, searching out, based on the graph structure, at least one repairing mode having a mapping relation with a second root cause, and repairing the database by using the at least one repairing mode having a mapping relation with the second root cause, wherein the second root cause is the next root cause ranked after the first root cause in the root cause set.
5. The method according to any one of claims 1 to 4, wherein the searching out at least one repairing method having a mapping relationship with a first root cause in the root cause set and repairing the database by using the at least one repairing method includes:
searching at least one repairing mode having a mapping relation with the first root cause to obtain a repairing mode set, wherein each repairing mode in the at least one repairing mode having the mapping relation with the first root cause corresponds to a weight coefficient, and in the repairing mode set, the weight coefficient corresponding to the repairing mode ranked in the front is larger than or equal to the weight coefficient corresponding to the repairing mode ranked in the rear;
after the database is repaired by using a first repair mode in the repair mode set, if all abnormal indexes in the abnormal index set are in a normal state, stopping the repair process, and if all abnormal indexes in the abnormal index set are not in the normal state, repairing the database by using a second repair mode, wherein the second repair mode is the next repair mode arranged after the first repair mode in the repair mode set.
6. The method according to any one of claims 1 to 5,
the normal state means that the value of the abnormal index is within a preset value range corresponding to the abnormal index.
7. The method according to any one of claims 1-6, further comprising:
determining a third root cause and a third repair mode, where the third repair mode is one of at least one repair mode having a mapping relationship with the third root cause, and after the database is repaired by using the third repair mode, all abnormal indexes in the abnormal index set are in the normal state;
and updating the weight coefficient corresponding to the third repair mode, and updating the weight coefficient corresponding to the third root cause in the P root causes.
8. The method according to claim 7, wherein the updating the weighting factor corresponding to the third repair method and the weighting factor corresponding to the third root cause of the P root causes comprises:
and increasing the weight coefficient corresponding to the third repair mode, and increasing the weight coefficient corresponding to the third root cause in the P root causes.
9. The method according to any one of claims 1 to 8,
the abnormal indexes in the abnormal index set comprise at least one of transaction processing number per second TPS, query number per second QPS, response time, central processing unit utilization rate, memory utilization rate or read-write operation times per second IOPS.
10. The method of any one of claims 1 to 8,
the root cause in the root cause set comprises at least one of incorrect parameters, overlarge traffic, improper index or insufficient resources;
the repair modes in the repair mode set comprise at least one of restarting a database, executing a current limiting operation, invoking Structured Query Language (SQL) optimization or calling an administrator.
11. A database operation and maintenance device, the device comprising:
an acquisition unit, configured to acquire an abnormal index set of a database, wherein the value of each abnormal index in the abnormal index set is outside a preset value range corresponding to each abnormal index;
a searching unit, configured to search out, based on a graph structure, at least one root cause having a mapping relation with each abnormal index to obtain a root cause set, wherein the graph structure comprises the mapping relation between the abnormal indexes and the root causes; and further configured to search out, based on the graph structure, at least one repair mode having a mapping relation with a first root cause in the root cause set;
and a repairing unit, configured to repair the database by using the at least one repairing mode until all the abnormal indexes in the abnormal index set are in a normal state, wherein the graph structure further comprises a mapping relation between the root causes and the repairing modes.
12. The apparatus according to claim 11, wherein in the aspect of searching out, based on the graph structure, at least one root cause having a mapping relationship with each abnormal index so as to obtain a root cause set, the searching unit is specifically configured to:
searching at least one root cause having a mapping relation with each abnormal index based on the graph structure to obtain P root causes, wherein P is a positive integer;
deduplicating the P root causes to obtain Q root causes, and sorting the Q root causes based on the weight coefficient of each root cause in the Q root causes to obtain the root cause set;
wherein Q is a positive integer less than or equal to P.
13. The apparatus of claim 12,
each root cause of the P root causes and the Q root causes corresponds to a weight coefficient;
the weight coefficient of the i-th root cause in the Q root causes is equal to the sum of all the weight coefficients of the i-th root cause in the P root causes, where i = 1, …, Q.
14. The apparatus according to any one of claims 11-13, wherein in said repairing the database by using the at least one repairing manner until all abnormal indexes in the abnormal index set are in a normal state,
the search unit is specifically configured to:
searching at least one repairing mode having a mapping relation with a second root cause based on the graph structure when the database is repaired by using the at least one repairing mode and all abnormal indexes in the abnormal index set are not in the normal state;
the repair unit is specifically configured to:
repairing the database using at least one repair method having a mapping relationship with the second root cause, the second root cause being the next root cause ranked after the first root cause in the root cause set.
15. The apparatus of claim 14,
in the aspect of the searching for at least one repairing manner having a mapping relationship with the first root cause in the root cause set, the searching unit is specifically configured to:
searching at least one repairing mode having a mapping relation with the first root cause to obtain a repairing mode set, wherein each repairing mode in the at least one repairing mode having the mapping relation with the first root cause corresponds to a weight coefficient, and in the repairing mode set, the weight coefficient corresponding to the repairing mode ranked in the front is larger than or equal to the weight coefficient corresponding to the repairing mode ranked in the rear;
in the aspect of repairing the database by using the at least one repair manner, the repair unit is specifically configured to:
after the database is repaired by using a first repairing mode in the repairing mode set, if all abnormal indexes in the abnormal index set are in a normal state, stopping the repairing process, and if all abnormal indexes in the abnormal index set are not in the normal state, repairing the database by using a second repairing mode, wherein the second repairing mode is a next repairing mode arranged after the first repairing mode in the repairing mode set.
16. The apparatus according to any one of claims 11-15,
the normal state means that the value of the abnormal index is within a preset value range corresponding to the abnormal index.
17. The apparatus according to any one of claims 11-16, further comprising:
a determining unit, configured to determine a third root cause and a third repair manner, where the third repair manner is one of at least one repair manner that has a mapping relationship with the third root cause, and after the database is repaired by using the third repair manner, all the abnormal indexes in the abnormal index set are in the normal state;
and the updating unit is used for updating the weight coefficient corresponding to the third repairing mode and updating the weight coefficient corresponding to the third root cause in the P root causes.
18. The apparatus according to claim 17, wherein the updating unit is specifically configured to:
and increasing the weight coefficient corresponding to the third repair mode, and increasing the weight coefficient corresponding to the third root cause in the P root causes.
19. The apparatus of any one of claims 11-18,
the abnormal indexes in the abnormal index set comprise at least one of transaction processing number per second TPS, query number per second QPS, response time, central processing unit utilization rate, memory utilization rate or read-write operation times per second IOPS.
20. The apparatus of any one of claims 11-19,
the root cause in the root cause set comprises at least one of incorrect parameters, overlarge traffic, improper index or insufficient resources;
the repair modes in the repair mode set comprise at least one of restarting a database, executing a current limiting operation, invoking Structured Query Language (SQL) optimization or calling an administrator.
21. A chip system, comprising at least one processor, a memory, and an interface circuit, the memory, the interface circuit, and the at least one processor being interconnected by lines, the memory having instructions stored therein; when the instructions are executed by the processor, the method of any one of claims 1-10 is implemented.
22. A computer-readable storage medium, having stored therein program instructions which, when executed on one or more processors, implement the method of any one of claims 1-10.
23. A computer program product, characterized in that the method of any of claims 1-10 is implemented when the computer program product is run on a computer device.
Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111189777.3A CN115964206A (en) 2021-10-12 2021-10-12 Database operation and maintenance method and device
PCT/CN2022/122240 WO2023061227A1 (en) 2021-10-12 2022-09-28 Database operation and maintenance method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111189777.3A CN115964206A (en) 2021-10-12 2021-10-12 Database operation and maintenance method and device

Publications (1)

Publication Number Publication Date
CN115964206A true CN115964206A (en) 2023-04-14

Family

ID=85898214

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111189777.3A Pending CN115964206A (en) 2021-10-12 2021-10-12 Database operation and maintenance method and device

Country Status (2)

Country Link
CN (1) CN115964206A (en)
WO (1) WO2023061227A1 (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10589764B2 (en) * 2017-04-26 2020-03-17 General Electric Company Determining root cause of locomotive failure
CN107562034A (en) * 2017-07-14 2018-01-09 宝沃汽车(中国)有限公司 Fault handling method and processing system on line
CN112631818A (en) * 2020-12-24 2021-04-09 平安科技(深圳)有限公司 Operation and maintenance abnormity repair processing method and device, computer equipment and storage medium
CN112559376A (en) * 2020-12-25 2021-03-26 中国建设银行股份有限公司 Automatic positioning method and device for database fault and electronic equipment
CN113342889A (en) * 2021-06-03 2021-09-03 中国工商银行股份有限公司 Distributed database management method, device, equipment and medium
CN113849486A (en) * 2021-11-30 2021-12-28 云和恩墨(北京)信息技术有限公司 Fault processing method, device thereof, electronic equipment and computer readable storage medium

Also Published As

Publication number Publication date
WO2023061227A1 (en) 2023-04-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination