WO2022111265A1 - Information alerting method and device, and storage medium - Google Patents

Info

Publication number
WO2022111265A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
monitoring
value
row
consumption
Prior art date
Application number
PCT/CN2021/129296
Other languages
French (fr)
Chinese (zh)
Inventor
李云龙
胡盼盼
卢道和
Original Assignee
深圳前海微众银行股份有限公司
Priority date
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2022111265A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/30 Monitoring
    • G06F 11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F 11/324 Display of status information
    • G06F 11/327 Alarm or error message display
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/30 Monitoring
    • G06F 11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F 11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/30 Monitoring
    • G06F 11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/21 Design, administration or maintenance of databases

Definitions

  • the present application relates to the technical field of computer applications, and in particular, to an information alarm method, device and storage medium.
  • the process of monitoring whether the database is abnormal is mainly realized by judging whether the database index value exceeds the corresponding fixed threshold, or whether the year-on-year growth ratio exceeds the corresponding fixed threshold.
  • the requests in different time periods may exhibit a certain periodicity. For example, the early morning is an off-peak business period and the daytime is a peak business period, while a bank's nightly batch tasks lead to a large volume of requests.
  • because the database index value is compared with the corresponding fixed threshold, the alarm requirements of the database cannot be met accurately in each time period, resulting in a high alarm error rate.
  • the embodiments of the present application expect to provide an information alarm method, device, and storage medium, which solve the problem that using fixed thresholds for database monitoring cannot accurately meet the alarm requirements of the database in various time periods, and which provide a database monitoring scheme that adaptively adjusts the threshold according to the actual situation, effectively improving the accuracy of database alarms.
  • an information alarm method, the method including:
  • a target threshold is determined based on the first monitoring value and the first historical monitoring value; wherein the target threshold includes at least one different threshold;
  • the alarm prompt information is used to provide an alarm prompt for the target monitoring indicator of the target monitoring object;
  • the alarm prompt information is displayed.
  • an information alarm device, the device including a memory, a processor and a communication bus, wherein:
  • the memory is configured to store executable instructions;
  • the communication bus is configured to realize the communication connection between the processor and the memory;
  • the processor is configured to execute the information alarm program stored in the memory to implement the steps of the information alarm method according to any one of the above.
  • a storage medium stores an information alarm program, and when the information alarm program is executed by a processor, implements the steps of the information alarm method described in any one of the above.
  • the information alarm device collects the target monitoring index of the target monitoring object according to the first sampling interval and obtains the first monitoring value at the current moment. It then obtains the first historical monitoring value collected for the target monitoring index at historical moments, determines a target threshold based on the first monitoring value and the first historical monitoring value, determines alarm prompt information based on the first monitoring value and the target threshold, and finally displays the alarm prompt information.
  • the information alarm device thus determines the target threshold dynamically and in real time from the first monitoring value at the current moment and the first historical monitoring value at historical moments, and generates alarm prompt information according to the relationship between the target threshold and the first monitoring value, which solves the problem that fixed thresholds cannot accurately meet the alarm requirements of the database in each time period.
  • FIG. 1 is a schematic flowchart of an information alarm method provided by an embodiment of the present application
  • FIG. 2 is a schematic flowchart of another information alarm method provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of another information alarm method provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of an information alarm method provided by another embodiment of the present application.
  • FIG. 5 is a schematic flowchart of another information alarm method provided by another embodiment of the present application.
  • FIG. 6 is a schematic flowchart of still another information alarm method provided by another embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an information alarm device according to an embodiment of the present application.
  • An embodiment of the present application provides an information alarm method. Referring to FIG. 1 , the method is applied to an information alarm device, and the method includes the following steps:
  • Step 101 Collect the target monitoring index of the target monitoring object according to the first sampling interval, and obtain the first monitoring value at the current moment.
  • the information alarm device may be a device that runs the target monitoring object, for example, a server, or a device with running capabilities such as a computer device.
  • the target monitoring object can be an application program, which can run on the information alarm device.
  • the target monitoring indicator can be a resource consumed by the information alarm device when running the target monitoring object, such as the central processing unit (CPU) resources occupied, including the number of CPUs (that is, the number of cores) and/or the running time of the CPU, or input/output (IO) resources, etc.
  • the target monitoring indicators can be set by default in advance, or can be set by the user according to the actual monitoring needs.
  • the first sampling interval is the time interval at which the target monitoring indicators of the target monitoring object are collected.
  • the first sampling interval cannot be too small, that is, the sampling frequency should not be too high, in order to avoid increasing the monitoring pressure that the information alarm device places on the target monitoring object.
  • the first sampling interval can be corrected continuously, or modified by the user, according to actual monitoring requirements.
  • Step 102 Obtain the first historical monitoring value collected for the target monitoring index at historical moments.
  • when collecting the target monitoring index, the information alarm device stores each collected monitoring value, so that the first historical monitoring value collected at historical moments can be obtained later.
  • the first historical monitoring value may be a monitoring value within a period of time before the current moment.
  • Step 103 Determine a target threshold based on the first monitoring value and the first historical monitoring value.
  • the target threshold includes at least one different threshold.
  • the information alarm device analyzes and processes the first monitoring value at the current moment and the first historical monitoring value at the historical moment to obtain the target threshold.
  • the target threshold is dynamically determined according to the monitoring value collected at the current moment and the historical monitoring value at the historical moment.
  • Step 104 Determine alarm prompt information based on the first monitoring value and the target threshold.
  • the alarm prompt information is used to provide an alarm prompt for the target monitoring indicator of the target monitoring object.
  • the range of the target threshold in which the first monitoring value falls is determined, so as to determine the corresponding alarm level and generate the corresponding alarm prompt information. The alarm prompt information tells the user whether the target monitoring object is at risk and whether that risk needs to be eliminated in time.
  • Step 105 Display the alarm prompt information.
  • the alarm prompt information may be displayed in a display area corresponding to the information alarm device.
  • the display area corresponding to the information alarm device may be the display screen of the information alarm device, or another device with a display function that has a communication connection with the information alarm device, such as a mobile communication device. In the latter case, after the information alarm device generates the alarm prompt information, it sends the alarm prompt information to the mobile communication device, which displays it.
  • the information alarm device collects the target monitoring index of the target monitoring object according to the first sampling interval and obtains the first monitoring value at the current moment. It then obtains the first historical monitoring value collected for the target monitoring index at historical moments, determines a target threshold based on the first monitoring value and the first historical monitoring value, determines alarm prompt information based on the first monitoring value and the target threshold, and finally displays the alarm prompt information.
  • the information alarm device thus determines the target threshold dynamically and in real time from the first monitoring value at the current moment and the first historical monitoring value at historical moments, and generates alarm prompt information according to the relationship between the target threshold and the first monitoring value, which solves the problem that fixed thresholds cannot accurately meet the alarm requirements of the database in each time period.
  • the embodiments of the present application provide an information alarm method.
  • the method is applied to an information alarm device, and the method includes the following steps:
  • Step 201 Collect the target monitoring index of the target monitoring object according to the first sampling interval, and obtain the first monitoring value at the current moment.
  • Step 202 Obtain the first historical monitoring value collected for the target monitoring index at historical moments.
  • the server obtains, from the storage area, the monitoring values of the target monitoring index that were collected and stored at historical moments, thereby obtaining the first historical monitoring value.
  • the storage area can be a local storage area of the server or a cloud storage area that the server can access.
  • Step 203 Acquire a second monitoring value of the reference monitoring index of the target monitoring object at the current moment, and a second historical monitoring value collected for the reference monitoring index at historical moments.
  • the reference monitoring index and the target monitoring index have an associated relationship.
  • the reference monitoring indicator of the target monitoring object may be other indicators of the target monitoring object except the target monitoring indicator, and the reference monitoring indicator has a certain influence on the target monitoring indicator.
  • the information alarm device also collects the reference monitoring index of the target monitoring object at the first sampling interval, so that the second monitoring value of the reference monitoring index at the current moment and the second historical monitoring value at historical moments can be obtained. It should be noted that the historical moments of the second historical monitoring value of the reference monitoring index correspond one-to-one with the historical moments of the first historical monitoring value of the target monitoring index.
  • the reference monitoring indicator can be the total request volume indicator in the database.
  • the first historical monitoring value collected for the target monitoring indicator and the second historical monitoring value collected for the reference monitoring indicator within a period of time before the current moment, such as within one month, are acquired.
  • the relationship between the reference monitoring indicator and the target monitoring indicator means that a change in the reference monitoring indicator affects the target monitoring indicator. For example, when the database is running normally, the CPU usage and IO usage of the server maintain a certain positive correlation with the total request volume of the database. When the database is running abnormally, the total request volume of the database decreases while CPU usage and IO usage increase instead.
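  • The one-to-one correspondence between the historical moments of the two indices can be illustrated with a minimal Python sketch. The function name `align_history` and the choice of dictionaries keyed by sampling moment are assumptions made for illustration; the source only requires that both series refer to the same moments.

```python
def align_history(target_history, reference_history):
    """Keep only the moments at which BOTH indices have a stored value.

    target_history / reference_history: dicts mapping a sampling moment
    (any sortable key, e.g. a datetime) to the value collected then.
    This storage layout is a hypothetical choice, not taken from the source.
    """
    # Moments present in both series, oldest first.
    moments = sorted(target_history.keys() & reference_history.keys())
    target_values = [target_history[m] for m in moments]
    reference_values = [reference_history[m] for m in moments]
    return moments, target_values, reference_values


# Usage: minute 3 is missing from the reference series, so it is dropped.
cpu = {1: 40.0, 2: 42.0, 3: 41.0, 4: 45.0}
requests = {1: 1000, 2: 1100, 4: 1250}
moments, cpu_values, request_values = align_history(cpu, requests)
```

Dropping unmatched moments is only one possible policy; interpolating the missing value would be another, and the source does not say which applies.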
  • Step 204 Determine the target weight coefficient based on the first monitoring value, the first historical monitoring value, the second monitoring value and the second historical monitoring value.
  • the server analyzes the first monitoring value, the first historical monitoring value, the second monitoring value and the second historical monitoring value to obtain the target weight coefficient.
  • Step 205 Determine a first reference value based on the first historical monitoring value.
  • the information alarm device analyzes the first historical monitoring value to obtain the first reference value.
  • Step 206 From the first historical monitoring value, obtain the first historical sub-monitoring value of the previous moment adjacent to the current moment.
  • taking a first sampling interval of 1 minute as an example, if the current moment is 14:00 on November 11, the first historical sub-monitoring value is the value collected at the previous moment, 13:59 on November 11, and is obtained from the first historical monitoring value.
  • Step 207 Determine the difference between the first monitoring value and the first historical sub-monitoring value to obtain a second reference value.
  • Step 208 Obtain a target threshold based on the target weight coefficient, the second reference value, the first monitoring value, at least one preset weight coefficient and the first reference value.
  • the target threshold includes at least one different threshold.
  • the target weight coefficient, the second reference value, the first monitoring value, the at least one preset weight coefficient, and the first reference value are analyzed and calculated to obtain a target threshold including at least one different threshold; the number of thresholds included in the target threshold is the same as the number of preset weight coefficients.
  • for example, if the at least one preset weight coefficient includes two different preset weight coefficients, preset weight coefficient 1 and preset weight coefficient 2, then one threshold of the target threshold can be determined from preset weight coefficient 1, the target weight coefficient, the second reference value, the first monitoring value and the first reference value, and another threshold of the target threshold can be determined from preset weight coefficient 2, the target weight coefficient, the second reference value, the first monitoring value and the first reference value.
  • Step 209 Determine alarm prompt information based on the first monitoring value and the target threshold.
  • the alarm prompt information is used to provide an alarm prompt for the target monitoring indicator of the target monitoring object.
  • Step 210 Display the alarm prompt information.
  • in determining the target threshold, the second monitoring value at the current moment and the corresponding second historical monitoring value of the reference monitoring index, which has an associated relationship with the target monitoring index, are also considered. A target threshold that is more in line with the actual application situation can therefore be obtained, which effectively improves the reliability of the target threshold and the accuracy of automatic alarms.
  • step 204 may be implemented by steps a11 to a14:
  • Step a11 Process the first monitoring value and the first historical monitoring value by determining the difference between the monitoring value at the second moment and the monitoring value at the first moment to obtain the second difference value of the target monitoring index at different times.
  • the first moment and the second moment are two adjacent moments, and the first moment is farther from the current moment than the second moment is from the current moment.
  • for the first monitoring value and the first historical monitoring value, the monitoring value at the later moment (the second moment) minus the monitoring value at the earlier moment (the first moment) is calculated for each pair of adjacent moments, giving the second difference values at different moments.
  • Step a12 Process the second monitoring value and the second historical monitoring value by determining the difference between the monitoring value at the second moment and the monitoring value at the first moment to obtain the third difference value of the reference monitoring index at different times.
  • for the second monitoring value and the second historical monitoring value, the monitoring value at the later moment (the second moment) minus the monitoring value at the earlier moment (the first moment) is calculated for each pair of adjacent moments, giving the third difference values at different moments.
  • Step a13 Determine the ratio of the third difference and the second difference at the same moment to obtain a reference ratio.
  • the ratio of the third difference value to the second difference value at the same moment is calculated to obtain the reference ratio.
  • Step a14 Determine the target weight coefficient based on the reference ratio.
  • step a14 may be implemented by steps a141 to a142:
  • Step a141 Obtain from the reference ratio a second preset number of target ratios whose ratios are greater than zero and are closest to the current moment.
  • the second preset number may be an empirical value obtained from a large number of experiments or actual application scenarios, or may be set by the user according to actual needs. A reference ratio greater than zero indicates that the target monitoring object is running in a normal working state, and at adjacent moments the running state of the target monitoring object generally does not suddenly develop major problems. Therefore, the second preset number of ratios that are greater than zero and closest to the current moment are taken as the target ratios.
  • Step a142 Obtain a target weight coefficient based on the second preset number of target ratios.
  • a preset processing method is used to process the second preset number of target ratios to obtain a value, and the value is used as the target weight coefficient.
  • the preset processing method may be, for example, calculating an average value after summation, or calculating an average value after weighted summation, or a mathematical statistical induction method such as standard deviation.
  • step a142 may be implemented by the following steps: determining an average value of a second preset number of target ratios to obtain a target weight coefficient.
  • the accumulated value of the second preset number of target ratios is calculated and then divided by the second preset number to obtain the average value, which is used as the target weight coefficient.
  • the second difference value of the target monitoring index and the third difference value of the reference monitoring index at the same moment are analyzed to determine the target weight coefficient. Because the operation of the database has a certain regularity, this ensures the validity of the target weight coefficient and is more in line with actual use requirements.
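  • Steps a11 to a142 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function and parameter names are hypothetical, the ratio is read literally as third difference divided by second difference, and moments where the second difference is zero are skipped because the source does not say how to handle them.

```python
from statistics import mean


def target_weight_coefficient(target_values, reference_values, count):
    """Average of the `count` most recent positive reference ratios.

    target_values / reference_values: equal-length sequences ordered from
    oldest to newest, sampled at the same moments; the last element is the
    value at the current moment. `count` plays the role of the second
    preset number.
    """
    if len(target_values) != len(reference_values):
        raise ValueError("both series must cover the same moments")
    if count < 1:
        raise ValueError("count must be at least 1")

    ratios = []
    for i in range(1, len(target_values)):
        second_diff = target_values[i] - target_values[i - 1]       # step a11
        third_diff = reference_values[i] - reference_values[i - 1]  # step a12
        if second_diff == 0:
            continue  # assumption: an undefined ratio is simply skipped
        ratios.append(third_diff / second_diff)                     # step a13

    positive = [r for r in ratios if r > 0]                         # step a141
    recent = positive[-count:]
    if not recent:
        raise ValueError("no positive reference ratio available")
    return mean(recent)                                             # step a142


# Usage: differences are (1, 2) and (3, 4), so the ratios are 3.0 and 2.0.
coefficient = target_weight_coefficient([1, 2, 4], [10, 13, 17], count=2)
```

If fewer than `count` positive ratios exist, the sketch averages whatever is available; that fallback is also an assumption.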
  • step 205 may be implemented by steps b11 to b12:
  • Step b11 From the first historical monitoring value, obtain a first predetermined number of second historical sub-monitoring values at the same moment as the current moment within a first predetermined number of time periods before the current moment.
  • the first preset number is an empirical value obtained from a large number of experiments, or a value set by the user according to actual needs. Taking the time period as one day and the first preset number as 7 as an example, the 7 monitoring values collected at the same time of day as the current moment within the 7 days before the current moment are obtained from the first historical monitoring value as the 7 second historical sub-monitoring values. Exemplarily, if the current moment is 14:00 on November 11, the monitoring values at 14:00 on each of the seven preceding days, that is, at 14:00 on November 4, 5, 6, 7, 8, 9 and 10, are obtained as the 7 second historical sub-monitoring values.
  • Step b12 Determine the standard deviation of the first preset number of second historical sub-monitoring values to obtain a first reference value.
  • the standard deviation formula is applied to the first preset number of second historical sub-monitoring values, and the resulting standard deviation is used as the first reference value.
  • the analysis uses only the historical monitoring values from the period closest to the current moment. Since the performance of the database remains basically stable within such a period, the computing resources consumed by the calculation and analysis are effectively reduced while the monitoring effect is still guaranteed.
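  • Steps b11 and b12 can be sketched as below. The names are hypothetical, the history is assumed to be a dictionary keyed by `datetime`, and the population standard deviation (`statistics.pstdev`) is used only because the source does not say whether the population or sample form is intended. The year in the usage example is arbitrary.

```python
from datetime import datetime, timedelta
from statistics import pstdev


def first_reference_value(history, now, periods=7, period=timedelta(days=1)):
    """Standard deviation of the values seen at the same time of day.

    history: dict mapping datetime -> monitoring value (assumed layout).
    periods: the first preset number (7 in the source's example).
    """
    same_time_values = []
    for k in range(1, periods + 1):
        moment = now - k * period              # step b11: same moment, k periods ago
        if moment not in history:
            raise KeyError(f"no stored value for {moment}")
        same_time_values.append(history[moment])
    return pstdev(same_time_values)            # step b12


# Usage: values 1..7 at 14:00 on November 4 to 10, current moment November 11.
now = datetime(2021, 11, 11, 14, 0)
history = {now - timedelta(days=d): float(8 - d) for d in range(1, 8)}
sigma = first_reference_value(history, now)
```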
  • step 208 may be implemented by steps c11 to c13:
  • Step c11 Determine the first product of the target weight coefficient and the second reference value.
  • Step c12 Determine the product of at least one preset weight coefficient and the first reference value to obtain at least one second product.
  • Step c13 Determine the accumulated value of the first product, each of the at least one second product, and the first monitoring value to obtain the target threshold.
  • the standard deviation of the first preset number of values collected at the same time of day in the recent period is calculated and used to adjust the threshold. Because the observed indicators cannot fully match the model's expectations, adding the standard deviation allows a certain tolerance, which better meets actual detection needs.
  • by setting multiple preset weight coefficients, multiple different thresholds are obtained, which improves general applicability across application scenarios, enables alarms of different levels, and effectively improves the user experience.
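  • Steps 207 and c11 to c13 amount to a short calculation, sketched here in Python. The function and parameter names are hypothetical, and returning the thresholds in ascending order is a convenience added for illustration.

```python
def target_thresholds(first_value, previous_value, target_weight,
                      first_reference, preset_weights):
    """One threshold per preset weight coefficient.

    first_value:     first monitoring value at the current moment
    previous_value:  first historical sub-monitoring value (previous moment)
    target_weight:   target weight coefficient (step 204)
    first_reference: first reference value, a standard deviation (step 205)
    preset_weights:  the at least one preset weight coefficient
    """
    second_reference = first_value - previous_value        # step 207
    first_product = target_weight * second_reference       # step c11
    thresholds = []
    for weight in preset_weights:
        second_product = weight * first_reference          # step c12
        thresholds.append(first_product + second_product + first_value)  # step c13
    return sorted(thresholds)


# Usage: 0.5 * (50 - 48) = 1, plus 1 * 2 or 2 * 2, plus the current value 50.
thresholds = target_thresholds(50, 48, 0.5, 2.0, [1, 2])
```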
  • step 209 may be implemented by steps 209a to 209f:
  • Step 209a Obtain the target upper limit value of the target monitoring indicator.
  • the target upper limit value is a preset upper limit value, that is, for the target monitoring index, the allowable maximum value corresponding to the target monitoring index.
  • after the information alarm device executes step 209a, it can choose to execute steps 209b to 209c, or step 209d, or step 209e, or step 209f.
  • Step 209b If the first monitoring value is less than the target upper limit value, and the first monitoring value is greater than or equal to the first threshold value, continuously collect a third preset number of third monitoring values of the target monitoring index according to the second sampling interval.
  • the second sampling interval is smaller than the first sampling interval, and the target threshold includes the first threshold.
  • the first threshold is the smallest threshold among at least one threshold included in the target threshold.
  • the third preset number is an empirical value obtained from a large number of experiments or a value set by the user, and is usually smaller than the first preset number. Taking the second sampling interval as 3 seconds and the third preset number as 2 as an example, when the first monitoring value is less than the target upper limit value and greater than or equal to the first threshold, the target monitoring indicator is collected twice in succession, once every 3 seconds, to obtain two third monitoring values.
  • Step 209c If the third preset number of third monitoring values are all smaller than the target upper limit value, and at least one third monitoring value is greater than or equal to the second threshold, generate first alarm information.
  • the target threshold includes a second threshold, the second threshold is greater than the first threshold, and the first alarm information is used to implement a major alarm for the target monitoring object.
  • taking three third monitoring values as an example, the case where at least one third monitoring value is greater than or equal to the second threshold includes: any one of the three third monitoring values is greater than or equal to the second threshold, any two of them are greater than or equal to the second threshold, or all three are greater than or equal to the second threshold.
  • Step 209d If the first monitoring value is greater than or equal to the target upper limit value, generate second alarm information.
  • the second alarm information is used to implement a serious alarm for the target monitoring object.
  • Step 209e If at least one third monitoring value in the third preset number of third monitoring values is greater than or equal to the target upper limit value, generate second alarm information.
  • Step 209f If the third preset number of third monitoring values are all smaller than the second threshold, generate third alarm information.
  • the third alarm information is used to implement a secondary alarm for the target monitoring object.
  • if the first monitoring value is less than the first threshold, the target monitoring object is running normally, and no alarm is required.
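  • The branching in steps 209a to 209f can be sketched as a single decision function. The names below are hypothetical, and `resample` stands in for whatever collects the third preset number of values at the second sampling interval. Treating a first monitoring value below the first threshold as normal is an inference from the surrounding text.

```python
from enum import Enum
from typing import Callable, Optional, Sequence


class Alarm(Enum):
    MAJOR = "first alarm information"
    SERIOUS = "second alarm information"
    SECONDARY = "third alarm information"


def classify(first_value: float, upper_limit: float,
             first_threshold: float, second_threshold: float,
             resample: Callable[[], Sequence[float]]) -> Optional[Alarm]:
    """Return the alarm to raise, or None when no alarm is required."""
    if first_value >= upper_limit:                       # step 209d
        return Alarm.SERIOUS
    if first_value < first_threshold:                    # assumed normal case
        return None
    third_values = resample()                            # step 209b
    if any(v >= upper_limit for v in third_values):      # step 209e
        return Alarm.SERIOUS
    if any(v >= second_threshold for v in third_values): # step 209c
        return Alarm.MAJOR
    return Alarm.SECONDARY                               # step 209f


# Usage with upper limit 90, first threshold 60 and second threshold 70.
result = classify(65, 90, 60, 70, lambda: [72, 66])
```

Passing the resampling step in as a callable keeps the extra collection from running unless the first monitoring value actually reaches the first threshold, which matches the order of steps 209b and 209c.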
  • an embodiment of the present application provides an information alarm method, as shown in FIG. 3 , including the following steps:
  • Step 31 Collect the monitoring value of the target monitoring index of the database.
  • the monitoring indicators of the database are classified, as shown in Table 1.
  • Table 1 lists the indicator types, indicator names and corresponding indicator attributes.
  • for resource-type indicators, each instance has different threshold standards because it carries different services and different request volumes, and such indicators are key monitoring objects. Adaptive thresholds are therefore used to monitor the CPU usage and IO usage among the resource-type indicators; that is, the target monitoring indicators are CPU usage and/or IO usage.
  • the monitoring system operated by the information alarm device cyclically collects the monitoring data of the target monitoring indicators according to a certain sampling frequency.
  • the sampling frequency is expressed as a sampling time interval, which can be adjusted as needed. The default is to collect once every minute, so that collection does not cause too much stress.
  • Step 32 Generate an adaptive threshold.
  • the adaptive threshold is U(t) = p(t) + βσ.
  • α is the weighted ratio, which is used to control the proportion of the monitoring value of the target monitoring index at the current moment in the formula.
  • the p(t) formula can be used to adapt to the speed of local behavior. If α is set to a fixed constant, it often does not have good applicability. Therefore, α changes as the monitoring index changes with the running time.
  • when the database is running normally, the monitoring values of the IO usage rate and CPU usage rate maintain a certain positive correlation with the total request volume of the database. When the database is running abnormally, the total request volume decreases while the monitoring values of the IO usage rate and CPU usage rate increase instead. Therefore, it is only necessary to consider the correlation between the IO usage rate and CPU usage rate and the total request volume under normal circumstances.
  • Qt represents the total request volume at the current time t.
  • at time t, the two indicators, the IO usage rate and the total request volume, are related to each other.
  • the value of α at the current time t is obtained by taking the average value.
  • is the standard deviation.
  • the process of obtaining ⁇ is as follows: since the monitoring system always detects the database, there is an observation value for the target monitoring indicator at the same time every day. In this way, when the target monitoring indicator is the IO usage rate, it can be Obtain the historical detection value of the IO usage rate at the same time for 7 times in the past period of time, such as a week.
  • the standard deviation ⁇ can be obtained by calculating the standard deviation of the historical detection values of the IO usage rates at the same time.
  • the value of ⁇ determines different tolerances, and if it is set to a fixed value, there may also be universal problems. Therefore, to alert the target monitoring indicators here, different ⁇ values can be set.
  • the β value that is set is usually an empirical value. When different β values are used, for example two values β1 and β2 (β1 < β2, both positive integers), the corresponding adaptive thresholds U(t) are denoted U1 and U2, and U1 < U2.
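The threshold construction above can be sketched as follows. This is a minimal illustration, assuming the threshold has the form U(t) = p(t) + β·σ, that p(t) has already been computed elsewhere, and that σ is the population standard deviation of the same-time-of-day historical values; the function name `adaptive_thresholds` and the sample values are hypothetical.

```python
import statistics

def adaptive_thresholds(p_t, history, betas):
    """Return one threshold U(t) = p(t) + beta * sigma per beta value.

    p_t     : value of p(t) at the current time (computed elsewhere)
    history : historical monitoring values at the same time of day
    betas   : tolerance coefficients, e.g. [1, 2]
    """
    # Assumption: population standard deviation; the text does not say
    # whether the sample or population form is intended.
    sigma = statistics.pstdev(history)
    return [p_t + beta * sigma for beta in sorted(betas)]

# Hypothetical IO usage values (percent) observed at the same time
# on each of the past 7 days.
history = [40.0, 42.0, 38.0, 41.0, 39.0, 43.0, 37.0]
u1, u2 = adaptive_thresholds(p_t=45.0, history=history, betas=[1, 2])
```

Because σ is non-negative, a larger β never yields a smaller threshold, which matches the stated relation between U1 and U2.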
  • Step 33 Match the corresponding alarm severity.
  • the upper limit of resources is usually considered, and the upper limit is denoted as L (L is a fixed constant).
  • taking the adaptive thresholds U1 and U2 (U1 < U2) as an example, the matching process is shown in FIG. 4 and includes the following steps:
  • Step 41 Obtain the monitoring value of the IO usage rate at the current moment.
  • Step 42 Determine whether the monitoring value of the IO usage rate is greater than or equal to the upper limit L; if it is, go to step 43; if it is less than the upper limit L, go to step 44.
  • Step 43 generate a CRITICAL alarm.
  • the CRITICAL alarm corresponds to the aforementioned first alarm information, and the CRITICAL level alarm indicates that an abnormality has occurred in the database and needs to be dealt with immediately.
  • Step 44 Determine whether the monitoring value of the IO usage rate is greater than or equal to the adaptive threshold U1; if it is less than U1, go to step 45; if it is greater than or equal to U1, go to step 46.
  • Step 45 determine that the database is normal.
  • Step 46 Collect two consecutive monitoring values of the IO usage rate at a collection interval of 3 seconds.
  • Step 47 Determine whether either of the two monitoring values is greater than or equal to the upper limit L; if at least one is, go to step 43; if both are less than the upper limit L, go to step 48.
  • Step 48 Determine whether either of the two monitoring values is greater than or equal to the adaptive threshold U2; if at least one is, go to step 49; if both are less than U2, go to step 410.
  • Step 49 generate a MAJOR alarm.
  • the MAJOR alarm corresponds to the aforementioned second alarm information; a MAJOR-level alarm requires close attention because it may have a certain impact.
  • Step 410 Generate a MINOR alarm.
  • the MINOR alarm corresponds to the aforementioned third alarm information; a MINOR-level alarm does not need to be processed immediately, and data can be collected for later analysis of potential risks.
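Steps 41 to 410 amount to a small decision procedure, sketched below. The function name `classify_alarm` and the `resample` callback (standing in for the two follow-up collections at the shorter interval) are hypothetical names introduced only for illustration.

```python
def classify_alarm(value, upper_limit, u1, u2, resample):
    """Map a monitoring value to an alarm level following steps 41 to 410.

    value       : monitoring value at the current moment (step 41)
    upper_limit : fixed resource upper limit L
    u1, u2      : adaptive thresholds with u1 < u2
    resample    : callable returning the follow-up monitoring values
                  (step 46); hypothetical stand-in for real collection
    """
    if value >= upper_limit:                      # step 42 -> step 43
        return "CRITICAL"
    if value < u1:                                # step 44 -> step 45
        return "NORMAL"
    samples = resample()                          # step 46
    if any(v >= upper_limit for v in samples):    # step 47 -> step 43
        return "CRITICAL"
    if any(v >= u2 for v in samples):             # step 48 -> step 49
        return "MAJOR"
    return "MINOR"                                # step 410

# Hypothetical usage: L = 90, U1 = 60, U2 = 75 (percent IO usage).
level = classify_alarm(65, 90, 60, 75, resample=lambda: [70, 80])
```

The follow-up collection is only triggered once U1 is reached, so the shorter sampling interval is not paid for while the database is behaving normally.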
  • a method for implementing root cause recommendation is provided when the alarm prompt information is the first alarm information or the second alarm information.
  • after the information alarm device performs steps 201-208 and steps 209a-209c, or steps 201-208, steps 209a and 209d, or steps 201-208, steps 209a and 209e, it is further configured to perform steps 211-215:
  • Step 211 If the alarm prompt information is the first alarm information or the second alarm information, obtain at least one structured query language SQL statement executed by the target monitoring object at the current moment.
  • Step 212 Obtain an execution plan corresponding to at least one SQL statement.
  • an execution plan (also called a query plan or an explain plan) describes the specific steps a database takes to execute an SQL statement, such as whether data in a table is accessed through an index or a full table scan, and the join method and join order used for join queries.
  • Step 213 Determine the consumption cost of each SQL statement corresponding to the target monitoring indicator based on the execution plan corresponding to the at least one SQL statement, and obtain the consumption cost corresponding to the at least one SQL statement.
  • the execution plan corresponding to each of the at least one SQL statement is analyzed to determine the consumption cost of the target monitoring indicator corresponding to each SQL statement, thereby obtaining the consumption cost corresponding to the at least one SQL statement. For example, the information alarm device determines that three SQL statements are currently being executed in the database, obtains the execution plan of each of the three SQL statements, and, from each execution plan, determines the consumption cost of the target monitoring indicator for the corresponding SQL statement, thereby obtaining the consumption costs of the three SQL statements.
  • Step 214 Based on the consumption cost corresponding to the at least one SQL statement, sort the at least one SQL statement according to the order of the consumption cost from high to low, and obtain a SQL statement sorting result.
  • the three SQL statements are SQL statement 1, SQL statement 2 and SQL statement 3, and their IO consumption costs are IO_cost1, IO_cost2 and IO_cost3, respectively.
  • sorted from highest to lowest, the costs are IO_cost3 > IO_cost1 > IO_cost2.
  • the corresponding order of the three SQL statements, that is, the SQL statement sorting result, is: SQL statement 3, SQL statement 1, SQL statement 2.
  • Step 215 displaying the sorting result of the SQL statement.
  • steps 211 to 215 may be performed before step 210 , wherein step 215 and step 210 may be performed simultaneously, and step 215 may also be performed after step 210 .
  • step 213 may be implemented by steps d11 to d19 and/or steps e11 to e16: the embodiment corresponding to steps d11 to d19 covers the case where the consumption cost determined for the target monitoring object is the IO consumption cost, and the embodiment corresponding to steps e11 to e16 covers the case where it is the CPU consumption cost.
  • when the target monitoring indicator of the target monitoring object is the IO indicator, either only the IO consumption cost is determined, or the CPU consumption cost is determined together with the IO consumption cost; when the target monitoring indicator is the CPU indicator, either only the CPU consumption cost is determined, or the IO consumption cost is determined together with the CPU consumption cost.
  • the actual execution process depends on actual application requirements.
  • Step d11 Determine the number of rows of each identity ID included in the execution plan corresponding to each SQL statement based on the execution plan corresponding to at least one SQL statement.
  • the execution plan corresponding to each SQL statement includes an id column, which indicates the sequence number of the selection (SELECT) in the corresponding SQL statement, that is, the corresponding ID. Rows with larger ID values are executed first, and rows with the same ID value are executed from top to bottom. If the id value is NULL, a union operation is performed on the row results of other id values; where this step is placed is determined from the value <union m,n> in the table column, and it is executed after the step min(m,n) (the smaller of m and n).
  • after the information alarm device executes step d11, it can execute either steps d12 to d13 or steps d14 to d18.
  • when the number of rows of an ID is 1, steps d12 to d13 are executed; when the number of rows of an ID is greater than 1, steps d14 to d18 are executed.
  • Step d12 Determine each ID whose number of rows is 1 in each SQL statement as a first ID, and obtain the first row number and the average length corresponding to the first ID.
  • the first row number is recorded in the rows column of the execution plan corresponding to each SQL statement, and may be denoted rows.
  • the average length AVG_ROW_LENGTH of the first ID corresponding to each SQL statement can be obtained from the information_schema.tables table.
  • Step d13 Round up the ratio of the product of the first row number and the average length to the preset InnoDB data page size to obtain the IO consumption cost of the first ID.
  • in MySQL, the default uncompressed InnoDB data page size is 16 KB. A data page consists of seven parts: the file header, the page header, the infimum and supremum records, the user records, the free space, the page directory (slots), and the file trailer.
  • the preset InnoDB data page size can be denoted innodb_page_size, so the IO consumption cost of an ID with a row count of 1 is IO_cost = ⌈rows × AVG_ROW_LENGTH / innodb_page_size⌉, where the symbol ⌈ ⌉ denotes rounding up. The InnoDB data page size can also be set by the user.
  • if an ID in SQL statement 1, for example ID1, has multiple rows, say k rows, count the number of returned rows and the number of scanned rows for each row of ID1.
  • one way to determine the number of returned rows and the number of scanned rows for each row of each ID of each SQL statement is as follows: from the execution plan corresponding to SQL statement 1, determine the k rows corresponding to ID1, and, following the order of those k rows from top to bottom, determine the number of returned rows of each row in turn.
  • for the Join method, the number of join comparisons also needs to be counted.
  • the number of join comparisons can be determined by Table 2.
  • RN represents the number of records in the outer table
  • SN represents the number of records in the inner table.
  • SNLJ refers to the simple nested loop join, that is, Simple Nested-Loops Join
  • INLJ refers to the index-based nested loop join, that is, Index Nested-Loops Join
  • BNL refers to the block-based nested loop join, that is, Block Nested-Loops Join
  • CHJ refers to the classic hash join, that is, Classic Hash Join.
  • return_rows_k = return_rows_(k-1) * (rows_k * filtered_k)
  • return_rows_k = return_rows_(k-1)
  • return_rows_k = table_rows_k.
  • return_rows_(k-1) is the number of rows returned by row k-1, which is the number of outer rows in the execution order
  • table_rows_k is the number of inner table rows.
  • join_compare_k = table_rows_k * index_height, where index_height denotes the index height. Note that the index selectivity index_cdl and the index height index_height can be obtained from the mysql.innodb_index_stat table corresponding to ID1 in SQL statement 1.
  • Step d15 Obtain the average length corresponding to the second ID.
  • the average length AVG_ROW_LENGTH corresponding to the second ID can be obtained from information_schema.tables corresponding to each SQL statement.
  • Step d16 Determine the difference between the number of scanned rows of each row of the second ID and the number of returned rows of the adjacent previous row to obtain the first difference.
  • first difference = examine_rows_k − return_rows_(k−1).
  • Step d17 Determine the ratio of the product of the first difference and the average length to the preset InnoDB data page size, and round the result, to obtain the IO consumption sub-cost corresponding to each row of the second ID.
  • Step d18 Determine the accumulated value of the IO consumption sub-cost corresponding to each row of the second ID, and obtain the IO consumption cost of the second ID.
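  • the IO consumption cost of the second ID of SQL statement 1 is the sum, over the rows k of the second ID, of the per-row IO consumption sub-costs obtained in step d17, where the symbol Σ denotes accumulation.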
  • Step d19 Determine the cumulative value of the IO consumption cost of each ID included in the execution plan corresponding to each SQL statement, and obtain the consumption cost corresponding to at least one SQL statement.
  • the accumulated value of the IO consumption cost of each ID includes the IO consumption cost of the first ID and/or the IO consumption cost of the second ID.
  • for example, SQL statement 1 includes 5 IDs, of which 3 IDs (ID1, ID2 and ID3) each have 1 row and the remaining 2 IDs (ID4 and ID5) each have at least 2 rows. Therefore, the IO consumption costs of ID1, ID2 and ID3 can each be determined through steps d12 to d13, and the IO consumption costs of ID4 and ID5 can each be determined through steps d14 to d18.
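Steps d12 to d19 can be sketched as follows. This is a minimal illustration under stated assumptions: the cost is read as a number of data pages (bytes divided by page size), per-row sub-costs in step d17 are rounded up like step d13, and the row preceding the first row of an ID is taken to return 0 rows. All function names and sample numbers are hypothetical.

```python
import math

INNODB_PAGE_SIZE = 16 * 1024  # default uncompressed InnoDB page size, bytes

def io_cost_first_id(rows, avg_row_length, page_size=INNODB_PAGE_SIZE):
    """Steps d12 to d13: IO cost of an ID whose plan has a single row."""
    return math.ceil(rows * avg_row_length / page_size)

def io_cost_second_id(examine_rows, return_rows, avg_row_length,
                      page_size=INNODB_PAGE_SIZE):
    """Steps d14 to d18: IO cost of an ID whose plan has several rows.

    examine_rows[k] and return_rows[k] are the scanned and returned row
    counts of plan row k, listed in execution order.
    """
    total = 0
    prev_returned = 0  # assumption: nothing is returned before the first row
    for examined, returned in zip(examine_rows, return_rows):
        first_difference = examined - prev_returned            # step d16
        total += math.ceil(first_difference * avg_row_length
                           / page_size)                        # step d17
        prev_returned = returned
    return total                                               # step d18

# Hypothetical plan: one single-row ID and one two-row ID (step d19 sums them).
cost_a = io_cost_first_id(rows=1000, avg_row_length=100)
cost_b = io_cost_second_id([1000, 500], [200, 50], avg_row_length=100)
statement_cost = cost_a + cost_b
```

With these numbers the single-row ID touches 7 pages, the two-row ID touches 7 + 2 = 9 pages, and the statement total is 16.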
  • a plan table with an ID of 1 included in a certain SQL statement can be referred to as shown in Table 3.
  • the SQL statements in the at least one SQL statement are sorted according to the IO consumption cost and/or the CPU consumption cost of the at least one SQL statement at the current moment to obtain a sorting result.
  • root cause sorting can also take the number of executions of the SQL statements into account: the SQL statements are sorted by number of executions from high to low, and SQL statements with the same number of executions are sorted by IO consumption cost or CPU consumption cost.
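Both orderings, by consumption cost alone and by number of executions with consumption cost as the tie-breaker, can be sketched as below. The record layout (`sql`, `executions`, `cost`) and the sample values are hypothetical.

```python
def rank_by_cost(statements):
    """Sort SQL statements by consumption cost, highest first (step 214)."""
    return sorted(statements, key=lambda s: s["cost"], reverse=True)

def rank_by_executions_then_cost(statements):
    """Sort by number of executions, then by cost, both from high to low."""
    return sorted(statements, key=lambda s: (-s["executions"], -s["cost"]))

# Hypothetical statements with IO_cost3 > IO_cost1 > IO_cost2.
statements = [
    {"sql": "SQL statement 1", "executions": 5, "cost": 20},
    {"sql": "SQL statement 2", "executions": 9, "cost": 10},
    {"sql": "SQL statement 3", "executions": 5, "cost": 30},
]
by_cost = [s["sql"] for s in rank_by_cost(statements)]
by_executions = [s["sql"] for s in rank_by_executions_then_cost(statements)]
```

Here the cost-only ranking puts statement 3 first, while the execution-count ranking promotes statement 2 and uses cost only to order statements 3 and 1.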
  • Step e11 Determine the number of rows of each identity ID included in the execution plan corresponding to each SQL statement based on the execution plan corresponding to at least one SQL statement.
  • Step e12 determine that the ID whose row number is 1 is the first ID.
  • Step e13 Determine the CPU consumption cost corresponding to the first ID based on the execution plan of the first ID.
  • Step e14 determine that the ID whose number of rows is greater than 1 is the second ID.
  • Step e15 Determine the CPU consumption cost corresponding to each row of the second ID based on the execution plan of each row of the second ID.
  • for example, if the second ID includes 3 rows, the CPU consumption cost of each row is determined from the parameters included in the execution plan of that row, yielding the CPU consumption costs of the first, second and third rows of the second ID.
  • Step e16 Accumulate the CPU consumption cost corresponding to each row of the second ID to obtain the CPU consumption cost corresponding to the second ID.
  • the CPU consumption costs of the three rows included in the second ID are accumulated; that is, the sum of the CPU consumption costs of the first, second and third rows of the second ID is calculated to obtain the CPU consumption cost corresponding to the second ID.
  • step e13 may be implemented by steps e131 to e135:
  • Step e131 based on the execution plan of the first ID, determine the number of the first row and the number of returned rows corresponding to the first ID.
  • Step e132 From the execution plan of the first ID, obtain the access type Type and extra field information Extra corresponding to the first ID.
  • the case where the ID 2 of SQL statement 1 has a row count of 1 is used as an example for description.
  • the access type Type of the row whose ID is 2 in SQL statement 1 can be obtained from the Select_Type column of Table 4, and the Extra of the row whose ID is 2 in SQL statement 1 can be obtained from the Extra column of Table 4.
  • Step e133 obtaining the first CPU consumption sub-cost corresponding to the first ID based on the Type, Extra and the first row number corresponding to the first ID.
  • the relationship f_filter(rows) between the first row number rows and the consumption sub-cost is shown in Table 5.
  • primary_key_height denotes the primary key index height, which is generally 3 or 4.
  • Step e134 Determine the product of the logarithm of the number of returned rows and the number of returned rows to obtain the second CPU consumption sub-cost corresponding to the first ID.
  • Step e135 Determine the sum of the first CPU consumption sub-cost and the second CPU consumption sub-cost corresponding to the first ID, and obtain the CPU consumption cost corresponding to the first ID.
  • step e15 may be implemented by steps e151 to e156:
  • Step e151 Determine the second row number and the returned row number of each row of the second ID based on the execution plan of each row of the second ID.
  • Step e152 From the execution plan corresponding to each row of the second ID, obtain the access type Type and extra field information Extra corresponding to each row of the second ID, and the number of table rows corresponding to each row of the second ID.
  • Step e153 obtaining the third CPU consumption sub-cost corresponding to each row of the second ID based on the Type, Extra and the number of the second row corresponding to each row of the second ID.
  • for the specific implementation of step e153, refer to that of step e133; the details are not repeated here.
  • Step e154 Determine the product of the logarithm of the number of returned rows of each row of the second ID and the corresponding number of returned rows to obtain the fourth CPU consumption sub-cost corresponding to each row of the second ID.
  • for the specific implementation of step e154, refer to that of step e134; the details are not repeated here.
  • Step e155 Determine the product of the number of table rows corresponding to each row of the second ID and the index height corresponding to each row of the second ID, and obtain the fifth CPU consumption sub-cost corresponding to each row of the second ID.
  • Step e156 Determine the cumulative value of the third CPU consumption sub-cost, the fourth CPU consumption sub-cost and the fifth CPU consumption sub-cost corresponding to each row of the second ID, and obtain the CPU consumption cost of each row of the second ID.
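The per-row CPU cost of steps e151 to e156, and its accumulation in step e16, can be sketched as follows. Table 5 is not reproduced here, so the value of f_filter(rows) is passed in as a plain number; the base-2 logarithm is an assumption, since the text does not state the base. The function names are hypothetical.

```python
import math

def cpu_cost_row(filter_cost, return_rows, table_rows=0, index_height=0):
    """CPU consumption cost of one execution-plan row.

    filter_cost  : f_filter(rows) looked up from Type/Extra (steps e133/e153)
    return_rows  : number of returned rows of this row
    table_rows   : number of table rows (0 when no join lookup applies)
    index_height : index height used for the join lookup
    """
    # Steps e134/e154: return_rows * log(return_rows); base 2 is an assumption.
    sort_cost = return_rows * math.log2(return_rows) if return_rows > 0 else 0.0
    # Step e155: table rows multiplied by index height.
    join_cost = table_rows * index_height
    # Steps e135/e156: accumulate the sub-costs.
    return filter_cost + sort_cost + join_cost

def cpu_cost_id(row_costs):
    """Step e16: accumulate the per-row costs of an ID."""
    return sum(row_costs)

single_row = cpu_cost_row(filter_cost=10, return_rows=8)
joined_row = cpu_cost_row(filter_cost=10, return_rows=8,
                          table_rows=100, index_height=3)
second_id_cost = cpu_cost_id([single_row, joined_row])
```

For a first ID (one row) the join term is simply left at zero, which reduces the expression to the two sub-costs of steps e133 and e134.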
  • in this way, the CPU consumption cost and/or the IO consumption cost is analyzed, and the analysis result is output when the alarm is issued to implement root cause recommendation, which effectively reduces the analyst's workload and improves the efficiency with which the analyst identifies the fault behind the alarm.
  • another method for implementing root cause recommendation is provided when the alarm prompt information is the first alarm information or the second alarm information.
  • after the information alarm device performs steps 201 to 208 and steps 209a to 209c, or steps 201 to 208, steps 209a and 209d, or steps 201 to 208, steps 209a and 209e, it is further configured to perform steps 216 to 223:
  • Step 216 If the alarm prompt information is the first alarm information or the second alarm information, acquire at least one SQL statement executed by the target monitoring object at the current moment.
  • the SQL statements currently being executed may be obtained from a table of the currently high-load database, for example the table information_schema.processlist.
  • Step 217 Group the at least one SQL statement by statement pattern to obtain at least one group of SQL statements.
  • SQL statements with the same statement pattern but different parameter values can be placed in the same group.
  • after grouping, an SQL fingerprint can also be generated from the characteristics of each group of SQL statements and used as the identifier of that group.
  • Step 218 based on at least one set of SQL statements, count the number of SQL statements included in each set of SQL statements.
  • the number of SQL statements included in each group of SQL statements in the at least one group of SQL statements is counted to obtain the number of SQL statements included in each group of SQL statements.
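Steps 217 and 218 can be sketched as below. The normalization rules (lower-casing, replacing string and numeric literals with a placeholder, collapsing whitespace) and the use of a SHA-256 digest as the fingerprint are assumptions chosen for illustration; the text does not specify how the fingerprint is generated.

```python
import hashlib
import re
from collections import Counter

def sql_fingerprint(sql):
    """Return a fingerprint shared by statements that differ only in literals."""
    text = sql.strip().rstrip(";").lower()
    text = re.sub(r"'(?:[^'\\]|\\.)*'", "?", text)   # string literals
    text = re.sub(r"\b\d+(?:\.\d+)?\b", "?", text)   # numeric literals
    text = re.sub(r"\s+", " ", text)                 # collapse whitespace
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hypothetical statements taken from the list of running queries.
running = [
    "SELECT * FROM t WHERE id = 1",
    "select * from t where id = 42;",
    "SELECT name FROM u WHERE city = 'SZ'",
]
# Step 218: number of SQL statements in each group, keyed by fingerprint.
group_sizes = Counter(sql_fingerprint(s) for s in running)
```

One representative statement per fingerprint can then be taken as the target SQL statement of step 219.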
  • Step 219 Acquire one SQL statement from each group of SQL statements in the at least one group of SQL statements, to obtain at least one target SQL statement.
  • Step 220 Determine the target standby database, and run each target SQL statement respectively through the target standby database.
  • the target standby database refers to a standby database with a lower load.
  • recording of resource consumption can be enabled with the statement set profiling = 1.
  • Step 221 Determine the resource consumption sub-cost consumed when the target standby database runs each target SQL statement, and obtain the resource consumption sub-cost of each target SQL statement.
  • when the target standby database runs each target SQL statement, it records the resource consumption. Therefore, the resource consumption sub-cost of each target SQL statement can be determined from the resource consumption recorded while the target standby database runs that statement.
  • Step 222 Determine the product of the resource consumption sub-cost of each target SQL statement and the number of SQL statements included in the group where each target SQL statement is located, to obtain the resource consumption cost of each group of SQL statements.
  • Step 223 Sort at least one group of SQL statements in a descending order of the resource consumption cost of each group of SQL statements, obtain a sorting result, and display the sorting result.
  • steps 216 to 223 may be performed before step 210 , wherein step 223 and step 210 may be performed simultaneously, and step 223 may also be performed after step 210 .
  • step 221 may be implemented by steps f11 to f13:
  • Step f11 If the target standby database runs each target SQL statement for a duration greater than or equal to a preset duration, determine the resource consumption sub-cost of each target SQL statement as a first value.
  • the preset duration is an empirical value, usually obtained from a large number of experiments or from practical experience. In some application scenarios, it can also be set by the user.
  • the first value is an empirical value, usually used to represent the highest resource consumption cost.
  • Step f12 If the duration of running each target SQL statement on the target standby database is less than the preset duration, obtain a resource consumption table corresponding to when the target standby database runs each target SQL statement.
  • the corresponding resource consumption table acquisition method can be realized by the following statement:
  • the resource consumption table of each target SQL statement obtained by the above statement can be shown in Table 6 as an example.
  • status represents the state
  • Duration represents the state duration, usually in seconds
  • CPU_user represents the CPU resources consumed in user mode
  • CPU_system represents the CPU resources consumed in kernel mode
  • Block_ops_in represents the number of input block operations
  • Block_ops_out represents the number of output block operations.
  • Step f13 obtain the first consumption resource and the second consumption resource from the resource consumption table corresponding to each target SQL statement, determine the sum value of the first consumption resource and the second consumption resource, and obtain the resource consumption corresponding to each target SQL statement sub cost.
  • when the target monitoring indicator is CPU, the first consumption resource is the CPU resources consumed in user mode, and the second consumption resource is the CPU resources consumed in kernel mode
  • when the target monitoring indicator is IO, the first consumption resource is the number of input block operations, and the second consumption resource is the number of output block operations.
  • for example, when there are 4 target SQL statements, 4 resource consumption tables of the form shown in Table 6 are obtained, one for each target SQL statement.
  • when the target monitoring indicator is CPU, for each target SQL statement, the contents of the CPU_user column (CPU resources consumed in user mode) and the CPU_system column (CPU resources consumed in kernel mode) of the resource consumption table shown in Table 6 are accumulated to obtain the CPU resource consumption sub-cost of that target SQL statement.
  • when the target monitoring indicator is IO, for each target SQL statement, the contents of the Block_ops_in column (number of input block operations) and the Block_ops_out column (number of output block operations) of the resource consumption table shown in Table 6 are accumulated to obtain the IO resource consumption sub-cost of that target SQL statement.
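Steps f11 to f13 together with steps 222 and 223 can be sketched as follows. The resource consumption table is modelled as a list of dictionaries keyed by the column names of Table 6; the function names, the choice of infinity as the "first value", and all sample numbers are hypothetical.

```python
FIRST_VALUE = float("inf")  # assumption: stands for the highest possible cost

def statement_sub_cost(duration, profile_rows, indicator, preset_duration):
    """Resource consumption sub-cost of one target SQL statement."""
    if duration >= preset_duration:                    # step f11
        return FIRST_VALUE
    if indicator == "CPU":                             # step f13, CPU case
        columns = ("CPU_user", "CPU_system")
    else:                                              # step f13, IO case
        columns = ("Block_ops_in", "Block_ops_out")
    return sum(row[c] for row in profile_rows for c in columns)

def rank_groups(groups):
    """Steps 222 and 223: cost = sub-cost * group size, sorted high to low.

    groups maps a group label to (sub_cost, number_of_statements).
    """
    costs = {name: sub * count for name, (sub, count) in groups.items()}
    return sorted(costs, key=costs.get, reverse=True)

# Hypothetical resource consumption table for one target SQL statement.
profile = [
    {"CPU_user": 0.5, "CPU_system": 0.25, "Block_ops_in": 8, "Block_ops_out": 0},
    {"CPU_user": 0.25, "CPU_system": 0.0, "Block_ops_in": 16, "Block_ops_out": 8},
]
cpu_sub = statement_sub_cost(1.2, profile, "CPU", preset_duration=10)
io_sub = statement_sub_cost(1.2, profile, "IO", preset_duration=10)
order = rank_groups({"group A": (cpu_sub, 3), "group B": (0.4, 20)})
```

Multiplying by the group size lets a cheap but very frequent statement pattern outrank an expensive one that ran only a few times, as group B does here.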
  • in this way, the CPU consumption cost and/or the IO consumption cost is analyzed, and the analysis result is output when the alarm is issued to implement root cause recommendation, which effectively reduces the analyst's workload and improves the efficiency with which the analyst identifies the fault behind the alarm.
  • the target monitoring indicator when the target monitoring indicator is CPU, in addition to calculating the sub-cost of CPU resource consumption corresponding to each target SQL statement, the sub-cost of IO resource consumption corresponding to each target SQL statement can also be calculated.
  • the target monitoring indicator when the target monitoring indicator is IO, in addition to calculating the IO resource consumption sub-cost corresponding to each target SQL statement, the CPU resource consumption sub-cost corresponding to each target SQL statement can also be calculated.
  • the information alarm device collects the target monitoring index of the target monitoring object according to the first sampling interval, and after obtaining the first monitoring value at the current moment, obtains the first historical monitoring value obtained by collecting the target monitoring index at the historical moment, and based on A target threshold is determined for the first monitoring value and the first historical monitoring value, then based on the first monitoring value and the target threshold, alarm prompt information is determined, and finally the alarm prompt information is displayed.
  • the information alarm device dynamically determines the target threshold in real time from the first monitoring value at the current moment and the first historical monitoring value at the historical moment, and generates alarm prompt information according to the relationship between the target threshold and the first monitoring value. This solves the problem that monitoring a database with fixed thresholds cannot accurately meet the alarm requirements of the database in different time periods, provides a solution in which the thresholds are adjusted adaptively to the actual situation, and effectively improves the accuracy of database alarms.
  • different levels of alarm prompts are implemented, which inform the user of the current risk level of the database so that the user can respond according to that risk level.
  • the possible causes of the risk are displayed, which makes analysis and handling easier for the user, effectively improves human-computer interaction, and improves the user experience.
  • the embodiments of the present application provide an information alarm device.
  • the information alarm device 5 may include: a processor 51, a memory 52, and a communication bus 53, wherein:
  • memory 52 for storing executable instructions
  • a communication bus 53 used to realize the communication connection between the processor 51 and the memory 52;
  • the processor 51 is configured to execute the information alarm program stored in the memory 52 to realize the following steps:
  • the target threshold based on the first monitoring value and the first historical monitoring value; wherein the target threshold includes at least one different threshold;
  • the alarm prompt information is used to provide an alarm prompt for the target monitoring index of the target monitoring object;
  • the target threshold is obtained based on the target weight coefficient, the second reference value, the first monitoring value, the at least one preset weight coefficient and the first reference value.
  • when the processor performs the step of determining the target weight coefficient based on the first monitoring value, the first historical monitoring value, the second monitoring value and the second historical monitoring value, the following steps may be used:
  • the first monitoring value and the first historical monitoring value are processed to obtain the second difference value of the target monitoring index at different times;
  • the first moment and the second moment are two adjacent moments, and the first moment is farther from the current moment than the second moment is;
  • the second monitoring value and the second historical monitoring value are processed to obtain the third difference value of the reference monitoring index at different times;
  • the target weight coefficient is determined.
  • the average value of the second preset number of target ratios is determined to obtain the target weight coefficient.
  • the standard deviation of the first preset number of second historical sub-monitoring values is determined to obtain the first reference value.
  • when the processor performs the step of obtaining the target threshold based on the target weight coefficient, the second reference value, the first monitoring value, the at least one preset weight coefficient and the first reference value, the following steps may be used:
  • the accumulated value of each of the first product, each of the at least one second product, and the first monitoring value is determined to obtain the target threshold value.
  • when the processor performs the step of determining the alarm prompt information based on the first monitoring value and the target threshold, the following steps may be used:
  • a third preset number of third monitoring values of the target monitoring index are continuously collected according to the second sampling interval; wherein the second sampling interval is less than the first sampling interval, and the target threshold includes the first threshold;
  • first alarm information is generated; wherein the target threshold includes the second threshold, the second threshold is greater than the first threshold, and the first alarm information is used to implement a major alarm for the target monitoring object.
  • the processor is further configured to perform the following steps:
  • third alarm information is generated, wherein the third alarm information is used to implement a secondary alarm for the target monitoring object.
  • the processor is further configured to perform the following steps:
  • if the alarm prompt information is the first alarm information or the second alarm information, obtain at least one structured query language (SQL) statement executed by the target monitoring object at the current moment;
  • the consumption cost of each SQL statement corresponding to the target monitoring indicator includes the input and output IO consumption cost
  • the processor executing step determines the consumption of each SQL statement corresponding to the target monitoring indicator based on the execution plan corresponding to at least one SQL statement
  • the following steps can be used to achieve:
  • the consumption cost of the target monitoring indicator corresponding to each SQL statement includes the CPU consumption cost of the central processing unit, and the processor executing step determines the target monitoring indicator corresponding to each SQL statement based on the execution plan corresponding to at least one SQL statement
  • the following steps can be used to achieve:
  • the CPU consumption cost corresponding to each row of the second ID is accumulated to obtain the CPU consumption cost corresponding to the second ID.
  • the processor when the processor is configured to execute the execution plan based on the first ID and determine the CPU consumption cost corresponding to the first ID, the processor may be configured to execute the following steps:
  • Extra and the first row number corresponding to the first ID obtain the first CPU consumption sub-cost corresponding to the first ID
  • the sum of the first CPU consumption sub-cost and the second CPU consumption sub-cost corresponding to the first ID is determined, and the CPU consumption cost corresponding to the first ID is obtained.
  • the processor is further configured to perform the following steps:
  • if the alarm prompt information is the first alarm information or the second alarm information, obtain at least one SQL statement executed by the target monitoring object at the current moment;
  • Group at least one SQL statement in the same manner as the statement to obtain at least one set of SQL statements;
  • the processor executes the step to determine the resource consumption sub-cost consumed when the target standby database runs each target SQL statement, and obtains the resource consumption sub-cost of each target SQL statement, the following steps can be used to achieve:
  • if the target standby database runs a target SQL statement for a duration greater than or equal to the preset duration, determine the resource consumption sub-cost of that target SQL statement as the first value;
  • the first consumption resource is the CPU resource consumed by the user mode
  • the second consumption resource is the CPU resource consumed by the core mode
  • the first consumption resource is the number of input block operations
  • the second consumption resource is the number of output block operations.
  • the information alarm device collects the target monitoring index of the target monitoring object according to the first sampling interval and, after obtaining the first monitoring value at the current moment, obtains the first historical monitoring value obtained by collecting the target monitoring index at the historical moment; it then determines a target threshold based on the first monitoring value and the first historical monitoring value, determines alarm prompt information based on the first monitoring value and the target threshold, and finally displays the alarm prompt information.
  • the information alarm device dynamically determines the target threshold value according to the first monitoring value at the current moment and the first historical monitoring value at the historical moment in real time, and generates alarm prompt information according to the relationship between the target threshold value and the first monitoring value, which solves the problem.
  • the current use of fixed thresholds to achieve database monitoring leads to the problem that the alarm requirements of the database in various time periods cannot be accurately realized.
  • the solution of adaptively adjusting the thresholds to realize database monitoring according to the actual situation is realized, which effectively improves the accuracy of database alarms.
  • different levels of alarm prompts are implemented, which inform the user of the current specific risk level of the database, so that the user can promptly take corresponding action according to the risk level.
  • the possible causes of risks are displayed, which is convenient for users to analyze and process, which effectively improves the human-computer interaction process and improves the user experience.
  • the embodiments of the present application provide a computer-readable storage medium, referred to as a storage medium for short; the computer-readable storage medium stores one or more programs, and the one or more programs can be executed by one or more processors
  • to implement the information alarm method provided by the embodiments corresponding to FIGS. 1 to 2 and FIGS. 5 to 6, which will not be repeated here.
  • Embodiments of the present application provide an information alarm method, device, and storage medium.
  • the method includes: collecting a target monitoring index of a target monitoring object according to a first sampling interval to obtain a first monitoring value at the current moment; obtaining a historical moment to collect the target a first historical monitoring value obtained from a monitoring index; a target threshold is determined based on the first monitoring value and the first historical monitoring value; wherein the target threshold includes at least one different threshold; based on the first monitoring value and the target threshold value to determine alarm prompt information; wherein, the alarm prompt information is used to provide an alarm prompt for the target monitoring index of the target monitoring object; display the alarm prompt information, so that the information alarm device can
  • the first monitoring value at the time and the first historical monitoring value at the historical moment dynamically determine the target threshold to generate alarm prompt information according to the relationship between the target threshold and the first monitoring value, which solves the problem of using a fixed threshold to realize database monitoring. This leads to the problem that the alarm requirements of the database in each time period cannot be accurately realized, and the solution of adaptive

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Alarm Systems (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

Disclosed in embodiments of the present application is an information alerting method. The method comprises: acquiring a target monitoring index of a target monitoring object according to a first sampling interval, so as to obtain a first monitoring value of a current moment; obtaining a first historical monitoring value which is obtained by acquiring the target monitoring index at a historical moment; determining a target threshold on the basis of the first monitoring value and the first historical monitoring value, wherein the target threshold comprises at least one different threshold; determining alert information on the basis of the first monitoring value and the target threshold, wherein the alert information is used for sending an alert for the target monitoring index of the target monitoring object; and displaying the alert information. Also disclosed in the embodiments of the present application are an information alerting device and a storage medium.

Description

An information alarm method, device and storage medium
Cross-Reference to Related Applications
This application is based on, and claims priority to, Chinese patent application No. 202011340177.8, filed on November 25, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of computer applications, and in particular to an information alarm method, device, and storage medium.
Background
With the rapid development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually transforming into financial technology (Fintech). However, the security and real-time requirements of the financial industry also place higher demands on technology. With the rapid development of Internet technology, databases have become increasingly indispensable, so monitoring them has become increasingly important. At present, whether a database is abnormal is mainly determined by judging whether a database index value exceeds a corresponding fixed threshold, or whether its year-over-year or period-over-period growth ratio exceeds a corresponding fixed threshold.
However, in current database usage, the request volume of a database varies considerably across time periods, and the requests in different periods can exhibit a certain periodicity. For example, the early morning is an off-peak business period and the daytime is a peak period; as another example, a bank's nightly batch tasks produce a large request volume. As a result, when a database is monitored for anomalies by comparing its index values with corresponding fixed thresholds, the alarm requirements of the database in each time period cannot be met accurately, and the alarm error rate is high.
Summary
To solve the above technical problem, the embodiments of the present application are intended to provide an information alarm method, device, and storage medium. They solve the problem that current database monitoring based on fixed thresholds cannot accurately meet the alarm requirements of the database in each time period, provide a scheme that monitors the database by adaptively adjusting thresholds according to actual conditions, and effectively improve the accuracy of database alarms.
The technical solution of the present application is implemented as follows:
In a first aspect, an information alarm method is provided, the method comprising:
collecting a target monitoring index of a target monitoring object according to a first sampling interval to obtain a first monitoring value at the current moment;
obtaining a first historical monitoring value obtained by collecting the target monitoring index at a historical moment;
determining a target threshold based on the first monitoring value and the first historical monitoring value, wherein the target threshold includes at least one different threshold;
determining alarm prompt information based on the first monitoring value and the target threshold, wherein the alarm prompt information is used to give an alarm prompt for the target monitoring index of the target monitoring object;
displaying the alarm prompt information.
In a second aspect, an information alarm device is provided, the device comprising a memory, a processor, and a communication bus, wherein:
the memory is configured to store executable instructions;
the communication bus is configured to implement a communication connection between the processor and the memory;
the processor is configured to execute an information alarm program stored in the memory to implement the steps of the information alarm method according to any one of the above.
In a third aspect, a storage medium is provided. The storage medium stores an information alarm program, and the information alarm program, when executed by a processor, implements the steps of the information alarm method according to any one of the above.
In the embodiments of the present application, the information alarm device collects the target monitoring index of the target monitoring object according to the first sampling interval and, after obtaining the first monitoring value at the current moment, obtains the first historical monitoring value obtained by collecting the target monitoring index at a historical moment. It then determines a target threshold based on the first monitoring value and the first historical monitoring value, determines alarm prompt information based on the first monitoring value and the target threshold, and finally displays the alarm prompt information. In this way, the information alarm device dynamically determines the target threshold in real time from the first monitoring value at the current moment and the first historical monitoring value at the historical moment, and generates alarm prompt information according to the relationship between the target threshold and the first monitoring value. This solves the problem that current database monitoring based on fixed thresholds cannot accurately meet the alarm requirements of the database in each time period, provides a scheme that monitors the database by adaptively adjusting thresholds according to actual conditions, and effectively improves the accuracy of database alarms.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of an information alarm method provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of another information alarm method provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of yet another information alarm method provided by an embodiment of the present application;
FIG. 4 is a schematic flowchart of an information alarm method provided by another embodiment of the present application;
FIG. 5 is a schematic flowchart of another information alarm method provided by another embodiment of the present application;
FIG. 6 is a schematic flowchart of yet another information alarm method provided by another embodiment of the present application;
FIG. 7 is a schematic structural diagram of an information alarm device provided by an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be regarded as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the scope of protection of the present application.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by a person skilled in the technical field to which the present application belongs. The terms used herein are only for the purpose of describing the embodiments of the present application and are not intended to limit the present application.
An embodiment of the present application provides an information alarm method. Referring to FIG. 1, the method is applied to an information alarm device and includes the following steps:
Step 101: Collect the target monitoring index of the target monitoring object according to the first sampling interval to obtain the first monitoring value at the current moment.
In this embodiment of the present application, the information alarm device may be the device that runs the target monitoring object, for example a server, or a computer device or other device capable of running it. The target monitoring object may be an application program that runs on the information alarm device. The target monitoring index may be a resource consumed when the information alarm device runs the target monitoring object, for example the central processing unit (CPU) resources to be occupied, including the number of CPUs (that is, the number of cores) and/or the CPU running time, or input/output (IO) resources. The target monitoring index may be set by default in advance or set by the user according to actual monitoring requirements. The first sampling interval determines the sampling frequency at which the target monitoring index of the target monitoring object is collected; it may be set by the user, or it may be an empirical value obtained from extensive experiments or practical experience. Generally, the first sampling interval should not be too small, that is, the sampling frequency should not be too high, so as not to increase the monitoring load that the information alarm device incurs when monitoring the target monitoring object. In practice, the first sampling interval may be continually corrected, or continually modified by the user according to actual monitoring requirements.
Step 102: Obtain the first historical monitoring value obtained by collecting the target monitoring index at a historical moment.
In this embodiment of the present application, the information alarm device stores each monitoring value it collects for the target monitoring index, so the first historical monitoring value collected at a historical moment is available. The first historical monitoring value may be the monitoring values within a period of time before the current moment.
Step 103: Determine a target threshold based on the first monitoring value and the first historical monitoring value.
The target threshold includes at least one different threshold.
In this embodiment of the present application, the information alarm device analyzes and processes the first monitoring value at the current moment and the first historical monitoring value at the historical moment to obtain the target threshold. In this way, the target threshold is determined dynamically from the monitoring value collected at the current moment and the historical monitoring values at historical moments.
Step 104: Determine alarm prompt information based on the first monitoring value and the target threshold.
The alarm prompt information is used to give an alarm prompt for the target monitoring index of the target monitoring object.
In this embodiment of the present application, after the target threshold is determined, the range of the target threshold in which the first monitoring value falls is determined, so as to determine the corresponding alarm level and generate the corresponding alarm prompt information. The alarm prompt information tells the user whether the target monitoring object is at risk and whether risk elimination needs to be performed on it promptly.
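The range lookup in step 104 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes that an alarm level is simply the number of thresholds the first monitoring value strictly exceeds, and the function names and level labels ("normal", "minor", "major") are hypothetical.

```python
from bisect import bisect_left

# Hypothetical labels; the source only says that different ranges map to different alarm levels.
LEVEL_NAMES = {0: "normal", 1: "minor", 2: "major"}


def alarm_level(value, thresholds):
    """Return how many thresholds the value strictly exceeds (0 means no alarm)."""
    ordered = sorted(thresholds)
    # bisect_left gives the count of thresholds that are strictly less than value.
    return bisect_left(ordered, value)


def alarm_message(metric, value, thresholds):
    """Build a human-readable alarm prompt for one monitored metric."""
    level = alarm_level(value, thresholds)
    name = LEVEL_NAMES.get(level, f"level-{level}")
    return f"{metric}: value={value} level={name}"


if __name__ == "__main__":
    print(alarm_message("cpu_usage", 80.0, [70.0, 90.0]))
```

Because the thresholds are recomputed at every sample, the same value can map to different levels at different times of day, which is the point of the adaptive scheme.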
Step 105: Display the alarm prompt information.
In this embodiment of the present application, after the information alarm device generates the alarm prompt information, it may display the alarm prompt information in a display area corresponding to the information alarm device. That display area may be the display screen of the information alarm device itself, or another device with a display function that has a communication connection with the information alarm device, for example a mobile communication device. In the latter case, after generating the alarm prompt information, the information alarm device sends it to the connected mobile communication device, which displays it.
In this embodiment of the present application, the information alarm device collects the target monitoring object according to the first sampling interval and, after obtaining the first monitoring value of the target monitoring index at the current moment, obtains the first historical monitoring value obtained by collecting the target monitoring index at a historical moment. It then determines a target threshold based on the first monitoring value and the first historical monitoring value, determines alarm prompt information based on the first monitoring value and the target threshold, and finally displays the alarm prompt information. In this way, the information alarm device dynamically determines the target threshold in real time from the first monitoring value at the current moment and the first historical monitoring value at the historical moment, and generates alarm prompt information according to the relationship between the target threshold and the first monitoring value. This solves the problem that current database monitoring based on fixed thresholds cannot accurately meet the alarm requirements of the database in each time period, provides a scheme that monitors the database by adaptively adjusting thresholds according to actual conditions, and effectively improves the accuracy of database alarms.
Based on the foregoing embodiments, an embodiment of the present application provides an information alarm method. Referring to FIG. 2, the method is applied to an information alarm device and includes the following steps:
Step 201: Collect the target monitoring index of the target monitoring object according to the first sampling interval to obtain the first monitoring value at the current moment.
In this embodiment of the present application, the description takes as an example the case where the information alarm device is a server and the target monitoring object is a database. While the server runs the database, the server is set to collect the target monitoring index of the database at the first sampling interval, which yields the first monitoring value at the current moment. The target monitoring index may include at least one of the following: the CPU usage and/or the IO usage of the server. It should be noted that, after collecting the first monitoring value at the current moment, the information alarm device stores it for subsequent analysis.
Step 202: Obtain the first historical monitoring value obtained by collecting the target monitoring index at a historical moment.
In this embodiment of the present application, the server obtains from a storage area the monitoring values of the target monitoring index that were collected and stored at historical moments, which yields the first historical monitoring value. The storage area may be a local storage area of the server or a cloud storage area that the server can access.
Step 203: Obtain a second monitoring value of a reference monitoring index of the target monitoring object at the current moment and a second historical monitoring value of the reference monitoring index collected at a historical moment.
The reference monitoring index is associated with the target monitoring index.
In this embodiment of the present application, the reference monitoring index of the target monitoring object may be an index of the target monitoring object other than the target monitoring index, and it has a certain influence on the target monitoring index. The information alarm device also collects the reference monitoring index of the target monitoring object at the first sampling interval, which yields the second monitoring value of the reference monitoring index at the current moment and the second historical monitoring value at the historical moment. It should be noted that the historical moments of the second historical monitoring value correspond one-to-one to the historical moments of the first historical monitoring value of the target monitoring index. The reference monitoring index may be the total request volume of the database. For example, the first historical monitoring value collected for the target monitoring index and the second historical monitoring value collected for the reference monitoring index within a period of time before the current moment, such as one month, are obtained. That the reference monitoring index is associated with the target monitoring index means that a change in the reference monitoring index affects the target monitoring index. For example, when the database is running normally, the CPU usage and IO usage of the server maintain a certain positive correlation with the total request volume of the database; when the database is running abnormally, the total request volume of the database may fall while the CPU usage and IO usage rise instead.
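To make the notion of "positive correlation" between the reference index and the target index concrete, the sketch below computes a Pearson correlation coefficient over paired samples. This is only an illustration: the source does not say that the method computes a correlation coefficient, and the function name and the sample numbers are invented for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long sample sequences (0.0 if either is constant)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    requests = [100, 200, 300, 400]      # invented total request volume samples
    cpu_normal = [10, 20, 30, 40]        # CPU usage rises with request volume
    cpu_abnormal = [40, 30, 20, 10]      # CPU usage rises while request volume falls
    print(pearson(requests, cpu_normal), pearson(requests, cpu_abnormal))
```

A strongly positive coefficient matches the normal case described above, while a negative one matches the abnormal case where request volume drops but CPU or IO usage climbs.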
Step 204: Determine a target weight coefficient based on the first monitoring value, the first historical monitoring value, the second monitoring value, and the second historical monitoring value.
In this embodiment of the present application, the server analyzes the first monitoring value, the first historical monitoring value, the second monitoring value, and the second historical monitoring value to obtain the target weight coefficient.
Step 205: Determine a first reference value based on the first historical monitoring value.
In this embodiment of the present application, the information alarm device analyzes the first historical monitoring value to obtain the first reference value.
Step 206: From the first historical monitoring value, obtain the first historical sub-monitoring value of the moment immediately preceding the current moment.
In this embodiment of the present application, taking a first sampling interval of 1 minute as an example, if the current moment is 14:00 on November 11, the first historical sub-monitoring value of the immediately preceding moment, 13:59 on November 11, is obtained from the first historical monitoring value.
Step 207: Determine the difference between the first monitoring value and the first historical sub-monitoring value to obtain a second reference value.
In this embodiment of the present application, the second reference value is calculated by the formula "second reference value = first monitoring value - first historical sub-monitoring value".
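Steps 206 and 207 can be sketched as below, assuming (for illustration only) that the stored history is a list of `(timestamp, value)` pairs; the function name, the data layout, and the sample values are hypothetical.

```python
from datetime import datetime


def second_reference_value(current_value, history):
    """Return current_value minus the most recent stored value.

    history is a non-empty list of (timestamp, value) pairs; the entry with the
    latest timestamp is taken as the sample adjacent to the current moment.
    """
    if not history:
        raise ValueError("history must contain at least one sample")
    _, previous_value = max(history, key=lambda sample: sample[0])
    return current_value - previous_value


if __name__ == "__main__":
    history = [
        (datetime(2020, 11, 11, 13, 58), 55.0),
        (datetime(2020, 11, 11, 13, 59), 60.0),
    ]
    # Current sample taken at 14:00 with a 1-minute sampling interval.
    print(second_reference_value(62.0, history))
```

A positive result means the index rose since the previous sample, and a negative result means it fell.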
Step 208: Obtain the target threshold based on the target weight coefficient, the second reference value, the first monitoring value, at least one preset weight coefficient, and the first reference value.
The target threshold includes at least one different threshold.
In this embodiment of the present application, the target weight coefficient, the second reference value, the first monitoring value, the at least one preset weight coefficient, and the first reference value are analyzed and calculated to obtain a target threshold that includes at least one different threshold. The number of thresholds in the target threshold equals the number of preset weight coefficients. Suppose the at least one preset weight coefficient consists of two different preset weight coefficients, preset weight coefficient 1 and preset weight coefficient 2. Then one threshold of the target threshold can be determined from preset weight coefficient 1, the target weight coefficient, the second reference value, the first monitoring value, and the first reference value, and the other threshold of the target threshold can be determined from preset weight coefficient 2, the target weight coefficient, the second reference value, the first monitoring value, and the first reference value.
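The paragraph above fixes the inputs of step 208 and the rule "one threshold per preset weight coefficient", but not the exact formula. The sketch below is one plausible reading, stated as an assumption: each threshold is the sum of a first product (target weight coefficient times second reference value), a second product (that preset weight coefficient times first reference value), and the first monitoring value. The function and parameter names are hypothetical.

```python
def target_thresholds(target_weight, second_ref, first_value, preset_weights, first_ref):
    """Return one threshold per preset weight coefficient.

    Assumed formula (not confirmed by this section of the text):
        threshold_i = target_weight * second_ref
                      + preset_weights[i] * first_ref
                      + first_value
    """
    first_product = target_weight * second_ref
    return [first_product + weight * first_ref + first_value for weight in preset_weights]


if __name__ == "__main__":
    # Two preset weight coefficients yield two thresholds.
    print(target_thresholds(0.5, 2.0, 60.0, [1.0, 2.0], 4.0))
```

Under this reading, a larger preset weight coefficient widens the margin above the current value and so yields a higher threshold, which is consistent with using different coefficients for different alarm levels.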
步骤209、基于第一监控值和目标阈值,确定告警提示信息。Step 209: Determine alarm prompt information based on the first monitoring value and the target threshold.
其中,告警提示信息用于针对目标监控对象的目标监控指标进行告警提示。The alarm prompt information is used to provide an alarm prompt for the target monitoring indicator of the target monitoring object.
步骤210、显示告警提示信息。 Step 210 , displaying alarm prompt information.
In the embodiment of the present application, when determining the target threshold corresponding to the target monitoring indicator, the method analyzes not only the first monitoring value of the target monitoring indicator at the current moment and the corresponding first historical monitoring value, but also the second monitoring value at the current moment, and the corresponding second historical monitoring value, of a reference monitoring indicator associated with the target monitoring indicator. This yields a target threshold that better matches actual operating conditions, which improves the reliability of the target threshold and the accuracy of automatic alarms for the database.
基于前述实施例,在本申请其他实施例中,步骤204可以由步骤a11~a14来实现:Based on the foregoing embodiments, in other embodiments of the present application, step 204 may be implemented by steps a11 to a14:
步骤a11、通过确定第二时刻的监控值与第一时刻的监控值的差值的方式对第一监控值和第一历史监控值进行处理,得到目标监控指标不同时刻的第二差值。Step a11: Process the first monitoring value and the first historical monitoring value by determining the difference between the monitoring value at the second moment and the monitoring value at the first moment to obtain the second difference value of the target monitoring index at different times.
其中,第一时刻与第二时刻是两个相邻时刻,第一时刻距离当前时刻比第二时刻距离当前时刻远。The first moment and the second moment are two adjacent moments, and the first moment is farther from the current moment than the second moment is from the current moment.
In the embodiment of the present application, for the monitoring values at any two adjacent moments among the first monitoring value and the first historical monitoring value, the monitoring value at the earlier moment (the first moment) is subtracted from the monitoring value at the later moment (the second moment), giving the second differences at different moments.
步骤a12、通过确定第二时刻的监控值与第一时刻的监控值的差值的方式对第二监控值和第二历史监控值进行处理,得到参考监控指标不同时刻的第三差值。Step a12: Process the second monitoring value and the second historical monitoring value by determining the difference between the monitoring value at the second moment and the monitoring value at the first moment to obtain the third difference value of the reference monitoring index at different times.
In the embodiment of the present application, for the monitoring values at any two adjacent moments among the second monitoring value and the second historical monitoring value, the monitoring value at the earlier moment (the first moment) is subtracted from the monitoring value at the later moment (the second moment), giving the third differences at different moments.
步骤a13、确定同一时刻的第三差值与第二差值比值,得到参考比值。Step a13: Determine the ratio of the third difference and the second difference at the same moment to obtain a reference ratio.
In the embodiment of the present application, the reference monitoring indicator and the target monitoring indicator constrain and influence each other only at the same moment. Therefore, the ratio is calculated between the third difference and the second difference obtained for the same moment, giving the reference ratio.
步骤a14、基于参考比值,确定目标权重系数。Step a14: Determine the target weight coefficient based on the reference ratio.
基于前述实施例,在本申请其他实施例中,步骤a14可以由步骤a141~a142来实现:Based on the foregoing embodiments, in other embodiments of the present application, step a14 may be implemented by steps a141 to a142:
步骤a141、从参考比值中获取比值大于零,且距离当前时刻最近的第二预设数量个目标比值。Step a141: Obtain from the reference ratio a second preset number of target ratios whose ratios are greater than zero and are closest to the current moment.
In this embodiment of the present application, the second preset number may be an empirical value obtained from a large number of experiments or actual application scenarios, or may be set by the user according to actual needs. A reference ratio greater than zero indicates that the target monitoring object is running in a normal working state. Between adjacent moments, the running state of the target monitoring object generally does not suddenly develop a major problem. Therefore, the second preset number of ratios that were obtained in the normal working state and are closest to the current moment are taken as the second preset number of target ratios.
步骤a142、基于第二预设数量个目标比值,得到目标权重系数。Step a142: Obtain a target weight coefficient based on the second preset number of target ratios.
在本申请实施例中,采用预设的处理方法对第二预设数量个目标比值进行处理,得到一个值,并将该值作为目标权重系数。预设的处理方法例如可以是求和后计算平均值,或者加权求和后计算平均值,或者求标准差等数学统计归纳方法。In the embodiment of the present application, a preset processing method is used to process the second preset number of target ratios to obtain a value, and the value is used as the target weight coefficient. The preset processing method may be, for example, calculating an average value after summation, or calculating an average value after weighted summation, or a mathematical statistical induction method such as standard deviation.
在本申请其他实施例中,步骤a142可以由以下步骤来实现:确定第二预设数量个目标比值的平均值,得到目标权重系数。In other embodiments of the present application, step a142 may be implemented by the following steps: determining an average value of a second preset number of target ratios to obtain a target weight coefficient.
在本申请实施例中,计算第二预设数量个目标比值的累加值,然后计算第二预设数量个目标比值的累加值与第二预设数量的比值,得到对应的平均值,并将该平均值作为目标权重系数。In the embodiment of the present application, the accumulated value of the second preset number of target ratios is calculated, and then the ratio of the accumulated value of the second preset number of target ratios to the second preset number is calculated to obtain the corresponding average value, and the This average is used as the target weight coefficient.
In the embodiment of the present application, the second difference of the target monitoring indicator and the third difference of the reference monitoring indicator at the same moment are analyzed to determine the target weight coefficient. Because operations on the database follow a certain regularity, the resulting target weight coefficient is valid and better matches actual usage requirements.
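Steps a11 to a142 can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation: the function names, the default count of 7, and the choice to skip moments where the target indicator did not change (to avoid division by zero) are assumptions made here.

```python
from statistics import mean


def adjacent_differences(values):
    """Later value minus earlier value for each pair of adjacent moments (steps a11/a12)."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


def target_weight(target_values, reference_values, count=7):
    """Average of the `count` most recent positive reference ratios (steps a13 to a142).

    Both lists are ordered oldest to newest and end at the current moment.
    """
    second_diffs = adjacent_differences(target_values)     # target monitoring indicator
    third_diffs = adjacent_differences(reference_values)   # reference monitoring indicator
    ratios = [
        third / second
        for second, third in zip(second_diffs, third_diffs)
        if second != 0  # assumption: skip moments with no change in the target indicator
    ]
    positive = [ratio for ratio in ratios if ratio > 0]    # step a141: keep ratios above zero
    recent = positive[-count:]                             # closest to the current moment
    if not recent:
        raise ValueError("no positive reference ratio available")
    return mean(recent)                                    # step a142: average
```

For example, `target_weight(io_usage_history, request_volume_history)` would return the coefficient for a hypothetical pair of series sampled at the same moments.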
基于前述实施例,在本申请其他实施例中,步骤205可以由步骤b11~b12来实现:Based on the foregoing embodiments, in other embodiments of the present application, step 205 may be implemented by steps b11 to b12:
步骤b11、从第一历史监控值中,获取当前时刻之前的第一预设数量个时间周期内,与当前时刻相同时刻的第一预设数量个第二历史子监控值。Step b11: From the first historical monitoring value, obtain a first predetermined number of second historical sub-monitoring values at the same moment as the current moment within a first predetermined number of time periods before the current moment.
In the embodiment of the present application, the first preset number is an empirical value obtained from a large number of experiments, or a value set by the user according to actual needs. Take a time period of one day and a first preset number of 7 as an example: from the first historical monitoring value, the 7 monitoring values recorded at the same time of day as the current moment during the 7 days before the current moment are obtained as the 7 second historical sub-monitoring values. For instance, if the current moment is 14:00 on November 11, the monitoring values at 14:00 on each of the 7 preceding days are obtained, namely those at 14:00 on November 4, 5, 6, 7, 8, 9 and 10, and these serve as the 7 second historical sub-monitoring values.
步骤b12、确定第一预设数量个第二历史子监控值的标准差,得到第一参考值。Step b12: Determine the standard deviation of the first preset number of second historical sub-monitoring values to obtain a first reference value.
在本申请实施例中,采用标准差计算公式对第一预设数量个第二历史子监控值进行计算,得到第一预设数量个第二历史子监控值的标准差,并将第一预设数量个第二历史子监控值的标准差作为第一参考值。In the embodiment of the present application, the standard deviation calculation formula is used to calculate the first preset number of second historical sub-monitoring values to obtain the standard deviation of the first preset number of second historical sub-monitoring values, and the first preset number of second historical sub-monitoring values is calculated. The standard deviation of the number of second historical sub-monitoring values is set as the first reference value.
In the embodiment of the present application, only the historical monitoring values from the period closest to the current moment are analyzed. Because database performance stays roughly constant over such a period, this reduces the computing resources consumed by the analysis while still ensuring the monitoring effect.
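Steps b11 and b12 can be sketched as below. The sketch assumes the history is available as a mapping from sample timestamp to monitoring value, which is a hypothetical data layout. The text does not say whether the population or the sample standard deviation is meant; the population form (`statistics.pstdev`) is used here as an assumption.

```python
from datetime import datetime, timedelta
from statistics import pstdev


def same_time_history(history, now, periods=7, period=timedelta(days=1)):
    """Values recorded at the same clock time in each of the last `periods` periods (step b11).

    `history` is a hypothetical mapping from sample timestamp to monitoring value.
    """
    return [history[now - k * period] for k in range(periods, 0, -1)]


def first_reference_value(history, now, periods=7):
    """Standard deviation of the same-time historical values (step b12)."""
    return pstdev(same_time_history(history, now, periods))


# Illustrative data only: 14:00 on November 4 to November 10 of an arbitrary year.
now = datetime(2021, 11, 11, 14, 0)
history = {now - timedelta(days=k): float(8 - k) for k in range(1, 8)}
delta = first_reference_value(history, now)
```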
基于前述实施例,在本申请其他实施例中,步骤208可以由步骤c11~c13来实现:Based on the foregoing embodiments, in other embodiments of the present application, step 208 may be implemented by steps c11 to c13:
步骤c11、确定目标权重系数与第二参考值的第一乘积。Step c11: Determine the first product of the target weight coefficient and the second reference value.
步骤c12、确定至少一个预设权重系数与第一参考值的乘积,得到至少一个第二乘积。Step c12: Determine the product of at least one preset weight coefficient and the first reference value to obtain at least one second product.
步骤c13、确定第一乘积、至少一个第二乘积中每一第二乘积和第一监控值的累加值,得到目标阈值。Step c13: Determine the accumulated value of the first product, each of the at least one second product, and the first monitoring value to obtain the target threshold.
In the embodiment of the present application, the standard deviation of the first preset number of values taken at the same moment over the most recent time periods is calculated and used to adjust the threshold. Because observed indicators cannot fully match the model's expectation, adding the standard deviation allows a certain tolerance, which better fits actual detection needs.
在本申请实施例中,通过设置得到多个不同的阈值,提高了针对各种应用情况的普适性,进而能够实现不同的告警,有效提高了用户的使用体验效果。In the embodiment of the present application, multiple different thresholds are obtained by setting, which improves the universality for various application situations, thereby enabling different alarms, and effectively improving the user experience.
基于前述实施例,在本申请其他实施例中,步骤209可以由步骤209a~209f来实现:Based on the foregoing embodiments, in other embodiments of the present application, step 209 may be implemented by steps 209a to 209f:
步骤209a、获取目标监控指标的目标上限值。 Step 209a: Obtain the target upper limit value of the target monitoring indicator.
在本申请实施例中,目标上限值是预先设定的一个上限值,即针对目标监控指标时,允许目标监控指标对应的最大值。In the embodiment of the present application, the target upper limit value is a preset upper limit value, that is, for the target monitoring index, the allowable maximum value corresponding to the target monitoring index.
其中,信息告警设备执行步骤209a之后,可以选择执行步骤209b~209c,或者步骤209d,或者步骤209e,或者步骤209f。Wherein, after the information alarm device executes step 209a, it can choose to execute steps 209b to 209c, or step 209d, or step 209e, or step 209f.
步骤209b、若第一监控值小于目标上限值,且第一监控值大于或等于第一阈值,按照第二采样间隔连续采集目标监控指标的第三预设数量个第三监控值。 Step 209b: If the first monitoring value is less than the target upper limit value, and the first monitoring value is greater than or equal to the first threshold value, continuously collect a third preset number of third monitoring values of the target monitoring index according to the second sampling interval.
其中,第二采样间隔小于第一采样间隔,目标阈值包括第一阈值。Wherein, the second sampling interval is smaller than the first sampling interval, and the target threshold includes the first threshold.
In this embodiment of the present application, the first threshold is the smallest of the at least one threshold included in the target threshold. The third preset number is an empirical value obtained from a large number of experiments or a value set by the user, and is usually smaller than the first preset number. For example, with a second sampling interval of 3 seconds and a third preset number of 2: when the first monitoring value is less than the target upper limit value and greater than or equal to the first threshold, the target monitoring indicator is collected 2 times in succession at 3-second intervals, giving 2 third monitoring values.
步骤209c、若第三预设数量个第三监控值均小于目标上限值,且至少一个第三监控值大于或等于第二阈值,生成第一告警信息。 Step 209c: If the third preset number of third monitoring values are all smaller than the target upper limit value, and at least one third monitoring value is greater than or equal to the second threshold, generate first alarm information.
其中,目标阈值包括第二阈值,第二阈值大于第一阈值,第一告警信息用于实现针对目标监控对象的重大告警。The target threshold includes a second threshold, the second threshold is greater than the first threshold, and the first alarm information is used to implement a major alarm for the target monitoring object.
In the embodiments of the present application, taking 3 third monitoring values as an example, the case where at least one third monitoring value is greater than or equal to the second threshold includes: any one of the 3 third monitoring values is greater than or equal to the second threshold; any two of them are greater than or equal to the second threshold; or all 3 are greater than or equal to the second threshold.
步骤209d、若第一监控值大于或等于目标上限值,生成第二告警信息。 Step 209d, if the first monitoring value is greater than or equal to the target upper limit value, generate second alarm information.
其中,第二告警信息用于实现针对目标监控对象的严重告警。The second alarm information is used to implement a serious alarm for the target monitoring object.
步骤209e、若第三预设数量个第三监控值中的至少一个第三监控值大于或等于目标上限值,生成第二告警信息。 Step 209e: If at least one third monitoring value in the third preset number of third monitoring values is greater than or equal to the target upper limit value, generate second alarm information.
步骤209f、若第三预设数量个第三监控值均小于第二阈值,生成第三告警信息。 Step 209f, if the third preset number of third monitoring values are all smaller than the second threshold, generate third alarm information.
其中,第三告警信息用于实现针对目标监控对象的次要告警。The third alarm information is used to implement a secondary alarm for the target monitoring object.
在本申请其他实施例中,若第一监控值小于第一阈值,确定目标监控对象正常运行,无需进行任何告警。In other embodiments of the present application, if the first monitoring value is less than the first threshold, it is determined that the target monitoring object is running normally, and no alarm is required.
在本申请实施例中,实现了不同等级的告警,有效提高了用户的使用体验效果。In the embodiment of the present application, different levels of alarms are implemented, which effectively improves the user experience.
基于前述实施例,本申请实施例提供一种信息告警方法,参照图3所示,包括以下步骤:Based on the foregoing embodiments, an embodiment of the present application provides an information alarm method, as shown in FIG. 3 , including the following steps:
步骤31、采集数据库的目标监控指标的监测值。Step 31: Collect the monitoring value of the target monitoring index of the database.
The monitoring indicators of the database are classified as shown in Table 1, which lists the indicator types, indicator names and corresponding indicator attributes. For the resource-type indicators in the table, each instance carries different services and request volumes and therefore has different threshold standards; these indicators are also key monitoring objects. Adaptive thresholds are therefore used to monitor the CPU usage rate and IO usage rate among the resource-type indicators, that is, the target monitoring indicator is the CPU usage rate and/or the IO usage rate. The monitoring system run by the information alarm device collects monitoring data for the target monitoring indicator in a loop at a given sampling frequency. The sampling frequency here is the time interval between samples and can be adjusted as needed; by default one sample is collected per minute, because sampling too often puts excessive load on the monitoring database.
Table 1
Figure PCTCN2021129296-appb-000001
步骤32、生成自适应阈值。 Step 32, generating an adaptive threshold.
Taking the IO usage rate as the target monitoring indicator, the adaptive threshold is $U(t) = p(t) + \beta \cdot \delta$. Here $p(t)$ is calculated by the formula $p(t) = \alpha (I_t - I_{t-1}) + I_{t-1}$, where $I_t$ is the monitoring value of the IO usage rate at the current moment $t$ and $I_{t-1}$ is the monitoring value of the IO usage rate at the previous moment $t-1$.
$\alpha$ is a weighting ratio that controls the proportion of the current monitoring value of the target monitoring indicator in the formula; through it, the $p(t)$ formula adapts to how fast local behavior changes. Setting $\alpha$ to a fixed constant often does not generalize well, so $\alpha$ changes as the monitoring indicators change over the running time. When the database is not in an abnormal state, the monitoring values of the IO usage rate and the CPU usage rate keep a certain positive correlation with the total request volume of the database; in an abnormal state, the total request volume often drops while the IO usage rate and CPU usage rate rise instead. It is therefore sufficient to consider the correlation of the IO usage rate and of the CPU usage rate with the total request volume under normal conditions. Let $Q_t$ denote the total request volume at the current moment $t$; the correlation coefficient between the IO usage rate and the total request volume at moment $t$ can then be written as $q = (Q_t - Q_{t-1}) / (I_t - I_{t-1})$. The value of $\alpha$ at the current moment $t$ is obtained by taking the 7 values of $q$ closest to the current moment $t$ that are positive (the coefficient is positive under normal conditions) and averaging them.
$\delta$ is the standard deviation. It is obtained as follows: because the monitoring system monitors the database continuously, there is an observation of the target monitoring indicator at the same moment every day. When the target monitoring indicator is the IO usage rate, a past period such as one week therefore provides 7 historical values of the IO usage rate at the same moment. Calculating the standard deviation of these 7 values gives $\delta$.
The value of $\beta$ determines the tolerance; a single fixed value may again generalize poorly. Different $\beta$ values, usually empirical, can therefore be set for alarms on the target monitoring indicator. For example, with two values β1 and β2 (β1 < β2, both positive integers), the corresponding adaptive thresholds $U(t)$ are denoted U1 and U2, with U1 < U2.
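The adaptive threshold formula can be written directly in code. The sketch below follows $U(t) = p(t) + \beta \cdot \delta$ with $p(t) = \alpha (I_t - I_{t-1}) + I_{t-1}$ as stated above; the function names and the sample numbers are illustrative assumptions.

```python
def predicted_value(alpha, current, previous):
    """p(t) = alpha * (I_t - I_{t-1}) + I_{t-1}."""
    return alpha * (current - previous) + previous


def adaptive_thresholds(alpha, current, previous, delta, betas):
    """One threshold U(t) = p(t) + beta * delta per beta value, in the order given."""
    p_t = predicted_value(alpha, current, previous)
    return [p_t + beta * delta for beta in betas]


# Illustrative numbers only: IO usage of 60 then 64, alpha 0.5, delta 3, betas 1 and 2.
u1, u2 = adaptive_thresholds(alpha=0.5, current=64, previous=60, delta=3, betas=[1, 2])
```

Because $\delta$ is non-negative, a larger $\beta$ never gives a smaller threshold, which is why U1 does not exceed U2 when β1 < β2.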
步骤33、匹配对应告警级别。Step 33: Match the corresponding alarm severity.
In monitoring resource-type indicators, a resource upper limit is usually considered, denoted L (a fixed constant). When the upper limit L is exceeded, a high-level alarm must be output. The process of comparing the monitoring value of the IO usage rate collected at the current moment with the upper limit L and the adaptive thresholds U1 and U2 (U1 < U2) to determine the alarm level is shown in FIG. 4 and includes the following steps:
步骤41、获取当前时刻的IO使用率的监控值。Step 41: Obtain the monitoring value of the IO usage rate at the current moment.
步骤42、判断IO使用率的监控值是否大于上限L,若IO使用率的监控值大于或等于上限L,执行步骤43,若IO使用率的监控值小于上限L,执行步骤44。Step 42: Determine whether the monitoring value of the IO usage rate is greater than the upper limit L, if the monitoring value of the IO usage rate is greater than or equal to the upper limit L, go to step 43, and if the monitoring value of the IO usage rate is less than the upper limit L, go to step 44.
步骤43、生成CRITICAL告警。 Step 43, generate a CRITICAL alarm.
The CRITICAL alarm corresponds to the aforementioned second alarm information (the severe alarm generated when the upper limit is reached). A CRITICAL-level alarm indicates that the database is already abnormal and must be handled immediately.
Step 44: Compare the monitoring value of the IO usage rate with the adaptive threshold U1. If the monitoring value is less than U1, go to step 45; if it is greater than or equal to U1, go to step 46.
Step 45: Determine that the database is normal.
Step 46: Collect the monitoring value of the IO usage rate 2 more times at a collection interval of 3 seconds to obtain 2 monitoring values.
Step 47: Compare the 2 monitoring values with the upper limit L. If at least one of the 2 monitoring values is greater than or equal to L, go to step 43; if both are less than L, go to step 48.
Step 48: Compare the 2 monitoring values with the adaptive threshold U2. If at least one of the 2 monitoring values is greater than or equal to U2, go to step 49; if both are less than U2, go to step 410.
Step 49: Generate a MAJOR alarm.
The MAJOR alarm corresponds to the aforementioned first alarm information (the major alarm). A MAJOR-level alarm requires close attention and may have a certain impact.
步骤410、生成MINOR告警。Step 410: Generate a MINOR alarm.
其中,MINOR告警对应前述第三告警信息,MINOR级别的告警,无需马上处理,可事后采集数据进行潜在风险分析。Among them, MINOR alarms correspond to the aforementioned third alarm information, MINOR-level alarms do not need to be processed immediately, and data can be collected for potential risk analysis afterwards.
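The decision flow of steps 41 to 410 can be sketched as a single function. This is an illustrative reading of the flow, not the applicant's code: `resample` is a hypothetical callable that returns one fresh IO usage sample, and a real collector would wait 3 seconds between calls.

```python
from typing import Callable, Optional


def alarm_level(value: float, upper_limit: float, u1: float, u2: float,
                resample: Callable[[], float], extra_samples: int = 2) -> Optional[str]:
    """Return "CRITICAL", "MAJOR", "MINOR", or None when the database is normal."""
    if value >= upper_limit:                      # steps 42 and 43
        return "CRITICAL"
    if value < u1:                                # steps 44 and 45
        return None
    # Step 46: collect further samples (the 3-second spacing is left to `resample`).
    samples = [resample() for _ in range(extra_samples)]
    if any(s >= upper_limit for s in samples):    # step 47
        return "CRITICAL"
    if any(s >= u2 for s in samples):             # steps 48 and 49
        return "MAJOR"
    return "MINOR"                                # step 410
```

Passing the sampler in as a parameter keeps the comparison logic free of timing and I/O, so it can be exercised with canned samples.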
基于前述实施例,在本申请其他实施例中,提供的是在告警提示信息为第一告警信息或第二告警信息时,进行根因推荐的一种实现方法,参照图5所示,信息告警设备执行步骤201~208和步骤209a~209c,或步骤201~208、步骤209a和步骤209d,或步骤201~208、步骤209a和步骤209e之后,还用于执行步骤211~215:Based on the foregoing embodiments, in other embodiments of the present application, a method for implementing root cause recommendation is provided when the alarm prompt information is the first alarm information or the second alarm information. Referring to FIG. 5 , the information alarm After the device performs steps 201-208 and steps 209a-209c, or steps 201-208, steps 209a and 209d, or steps 201-208, steps 209a and 209e, it is also used to perform steps 211-215:
步骤211、若告警提示信息是第一告警信息或第二告警信息,获取目标监控对象在当前时刻执行的至少一条结构化查询语言SQL语句。Step 211: If the alarm prompt information is the first alarm information or the second alarm information, obtain at least one structured query language SQL statement executed by the target monitoring object at the current moment.
步骤212、获取至少一条SQL语句对应的执行计划。Step 212: Obtain an execution plan corresponding to at least one SQL statement.
In the embodiment of the present application, an execution plan (also called a query plan or explain plan) describes the specific steps by which the database executes an SQL statement, for example whether table data is accessed through an index or a full table scan, how join queries are implemented, and the join order.
步骤213、基于至少一条SQL语句对应的执行计划,确定每一SQL语句对应目标监控指标的消耗成本,得到至少一条SQL语句对应的消耗成本。Step 213: Determine the consumption cost of each SQL statement corresponding to the target monitoring indicator based on the execution plan corresponding to the at least one SQL statement, and obtain the consumption cost corresponding to the at least one SQL statement.
在本申请实施例中,对至少一条SQL语句对应的执行计划中的每一SQL语句对应的执行计划进行分析,确定每一SQL语句对应目标监控指标的消耗成本,从而得到至少一条SQL语句对应的消耗成本。例如,信息告警设备确定数据库当前执行的SQL语句有3条,分别获取这3条SQL语句对应的执行计划,并根据这3条SQL语句对应的执行计划中的每一条SQL语句对应的执行计划,确定每一条SQL语句对应的目标监控指标的消耗成本,从而得到3条SQL语句各自对应的消耗成本。In the embodiment of the present application, the execution plan corresponding to each SQL statement in the execution plan corresponding to the at least one SQL statement is analyzed to determine the consumption cost of the target monitoring indicator corresponding to each SQL statement, so as to obtain the corresponding execution plan of the at least one SQL statement. consumption cost. For example, the information alarm device determines that there are three SQL statements currently executed in the database, obtains the execution plans corresponding to the three SQL statements respectively, and according to the execution plan corresponding to each SQL statement in the execution plans corresponding to the three SQL statements, Determine the consumption cost of the target monitoring indicator corresponding to each SQL statement, so as to obtain the corresponding consumption cost of the three SQL statements.
步骤214、基于至少一条SQL语句对应的消耗成本,按照消耗成本从高到低的排序顺序对至少一条SQL语句进行排序,得到SQL语句排序结果。Step 214: Based on the consumption cost corresponding to the at least one SQL statement, sort the at least one SQL statement according to the order of the consumption cost from high to low, and obtain a SQL statement sorting result.
In the embodiment of the present application, suppose the three SQL statements are SQL statement 1, SQL statement 2 and SQL statement 3, with IO consumption costs IO_cost1, IO_cost2 and IO_cost3 respectively. If sorting the consumption costs from high to low gives, for example, IO_cost3 > IO_cost1 > IO_cost2, then the SQL statement sorting result is: SQL statement 3, SQL statement 1, SQL statement 2.
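The ranking in step 214 amounts to a descending sort by cost. A minimal sketch, assuming the costs are held in a dictionary keyed by statement (the cost numbers are made up for illustration):

```python
def rank_by_cost(costs):
    """Return the SQL statements ordered from highest to lowest consumption cost."""
    return sorted(costs, key=costs.get, reverse=True)


# Illustrative costs only, chosen so that IO_cost3 > IO_cost1 > IO_cost2.
io_costs = {"SQL statement 1": 20, "SQL statement 2": 5, "SQL statement 3": 40}
ranking = rank_by_cost(io_costs)
```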
步骤215、显示SQL语句排序结果。 Step 215 , displaying the sorting result of the SQL statement.
需说明的是,步骤211~215可以在步骤210之前执行,其中,步骤215与步骤210可以同时执行,步骤215也可以在步骤210之后执行。It should be noted that steps 211 to 215 may be performed before step 210 , wherein step 215 and step 210 may be performed simultaneously, and step 215 may also be performed after step 210 .
Based on the foregoing embodiments, in other embodiments of the present application, step 213 may be implemented by steps d11 to d19 and/or steps e11 to e16. The embodiment corresponding to steps d11 to d19 covers the case where the consumption cost determined for the target monitoring object is the IO consumption cost, and the embodiment corresponding to steps e11 to e16 covers the case where it is the CPU consumption cost. Note that when the target monitoring indicator of the target monitoring object is the IO indicator, it is possible to determine only the IO consumption cost, or to determine the CPU consumption cost as well. Similarly, when the target monitoring indicator is the CPU indicator, it is possible to determine only the CPU consumption cost, or to determine the IO consumption cost as well. The choice in practice depends on the actual application requirements.
步骤d11、基于至少一条SQL语句对应的执行计划,确定每一SQL语句对应的执行计划中包括的每一身份标识ID的行数。Step d11: Determine the number of rows of each identity ID included in the execution plan corresponding to each SQL statement based on the execution plan corresponding to at least one SQL statement.
In the embodiment of the present application, the execution plan corresponding to each SQL statement includes an id column, which indicates the sequence number of the SELECT in the corresponding SQL statement, that is, the corresponding ID. Rows with a larger ID value are executed first, and rows with the same ID value are executed from top to bottom. If the id value is NULL, the row performs a union over the results of rows with other id values; its position is determined from the <union m,n> value in the table column, and it is executed in the step after min(m,n), the smaller of m and n.
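The ordering rule for non-NULL ids can be illustrated with a stable sort. The sketch assumes each execution-plan row is a dictionary with `id` and `table` keys, which is a hypothetical representation, and it deliberately leaves NULL-id union rows out rather than guess at their placement.

```python
def execution_order(plan_rows):
    """Order plan rows by the stated rule: larger id first, equal ids top to bottom.

    Rows whose id is None (union results) are omitted from this sketch.
    """
    numbered = [row for row in plan_rows if row["id"] is not None]
    # sorted() is stable, so rows sharing an id keep their original top-to-bottom order.
    return sorted(numbered, key=lambda row: row["id"], reverse=True)


# Illustrative plan rows only.
plan = [
    {"id": 1, "table": "a"},
    {"id": 1, "table": "b"},
    {"id": 2, "table": "c"},
    {"id": None, "table": "<union1,2>"},
]
order = [row["table"] for row in execution_order(plan)]
```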
其中,信息告警设备执行步骤d11后,可以选择执行步骤d12~d13,或者步骤d14~d18。在每一ID的行数为1时,选择执行步骤d12~d13;在每一ID的行数大于1时,选择执行步骤d14~d18。Wherein, after the information alarm device executes step d11, it can choose to execute steps d12-d13, or steps d14-d18. When the number of rows for each ID is 1, steps d12 to d13 are selected to be executed; when the number of rows of each ID is greater than 1, steps d14 to d18 are selected to be executed.
Step d12: Determine each ID whose number of rows is 1 in each SQL statement as a first ID, and obtain the first row count and the average row length corresponding to the first ID.
The first row count is recorded in the execution plan corresponding to each SQL statement.
In this embodiment of the present application, the first row count is recorded in the rows column of the execution plan corresponding to each SQL statement and may be denoted rows. The average row length AVG_ROW_LENGTH corresponding to the first ID of each SQL statement can be obtained from the information_schema.tables table.
Step d13: Divide the product of the first row count and the average row length by the preset InnoDB data page size and round the result up, to obtain the IO consumption cost of the first ID.
In this embodiment of the present application, the default uncompressed InnoDB data page in MySQL is 16 KB. A data page consists of seven parts: the file header, the page header, the infimum and supremum records, the user records, the free space, the page directory (slots), and the file trailer. The preset InnoDB data page size may be denoted innodb_page_size. Accordingly, the IO consumption cost for an ID with a row count of 1 is io_cost = ⌈rows * AVG_ROW_LENGTH / innodb_page_size⌉, where the symbol ⌈ ⌉ denotes rounding up. The InnoDB data page size may also be set by the user.
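The rounding-up rule of step d13 can be sketched as below. This is a minimal illustration, not the applicant's implementation: the function name `single_row_io_cost` is hypothetical, the default page size assumes the 16 KB default mentioned above, and the sample inputs are illustrative values only.

```python
# Default uncompressed InnoDB page size in bytes (16 KB); assumed here, configurable in practice.
INNODB_PAGE_SIZE = 16 * 1024


def single_row_io_cost(rows, avg_row_length, page_size=INNODB_PAGE_SIZE):
    """Estimate data pages read for an ID with one plan row.

    Implements ceil(rows * avg_row_length / page_size) with integer
    arithmetic so that no floating-point rounding is involved.
    """
    total_bytes = rows * avg_row_length
    return (total_bytes + page_size - 1) // page_size


# Illustrative inputs: rows from the plan's rows column,
# avg_row_length from information_schema.tables.AVG_ROW_LENGTH.
cost = single_row_io_cost(rows=54952, avg_row_length=465)
print(cost)
```

Integer ceiling division is used instead of `math.ceil` on a float quotient so that very large row counts do not lose precision.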
Step d14: Determine each ID whose number of rows is greater than 1 in each SQL statement as a second ID, and determine the number of returned rows and the number of scanned rows for each row of the second ID.
In this example of the present application, suppose that an ID in SQL statement 1, say ID1, has multiple rows, say k rows; the number of returned rows and the number of scanned rows are then computed for each row of ID1. One way to determine these values for each row of each ID of each SQL statement is as follows: from the execution plan corresponding to SQL statement 1, identify the k rows corresponding to ID1 and, proceeding through them from top to bottom, determine the number of returned rows of each row in turn. The number of scanned rows of each row is determined according to the join method, which requires counting the number of join comparisons. The number of join comparisons can be determined from Table 2, in which RN denotes the number of records in the outer table and SN denotes the number of records in the inner table.
Table 2
Figure PCTCN2021129296-appb-000004
Among the join methods in Table 2, SNLJ is the Simple Nested-Loops Join, INLJ is the Index Nested-Loops Join, BNL is the Block Nested-Loops Join, and CHJ is the Classic Hash Join.
The number of returned rows of the first of the k rows corresponding to ID1 is return_rows_1=rows_1*filtered_1, where rows_1 is taken from the rows column, and filtered_1 from the filtered column, of the first row corresponding to ID1 in the execution plan of SQL statement 1. For every row other than the first, the number of returned rows depends on the join type of that row, which is recorded in the execution plan corresponding to SQL statement 1.
For example, for the second row corresponding to ID1 in the execution plan of SQL statement 1: if the join type of the second row is inner join, return_rows_2=return_rows_1*(rows_2*filtered_2); if it is left join, return_rows_2=return_rows_1; and if it is right join, return_rows_2=table_rows_2. The same pattern continues through the k-th row: if the join type of the k-th row is inner join, return_rows_k=return_rows_(k-1)*(rows_k*filtered_k); if it is left join, return_rows_k=return_rows_(k-1); and if it is right join, return_rows_k=table_rows_k. Taking the k-th row as an example, return_rows_(k-1) is the number of rows returned by row k-1, which serves as the outer-table row count in execution order, and table_rows_k is the inner-table row count.
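The recurrence for the number of returned rows can be sketched as follows. The dictionary layout, the function name `returned_rows`, and the sample plan are assumptions made for illustration; `filtered` is treated as a fraction between 0 and 1 (MySQL's EXPLAIN reports it as a percentage, so it would need dividing by 100 first).

```python
def returned_rows(plan_rows):
    """Compute return_rows for the rows of one ID, top to bottom.

    Each element of plan_rows is a dict with the keys:
      rows        row estimate from the plan's rows column
      filtered    fraction of rows kept (0.0 to 1.0)
      join        'inner', 'left' or 'right' (ignored for the first row)
      table_rows  row count of the inner table (used for right joins)
    """
    result = []
    for index, row in enumerate(plan_rows):
        if index == 0:
            value = row["rows"] * row["filtered"]
        elif row["join"] == "inner":
            value = result[-1] * (row["rows"] * row["filtered"])
        elif row["join"] == "left":
            value = result[-1]
        elif row["join"] == "right":
            value = row["table_rows"]
        else:
            raise ValueError(f"unsupported join type: {row['join']!r}")
        result.append(value)
    return result


# Hypothetical four-row plan for a single ID.
plan = [
    {"rows": 54952, "filtered": 1.0},
    {"rows": 4, "filtered": 1.0, "join": "inner"},
    {"rows": 1, "filtered": 1.0, "join": "left"},
    {"rows": 1, "filtered": 1.0, "join": "right", "table_rows": 1000},
]
print(returned_rows(plan))
```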
Suppose that the join method used by SQL statement 1 is INLJ. The number of scanned rows of the k-th row corresponding to ID1 in SQL statement 1 is then examine_rows_k=return_rows_(k-1)*(1+table_rows_k/index_cdl), where index_cdl denotes the index distinctness (cardinality). If a lookup back to the table is required, that is, the value of key is not PRIMARY, Extra does not contain Using index, and Extra contains Using where, the number of scanned rows of the k-th row corresponding to ID1 in SQL statement 1 is examine_rows_k=return_rows_(k-1)*(1+table_rows_k/index_cdl)*2.
The number of join comparisons for the k-th row corresponding to ID1 in SQL statement 1 is join_compare_k=table_rows_k*index_height, where index_height denotes the index height. It should be noted that index_cdl and index_height can be obtained from the mysql.innodb_index_stats table corresponding to ID1 in SQL statement 1.
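The INLJ scan estimate and the join comparison count can be sketched as below. All function names are hypothetical, and the table-lookup test is a literal reading of the three conditions in the text, using a simple substring check on the Extra field (so a value such as "Using index condition" is also treated as containing "Using index", which may or may not be what is intended).

```python
def requires_table_lookup(key, extra):
    """Literal reading of the text: key is not PRIMARY, Extra lacks
    'Using index', and Extra contains 'Using where'."""
    extra = (extra or "").lower()
    return key != "PRIMARY" and "using index" not in extra and "using where" in extra


def examine_rows_inlj(prev_return_rows, table_rows, index_cdl, table_lookup=False):
    """Scanned rows for row k under INLJ:
    return_rows_(k-1) * (1 + table_rows_k / index_cdl), doubled on table lookup."""
    scanned = prev_return_rows * (1 + table_rows / index_cdl)
    return scanned * 2 if table_lookup else scanned


def join_compare(table_rows, index_height):
    """Join comparisons for row k: table_rows_k * index_height."""
    return table_rows * index_height


# Figures taken from the worked example for Table 3 in this description.
scanned = examine_rows_inlj(54952, table_rows=413141, index_cdl=137713)
print(int(scanned))
```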
Step d15: Obtain the average row length corresponding to the second ID.
In this embodiment of the present application, the average row length AVG_ROW_LENGTH corresponding to the second ID can be obtained from information_schema.tables for each SQL statement.
Step d16: Determine the difference between the number of scanned rows of each row of the second ID and the number of returned rows of the adjacent previous row, to obtain a first difference.
In this embodiment of the present application, the first difference can be computed as: first difference=examine_rows_k-return_rows_(k-1).
Step d17: Divide the product of the first difference and the average row length by the preset InnoDB data page size and round the result up, to obtain the IO consumption sub-cost corresponding to each row of the second ID.
In this embodiment of the present application, the IO consumption sub-cost of the i-th row corresponding to ID1 in SQL statement 1 is io_cost_i = ⌈(examine_rows_i - return_rows_(i-1)) * AVG_ROW_LENGTH_i / innodb_page_size⌉, where i=1,2,...,k, innodb_page_size denotes the preset InnoDB data page size, and the symbol ⌈ ⌉ denotes rounding up.
Step d18: Sum the IO consumption sub-costs corresponding to all rows of the second ID, to obtain the IO consumption cost of the second ID.
In this embodiment of the present application, the IO consumption cost of the second ID of SQL statement 1 is io_cost = Σ io_cost_i (i=1,2,...,k), where the symbol Σ denotes summation.
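Steps d16 to d18 can be sketched together as follows. This is an illustration under stated assumptions: the function name `multi_row_io_cost` is hypothetical, the first row is given a previous returned-row count of 0 (as the worked example in this description does with its "54952-0" term), and the per-row average lengths are assigned to rows in an assumed order.

```python
def multi_row_io_cost(examine_rows, return_rows, avg_row_lengths, page_size=16 * 1024):
    """IO cost of an ID with several plan rows.

    For each row i: ceil((examine_rows_i - return_rows_(i-1)) * avg_len_i / page_size),
    then the per-row sub-costs are summed (step d18).
    """
    total = 0
    prev_returned = 0  # assumption: no previous row before the first one
    for examined, returned, avg_len in zip(examine_rows, return_rows, avg_row_lengths):
        first_difference = examined - prev_returned            # step d16
        pages = -(-(first_difference * avg_len) // page_size)  # step d17, integer ceiling
        total += pages
        prev_returned = returned
    return total


# Row counts from the worked example; the length-to-row assignment is assumed.
examine = [54952, 219808, 439616, 439616]
returned = [54952, 219808, 219808, 219808]
lengths = [465, 295, 350, 140]
print(multi_row_io_cost(examine, returned, lengths))
```

With the 16 KB default page size the result differs slightly from the worked example's figure, because that example divides by 16000 and does not round each term up.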
Step d19: Sum the IO consumption costs of all IDs included in the execution plan corresponding to each SQL statement, to obtain the consumption cost corresponding to the at least one SQL statement.
The IO consumption costs being summed include the IO consumption cost of the first ID and/or the IO consumption cost of the second ID.
In this embodiment of the present application, suppose SQL statement 1 includes 5 IDs, of which three, say ID1, ID2 and ID3, each have 1 row, and the remaining two, say ID4 and ID5, each have at least 2 rows. The IO consumption costs of ID1, ID2 and ID3 can then be determined through steps d12 to d13, and the IO consumption costs of ID4 and ID5 through steps d14 to d18. The IO consumption costs of ID1, ID2, ID3, ID4 and ID5 are then summed to obtain the consumption cost of SQL statement 1. The consumption costs of the other SQL statements among the at least one SQL statement can be obtained in the same way.
For example, suppose the plan rows with an ID of 1 for a certain SQL statement are as shown in Table 3. Processing the rows with id 1 in Table 3 from top to bottom gives the following. For the first row, the number of scanned rows is examine_rows_1=rows=54952 and the number of returned rows is return_rows_1=54952*100%=54952. For the second row, the number of scanned rows is examine_rows_2=return_rows_1*(1+table_rows_2/index_cdl)=54952*(1+413141/137713)=219808 and the number of returned rows is return_rows_2=219808*100%=219808. For the third row, the number of scanned rows is examine_rows_3=219808*(1+1)=439616 and the number of returned rows is return_rows_3=219808*1*100%=219808. For the fourth row, the number of scanned rows is examine_rows_4=219808*(1+1)=439616.
Table 3
Figure PCTCN2021129296-appb-000008
The IO consumption cost for ID 1 of this SQL statement is io_cost=(439616-219808)*140/16000+(439616-219808)*350/16000+(219808-54952)*295/16000+(54952-0)*465/16000, which is approximately 11300.
The CPU consumption cost for ID 1 of this SQL statement is cpu_cost=(54952+54952*log54952)+(54952*3)+(219808*4+219808*3)+(219808*4+219808*3), which is approximately 3500000.
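The arithmetic of the two expressions above can be checked with a short script. The base of the logarithm is not stated in the description; the sketch evaluates both base 10 and the natural logarithm as assumptions. Evaluated this way, io_cost comes to about 11368, and cpu_cost comes to roughly 3.56 million with base 10 and roughly 3.9 million with the natural logarithm, so base 10 is the reading closer to the stated approximate value.

```python
import math

PAGE = 16000  # divisor used by the worked example (not the 16384-byte default)

# (first difference) * (average row length) for each of the four plan rows
io_terms = [
    (439616 - 219808) * 140,
    (439616 - 219808) * 350,
    (219808 - 54952) * 295,
    (54952 - 0) * 465,
]
io_cost = sum(term / PAGE for term in io_terms)


def cpu_cost(log):
    """Evaluate the CPU expression with a caller-supplied logarithm function."""
    return ((54952 + 54952 * log(54952)) + (54952 * 3)
            + (219808 * 4 + 219808 * 3) + (219808 * 4 + 219808 * 3))


cpu_cost_log10 = cpu_cost(math.log10)  # assumption: base-10 logarithm
cpu_cost_ln = cpu_cost(math.log)       # assumption: natural logarithm

print(round(io_cost), round(cpu_cost_log10), round(cpu_cost_ln))
```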
In this way, the at least one SQL statement is sorted according to the IO consumption cost and/or the CPU consumption cost of each statement at the current moment, to obtain a sorting result. Further, in some embodiments, root cause ranking may also take into account the number of executions of each SQL statement: the statements may be sorted by execution count from high to low, and statements with the same execution count may be sorted by IO consumption cost or CPU consumption cost.
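A minimal sketch of this ranking is shown below. The record layout and the function name `rank_statements` are hypothetical, and the tie-break direction (higher cost first) is an assumption, since the description does not state it.

```python
def rank_statements(statements, cost_key="io_cost"):
    """Sort by execution count, high to low; break ties by the chosen cost,
    also high to low (assumed direction)."""
    return sorted(statements, key=lambda s: (-s["executions"], -s[cost_key]))


# Hypothetical statements with precomputed costs.
candidates = [
    {"sql": "A", "executions": 5, "io_cost": 10, "cpu_cost": 1},
    {"sql": "B", "executions": 9, "io_cost": 1, "cpu_cost": 1},
    {"sql": "C", "executions": 5, "io_cost": 30, "cpu_cost": 0},
]
print([s["sql"] for s in rank_statements(candidates)])
```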
Step e11: Based on the execution plan corresponding to the at least one SQL statement, determine the number of rows of each identifier (ID) included in the execution plan corresponding to each SQL statement.
Step e12: Determine an ID whose number of rows is 1 as a first ID.
Step e13: Based on the execution plan of the first ID, determine the CPU consumption cost corresponding to the first ID.
Step e14: Determine an ID whose number of rows is greater than 1 as a second ID.
Step e15: Based on the execution plan of each row of the second ID, determine the CPU consumption cost corresponding to each row of the second ID.
In this embodiment of the present application, suppose the second ID includes 3 rows. The CPU consumption cost corresponding to each row of the second ID is determined from the parameters included in the execution plan of that row, giving the CPU consumption costs corresponding to the first, second and third rows of the second ID.
Step e16: Sum the CPU consumption costs corresponding to all rows of the second ID, to obtain the CPU consumption cost corresponding to the second ID.
In other embodiments of the present application, the CPU consumption costs of the 3 rows included in the second ID are summed, that is, the sum of the CPU consumption costs corresponding to the first, second and third rows of the second ID is computed, to obtain the CPU consumption cost corresponding to the second ID.
Based on the foregoing embodiments, in other embodiments of the present application, step e13 may be implemented by steps e131 to e135:
Step e131: Based on the execution plan of the first ID, determine the first row count and the number of returned rows corresponding to the first ID.
Step e132: From the execution plan of the first ID, obtain the access type (Type) and the extra field information (Extra) corresponding to the first ID.
In this embodiment of the present application, the case in which ID 2 of SQL statement 1 has one row is taken as an example; suppose the execution plan for the row of SQL statement 1 whose ID is 2 is as shown in Table 4.
Table 4
Figure PCTCN2021129296-appb-000009
In this way, the access type Type of the row of SQL statement 1 whose ID is 2 can be obtained from the Select_Type column of Table 4, and the Extra of that row can be obtained from the Extra column of Table 4.
Step e133: Based on the Type, the Extra and the first row count corresponding to the first ID, obtain a first CPU consumption sub-cost corresponding to the first ID.
In this embodiment of the present application, the first CPU consumption sub-cost corresponding to the first ID is determined from its Type, Extra and first row count, according to a preset relationship between the first row count and the consumption sub-cost for each combination of Type and Extra.
For example, the preset relationship f_filter(rows) between the first row count rows and the consumption sub-cost for each Type and Extra may be as shown in Table 5. In Table 5, primary_key_height denotes the height of the primary key index, which is generally 3 or 4.
Table 5
Figure PCTCN2021129296-appb-000010
Step e134: Determine the product of the number of returned rows and its logarithm, to obtain a second CPU consumption sub-cost corresponding to the first ID.
In this embodiment of the present application, the second CPU consumption sub-cost corresponding to the first ID may be written as f_sort(return_rows)=return_rows*log return_rows, where log is the logarithm function.
Step e135: Determine the sum of the first CPU consumption sub-cost and the second CPU consumption sub-cost corresponding to the first ID, to obtain the CPU consumption cost corresponding to the first ID.
In this embodiment of the present application, the CPU consumption cost corresponding to the first ID may be written as cpu_cost=f_filter(rows)+f_sort(return_rows).
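Steps e133 to e135 can be sketched as follows. Because Table 5 is only available as an image, `f_filter` is passed in by the caller; the identity function used in the usage line is a placeholder, not the relationship defined in Table 5. The base-10 logarithm is likewise an assumption, since the description does not give the base.

```python
import math


def f_sort(return_rows, log=math.log10):
    """Second CPU sub-cost: return_rows * log(return_rows).

    The logarithm base is an assumption; zero returned rows cost nothing.
    """
    if return_rows <= 0:
        return 0.0
    return return_rows * log(return_rows)


def single_row_cpu_cost(rows, return_rows, f_filter):
    """cpu_cost = f_filter(rows) + f_sort(return_rows) for an ID with one row.

    f_filter is the caller-supplied relationship selected by Type and Extra.
    """
    return f_filter(rows) + f_sort(return_rows)


# Placeholder f_filter: one cost unit per examined row (hypothetical).
print(single_row_cpu_cost(100, 100, f_filter=lambda rows: rows))
```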
Correspondingly, step e15 may be implemented by steps e151 to e156:
Step e151: Based on the execution plan of each row of the second ID, determine the second row count and the number of returned rows of each row of the second ID.
Step e152: From the execution plan corresponding to each row of the second ID, obtain the access type (Type) and the extra field information (Extra) corresponding to each row of the second ID, and the table row count corresponding to each row of the second ID.
Step e153: Based on the Type, the Extra and the second row count corresponding to each row of the second ID, obtain a third CPU consumption sub-cost corresponding to each row of the second ID.
In this embodiment of the present application, for the specific implementation of step e153, reference may be made to that of step e133; details are not repeated here.
Step e154: Determine the product of the number of returned rows of each row of the second ID and its logarithm, to obtain a fourth CPU consumption sub-cost corresponding to each row of the second ID.
In this embodiment of the present application, for the specific implementation of step e154, reference may be made to that of step e134; details are not repeated here.
Step e155: Determine the product of the table row count and the index height corresponding to each row of the second ID, to obtain a fifth CPU consumption sub-cost corresponding to each row of the second ID.
In this embodiment of the present application, the fifth CPU consumption sub-cost corresponding to each row of the second ID may be written as join_compare=table_rows*index_height, where index_height denotes the index height and table_rows denotes the table row count corresponding to each row of the second ID.
Step e156: Determine the sum of the third CPU consumption sub-cost, the fourth CPU consumption sub-cost and the fifth CPU consumption sub-cost corresponding to each row of the second ID, to obtain the CPU consumption cost of each row of the second ID.
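Steps e153 to e156, followed by the accumulation of step e16, can be sketched as below. As before, `f_filter` stands in for the Table 5 relationship, which is not reproduced here, and the base-10 logarithm and the record layout are assumptions made for illustration.

```python
import math


def row_cpu_cost(rows, return_rows, table_rows, index_height, f_filter):
    """CPU cost of one plan row of a multi-row ID (steps e153 to e156)."""
    third = f_filter(rows)                                                    # e153
    fourth = return_rows * math.log10(return_rows) if return_rows > 0 else 0.0  # e154
    fifth = table_rows * index_height                                         # e155: join_compare
    return third + fourth + fifth                                             # e156


def second_id_cpu_cost(plan_rows, f_filter):
    """Step e16: sum the per-row CPU costs of the ID."""
    return sum(
        row_cpu_cost(r["rows"], r["return_rows"], r["table_rows"], r["index_height"], f_filter)
        for r in plan_rows
    )


# Hypothetical two-row ID with a placeholder f_filter.
rows_of_id = [
    {"rows": 10, "return_rows": 10, "table_rows": 100, "index_height": 3},
    {"rows": 10, "return_rows": 10, "table_rows": 100, "index_height": 3},
]
print(second_id_cpu_cost(rows_of_id, f_filter=lambda rows: rows))
```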
In this embodiment of the present application, the CPU consumption cost and/or the IO consumption cost is analyzed under different alarm conditions, and the analysis result is output when an alarm is raised. This implements root cause recommendation, effectively reduces the difficulty of the analyst's work, and improves the efficiency with which analysts identify the fault behind an alarm.
Based on the foregoing embodiments, other embodiments of the present application provide another method of root cause recommendation for the case in which the alarm prompt information is the first alarm information or the second alarm information. Referring to FIG. 6, after performing step 208 and steps 209a to 209c, or steps 201 to 208, step 209a and step 209d, or steps 201 to 208, step 209a and step 209e, the information alarm device further performs steps 216 to 223:
Step 216: If the alarm prompt information is the first alarm information or the second alarm information, obtain at least one SQL statement being executed by the target monitoring object at the current moment.
In this embodiment of the present application, the SQL statements being executed at the current moment can be obtained from a table of the database that is currently under high load, for example the table named information_schema.processlist.
Step 217: Group the at least one SQL statement so that identical statements fall into the same group, to obtain at least one group of SQL statements.
In this embodiment of the present application, grouping identical statements specifically means placing SQL statements that have the same form but different parameter values into one group. In some application scenarios, after grouping, an SQL fingerprint may also be generated from the characteristics of each group of SQL statements to serve as the label of that group.
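One possible way to group statements that share a form but differ in parameter values is sketched below. The normalization rules (replacing quoted strings and numeric literals with a placeholder) and the hash-based fingerprint are assumptions chosen for illustration; the description does not specify how the fingerprint is generated.

```python
import hashlib
import re
from collections import defaultdict


def normalize(sql):
    """Reduce a statement to its form by replacing literal values with '?'."""
    text = sql.strip().rstrip(";")
    text = re.sub(r"'(?:[^'\\]|\\.)*'", "?", text)    # quoted string literals
    text = re.sub(r"\b\d+(?:\.\d+)?\b", "?", text)    # standalone numeric literals
    text = re.sub(r"\s+", " ", text)                  # collapse whitespace
    return text.lower()


def fingerprint(sql):
    """Short hash of the normalized form, used as the group label."""
    return hashlib.sha1(normalize(sql).encode("utf-8")).hexdigest()[:16]


def group_by_fingerprint(statements):
    """Map each fingerprint to the list of statements sharing that form."""
    groups = defaultdict(list)
    for sql in statements:
        groups[fingerprint(sql)].append(sql)
    return dict(groups)


running = [
    "SELECT * FROM t1 WHERE id = 5",
    "select * from t1 where id = 42;",
    "SELECT * FROM t1 WHERE name = 'bob'",
]
for label, members in group_by_fingerprint(running).items():
    print(label, len(members))
```

Digits that are part of an identifier, such as the 1 in t1, are left untouched because the numeric pattern only matches standalone numbers.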
Step 218: Based on the at least one group of SQL statements, count the number of SQL statements included in each group of SQL statements.
In this embodiment of the present application, the number of SQL statements in each of the at least one group of SQL statements is counted, to obtain the number of SQL statements included in each group.
Step 219: Take one SQL statement from each group of the at least one group of SQL statements, to obtain at least one target SQL statement.
In this embodiment of the present application, suppose that grouping the at least one SQL statement yields 4 groups of SQL statements; one SQL statement is then randomly selected from each group, giving 4 target SQL statements.
Step 220: Determine a target standby database, and run each target SQL statement on the target standby database.
In this embodiment of the present application, the target standby database is a standby database with a low load. Profiling can be enabled on the standby database with the statement set profiling=1.
Step 221: Determine the resources consumed when the target standby database runs each target SQL statement, to obtain the resource consumption sub-cost of each target SQL statement.
In this embodiment of the present application, whenever the target standby database runs a target SQL statement, it records the resource consumption. The resource consumption sub-cost of each target SQL statement can therefore be determined from the resource consumption recorded while the target standby database ran that statement.
Step 222: Determine the product of the resource consumption sub-cost of each target SQL statement and the number of SQL statements in the group to which that target SQL statement belongs, to obtain the resource consumption cost of each group of SQL statements.
Step 223: Sort the at least one group of SQL statements in descending order of resource consumption cost, obtain a sorting result, and display the sorting result.
It should be noted that steps 216 to 223 may be performed before step 210; alternatively, step 223 may be performed at the same time as step 210, or after step 210.
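Steps 218 to 223 can be sketched as a single ranking routine. The callable `measure_sub_cost` is a hypothetical stand-in for running the sampled statement on the target standby database and reading its recorded resource consumption; no database access is shown here.

```python
import random


def rank_groups(groups, measure_sub_cost, rng=random):
    """Rank groups of SQL statements by estimated resource consumption cost.

    groups            dict mapping a group label to its list of statements
    measure_sub_cost  callable returning the sub-cost of one statement
                      (stand-in for a run on the standby database)
    """
    ranked = []
    for label, statements in groups.items():
        count = len(statements)                  # step 218
        target = rng.choice(statements)          # step 219: one random sample per group
        sub_cost = measure_sub_cost(target)      # steps 220 and 221
        ranked.append((label, sub_cost * count)) # step 222
    ranked.sort(key=lambda item: item[1], reverse=True)  # step 223
    return ranked


# Hypothetical groups and a fake measurement keyed on the statement verb.
sample_groups = {
    "a": ["select 1", "select 2", "select 3"],
    "b": ["update t set x=1"],
}
fake_costs = {"select": 2.0, "update": 5.0}
print(rank_groups(sample_groups, lambda sql: fake_costs[sql.split()[0]]))
```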
Based on the foregoing embodiments, in other embodiments of the present application, step 221 may be implemented by steps f11 to f13:
Step f11: If the time the target standby database takes to run a target SQL statement is greater than or equal to a preset duration, determine the resource consumption sub-cost of that target SQL statement to be a first value.
In this embodiment of the present application, the preset duration is an empirical value, usually obtained from extensive experiments or practical experience; in some application scenarios, the user may also set it. The first value is also an empirical value and is usually used to represent the highest resource consumption cost.
Step f12: If the time the target standby database takes to run a target SQL statement is less than the preset duration, obtain the resource consumption table corresponding to the target standby database running that target SQL statement.
In this embodiment of the present application, the resource consumption table corresponding to the target standby database running each target SQL statement can be obtained with the following statements:
SHOW PROFILES; -- list the Query_ID of each executed SQL statement
SHOW PROFILE CPU, BLOCK IO FOR QUERY Query_ID; -- show the resource consumption table for the statement numbered Query_ID
The resource consumption table obtained for each target SQL statement with the above statements may, for example, be as shown in Table 6. In Table 6, Status denotes the state, Duration denotes the time spent in that state, usually in seconds, CPU_user denotes the CPU resources consumed in user mode, CPU_system denotes the CPU resources consumed in kernel mode, Block_ops_in denotes the number of input block operations, and Block_ops_out denotes the number of output block operations.
Step f13: Obtain a first consumed resource and a second consumed resource from the resource consumption table corresponding to each target SQL statement, and determine the sum of the first consumed resource and the second consumed resource, to obtain the resource consumption sub-cost corresponding to each target SQL statement.
The first consumed resource is the CPU resources consumed in user mode and the second consumed resource is the CPU resources consumed in kernel mode; and/or the first consumed resource is the number of input block operations and the second consumed resource is the number of output block operations.
Table 6
Figure PCTCN2021129296-appb-000011
In this embodiment of the present application, for example, when there are 4 target SQL statements, 4 resource consumption tables of the form shown in Table 6 are obtained, one for each target SQL statement. When the target monitoring indicator is CPU, for each target SQL statement the values in the CPU_user column (CPU resource consumed in user mode) and the CPU_system column (CPU resource consumed in kernel mode) of its resource consumption table are summed to obtain the CPU resource consumption sub-cost of that statement.
When the target monitoring indicator is IO, for each target SQL statement the values in the Block_ops_in column (number of input block operations) and the Block_ops_out column (number of output block operations) of its resource consumption table are summed to obtain the IO resource consumption sub-cost of that statement.
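Steps f12 and f13, together with the rule that ties the first value to the preset duration, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the function name `resource_sub_cost`, the `MAX_COST` stand-in for the first value, and the representation of a profile table as a list of dictionaries keyed by the Table 6 column names are all hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical stand-in for the "first value" (the maximum resource consumption cost).
MAX_COST = float("inf")

# Columns of the resource consumption table that are summed for each indicator.
COLUMNS = {
    "cpu": ("CPU_user", "CPU_system"),
    "io": ("Block_ops_in", "Block_ops_out"),
}


def resource_sub_cost(profile_rows: List[Dict[str, Any]], indicator: str,
                      elapsed_s: float, preset_duration_s: float,
                      max_cost: float = MAX_COST) -> float:
    """Return the resource consumption sub-cost of one target SQL statement."""
    # A statement that runs for at least the preset duration gets the first value.
    if elapsed_s >= preset_duration_s:
        return max_cost
    first_col, second_col = COLUMNS[indicator]
    # Sum the two columns over every status row of the profile table.
    return sum(float(row[first_col]) + float(row[second_col]) for row in profile_rows)


# Illustrative profile rows; the numbers are made up for the example.
rows = [
    {"Status": "starting", "Duration": 0.0001, "CPU_user": 0.5, "CPU_system": 0.25,
     "Block_ops_in": 0, "Block_ops_out": 8},
    {"Status": "executing", "Duration": 0.0020, "CPU_user": 1.0, "CPU_system": 0.25,
     "Block_ops_in": 16, "Block_ops_out": 0},
]
cpu_cost = resource_sub_cost(rows, "cpu", elapsed_s=0.5, preset_duration_s=10)
io_cost = resource_sub_cost(rows, "io", elapsed_s=0.5, preset_duration_s=10)
```

Keeping the column pairs in one lookup table makes it easy to compute both sub-costs for the same statement, which the text allows in some application scenarios.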
In this embodiment of the present application, the CPU consumption cost and/or the IO consumption cost is analyzed under the different alarm conditions, and the analysis result is output together with the alarm to provide a root-cause recommendation. This reduces the difficulty of the analyst's work and improves the efficiency with which the analyst identifies the fault behind an alarm.
It should be noted that, in some application scenarios, when the target monitoring indicator is CPU, the IO resource consumption sub-cost of each target SQL statement may be calculated in addition to its CPU resource consumption sub-cost. Similarly, when the target monitoring indicator is IO, the CPU resource consumption sub-cost of each target SQL statement may be calculated in addition to its IO resource consumption sub-cost.
It should be noted that, for steps and content in this embodiment that are the same as in other embodiments, reference may be made to the descriptions in those embodiments; details are not repeated here.
In this embodiment of the present application, the information alarm device collects the target monitoring indicator of the target monitoring object at the first sampling interval to obtain the first monitoring value at the current moment, obtains the first historical monitoring value collected for the target monitoring indicator at historical moments, determines the target threshold based on the first monitoring value and the first historical monitoring value, determines the alarm prompt information based on the first monitoring value and the target threshold, and finally displays the alarm prompt information. In this way, the information alarm device dynamically determines the target threshold in real time from the first monitoring value at the current moment and the first historical monitoring value at historical moments, and generates the alarm prompt information from the relationship between the target threshold and the first monitoring value. This solves the problem that monitoring a database with a fixed threshold cannot accurately meet the alarm requirements of the database in every time period, provides a scheme that adapts the threshold to actual conditions, and improves the accuracy of database alarms. In addition, alarm prompts of different levels inform the user of the current risk level of the database, so that the user can decide, based on the risk level, whether immediate handling is required. The possible causes of the risk are also displayed according to the risk level, which makes analysis and handling easier for the user, improves the human-computer interaction, and improves the user experience.
Based on the foregoing embodiments, an embodiment of the present application provides an information alarm device. Referring to FIG. 7, the information alarm device 5 may include a processor 51, a memory 52, and a communication bus 53, where:
the memory 52 is configured to store executable instructions;
the communication bus 53 is configured to implement a communication connection between the processor 51 and the memory 52;
the processor 51 is configured to execute the information alarm program stored in the memory 52 to implement the following steps:
collecting the target monitoring indicator of the target monitoring object at the first sampling interval to obtain the first monitoring value at the current moment;
obtaining the first historical monitoring value collected for the target monitoring indicator at historical moments;
determining the target threshold based on the first monitoring value and the first historical monitoring value, where the target threshold includes at least one different threshold;
determining the alarm prompt information based on the first monitoring value and the target threshold, where the alarm prompt information is used to raise an alarm prompt for the target monitoring indicator of the target monitoring object;
displaying the alarm prompt information.
In this embodiment of the present application, when the processor performs the step of determining the target threshold based on the first monitoring value and the first historical monitoring value, this may be implemented through the following steps:
obtaining a second monitoring value of a reference monitoring indicator of the target monitoring object at the current moment and a second historical monitoring value of the reference monitoring indicator collected at historical moments, where the reference monitoring indicator is associated with the target monitoring indicator;
determining a target weight coefficient based on the first monitoring value, the first historical monitoring value, the second monitoring value, and the second historical monitoring value;
determining a first reference value based on the first historical monitoring value;
obtaining, from the first historical monitoring value, a first historical sub-monitoring value at the moment immediately preceding the current moment;
determining the difference between the first monitoring value and the first historical sub-monitoring value to obtain a second reference value;
obtaining the target threshold based on the target weight coefficient, the second reference value, the first monitoring value, at least one preset weight coefficient, and the first reference value.
In this embodiment of the present application, when the processor performs the step of determining the target weight coefficient based on the first monitoring value, the first historical monitoring value, the second monitoring value, and the second historical monitoring value, this may be implemented through the following steps:
processing the first monitoring value and the first historical monitoring value by taking the difference between the monitoring value at a second moment and the monitoring value at a first moment, to obtain second differences of the target monitoring indicator at different moments, where the first moment and the second moment are two adjacent moments and the first moment is farther from the current moment than the second moment;
processing the second monitoring value and the second historical monitoring value in the same way, by taking the difference between the monitoring value at the second moment and the monitoring value at the first moment, to obtain third differences of the reference monitoring indicator at different moments;
determining the ratio of the third difference to the second difference at the same moment to obtain reference ratios;
determining the target weight coefficient based on the reference ratios.
In this embodiment of the present application, when the processor performs the step of determining the target weight coefficient based on the reference ratios, this may be implemented through the following steps:
obtaining, from the reference ratios, a second preset number of target ratios that are greater than zero and closest to the current moment;
determining the average of the second preset number of target ratios to obtain the target weight coefficient.
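The derivation of the target weight coefficient from the two indicator series can be sketched as follows. This is an illustrative Python sketch, not the applicant's code: the function name `target_weight_coefficient` and the list-based inputs (oldest sample first, current sample last) are assumptions, and skipping moments where the target indicator did not change (to avoid dividing by zero) is an assumption the text does not address.

```python
from typing import List, Sequence


def target_weight_coefficient(target_values: Sequence[float],
                              reference_values: Sequence[float],
                              second_preset_number: int) -> float:
    """Average the most recent positive ratios of reference change to target change."""
    if len(target_values) != len(reference_values):
        raise ValueError("both series must cover the same moments")
    if second_preset_number <= 0:
        raise ValueError("second_preset_number must be positive")

    ratios: List[float] = []
    for i in range(1, len(target_values)):
        second_diff = target_values[i] - target_values[i - 1]       # target indicator
        third_diff = reference_values[i] - reference_values[i - 1]  # reference indicator
        if second_diff == 0:
            continue  # assumption: skip moments where the ratio is undefined
        ratios.append(third_diff / second_diff)

    # Keep only ratios greater than zero, then take those closest to the current moment.
    positive = [r for r in ratios if r > 0]
    recent = positive[-second_preset_number:]
    if not recent:
        raise ValueError("no positive reference ratio is available")
    return sum(recent) / len(recent)


# Illustrative series; the last element of each list is the current moment.
weight = target_weight_coefficient([10, 12, 11, 15, 17], [100, 104, 106, 110, 118], 2)
```

In the example, the differences give ratios 2, -2, 1 and 4; the negative ratio is discarded and the two most recent positive ratios are averaged.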
In this embodiment of the present application, when the processor performs the step of determining the first reference value based on the first historical monitoring value, this may be implemented through the following steps:
obtaining, from the first historical monitoring value, a first preset number of second historical sub-monitoring values taken at the same moment as the current moment in each of the first preset number of time periods preceding the current moment;
determining the standard deviation of the first preset number of second historical sub-monitoring values to obtain the first reference value.
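A minimal Python sketch of the first reference value follows. Several details are assumptions: the function name `first_reference_value`, a history list sampled at a fixed interval with `samples_per_period` samples in each time period, and the use of the population standard deviation (`statistics.pstdev`), since the text does not say whether the population or sample form is intended.

```python
import statistics
from typing import Sequence


def first_reference_value(history: Sequence[float], samples_per_period: int,
                          first_preset_number: int) -> float:
    """Standard deviation of the values seen at this moment in earlier periods.

    `history` is ordered oldest first and ends with the sample taken just
    before the current moment (the current sample itself is not included).
    """
    needed = samples_per_period * first_preset_number
    if len(history) < needed:
        raise ValueError("not enough history for the requested number of periods")
    # The sample k periods before the current moment sits k * samples_per_period
    # positions back from the (not yet appended) current sample.
    same_moment = [history[-k * samples_per_period]
                   for k in range(1, first_preset_number + 1)]
    return statistics.pstdev(same_moment)


# Two periods of three samples each; the same-moment values are 4 and 1.
sigma = first_reference_value([1, 2, 3, 4, 5, 6], samples_per_period=3,
                              first_preset_number=2)
```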
In this embodiment of the present application, when the processor performs the step of obtaining the target threshold based on the target weight coefficient, the second reference value, the first monitoring value, the at least one preset weight coefficient, and the first reference value, this may be implemented through the following steps:
determining a first product of the target weight coefficient and the second reference value;
determining the product of each of the at least one preset weight coefficient and the first reference value to obtain at least one second product;
determining the sum of the first product, each second product of the at least one second product, and the first monitoring value to obtain the target threshold.
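The threshold formula in the steps above can be written out as a short Python sketch. It reads the accumulation as producing one threshold per preset weight coefficient, which is an interpretation consistent with the target threshold including a first threshold and a larger second threshold; the function and parameter names are hypothetical.

```python
from typing import List, Sequence


def target_thresholds(first_value: float, previous_value: float,
                      target_weight: float, first_reference: float,
                      preset_weights: Sequence[float]) -> List[float]:
    """Return one threshold per preset weight coefficient."""
    # Second reference value: change since the immediately preceding moment.
    second_reference = first_value - previous_value
    first_product = target_weight * second_reference
    # threshold_i = first product + (preset weight_i * first reference) + first value
    return [first_product + k * first_reference + first_value for k in preset_weights]


# Illustrative numbers: current value 50, previous value 48, weight 2.0,
# standard deviation 1.5, preset weight coefficients 1 and 3.
first_threshold, second_threshold = target_thresholds(50, 48, 2.0, 1.5, [1, 3])
```

With a larger preset weight coefficient the threshold moves further from the current value, which is how the sketch obtains a second threshold above the first.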
In this embodiment of the present application, when the processor performs the step of determining the alarm prompt information based on the first monitoring value and the target threshold, this may be implemented through the following steps:
obtaining a target upper limit value of the target monitoring indicator;
if the first monitoring value is less than the target upper limit value and greater than or equal to a first threshold, continuously collecting a third preset number of third monitoring values of the target monitoring indicator at a second sampling interval, where the second sampling interval is shorter than the first sampling interval and the target threshold includes the first threshold;
if all of the third preset number of third monitoring values are less than the target upper limit value and at least one third monitoring value is greater than or equal to a second threshold, generating first alarm information, where the target threshold includes the second threshold, the second threshold is greater than the first threshold, and the first alarm information is used to raise a major alarm for the target monitoring object.
In this embodiment of the present application, the processor is further configured to perform the following steps:
if the first monitoring value is greater than or equal to the target upper limit value, generating second alarm information, where the second alarm information is used to raise a critical alarm for the target monitoring object;
if at least one of the third preset number of third monitoring values is greater than or equal to the target upper limit value, generating the second alarm information;
if all of the third preset number of third monitoring values are less than the second threshold, generating third alarm information, where the third alarm information is used to raise a minor alarm for the target monitoring object.
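The alarm-level decision described above can be sketched as a small Python function. The names `Alarm`, `classify`, and `collect_third_values` are hypothetical; `collect_third_values` stands in for the faster resampling at the second sampling interval and is only called when the first monitoring value reaches the first threshold. The text does not describe the case where the first monitoring value is below the first threshold, so returning `None` there is an assumption.

```python
from enum import Enum
from typing import Callable, Optional, Sequence


class Alarm(Enum):
    MAJOR = "first alarm information"      # major alarm
    CRITICAL = "second alarm information"  # critical alarm
    MINOR = "third alarm information"      # minor alarm


def classify(first_value: float, upper_limit: float,
             first_threshold: float, second_threshold: float,
             collect_third_values: Callable[[], Sequence[float]]) -> Optional[Alarm]:
    """Map the current sample (and, if needed, a resampled burst) to an alarm level."""
    if first_value >= upper_limit:
        return Alarm.CRITICAL
    if first_value < first_threshold:
        return None  # assumption: no alarm is raised below the first threshold
    # Resample at the shorter second sampling interval.
    third_values = collect_third_values()
    if any(v >= upper_limit for v in third_values):
        return Alarm.CRITICAL
    if any(v >= second_threshold for v in third_values):
        return Alarm.MAJOR
    return Alarm.MINOR


# Illustrative call: upper limit 95, first threshold 60, second threshold 80.
level = classify(70, 95, 60, 80, lambda: [70, 85, 75])
```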
In this embodiment of the present application, after performing the step of determining the alarm prompt information based on the first monitoring value and the target threshold, the processor is further configured to perform the following steps:
if the alarm prompt information is the first alarm information or the second alarm information, obtaining at least one Structured Query Language (SQL) statement being executed by the target monitoring object at the current moment;
obtaining the execution plan corresponding to the at least one SQL statement;
determining, based on the execution plan corresponding to the at least one SQL statement, the consumption cost of each SQL statement for the target monitoring indicator, to obtain the consumption cost corresponding to the at least one SQL statement;
sorting the at least one SQL statement in descending order of consumption cost, based on the consumption cost corresponding to the at least one SQL statement, to obtain an SQL statement sorting result;
displaying the SQL statement sorting result.
In this embodiment of the present application, the consumption cost of each SQL statement for the target monitoring indicator includes an input/output (IO) consumption cost. When the processor performs the step of determining, based on the execution plan corresponding to the at least one SQL statement, the consumption cost of each SQL statement for the target monitoring indicator to obtain the consumption cost corresponding to the at least one SQL statement, this may be implemented through the following steps:
determining, based on the execution plan corresponding to the at least one SQL statement, the number of rows of each identifier (ID) included in the execution plan of each SQL statement;
determining an ID whose number of rows is 1 as a first ID, and obtaining the first row count and the average length corresponding to the first ID, where the first row count is recorded in the execution plan of each SQL statement;
rounding up the product of the first row count and the average length with respect to the preset InnoDB data page size, to obtain the IO consumption cost of the first ID;
determining an ID whose number of rows is greater than 1 as a second ID, and determining the returned row count and the scanned row count of each row of the second ID;
obtaining the average length corresponding to the second ID;
determining the difference between the scanned row count of each row of the second ID and the returned row count of the adjacent previous row, to obtain a first difference;
rounding up the product of the first difference and the average length with respect to the preset InnoDB data page size, to obtain the IO consumption sub-cost of each row of the second ID;
determining the sum of the IO consumption sub-costs of the rows of the second ID to obtain the IO consumption cost of the second ID;
determining the sum of the IO consumption costs of all IDs included in the execution plan of each SQL statement to obtain the consumption cost corresponding to the at least one SQL statement, where the sum of the IO consumption costs of all IDs includes the IO consumption cost of the first ID and/or the IO consumption cost of the second ID.
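The IO cost estimate above can be sketched in Python. This sketch reads the rounding-up step as the ceiling of (row count × average length) divided by the page size, that is, an estimated page count; this reading is an interpretation. Other assumptions: the names `io_cost_for_id` and `statement_io_cost`, the representation of each plan row as a `(scanned_rows, returned_rows)` pair, a default page size of 16384 bytes (the InnoDB default), and treating the first row of a multi-row ID as having no previous row (previous returned rows taken as 0).

```python
import math
from typing import Dict, List, Tuple

PlanRow = Tuple[int, int]  # (scanned_rows, returned_rows) for one execution plan row


def io_cost_for_id(rows: List[PlanRow], avg_row_length: float,
                   page_size: int = 16384) -> int:
    """Estimate the IO consumption cost of one execution plan ID in data pages."""
    if len(rows) == 1:
        # First ID: a single plan row, cost is the pages needed for its row count.
        scanned_rows, _ = rows[0]
        return math.ceil(scanned_rows * avg_row_length / page_size)
    # Second ID: one sub-cost per plan row, summed.
    cost = 0
    previous_returned = 0  # assumption: the first row has no previous row
    for scanned_rows, returned_rows in rows:
        first_difference = scanned_rows - previous_returned
        cost += math.ceil(first_difference * avg_row_length / page_size)
        previous_returned = returned_rows
    return cost


def statement_io_cost(plan: Dict[int, List[PlanRow]],
                      avg_row_length_by_id: Dict[int, float],
                      page_size: int = 16384) -> int:
    """Sum the IO consumption cost over every ID in a statement's execution plan."""
    return sum(io_cost_for_id(rows, avg_row_length_by_id[plan_id], page_size)
               for plan_id, rows in plan.items())


# Illustrative plan: ID 1 has one row, ID 2 has two rows; average length 100 bytes.
plan = {1: [(1000, 10)], 2: [(200, 50), (500, 20)]}
total_io_cost = statement_io_cost(plan, {1: 100, 2: 100})
```

In the example, ID 1 costs ceil(100000 / 16384) = 7 pages and ID 2 costs 2 + 3 = 5 pages.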
In other embodiments of the present application, the consumption cost of each SQL statement for the target monitoring indicator includes a central processing unit (CPU) consumption cost. When the processor performs the step of determining, based on the execution plan corresponding to the at least one SQL statement, the consumption cost of each SQL statement for the target monitoring indicator to obtain the consumption cost corresponding to the at least one SQL statement, this may be implemented through the following steps:
determining, based on the execution plan corresponding to the at least one SQL statement, the number of rows of each identifier (ID) included in the execution plan of each SQL statement;
determining an ID whose number of rows is 1 as a first ID;
determining the CPU consumption cost corresponding to the first ID based on the execution plan of the first ID;
determining an ID whose number of rows is greater than 1 as a second ID;
determining the CPU consumption cost of each row of the second ID based on the execution plan of each row of the second ID;
summing the CPU consumption costs of the rows of the second ID to obtain the CPU consumption cost corresponding to the second ID.
In other embodiments of the present application, when the processor performs the step of determining the CPU consumption cost corresponding to the first ID based on the execution plan of the first ID, it may perform the following steps:
determining, based on the execution plan of the first ID, the first row count and the returned row count corresponding to the first ID;
obtaining, from the execution plan of the first ID, the access type (Type) and the extra field information (Extra) corresponding to the first ID;
obtaining a first CPU consumption sub-cost corresponding to the first ID based on the Type, the Extra, and the first row count corresponding to the first ID;
determining the product of the logarithm of the returned row count and the returned row count to obtain a second CPU consumption sub-cost corresponding to the first ID;
determining the sum of the first CPU consumption sub-cost and the second CPU consumption sub-cost corresponding to the first ID to obtain the CPU consumption cost corresponding to the first ID.
In other embodiments of the present application, when the processor performs the step of determining the CPU consumption cost of each row of the second ID based on the execution plan of each row of the second ID, this may be implemented through the following steps:
determining, based on the execution plan of each row of the second ID, the second row count and the returned row count of each row of the second ID;
obtaining, from the execution plan of each row of the second ID, the access type (Type) and the extra field information (Extra) of that row, and the table row count of that row;
obtaining a third CPU consumption sub-cost of each row of the second ID based on the Type, the Extra, and the second row count of that row;
determining the product of the logarithm of the returned row count of each row of the second ID and that returned row count, to obtain a fourth CPU consumption sub-cost of each row of the second ID;
determining the product of the table row count of each row of the second ID and the index height of that row, to obtain a fifth CPU consumption sub-cost of each row of the second ID;
determining the sum of the third, fourth, and fifth CPU consumption sub-costs of each row of the second ID to obtain the CPU consumption cost of each row of the second ID.
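The way the CPU sub-costs combine can be sketched in Python. This passage does not spell out how the Type/Extra-based sub-cost is computed, so the sketch takes it as a precomputed input (`access_cost`); the logarithm base is not stated either, and base 2 is assumed here. The names `SecondIdRow`, `log_cost`, `cpu_cost_first_id`, and `cpu_cost_second_id` are hypothetical, and a returned row count of 0 is treated as contributing no logarithmic cost.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SecondIdRow:
    access_cost: float   # third sub-cost, derived elsewhere from Type, Extra and row count
    returned_rows: int   # rows returned by this plan row
    table_rows: int      # rows in the table accessed by this plan row
    index_height: int    # height of the index used by this plan row


def log_cost(returned_rows: int) -> float:
    """Returned rows multiplied by their logarithm (base 2 assumed)."""
    if returned_rows <= 0:
        return 0.0  # assumption: nothing returned means no logarithmic cost
    return returned_rows * math.log2(returned_rows)


def cpu_cost_first_id(access_cost: float, returned_rows: int) -> float:
    """First sub-cost plus second sub-cost for an ID with a single plan row."""
    return access_cost + log_cost(returned_rows)


def cpu_cost_second_id(rows: Iterable[SecondIdRow]) -> float:
    """Sum of the third, fourth and fifth sub-costs over all rows of the ID."""
    return sum(r.access_cost + log_cost(r.returned_rows) + r.table_rows * r.index_height
               for r in rows)


# Illustrative values only.
first_id_cost = cpu_cost_first_id(access_cost=10, returned_rows=8)
second_id_cost = cpu_cost_second_id([SecondIdRow(5, 4, 100, 3), SecondIdRow(2, 1, 50, 2)])
```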
In other embodiments of the present application, after performing the step of determining the alarm prompt information based on the first monitoring value and the target threshold, the processor is further configured to perform the following steps:
if the alarm prompt information is the first alarm information or the second alarm information, obtaining at least one SQL statement being executed by the target monitoring object at the current moment;
grouping the at least one SQL statement so that identical statements fall into the same group, to obtain at least one group of SQL statements;
counting, based on the at least one group of SQL statements, the number of SQL statements in each group;
taking one SQL statement from each group of the at least one group of SQL statements to obtain at least one target SQL statement;
determining a target standby database and running each target SQL statement separately on the target standby database;
determining the resource consumption sub-cost incurred when the target standby database runs each target SQL statement, to obtain the resource consumption sub-cost of each target SQL statement;
determining the product of the resource consumption sub-cost of each target SQL statement and the number of SQL statements in the group to which that target SQL statement belongs, to obtain the resource consumption cost of each group of SQL statements;
sorting the at least one group of SQL statements in descending order of the resource consumption cost of each group to obtain a sorting result, and displaying the sorting result.
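The grouping and ranking steps above can be sketched in Python. The function name `rank_statement_groups` and the `sub_cost_of` callable are hypothetical; `sub_cost_of` stands in for running one representative statement on the target standby database and measuring its resource consumption sub-cost, which this sketch does not attempt to do.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def rank_statement_groups(statements: Sequence[str],
                          sub_cost_of: Callable[[str], float]) -> List[Tuple[str, float]]:
    """Rank groups of identical SQL statements by total resource consumption cost."""
    # Group identical statements and count how many fall into each group.
    counts = Counter(statements)
    # Group cost = sub-cost of one representative statement * size of its group.
    group_costs = {sql: sub_cost_of(sql) * n for sql, n in counts.items()}
    # Highest cost first.
    return sorted(group_costs.items(), key=lambda item: item[1], reverse=True)


# Illustrative statements and made-up sub-costs.
running = ["SELECT a", "SELECT b", "SELECT a", "SELECT c", "SELECT a", "SELECT b"]
measured = {"SELECT a": 1.0, "SELECT b": 2.0, "SELECT c": 5.0}
ranking = rank_statement_groups(running, measured.__getitem__)
```

Running only one representative per group keeps the load on the standby database proportional to the number of distinct statements rather than to the total number of running statements.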
In this embodiment of the present application, when the processor performs the step of determining the resource consumption sub-cost incurred when the target standby database runs each target SQL statement to obtain the resource consumption sub-cost of each target SQL statement, this may be implemented through the following steps:
if the time taken by the target standby database to run each target SQL statement is greater than or equal to the preset duration, determining the resource consumption sub-cost of each target SQL statement to be the first value;
if the time taken by the target standby database to run each target SQL statement is less than the preset duration, obtaining the resource consumption table produced when the target standby database runs each target SQL statement;
obtaining a first consumed resource and a second consumed resource from the resource consumption table of each target SQL statement, and determining their sum to obtain the resource consumption sub-cost of each target SQL statement, where the first consumed resource is the CPU resource consumed in user mode and the second consumed resource is the CPU resource consumed in kernel mode, and/or the first consumed resource is the number of input block operations and the second consumed resource is the number of output block operations.
It should be noted that, for an explanation of the steps performed when the one or more programs in this embodiment of the present application are executed by the one or more processors, reference may be made to the method implementation process provided by the embodiments corresponding to FIGS. 1 to 2 and FIGS. 5 to 6; details are not repeated here.
In this embodiment of the present application, the information alarm device collects the target monitoring indicator of the target monitoring object at the first sampling interval to obtain the first monitoring value at the current moment, obtains the first historical monitoring value collected for the target monitoring indicator at historical moments, determines the target threshold based on the first monitoring value and the first historical monitoring value, determines the alarm prompt information based on the first monitoring value and the target threshold, and finally displays the alarm prompt information. In this way, the information alarm device dynamically determines the target threshold in real time from the first monitoring value at the current moment and the first historical monitoring value at historical moments, and generates the alarm prompt information from the relationship between the target threshold and the first monitoring value. This solves the problem that monitoring a database with a fixed threshold cannot accurately meet the alarm requirements of the database in every time period, provides a scheme that adapts the threshold to actual conditions, and improves the accuracy of database alarms. In addition, alarm prompts of different levels inform the user of the current risk level of the database, so that the user can decide, based on the risk level, whether immediate handling is required. The possible causes of the risk are also displayed according to the risk level, which makes analysis and handling easier for the user, improves the human-computer interaction, and improves the user experience.
Based on the foregoing embodiments, an embodiment of the present application provides a computer-readable storage medium, referred to as a storage medium for short. The computer-readable storage medium stores one or more programs that can be executed by one or more processors to implement the information alarm method provided by the embodiments corresponding to FIGS. 1 to 2 and FIGS. 5 to 6; details are not repeated here.
The above are merely embodiments of the present application and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and scope of the present application falls within its scope of protection.
Industrial Applicability
Embodiments of the present application provide an information alarm method, a device, and a storage medium. The method includes: collecting a target monitoring indicator of a target monitoring object at a first sampling interval to obtain a first monitoring value at the current moment; obtaining a first historical monitoring value collected for the target monitoring indicator at historical moments; determining a target threshold based on the first monitoring value and the first historical monitoring value, where the target threshold includes at least one different threshold; determining alarm prompt information based on the first monitoring value and the target threshold, where the alarm prompt information is used to raise an alarm prompt for the target monitoring indicator of the target monitoring object; and displaying the alarm prompt information. In this way, the information alarm device dynamically determines the target threshold in real time from the first monitoring value at the current moment and the first historical monitoring value at historical moments, and generates the alarm prompt information from the relationship between the target threshold and the first monitoring value. This solves the problem that monitoring a database with a fixed threshold cannot accurately meet the alarm requirements of the database in every time period, provides a scheme that adapts the threshold to actual conditions, and improves the accuracy of database alarms.

Claims (17)

  1. An information alerting method, the method comprising:
    collecting a target monitoring indicator of a target monitoring object at a first sampling interval to obtain a first monitoring value at a current moment;
    obtaining a first historical monitoring value obtained by collecting the target monitoring indicator at historical moments;
    determining a target threshold based on the first monitoring value and the first historical monitoring value, wherein the target threshold comprises at least one different threshold;
    determining alarm prompt information based on the first monitoring value and the target threshold, wherein the alarm prompt information is used to provide an alarm prompt for the target monitoring indicator of the target monitoring object; and
    displaying the alarm prompt information.
  2. The method according to claim 1, wherein determining the target threshold based on the first monitoring value and the first historical monitoring value comprises:
    obtaining a second monitoring value, at the current moment, of a reference monitoring indicator of the target monitoring object, and a second historical monitoring value of the reference monitoring indicator collected at historical moments, wherein the reference monitoring indicator is associated with the target monitoring indicator;
    determining a target weight coefficient based on the first monitoring value, the first historical monitoring value, the second monitoring value, and the second historical monitoring value;
    determining a first reference value based on the first historical monitoring value;
    obtaining, from the first historical monitoring value, a first historical sub-monitoring value at the moment immediately preceding the current moment;
    determining the difference between the first monitoring value and the first historical sub-monitoring value to obtain a second reference value; and
    obtaining the target threshold based on the target weight coefficient, the second reference value, the first monitoring value, at least one preset weight coefficient, and the first reference value.
  3. The method according to claim 2, wherein determining the target weight coefficient based on the first monitoring value, the first historical monitoring value, the second monitoring value, and the second historical monitoring value comprises:
    processing the first monitoring value and the first historical monitoring value by determining the difference between the monitoring value at a second moment and the monitoring value at a first moment, to obtain second differences of the target monitoring indicator at different moments, wherein the first moment and the second moment are two adjacent moments, and the first moment is farther from the current moment than the second moment;
    processing the second monitoring value and the second historical monitoring value by determining the difference between the monitoring value at the second moment and the monitoring value at the first moment, to obtain third differences of the reference monitoring indicator at different moments;
    determining the ratio of the third difference to the second difference at the same moment to obtain reference ratios; and
    determining the target weight coefficient based on the reference ratios.
  4. The method according to claim 3, wherein determining the target weight coefficient based on the reference ratios comprises:
    obtaining, from the reference ratios, a second preset number of target ratios that are greater than zero and closest to the current moment; and
    determining the average of the second preset number of target ratios to obtain the target weight coefficient.
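The weight computation of claims 3 and 4 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function and parameter names are hypothetical, both series are assumed to be ordered oldest-first with the current sample last, and moments where the target indicator did not change (a zero denominator) are skipped, which the claims do not address.

```python
from typing import Sequence


def target_weight(target: Sequence[float], reference: Sequence[float], n: int) -> float:
    """Average of the n most recent positive reference/target difference ratios.

    `target` and `reference` hold the samples of the target and reference
    indicators (history followed by the current value), oldest first.
    """
    if len(target) != len(reference):
        raise ValueError("both series must cover the same moments")
    if n <= 0:
        raise ValueError("n must be positive")

    ratios = []
    for i in range(1, len(target)):
        d_target = target[i] - target[i - 1]        # "second difference"
        d_ref = reference[i] - reference[i - 1]     # "third difference"
        if d_target == 0:
            continue  # assumption: an undefined ratio is simply skipped
        ratios.append(d_ref / d_target)             # "reference ratio"

    positive = [r for r in ratios if r > 0]
    recent = positive[-n:]                          # closest to the current moment
    if not recent:
        raise ValueError("no positive ratios available")
    return sum(recent) / len(recent)
```

With `target = [10, 12, 15, 14, 18]` and `reference = [100, 104, 113, 115, 123]`, the ratios are 2, 3, -2 and 2; taking the two most recent positive ratios gives a weight of 2.5.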
  5. The method according to claim 2, wherein determining the first reference value based on the first historical monitoring value comprises:
    obtaining, from the first historical monitoring value, a first preset number of second historical sub-monitoring values collected at the same moment as the current moment within a first preset number of time periods preceding the current moment; and
    determining the standard deviation of the first preset number of second historical sub-monitoring values to obtain the first reference value.
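As a rough sketch of claim 5, the first reference value can be computed by picking the sample taken at the same point in each of the preceding periods and taking the standard deviation. The names are hypothetical, the history is assumed to be evenly sampled and to exclude the current sample, and the population standard deviation is an assumption, since the claim does not say which variant is meant.

```python
import statistics
from typing import Sequence


def first_reference_value(history: Sequence[float], samples_per_period: int, periods: int) -> float:
    """Standard deviation of the values seen at this same moment in earlier periods.

    `history` is ordered oldest-first and ends just before the current sample,
    so the value exactly k periods ago is history[-k * samples_per_period].
    """
    if periods <= 0 or samples_per_period <= 0:
        raise ValueError("periods and samples_per_period must be positive")
    if len(history) < periods * samples_per_period:
        raise ValueError("not enough history for the requested number of periods")

    same_moment = [history[-k * samples_per_period] for k in range(1, periods + 1)]
    return statistics.pstdev(same_moment)  # assumption: population standard deviation
```

For a daily period sampled once a minute, `samples_per_period` would be 1440 and `periods` the number of previous days to compare against.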
  6. The method according to claim 2, wherein obtaining the target threshold based on the target weight coefficient, the second reference value, the first monitoring value, the at least one preset weight coefficient, and the first reference value comprises:
    determining a first product of the target weight coefficient and the second reference value;
    determining the product of each of the at least one preset weight coefficient and the first reference value to obtain at least one second product; and
    determining the sum of the first product, each second product of the at least one second product, and the first monitoring value to obtain the target threshold.
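One way to read claim 6 is that each preset weight coefficient yields its own threshold, so that two coefficients produce the first and second thresholds used later. The sketch below follows that reading, which is an interpretation rather than something the claim states outright; all names are hypothetical.

```python
from typing import List, Sequence


def target_thresholds(current: float, previous: float, weight: float,
                      first_reference: float, preset_coeffs: Sequence[float]) -> List[float]:
    """One threshold per preset coefficient: weight * delta + k * sigma + current."""
    second_reference = current - previous        # change since the previous sample
    first_product = weight * second_reference    # target weight x second reference value
    return [first_product + k * first_reference + current for k in preset_coeffs]
```

For example, `target_thresholds(50.0, 46.0, 0.5, 2.0, [1.0, 3.0])` gives `[54.0, 58.0]`: the first product is 0.5 × 4 = 2, and the second products are 2 and 6.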
  7. The method according to claim 1, wherein determining the alarm prompt information based on the first monitoring value and the target threshold comprises:
    obtaining a target upper limit value of the target monitoring indicator;
    if the first monitoring value is less than the target upper limit value and greater than or equal to a first threshold, consecutively collecting a third preset number of third monitoring values of the target monitoring indicator at a second sampling interval, wherein the second sampling interval is shorter than the first sampling interval, and the target threshold comprises the first threshold; and
    if the third preset number of third monitoring values are all less than the target upper limit value and at least one of the third monitoring values is greater than or equal to a second threshold, generating first alarm information, wherein the target threshold comprises the second threshold, the second threshold is greater than the first threshold, and the first alarm information is used to raise a major alarm for the target monitoring object.
  8. The method according to claim 7, wherein the method further comprises:
    if the first monitoring value is greater than or equal to the target upper limit value, generating second alarm information, wherein the second alarm information is used to raise a critical alarm for the target monitoring object;
    if at least one of the third preset number of third monitoring values is greater than or equal to the target upper limit value, generating the second alarm information; and
    if the third preset number of third monitoring values are all less than the second threshold, generating third alarm information, wherein the third alarm information is used to raise a minor alarm for the target monitoring object.
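The grading rules of claims 7 and 8 amount to a small decision tree, sketched below. The enum, the function name, and the `resample` callback (standing in for the denser sampling at the second interval) are hypothetical; returning `None` when the first threshold is not reached is an assumption, since the claims only describe the alarm cases.

```python
from enum import Enum
from typing import Callable, Optional, Sequence


class Alarm(Enum):
    CRITICAL = "second alarm information"
    MAJOR = "first alarm information"
    MINOR = "third alarm information"


def grade(value: float, upper_limit: float, first_threshold: float,
          second_threshold: float,
          resample: Callable[[], Sequence[float]]) -> Optional[Alarm]:
    """Classify one sample; `resample` returns the third monitoring values."""
    if value >= upper_limit:
        return Alarm.CRITICAL
    if value < first_threshold:
        return None  # assumption: below the first threshold nothing is raised

    follow_up = resample()  # collected at the shorter second sampling interval
    if any(v >= upper_limit for v in follow_up):
        return Alarm.CRITICAL
    if any(v >= second_threshold for v in follow_up):
        return Alarm.MAJOR
    return Alarm.MINOR
```

Resampling only after the first threshold is crossed keeps the dense collection off the normal path, which matches the two sampling intervals described in claim 7.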
  9. The method according to claim 7 or 8, wherein after determining the alarm prompt information based on the first monitoring value and the target threshold, the method further comprises:
    if the alarm prompt information is the first alarm information or the second alarm information, obtaining at least one Structured Query Language (SQL) statement executed by the target monitoring object at the current moment;
    obtaining an execution plan corresponding to the at least one SQL statement;
    determining, based on the execution plan corresponding to the at least one SQL statement, the consumption cost of each SQL statement with respect to the target monitoring indicator, to obtain the consumption cost corresponding to the at least one SQL statement;
    sorting the at least one SQL statement in descending order of consumption cost, based on the consumption cost corresponding to the at least one SQL statement, to obtain an SQL statement sorting result; and
    displaying the SQL statement sorting result.
  10. The method according to claim 9, wherein the consumption cost of each SQL statement with respect to the target monitoring indicator comprises an input/output (IO) consumption cost, and determining, based on the execution plan corresponding to the at least one SQL statement, the consumption cost of each SQL statement with respect to the target monitoring indicator, to obtain the consumption cost corresponding to the at least one SQL statement, comprises:
    determining, based on the execution plan corresponding to the at least one SQL statement, the number of rows of each identifier (ID) included in the execution plan corresponding to each SQL statement;
    determining an ID having one row as a first ID, and obtaining a first row count and an average length corresponding to the first ID, wherein the first row count is recorded in the execution plan corresponding to each SQL statement;
    performing a round-up operation on the product of the first row count and the average length together with a preset InnoDB data page size, to obtain the IO consumption cost of the first ID;
    determining an ID having more than one row as a second ID, and determining the number of returned rows and the number of scanned rows of each row of the second ID;
    obtaining the average length corresponding to the second ID;
    determining the difference between the number of scanned rows of each row of the second ID and the number of returned rows of the adjacent preceding row, to obtain a first difference;
    performing a round-up operation on the product of the first difference and the average length together with the preset InnoDB data page size, to obtain an IO consumption sub-cost corresponding to each row of the second ID;
    determining the sum of the IO consumption sub-costs corresponding to the rows of the second ID, to obtain the IO consumption cost of the second ID; and
    determining the sum of the IO consumption costs of the IDs included in the execution plan corresponding to each SQL statement, to obtain the consumption cost corresponding to the at least one SQL statement, wherein the sum of the IO consumption costs of the IDs comprises the IO consumption cost of the first ID and/or the IO consumption cost of the second ID.
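The IO estimate of claim 10 can be sketched under a few explicit assumptions. The round-up step is read here as the number of pages needed, that is, rows × average length divided by the page size and rounded up; the claim wording leaves the exact operation open. The first row of a multi-row ID is assumed to have no predecessor (zero returned rows), and negative differences are clamped to zero. `PlanRow` and its field names are hypothetical, and 16 KiB is InnoDB's default page size.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Iterable

INNODB_PAGE_SIZE = 16 * 1024  # bytes; InnoDB default, used as the preset page size


@dataclass
class PlanRow:
    plan_id: int         # ID shared by one or more rows of the execution plan
    scanned_rows: int    # rows the plan expects to examine for this row
    returned_rows: int   # rows this plan row hands on
    avg_row_length: int  # average row length in bytes


def pages(rows: int, avg_row_length: int, page_size: int = INNODB_PAGE_SIZE) -> int:
    """Pages needed for `rows` rows, rounded up (assumed reading of the claim)."""
    size = max(rows, 0) * avg_row_length  # assumption: negative differences cost nothing
    return -(-size // page_size)          # integer ceiling division


def statement_io_cost(plan: Iterable[PlanRow]) -> int:
    """Sum the IO cost over every ID in one statement's execution plan."""
    total = 0
    ordered = sorted(plan, key=lambda r: r.plan_id)  # stable: keeps row order per ID
    for _, group in groupby(ordered, key=lambda r: r.plan_id):
        rows = list(group)
        if len(rows) == 1:                           # "first ID": a single plan row
            total += pages(rows[0].scanned_rows, rows[0].avg_row_length)
            continue
        previous_returned = 0                        # assumption for the first row
        for row in rows:                             # "second ID": several plan rows
            total += pages(row.scanned_rows - previous_returned, row.avg_row_length)
            previous_returned = row.returned_rows
    return total
```

For a plan with one row under ID 1 (1000 rows of 100 bytes) and two rows under ID 2 (500 scanned and 200 returned, then 1000 scanned, 64 bytes each), the sketch yields 7 + 2 + 4 = 13 pages.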
  11. The method according to claim 9, wherein the consumption cost of each SQL statement with respect to the target monitoring indicator comprises a central processing unit (CPU) consumption cost, and determining, based on the execution plan corresponding to the at least one SQL statement, the consumption cost of each SQL statement with respect to the target monitoring indicator, to obtain the consumption cost corresponding to the at least one SQL statement, comprises:
    determining, based on the execution plan corresponding to the at least one SQL statement, the number of rows of each identifier (ID) included in the execution plan corresponding to each SQL statement;
    determining an ID having one row as a first ID;
    determining, based on the execution plan of the first ID, the CPU consumption cost corresponding to the first ID;
    determining an ID having more than one row as a second ID;
    determining, based on the execution plan of each row of the second ID, the CPU consumption cost corresponding to each row of the second ID; and
    summing the CPU consumption costs corresponding to the rows of the second ID to obtain the CPU consumption cost corresponding to the second ID.
  12. The method according to claim 11, wherein determining, based on the execution plan of the first ID, the CPU consumption cost corresponding to the first ID comprises:
    determining, based on the execution plan of the first ID, a first row count and a number of returned rows corresponding to the first ID;
    obtaining, from the execution plan of the first ID, the access type (Type) and the extra field information (Extra) corresponding to the first ID;
    obtaining a first CPU consumption sub-cost corresponding to the first ID based on the Type, the Extra, and the first row count corresponding to the first ID;
    determining the product of the logarithm of the number of returned rows and the number of returned rows, to obtain a second CPU consumption sub-cost corresponding to the first ID; and
    determining the sum of the first CPU consumption sub-cost and the second CPU consumption sub-cost corresponding to the first ID, to obtain the CPU consumption cost corresponding to the first ID.
  13. The method according to claim 11, wherein determining, based on the execution plan of each row of the second ID, the CPU consumption cost corresponding to each row of the second ID comprises:
    determining, based on the execution plan of each row of the second ID, a second row count and a number of returned rows of each row of the second ID;
    obtaining, from the execution plan corresponding to each row of the second ID, the access type (Type) and the extra field information (Extra) corresponding to each row of the second ID, and the number of table rows corresponding to each row of the second ID;
    obtaining a third CPU consumption sub-cost corresponding to each row of the second ID based on the Type, the Extra, and the second row count corresponding to that row;
    determining the product of the logarithm of the number of returned rows of each row of the second ID and the corresponding number of returned rows, to obtain a fourth CPU consumption sub-cost corresponding to each row of the second ID;
    determining the product of the number of table rows corresponding to each row of the second ID and the index height corresponding to that row, to obtain a fifth CPU consumption sub-cost corresponding to each row of the second ID; and
    determining the sum of the third CPU consumption sub-cost, the fourth CPU consumption sub-cost, and the fifth CPU consumption sub-cost corresponding to each row of the second ID, to obtain the CPU consumption cost of each row of the second ID.
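The per-row CPU estimate of claim 13, and the summation over rows from claim 11, might look like the sketch below. The claims do not say how Type, Extra, and the row count combine into the third sub-cost, so it is passed in as a precomputed `access_cost`; the base-2 logarithm and the zero cost for one or fewer returned rows are likewise assumptions. All names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RowStats:
    access_cost: float   # third sub-cost, derived elsewhere from Type, Extra and row count
    returned_rows: int   # rows returned by this plan row
    table_rows: int      # number of rows in the underlying table
    index_height: int    # height of the index used by this plan row


def n_log_n(n: int) -> float:
    """Fourth sub-cost: returned rows times their logarithm (base 2 assumed)."""
    return n * math.log2(n) if n > 1 else 0.0


def row_cpu_cost(row: RowStats) -> float:
    """Third + fourth + fifth sub-costs for one row of a multi-row ID."""
    lookup_cost = row.table_rows * row.index_height  # fifth sub-cost
    return row.access_cost + n_log_n(row.returned_rows) + lookup_cost


def second_id_cpu_cost(rows: Iterable[RowStats]) -> float:
    """Claim 11: the cost of a multi-row ID is the sum over its rows."""
    return sum(row_cpu_cost(r) for r in rows)
```

A row with an access cost of 100, 8 returned rows, 1000 table rows, and an index height of 3 comes to 100 + 24 + 3000 = 3124 under these assumptions.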
  14. The method according to claim 7 or 8, wherein after determining the alarm prompt information based on the first monitoring value and the target threshold, the method further comprises:
    if the alarm prompt information is the first alarm information or the second alarm information, obtaining at least one SQL statement executed by the target monitoring object at the current moment;
    grouping the at least one SQL statement so that identical statements fall into the same group, to obtain at least one group of SQL statements;
    counting, based on the at least one group of SQL statements, the number of SQL statements included in each group;
    obtaining one SQL statement from each group of the at least one group of SQL statements, to obtain at least one target SQL statement;
    determining a target standby database, and running each target SQL statement on the target standby database;
    determining the resource consumption sub-cost incurred when the target standby database runs each target SQL statement, to obtain the resource consumption sub-cost of each target SQL statement;
    determining the product of the resource consumption sub-cost of each target SQL statement and the number of SQL statements in the group to which that target SQL statement belongs, to obtain the resource consumption cost of each group of SQL statements; and
    sorting the at least one group of SQL statements in descending order of the resource consumption cost of each group, to obtain a sorting result, and displaying the sorting result.
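The grouping and ranking in claim 14 can be sketched in a few lines. Statements are grouped here by exact text, which is an assumption about what counts as "identical", and `measure` is a hypothetical callback that stands in for running one representative statement on the standby database and returning its resource consumption sub-cost.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def rank_statements(statements: Iterable[str],
                    measure: Callable[[str], float]) -> List[Tuple[str, float]]:
    """Return (statement, group cost) pairs, most expensive group first."""
    counts = Counter(statements)                 # one group per distinct statement
    costs = {sql: measure(sql) * n               # sub-cost x number of statements in group
             for sql, n in counts.items()}
    return sorted(costs.items(), key=lambda item: item[1], reverse=True)
```

Measuring one representative per group means each distinct statement is replayed only once on the standby, however often it ran on the monitored database.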
  15. The method according to claim 14, wherein determining the resource consumption sub-cost incurred when the target standby database runs each target SQL statement, to obtain the resource consumption sub-cost of each target SQL statement, comprises:
    if the duration for which the target standby database runs a target SQL statement is greater than or equal to a preset duration, determining the resource consumption sub-cost of that target SQL statement to be a first value;
    if the duration for which the target standby database runs a target SQL statement is less than the preset duration, obtaining the resource consumption table corresponding to the target standby database running that target SQL statement; and
    obtaining a first consumed resource and a second consumed resource from the resource consumption table corresponding to each target SQL statement, and determining the sum of the first consumed resource and the second consumed resource to obtain the resource consumption sub-cost corresponding to each target SQL statement, wherein the first consumed resource is the CPU resource consumed in user mode and the second consumed resource is the CPU resource consumed in kernel mode, and/or the first consumed resource is the number of input block operations and the second consumed resource is the number of output block operations.
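Claim 15 reduces to a capped sum, sketched below. The `Profile` record and its field names are hypothetical stand-ins for one statement's resource consumption table; `timeout_cost` plays the role of the "first value", whose magnitude the claim does not specify.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    duration_s: float    # how long the standby database took to run the statement
    cpu_user: float      # CPU consumed in user mode
    cpu_system: float    # CPU consumed in kernel mode
    block_ops_in: int    # number of input block operations
    block_ops_out: int   # number of output block operations


def sub_cost(profile: Profile, max_duration_s: float, timeout_cost: float,
             kind: str = "cpu") -> float:
    """Resource consumption sub-cost of one target SQL statement."""
    if profile.duration_s >= max_duration_s:
        return timeout_cost                      # the fixed "first value"
    if kind == "cpu":
        return profile.cpu_user + profile.cpu_system
    if kind == "io":
        return float(profile.block_ops_in + profile.block_ops_out)
    raise ValueError(f"unknown resource kind: {kind!r}")
```

Assigning a fixed cost to statements that exceed the preset duration lets the replay be cut short instead of waiting for a slow statement to finish.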
  16. An information alerting device, the device comprising a memory, a processor, and a communication bus, wherein:
    the memory is configured to store executable instructions;
    the communication bus is configured to provide a communication connection between the processor and the memory; and
    the processor is configured to execute an information alerting program stored in the memory, to implement the steps of the information alerting method according to any one of claims 1 to 15.
  17. A storage medium storing an information alerting program, wherein the information alerting program, when executed by a processor, implements the steps of the information alerting method according to any one of claims 1 to 15.
PCT/CN2021/129296 2020-11-25 2021-11-08 Information alerting method and device, and storage medium WO2022111265A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011340177.8 2020-11-25
CN202011340177.8A CN112433919B (en) 2020-11-25 2020-11-25 Information warning method, equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2022111265A1 true WO2022111265A1 (en) 2022-06-02

Family

ID=74697750

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/129296 WO2022111265A1 (en) 2020-11-25 2021-11-08 Information alerting method and device, and storage medium

Country Status (2)

Country Link
CN (1) CN112433919B (en)
WO (1) WO2022111265A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116016121A (en) * 2023-03-24 2023-04-25 卡奥斯工业智能研究院(青岛)有限公司 Method, device, equipment and storage medium for determining associated data of alarm data
CN116028309A (en) * 2023-02-01 2023-04-28 中煤协联合认证(北京)中心 Quantitative monitoring system and method for system operation condition
CN116070840A (en) * 2022-12-26 2023-05-05 北京国网富达科技发展有限责任公司 Transformer collaborative management method and system based on power grid digital twin model
CN117331793A (en) * 2023-11-27 2024-01-02 南京掌控网络科技有限公司 Automatic on-duty process monitoring method and system
CN117975699A (en) * 2024-03-29 2024-05-03 烟台信谊电器有限公司 Intelligent regulator cubicle control governing system based on thing networking

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112433919B (en) * 2020-11-25 2023-01-24 深圳前海微众银行股份有限公司 Information warning method, equipment and storage medium
CN115134246B (en) * 2021-03-22 2023-07-21 中国移动通信集团河南有限公司 Network performance index monitoring method, device, equipment and storage medium
CN113326132B (en) * 2021-06-04 2023-06-09 深圳前海微众银行股份有限公司 Information adjusting method, equipment and storage medium
CN113742169B (en) * 2021-08-13 2024-06-21 深圳前海微众银行股份有限公司 Service monitoring alarm method, device, equipment and storage medium
CN114915542A (en) * 2022-04-28 2022-08-16 远景智能国际私人投资有限公司 Data abnormity warning method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105450454A (en) * 2015-12-03 2016-03-30 广州华多网络科技有限公司 Service monitoring and warning method and device
CN108509314A (en) * 2018-02-09 2018-09-07 武汉楚鼎信息技术有限公司 A kind of host operating index monitoring alarm method and system device
CN108984370A (en) * 2018-07-13 2018-12-11 北京京东尚科信息技术有限公司 A kind of method and apparatus of determining monitoring threshold value
CN112433919A (en) * 2020-11-25 2021-03-02 深圳前海微众银行股份有限公司 Information warning method, equipment and storage medium

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2002329611A1 (en) * 2001-07-20 2003-03-03 Altaworks Corporation System and method for adaptive threshold determination for performance metrics
US7467067B2 (en) * 2006-09-27 2008-12-16 Integrien Corporation Self-learning integrity management system and related methods
CN101989283B (en) * 2009-08-04 2014-06-11 中兴通讯股份有限公司 Monitoring method and device of performance of database
CN102129397A (en) * 2010-12-29 2011-07-20 深圳市永达电子股份有限公司 Method and system for predicating self-adaptive disk array failure
CN104536868A (en) * 2014-11-26 2015-04-22 北京广通信达科技有限公司 Dynamic threshold analysis method for operation index of IT system
CN105718715B (en) * 2015-12-23 2018-10-30 华为技术有限公司 Method for detecting abnormality and equipment
CN106407082B (en) * 2016-09-30 2019-06-14 国家电网公司 A kind of information system alarm method and device
US9846599B1 (en) * 2016-10-31 2017-12-19 International Business Machines Corporation Adaptive query cursor management
CN106844165B (en) * 2016-12-16 2020-09-29 华为技术有限公司 Alarm method and device
CN109298989A (en) * 2018-09-14 2019-02-01 北京市天元网络技术股份有限公司 Operational indicator threshold value acquisition methods and device
CN109815088B (en) * 2019-01-07 2022-04-15 珠海天燕科技有限公司 Monitoring assisting method and device
CN110096491A (en) * 2019-04-02 2019-08-06 南京信息职业技术学院 Database performance index prediction technique and system
CN110086666B (en) * 2019-04-25 2022-04-26 深圳前海微众银行股份有限公司 Alarm method, device and system
CN110489306A (en) * 2019-08-26 2019-11-22 北京博睿宏远数据科技股份有限公司 A kind of alarm threshold value determines method, apparatus, computer equipment and storage medium
CN111339074B (en) * 2020-02-24 2023-05-05 深圳市名通科技股份有限公司 Threshold generation method, device, equipment and storage medium
CN111679952B (en) * 2020-06-08 2023-09-19 中国银行股份有限公司 Alarm threshold generation method and device

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116070840A (en) * 2022-12-26 2023-05-05 北京国网富达科技发展有限责任公司 Transformer collaborative management method and system based on power grid digital twin model
CN116070840B (en) * 2022-12-26 2023-10-27 北京国网富达科技发展有限责任公司 Transformer collaborative management method and system based on power grid digital twin model
CN116028309A (en) * 2023-02-01 2023-04-28 中煤协联合认证(北京)中心 Quantitative monitoring system and method for system operation condition
CN116016121A (en) * 2023-03-24 2023-04-25 卡奥斯工业智能研究院(青岛)有限公司 Method, device, equipment and storage medium for determining associated data of alarm data
CN116016121B (en) * 2023-03-24 2023-07-18 卡奥斯工业智能研究院(青岛)有限公司 Method, device, equipment and storage medium for determining associated data of alarm data
CN117331793A (en) * 2023-11-27 2024-01-02 南京掌控网络科技有限公司 Automatic on-duty process monitoring method and system
CN117331793B (en) * 2023-11-27 2024-02-23 南京掌控网络科技有限公司 Automatic on-duty process monitoring method and system
CN117975699A (en) * 2024-03-29 2024-05-03 烟台信谊电器有限公司 Intelligent regulator cubicle control governing system based on thing networking
CN117975699B (en) * 2024-03-29 2024-06-04 烟台信谊电器有限公司 Intelligent regulator cubicle control governing system based on thing networking

Also Published As

Publication number Publication date
CN112433919B (en) 2023-01-24
CN112433919A (en) 2021-03-02

Similar Documents

Publication Publication Date Title
WO2022111265A1 (en) Information alerting method and device, and storage medium
US11956137B1 (en) Analyzing servers based on data streams generated by instrumented software executing on the servers
US10977248B2 (en) Processing records in dynamic ranges
CN110399262B (en) Operation and maintenance monitoring alarm convergence method and device, computer equipment and storage medium
US11106561B2 (en) Method and device for evaluating IO performance of cache servers
CN107241440B (en) Method for determining energy-saving strategy of cluster
WO2021185182A1 (en) Anomaly detection method and apparatus
EP4343554A1 (en) System monitoring method and apparatus
CN112181704A (en) Big data task processing method and device, electronic equipment and storage medium
CN114035990A (en) Real-time anomaly detection method for time sequence data of Linux operating system
CN115328733A (en) Alarm method and device applied to business system, electronic equipment and storage medium
CN110889597A (en) Method and device for detecting abnormal business timing sequence indexes
CN116471174B (en) Log data monitoring system, method, device and storage medium
CN116186017B (en) Big data collaborative supervision method and platform
CN116668264A (en) Root cause analysis method, device, equipment and storage medium for alarm clustering
CN113177060B (en) Method, device and equipment for managing SQL (structured query language) sentences
CN114531338A (en) Monitoring alarm and tracing method and system based on call chain data
CN114706893A (en) Fault detection method, device, equipment and storage medium
CN113656452A (en) Method and device for detecting abnormal index of call chain, electronic equipment and storage medium
CN109766243B (en) Multi-core host performance monitoring method based on power function
CN115729907A (en) Method and device for classifying monitoring indexes of database instances and method and device for classifying database instances
CN112100139B (en) Automatic data quality detection system based on big data
CN111628901B (en) Index anomaly detection method and related device
CN116069595B (en) Operation and maintenance monitoring method based on log
CN111367640B (en) Data statistics period determining method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21896766

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12.09.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21896766

Country of ref document: EP

Kind code of ref document: A1