WO2022068645A1 - Database fault finding method, apparatus, electronic device and storage medium - Google Patents

Database fault finding method, apparatus, electronic device and storage medium

Info

Publication number
WO2022068645A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
target data
data
database
threshold
Prior art date
Application number
PCT/CN2021/119583
Other languages
English (en)
French (fr)
Inventor
薛文满
朱红燕
莫林林
Original Assignee
深圳前海微众银行股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2022068645A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G06F11/0754 Error or fault detection not based on redundancy by exceeding limits

Definitions

  • the present application relates to the technical field of financial technology (Fintech), and in particular, to a database fault finding method, apparatus, electronic device and storage medium.
  • Another solution is to use a deep learning model to determine the prediction line, and obtain a safety interval around the prediction line according to a Gaussian distribution. Once the safety interval is exceeded, the database is considered to be faulty.
  • thousands of databases may be required to provide services for a business. If a corresponding deep learning model is generated for every database, the models occupy too much memory and the detection rate is low, so such a solution cannot be used on a large scale and has a low utilization rate.
  • the present application provides a database fault finding method, device, electronic device and storage medium, which are used to solve the technical problems of lack of rationality, inability to use on a large scale and low utilization rate of existing database fault finding solutions.
  • the present application provides a database fault discovery method, including:
  • if the target data to be measured is determined to be abnormal data according to the target threshold, it is determined whether the other target data to be measured within the abnormal time window are all abnormal data, where the target data to be measured is used to represent the current usage rate of the storage device;
  • determining the target threshold according to the target data set and the probability distribution characteristic function corresponding to each target data in the target data set includes:
  • the target threshold is obtained by determining the average of all usage thresholds.
  • determining, based on a preset inverse cumulative score function, the usage threshold corresponding to each target data according to the target probability threshold and the probability distribution characteristic function corresponding to each target data includes:
  • each distribution result is the Beta distribution corresponding to each target data
  • the probability distribution characteristic function includes the Beta distribution function
  • each usage threshold is determined according to the target probability threshold and each distribution result.
  • before the determining of the fluctuation coefficient of the target data set according to the first preset algorithm, the method further includes:
  • before the determining that the target data to be measured is abnormal data according to the target threshold, the method further includes:
  • the data to be measured is screened according to the preset screening rule to obtain the corresponding target data to be measured.
  • before the judging whether the other target data to be measured in the abnormal time window are all abnormal data, the method further includes:
  • a plurality of similarities are determined according to the target data set and the second preset algorithm, each similarity is used to represent the similarity between the target data corresponding to two adjacent unit durations, and the historical preset durations include multiple unit durations;
  • the abnormal time window is determined according to a third preset algorithm, a preset abnormal time window threshold, and the target similarity.
  • the determining a plurality of similarities according to the target data set and the second preset algorithm includes:
  • the second preset algorithm is used to sequentially determine the similarity between the target data subsets of every two adjacent unit durations, so as to obtain the plurality of similarities, where each target data subset includes all the target data within one unit duration.
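  • The pairwise comparison of adjacent unit durations described above can be sketched as follows. The second preset algorithm is not specified in this text, so cosine similarity is used here purely as an assumed stand-in, and the names `cosine_similarity`, `adjacent_similarities` and `unit_len` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length sequences of usage rates.

    ASSUMPTION: cosine similarity stands in for the unspecified
    "second preset algorithm".
    """
    if len(a) != len(b):
        raise ValueError("subsets must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def adjacent_similarities(target_data, unit_len):
    """Split target_data into consecutive subsets of unit_len samples and
    return the similarity of every pair of adjacent subsets, in order."""
    subsets = [target_data[i:i + unit_len]
               for i in range(0, len(target_data) - unit_len + 1, unit_len)]
    return [cosine_similarity(subsets[i], subsets[i + 1])
            for i in range(len(subsets) - 1)]

# Three unit durations of four samples each (made-up usage rates in [0, 1]).
history = [0.2, 0.4, 0.6, 0.3,
           0.2, 0.4, 0.6, 0.3,
           0.6, 0.3, 0.2, 0.4]
similarities = adjacent_similarities(history, unit_len=4)
```

  • With three unit durations the sketch yields two similarities: the first pair repeats the same pattern, while the second pair does not, so its similarity is lower.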
  • determining the abnormal time window according to a third preset algorithm, a preset abnormal time window threshold and the target similarity includes:
  • the preset abnormal time window threshold is the abnormal time window.
  • if the target data to be measured is not abnormal data, it is determined that the target database operates normally;
  • the method further includes:
  • the alarm information is sent to the control terminal and/or the client terminal to prompt that the target database fails.
  • the present application provides a database fault finding device, including:
  • the first processing module is used to determine the target threshold according to the target data set and the probability distribution characteristic function corresponding to each target data in the target data set, where the target data is used to represent the historical usage rate of the storage device of the target database within the historical preset time period;
  • the second processing module is configured to determine whether the other target data to be measured in the abnormal time window are all abnormal data if the target data to be measured is determined to be abnormal data according to the target threshold, where the target data to be measured is used to characterize the current usage rate of the storage device;
  • the third processing module is configured to determine that the target database is faulty if the judgment result is yes.
  • the first processing module is specifically used for:
  • the target threshold is obtained by determining the average of all usage thresholds.
  • the first processing module is also specifically used for:
  • each distribution result is the Beta distribution corresponding to each target data
  • the probability distribution characteristic function includes the Beta distribution function
  • each usage threshold is determined according to the target probability threshold and each distribution result.
  • the database fault finding device further includes:
  • an acquisition module configured to acquire a plurality of historical data of the storage device within the historical preset duration according to a preset time period
  • a screening module configured to mark each historical data according to preset screening rules, and eliminate historical data that do not meet the preset screening rules to obtain candidate data
  • the operation module is used for performing percentage operation on each candidate data to obtain the corresponding target data.
  • the acquisition module is further configured to acquire the data to be measured according to the preset time period
  • the screening module is further configured to screen the data to be measured according to the preset screening rules to obtain the corresponding target data to be measured.
  • the database fault finding device further includes: a fourth processing module; the fourth processing module is used for:
  • a plurality of similarities are determined according to the target data set and the second preset algorithm, each similarity is used to represent the similarity between the target data corresponding to two adjacent unit durations, and the historical preset durations include multiple unit durations;
  • the abnormal time window is determined according to a third preset algorithm, a preset abnormal time window threshold, and the target similarity.
  • the fourth processing module is further used for:
  • the second preset algorithm is used to sequentially determine the similarity between the target data subsets of every two adjacent unit durations, so as to obtain the plurality of similarities, where each target data subset includes all the target data within one unit duration.
  • the fourth processing module is further used for:
  • the preset abnormal time window threshold is the abnormal time window.
  • the third processing module is further used for:
  • if the target data to be measured is not abnormal data, it is determined that the target database operates normally; or
  • the database fault finding device further includes:
  • a generation module is used to generate alarm information
  • a sending module configured to send the alarm information to the control terminal and/or the client terminal to prompt that the target database fails.
  • the application provides an electronic device, comprising:
  • a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to execute the first
  • the present application provides a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause the computer to execute any one of the first aspect and the optional solutions of the first aspect.
  • the present application provides a database fault finding method, device, electronic device and storage medium.
  • a target threshold is determined according to a target data set and a probability distribution characteristic function corresponding to each target data in the target data set, wherein the target data is used to represent the historical usage rate of the storage device of the target database within the historical preset time period. If it is determined according to the target threshold that the target data to be measured is abnormal data, it is further judged whether the other target data to be measured within the abnormal time window are all abnormal data, and if the judgment result is yes, it is determined that the target database is faulty. The target data to be measured is used to characterize the current usage rate of the storage device. The target threshold is determined based on the historical usage rate of the storage device of the target database and the corresponding probability distribution characteristic function.
  • the determination of the target threshold is more in line with the actual operating conditions of the target database and improves the rationality and accuracy of the target threshold.
  • the judgment of the abnormal time window is introduced, so that the fault determination process is strongly related to the periodicity of the actual operation of the target database, which further improves the rationality of fault discovery.
  • the process of fault discovery does not need to consider business diversity to build a corresponding deep learning model, which has the advantages of strong achievability and high utilization.
  • FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a database fault discovery method provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of another database fault discovery method provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a usage rate provided by an embodiment of the present application.
  • FIG. 5 is another schematic diagram of usage rate provided by an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of still another database fault discovery method provided by an embodiment of the present application.
  • FIG. 7 is a schematic flowchart of determining an abnormal time window according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a database fault finding apparatus provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of another database fault finding apparatus provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • the threshold needs to be continuously adjusted according to the actual situation, which makes the maintenance workload in the later stage large and unreasonable. If the threshold is set too high, some sudden usage situations may not be found, and there is a risk of underreporting. However, if the setting is too low, it will cause a large number of false alarms because the fault monitoring is too sensitive. Moreover, with the continuous development of the business, the load of the database increases accordingly.
  • the solution using a fixed threshold does not consider the actual running trend of the database, and may also have some unknown effects.
  • Another solution is to use a deep learning model to determine a prediction line, and determine a safety interval around the prediction line according to a Gaussian distribution. Once the safety interval is exceeded, the database is considered to be faulty.
  • However, thousands of databases are often required to provide services for a business. If a corresponding deep learning model is generated for every database, the models occupy too much memory and the detection rate is relatively low, so this solution cannot be used on a large scale and has a low utilization rate.
  • the present application provides a database fault finding method, apparatus, electronic device and storage medium.
  • the target threshold is determined according to the target data set and the probability distribution characteristic function corresponding to each target data in the target data set, wherein the target data is used to represent the historical usage rate of the storage device of the target database within the historical preset time period.
  • the determined target threshold is used for the judgment of abnormal data in the subsequent fault discovery process. Since the target threshold is determined based on each target data and its corresponding probability distribution feature function, the target threshold is in line with the actual operating conditions of the target database, which improves the rationality and accuracy of setting the target threshold.
  • the target data to be measured is determined to be abnormal data based on the target threshold, it will be further judged whether other target data to be measured within the abnormal time window are abnormal data, and only when the judgment result is yes, it is determined that the target database is faulty.
  • the judgment of abnormal time window is introduced, so that the fault determination process is strongly correlated with the periodicity of the actual operating conditions of the target database, which further improves the rationality of fault discovery.
  • the fault discovery process provided by the present application does not need to consider the diversity of services to additionally build a deep learning model, which has the advantages of strong achievability and high utilization.
  • FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • a network is used as a medium for providing a communication link between a server 11 and a server 12.
  • a network may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
  • the server 11 and the server 12 can interact through the network to receive or send messages.
  • one of the server 11 and the server 12 is a database server corresponding to the target database, and correspondingly, the other is an electronic device corresponding to the database fault finding apparatus provided in the embodiment of the present application.
  • the server 11 is the database server corresponding to the target database, and the server 12 is an electronic device that executes the database fault finding method provided by the embodiment of the present application. Information is exchanged between the server 11 and the server 12 through the network, so as to monitor whether the target database corresponding to the server 11 is faulty.
  • the database server corresponding to the target database may be set as a server cluster according to the actual working conditions of the target database, which is not limited in this embodiment of the present application. Only the server 11 is shown in FIG. 1 as an example.
  • FIG. 2 is a schematic flowchart of a database fault finding method provided by an embodiment of the present application. As shown in FIG. 2 , the database fault finding method provided by this embodiment includes:
  • S101 Determine a target threshold according to the target data set and the probability distribution characteristic function corresponding to each target data in the target data set.
  • the target data is used to represent the historical usage rate of the storage device of the target database within the historical preset time period.
  • Each target data is used to represent the historical usage rate of the storage device of the target database within the historical preset time period, and all the corresponding target data within the historical preset time period form the target data set.
  • each target data in the target data set is the actual usage rate generated by the storage device of the target database in the actual working conditions within the historical preset time period.
  • the historical preset duration may be a certain continuous duration in the past, such as the past week or the past month. In the former case, each target data in the target data set is a historical usage rate generated by the storage device during that week.
  • "each" here does not mean that the historical usage rates generated by the storage device of the target database during operation exist as discrete, countable items. Rather, the database fault finding apparatus obtains the historical usage rate at a certain time period, and each acquisition operation yields one corresponding data.
  • the probability distribution feature function corresponding to each target data may be a feature function capable of representing the probability of occurrence of an event corresponding to the target data, for example, a beta (β, Beta) distribution function.
  • the Beta distribution function is a density function that is the conjugate prior distribution of the Bernoulli and binomial distributions, which gives it an inherent advantage in reflecting the utilization of the storage device of the target database, such as CPU and/or I/O interface usage.
  • the value range of each target data representing the historical usage rate of the storage device is the [0, 1] interval, which exactly matches the domain of the Beta distribution function. Therefore, the Beta distribution function corresponding to each target data can be used as its corresponding probability distribution feature function, which makes the fitting process of the target data in the target data set more reasonable.
  • the embodiment of the present application uses the probability distribution feature function corresponding to each target data, instead of fitting all historical data with a single feature function as in the prior art. Therefore, the fitting process for the target data is more reasonable and the fitting results are smoother, so that the determination of the target threshold is closer to the actual operating conditions of the target database, which helps to improve the accuracy of the target threshold.
  • the probability distribution feature functions provided by the embodiments of the present application include, but are not limited to, the Beta distribution functions. In the case of achieving the same effect, other feature functions may also be used, which are not limited in the embodiments of the present application.
  • step S101 may be as shown in FIG. 3 , which is a schematic flowchart of another database fault discovery method provided by this embodiment of the present application.
  • the target threshold is determined according to the target data set and the probability distribution characteristic function corresponding to each target data in the target data set, which may include:
  • S1011 Determine the fluctuation coefficient of the target data set according to the first preset algorithm, and determine the target probability threshold according to the fluctuation coefficient.
  • the fluctuation coefficient is used to represent the fluctuation range of historical usage rate.
  • the fluctuation coefficient is used to measure the fluctuation range of the historical usage rate. The larger the fluctuation range, the more likely the target database is to have a high usage rate.
  • the fluctuation coefficient of the target data set can be determined according to the first preset algorithm, that is, the fluctuation range of the target database within the historical preset time period can be determined according to the first preset algorithm.
  • the first preset algorithm may be a corresponding formula for determining the volatility coefficient, as shown in the following formula (1):
  • represents the fluctuation coefficient
  • X_max represents the target data with the largest value in the target data set
  • X_min represents the target data with the smallest value in the target data set
  • X_mean represents the average value of all target data in the target data set.
  • the fluctuation coefficient of the target data set can be determined to reflect the fluctuation range of the target data set within a preset historical time period.
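  • Formula (1) itself is not reproduced in this text. The sketch below therefore only illustrates one plausible way of combining X_max, X_min and X_mean into a fluctuation coefficient; the relative range (X_max - X_min) / X_mean is an assumption made for illustration, and `fluctuation_coefficient` is a hypothetical name.

```python
def fluctuation_coefficient(target_data):
    """Relative range of the usage rates: (max - min) / mean.

    ASSUMPTION: this expression is a placeholder built from the three
    quantities the text names; it is not confirmed to be formula (1).
    """
    if not target_data:
        raise ValueError("target data set is empty")
    x_max = max(target_data)
    x_min = min(target_data)
    x_mean = sum(target_data) / len(target_data)
    if x_mean == 0:
        raise ValueError("mean usage rate is zero")
    return (x_max - x_min) / x_mean

# Made-up usage rates: a steady workload and a bursty one.
steady = [0.50, 0.52, 0.48, 0.50]
bursty = [0.10, 0.90, 0.20, 0.80]
steady_coefficient = fluctuation_coefficient(steady)
bursty_coefficient = fluctuation_coefficient(bursty)
```

  • Under this assumed expression the bursty series gets a much larger coefficient than the steady one, which matches the stated purpose of the coefficient: measuring the fluctuation range of the historical usage rate.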
  • FIG. 4 is a schematic diagram of a usage rate provided by an embodiment of the present application, and FIG. 5 is a schematic diagram of another usage rate provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram corresponding to a usage rate with a small fluctuation coefficient, and FIG. 5 is a schematic diagram corresponding to a usage rate with a larger fluctuation coefficient.
  • different thresholds should be set for corresponding data with different fluctuation coefficients.
  • the corresponding target probability threshold is therefore also determined according to the fluctuation coefficient. For example, the target probability threshold corresponding to the fluctuation coefficient can be determined by using formula (2) shown below.
  • T_c represents the target probability threshold
  • is the fluctuation coefficient determined by formula (1)
  • exp represents the exponential function operation with the natural constant e as the base.
  • the target probability threshold corresponding to the fluctuation coefficient is obtained based on the fluctuation coefficient of the target data set through the above formula (2), so that the subsequent target threshold can be determined according to the characteristics of the target data itself, thereby reflecting the actual operating condition of the target database.
  • S1012 Based on the preset inverse cumulative score function, determine the respective usage thresholds of each target data according to the target probability threshold and the probability distribution characteristic function corresponding to each target data.
  • after the target probability threshold corresponding to the target data set is determined, the usage threshold of each target data is further determined, based on the preset inverse cumulative score function, according to the target probability threshold and the probability distribution characteristic function corresponding to each target data.
  • step S1012 may include:
  • each distribution result is the Beta distribution corresponding to each target data
  • the probability distribution characteristic function includes the Beta distribution function
  • each usage threshold is determined according to the target probability threshold and each distribution result, and each usage threshold is also a usage threshold corresponding to each target data.
  • the probability distribution feature function corresponding to each target data may be a Beta distribution function. Each target data is operated on with the corresponding Beta distribution function to obtain a distribution result, which is the Beta distribution of that target data and can be expressed by formula (3) as shown below:
  • α and β respectively represent the fitting parameters in the Beta distribution function
  • B represents the Beta function
  • f represents the distribution result determined according to the target data X and the corresponding Beta distribution function, that is, the Beta distribution corresponding to the target data X.
  • a distribution result is obtained for each target data in this way, and each distribution result is the Beta distribution corresponding to that target data. The number of Beta distributions obtained equals the number of target data in the target data set.
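  • Formula (3) is not reproduced in this text. For reference, the standard Beta probability density, written with the symbols f, X, B and the two fitting parameters (denoted α and β here), is given below. That formula (3) takes exactly this form is an assumption.

```latex
f(X;\alpha,\beta) = \frac{X^{\alpha-1}\,(1-X)^{\beta-1}}{B(\alpha,\beta)},
\qquad
B(\alpha,\beta) = \int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,dt,
\qquad
0 \le X \le 1,\ \alpha > 0,\ \beta > 0 .
```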
  • the Beta distribution corresponding to each target data in the target data set, that is, each distribution result obtained above, and the target probability threshold are operated on based on the preset inverse cumulative score function, and the result of the operation is the usage threshold.
  • Each target data in the target data set thus has a corresponding usage threshold, and the number of usage thresholds obtained equals the number of target data in the target data set.
  • the preset inverse cumulative score function provided in the embodiment of the present application is the function that corresponds to the Beta distribution function in the Beta distribution. That is, once the parameters α and β of the Beta distribution function in formula (3) are determined, a software package is called and run with the target probability threshold and each distribution result (the Beta distribution corresponding to each target data) as input, and the result of the preset inverse cumulative score function operation is obtained. This result is the usage threshold corresponding to each target data.
  • the embodiments of the present application do not limit the software package that implements the operation of the preset inverse cumulative score function.
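  • A minimal sketch of step S1012 follows, assuming that the "inverse cumulative score function" corresponds to the inverse cumulative distribution function (quantile function) of the Beta distribution. In practice a library routine such as `scipy.stats.beta.ppf` would normally be called; the standard-library version below only illustrates the idea and is restricted to parameters greater than 1. The names `beta_pdf`, `beta_cdf`, `beta_ppf` and `usage_thresholds`, and the example parameter values, are hypothetical.

```python
import math

def beta_pdf(x, a, b):
    """Beta density x^(a-1) * (1-x)^(b-1) / B(a, b)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0  # correct at the endpoints only when a > 1 and b > 1
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_beta)

def beta_cdf(x, a, b, steps=2000):
    """P(X <= x), integrating the density with composite Simpson's rule.

    steps must be even.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    h = x / steps
    total = beta_pdf(0.0, a, b) + beta_pdf(x, a, b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * beta_pdf(i * h, a, b)
    return total * h / 3

def beta_ppf(p, a, b, tol=1e-9):
    """Inverse CDF by bisection: the usage rate x with P(X <= x) = p."""
    if not (a > 1 and b > 1):
        raise ValueError("this sketch only supports a > 1 and b > 1")
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def usage_thresholds(target_probability, fits):
    """One usage threshold per fitted (alpha, beta) pair."""
    return [beta_ppf(target_probability, a, b) for a, b in fits]

# Two made-up fits: one concentrated at low usage, one at high usage.
fits = [(2.0, 8.0), (8.0, 2.0)]
thresholds = usage_thresholds(0.95, fits)
```

  • Each fitted pair of parameters produces its own threshold, so a storage device whose usage concentrates near 1 receives a higher usage threshold than one whose usage concentrates near 0.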
  • S1013 Obtain a target threshold by determining the average value of all usage thresholds.
  • the average value of all usage thresholds is determined, and the average is used as the target threshold to judge whether the target database is faulty. It can be understood that the determined target threshold is stored for use in subsequent steps.
  • the target threshold provided by the embodiment of the present application is obtained by first determining the corresponding usage threshold based on the probability distribution feature function corresponding to each target data in the target data set, and then performing an average operation on all the usage thresholds. Compared with using a parameter to directly determine the target threshold, the target threshold determined in the embodiment of the present application has higher precision and more accurate detection of database faults.
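  • Step S1013 reduces the per-data usage thresholds to a single target threshold by averaging. A minimal sketch, with a hypothetical function name and made-up threshold values:

```python
from statistics import fmean

def target_threshold(usage_thresholds):
    """Arithmetic mean of all usage thresholds (step S1013)."""
    if not usage_thresholds:
        raise ValueError("no usage thresholds were provided")
    return fmean(usage_thresholds)

# Made-up usage thresholds, one per target data.
example_thresholds = [0.80, 0.90, 0.85]
threshold = target_threshold(example_thresholds)
```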
  • in the database fault finding method provided by the embodiment of the present application, the target threshold used for abnormal data judgment is determined based on the probability distribution characteristic function corresponding to each target data.
  • the determination of the target threshold is therefore based on multiple sets of data, rather than on a single set of data as in the prior art. The determined target threshold is smoother, which reduces the effect of outliers.
  • the target threshold provided by the embodiment of the present application is obtained by deriving a corresponding Beta distribution from each target data, and each Beta distribution has its own fitting parameters. If the target data set contains N target data, then N Beta distributions and N groups of corresponding fitting parameters α and β are obtained. The target probability threshold and the N distribution results are then subjected to N corresponding preset inverse cumulative score function operations to obtain N usage thresholds, and the average of the N usage thresholds is determined as the target threshold.
  • in the prior art, by contrast, a single Beta distribution function is usually used for all data, that is, one set of fitting parameters α and β is obtained, and a corresponding threshold is obtained from that single result.
  • the fitting process when the target threshold is determined in the present application is more suitable for the actual operating conditions of the target database, and the fitting result is smoother, which is beneficial to improve the accuracy of the target threshold.
  • S102 Determine whether the target data to be measured is greater than the target threshold.
  • the target data to be measured is used to represent the current usage rate of the storage device.
  • after the target threshold is determined, whether the target data to be measured is abnormal data is determined by comparing the target data to be measured with the target threshold.
  • the target data to be measured is used to represent the current usage rate of the storage device. In other words, after the target threshold is determined, the current usage rate of the storage device of the target database is obtained, and whether the target data to be measured is abnormal data can be determined by judging the magnitude relationship between the target data to be measured and the target threshold.
  • the corresponding judgment result is obtained by comparing the magnitude relationship between the target data to be measured and the target threshold. If the judgment result is yes, that is, the target data to be measured is greater than the target threshold, it is determined that the target data to be measured used for the judgment is abnormal data, and step S103 is executed. Conversely, if the judgment result is no, that is, the target data to be measured is not greater than the target threshold, it is determined that the target data to be measured used for the judgment is not abnormal data, and step S105 is executed. It can be understood that in this step, one target data to be measured is compared with the target threshold at a time, that is, each time a target data to be measured is obtained, it is compared with the target threshold to determine whether that target data to be measured is abnormal data.
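  • The branching in step S102 can be sketched as a single comparison per measurement. The function names are hypothetical; the step labels S103 and S105 are taken from the text above.

```python
def is_abnormal(measured_usage, target_threshold):
    """S102: a measurement is abnormal when it is strictly greater than the threshold."""
    return measured_usage > target_threshold

def next_step(measured_usage, target_threshold):
    """Return the step that follows S102 for one measurement."""
    return "S103" if is_abnormal(measured_usage, target_threshold) else "S105"
```

  • A measurement equal to the threshold is "not greater than" it, so it is treated as normal and the flow continues at S105.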
  • The basis for judging whether the target data to be measured is abnormal data includes, but is not limited to, judging whether the target data to be measured is greater than the target threshold as in the above example.
  • A corresponding judgment basis may be set according to the actual operation of the target database under working conditions, which is not limited in this embodiment of the present application.
  • S103 If the target data to be measured is determined to be abnormal data according to the target threshold, determine whether the other target data to be measured within the abnormal time window are all abnormal data.
  • The abnormal time window characterizes the periodicity of the running state of the target database; in other words, it characterizes the similarity of the target data.
  • Introducing the abnormal time window into the judgment process ties the judgment to the actual operating conditions of the target database and improves the stability of the database fault discovery method. In actual working conditions, a single target data may increase suddenly while the target database is running, but the target database cannot be judged faulty on the basis of that one value alone. Instead, monitoring should continue to check whether the other target data to be measured within the abnormal time window are also abnormal data.
  • One possible implementation of judging whether the other target data to be measured within the abnormal time window are abnormal data is the same as determining whether the target data to be measured is abnormal data according to the target threshold in the previous embodiment, that is, judging whether the other target data to be measured are all greater than the target threshold. If the judgment result is yes, the other target data to be measured within the abnormal time window are determined to be abnormal data. If the judgment result is no, that is, one or more of the other target data to be measured within the abnormal time window is not greater than the target threshold, it is determined that the other target data to be measured within the abnormal time window are not all abnormal data.
  • If the judgment result is yes, step S104 is executed; if the judgment result is no, step S105 is executed.
  • The basis for judging whether the other target data to be measured within the abnormal time window are abnormal data may or may not be the same as that in step S102, and may be set according to the actual operating conditions of the target database, which is not limited in the embodiments of the present application.
  • In step S104, if the target data to be measured is determined to be abnormal data according to the target threshold, and the other target data to be measured within the abnormal time window are further determined to be abnormal data as well, it is determined that the target database is faulty.
  • In step S105, if the target data to be measured is determined not to be abnormal data according to the target threshold, or if the target data to be measured is determined to be abnormal data but the further judgment finds that the other target data to be measured within the abnormal time window are not all abnormal data, it is determined that the target database is running normally and no fault has occurred.
  • In the database fault discovery method provided by this embodiment, the target threshold is determined first, and whether the target data to be measured is abnormal data is determined according to the target threshold. If it is abnormal data, it is further judged whether the other target data to be measured within the abnormal time window are abnormal data, and the target database is determined to be faulty when they all are.
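The decision flow of steps S102 to S105 described above can be sketched in a few lines. This is a minimal illustration rather than the applicant's implementation: the function names `is_abnormal` and `database_is_faulty` are hypothetical, and the sketch assumes the simplest judgment basis mentioned in the text (a value is abnormal when it is strictly greater than the target threshold).

```python
from typing import Sequence


def is_abnormal(value: float, target_threshold: float) -> bool:
    """S102: a sample is abnormal when it is strictly greater than the threshold."""
    return value > target_threshold


def database_is_faulty(
    current: float,
    others_in_window: Sequence[float],
    target_threshold: float,
) -> bool:
    """Return True for the S104 outcome (fault) and False for S105 (normal).

    `others_in_window` holds the other target data to be measured that fall
    inside the abnormal time window. If it is empty, `all()` returns True,
    so a single abnormal sample alone is then reported as a fault.
    """
    # S102: compare the current sample with the target threshold.
    if not is_abnormal(current, target_threshold):
        return False  # S105: running normally
    # S103: every other sample in the window must also be abnormal.
    return all(is_abnormal(v, target_threshold) for v in others_in_window)
```

With a threshold of 0.9, a sample of 0.95 followed by 0.93 and 0.97 is reported as a fault, while the same sample followed by 0.93 and 0.5 is treated as a transient spike.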
  • First, the target threshold is determined according to the target data set and the probability distribution characteristic function corresponding to each target data in the target data set, such as the Beta distribution function, where the target data represents the historical usage rate of the storage device of the target database within the historical preset duration. Then, whether the obtained target data to be measured is abnormal data is judged according to the target threshold. If the target data to be measured is determined to be abnormal data, it is further judged whether the other target data to be measured within the abnormal time window are all abnormal data.
  • If the judgment result is yes, it is determined that the target database is faulty, where the target data to be measured represents the current usage rate of the storage device.
  • The target threshold is determined based on the historical usage rate of the storage device and the corresponding probability distribution characteristic function, so that the target threshold better matches the actual operating conditions of the target database, which improves its rationality and accuracy.
  • In addition, the judgment of the abnormal time window is introduced, so that the fault determination process is strongly correlated with the periodicity of the actual operation of the target database, which further improves the rationality of fault discovery.
  • The fault discovery process provided by the embodiments of the present application does not require building an additional deep learning model to account for business diversity, so it is easy to implement and has a high utilization rate.
  • After it is determined that the target database is faulty, alarm information may be generated and sent to the control terminal and/or the client terminal to notify operation and maintenance personnel or users that the target database is faulty.
  • The specific form of the generated alarm information may be set according to factors such as the business type involved in the actual working conditions of the target database, which is not limited in this embodiment of the present application.
  • The control terminal can be, for example, the operation platform of the operation and maintenance personnel, and
  • the client terminal can be, for example, the user terminal of the target database.
  • The recipients notified of the fault of the target database include, but are not limited to, the control terminal and/or the client terminal. Neither the recipients nor the specific work content and authority of the control terminal and the client terminal are limited in the embodiments of the present application.
  • In this way, after the target database is determined to be faulty, alarm information is also generated and sent to the control terminal and/or the client terminal to notify the relevant personnel or platform that the target database is faulty, which improves the user experience.
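The alarm step can be illustrated as follows. The source does not specify the alarm format or the transport, so everything here is an assumption made for illustration: the `Alarm` fields, the message wording, and the idea of modelling the control terminal and client terminal as plain callables are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Alarm:
    """Hypothetical alarm payload; the patent leaves the format open."""
    database: str
    usage_rate: float
    threshold: float
    message: str


def build_alarm(database: str, usage_rate: float, threshold: float) -> Alarm:
    """Generate the alarm information once a fault has been determined."""
    message = (
        f"Database {database} is faulty: usage rate {usage_rate:.2f} "
        f"stayed above threshold {threshold:.2f} for the whole abnormal time window."
    )
    return Alarm(database, usage_rate, threshold, message)


def send_alarm(alarm: Alarm, recipients: Iterable[Callable[[Alarm], None]]) -> int:
    """Deliver the alarm to each recipient (control terminal and/or client terminal).

    Each recipient is a callable standing in for a real delivery channel.
    Returns the number of recipients notified.
    """
    count = 0
    for deliver in recipients:
        deliver(alarm)
        count += 1
    return count
```

In a real deployment the callables would wrap whatever notification channel the operation platform already uses; that choice is outside what the text describes.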
  • Both the target data used to determine the target threshold and the target data to be measured used in the abnormal data judgment have a value range of [0, 1].
  • In actual working conditions, however, the data that can be obtained is usually an integer between 0 and 100. Therefore, in a possible design, before step S1011, the database fault discovery method provided by the embodiment of the present application may also include the steps shown in FIG. 6.
  • FIG. 6 is a schematic flowchart of still another database fault discovery method provided by an embodiment of the present application. As shown in FIG. 6 , this embodiment includes:
  • S201 Acquire a plurality of historical data of the storage device within the historical preset duration according to a preset time period.
  • The historical preset duration may be the past week, the past month, or another historical duration.
  • For example, the usage rate of the storage device is obtained every minute.
  • The obtained usage-rate data is the historical data, and one minute is the preset time period, so a plurality of historical data within the historical preset duration can be obtained, each representing the historical usage state of the storage device.
  • S202 Mark each historical data according to the preset screening rules, and eliminate the historical data that do not meet the preset screening rules to obtain candidate data.
  • Each historical data is labeled according to the preset screening rules, for example by manual labeling, and the historical data that do not meet the preset screening rules are eliminated. Accordingly, the historical data that meet the preset screening rules are the candidate data.
  • The preset screening rule may be, for example, that the value is greater than or equal to 0 and less than or equal to 100. It may be set according to the usage state of the storage device during the operation of the target database, which is not limited in this embodiment of the present application.
  • S203 Perform percentage operation on each candidate data to obtain corresponding target data.
  • In the database fault discovery method provided by this embodiment, before the fluctuation coefficient of the target data set is determined according to the first preset algorithm, a plurality of historical data of the storage device within the historical preset duration are first acquired according to the preset time period.
  • Each historical data is then marked according to the preset screening rules, and the historical data that do not meet the preset screening rules are eliminated.
  • The historical data that meet the preset screening rules are determined as candidate data, and a percentage operation is performed on each candidate data to obtain the corresponding target data. In this way, the historical data obtained during the actual operation of the target database are preprocessed into target data with a value range of [0, 1], so that the target threshold can be determined based on the corresponding probability distribution characteristic function. This makes the determination of the target threshold more reasonable and helps improve its accuracy.
  • Similarly, before whether the target data to be measured is abnormal data is determined according to the target threshold, that is, before judging whether the target data to be measured is greater than the target threshold,
  • the corresponding steps in the embodiment shown in FIG. 6 may be used to preprocess each data to be measured to obtain the target data to be measured, which is then compared with the target threshold.
  • The data to be measured is first obtained according to the preset time period, that is, at the same time interval used to acquire the historical data, and is then screened according to the preset screening rules to obtain the corresponding target data to be measured.
  • The data to be measured can be understood as the current usage state of the storage device.
  • The specific implementation and technical effect of preprocessing the data to be measured to obtain the target data to be measured are similar to those of the relevant steps in the embodiment shown in FIG. 6, and are not repeated here.
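The preprocessing of steps S201 to S203 can be sketched as a filter followed by a scaling step. This is an illustrative sketch under two assumptions not fixed by the text: the screening rule is the example range 0 to 100, and the "percentage operation" means dividing by 100 so that values land in [0, 1]. The function names are hypothetical.

```python
from typing import Iterable, List, Optional


def passes_screening(raw: float, low: float = 0.0, high: float = 100.0) -> bool:
    """Example preset screening rule: keep readings within [low, high]."""
    return low <= raw <= high


def to_target_data(history: Iterable[float]) -> List[float]:
    """S202 + S203: drop readings that fail screening, then scale to [0, 1]."""
    return [raw / 100.0 for raw in history if passes_screening(raw)]


def to_target_data_to_be_measured(raw: float) -> Optional[float]:
    """Apply the same preprocessing to one current reading.

    Returns None when the reading fails the screening rule and should be
    discarded instead of being compared with the target threshold.
    """
    return raw / 100.0 if passes_screening(raw) else None
```

Applying the same rule to historical and current readings keeps the target threshold and the target data to be measured on the same [0, 1] scale.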
  • FIG. 7 is a schematic flowchart of determining an abnormal time window provided by an embodiment of the present application. As shown in FIG. 7, the method provided by this embodiment includes:
  • S301 Determine a plurality of similarities according to the target data set and the second preset algorithm.
  • Each similarity represents the degree of similarity between the target data corresponding to two adjacent unit durations, and the historical preset duration includes multiple unit durations.
  • The target data set is composed of a plurality of target data within the historical preset duration, and the historical preset duration includes multiple unit durations. Since the historical preset duration is a continuous span of physical time, it can be divided into multiple consecutive unit durations of equal length. For example, if the historical preset duration is the past week, that is, seven days, the unit duration can be each day of the week.
  • A plurality of similarities are determined according to the target data set and the second preset algorithm, each representing the degree of similarity between the target data corresponding to two adjacent unit durations. In the example above, the similarity between the target data of every two adjacent days in the seven days is determined according to the target data set and the second preset algorithm, which yields six similarities.
  • A possible implementation of determining multiple similarities according to the target data set and the second preset algorithm is:
  • The second preset algorithm is used to determine, in sequence, the similarity between the target data subsets of every two adjacent unit durations to obtain the multiple similarities, where a target data subset includes all target data within one unit duration.
  • The historical preset duration is divided into multiple consecutive unit durations of equal length. For example, if the historical preset duration is the seven days of a week, the unit durations are, in order, the first day to the seventh day of the week. The second preset algorithm then determines, in sequence, the similarity between the target data subsets of every two adjacent days: the first and second days, the second and third days, and so on up to the sixth and seventh days, which yields six similarities.
  • A target data subset includes all target data within one unit duration. That is, the target data set is divided into seven target data subsets, each of which includes all target data of one unit duration, namely one day.
  • The second preset algorithm may be a cosine similarity algorithm, as shown in the following formula (4):
  • $$S = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}} \tag{4}$$
  • For example, if the preset time period is one minute, the value of n in formula (4) can be 1440, that is, there are 1440 target data in a target data subset. The target data of one day in the historical preset duration can be represented by a 1440-dimensional vector A, and the target data of the adjacent day is likewise represented by a 1440-dimensional vector B. S represents the similarity between vector A and vector B, that is, the similarity between the target data of the two adjacent days.
  • Based on formula (4), six similarities can be determined for the target data in the target data set.
  • The specific length of the unit duration within the historical preset duration can be set according to the actual operation of the target database.
  • The above description, in which the unit duration is set to one day, is only an example and is not limiting.
  • The second preset algorithm may also use other operations with the same effect, including but not limited to the above-mentioned cosine similarity algorithm.
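Step S301 with cosine similarity as the second preset algorithm can be sketched as below. The helper names are hypothetical, and two details are assumptions rather than statements from the text: trailing samples that do not fill a complete unit duration are dropped, and the similarity of an all-zero subset is defined as 0.0 to avoid division by zero.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity S between two equal-length vectors A and B."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: an all-zero day is treated as dissimilar
    return dot / (norm_a * norm_b)


def split_into_units(target_data: Sequence[float], unit_size: int) -> List[List[float]]:
    """Divide the target data set into consecutive subsets of one unit duration."""
    usable = len(target_data) - len(target_data) % unit_size
    return [list(target_data[i:i + unit_size]) for i in range(0, usable, unit_size)]


def adjacent_similarities(target_data: Sequence[float], unit_size: int) -> List[float]:
    """Similarity between every two adjacent unit durations (S301)."""
    units = split_into_units(target_data, unit_size)
    return [cosine_similarity(units[i], units[i + 1]) for i in range(len(units) - 1)]
```

For one week of per-minute samples, `unit_size` would be 1440 and the function returns six similarities, matching the example in the text.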
  • S302 Perform an average value operation on the plurality of similarities determined in step S301, and determine the operation result as the target similarity.
  • S303 Determine the abnormal time window according to the third preset algorithm, the preset abnormal time window threshold and the target similarity.
  • the abnormal time window is determined according to the third preset algorithm, the preset abnormal time window threshold and the target similarity, so as to reflect the periodic characteristics of the actual operation of the target database through the abnormal time window.
  • For a target database with strong periodicity, abnormal data within a short period of time should already attract the attention of the control terminal, such as operation and maintenance personnel.
  • For a target database with weak periodicity, the target database can be considered faulty only when abnormal data persists for a period of time. Therefore, an abnormal time window is introduced into the judgment of whether the target database is faulty, to improve the stability of the judgment method.
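  • Possible implementations of step S303 include:
  • determining a candidate abnormal time window through the third preset algorithm and the target similarity;
  • if the value corresponding to the candidate abnormal time window is greater than the value corresponding to the preset abnormal time window threshold, determining that the candidate abnormal time window is the abnormal time window; and
  • if the value corresponding to the candidate abnormal time window is less than or equal to the value corresponding to the preset abnormal time window threshold, determining that the preset abnormal time window threshold is the abnormal time window.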
  • The conversion between the target similarity and the candidate abnormal time window is performed by the third preset algorithm, which can be expressed by the following formula (5):
  • $$\text{candidate abnormal time window} = 10 - \lfloor 10 \times \text{target similarity} \rfloor \tag{5}$$
  • That is, the target similarity is multiplied by 10, the integer part of the product is taken, and that integer is subtracted from 10; the result is the candidate abnormal time window. For example, if the target similarity is 0.7, the candidate abnormal time window is 3.
  • In actual working conditions, an empirical value is usually set according to long-term operation, namely the preset abnormal time window threshold. Therefore, to improve the stability of the database fault discovery method provided by the embodiment of the present application, after the candidate abnormal time window is determined according to the third preset algorithm and the target similarity, the value of the candidate abnormal time window is compared with the value of the preset abnormal time window threshold, and the abnormal time window is determined according to the comparison result.
  • If the value corresponding to the candidate abnormal time window is greater than the value corresponding to the preset abnormal time window threshold, the candidate abnormal time window is determined to be the abnormal time window.
  • If the value corresponding to the candidate abnormal time window is less than or equal to the value corresponding to the preset abnormal time window threshold, the preset abnormal time window threshold is determined to be the abnormal time window.
  • For example, the preset abnormal time window threshold is usually set to 3.
  • If the candidate abnormal time window is greater than 3, the candidate abnormal time window is determined to be the abnormal time window.
  • If the candidate abnormal time window is less than or equal to 3, the preset abnormal time window threshold is determined to be the abnormal time window. It can be understood that the value of the preset abnormal time window threshold is not limited to 3 and may be set according to the actual working conditions of the target database, which is not limited in this embodiment of the present application.
  • If the target data to be measured is determined to be abnormal data according to the target threshold, it is further determined whether the other target data to be measured that follow it within the abnormal time window, for example the next two, are abnormal data.
  • The third preset algorithm may also be another conversion formula; the above formula (5) is only illustrative and is not limiting.
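Steps S302 and S303 can be sketched together. The function names are hypothetical; the conversion follows the verbal description of formula (5) (multiply the target similarity by 10, take the integer part, subtract it from 10), and the default threshold of 3 is the example value given in the text.

```python
import math
from typing import Sequence


def mean_similarity(similarities: Sequence[float]) -> float:
    """S302: the target similarity is the average of all similarities."""
    if not similarities:
        raise ValueError("at least one similarity is required")
    return sum(similarities) / len(similarities)


def candidate_window(target_similarity: float) -> int:
    """Formula (5): 10 minus the integer part of 10 times the target similarity."""
    return 10 - math.floor(target_similarity * 10)


def abnormal_time_window(target_similarity: float, preset_window_threshold: int = 3) -> int:
    """S303: use the candidate window only when it exceeds the preset threshold."""
    candidate = candidate_window(target_similarity)
    if candidate > preset_window_threshold:
        return candidate
    return preset_window_threshold
```

A target similarity of 0.7 gives a candidate window of 3, which is not greater than the preset threshold, so the abnormal time window is 3. A less periodic database with a target similarity of 0.42 gets a longer window of 6.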
  • In the database fault discovery method provided by this embodiment, if the target data to be measured is determined to be abnormal data, it is further judged whether the other target data to be measured within the abnormal time window are abnormal data. Therefore, before this judgment step, a plurality of similarities can first be determined according to the target data set and the second preset algorithm, where each similarity represents the degree of similarity between the target data corresponding to two adjacent unit durations, and the historical preset duration includes multiple unit durations. Then, the average of all the similarities is taken as the target similarity, and the abnormal time window is determined according to the third preset algorithm, the preset abnormal time window threshold, and the target similarity. As a result, the fault determination process is strongly correlated with the periodicity of the actual operation of the target database, which further improves the rationality and stability of fault discovery.
  • FIG. 8 is a schematic structural diagram of a database fault discovery apparatus provided by an embodiment of the present application. As shown in FIG. 8, the database fault discovery apparatus 400 provided by this embodiment includes:
  • a first processing module 401, configured to determine the target threshold according to the target data set and the probability distribution characteristic function corresponding to each target data in the target data set,
  • where the target data represents the historical usage rate of the storage device of the target database within the historical preset duration;
  • a second processing module 402, configured to determine whether the other target data to be measured within the abnormal time window are all abnormal data if the target data to be measured is determined to be abnormal data according to the target threshold,
  • where the target data to be measured represents the current usage rate of the storage device; and
  • a third processing module 403, configured to determine that the target database is faulty if the judgment result is yes.
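  • In a possible design, the first processing module 401 is specifically configured to:
  • determine the fluctuation coefficient of the target data set according to the first preset algorithm, and determine the target probability threshold according to the fluctuation coefficient, where the fluctuation coefficient represents the fluctuation range of the historical usage rate;
  • determine, based on the preset inverse cumulative distribution function, the usage threshold corresponding to each target data according to the target probability threshold and the probability distribution characteristic function corresponding to each target data; and
  • obtain the target threshold by determining the average of all usage thresholds.
  • In a possible design, the first processing module 401 is also specifically configured to:
  • determine each distribution result according to each target data and the corresponding Beta distribution function, where each distribution result is the Beta distribution corresponding to each target data
  • and the probability distribution characteristic function includes the Beta distribution function; and
  • determine each usage threshold, based on the preset inverse cumulative distribution function, according to the target probability threshold and each distribution result.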
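The overall shape of the computation performed by the first processing module can be sketched as follows. This section does not give the first preset algorithm or the Beta parameters used for each target data, so the sketch deliberately leaves both out: the target probability threshold is passed in as a number, and the inverse cumulative distribution function is injected as a callable. The names `InverseCdf` and `target_threshold` are hypothetical.

```python
from typing import Callable, Sequence

# Hypothetical signature: (target data value, target probability threshold)
# -> usage threshold for that target data. A real implementation would
# evaluate the inverse CDF of the Beta distribution fitted for that value.
InverseCdf = Callable[[float, float], float]


def target_threshold(
    target_data: Sequence[float],
    target_probability_threshold: float,
    inverse_cdf: InverseCdf,
) -> float:
    """Average the per-sample usage thresholds to obtain the target threshold."""
    if not target_data:
        raise ValueError("the target data set must not be empty")
    usage_thresholds = [
        inverse_cdf(value, target_probability_threshold) for value in target_data
    ]
    return sum(usage_thresholds) / len(usage_thresholds)
```

One way to supply `inverse_cdf`, assuming SciPy is available, would be a wrapper around `scipy.stats.beta.ppf` with Beta parameters derived from each target data; how those parameters are derived is not stated in this section, so it is left to the caller.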
  • the second processing module 402 is further configured to:
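  • In a possible design, the third processing module 403 is further configured to:
  • determine that the target database is running normally if the target data to be measured is not abnormal data; or
  • determine that the target database is running normally if the other target data to be measured are not all abnormal data.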
  • FIG. 9 is a schematic structural diagram of another database fault discovery apparatus provided by an embodiment of the present application.
  • As shown in FIG. 9, the database fault discovery apparatus 400 provided by this embodiment further includes:
  • an acquisition module 404, configured to acquire a plurality of historical data of the storage device within the historical preset duration according to a preset time period;
  • a screening module 405, configured to mark each historical data according to the preset screening rules, for example manually, and eliminate the historical data that do not meet the preset screening rules to obtain candidate data; and
  • an operation module 406, configured to perform a percentage operation on each candidate data to obtain the corresponding target data.
  • In a possible design, the acquisition module 404 is further configured to obtain the data to be measured according to the preset time period, and
  • the screening module 405 is further configured to screen the data to be measured according to the preset screening rules to obtain the corresponding target data to be measured.
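  • In a possible design, the database fault discovery apparatus 400 provided by the embodiment of the present application further includes a fourth processing module.
  • The fourth processing module is configured to:
  • determine a plurality of similarities according to the target data set and the second preset algorithm, where each similarity represents the degree of similarity between the target data corresponding to two adjacent unit durations, and the historical preset duration includes multiple unit durations;
  • obtain the average of all similarities as the target similarity; and determine the abnormal time window according to the third preset algorithm, the preset abnormal time window threshold, and the target similarity.
  • In a possible design, the fourth processing module is also configured to:
  • determine, in sequence through the second preset algorithm, the similarity between the target data subsets of every two adjacent unit durations to obtain the multiple similarities, where a target data subset includes all target data within one unit duration.
  • In a possible design, the fourth processing module is also configured to determine a candidate abnormal time window through the third preset algorithm and the target similarity, and:
  • if the value corresponding to the candidate abnormal time window is greater than the value corresponding to the preset abnormal time window threshold, determine that the candidate abnormal time window is the abnormal time window;
  • if the value corresponding to the candidate abnormal time window is less than or equal to the value corresponding to the preset abnormal time window threshold, determine that the preset abnormal time window threshold is the abnormal time window.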
  • In a possible design, the database fault discovery apparatus 400 provided by the embodiment of the present application further includes:
  • a generation module, configured to generate alarm information; and
  • a sending module, configured to send the alarm information to the control terminal and/or the client terminal to indicate that the target database is faulty.
  • The division into modules is only a division by logical function, and other divisions are possible in actual implementation.
  • For example, multiple modules can be combined or integrated into another system.
  • The coupling between the modules may be achieved through interfaces, which are usually electrical communication interfaces but may also be mechanical or other forms of interfaces. Therefore, modules described as separate components may or may not be physically separate, and may be located in one place or distributed across different locations on the same or different devices.
  • The database fault discovery apparatus provided by the above embodiments can be used to execute the corresponding steps of the database fault discovery method provided by the above embodiments. The specific implementation and technical effects are similar and are not repeated here.
  • FIG. 10 is a schematic structural diagram of an electronic device provided by an embodiment of the present application. As shown in FIG. 10, the electronic device 500 provided by this embodiment includes:
  • at least one processor 501; and
  • a memory 502 communicatively connected to the at least one processor 501, where the memory 502 stores instructions executable by the at least one processor 501, and the instructions are executed by the at least one processor 501 so that the at least one processor 501 can execute the steps of the database fault discovery method in the above method embodiments.
  • The memory 502 may be independent of or integrated with the processor 501.
  • The electronic device 500 may further include:
  • a bus 503, used to connect the processor 501 and the memory 502.
  • The embodiments of the present application further provide a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions are used to cause a computer to execute each step of the database fault discovery method in the foregoing embodiments.
  • the readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, and the like.

Abstract

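The present application provides a database fault discovery method and apparatus, an electronic device, and a storage medium. A target threshold is determined according to a target data set and the probability distribution characteristic function corresponding to each target data in the target data set, where the target data represents the historical usage rate of the storage device of a target database within a historical preset duration. If target data to be measured is determined to be abnormal data according to the target threshold, it is judged whether the other target data to be measured within an abnormal time window are all abnormal data; if so, the target database is faulty. Determining the target threshold based on each historical usage rate and its corresponding probability distribution characteristic function effectively improves the rationality and accuracy of the target threshold. Introducing the abnormal time window strongly correlates the fault discovery process with the periodicity of the actual operation of the target database, which improves the rationality of fault discovery. There is no need to build corresponding models to account for business diversity, so the method is easy to implement and has a high utilization rate.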

Description

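Database fault discovery method and apparatus, electronic device, and storage medium
This application claims priority to Chinese patent application No. 202011058803.4, filed with the Chinese Patent Office on September 30, 2020 and entitled "Database fault discovery method and apparatus, electronic device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of financial technology (Fintech), and in particular to a database fault discovery method and apparatus, an electronic device, and a storage medium.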
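Background
With the rapid development of computer and Internet technologies, financial technology (Fintech), as a product of the deep integration of finance and technology, is becoming a hotspot of innovation in the financial industry. Because the financial industry places high demands on security and real-time performance, the application systems used in the industry must meet higher requirements as well. For example, for a database used in data transactions, in order to provide a good operating environment for the transactions, it is often necessary to monitor whether the usage of the database's storage device is faulty, for example by monitoring the usage of the memory (CPU) and/or the hard disk input/output (I/O) interface, so as to determine whether faults such as excessive usage occur while the database is running.
At present, whether a database has an abnormal fault is usually determined based on a usage threshold. One approach sets a fixed threshold according to the experience of operation and maintenance personnel; once the usage of the database exceeds the fixed threshold, the database is determined to be faulty. However, this fixed-threshold approach has many problems. For example, the threshold must be continually adjusted according to actual conditions while the database is running, which creates a heavy maintenance workload and still lacks rationality. If the fixed threshold is set too high, sudden increases during normal operation may go undetected, creating a risk of missed alarms. If it is set too low, fault monitoring becomes overly sensitive and may cause a large number of false alarms. As another example, as the business grows, the load on the database increases; a fixed-threshold solution does not take the actual operating trend of the database into account and may have unknown effects if set improperly.
Another solution uses a deep learning model to determine a prediction line and obtains a safety interval around the prediction line according to a Gaussian distribution; once the usage exceeds the safety interval, the database is considered faulty. However, because of business diversity, a single business may need thousands of databases to serve it. If a corresponding deep learning model is generated for every database, the excessive memory occupied by the models and the low detection rate create technical problems: the approach cannot be used at large scale and its utilization rate is low.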
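Summary
The present application provides a database fault discovery method and apparatus, an electronic device, and a storage medium, to solve the technical problems that existing database fault discovery solutions lack rationality, cannot be used at large scale, and have a low utilization rate.
In a first aspect, the present application provides a database fault discovery method, including:
determining a target threshold according to a target data set and the probability distribution characteristic function corresponding to each target data in the target data set, where the target data represents the historical usage rate of the storage device of a target database within a historical preset duration;
if target data to be measured is determined to be abnormal data according to the target threshold, judging whether the other target data to be measured within an abnormal time window are all abnormal data, where the target data to be measured represents the current usage rate of the storage device; and
if the judgment result is yes, determining that the target database is faulty.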
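In a possible design, the determining a target threshold according to a target data set and the probability distribution characteristic function corresponding to each target data in the target data set includes:
determining a fluctuation coefficient of the target data set according to a first preset algorithm, and determining a target probability threshold according to the fluctuation coefficient, where the fluctuation coefficient represents the fluctuation range of the historical usage rate;
determining, based on a preset inverse cumulative distribution function, the usage threshold corresponding to each target data according to the target probability threshold and the probability distribution characteristic function corresponding to each target data; and
obtaining the target threshold by determining the average of all usage thresholds.
In a possible design, the determining, based on a preset inverse cumulative distribution function, the usage threshold corresponding to each target data according to the target probability threshold and the probability distribution characteristic function corresponding to each target data includes:
determining each distribution result according to each target data and the corresponding Beta distribution function, where each distribution result is the Beta distribution corresponding to each target data, and the probability distribution characteristic function includes the Beta distribution function; and
determining each usage threshold, based on the preset inverse cumulative distribution function, according to the target probability threshold and each distribution result.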
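In a possible design, before the determining a fluctuation coefficient of the target data set according to a first preset algorithm, the method further includes:
acquiring a plurality of historical data of the storage device within the historical preset duration according to a preset time period;
marking each historical data according to a preset screening rule and eliminating the historical data that do not meet the preset screening rule, to obtain candidate data; and
performing a percentage operation on each candidate data to obtain the corresponding target data.
In a possible design, before the target data to be measured is determined to be abnormal data according to the target threshold, the method further includes:
obtaining data to be measured according to the preset time period; and
screening the data to be measured according to the preset screening rule to obtain the corresponding target data to be measured.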
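In a possible design, before the judging whether the other target data to be measured within an abnormal time window are all abnormal data, the method further includes:
determining a plurality of similarities according to the target data set and a second preset algorithm, where each similarity represents the degree of similarity between the target data corresponding to two adjacent unit durations, and the historical preset duration includes multiple unit durations;
obtaining the average of all similarities as a target similarity; and
determining the abnormal time window according to a third preset algorithm, a preset abnormal time window threshold, and the target similarity.
In a possible design, the determining a plurality of similarities according to the target data set and a second preset algorithm includes:
determining, in sequence through the second preset algorithm, the similarity between the target data subsets of every two adjacent unit durations to obtain the plurality of similarities, where a target data subset includes all target data within one unit duration.
In a possible design, the determining the abnormal time window according to a third preset algorithm, a preset abnormal time window threshold, and the target similarity includes:
determining a candidate abnormal time window through the third preset algorithm and the target similarity;
if the value corresponding to the candidate abnormal time window is greater than the value corresponding to the preset abnormal time window threshold, determining that the candidate abnormal time window is the abnormal time window; and
if the value corresponding to the candidate abnormal time window is less than or equal to the value corresponding to the preset abnormal time window threshold, determining that the preset abnormal time window threshold is the abnormal time window.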
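In a possible design, if the target data to be measured is not abnormal data, it is determined that the target database is running normally; or
if the other target data to be measured are not all abnormal data, it is determined that the target database is running normally.
In a possible design, after the determining that the target database is faulty, the method further includes:
generating alarm information; and
sending the alarm information to a control terminal and/or a client terminal to indicate that the target database is faulty.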
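In a second aspect, the present application provides a database fault discovery apparatus, including:
a first processing module, configured to determine a target threshold according to a target data set and the probability distribution characteristic function corresponding to each target data in the target data set, where the target data represents the historical usage rate of the storage device of a target database within a historical preset duration;
a second processing module, configured to judge, if target data to be measured is determined to be abnormal data according to the target threshold, whether the other target data to be measured within an abnormal time window are all abnormal data, where the target data to be measured represents the current usage rate of the storage device; and
a third processing module, configured to determine that the target database is faulty if the judgment result is yes.
In a possible design, the first processing module is specifically configured to:
determine a fluctuation coefficient of the target data set according to a first preset algorithm, and determine a target probability threshold according to the fluctuation coefficient, where the fluctuation coefficient represents the fluctuation range of the historical usage rate;
determine, based on a preset inverse cumulative distribution function, the usage threshold corresponding to each target data according to the target probability threshold and the probability distribution characteristic function corresponding to each target data; and
obtain the target threshold by determining the average of all usage thresholds.
In a possible design, the first processing module is also specifically configured to:
determine each distribution result according to each target data and the corresponding Beta distribution function, where each distribution result is the Beta distribution corresponding to each target data, and the probability distribution characteristic function includes the Beta distribution function; and
determine each usage threshold, based on the preset inverse cumulative distribution function, according to the target probability threshold and each distribution result.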
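In a possible design, the database fault discovery apparatus further includes:
an acquisition module, configured to acquire a plurality of historical data of the storage device within the historical preset duration according to a preset time period;
a screening module, configured to mark each historical data according to a preset screening rule and eliminate the historical data that do not meet the preset screening rule, to obtain candidate data; and
an operation module, configured to perform a percentage operation on each candidate data to obtain the corresponding target data.
In a possible design, the acquisition module is further configured to obtain data to be measured according to the preset time period; and
the screening module is further configured to screen the data to be measured according to the preset screening rule to obtain the corresponding target data to be measured.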
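In a possible design, the database fault discovery apparatus further includes a fourth processing module, and the fourth processing module is configured to:
determine a plurality of similarities according to the target data set and a second preset algorithm, where each similarity represents the degree of similarity between the target data corresponding to two adjacent unit durations, and the historical preset duration includes multiple unit durations;
obtain the average of all similarities as a target similarity; and
determine the abnormal time window according to a third preset algorithm, a preset abnormal time window threshold, and the target similarity.
In a possible design, the fourth processing module is further configured to:
determine, in sequence through the second preset algorithm, the similarity between the target data subsets of every two adjacent unit durations to obtain the plurality of similarities, where a target data subset includes all target data within one unit duration.
In a possible design, the fourth processing module is further configured to:
determine a candidate abnormal time window through the third preset algorithm and the target similarity;
if the value corresponding to the candidate abnormal time window is greater than the value corresponding to the preset abnormal time window threshold, determine that the candidate abnormal time window is the abnormal time window; and
if the value corresponding to the candidate abnormal time window is less than or equal to the value corresponding to the preset abnormal time window threshold, determine that the preset abnormal time window threshold is the abnormal time window.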
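In a possible design, the third processing module is further configured to:
determine that the target database is running normally if the target data to be measured is not abnormal data; or
determine that the target database is running normally if the other target data to be measured are not all abnormal data.
In a possible design, the database fault discovery apparatus further includes:
a generation module, configured to generate alarm information; and
a sending module, configured to send the alarm information to a control terminal and/or a client terminal to indicate that the target database is faulty.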
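In a third aspect, the present application provides an electronic device, including:
at least one processor; and
a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the database fault discovery method according to the first aspect or any optional solution of the first aspect.
In a fourth aspect, the present application provides a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions are used to cause a computer to execute the database fault discovery method according to the first aspect or any optional solution of the first aspect.
The present application provides a database fault discovery method and apparatus, an electronic device, and a storage medium. First, a target threshold is determined according to a target data set and the probability distribution characteristic function corresponding to each target data in the target data set, where the target data represents the historical usage rate of the storage device of a target database within a historical preset duration. If target data to be measured is determined to be abnormal data according to the target threshold, it is further judged whether the other target data to be measured within an abnormal time window are all abnormal data; if the judgment result is yes, the target database is determined to be faulty, where the target data to be measured represents the current usage rate of the storage device. Because the target threshold is determined based on the historical usage rate of the storage device of the target database and the corresponding probability distribution characteristic function, it better matches the actual operating conditions of the target database, which improves its rationality and accuracy. In addition, the judgment of the abnormal time window is introduced, so that the fault determination process is strongly correlated with the periodicity of the actual operation of the target database, which further improves the rationality of fault discovery. Furthermore, the fault discovery process does not require building an additional deep learning model to account for business diversity, so it is easy to implement and has a high utilization rate.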
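Brief Description of the Drawings
FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of a database fault discovery method provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of another database fault discovery method provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a usage rate provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of another usage rate provided by an embodiment of the present application;
FIG. 6 is a schematic flowchart of still another database fault discovery method provided by an embodiment of the present application;
FIG. 7 is a schematic flowchart of determining an abnormal time window provided by an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a database fault discovery apparatus provided by an embodiment of the present application;
FIG. 9 is a schematic structural diagram of another database fault discovery apparatus provided by an embodiment of the present application;
FIG. 10 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.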
具体实施方式
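Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application. Rather, they are merely examples of methods and apparatuses consistent with some aspects of the present application as detailed in the appended claims.
The terms "first", "second", "third", "fourth", and the like (if any) in the specification, the claims, and the above drawings of the present application are used to distinguish similar objects and do not necessarily describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can, for example, be implemented in an order other than those illustrated or described. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.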
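The financial industry places high demands on security and real-time performance, and correspondingly higher demands on the application systems that serve it. Take a database used in data transactions as an example: to provide a good operating environment for transactions, the usage of the database's storage device usually has to be monitored for faults, for example by monitoring the usage of the central processing unit (CPU) and/or the input/output (I/O) interface of the hard disk, so as to determine whether the database is experiencing abnormal faults such as excessive usage. In the prior art, whether a database has an abnormal fault can be determined by setting a usage threshold. One approach sets a fixed threshold based on experience; once usage exceeds it, the database is deemed faulty. Fixed thresholds have several problems, however. The threshold has to be adjusted continually during operation, which makes later maintenance laborious and the threshold hard to justify. If the threshold is set too high, some sudden surges in usage may go undetected, creating a risk of missed alarms; if it is set too low, monitoring becomes overly sensitive and produces many false alarms. Moreover, as the business grows, the load on the database increases, and a fixed threshold ignores the database's actual operating trend, which may have unforeseen effects. Another approach uses a deep learning model to determine a prediction line and places a safety interval around it based on a Gaussian distribution; once usage leaves the safety interval, the database is deemed faulty. However, because services are diverse, a single service often needs thousands of databases behind it. If a deep learning model were generated for every database, the large memory footprint dedicated to the models and the low detection rate, among other reasons, would make the approach impossible to deploy at scale and poorly utilized.
The prior art thus fails to tie database fault discovery to the actual operation of the database, so both the thresholds and the way they are determined are hard to justify. In addition, solutions based on deep learning models cannot be deployed at scale and have a low utilization rate.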
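To address these problems in the prior art, the present application provides a database fault discovery method and apparatus, an electronic device, and a storage medium. In the method, a target threshold is first determined according to a target data set and the probability distribution characteristic function corresponding to each target data item in the set, where the target data represents the historical usage rate of the storage device of the target database within a preset historical duration. The target threshold is used to identify abnormal data in the subsequent fault discovery process. Because it is determined from each target data item and its corresponding probability distribution characteristic function, the target threshold matches the actual operating conditions of the target database, which makes it more reasonable and more precise. Next, if target data under test is determined to be abnormal data based on the target threshold, it is further determined whether all other target data under test within the abnormal time window are abnormal data, and only if so is the target database determined to have failed. Introducing the abnormal time window ties fault determination closely to the periodicity of the database's actual operating conditions, further improving the soundness of fault discovery. In addition, the fault discovery process does not need to build additional deep learning models to account for service diversity, so it is readily implementable and achieves a high utilization rate.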
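Exemplary application scenarios of the embodiments of the present application are introduced below.
The database fault discovery method provided by the embodiments may be performed by the database fault discovery apparatus provided by the embodiments, which may be a server or a server cluster. FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application. As shown in FIG. 1, a network serves as the medium that provides a communication link between server 11 and server 12. The network may include various connection types, such as wired links, wireless communication links, or fiber-optic cables. Server 11 and server 12 can interact over the network to receive or send messages. One of the two servers is the database server of the target database, and the other is the electronic device corresponding to the database fault discovery apparatus; for example, if server 11 is the database server of the target database, server 12 is the electronic device that performs the database fault discovery method. The two servers exchange information over the network so that the target database on server 11 can be monitored for faults.
It should be understood that, depending on the actual operating conditions of the target database, its database server, that is, server 11 in FIG. 1, may be a server cluster; the embodiments of the present application do not limit this. FIG. 1 shows only server 11 as an example.
It should be noted that the above application scenario is merely illustrative; the database fault discovery method and apparatus, electronic device, and storage medium provided by the embodiments include but are not limited to it.
The technical solutions of the present application, and how they solve the above technical problems, are described in detail below through specific embodiments. These specific embodiments may be combined with one another, and the same or similar concepts or processes may not be repeated in some embodiments. The embodiments are described below with reference to the drawings.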
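FIG. 2 is a schematic flowchart of a database fault discovery method provided by an embodiment of the present application. As shown in FIG. 2, the method provided by this embodiment includes:
S101: Determine a target threshold according to a target data set and the probability distribution characteristic function corresponding to each target data item in the target data set.
The target data represents the historical usage rate of the storage device of the target database within a preset historical duration.
All target data within the preset historical duration form the target data set. In other words, each target data item in the set is a usage rate actually produced by the storage device of the target database under real operating conditions within the preset historical duration. The preset historical duration may be any continuous period in the past, such as the past week or the past month; with a duration of one week, each target data item in the set is one historical usage rate produced by the storage device during the past week. Note that "each" here does not mean that the historical usage rate exists as discrete items while the database runs; rather, the database fault discovery apparatus samples the historical usage rate at a certain time period, and every sampling operation yields one data item.
The probability distribution characteristic function corresponding to each target data item may be a characteristic function that represents the probability of the event corresponding to that item, for example the Beta (β) distribution function. The Beta distribution is the conjugate prior of the Bernoulli and binomial distributions, which makes it naturally suited to describing the usage rate of the storage device in the target database, such as the usage rate of the CPU and/or the I/O interface. Moreover, each target data item representing the historical usage rate takes values in the interval [0, 1], which matches the domain of the Beta distribution function. The Beta distribution function corresponding to each target data item can therefore be used as its probability distribution characteristic function, making the fitting of the target data in the set more reasonable.
In addition, when determining the target threshold from probability distribution characteristic functions, the embodiments use the characteristic function corresponding to each individual target data item, rather than fitting a single characteristic function to all historical data as in the prior art. The fitting of the target data is therefore more reasonable and the fitted result smoother, which brings the determination of the target threshold closer to the actual operating conditions of the target database and helps improve the precision of the target threshold.
It should be noted that the probability distribution characteristic function provided by the embodiments includes but is not limited to the Beta distribution function; other characteristic functions that achieve the same effect may also be used, and the embodiments of the present application do not limit this.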
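In one possible design, step S101 may be implemented as shown in FIG. 3, which is a schematic flowchart of another database fault discovery method provided by an embodiment of the present application. As shown in FIG. 3, in the database fault discovery method of this embodiment, determining the target threshold according to the target data set and the probability distribution characteristic function corresponding to each target data item may include:
S1011: Determine the fluctuation coefficient of the target data set according to a first preset algorithm, and determine a target probability threshold according to the fluctuation coefficient.
The fluctuation coefficient represents the fluctuation amplitude of the historical usage rate.
The fluctuation coefficient measures how much the historical usage rate fluctuates: the larger the fluctuation, the more likely the target database is to experience high usage. The first preset algorithm determines the fluctuation coefficient of the target data set, that is, the fluctuation amplitude of the target database within the preset historical duration. It may be a formula for the fluctuation coefficient, as shown in formula (1):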
[Formula (1): published as an image (Figure PCTCN2021119583-appb-000001); not reproduced in this text.]
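where ω denotes the fluctuation coefficient, X_max the largest target data value in the target data set, X_min the smallest target data value in the set, and X_mean the average of all target data values in the set.
Formula (1) thus yields the fluctuation coefficient of the target data set, reflecting how much the target data fluctuated within the preset historical duration.
FIG. 4 and FIG. 5 are schematic diagrams of usage rates provided by embodiments of the present application: FIG. 4 shows a usage rate with a small fluctuation coefficient and FIG. 5 one with a large fluctuation coefficient. As FIG. 4 and FIG. 5 show, data with different fluctuation coefficients call for different thresholds. To better reflect the actual operating state of the target database, the method therefore goes on, after determining the fluctuation coefficient from the target data set with the first preset algorithm, to determine the corresponding target probability threshold from the fluctuation coefficient, for example with formula (2):
T_c = 1 - exp(-ω)        (2)
where T_c denotes the target probability threshold, ω is the fluctuation coefficient determined by formula (1), and exp denotes the exponential function with base e.
Formula (2) thus maps the fluctuation coefficient of the target data set to its target probability threshold, so that the subsequent target threshold is determined from the characteristics of the target data themselves and reflects the actual operating conditions of the target database.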
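Step S1011 can be illustrated with a minimal sketch. Formula (1) is available only as an image, so the sketch assumes a range-over-mean form, ω = (X_max - X_min) / X_mean; this uses exactly the three quantities the text defines but is not confirmed by it. The function names and the sample data are illustrative only.

```python
import math


def fluctuation_coefficient(target_data):
    """ASSUMED form of formula (1): (X_max - X_min) / X_mean."""
    x_mean = sum(target_data) / len(target_data)
    if x_mean == 0:
        return 0.0  # assumption: an all-zero history has no fluctuation
    return (max(target_data) - min(target_data)) / x_mean


def target_probability_threshold(omega):
    """Formula (2): T_c = 1 - exp(-omega)."""
    return 1.0 - math.exp(-omega)


steady = [0.30, 0.31, 0.29, 0.30]  # small fluctuation, in the spirit of FIG. 4
bursty = [0.10, 0.60, 0.15, 0.55]  # large fluctuation, in the spirit of FIG. 5
t_steady = target_probability_threshold(fluctuation_coefficient(steady))
t_bursty = target_probability_threshold(fluctuation_coefficient(bursty))
print(t_steady, t_bursty)
```

Under this assumption a more volatile history yields a target probability threshold closer to 1, while a flat history yields one close to 0.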
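S1012: Based on a preset inverse cumulative distribution function, determine the usage threshold corresponding to each target data item according to the target probability threshold and the probability distribution characteristic function corresponding to that item.
After the target probability threshold of the target data set has been determined, the usage threshold corresponding to each target data item is determined, based on the preset inverse cumulative distribution function, from the target probability threshold and the probability distribution characteristic function corresponding to that item.
In one possible design, step S1012 may be implemented as follows:
determine a distribution result for each target data item according to the item and its corresponding Beta distribution function, where each distribution result is the Beta distribution corresponding to that item, and the probability distribution characteristic function includes the Beta distribution function; and
determine each usage threshold, based on the preset inverse cumulative distribution function, according to the target probability threshold and each distribution result, where each usage threshold is the usage threshold corresponding to one target data item.
As described in step S101, the probability distribution characteristic function corresponding to each target data item may be the Beta distribution function. Applying the corresponding Beta distribution function to each target data item yields a distribution result, which is the Beta distribution of that item and can be expressed by formula (3):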
f(X; α, β) = X^(α-1) * (1-X)^(β-1) / B(α, β)        (3)
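where α and β denote the fitting parameters of the Beta distribution function, B denotes the Beta function, and f denotes the distribution result determined from target data item X and its corresponding Beta distribution function, that is, the Beta distribution corresponding to X.
Substituting each target data item and its corresponding Beta distribution function into formula (3) yields one distribution result per item, which is the Beta distribution corresponding to that item. The number of Beta distributions obtained equals the number of target data items in the target data set.
Next, the Beta distribution corresponding to each target data item in the set, that is, each distribution result obtained above, is combined with the target probability threshold through the preset inverse cumulative distribution function. The result of this operation is the usage threshold, so every target data item in the set obtains its own usage threshold, and the number of usage thresholds equals the number of target data items in the set.
It should be understood that the preset inverse cumulative distribution function provided by the embodiments is the function that corresponds to the Beta distribution function of the Beta distribution. Once the parameters α and β of the Beta distribution function in formula (3) are determined, a software package can be called and run with the target probability threshold and each distribution result, that is, the Beta distribution corresponding to each target data item, as input; the output of the preset inverse cumulative distribution function is the usage threshold corresponding to that item. The embodiments of the present application do not limit which software package implements the inverse cumulative distribution function.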
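S1013: Obtain the target threshold by determining the average of all usage thresholds.
After the usage threshold corresponding to each target data item in the target data set has been determined, the average of all usage thresholds is computed and taken as the target threshold, which is used to decide whether the target database has failed. The target threshold can be stored for use in subsequent steps.
The target threshold provided by the embodiments is thus obtained by first determining a usage threshold from the probability distribution characteristic function of each target data item in the set and then averaging all usage thresholds. Compared with determining the target threshold directly from a single set of parameters, the target threshold determined here is more precise and detects database faults more accurately.
As the above description shows, when the database fault discovery method determines the target threshold used to identify abnormal data, it works from the probability distribution characteristic function of each target data item, obtains as many usage thresholds as there are target data items, and takes their mean as the target threshold. The target threshold is therefore derived from many groups of data rather than from a single group as in the prior art, which makes it smoother and reduces the influence of outliers.
Take the Beta distribution function as the probability distribution characteristic function. Each target data item yields a corresponding Beta distribution with its own fitting parameters. If the target data set contains N target data items, N Beta distributions and N pairs of fitting parameters α and β are obtained; applying the preset inverse cumulative distribution function N times to the target probability threshold and the N distribution results gives N usage thresholds, whose mean is the target threshold. In the prior art, by contrast, a single Beta distribution function is usually applied to all the data, giving one pair of fitting parameters α and β and a threshold based on that single result. The fitting used in the present application therefore follows the actual operating conditions of the target database more closely and gives a smoother result, which helps improve the precision of the target threshold.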
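Steps S1012 and S1013 can be sketched with the standard library alone. The text does not say how the fitting parameters α and β of each target data item are obtained, so the sketch uses a purely hypothetical rule, α = 1 + κx and β = 1 + κ(1 - x), which places the mode of the Beta distribution at the data value x; κ and all function names are assumptions made for illustration. The numeric inverse CDF (Simpson integration plus bisection) stands in for the unspecified software package; a library routine such as SciPy's `scipy.stats.beta.ppf` would serve the same purpose.

```python
import math


def beta_pdf(x, a, b):
    """Formula (3): x^(a-1) * (1-x)^(b-1) / B(a, b); this sketch needs a, b >= 1."""
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)  # log B(a, b)
    return x ** (a - 1) * (1 - x) ** (b - 1) / math.exp(log_beta)


def beta_cdf(x, a, b, steps=400):
    """Cumulative distribution via composite Simpson integration over [0, x]."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    h = x / steps  # steps must be even for Simpson's rule
    total = beta_pdf(0.0, a, b) + beta_pdf(x, a, b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * beta_pdf(i * h, a, b)
    return total * h / 3


def beta_inverse_cdf(q, a, b, tol=1e-9):
    """Inverse cumulative distribution function, found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def target_threshold(target_data, probability_threshold, kappa=20.0):
    """Step S1013: average of the per-item usage thresholds from step S1012."""
    thresholds = []
    for x in target_data:
        # HYPOTHETICAL per-item fit (not from the text): mode of the Beta at x.
        a, b = 1.0 + kappa * x, 1.0 + kappa * (1.0 - x)
        thresholds.append(beta_inverse_cdf(probability_threshold, a, b))
    return sum(thresholds) / len(thresholds)


history = [0.20, 0.25, 0.30, 0.22, 0.28]  # usage rates already scaled to [0, 1]
print(target_threshold(history, probability_threshold=0.9))
```

Averaging N per-item quantiles, rather than taking one quantile of a single fitted distribution, is the design choice the text credits with a smoother threshold.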
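S102: Determine whether the target data under test is greater than the target threshold.
The target data under test represents the current usage rate of the storage device.
After the target threshold has been determined, the target data under test is compared with it to decide whether the target data under test is abnormal data. In other words, once the target threshold is known, the current usage rate of the storage device of the target database is acquired, and the relationship between the target data under test and the target threshold determines whether the target data under test is abnormal.
Comparing the target data under test with the target threshold gives a determination result. If the result is yes, that is, the target data under test is greater than the target threshold, the current target data under test is determined to be abnormal data and step S103 is performed. Otherwise, if the result is no, that is, the target data under test is not greater than the target threshold, the current target data under test is determined not to be abnormal data and step S105 is performed. In this step, one target data under test is compared with the target threshold at a time: every time a target data under test is acquired, it is compared with the target threshold to decide whether it is abnormal data.
It should be noted that the criterion for deciding whether target data under test is abnormal includes the comparison against the target threshold described above but is not limited to it; a criterion can be set according to the actual operation of the target database, and the embodiments of the present application do not limit this.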
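S103: If the target data under test is determined to be abnormal data according to the target threshold, determine whether all other target data under test within the abnormal time window are abnormal data.
After the target data under test has been determined to be abnormal according to the target threshold, it must further be determined whether all other target data under test within the abnormal time window are abnormal data. The abnormal time window represents the periodic character of the target database's operation; in other words, it reflects the similarity of the target data. The abnormal time window is introduced to take the actual operating conditions of the target database into account and so improve the stability of the database fault discovery method. In practice, a single target data value may surge during operation, but the target database should not be judged faulty on the basis of one data point alone; monitoring should continue to see whether the other target data under test within the abnormal time window are all abnormal.
In one possible design, whether all other target data under test within the abnormal time window are abnormal data may be determined in the same way as in the preceding embodiment, that is, by determining whether all other target data under test are greater than the target threshold. If so, the other target data under test within the abnormal time window are all determined to be abnormal data; if not, that is, one or more of the other target data under test within the window are not greater than the target threshold, the other target data under test within the window are determined not to be all abnormal data.
Thus, after an acquired target data under test has been checked, only if it is abnormal data is it further determined whether all other target data under test within the abnormal time window are also abnormal. If the result is yes, step S104 is performed; if the result is no, step S105 is performed.
It should be noted that the criterion used for the other target data under test within the abnormal time window may or may not be the same as that in step S102; it can be set according to the actual operating conditions of the target database, and the embodiments of the present application do not limit this.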
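S104: Determine that the target database has failed.
S105: Determine that the target database is operating normally.
If the target data under test is determined to be abnormal data according to the target threshold, and all other target data under test within the abnormal time window are then also found to be abnormal data, the target database is determined to have failed.
On the other hand, if the target data under test is determined not to be abnormal data according to the target threshold, or if it is abnormal but the other target data under test within the abnormal time window turn out not to be all abnormal, the target database is determined to be operating normally, with no fault.
In summary, the database fault discovery method provided by the embodiments determines the target threshold, uses it to decide whether the target data under test is abnormal data and, if it is, further determines whether all other target data under test within the abnormal time window are abnormal data; when they are, the target database is determined to have failed.
In the database fault discovery method provided by the embodiments, a target threshold is first determined according to the target data set and the probability distribution characteristic function, for example the Beta distribution function, corresponding to each target data item in the set, where the target data represents the historical usage rate of the storage device of the target database within a preset historical duration. Each acquired target data under test, which represents the current usage rate of the storage device, is then checked against the target threshold. If it is determined to be abnormal data, it is further determined whether all other target data under test within the abnormal time window are abnormal data, and if they are, the target database is determined to have failed. Determining the target threshold from the historical usage rate of the storage device and the corresponding probability distribution characteristic functions makes the threshold better match the actual operating conditions of the target database and improves its soundness and precision. Introducing the abnormal time window after an acquired target data under test has been found abnormal ties fault determination closely to the periodicity of the database's actual operation, further improving the soundness of fault discovery. In addition, the fault discovery process does not need to build additional deep learning models to account for service diversity, so it is readily implementable and achieves a high utilization rate.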
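The decision flow of steps S102 to S105 can be sketched as follows. The function names are illustrative, and the treatment of a window that is not yet full (report no fault until enough readings exist) is an assumption of the sketch, not something the text specifies. The window size counts the first abnormal reading itself, so `window - 1` later readings are checked.

```python
def is_abnormal(value, target_threshold):
    """Step S102: a reading is abnormal when it exceeds the target threshold."""
    return value > target_threshold


def database_has_failed(readings, target_threshold, window):
    """Steps S102-S105 applied to readings[0] and the readings that follow it.

    `readings` holds the current target data under test followed by later ones;
    `window` is the abnormal time window, counting the current reading.
    """
    if not readings or not is_abnormal(readings[0], target_threshold):
        return False  # S105: operating normally
    others = readings[1:window]
    if len(others) < window - 1:
        return False  # ASSUMPTION: too few readings so far to confirm a fault
    # S104 if every other reading in the window is abnormal, otherwise S105.
    return all(is_abnormal(v, target_threshold) for v in others)


print(database_has_failed([0.90, 0.95, 0.92], target_threshold=0.8, window=3))
print(database_has_failed([0.90, 0.50, 0.92], target_threshold=0.8, window=3))
```

A single spike followed by a normal reading therefore does not raise a fault, which is the stabilising effect the text attributes to the abnormal time window.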
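In one possible design, after it is determined that the target database has failed, the method further includes:
generating alarm information; and
sending the alarm information to a control terminal and/or a client to indicate that the target database has failed.
In the fault discovery method provided by the embodiments, after step S104 determines that the target database has failed, alarm information may be generated and sent to the control terminal and/or the client to notify operations staff or users of the fault. The specific form of the alarm information can be set according to factors such as the type of service the target database handles, and the embodiments of the present application do not limit it. The control terminal may be, for example, the operations staff's operating platform, and the client may be, for example, the user side of the target database. This embodiment is merely illustrative: the recipients of the fault notification include but are not limited to the control terminal and/or the client, and the embodiments do not limit the specific duties and permissions of the control terminal and the client.
In the database fault discovery method provided by the embodiments, after the target database is determined to have failed, alarm information is also generated and sent to the control terminal and/or the client to notify the relevant staff or platform that the target database is faulty, which improves the user experience.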
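In the above embodiments, both the target data used to determine the target threshold and the target data under test used to identify abnormal data take values in [0, 1]. The data that can actually be acquired while the target database runs, however, are usually integers between 0 and 100. Therefore, in one possible design, before step S1011 the database fault discovery method may further include the steps shown in FIG. 6, which is a schematic flowchart of yet another database fault discovery method provided by an embodiment of the present application. As shown in FIG. 6, this embodiment includes:
S201: Acquire, at a preset time period, multiple historical data items of the storage device within the preset historical duration.
As described in the preceding embodiments, the preset historical duration may be the past week, the past month, or another historical period. Taking the past week as an example, the usage state of the storage device is sampled once every minute over that week; the data corresponding to each sampled usage rate is a historical data item, and one minute is the preset time period. Multiple historical data items within the preset historical duration are thus acquired, each representing a historical usage state of the storage device.
S202: Label each historical data item according to a preset screening rule and discard the historical data items that do not satisfy the rule, so as to obtain candidate data.
During actual operation of the target database, values that are negative or greater than 100 may appear, for example, so a preset screening rule can be set to discard such historical data. Each historical data item is labeled according to the preset screening rule, for example manually, and the items that do not satisfy the rule are discarded; the items that satisfy it are the candidate data. The preset screening rule may be, for example, greater than or equal to 0 and less than or equal to 100, and can be set according to the usage state of the storage device during operation of the target database; the embodiments of the present application do not limit this.
S203: Perform a percentage operation on each candidate data item to obtain the corresponding target data.
A percentage operation is performed on each candidate data item, converting it into a value in the interval [0, 1], which gives the target data corresponding to the historical data. The resulting target data form the target data set, from which the target threshold is determined.
In the database fault discovery method provided by the embodiments, before the fluctuation coefficient of the target data set is determined according to the first preset algorithm, multiple historical data items of the storage device within the preset historical duration are acquired at the preset time period; each historical data item is labeled according to the preset screening rule so that items not satisfying the rule are discarded and items satisfying it become candidate data; and a percentage operation is performed on each candidate data item to obtain the corresponding target data. The historical data acquired during actual operation of the target database are thus preprocessed into target data in the interval [0, 1], so that the target threshold can be determined from the corresponding probability distribution characteristic functions. This makes the determination of the target threshold more reasonable and helps improve its precision.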
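The preprocessing of steps S202 and S203 can be sketched as below, assuming the example screening rule from the text (keep values between 0 and 100 inclusive). The text describes labeling, possibly manual; the sketch replaces that with an automatic range check, and the function name and parameters are illustrative.

```python
def preprocess(raw_readings, low=0, high=100):
    """Screen raw usage readings and scale the survivors to [0, 1]."""
    # S202: keep only readings that satisfy the screening rule low <= r <= high.
    candidates = [r for r in raw_readings if low <= r <= high]
    # S203: percentage operation, converting 0..100 into 0..1.
    return [r / 100 for r in candidates]


print(preprocess([-5, 0, 42, 100, 130]))
```

The same function can be applied to newly sampled readings to obtain target data under test on the same scale as the target threshold.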
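In one possible design, before deciding according to the target threshold whether the target data under test is abnormal data, that is, before determining whether the target data under test is greater than the target threshold, each acquired data under test may also be preprocessed with the corresponding steps of the embodiment shown in FIG. 6 to obtain the target data under test, which is then compared with the target threshold.
For example, the data under test is first acquired at the preset time period, that is, at the same interval used to acquire the historical data, and is then screened according to the preset screening rule to obtain the corresponding target data under test. The data under test can be understood as the current usage state of the storage device. The specific implementation and technical effect of preprocessing the data under test into target data under test are similar to the corresponding steps of the embodiment shown in FIG. 6 and are not repeated here.
As described in the preceding embodiments, after the target data under test is determined to be abnormal data according to the target threshold, it is further determined whether all other target data under test within the abnormal time window are abnormal data. One possible way of determining the abnormal time window is shown in FIG. 7, which is a schematic flowchart of determining an abnormal time window provided by an embodiment of the present application. As shown in FIG. 7, the method provided by this embodiment includes: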
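S301: Determine multiple similarities according to the target data set and a second preset algorithm.
Each similarity represents the degree of similarity between the target data of two adjacent unit durations, and the preset historical duration includes multiple unit durations.
The target data set consists of the target data within the preset historical duration, which includes multiple unit durations. Because the preset historical duration is a continuous stretch of physical time, it can be divided into multiple consecutive unit durations of equal length; for example, if the preset historical duration is the past week, that is, seven days, the unit duration may be each day of the week. Determining multiple similarities according to the target data set and the second preset algorithm then means determining, for the seven days, how similar the target data of each pair of adjacent days are, which gives six similarities.
For example, one possible way of determining the multiple similarities according to the target data set and the second preset algorithm is:
determine, through the second preset algorithm, the similarity between the target data subsets of each pair of adjacent unit durations in turn, so as to obtain the multiple similarities, where a target data subset includes all target data within one unit duration.
The preset historical duration is divided into multiple consecutive unit durations of equal length; for example, if the preset historical duration is the seven days of a week, the unit durations are the first through seventh days of the week. Determining the similarity between the target data subsets of each pair of adjacent unit durations in turn therefore means determining the similarity between the subsets of day one and day two, day two and day three, and so on through day six and day seven, which gives six similarities. A target data subset includes all target data within one unit duration, so the target data set is divided into seven subsets, each containing all target data of one day. The second preset algorithm may be the cosine similarity algorithm, as shown in formula (4):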
S = (Σ_{i=1..n} A_i * B_i) / (sqrt(Σ_{i=1..n} A_i^2) * sqrt(Σ_{i=1..n} B_i^2))        (4)
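Following the above description, with a preset time period of one minute, n in formula (4) may be 1440, that is, each target data subset contains 1440 target data items. The target data of one day in the historical duration can be represented by a 1440-dimensional vector A, the target data of the adjacent day by a 1440-dimensional vector B, and S denotes the similarity between vector A and vector B, that is, the similarity between the target data of two adjacent days.
When the historical duration is the past week, six similarities can be determined from the target data in the target data set using formula (4).
It should be noted that the length of the unit duration within the preset historical duration can be set according to the actual operation of the target database; setting it to one day above is only an example and not a limitation. The second preset algorithm may also be another computation with equivalent effect, including but not limited to the cosine similarity algorithm above.
S302: Obtain the average of all similarities to obtain the target similarity.
The similarities determined in step S301 are averaged, and the result is taken as the target similarity.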
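Steps S301 and S302 can be sketched as follows, using the cosine similarity that the text names as one option for the second preset algorithm. The function names, the `unit_size` parameter (1440 for one-minute sampling and one-day units), and the handling of an all-zero vector are assumptions of the sketch.

```python
import math


def cosine_similarity(a, b):
    """Formula (4): dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # ASSUMPTION: an all-zero unit is treated as dissimilar
    return dot / (norm_a * norm_b)


def target_similarity(target_data, unit_size):
    """S301 + S302: mean cosine similarity of adjacent unit-duration subsets."""
    subsets = [target_data[i:i + unit_size]
               for i in range(0, len(target_data) - unit_size + 1, unit_size)]
    if len(subsets) < 2:
        raise ValueError("need at least two complete unit durations")
    similarities = [cosine_similarity(p, q) for p, q in zip(subsets, subsets[1:])]
    return sum(similarities) / len(similarities)


# Three "days" of four samples each, purely for illustration.
week = [0.1, 0.2, 0.3, 0.4] * 3
print(target_similarity(week, unit_size=4))
```

With seven complete days the list comprehension produces seven subsets and therefore six adjacent-pair similarities, matching the count given in the text.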
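S303: Determine the abnormal time window according to a third preset algorithm, a preset abnormal time window threshold, and the target similarity.
After the target similarity has been determined, the abnormal time window is determined according to the third preset algorithm, the preset abnormal time window threshold, and the target similarity, so that the abnormal time window reflects the periodic character of the target database's actual operation.
For a target database with strong periodicity, abnormal data appearing even for a short time should draw the attention of the control terminal, such as operations staff. For a target database with weak periodicity, abnormal data must persist for some time before the database can be considered faulty. The abnormal time window is therefore introduced into the fault decision to improve the stability of the method.
In one possible design, step S303 may be implemented as follows:
determine a candidate abnormal time window through the third preset algorithm and the target similarity;
if the value of the candidate abnormal time window is greater than the value of the preset abnormal time window threshold, determine the candidate abnormal time window as the abnormal time window; and
if the value of the candidate abnormal time window is less than or equal to the value of the preset abnormal time window threshold, determine the preset abnormal time window threshold as the abnormal time window.
The third preset algorithm first converts the target similarity into a candidate abnormal time window, and may be expressed by formula (5):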
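T = 10 - ceil(target similarity * 10)       (5)
Specifically, the target similarity is multiplied by 10, the product is rounded up to an integer, and the result is subtracted from 10; the difference is the candidate abnormal time window. For example, if the target similarity is 0.7, the candidate abnormal time window is 3.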
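In practice, for a target database whose operating conditions are well understood, an empirical value is usually set on the basis of long-term operation; this is the preset abnormal time window threshold. To improve the stability of the database fault discovery method, after the candidate abnormal time window has been determined from the third preset algorithm and the target similarity, its value is compared with the value of the preset abnormal time window threshold, and the abnormal time window is determined from the comparison.
For example, if the value of the candidate abnormal time window is greater than the value of the preset abnormal time window threshold, the candidate abnormal time window is determined as the abnormal time window. Otherwise, if the value of the candidate abnormal time window is less than or equal to the value of the preset abnormal time window threshold, the preset abnormal time window threshold is determined as the abnormal time window.
Based on experience, the preset abnormal time window threshold is usually set to 3. When the value of the candidate abnormal time window is greater than 3, the candidate abnormal time window is determined as the abnormal time window. When it is less than or equal to 3, for example 2 or 1, the preset abnormal time window threshold is determined as the abnormal time window. The value of the preset abnormal time window threshold is not limited to 3 and can be set according to the actual operating conditions of the target database; the embodiments of the present application do not limit this.
If the abnormal time window is 3, then two other target data under test fall within the window, that is, three target data under test in total take part in the fault decision. Once a target data under test has been determined to be abnormal according to the target threshold, the two target data under test that follow it are checked to see whether both are abnormal data.
It should be noted that the third preset algorithm may also be another conversion formula; formula (5) is merely illustrative and not limiting.
In the database fault discovery method provided by the embodiments, if the target data under test is determined to be abnormal data, it is further determined whether all other target data under test within the abnormal time window are abnormal data. Before that determination, multiple similarities can first be determined according to the target data set and the second preset algorithm, where each similarity represents the degree of similarity between the target data of two adjacent unit durations and the preset historical duration includes multiple unit durations. The similarities are then averaged to obtain the target similarity, and the abnormal time window is determined according to the third preset algorithm, the preset abnormal time window threshold, and the target similarity. This ties fault determination closely to the periodicity of the target database's actual operation, further improving the soundness and stability of fault discovery.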
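Step S303 can be sketched as below, using formula (5) with rounding up and the empirical preset threshold of 3 mentioned in the text. The function names are illustrative, and the `round(..., 9)` call is an addition of the sketch that guards against floating-point noise before rounding up; it is not part of formula (5).

```python
import math


def candidate_window(similarity):
    """Formula (5): T = 10 - ceil(similarity * 10)."""
    # round() removes tiny binary floating-point errors before ceil() is applied.
    return 10 - math.ceil(round(similarity * 10, 9))


def abnormal_time_window(similarity, preset_threshold=3):
    """Use the candidate window only if it exceeds the preset threshold."""
    candidate = candidate_window(similarity)
    return candidate if candidate > preset_threshold else preset_threshold


print(abnormal_time_window(0.7))   # candidate 3, not above 3, so the preset applies
print(abnormal_time_window(0.35))  # candidate 6 exceeds the preset
```

The preset threshold therefore acts as a lower bound: strongly periodic data (high similarity) falls back to the preset window, while weakly periodic data gets a longer window.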
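The following are apparatus embodiments of the present application, which can be used to perform the method embodiments. For details not disclosed in the apparatus embodiments, refer to the method embodiments.
FIG. 8 is a schematic structural diagram of a database fault discovery apparatus provided by an embodiment of the present application. As shown in FIG. 8, the database fault discovery apparatus 400 provided by this embodiment includes:
a first processing module 401, configured to determine a target threshold according to a target data set and the probability distribution characteristic function corresponding to each target data item in the target data set,
where the target data represents the historical usage rate of the storage device of the target database within a preset historical duration;
a second processing module 402, configured to determine, if target data under test is determined to be abnormal data according to the target threshold, whether all other target data under test within the abnormal time window are abnormal data,
where the target data under test represents the current usage rate of the storage device; and
a third processing module 403, configured to determine that the target database has failed if the determination result is yes.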
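In one possible design, the first processing module 401 is specifically configured to:
determine the fluctuation coefficient of the target data set according to the first preset algorithm, and determine the target probability threshold according to the fluctuation coefficient, where the fluctuation coefficient represents the fluctuation amplitude of the historical usage rate;
determine, based on the preset inverse cumulative distribution function, the usage threshold corresponding to each target data item according to the target probability threshold and the probability distribution characteristic function corresponding to that item; and
obtain the target threshold by determining the average of all usage thresholds.
In one possible design, the first processing module 401 is further specifically configured to:
determine a distribution result for each target data item according to the item and its corresponding Beta distribution function, where each distribution result is the Beta distribution corresponding to that item, and the probability distribution characteristic function includes the Beta distribution function; and
determine each usage threshold, based on the preset inverse cumulative distribution function, according to the target probability threshold and each distribution result.
In one possible design, the second processing module 402 is further configured to:
determine whether the target data under test is greater than the target threshold;
if so, determine that the target data under test is abnormal data; and
if not, determine that the target data under test is not abnormal data.
In one possible design, the second processing module 402 is further configured to:
determine whether all other target data under test are greater than the target threshold;
if so, determine that the other target data under test are all abnormal data; and
if not, determine that the other target data under test are not all abnormal data.
In one possible design, the third processing module 403 is further configured to:
determine that the target database is operating normally if the target data under test is not abnormal data; or
determine that the target database is operating normally if the other target data under test are not all abnormal data.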
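Building on the embodiment shown in FIG. 8, FIG. 9 is a schematic structural diagram of another database fault discovery apparatus provided by an embodiment of the present application. As shown in FIG. 9, the database fault discovery apparatus 400 provided by this embodiment further includes:
an acquisition module 404, configured to acquire, at the preset time period, multiple historical data items of the storage device within the preset historical duration;
a screening module 405, configured to manually label each historical data item according to the preset screening rule and discard the historical data items that do not satisfy the rule, so as to obtain candidate data; and
a calculation module 406, configured to perform a percentage operation on each candidate data item to obtain the corresponding target data.
In one possible design, the acquisition module 404 is further configured to acquire data under test at the preset time period; and
the screening module 405 is further configured to screen the data under test according to the preset screening rule to obtain the corresponding target data under test.
Building on the above embodiments, the database fault discovery apparatus 400 provided by the embodiments further includes a fourth processing module;
where the fourth processing module is configured to:
determine multiple similarities according to the target data set and the second preset algorithm, where each similarity represents the degree of similarity between the target data of two adjacent unit durations, and the preset historical duration includes multiple unit durations;
average all the similarities to obtain the target similarity; and
determine the abnormal time window according to the third preset algorithm, the preset abnormal time window threshold, and the target similarity.
In one possible design, the fourth processing module is further configured to:
determine, through the second preset algorithm, the similarity between the target data subsets of each pair of adjacent unit durations in turn, so as to obtain the multiple similarities, where a target data subset includes all target data within one unit duration.
In one possible design, the fourth processing module is further configured to:
determine a candidate abnormal time window through the third preset algorithm and the target similarity;
if the value of the candidate abnormal time window is greater than the value of the preset abnormal time window threshold, determine the candidate abnormal time window as the abnormal time window; and
if the value of the candidate abnormal time window is less than or equal to the value of the preset abnormal time window threshold, determine the preset abnormal time window threshold as the abnormal time window.
In one possible design, the database fault discovery apparatus 400 provided by the embodiments further includes:
a generation module, configured to generate alarm information; and
a sending module, configured to send the alarm information to the control terminal and/or the client to indicate that the target database has failed.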
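The apparatus embodiments provided above are merely illustrative. The division into modules is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple modules may be combined or integrated into another system. The coupling between modules may be implemented through interfaces, which are usually electrical communication interfaces, although mechanical or other forms of interface are not excluded. Modules described as separate components may therefore be physically separate or not, and may be located in one place or distributed across different locations of the same device or of different devices.
It is worth noting that the database fault discovery apparatus provided by the above embodiments can be used to perform the corresponding steps of the database fault discovery method provided by the above embodiments; the specific implementation, principle, and technical effect are similar to those of the method embodiments and are not repeated here.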
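FIG. 10 is a schematic structural diagram of an electronic device provided by an embodiment of the present application. As shown in FIG. 10, the electronic device 500 provided by this embodiment includes:
at least one processor 501; and
a memory 502 communicatively connected to the at least one processor 501, where
the memory 502 stores instructions executable by the at least one processor 501, and the instructions are executed by the at least one processor 501 so that the at least one processor 501 can perform the steps of the database fault discovery method in the above method embodiments; for details, refer to the related description in the method embodiments.
Optionally, the memory 502 may be independent of the processor 501 or integrated with it.
When the memory 502 is a device independent of the processor 501, the electronic device 500 may further include:
a bus 503, configured to connect the processor 501 and the memory 502.
In addition, an embodiment of the present application further provides a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions cause a computer to perform the steps of the database fault discovery method in the above embodiments. For example, the readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.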
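Other embodiments of the present application will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed here. The present application is intended to cover any variations, uses, or adaptations of the present application that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed here. The specification and embodiments are to be regarded as exemplary only, and the true scope and spirit of the present application are indicated by the claims.
It should be understood that the present application is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes can be made without departing from its scope. The scope of the present application is limited only by the appended claims.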

Claims (13)

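  1. A database fault discovery method, characterized by comprising:
    determining a target threshold according to a target data set and the probability distribution characteristic function corresponding to each target data item in the target data set, the target data representing the historical usage rate of a storage device of a target database within a preset historical duration;
    if target data under test is determined to be abnormal data according to the target threshold, determining whether all other target data under test within an abnormal time window are the abnormal data, the target data under test representing the current usage rate of the storage device; and
    if the determination result is yes, determining that the target database has failed.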
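  2. The database fault discovery method according to claim 1, characterized in that determining the target threshold according to the target data set and the probability distribution characteristic function corresponding to each target data item in the target data set comprises:
    determining a fluctuation coefficient of the target data set according to a first preset algorithm, and determining a target probability threshold according to the fluctuation coefficient, the fluctuation coefficient representing the fluctuation amplitude of the historical usage rate;
    determining, based on a preset inverse cumulative distribution function, the usage threshold corresponding to each target data item according to the target probability threshold and the probability distribution characteristic function corresponding to that item; and
    obtaining the target threshold by determining the average of all usage thresholds.
  3. The database fault discovery method according to claim 2, characterized in that determining, based on the preset inverse cumulative distribution function, the usage threshold corresponding to each target data item according to the target probability threshold and the probability distribution characteristic function corresponding to that item comprises:
    determining a distribution result for each target data item according to the item and its corresponding Beta distribution function, each distribution result being the Beta distribution corresponding to that item, and the probability distribution characteristic function including the Beta distribution function; and
    determining each usage threshold, based on the preset inverse cumulative distribution function, according to the target probability threshold and each distribution result.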
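  4. The database fault discovery method according to claim 2, characterized by further comprising, before determining the fluctuation coefficient of the target data set according to the first preset algorithm:
    acquiring, at a preset time period, multiple historical data items of the storage device within the preset historical duration;
    labeling each historical data item according to a preset screening rule and discarding the historical data items that do not satisfy the preset screening rule, so as to obtain candidate data; and
    performing a percentage operation on each candidate data item to obtain the corresponding target data.
  5. The database fault discovery method according to claim 4, characterized by further comprising, before determining according to the target threshold that the target data under test is abnormal data:
    acquiring data under test at the preset time period; and
    screening the data under test according to the preset screening rule to obtain the corresponding target data under test.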
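  6. The database fault discovery method according to any one of claims 1 to 5, characterized by further comprising, before determining whether all other target data under test within the abnormal time window are the abnormal data:
    determining multiple similarities according to the target data set and a second preset algorithm, each similarity representing the degree of similarity between the target data of two adjacent unit durations, and the preset historical duration including multiple unit durations;
    averaging all the similarities to obtain a target similarity; and
    determining the abnormal time window according to a third preset algorithm, a preset abnormal time window threshold, and the target similarity.
  7. The database fault discovery method according to claim 6, characterized in that determining multiple similarities according to the target data set and the second preset algorithm comprises:
    determining, through the second preset algorithm, the similarity between the target data subsets of each pair of adjacent unit durations in turn, so as to obtain the multiple similarities, a target data subset including all target data within one unit duration.
  8. The database fault discovery method according to claim 6, characterized in that determining the abnormal time window according to the third preset algorithm, the preset abnormal time window threshold, and the target similarity comprises:
    determining a candidate abnormal time window through the third preset algorithm and the target similarity;
    if the value of the candidate abnormal time window is greater than the value of the preset abnormal time window threshold, determining the candidate abnormal time window as the abnormal time window; and
    if the value of the candidate abnormal time window is less than or equal to the value of the preset abnormal time window threshold, determining the preset abnormal time window threshold as the abnormal time window.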
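  9. The database fault discovery method according to claim 7, characterized in that, if the target data under test is not the abnormal data, the target database is determined to be operating normally; or
    if the other target data under test are not all the abnormal data, the target database is determined to be operating normally.
  10. The database fault discovery method according to any one of claims 1 to 5, characterized by further comprising, after determining that the target database has failed:
    generating alarm information; and
    sending the alarm information to a control terminal and/or a client to indicate that the target database has failed.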
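  11. A database fault discovery apparatus, characterized by comprising:
    a first processing module, configured to determine a target threshold according to a target data set and the probability distribution characteristic function corresponding to each target data item in the target data set, the target data representing the historical usage rate of a storage device of a target database within a preset historical duration;
    a second processing module, configured to determine, if target data under test is determined to be abnormal data according to the target threshold, whether all other target data under test within an abnormal time window are the abnormal data, the target data under test representing the current usage rate of the storage device; and
    a third processing module, configured to determine that the target database has failed if the determination result is yes.
  12. An electronic device, characterized by comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the database fault discovery method according to any one of claims 1 to 10.
  13. A non-transitory computer-readable storage medium storing computer instructions, characterized in that the computer instructions cause a computer to perform the database fault discovery method according to any one of claims 1 to 10.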
PCT/CN2021/119583 2020-09-30 2021-09-22 数据库故障发现方法、装置、电子设备及存储介质 WO2022068645A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011058803.4A CN112162878B (zh) 2020-09-30 2020-09-30 数据库故障发现方法、装置、电子设备及存储介质
CN202011058803.4 2020-09-30

Publications (1)

Publication Number Publication Date
WO2022068645A1 true WO2022068645A1 (zh) 2022-04-07

Family

ID=73861650

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/119583 WO2022068645A1 (zh) 2020-09-30 2021-09-22 数据库故障发现方法、装置、电子设备及存储介质

Country Status (2)

Country Link
CN (1) CN112162878B (zh)
WO (1) WO2022068645A1 (zh)

Also Published As

Publication number Publication date
CN112162878B (zh) 2021-09-28
CN112162878A (zh) 2021-01-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21874306

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 030723)

122 Ep: pct application non-entry in european phase

Ref document number: 21874306

Country of ref document: EP

Kind code of ref document: A1