CN116541743A - Time sequence data abnormality detection method and device for database - Google Patents

Info

Publication number
CN116541743A
Authority
CN
China
Prior art keywords
data
time sequence
abnormal
abnormality
algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310371595.0A
Other languages
Chinese (zh)
Inventor
朱峰
何佳佳
张博超
刘畅
郭雁
蒋之皓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Postal Savings Bank of China Ltd
Original Assignee
Postal Savings Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Postal Savings Bank of China Ltd
Priority to CN202310371595.0A
Publication of CN116541743A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The application provides a method and a device for detecting anomalies in time-series data of a database. The method comprises: first, acquiring an initial data sample and processing it with at least a pre-trained waveform classification model to obtain at least one anomaly judgment algorithm matched to the data variation trend of the initial data sample; then, judging each piece of time-series data in the initial data sample with the at least one anomaly judgment algorithm to determine a plurality of normal data and a plurality of abnormal data; next, selecting normal data and abnormal data in a predetermined quantity ratio to form a target data sample, and training an initial anomaly detection model on the target data sample to obtain a target anomaly detection model; and finally, performing anomaly detection on the time-series data to be detected with the target anomaly detection model. The method addresses the low accuracy and high labor cost of time-series anomaly detection for databases in the prior art.

Description

Time sequence data abnormality detection method and device for database
Technical Field
The present invention relates to the field of data processing, and in particular, to a method for detecting a time series data abnormality of a database, a device for detecting a time series data abnormality of a database, a computer readable storage medium, and an electronic device.
Background
The architecture of database systems is gradually shifting to a branch-deployment architecture, and to keep up with rapid business growth the number of clusters has quickly grown to several times or even tens of times its previous size. At this scale, even if every node is highly reliable and disaster recovery is considered as thoroughly as possible, sporadic failures are unavoidable, and the ability to discover failures quickly and accurately is the basis for handling them.
The common means of automatic fault discovery today is alarming based on monitoring indicators: an operations engineer extracts the monitoring indicators that may be needed, configures an alarm rule (a detection threshold) on each indicator, and an alarm notification is sent when an indicator value exceeds its threshold. The effectiveness of such alarm rules depends heavily on the experience of senior operations engineers, and two kinds of problems arise easily: (1) alarm rules are costly to configure, because a senior operations engineer must analyze the services, decide one by one which services need alarms, and tune each alarm threshold by trial; (2) alarm accuracy is low, because the threshold is loosened after a false alarm and tightened after a missed alarm, so it is often not accurate enough.
Therefore, a method for analyzing database monitoring indicators is needed to solve the high cost and low accuracy of manually configured alarms.
Disclosure of Invention
The main objective of the present application is to provide a method for detecting abnormal time series data of a database, a device for detecting abnormal time series data of a database, a computer readable storage medium and an electronic device, so as to at least solve the problems of low accuracy and high labor cost of detecting abnormal time series data of a database in the prior art.
In order to achieve the above object, according to one aspect of the present application, there is provided a method for detecting a time series data abnormality of a database, including: acquiring an initial data sample, and processing the initial data sample by at least adopting a pre-trained waveform classification model to obtain at least one abnormality judgment algorithm matched with the data change trend of the initial data sample, wherein the initial data sample comprises normal time sequence data and abnormal time sequence data, and the abnormality judgment algorithm is used for judging whether each time sequence data in the initial data sample is abnormal or not; judging each time sequence data in the initial data sample according to at least one abnormality judgment algorithm to determine a plurality of normal data and a plurality of abnormal data; selecting normal data and abnormal data with a preset quantity ratio to form a target data sample, and training an initial abnormal detection model based on the target data sample to obtain a target abnormal detection model; and carrying out anomaly detection on the time sequence data to be detected by adopting the target anomaly detection model.
Optionally, judging each piece of time-series data in the initial data sample according to the at least one anomaly judgment algorithm to determine a plurality of normal data and a plurality of abnormal data includes: where there are a plurality of anomaly judgment algorithms, judging each piece of time-series data in the initial data sample with each anomaly judgment algorithm separately to obtain a plurality of judgment result groups, each of which includes a plurality of judgment results; determining as the abnormal data the time-series data for which any judgment result in the plurality of judgment result groups is abnormal; and determining as the normal data the time-series data for which every judgment result in the plurality of judgment result groups is normal.
Optionally, the method further comprises: processing the initial data sample with the pre-trained waveform classification model to determine a plurality of anomaly prediction algorithms, where an anomaly prediction algorithm is used to judge whether time-series data predicted from the existing time-series data are abnormal; determining each anomaly prediction algorithm that meets a predetermined condition as a target anomaly prediction algorithm, where the predetermined condition includes the regression evaluation index of the anomaly prediction algorithm lying within a predetermined range; determining a target anomaly prediction model from the target anomaly prediction algorithm and the initial data sample; and performing anomaly prediction on the time-series data to be detected with the target anomaly prediction model.
Optionally, the anomaly prediction algorithm comprises a regression algorithm.
Optionally, the regression evaluation index includes at least one of MAE (Mean Absolute Error), RMSE (Root Mean Squared Error), and the R-squared score.
Optionally, the type of anomaly determination algorithm includes at least one of: statistical discrimination, outlier detection, and neural networks.
Optionally, acquiring the initial data sample includes: acquiring historical time-series data within a predetermined time period and artificial time-series data, where the artificial time-series data are time-series data generated by a time-series generator, and the historical time-series data and the artificial time-series data together form the initial data sample.
According to another aspect of the present application, there is provided a device for detecting anomalies in time-series data of a database, including an acquisition unit, a first determination unit, a training unit, and a detection unit. The acquisition unit is configured to acquire an initial data sample and process it with at least a pre-trained waveform classification model to obtain at least one anomaly judgment algorithm matched to the data variation trend of the initial data sample, where the initial data sample includes normal time-series data and abnormal time-series data, and the anomaly judgment algorithm is used to judge whether each piece of time-series data in the initial data sample is abnormal; the first determination unit is configured to judge each piece of time-series data in the initial data sample according to the at least one anomaly judgment algorithm to determine a plurality of normal data and a plurality of abnormal data; the training unit is configured to select normal data and abnormal data in a predetermined quantity ratio to form a target data sample, and to train an initial anomaly detection model on the target data sample to obtain a target anomaly detection model; and the detection unit is configured to perform anomaly detection on the time-series data to be detected with the target anomaly detection model.
According to still another aspect of the present application, there is provided a computer-readable storage medium that includes a stored program, where, when the program runs, the device in which the computer-readable storage medium is located is controlled to execute any one of the above methods for detecting anomalies in time-series data of a database.
According to another aspect of the present application, there is provided an electronic device including: one or more processors, a memory, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs include instructions for executing any one of the above methods for detecting anomalies in time-series data of a database.
In the technical scheme of the application, in the time sequence data anomaly detection method of the database, firstly, an initial data sample is obtained, and at least a pre-trained waveform classification model is adopted to process the initial data sample, so as to obtain at least one anomaly judgment algorithm matched with the data change trend of the initial data sample, wherein the initial data sample comprises normal time sequence data and anomaly time sequence data, and the anomaly judgment algorithm is used for judging whether each time sequence data in the initial data sample is abnormal or not; then, judging each time sequence data in the initial data sample according to at least one abnormality judgment algorithm to determine a plurality of normal data and a plurality of abnormal data; then, selecting normal data and abnormal data with a preset quantity ratio to form a target data sample, and training an initial abnormal detection model based on the target data sample to obtain a target abnormal detection model; and finally, carrying out anomaly detection on the time sequence data to be detected by adopting the target anomaly detection model. According to the method, at least a pre-trained waveform classification model is adopted to process an initial data sample, at least one abnormality judgment algorithm is determined, the initial data sample is judged by the abnormality judgment algorithm to obtain a plurality of normal data and abnormal data, a target data sample is formed by selecting normal data and abnormal data with a preset quantity ratio, and a target abnormality detection model trained based on the target data sample is adopted to perform abnormality detection on time sequence data to be detected, so that the accuracy of abnormality detection can be greatly improved, the labor cost is reduced, and the problems of lower accuracy and high labor cost of time sequence data abnormality detection of a database in the prior art are solved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a block diagram of the hardware structure of a mobile terminal for performing a method for detecting anomalies in time-series data of a database according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of a method for detecting anomalies in time-series data of a database according to an embodiment of the present application;
FIG. 3 is a logic diagram of a method for detecting anomalies in time-series data of a database according to an embodiment of the present application;
FIG. 4 is a block diagram of a device for detecting anomalies in time-series data of a database according to an embodiment of the present application.
Wherein the above figures include the following reference numerals:
102, processor; 104, memory; 106, transmission device; 108, input/output device.
Detailed Description
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
To help those skilled in the art better understand the solution of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art based on the embodiments herein without inventive effort shall fall within the scope of protection of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate in order to describe the embodiments of the present application described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For convenience of description, some nouns and terms used in the embodiments of the present application are explained below:
Time-series data: a data column in which the same unified indicator is recorded in chronological order. The individual data in one data column must be measured on the same basis so that they are comparable. Time-series data may be period values (accumulated over an interval) or point-in-time values.
As described in the background art, in order to solve the problems of low accuracy and high labor cost in detecting the abnormal time series data of the database in the prior art, embodiments of the present application provide a method for detecting the abnormal time series data of the database, an apparatus for detecting the abnormal time series data of the database, a computer readable storage medium, and an electronic device.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
The method embodiments provided in the embodiments of the present application may be performed in a mobile terminal, a computer terminal or similar computing device. Taking the mobile terminal as an example, fig. 1 is a block diagram of a hardware structure of a mobile terminal of a method for detecting a time-series data anomaly of a database according to an embodiment of the present invention. As shown in fig. 1, a mobile terminal may include one or more (only one is shown in fig. 1) processors 102 (the processor 102 may include, but is not limited to, a microprocessor MCU or a processing device such as a programmable logic device FPGA) and a memory 104 for storing data, wherein the mobile terminal may also include a transmission device 106 for communication functions and an input-output device 108. It will be appreciated by those skilled in the art that the structure shown in fig. 1 is merely illustrative and not limiting of the structure of the mobile terminal described above. For example, the mobile terminal may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.
The memory 104 may be used to store a computer program, for example, a software program and modules of application software, such as the computer program corresponding to the method for detecting anomalies in time-series data of a database in an embodiment of the present invention, and the processor 102 executes the computer program stored in the memory 104 to perform various functional applications and data processing, that is, to implement the above-described method. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the mobile terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the mobile terminal. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which communicates with the internet wirelessly.
In this embodiment, a method for detecting anomalies in time-series data of a database, running on a mobile terminal, a computer terminal or a similar computing device, is provided. It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order.
Fig. 2 is a flowchart of a method of timing data anomaly detection for a database according to an embodiment of the present application. As shown in fig. 2, the method comprises the steps of:
step S201, as shown in FIG. 3, an initial data sample is obtained, and at least a pre-trained waveform classification model is adopted to process the initial data sample, so as to obtain at least one anomaly judgment algorithm matched with the data change trend of the initial data sample, wherein the initial data sample comprises normal time sequence data and anomaly time sequence data, and the anomaly judgment algorithm is used for judging whether each time sequence data in the initial data sample is abnormal;
Specifically, when time-series anomaly detection relies on a single machine learning algorithm or a single deep learning algorithm, the positive samples of a time-series indicator far outnumber the negative samples, and some algorithms cannot adapt to such data; in addition, the time-series data of a single indicator have few features, so anomalies in complex scenarios cannot be detected. The initial data sample is therefore processed with the pre-trained waveform classification model, which uses an unsupervised algorithm and does not require balanced samples, to obtain an anomaly judgment algorithm adapted to the initial data sample.
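The source does not disclose how the waveform classification model works internally. The following is only a minimal stand-in sketch, assuming a simple heuristic (least-squares slope for trend, lagged autocorrelation for periodicity) in place of the pre-trained model; the class names, thresholds, and the `ALGORITHMS_BY_WAVEFORM` mapping are hypothetical and are not taken from the application.

```python
import math
from statistics import fmean

# Hypothetical mapping from waveform class to candidate judgment algorithms
# (labels only; none of these names come from the source).
ALGORITHMS_BY_WAVEFORM = {
    "periodic": ["same_phase_three_sigma", "seasonal_residual_iqr"],
    "trending": ["detrended_three_sigma", "regression_residual"],
    "stationary": ["three_sigma", "iqr"],
}

def autocorrelation(xs, lag):
    """Sample autocorrelation of xs at the given lag (0.0 for a constant series)."""
    mean = fmean(xs)
    denom = sum((x - mean) ** 2 for x in xs)
    if denom == 0:
        return 0.0
    num = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(len(xs) - lag))
    return num / denom

def ls_slope(xs):
    """Least-squares slope of xs against its index 0..n-1."""
    n = len(xs)
    t_mean = (n - 1) / 2
    x_mean = fmean(xs)
    num = sum((t - t_mean) * (x - x_mean) for t, x in enumerate(xs))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

def classify_waveform(xs, acf_threshold=0.5, trend_threshold=0.5):
    """Heuristic stand-in for the waveform classification model (assumption)."""
    n = len(xs)
    if n < 4:
        raise ValueError("need at least 4 points")
    spread = max(xs) - min(xs)
    # Trending: the fitted drift over the window explains most of the value range.
    if spread > 0 and abs(ls_slope(xs)) * (n - 1) / spread > trend_threshold:
        return "trending"
    # Periodic: autocorrelation dips below zero and later climbs above the threshold.
    seen_negative = False
    for lag in range(1, n // 2 + 1):
        acf = autocorrelation(xs, lag)
        if acf < 0:
            seen_negative = True
        elif seen_negative and acf > acf_threshold:
            return "periodic"
    return "stationary"

def select_algorithms(xs):
    """Return the candidate algorithm labels for the detected waveform class."""
    return ALGORITHMS_BY_WAVEFORM[classify_waveform(xs)]
```

The waveform class returned here could also be shown to a user as a display feature; a real deployment would replace the heuristic with the trained model.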
A single algorithm produces many false alarms and cannot guarantee accuracy, so multiple algorithm types can be adopted to improve accuracy; the algorithms are computed independently of one another and can run in parallel. The types of the anomaly judgment algorithm include at least one of: statistical discrimination, outlier detection, and neural networks.
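The source names only the algorithm categories, not concrete algorithms. As an illustrative sketch, the two functions below pick one common representative each for statistical discrimination (a three-sigma rule) and outlier detection (an interquartile-range rule); the function names and the default multipliers are assumptions, and a neural-network detector is omitted.

```python
from statistics import fmean, pstdev, quantiles

def three_sigma_flags(xs, k=3.0):
    """Statistical discrimination: flag points more than k standard deviations from the mean."""
    mean, std = fmean(xs), pstdev(xs)
    if std == 0:
        return [False] * len(xs)
    return [abs(x - mean) > k * std for x in xs]

def iqr_flags(xs, k=1.5):
    """Outlier detection: flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(xs, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x < low or x > high for x in xs]
```

Both functions return one boolean per input point, which makes their outputs easy to combine when several algorithms judge the same sample.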
In practical application, the anomaly judgment algorithms cover statistical methods, unsupervised learning and supervised learning, and at least one algorithm can be selected under each anomaly judgment algorithm type. The anomaly judgment algorithms can be determined by the pre-trained waveform classification model or selected manually; for manual selection, the caller must import and select the algorithms through factory-style initialization. The number of selected algorithms may be kept within 5 to prevent a long overall computation time. If the online computation time is still long, overall efficiency can be improved by running the algorithms in parallel. The waveform classification model also classifies the waveforms, and the waveform class is shown to the user as a display feature, so that the user can see the trend and periodicity of a monitoring indicator and better understand its condition.
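One way to picture the factory-style selection and the parallel execution described above is the sketch below. It is an assumption-laden illustration: the registry, the `register` decorator, `create_detectors`, `run_parallel`, and the two toy detectors are all hypothetical names, and only the cap of 5 algorithms comes from the text.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import fmean, pstdev

MAX_ALGORITHMS = 5   # cap taken from the text above
_REGISTRY = {}       # hypothetical name -> detector function

def register(name):
    """Decorator that adds a detector function to the registry."""
    def decorator(fn):
        _REGISTRY[name] = fn
        return fn
    return decorator

@register("three_sigma")
def three_sigma(xs):
    mean, std = fmean(xs), pstdev(xs)
    return [std > 0 and abs(x - mean) > 3 * std for x in xs]

@register("fixed_range")
def fixed_range(xs, low=0.0, high=100.0):
    return [x < low or x > high for x in xs]

def create_detectors(names):
    """Factory: resolve the caller's chosen names, enforcing the cap."""
    if not 1 <= len(names) <= MAX_ALGORITHMS:
        raise ValueError(f"choose between 1 and {MAX_ALGORITHMS} algorithms")
    return {name: _REGISTRY[name] for name in names}

def run_parallel(detectors, xs):
    """Run the independent detectors concurrently; returns name -> list of flags."""
    with ThreadPoolExecutor(max_workers=len(detectors)) as pool:
        futures = {name: pool.submit(fn, xs) for name, fn in detectors.items()}
        return {name: future.result() for name, future in futures.items()}
```

Threads keep the sketch short; for CPU-bound pure-Python detectors, `concurrent.futures.ProcessPoolExecutor` would be the usual choice for real parallelism.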
In step S201, acquiring the initial data sample may be implemented as: acquiring historical time-series data within a predetermined time period and artificial time-series data, where the artificial time-series data are time-series data generated by a time-series generator, and the historical time-series data and the artificial time-series data together form the initial data sample. Combining historical and artificial time-series data makes the sample richer, so that the results of the subsequent calculation and judgment are more accurate.
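The source does not describe the time-series generator. As a hedged sketch, the generator below assumes a sine baseline with Gaussian noise and randomly injected spikes; `generate_series`, `build_initial_sample`, and every parameter are hypothetical choices made for illustration.

```python
import math
import random

def generate_series(n, period=60, base=50.0, amplitude=10.0,
                    noise=1.0, anomaly_rate=0.02, spike=40.0, seed=0):
    """Generate n artificial points plus a flag per point marking injected anomalies."""
    rng = random.Random(seed)  # seeded so that the sample is reproducible
    values, labels = [], []
    for t in range(n):
        value = base + amplitude * math.sin(2 * math.pi * t / period) + rng.gauss(0, noise)
        is_anomaly = rng.random() < anomaly_rate
        if is_anomaly:
            value += spike  # inject an abnormal spike
        values.append(value)
        labels.append(is_anomaly)
    return values, labels

def build_initial_sample(historical, artificial):
    """Combine both sources, tagging each point with where it came from."""
    return ([("historical", v) for v in historical]
            + [("artificial", v) for v in artificial])
```

Keeping the source tag on each point is a design choice of this sketch, so that later steps can tell real history from generated data.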
Specifically, data collection is performed with: number of collection points = (time window / data interval) × number of collection periods. Here the time window is the total sampling duration; the data interval is the granularity at which the time-series data points are recorded, for example one point per minute; the collection periods are the periods from which data are collected, for example the current day and 1 day, 3 days and 7 days earlier, or each quarter; and the number of collection points is the number of time-series data in the sample. The collected time-series data may take the form of a DataFrame, for example with DataFrame.index = time (year-month-day hours:minutes:seconds), such as "2020-01-01 08:00:00", and DataFrame.shape = (480, 1), where 480 and 1 are the number of rows and the number of dimensions of the DataFrame respectively; these can be set by those skilled in the art according to actual requirements.
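The point-count formula in the source is garbled, so the sketch below rests on one reading of it: points = (time window / data interval) × number of collection periods. Under that assumption, a 2-hour window at a 1-minute interval over 4 collection periods gives 480 points, which happens to match the 480-row example; the specific window and offsets are illustrative guesses, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

def collection_points(window, interval, periods):
    """Points = (window / interval) * periods, under the reading stated above."""
    per_period, remainder = divmod(window, interval)
    if remainder:
        raise ValueError("window must be a whole multiple of interval")
    return per_period * periods

def sample_timestamps(end, window, interval, day_offsets):
    """Timestamps for the window ending at `end`, repeated for each day offset."""
    stamps = []
    for offset in day_offsets:
        start = end - timedelta(days=offset) - window
        stamps.extend(start + i * interval for i in range(window // interval))
    return stamps

# Assumed example: current day plus 1, 3 and 7 days earlier.
stamps = sample_timestamps(
    end=datetime(2020, 1, 1, 10, 0, 0),
    window=timedelta(hours=2),
    interval=timedelta(minutes=1),
    day_offsets=[0, 1, 3, 7],
)
first_index = stamps[0].strftime("%Y-%m-%d %H:%M:%S")
```

The timestamps produced this way could serve as the index of a one-column DataFrame of the kind the text describes.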
Step S202, as shown in FIG. 3, judging each piece of time-series data in the initial data sample according to the at least one anomaly judgment algorithm to determine a plurality of normal data and a plurality of abnormal data;
Specifically, the initial data sample is judged by a plurality of algorithms, each of which judges whether each piece of time-series data in the initial data sample is an abnormal point.
In an alternative scheme, judging each piece of time-series data in the initial data sample according to the at least one anomaly judgment algorithm to determine a plurality of normal data and a plurality of abnormal data includes: where there are a plurality of anomaly judgment algorithms, judging each piece of time-series data in the initial data sample with each anomaly judgment algorithm separately to obtain a plurality of judgment result groups, each of which includes a plurality of judgment results; determining as the abnormal data the time-series data for which any judgment result in the plurality of judgment result groups is abnormal; and determining as the normal data the time-series data for which every judgment result is normal. Because the algorithms follow different logic, their results differ. Time-series data that any one anomaly judgment algorithm finds abnormal are therefore treated as abnormal data, and only time-series data that all anomaly judgment algorithms find normal are treated as normal data, which prevents missed detections.
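The combination rule described above (abnormal if any algorithm says so, normal only if all agree) can be sketched in a few lines. The function names are hypothetical, and each judgment result group is assumed to be a list of booleans aligned with the sample.

```python
def combine_judgments(result_groups):
    """A point is abnormal if any algorithm flags it, normal only if none does."""
    lengths = {len(group) for group in result_groups}
    if len(lengths) != 1:
        raise ValueError("need at least one group, all of the same length")
    return [any(votes) for votes in zip(*result_groups)]

def split_sample(xs, result_groups):
    """Split the sample into (normal, abnormal) using the combined flags."""
    flags = combine_judgments(result_groups)
    normal = [x for x, flagged in zip(xs, flags) if not flagged]
    abnormal = [x for x, flagged in zip(xs, flags) if flagged]
    return normal, abnormal
```

This "any" rule trades more false positives for fewer missed anomalies, which is consistent with the manual review step the text introduces next.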
In practical application, the anomaly judgments may contain misjudgments and missed judgments, so manual labeling, referred to as label engineering, can be introduced for further judgment. The normal data and abnormal data are reviewed manually, and once the results are finalized they are updated into the sample library.
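As a rough illustration of label engineering, the sketch below assumes that manual review arrives as a mapping from point index to a corrected label and that the sample library is a plain dictionary with "normal" and "abnormal" lists; both structures and both function names are assumptions, since the source does not specify them.

```python
def apply_manual_labels(auto_labels, manual_labels):
    """Override automatic labels with manual ones; manual_labels maps index -> is_abnormal."""
    final = list(auto_labels)  # copy so the automatic labels stay untouched
    for index, is_abnormal in manual_labels.items():
        final[index] = is_abnormal
    return final

def update_sample_library(library, xs, labels):
    """Append each finalized point to the 'normal' or 'abnormal' list of the library."""
    for x, is_abnormal in zip(xs, labels):
        library["abnormal" if is_abnormal else "normal"].append(x)
    return library
```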
Step S203, as shown in FIG. 3, selecting the normal data and the abnormal data in a predetermined quantity ratio to form a target data sample, and training an initial anomaly detection model based on the target data sample to obtain a target anomaly detection model;
Specifically, selecting the normal data and the abnormal data in a predetermined quantity ratio prevents an imbalance between the amounts of normal and abnormal data in the sample and improves the accuracy of the subsequently trained model; feature engineering can also be used to process the normal data and the abnormal data.
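The source does not give the predetermined ratio or the sampling procedure. The sketch below assumes a ratio of 3 normal points per abnormal point and random sampling without replacement, purely for illustration; `build_target_sample` and its parameters are hypothetical.

```python
import random

def build_target_sample(normal, abnormal, normal_per_abnormal=3, seed=0):
    """Pick normal and abnormal points at a fixed ratio and return labeled pairs."""
    rng = random.Random(seed)
    # Use as many abnormal points as the available normal points can match.
    n_abnormal = min(len(abnormal), len(normal) // normal_per_abnormal)
    if n_abnormal == 0:
        raise ValueError("not enough data to honor the requested ratio")
    picked_abnormal = rng.sample(abnormal, n_abnormal)
    picked_normal = rng.sample(normal, n_abnormal * normal_per_abnormal)
    sample = [(x, 0) for x in picked_normal] + [(x, 1) for x in picked_abnormal]
    rng.shuffle(sample)  # avoid feeding the model all normal points first
    return sample
```

Downsampling the majority class is only one way to hold a ratio; oversampling the abnormal points would be an alternative under the same idea.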
Step S204, as shown in FIG. 3, performing anomaly detection on the time-series data to be detected with the target anomaly detection model.
Specifically, anomaly detection with the target anomaly detection model achieves high accuracy.
After step S204, the method further includes: processing the initial data sample with the pre-trained waveform classification model to determine a plurality of anomaly prediction algorithms, where an anomaly prediction algorithm is used to judge whether time-series data predicted from the existing time-series data are abnormal; determining each anomaly prediction algorithm that meets a predetermined condition as a target anomaly prediction algorithm, where the predetermined condition includes the regression evaluation index of the anomaly prediction algorithm lying within a predetermined range; determining a target anomaly prediction model from the target anomaly prediction algorithm and the initial data sample; and performing anomaly prediction on the time-series data to be detected with the target anomaly prediction model.
Specifically, the anomaly prediction method is basically consistent with the anomaly judgment algorithm and is mainly used in scenarios that predict time-series data such as operations indicators and business KPIs. The predetermined condition may involve the MAE, RMSE and R-squared scores, which are regression evaluation indexes of the prediction model; checking whether each regression evaluation index falls within its corresponding predetermined range gives a quantitative judgment of whether the regression prediction algorithm is accurate.
To improve the accuracy of the anomaly prediction, the anomaly prediction algorithm optionally includes a regression algorithm.
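The three regression evaluation indexes have standard definitions, so the selection step can be sketched directly. The metric formulas are the usual ones; the selection function, its parameter names, and the idea of passing per-algorithm predictions as a dictionary are assumptions of this sketch, and the predetermined ranges are left to the caller because the source does not state them.

```python
import math
from statistics import fmean

def mae(y_true, y_pred):
    """Mean absolute error."""
    return fmean(abs(a - b) for a, b in zip(y_true, y_pred))

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(fmean((a - b) ** 2 for a, b in zip(y_true, y_pred)))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = fmean(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined for a constant target")
    return 1 - ss_res / ss_tot

def select_prediction_algorithms(y_true, predictions, max_mae, max_rmse, min_r2):
    """Keep the algorithms whose metrics all fall inside the caller's ranges."""
    selected = []
    for name, y_pred in predictions.items():
        if (mae(y_true, y_pred) <= max_mae
                and rmse(y_true, y_pred) <= max_rmse
                and r_squared(y_true, y_pred) >= min_r2):
            selected.append(name)
    return selected
```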
According to the method, at least a pre-trained waveform classification model is adopted to process an initial data sample, at least one abnormality judgment algorithm is determined, the initial data sample is judged by the abnormality judgment algorithm to obtain a plurality of normal data and abnormal data, a target data sample is formed by selecting normal data and abnormal data with a preset quantity ratio, and a target abnormality detection model trained based on the target data sample is adopted to perform abnormality detection on time sequence data to be detected, so that the accuracy of abnormality detection can be greatly improved, the labor cost is reduced, and the problems that in the prior art, the accuracy of time sequence data abnormality detection of a database is low and the labor cost is high are solved.
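To show how steps S201 to S204 fit together, the skeleton below wires them into one function with pluggable parts. It is a structural sketch only: `run_pipeline` and the three `toy_*` stand-ins are hypothetical, and the toy threshold "model" merely keeps the example self-contained and is not the detection model of the application.

```python
def run_pipeline(initial_sample, pick_algorithms, build_target_sample, train, to_detect):
    """Chain steps S201-S204 with caller-supplied components."""
    algorithms = pick_algorithms(initial_sample)                      # S201
    result_groups = [algo(initial_sample) for algo in algorithms]     # S202
    flags = [any(votes) for votes in zip(*result_groups)]
    normal = [x for x, flagged in zip(initial_sample, flags) if not flagged]
    abnormal = [x for x, flagged in zip(initial_sample, flags) if flagged]
    target_sample = build_target_sample(normal, abnormal)             # S203
    model = train(target_sample)
    return [model(x) for x in to_detect]                              # S204

# Toy stand-ins (assumptions) so the skeleton can be exercised end to end.
def toy_pick_algorithms(sample):
    return [lambda xs: [x > 90 for x in xs]]

def toy_build_target_sample(normal, abnormal):
    return [(x, False) for x in normal] + [(x, True) for x in abnormal]

def toy_train(target_sample):
    highest_normal = max(x for x, is_abnormal in target_sample if not is_abnormal)
    lowest_abnormal = min(x for x, is_abnormal in target_sample if is_abnormal)
    threshold = (highest_normal + lowest_abnormal) / 2
    return lambda x: x > threshold

detected = run_pipeline([10, 12, 11, 95, 13, 99], toy_pick_algorithms,
                        toy_build_target_sample, toy_train, to_detect=[20, 70])
```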
In practical application, the AIPaaS system of Postal Savings Bank of China is an intelligent operations platform designed for the bank's new-generation personal business core system. Based on big data, machine learning, deep learning and other technologies, it provides the bank with a private-cloud intelligent operations solution that meets the customer's needs for automated and intelligent operations. For an AI technology platform in the operations field, time-series data from monitoring, business, operations events and the like are an important data foundation, and prediction and anomaly detection on time-series data have long been hot and difficult problems in academia and industry. In real scenarios, achieving intelligent analysis and detection for tens of thousands of time series of various types, without the need to set thresholds and by means of big data, machine learning, deep learning and similar techniques, is an important function of the AIPaaS platform. Before an intelligent analysis method for database monitoring indicators was available, abnormal database indicators were discovered through alarms on monitoring indicators: operations engineers extracted the monitoring indicators that might be needed, configured alarm rules (detection thresholds) on them, and an alarm notification was sent when an indicator value exceeded its threshold. Analysis and alarming by such traditional threshold detection, however, has high labor cost, high maintenance cost and unstable results.
Intelligent analysis is realized by means of artificial intelligence, machine learning, deep learning and the like. The intelligent analysis method for database monitoring indexes relieves the strong dependence on the experience of development and operation and maintenance personnel; it can be updated as business data is updated, without being tied to product iterations; it supports multiple detection models, so that different business scenarios can have different detection models; and its detection is more targeted to the business scenario. The embodiments of the application can be implemented on the AIPaaS platform, which supports detecting abnormal time-series data at minute granularity and provides a calling interface to upstream and downstream systems; supports integrated detection with multiple algorithm models; and supports feature engineering for multi-feature processing of time series.
Specifically, the time-series data abnormality detection method of the database disclosed in the embodiments of the application enables SQL profiling, index prediction and abnormal index detection for the database. It helps operation and maintenance personnel locate abnormal database indexes quickly and accurately, saves labor cost and the cost of preventing fault losses, and displays abnormal SQL even when no rules are configured, guiding users to attend to that SQL first, with an accuracy reaching 99%.
The embodiments of the application also provide a time-series data abnormality detection device for a database, which can be used to execute the time-series data abnormality detection method of the database. The device implements the above embodiments and preferred embodiments, and what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the device described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
The following describes the time-series data abnormality detection device of a database provided in an embodiment of the present application.
Fig. 4 is a schematic diagram of a time-series data abnormality detection device of a database according to an embodiment of the present application. As shown in Fig. 4, the device includes an acquisition unit 10, a first determination unit 20, a training unit 30, and a detection unit 40, wherein:
The acquisition unit 10 is configured to acquire an initial data sample and to process the initial data sample with at least a pre-trained waveform classification model to obtain at least one abnormality judgment algorithm matched with the data variation trend of the initial data sample, where the initial data sample includes normal time-series data and abnormal time-series data, and the abnormality judgment algorithm is used to judge whether each piece of time-series data in the initial data sample is abnormal;
Specifically, when time-series abnormality detection is performed with a single machine learning or deep learning algorithm, the positive samples of a time-series index far outnumber the negative samples, so some algorithms cannot adapt to the data; moreover, single-index time-series data has few features, so abnormalities in complex scenarios cannot be detected. Therefore, the initial data sample is processed with the pre-trained waveform classification model, which uses an unsupervised algorithm and does not require balanced samples, to obtain an abnormality judgment algorithm suited to the initial data sample.
A single algorithm produces many false alarms and cannot guarantee accuracy, so multiple algorithm types can be used to improve accuracy. The computations of the algorithms are independent of one another and can be performed in parallel. The types of the abnormality judgment algorithm include at least one of the following: statistical discrimination, outlier detection, and neural networks.
In practical application, at least one algorithm can be selected under each abnormality judgment algorithm type. Besides being determined by the pre-trained waveform classification model, the abnormality judgment algorithms can also be selected manually; in that case the caller must import and select the algorithms through factory-style initialization. The number of selected algorithms may be kept within 5 to avoid a long overall computation time. If the online computation time is long, the overall computation efficiency can be improved by running the algorithms in parallel. The waveform classification model serves to classify waveforms, and the classification result can be shown to the user as a display feature so that the user understands the trend and periodicity of the monitored index and thus better understands the condition of the index.
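The independence of the algorithms and the cap on how many are selected can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the application: the judge functions zscore_judge and iqr_judge, the REGISTRY mapping, the build_judges factory, the thresholds and the MAX_ALGORITHMS constant are all hypothetical names introduced here to stand in for the statistical-discrimination and outlier-detection types.

```python
from concurrent.futures import ThreadPoolExecutor
import statistics

MAX_ALGORITHMS = 5  # assumed cap, following the "within 5" guidance in the text


def zscore_judge(values, threshold=2.0):
    """Statistical discrimination (hypothetical): flag points far from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [False] * len(values)
    return [abs(v - mean) / std > threshold for v in values]


def iqr_judge(values, k=1.5):
    """Outlier detection (hypothetical): flag points outside the IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v < low or v > high for v in values]


# Hypothetical registry used for factory-style initialization by the caller.
REGISTRY = {"zscore": zscore_judge, "iqr": iqr_judge}


def build_judges(names):
    """Select algorithms by name, refusing more than MAX_ALGORITHMS."""
    if not 1 <= len(names) <= MAX_ALGORITHMS:
        raise ValueError(f"select between 1 and {MAX_ALGORITHMS} algorithms")
    return {name: REGISTRY[name] for name in names}


def run_in_parallel(judges, values):
    """Run every judge on the same data; the judges do not depend on each other."""
    with ThreadPoolExecutor(max_workers=len(judges)) as pool:
        futures = {name: pool.submit(judge, values) for name, judge in judges.items()}
        return {name: future.result() for name, future in futures.items()}


values = [10.0, 11.0, 10.0, 12.0, 11.0, 10.0, 50.0, 11.0]
results = run_in_parallel(build_judges(["zscore", "iqr"]), values)
print(results)
```

A thread pool is used here only to show that the judges share no state. For CPU-bound judges in CPython, a process pool may be the more effective choice.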
The acquisition unit comprises an acquisition module, which is configured to acquire historical time-series data and manual time-series data within a predetermined time period, where the manual time-series data is time-series data generated by a time-series generator, and the historical time-series data and the manual time-series data together form the initial data sample. Combining the two makes the sample richer, so the results of subsequent calculation and judgment are more accurate.
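As a rough illustration of how generator-produced data might be combined with historical data, the sketch below builds a seasonal series with injected spikes and appends it to a small historical list. The generator design (sine wave plus Gaussian noise plus spikes), the function name generate_series, all of its parameters and the toy history values are assumptions made for this example; the application does not specify how the time-series generator works.

```python
import math
import random


def generate_series(length, period=60, base=100.0, amplitude=10.0,
                    noise=1.0, anomaly_positions=(), spike=50.0, seed=0):
    """Hypothetical time-series generator: returns (value, is_abnormal) pairs."""
    rng = random.Random(seed)  # seeded so the generated sample is reproducible
    points = []
    for i in range(length):
        value = base + amplitude * math.sin(2 * math.pi * i / period)
        value += rng.gauss(0.0, noise)
        is_abnormal = i in anomaly_positions
        if is_abnormal:
            value += spike  # inject an obvious spike as an abnormal point
        points.append((value, is_abnormal))
    return points


# Toy stand-in for historical data taken from a predetermined time period.
history = [(101.2, False), (99.8, False), (100.5, False)]

manual = generate_series(120, anomaly_positions={30, 90})
initial_sample = history + manual  # both sources together form the initial sample

print(len(initial_sample), sum(flag for _, flag in initial_sample))
```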
Specifically, data collection is performed with number of collection points = (time window / data interval) × number of collection periods. The time window is the total sampling time; the data interval is the spacing of the time-series data points, for example one point generated every 1 minute; a collection period is a period from which data is collected, for example data from 1 day, 3 days and 7 days before the current day, or data from each quarter; and the number of collection points is the number of time-series data points in a sample. The collected time-series data may take the form of a DataFrame: for example, DataFrame.index is a time in the format year-month-day hours:minutes:seconds, such as "2020-01-01 08:00:00", and DataFrame.shape = (480, 1), where 480 and 1 are the number of rows and the number of dimensions of the DataFrame respectively. These values can be set by those skilled in the art according to actual requirements.
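The point-count arithmetic can be checked with a short sketch. It uses only the Python standard library rather than a DataFrame, and it assumes the reading of the formula as (time window / data interval) multiplied by the number of collection periods; the function names and the 8-hour and 2-hour windows are illustrative choices, picked so that the result matches the 480 rows mentioned above.

```python
from datetime import datetime, timedelta


def collection_point_count(time_window, data_interval, period_count=1):
    """Points per sample = (time window / data interval) * number of periods."""
    points_per_period, remainder = divmod(time_window, data_interval)
    if remainder:
        raise ValueError("time window must be a whole multiple of the data interval")
    return points_per_period * period_count


def build_timestamps(start, time_window, data_interval):
    """Timestamps of one collection period, one per data interval."""
    count = collection_point_count(time_window, data_interval)
    return [start + i * data_interval for i in range(count)]


# One 8-hour window at 1-minute spacing gives 480 points.
start = datetime(2020, 1, 1, 8, 0, 0)
index = build_timestamps(start, timedelta(hours=8), timedelta(minutes=1))

# Each row is (timestamp string, value); the value 0.0 is a placeholder.
rows = [(ts.strftime("%Y-%m-%d %H:%M:%S"), 0.0) for ts in index]
shape = (len(rows), 1)  # rows x dimensions, analogous to DataFrame.shape

# Four 2-hour periods at 1-minute spacing also give 480 points.
four_periods = collection_point_count(timedelta(hours=2), timedelta(minutes=1), 4)

print(shape, rows[0][0], four_periods)
```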
The first determination unit 20 is configured to judge each piece of time-series data in the initial data sample with the at least one abnormality judgment algorithm, so as to determine a plurality of normal data and a plurality of abnormal data;
Specifically, the initial data sample is judged by a plurality of algorithms, each of which judges whether each piece of time-series data in the initial data sample is an abnormal point.
In an alternative scheme, the first determination unit includes a judging module, a first determining module and a second determining module. The judging module is configured to, when there are multiple abnormality judgment algorithms, judge each piece of time-series data in the initial data sample with each abnormality judgment algorithm separately, obtaining multiple judgment result groups, each of which includes multiple judgment results. The first determining module is configured to determine as abnormal data the time-series data for which a judgment result in the judgment result groups is abnormal. The second determining module is configured to determine as normal data the time-series data for which every judgment result in the judgment result groups is normal. Because the algorithms follow different logic, their results differ; therefore time-series data that any one abnormality judgment algorithm finds abnormal is treated as abnormal data, and only time-series data that all abnormality judgment algorithms find normal is treated as normal data, which prevents missed judgments.
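The combination rule described above (abnormal if any algorithm says so, normal only if all agree) can be sketched in a few lines. The function names and the toy values are hypothetical; each judgment result group is represented here as a list of booleans, one per data point, with True meaning abnormal.

```python
def combine_judgments(result_groups):
    """Merge per-algorithm results: a point is abnormal if any algorithm flags it."""
    if not result_groups:
        raise ValueError("at least one judgment result group is required")
    length = len(result_groups[0])
    if any(len(group) != length for group in result_groups):
        raise ValueError("all judgment result groups must cover the same points")
    return [any(flags) for flags in zip(*result_groups)]


def split_samples(values, result_groups):
    """Return (normal data, abnormal data) according to the combined judgment."""
    labels = combine_judgments(result_groups)
    normal = [v for v, bad in zip(values, labels) if not bad]
    abnormal = [v for v, bad in zip(values, labels) if bad]
    return normal, abnormal


values = [10, 11, 95, 10, 12, 3]
group_a = [False, False, True, False, False, False]  # results of one algorithm
group_b = [False, False, True, False, False, True]   # results of another algorithm

normal, abnormal = split_samples(values, [group_a, group_b])
print(normal, abnormal)
```

The value 3 is flagged by only one of the two algorithms and still ends up in the abnormal data, which is the behaviour the text relies on to avoid missed judgments. The cost is more false positives, which the manual labeling step described next can correct.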
In practical application, the abnormality judgment may still contain misjudgments and missed judgments, so manual labeling, referred to as label engineering, can be introduced for further judgment. The normal data and the abnormal data are reviewed manually, and once the result is finally determined they are updated into the sample library.
The training unit 30 is configured to select the normal data and the abnormal data in a predetermined quantity ratio to form a target data sample, and to train an initial abnormality detection model based on the target data sample to obtain a target abnormality detection model;
Specifically, selecting the normal data and the abnormal data in the predetermined quantity ratio prevents imbalance between the amounts of normal data and abnormal data in the sample and improves the accuracy of the subsequently trained model. Feature engineering can also be applied to the normal data and the abnormal data.
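One simple way to realize a predetermined quantity ratio is to down-sample whichever class is over-represented. The sketch below does this with random sampling; the 3:1 ratio, the function name select_by_ratio and the 0/1 labels are assumptions for illustration, since the application does not state the ratio or the sampling strategy.

```python
import random


def select_by_ratio(normal, abnormal, normal_parts=3, abnormal_parts=1, seed=0):
    """Pick normal and abnormal data so their counts are normal_parts : abnormal_parts."""
    if normal_parts <= 0 or abnormal_parts <= 0:
        raise ValueError("ratio parts must be positive")
    # Largest number of whole "ratio units" that both classes can supply.
    units = min(len(normal) // normal_parts, len(abnormal) // abnormal_parts)
    rng = random.Random(seed)  # seeded so the selection is reproducible
    chosen_normal = rng.sample(normal, units * normal_parts)
    chosen_abnormal = rng.sample(abnormal, units * abnormal_parts)
    # Label 0 = normal, 1 = abnormal (hypothetical labeling convention).
    return [(x, 0) for x in chosen_normal] + [(x, 1) for x in chosen_abnormal]


normal_data = list(range(100))           # 100 normal points
abnormal_data = list(range(1000, 1007))  # 7 abnormal points

target_sample = select_by_ratio(normal_data, abnormal_data)
n_normal = sum(1 for _, label in target_sample if label == 0)
n_abnormal = sum(1 for _, label in target_sample if label == 1)
print(n_normal, n_abnormal)
```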
The detection unit 40 is configured to perform abnormality detection on the time-series data to be detected using the target abnormality detection model.
Specifically, abnormality detection performed with the target abnormality detection model has high accuracy.
The device further comprises a second determination unit, a third determination unit, a fourth determination unit and a prediction unit. The second determination unit is configured to process the initial data sample with the pre-trained waveform classification model and determine a plurality of abnormality prediction algorithms, where an abnormality prediction algorithm is used to judge whether the time-series data predicted from the existing time-series data is abnormal. The third determination unit is configured to determine an abnormality prediction algorithm that satisfies a predetermined condition as the target abnormality prediction algorithm, where the predetermined condition includes that a regression evaluation index of the abnormality prediction algorithm is within a predetermined range. The fourth determination unit is configured to determine a target abnormality prediction model according to the target abnormality prediction algorithm and the initial data sample. The prediction unit is configured to perform abnormality prediction on the time-series data to be detected using the target abnormality prediction model.
Specifically, abnormality prediction is mainly used in scenarios that predict time-series data such as operation and maintenance indexes and business KPIs. The predetermined condition may involve the MAE (mean absolute error), RMSE (root mean square error) and R Squared scores, which are regression evaluation indexes of the prediction model; checking whether each regression evaluation index falls within its corresponding predetermined range gives a quantitative judgment of whether the regression prediction algorithm is accurate.
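The three regression evaluation indexes and the range check can be written out directly from their standard definitions. In the sketch below the predetermined ranges, the candidate names algo_a and algo_b and the sample values are hypothetical; only the metric formulas themselves are standard.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R squared is undefined for a constant series")
    return 1 - ss_res / ss_tot


# Hypothetical predetermined ranges, given as (low, high) for each index.
RANGES = {"mae": (0.0, 0.5), "rmse": (0.0, 0.5), "r2": (0.9, 1.0)}


def evaluate(actual, predicted):
    return {"mae": mae(actual, predicted),
            "rmse": rmse(actual, predicted),
            "r2": r_squared(actual, predicted)}


def meets_condition(scores, ranges=RANGES):
    """True only if every index lies inside its predetermined range."""
    return all(low <= scores[name] <= high for name, (low, high) in ranges.items())


actual = [3.0, 5.0, 7.0, 9.0]
candidates = {"algo_a": [2.5, 5.0, 7.5, 9.0], "algo_b": [6.0, 6.0, 6.0, 6.0]}

scores_a = evaluate(actual, candidates["algo_a"])
targets = [name for name, predicted in candidates.items()
           if meets_condition(evaluate(actual, predicted))]
print(scores_a, targets)
```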
In order to improve the accuracy of the anomaly prediction, optionally, the anomaly prediction algorithm includes a regression algorithm.
Through this embodiment, the device processes the initial data sample with at least the pre-trained waveform classification model to determine at least one abnormality judgment algorithm; judges the initial data sample with the abnormality judgment algorithm to obtain a plurality of normal data and abnormal data; selects the normal data and the abnormal data in a predetermined quantity ratio to form the target data sample; and performs abnormality detection on the time-series data to be detected with the target abnormality detection model trained on the target data sample. This can greatly improve the accuracy of abnormality detection and reduce labor cost, thereby addressing the low accuracy and high labor cost of time-series data abnormality detection for databases in the prior art.
The time-series data abnormality detection device of the database comprises a processor and a memory. The acquisition unit, the first determination unit, the training unit, the detection unit and the like are all stored in the memory as program units, and the processor executes the program units stored in the memory to realize the corresponding functions. The modules may all be located in the same processor; alternatively, the modules may be located in different processors in any combination.
The processor includes a kernel, and the kernel fetches the corresponding program unit from the memory. One or more kernels may be provided, and the problems of low accuracy and high labor cost of time-series data abnormality detection for databases in the prior art are addressed by adjusting kernel parameters.
The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
An embodiment of the invention provides a computer-readable storage medium comprising a stored program, wherein, when run, the program controls the device on which the computer-readable storage medium is located to execute the time-series data abnormality detection method of the database.
Specifically, the time-series data abnormality detection method of the database comprises the following steps:
Step S201, an initial data sample is acquired and processed with at least a pre-trained waveform classification model to obtain at least one abnormality judgment algorithm matched with the data variation trend of the initial data sample, wherein the initial data sample comprises normal time-series data and abnormal time-series data, and the abnormality judgment algorithm is used to judge whether each piece of time-series data in the initial data sample is abnormal;
Specifically, when time-series abnormality detection is performed with a single machine learning or deep learning algorithm, the positive samples of a time-series index far outnumber the negative samples, so some algorithms cannot adapt to the data; moreover, single-index time-series data has few features, so abnormalities in complex scenarios cannot be detected. Therefore, the initial data sample is processed with the pre-trained waveform classification model, which uses an unsupervised algorithm and does not require balanced samples, to obtain an abnormality judgment algorithm suited to the initial data sample.
Step S202, each piece of time-series data in the initial data sample is judged with the at least one abnormality judgment algorithm to determine a plurality of normal data and a plurality of abnormal data;
Specifically, the initial data sample is judged by a plurality of algorithms, each of which judges whether each piece of time-series data in the initial data sample is an abnormal point.
Step S203, the normal data and the abnormal data are selected in a predetermined quantity ratio to form a target data sample, and an initial abnormality detection model is trained based on the target data sample to obtain a target abnormality detection model;
Specifically, selecting the normal data and the abnormal data in the predetermined quantity ratio prevents imbalance between the amounts of normal data and abnormal data in the sample and improves the accuracy of the subsequently trained model. Feature engineering can also be applied to the normal data and the abnormal data.
Step S204, abnormality detection is performed on the time-series data to be detected using the target abnormality detection model.
Specifically, abnormality detection performed with the target abnormality detection model has high accuracy.
Optionally, judging each piece of time-series data in the initial data sample with the at least one abnormality judgment algorithm to determine a plurality of normal data and a plurality of abnormal data includes: when there are multiple abnormality judgment algorithms, judging each piece of time-series data in the initial data sample with each abnormality judgment algorithm separately to obtain multiple judgment result groups, each of which includes multiple judgment results; determining as abnormal data the time-series data for which a judgment result in the judgment result groups is abnormal; and determining as normal data the time-series data for which every judgment result in the judgment result groups is normal.
Optionally, the method further comprises: processing the initial data sample with the pre-trained waveform classification model and determining a plurality of abnormality prediction algorithms, where an abnormality prediction algorithm is used to judge whether the time-series data predicted from the existing time-series data is abnormal; determining an abnormality prediction algorithm that satisfies a predetermined condition as the target abnormality prediction algorithm, where the predetermined condition includes that a regression evaluation index of the abnormality prediction algorithm is within a predetermined range; determining a target abnormality prediction model according to the target abnormality prediction algorithm and the initial data sample; and performing abnormality prediction on the time-series data to be detected using the target abnormality prediction model.
Optionally, the anomaly prediction algorithm comprises a regression algorithm.
Optionally, the regression evaluation index includes at least one of MAE, RMSE, and R Squared score.
Optionally, the type of the abnormality determination algorithm includes at least one of: statistical discrimination, outlier detection, and neural networks.
Optionally, acquiring the initial data sample includes: acquiring historical time-series data and manual time-series data within a predetermined time period, where the manual time-series data is time-series data generated by a time-series generator, and the historical time-series data and the manual time-series data together form the initial data sample.
An embodiment of the invention provides a processor configured to run a program, wherein the program, when run, executes the time-series data abnormality detection method of the database.
An embodiment of the invention provides a device comprising a processor, a memory and a program stored in the memory and runnable on the processor, wherein the processor implements at least the following steps when executing the program:
Step S201, an initial data sample is acquired and processed with at least a pre-trained waveform classification model to obtain at least one abnormality judgment algorithm matched with the data variation trend of the initial data sample, wherein the initial data sample comprises normal time-series data and abnormal time-series data, and the abnormality judgment algorithm is used to judge whether each piece of time-series data in the initial data sample is abnormal;
Step S202, each piece of time-series data in the initial data sample is judged with the at least one abnormality judgment algorithm to determine a plurality of normal data and a plurality of abnormal data;
Step S203, the normal data and the abnormal data are selected in a predetermined quantity ratio to form a target data sample, and an initial abnormality detection model is trained based on the target data sample to obtain a target abnormality detection model;
Step S204, abnormality detection is performed on the time-series data to be detected using the target abnormality detection model.
The device herein may be a server, a PC, a tablet, a mobile phone, or the like.
The present application also provides a computer program product adapted, when executed on a data processing device, to execute a program that initializes at least the following method steps:
Step S201, an initial data sample is acquired and processed with at least a pre-trained waveform classification model to obtain at least one abnormality judgment algorithm matched with the data variation trend of the initial data sample, wherein the initial data sample comprises normal time-series data and abnormal time-series data, and the abnormality judgment algorithm is used to judge whether each piece of time-series data in the initial data sample is abnormal;
Step S202, each piece of time-series data in the initial data sample is judged with the at least one abnormality judgment algorithm to determine a plurality of normal data and a plurality of abnormal data;
Step S203, the normal data and the abnormal data are selected in a predetermined quantity ratio to form a target data sample, and an initial abnormality detection model is trained based on the target data sample to obtain a target abnormality detection model;
Step S204, abnormality detection is performed on the time-series data to be detected using the target abnormality detection model.
It will be appreciated by those skilled in the art that the modules or steps of the invention described above may be implemented with a general-purpose computing device. They may be concentrated on a single computing device or distributed across a network of computing devices. They may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from that given here. Alternatively, they may each be fabricated as an individual integrated circuit module, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
From the above description, it can be seen that the above embodiments of the present application achieve the following technical effects:
1) In the time sequence data abnormality detection method of the database, an initial data sample is first acquired and processed with at least a pre-trained waveform classification model to obtain at least one abnormality judgment algorithm matched with the data change trend of the initial data sample, wherein the initial data sample comprises normal time sequence data and abnormal time sequence data, and the abnormality judgment algorithm is used for judging whether each time sequence data in the initial data sample is abnormal. Each time sequence data in the initial data sample is then judged according to the at least one abnormality judgment algorithm to determine a plurality of normal data and a plurality of abnormal data. Next, the normal data and the abnormal data are selected in a preset quantity ratio to form a target data sample, and an initial anomaly detection model is trained based on the target data sample to obtain a target anomaly detection model. Finally, anomaly detection is performed on the time sequence data to be detected with the target anomaly detection model. In this way, the abnormality judgment algorithm is determined by the waveform classification model, the target data sample is formed from the judged normal data and abnormal data in a preset quantity ratio, and the target anomaly detection model trained on that sample is used for detection, which can greatly improve the accuracy of anomaly detection and reduce labor cost, thereby solving the prior-art problems of low accuracy and high labor cost in time sequence data abnormality detection of a database.
2) The time sequence data abnormality detection device of the database comprises an acquisition unit, a first determining unit, a training unit, and a detection unit. The acquisition unit is used for acquiring an initial data sample and processing the initial data sample with at least a pre-trained waveform classification model to obtain at least one abnormality judgment algorithm matched with the data change trend of the initial data sample, wherein the initial data sample comprises normal time sequence data and abnormal time sequence data, and the abnormality judgment algorithm is used for judging whether each time sequence data in the initial data sample is abnormal. The first determining unit is used for judging each time sequence data in the initial data sample according to the at least one abnormality judgment algorithm to determine a plurality of normal data and a plurality of abnormal data. The training unit is used for selecting the normal data and the abnormal data in a preset quantity ratio to form a target data sample, and training an initial anomaly detection model based on the target data sample to obtain a target anomaly detection model. The detection unit is used for performing anomaly detection on the time sequence data to be detected with the target anomaly detection model.
In the device, the abnormality judgment algorithm is determined by the waveform classification model, the target data sample is formed from the judged normal data and abnormal data in a preset quantity ratio, and the target anomaly detection model trained on that sample is used for detection, which can greatly improve the accuracy of anomaly detection and reduce labor cost, thereby solving the prior-art problems of low accuracy and high labor cost in time sequence data abnormality detection of a database.
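The judging and sampling steps summarized in effects 1) and 2) can be sketched as follows. Two points are assumptions made for illustration and are not stated in the application: a point is treated as abnormal if any judgment result group flags it, and the preset quantity ratio is expressed as a whole number of normal points per abnormal point (`normal_per_abnormal`). The function names are hypothetical.

```python
import random


def merge_judgments(result_groups):
    """Combine per-algorithm judgment result groups into one flag per point.

    result_groups holds one list of booleans per abnormality judgment
    algorithm; a point is marked abnormal if any algorithm flags it
    (an assumed combination rule).
    """
    return [any(flags) for flags in zip(*result_groups)]


def build_target_sample(values, abnormal_flags, normal_per_abnormal=4, seed=0):
    """Draw normal and abnormal points in a fixed ratio and label them 0 / 1."""
    rng = random.Random(seed)
    normal = [v for v, f in zip(values, abnormal_flags) if not f]
    abnormal = [v for v, f in zip(values, abnormal_flags) if f]
    # Use as many abnormal points as the ratio and the normal pool allow.
    n_abnormal = min(len(abnormal), len(normal) // normal_per_abnormal)
    n_normal = n_abnormal * normal_per_abnormal
    chosen_normal = rng.sample(normal, n_normal)
    chosen_abnormal = rng.sample(abnormal, n_abnormal)
    return [(v, 0) for v in chosen_normal] + [(v, 1) for v in chosen_abnormal]
```

The labeled pairs returned by `build_target_sample` play the role of the target data sample on which an initial anomaly detection model would be trained; fixing the ratio keeps the rare abnormal class from being swamped by normal points.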
The foregoing describes only preferred embodiments of the present application and is not intended to limit it; those skilled in the art may make various modifications and variations to the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall fall within the protection scope of the present application.

Claims (10)

1. A method for detecting anomalies in time series data of a database, comprising:
acquiring an initial data sample, and processing the initial data sample by at least adopting a pre-trained waveform classification model to obtain at least one abnormality judgment algorithm matched with the data change trend of the initial data sample, wherein the initial data sample comprises normal time sequence data and abnormal time sequence data, and the abnormality judgment algorithm is used for judging whether each time sequence data in the initial data sample is abnormal or not;
judging each time sequence data in the initial data sample according to at least one abnormality judgment algorithm to determine a plurality of normal data and a plurality of abnormal data;
selecting the normal data and the abnormal data in a preset quantity ratio to form a target data sample, and training an initial anomaly detection model based on the target data sample to obtain a target anomaly detection model;
and carrying out anomaly detection on the time sequence data to be detected by adopting the target anomaly detection model.
2. The method of claim 1, wherein judging each time sequence data in the initial data sample according to at least one abnormality judgment algorithm to determine a plurality of normal data and a plurality of abnormal data comprises:
under the condition that a plurality of abnormal judgment algorithms exist, judging each time sequence data in the initial data sample by adopting each abnormal judgment algorithm to obtain a plurality of judgment result groups, wherein each judgment result group comprises a plurality of judgment results;
in the plurality of judgment result groups, determining that the time sequence data with the judgment result being abnormal is the abnormal data;
and in the plurality of judging result groups, determining the time sequence data with normal judging results as the normal data.
3. The method according to claim 1, wherein the method further comprises:
processing the initial data sample by adopting the pre-trained waveform classification model, and determining a plurality of abnormality prediction algorithms, wherein each abnormality prediction algorithm is used for judging whether time sequence data predicted from the existing time sequence data is abnormal;
determining the abnormality prediction algorithm meeting a preset condition as a target abnormality prediction algorithm, wherein the preset condition comprises that a regression evaluation index of the abnormality prediction algorithm is in a preset range;
determining a target abnormality prediction model according to the target abnormality prediction algorithm and the initial data sample;
and carrying out anomaly prediction on the time sequence data to be detected by adopting the target anomaly prediction model.
4. A method according to claim 3, wherein the anomaly prediction algorithm comprises a regression algorithm.
5. The method of claim 3, wherein the regression evaluation index comprises at least one of MAE, RMSE, and R-squared score.
6. The method of claim 1, wherein the type of the abnormality judgment algorithm comprises at least one of: statistical discrimination, outlier detection, and neural networks.
7. The method according to any one of claims 1 to 6, wherein acquiring the initial data sample comprises:
acquiring historical time sequence data and manual time sequence data in a preset time period, wherein the manual time sequence data is time sequence data generated by a time sequence generator, and the historical time sequence data and the manual time sequence data form the initial data sample.
8. A time series data abnormality detection apparatus of a database, comprising:
the acquisition unit is used for acquiring an initial data sample, processing the initial data sample by at least adopting a pre-trained waveform classification model to obtain at least one abnormality judgment algorithm matched with the data change trend of the initial data sample, wherein the initial data sample comprises normal time sequence data and abnormal time sequence data, and the abnormality judgment algorithm is used for judging whether each time sequence data in the initial data sample is abnormal or not;
the first determining unit is used for judging each time sequence data in the initial data sample according to at least one abnormality judgment algorithm to determine a plurality of normal data and a plurality of abnormal data;
the training unit is used for selecting the normal data and the abnormal data in a preset quantity ratio to form a target data sample, and training an initial anomaly detection model based on the target data sample to obtain a target anomaly detection model;
and the detection unit is used for carrying out anomaly detection on the time sequence data to be detected by adopting the target anomaly detection model.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium includes a stored program, wherein the program, when run, controls a device in which the computer-readable storage medium is located to execute the time-series data abnormality detection method of the database according to any one of claims 1 to 7.
10. An electronic device, comprising: one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for executing the time-series data abnormality detection method of the database according to any one of claims 1 to 7.
CN202310371595.0A 2023-04-07 2023-04-07 Time sequence data abnormality detection method and device for database Pending CN116541743A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310371595.0A CN116541743A (en) 2023-04-07 2023-04-07 Time sequence data abnormality detection method and device for database

Publications (1)

Publication Number Publication Date
CN116541743A true CN116541743A (en) 2023-08-04

Family

ID=87456805

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310371595.0A Pending CN116541743A (en) 2023-04-07 2023-04-07 Time sequence data abnormality detection method and device for database

Country Status (1)

Country Link
CN (1) CN116541743A (en)

Similar Documents

Publication Publication Date Title
US9679243B2 (en) System and method for detecting platform anomalies through neural networks
CN113098723B (en) Fault root cause positioning method and device, storage medium and equipment
CN112800116B (en) Method and device for detecting abnormity of service data
CN107566163A (en) A kind of alarm method and device of user behavior analysis association
CN111897705B (en) Service state processing and model training method, device, equipment and storage medium
CN111507376A (en) Single index abnormality detection method based on fusion of multiple unsupervised methods
CN114328198A (en) System fault detection method, device, equipment and medium
CN112769605B (en) Heterogeneous multi-cloud operation and maintenance management method and hybrid cloud platform
CN115514619B (en) Alarm convergence method and system
CN109063885A (en) A kind of substation's exception metric data prediction technique
CN110580492A (en) Track circuit fault precursor discovery method based on small fluctuation detection
CN113626502A (en) Power grid data anomaly detection method and device based on ensemble learning
CN112187914A (en) Remote control robot management method and system
CN114138601A (en) Service alarm method, device, equipment and storage medium
CN113123955B (en) Plunger pump abnormity detection method and device, storage medium and electronic equipment
CN110807014B (en) Cross validation based station data anomaly discrimination method and device
CN112904148A (en) Intelligent cable operation monitoring system, method and device
CN117170915A (en) Data center equipment fault prediction method and device and computer equipment
CN115495274B (en) Exception handling method based on time sequence data, network equipment and readable storage medium
CN116541743A (en) Time sequence data abnormality detection method and device for database
CN116318386A (en) Failure prediction method of optical module, system and storage medium thereof
CN115858606A (en) Method, device and equipment for detecting abnormity of time series data and storage medium
CN114444602A (en) Method and system for automatically constructing anomaly detection model
CN112398706B (en) Data evaluation standard determining method and device, storage medium and electronic equipment
CN117056171B (en) Kafka abnormity monitoring method and device based on AI algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination