CN114996257A - Data amount abnormality detection method, device, medium, and program product - Google Patents


Info

Publication number
CN114996257A
Authority
CN
China
Prior art keywords
time
time sequence
data
abnormal
interval
Legal status
Pending
Application number
CN202210695713.9A
Other languages
Chinese (zh)
Inventor
李治
曾岩
李晶
Current Assignee
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Application filed by WeBank Co Ltd
Priority to CN202210695713.9A
Publication of CN114996257A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3089 Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24578 Query processing with adaptation to user needs using ranking

Abstract

The application provides a data volume anomaly detection method, device, medium, and program product. An initial time series is obtained, comprising the data volume of at least one target data table in a database at different time points; a plurality of change rates of the initial time series are calculated at a preset time interval; the logarithm of each change rate is calculated with a preset base; the logarithms are combined into a first time series in the chronological order of the corresponding change rates; each value in the first time series is checked for whether it falls within an abnormal interval; if so, the first time series is determined to contain an abnormal outlier; and corresponding alarm information is output according to the anomaly type of the abnormal outlier. The method solves the technical problems that existing data anomaly monitoring cannot identify anomalies occurring at consecutive time points and is unsuitable for time series with trend, periodicity, or seasonality.

Description

Data amount abnormality detection method, device, medium, and program product
Technical Field
The present application relates to the field of financial technology (Fintech), and in particular, to a method, an apparatus, a medium, and a program product for detecting data volume abnormality.
Background
With the development of computer technology, more and more technologies are applied in the financial field, and the traditional financial industry is gradually shifting to financial technology (Fintech).
At present, detecting abnormal data values is very important for Internet systems in many financial fields: if abnormal values are discovered in time and the relevant personnel are notified immediately to follow up, unnecessary losses can be avoided or recovered. Abnormal data monitoring is therefore an important component of the operation and maintenance tools of all kinds of Internet systems.
Existing data anomaly monitoring techniques fall into two extremes. In the first, the monitoring logic is too simple: the values of two consecutive data points are compared, and if the change exceeds a threshold range, the data is considered abnormal. This approach can only identify an anomaly at a single time point and cannot identify anomalies spanning multiple consecutive time points. In the second, algorithms such as the extreme Studentized deviate outlier test or its improved variants are logically complex and inefficient for monitoring, and are unsuitable for time series with trend, periodicity, or seasonality.
Therefore, existing data anomaly monitoring cannot distinguish anomalies occurring at consecutive time points and cannot be applied to time series with trend, periodicity, or seasonality.
Disclosure of Invention
The application provides a data volume anomaly detection method, apparatus, medium, and program product, aiming to solve the technical problems that existing data anomaly monitoring methods cannot identify anomalies occurring at consecutive time points and cannot be applied to time series with trend, periodicity, or seasonality.
In a first aspect, the present application provides a data volume anomaly detection method, including:
obtaining an initial time series, the initial time series including the data volume of at least one target data table in a database at different time points;
calculating a plurality of change rates of the initial time series at a preset time interval;
calculating the logarithm of each change rate with a preset base;
combining the logarithms into a first time series in the chronological order of the corresponding change rates;
determining whether each value in the first time series falls within an abnormal interval;
if so, determining that the first time series contains an abnormal outlier;
and outputting corresponding alarm information according to the anomaly type of the abnormal outlier.
In one possible design, the preset base corresponds to the expected distribution of the change rates; for example, the preset base is the natural constant e.
In one possible design, determining whether each value in the first time series falls within an abnormal interval includes:
calculating a plurality of moving averages of the first time series over a moving average period, and combining all the moving averages into a second time series;
and determining whether each moving average in the second time series falls within the abnormal interval.
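The moving-average step can be sketched as follows. This is a minimal illustration assuming a trailing simple moving average, since the text does not specify the averaging method or how windows are aligned; `moving_average` is a hypothetical helper, and `period` plays the role of the moving average period.

```python
def moving_average(series: list[float], period: int) -> list[float]:
    """Trailing simple moving average of `series` over `period` points.

    Returns one average per complete window, so the output (the second
    time series) has len(series) - period + 1 elements.
    """
    if period < 1:
        raise ValueError("period must be at least 1")
    if len(series) < period:
        return []
    window_sum = sum(series[:period])
    averages = [window_sum / period]
    for i in range(period, len(series)):
        # Slide the window one step: add the newest value, drop the oldest.
        window_sum += series[i] - series[i - period]
        averages.append(window_sum / period)
    return averages


print(moving_average([1.0, 2.0, 3.0, 4.0, 5.0], 2))  # [1.5, 2.5, 3.5, 4.5]
```

The running sum keeps each step O(1); for very long floating-point series, periodically recomputing the window sum would limit accumulated rounding error.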
In one possible design, the anomaly types include at least one of: single-point impact anomalies, periodic fluctuation anomalies, and trending fluctuation anomalies;
correspondingly, outputting the corresponding alarm information includes:
when only one type of anomaly occurs, recording the anomaly type in a scheduled or immediate notification message;
and when two or more types of anomaly occur, immediately sending alarm information to operation and maintenance personnel.
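The alarm policy in this design can be expressed as a small dispatch function. This is a hypothetical sketch: `AnomalyType` and `alarm_action` are illustrative names, the returned strings stand in for real notification calls, and treating two or more simultaneous anomaly types as the escalation condition is an assumption (otherwise the case of exactly two types would be uncovered).

```python
from enum import Enum


class AnomalyType(Enum):
    SINGLE_POINT_IMPACT = "single-point impact"
    PERIODIC_FLUCTUATION = "periodic fluctuation"
    TRENDING_FLUCTUATION = "trending fluctuation"


def alarm_action(detected: set[AnomalyType]) -> str:
    """Map the set of anomaly types found in one detection run to an action."""
    if not detected:
        return "none"
    if len(detected) == 1:
        # One type only: record it in a scheduled or immediate notification.
        return "record"
    # Two or more types at once: alert operations staff immediately.
    return "alert"


print(alarm_action({AnomalyType.SINGLE_POINT_IMPACT}))  # record
print(alarm_action({AnomalyType.SINGLE_POINT_IMPACT,
                    AnomalyType.TRENDING_FLUCTUATION}))  # alert
```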
In one possible design, the preset time interval corresponding to the single-point impact anomaly is at least one time unit, and the moving average period is at least twice the preset time interval, where the time unit is one of: minute, hour, day, week, month, quarter, year.
In one possible design, the preset time interval corresponding to the periodic fluctuation anomaly is at least two time units, and the moving average period is greater than the preset time interval, where the time unit is one of: minute, hour, day, week, month, quarter, year.
In one possible design, the preset time interval corresponding to the trending fluctuation anomaly is at least one time unit, and the moving average period is greater than the preset time interval and is at least 7 to 10 time units, where the time unit is one of: minute, hour, day, week, month, quarter, year.
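The parameter constraints in the three designs above can be collected into one validation check. This sketch is hypothetical (`params_valid` and its string keys are illustrative names), counts `interval` and `period` in the same time unit, and reads "at least 7 to 10 time units" as a lower bound of 7, which is an assumption.

```python
def params_valid(anomaly_type: str, interval: int, period: int) -> bool:
    """Check (interval, period) against the constraints for each anomaly type.

    Both values are counted in the same time unit (minute, hour, day, ...).
    """
    if anomaly_type == "single_point_impact":
        return interval >= 1 and period >= 2 * interval
    if anomaly_type == "periodic_fluctuation":
        return interval >= 2 and period > interval
    if anomaly_type == "trending_fluctuation":
        # "At least 7 to 10 time units" is read here as a lower bound of 7.
        return interval >= 1 and period > interval and period >= 7
    raise ValueError(f"unknown anomaly type: {anomaly_type}")


# Daily data with a weekly cycle: compare against 7 days earlier, 14-day average.
print(params_valid("periodic_fluctuation", interval=7, period=14))  # True
```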
In a second aspect, the present application provides a data volume anomaly detection apparatus, including:
an obtaining module, configured to obtain an initial time series, where the initial time series includes the data volume of at least one target data table in a database at different time points;
a processing module, configured to:
calculate a plurality of change rates of the initial time series at a preset time interval;
calculate the logarithm of each change rate with a preset base;
combine the logarithms into a first time series in the chronological order of the corresponding change rates;
determine whether each value in the first time series falls within an abnormal interval;
if so, determine that the first time series contains an abnormal outlier;
and output corresponding alarm information according to the anomaly type of the abnormal outlier.
In one possible design, the preset base corresponds to the expected distribution of the change rates; for example, the preset base is the natural constant e.
In one possible design, the processing module is configured to:
calculate a plurality of moving averages of the first time series over a moving average period, and combine all the moving averages into a second time series;
and determine whether each moving average in the second time series falls within the abnormal interval.
In one possible design, the anomaly types include at least one of: single-point impact anomalies, periodic fluctuation anomalies, and trending fluctuation anomalies;
correspondingly, the processing module is configured to:
when only one type of anomaly occurs, record the anomaly type in a scheduled or immediate notification message;
and when two or more types of anomaly occur, immediately send alarm information to operation and maintenance personnel.
In one possible design, the preset time interval corresponding to the single-point impact anomaly is at least one time unit, and the moving average period is at least twice the preset time interval, where the time unit is one of: minute, hour, day, week, month, quarter, year.
In one possible design, the preset time interval corresponding to the periodic fluctuation anomaly is at least two time units, and the moving average period is greater than the preset time interval, where the time unit is one of: minute, hour, day, week, month, quarter, year.
In one possible design, the preset time interval corresponding to the trending fluctuation anomaly is at least one time unit, and the moving average period is greater than the preset time interval and is at least 7 to 10 time units, where the time unit is one of: minute, hour, day, week, month, quarter, year.
In a third aspect, the present application provides an electronic device, including:
a memory, configured to store program instructions;
and a processor, configured to call and execute the program instructions in the memory to perform any one of the possible methods provided in the first aspect.
In a fourth aspect, the present application provides a readable storage medium storing a computer program, where the computer program is used to perform any one of the possible data volume anomaly detection methods provided in the first aspect.
In a fifth aspect, the present application further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the computer program implements any one of the possible data amount abnormality detection methods provided in the first aspect.
The application provides a data volume anomaly detection method, device, medium, and program product. An initial time series is obtained, comprising the data volume of at least one target data table in a database at different time points; a plurality of change rates of the initial time series are calculated at a preset time interval; the logarithm of each change rate is calculated with a preset base; the logarithms are combined into a first time series in the chronological order of the corresponding change rates; each value in the first time series is checked for whether it falls within an abnormal interval; if so, the first time series is determined to contain an abnormal outlier; and corresponding alarm information is output according to the anomaly type of the abnormal outlier. The method and device implement periodicity checks and trend checks and can detect all outliers in the whole time series, rather than only judging whether the data at the current time point is an outlier, thereby solving the technical problems that existing data anomaly monitoring cannot identify anomalies at consecutive time points and cannot be applied to time series with trend, periodicity, or seasonality.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
Figs. 1a-1b are schematic diagrams illustrating the detection effect of a conventional data volume anomaly detection method provided in the present application;
Fig. 2 is a schematic flowchart of a data volume anomaly detection method according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a time-series structure provided in an embodiment of the present application;
Fig. 4 is a statistical plot of a periodically varying time series provided herein;
Fig. 5 is a graph of a time series with a trend provided by an embodiment of the present application;
Fig. 6 is a schematic flowchart of another data volume anomaly detection method provided in the present application;
Fig. 7 is a graph of an initial time series provided by an embodiment of the present application;
Fig. 8 is a graph of a first time series corresponding to the initial time series of Fig. 7 provided by an embodiment of the present application;
Fig. 9 is a graph of a second time series provided by an embodiment of the present application;
Fig. 10 is a schematic structural diagram of a data volume anomaly detection apparatus according to an embodiment of the present application;
Fig. 11 is a schematic structural diagram of an electronic device provided in the present application.
Specific embodiments of the present application have been shown by way of example in the drawings and will be described in more detail below. These drawings and written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments, including but not limited to combinations of embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any inventive step are within the scope of the present application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The terms used in this application are explained as follows:
Time series: chronologically ordered observations of a variable, such as memory occupancy, giving the dynamic values at times t1, t2, …, tn, where t1 < t2 < … < tn.
Database: a repository that organizes, stores, and manages data according to a certain data structure.
Table: an object in a database used to store data; it is a set of structured data and the basis of the entire database system.
Time point value: the value corresponding to a time point t in the time series.
Outlier: a time point with abnormal fluctuation in the time series.
$N(\mu, \sigma^2)$: a normal distribution with mathematical expectation $\mu$ and variance $\sigma^2$.
Lognormal distribution: if the natural logarithm LN(X) of X follows a normal distribution, then X follows a lognormal distribution.
LN(X) or ln(x): the natural logarithm function.
Extreme Studentized Deviate (ESD) test: an outlier test algorithm based on the extreme Studentized deviate, proposed by Twitter.
Mean μ: in probability, the mean of a sample or population.
Standard deviation σ: in probability, the degree to which individuals deviate from the population mean.
Interval: the preset time interval, i.e., the distance between the two time points being compared; for example, with a 7-day interval, interval = 7.
Period: the moving average period, i.e., the window over which the moving average is taken; for example, for a five-day moving average, period = 5.
P_Threshold: a preset probability threshold for the tail (marginal) probability.
Alpha_value: the probability quantile corresponding to P_Threshold.
Detecting data outliers or data fluctuations is particularly important for many IT (Internet Technology) systems: if data outliers are found in time and the relevant personnel are notified to follow up immediately, unnecessary losses can likely be avoided or recovered. Data anomaly monitoring is therefore the most basic and important component of the operation and maintenance tools of all kinds of IT systems, and most data anomaly monitoring scenarios involve time-series inspection, such as monitoring a server's CPU (central processing unit) or memory usage over time.
Many IT systems monitor fluctuations in the data volume of certain data tables in a database, usually as a time series counted by minute, hour, or day; if an abnormal fluctuation occurs, an alarm message is sent, and operation and maintenance or other relevant personnel follow up.
The existing data anomaly detection schemes mainly comprise the following two types:
Scheme 1: a simple and direct check compares the value at the current time point with the value at the previous time point to obtain the change between the two; if the change exceeds a preset threshold, the current point is regarded as an abnormal data point.
Scheme 2: the Extreme Studentized Deviate (ESD) test. In this algorithm proposed by Twitter, the data values at all time points in the time series are assumed to be normally distributed as a whole, and one outlier can be detected using the t distribution.
Scheme 1 is too simple and crude: it can only detect an anomaly at a single point, and only when the data at the preceding point is normal.
Figs. 1a to 1b are schematic diagrams illustrating the detection effect of a conventional data volume anomaly detection method provided in the present application. As shown in Fig. 1a, when the data at the previous time point is D1 and the data at the next time point is D2, such an anomaly can be detected by Scheme 1. However, as shown in Fig. 1b, suppose D1 is already an abnormal outlier, the data D2 at the next time point is also an abnormal outlier, and the data D3 at the time point after that returns to normal. Because Scheme 1 compares only the change between two adjacent values, it identifies D1 as an abnormal outlier but fails to identify D2, whose value is close to that of D1; and it misjudges D3, which should be a normal value, as an abnormal outlier because D3 differs sharply from D2.
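The failure mode of Scheme 1 in Fig. 1b can be reproduced in a few lines. The sketch below is a minimal, hypothetical implementation of the adjacent-difference check described for Scheme 1; the values and the threshold of 50 are invented for illustration, with indices 2, 3, and 4 standing in for D1, D2, and D3.

```python
def adjacent_diff_outliers(values: list[float], threshold: float) -> list[int]:
    """Scheme 1: flag index i when |values[i] - values[i-1]| exceeds threshold."""
    return [
        i
        for i in range(1, len(values))
        if abs(values[i] - values[i - 1]) > threshold
    ]


# Hypothetical data: normal level ~100, D1 and D2 both abnormal, D3 back to normal.
values = [100, 102, 200, 205, 101]  # indices 2, 3, 4 play the roles of D1, D2, D3
print(adjacent_diff_outliers(values, threshold=50))  # [2, 4]
```

The result flags D1 (index 2) and the normal point D3 (index 4) but misses D2 (index 3), matching the behavior described for Fig. 1b.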
Scheme 2 detects only one outlier. Although some improved prior-art algorithms can detect multiple outliers, they are very complex, reduce detection efficiency, and still cannot avoid the defects of Scheme 1; moreover, because they assume that the data values in the time series are normally distributed, they cannot perform anomaly detection on time series with trend, periodicity, or seasonality.
In order to solve the above problems, the inventive concept of the present application is:
After analyzing the two prior-art schemes, the inventors found that Scheme 1 handles only a single scenario and has a high error probability, while the basic assumption of Scheme 2 does not match real situations: it ignores the trend and periodicity of the data, which must then be handled by introducing other methods, raising the monitoring cost. Moreover, a supervised machine learning approach would require manual labeling and the introduction of external data, such as whether a day is a weekend or a holiday, as well as continuous iteration of model parameters, resulting in extremely high computational complexity.
The present application takes the change rate of the table data volume between two time points, i.e., the change rate across a time span, as the basic unit of calculation, and combines it with moving averages and the differences between normal and skewed distributions. This solves the problems of trend and periodicity with low algorithmic time complexity.
The application provides an operation and maintenance tool that can detect abnormal values or outliers in various kinds of time series data in real time, mainly monitoring fluctuations in the time series of table data volumes in a production database. The solution consists of three parts: a real-time data collector, an outlier fluctuation detector, and a real-time alarm platform.
The real-time data collector collects the data volume of one or more data tables in the database in real time and combines the values into a time series. The outlier fluctuation detector detects whether the time series contains abnormal outliers and, if so, sends an early warning to the user through the real-time alarm platform. The data volume anomaly detection method has the following characteristics:
(1) Periodicity check: fluctuation anomalies can be detected in time series that exhibit periodicity.
(2) Trend check: fluctuation anomalies can be detected in time series that exhibit a trend.
(3) Not only can judge whether the current time point is outlier, but also can detect all outliers in the whole time sequence.
(4) Simple and efficient: the algorithm is simple; the time complexity of computing the statistics is $O(n^2)$, judging a single time point is $O(1)$, and judging all time points is $O(n)$.
(5) The data volume change of the designated table of the database can be monitored and reported in real time.
The data volume anomaly detection method provided by the application is described in detail below.
Fig. 2 is a schematic flowchart of a data volume anomaly detection method according to an embodiment of the present application. As shown in Fig. 2, the method performs time-series monitoring of the number of data tables in a database, or of fluctuations in their data volume, and includes the following steps:
S201, acquiring an initial time series.
In this step, the initial time series includes the data volume of at least one target data table in the database at different time points.
In this embodiment, the initial time series is obtained by the real-time data collector. At least one data table in the database can be designated in advance as a target data table; the data volume of the target data table is obtained periodically, or in real time in other non-periodic ways, and the data volumes are combined into the initial time series in chronological order.
Specifically, a connection to the database is first established; for example, in the Java language, a connection pool is created through JDBC (Java Database Connectivity), which can be adapted to various types of databases. Then, in response to a time-series collection instruction, the data volume of the target data table is read through the connection pool on demand or periodically, for example by executing the data collection SQL statement SELECT COUNT(*) FROM table once every hour. The data volume read, the name of the target data table, and the current collection time are then recorded in a predefined result table, an exemplary structure of which is shown in Table 1:
Table name | Value | Time
Tbl1 | 12313 | 2021/01/01 21:00:00
Tbl1 | 13231 | 2021/01/01 22:00:00
Table 1
Then, the initial time series output by the real-time data collector is read: $X = (X_1, X_2, X_3, \ldots, X_{n-2}, X_{n-1}, X_n)$.
Fig. 3 is a schematic diagram of a time-series structure provided in an embodiment of the present application. As shown in Fig. 3, each element $X_i$ of the time series $X$ is a key-value pair, such as (data value i, time i), and the arrow direction in Fig. 3 represents chronological order.
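The collection step can be illustrated with a small sketch. It swaps in Python's built-in `sqlite3` module for the JDBC connection pool described in the text, purely so the example is self-contained; the result-table name `result_table`, its columns `table_name`, `value`, and `time`, and the helpers `collect_once` and `initial_series` are hypothetical. Each call records one row per target table in the shape of Table 1, and `initial_series` reads one table's history back as chronologically ordered (value, time) pairs, mirroring the key-value structure of Fig. 3.

```python
import sqlite3
from datetime import datetime
from typing import Optional


def collect_once(conn: sqlite3.Connection, target_tables: list[str],
                 when: Optional[str] = None) -> None:
    """Record the current row count of each target table in result_table."""
    when = when or datetime.now().strftime("%Y/%m/%d %H:%M:%S")
    for name in target_tables:
        # Identifiers cannot be bound as parameters, so quote the table name.
        quoted = '"' + name.replace('"', '""') + '"'
        (count,) = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()
        conn.execute(
            "INSERT INTO result_table (table_name, value, time) VALUES (?, ?, ?)",
            (name, count, when),
        )
    conn.commit()


def initial_series(conn: sqlite3.Connection, name: str) -> list[tuple[int, str]]:
    """Return the initial time series X for one table as (value, time) pairs."""
    rows = conn.execute(
        "SELECT value, time FROM result_table WHERE table_name = ? ORDER BY time",
        (name,),
    )
    # Zero-padded "YYYY/MM/DD HH:MM:SS" strings sort chronologically.
    return [(value, time) for value, time in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE result_table (table_name TEXT, value INTEGER, time TEXT)")
    conn.execute("CREATE TABLE Tbl1 (id INTEGER)")
    conn.executemany("INSERT INTO Tbl1 VALUES (?)", [(i,) for i in range(3)])
    collect_once(conn, ["Tbl1"], "2021/01/01 21:00:00")
    conn.execute("INSERT INTO Tbl1 VALUES (3)")
    collect_once(conn, ["Tbl1"], "2021/01/01 22:00:00")
    print(initial_series(conn, "Tbl1"))
    # [(3, '2021/01/01 21:00:00'), (4, '2021/01/01 22:00:00')]
```

In production, the scheduler (for example, an hourly job) would call `collect_once` with the designated target tables.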
S202, calculating a plurality of change rates of the initial time series at a preset time interval.
In this step, the preset time interval may be entered in advance by operation and maintenance staff, for example interval = 1 (day). The change rate is then computed between each time point and the time point interval steps earlier, such as the change rate relative to the previous day or to 7 days earlier.
It should be noted that the preset time interval is an integer multiple of a preset time unit, and the time unit may be set to minute, hour, day, week, month, quarter, year, and so on.
It should be noted that the change rate used in the present application is the ratio of the data values at the later and earlier time points, i.e., $X_n / X_{n-\mathrm{interval}}$; for example, when interval = 1, the change rate is $X_2 / X_1$. This differs from the change amount used in the prior art, i.e., the difference between the data values at the two time points, $X_n - X_{n-\mathrm{interval}}$.
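The change-rate definition $X_n / X_{n-\mathrm{interval}}$ translates directly into code. This is a minimal sketch with a hypothetical helper name; raising an error when an earlier value is zero (for example, an empty table) is an assumption, since the text does not say how such points are handled.

```python
def change_rates(values: list[float], interval: int) -> list[float]:
    """Change rates X_n / X_{n-interval} of a time series, in chronological order."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    rates = []
    for n in range(interval, len(values)):
        previous = values[n - interval]
        if previous == 0:
            # The ratio is undefined for an empty table; the text does not cover this case.
            raise ZeroDivisionError(f"value at index {n - interval} is 0")
        rates.append(values[n] / previous)
    return rates


print(change_rates([2, 4, 8, 16], interval=2))  # [4.0, 4.0]
```

A series of length n yields n - interval change rates, so the first `interval` points have no rate of their own.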
The periodicity to be captured from the time series can be adjusted by changing the size of the preset time interval and/or its time unit; conversely, the preset time interval can be chosen according to the periodicity of the time series. For example, actual data may exhibit monthly, weekly, or even half-day periodicity, i.e., the data changes regularly with these periods, and the size of the preset time interval and/or its time unit may be determined from the actual periodicity.
Fig. 4 is a statistical plot of a periodically varying time series provided herein. As shown in Fig. 4, curve 1 is the data volume of the target data table in the database over time, i.e., the initial time series, with data volume in millions and time in days. As Fig. 4 shows, curve 1 fluctuates with a weekly period, so the preset time interval is 7. Curve 2 shows the change rate of the initial time series, calculated at the preset time interval, over time; it can be seen that curve 2 smooths out the periodic fluctuation. Using the change rate therefore solves the prior-art problem that abnormal outliers cannot be accurately identified in periodically varying time series, and breaks away from the habitual reliance on differences alone, so that abnormal fluctuations in data volume can be identified more accurately.
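The smoothing effect of curve 2 can be illustrated with synthetic data. The sketch below uses an invented weekly profile (not the data behind Fig. 4) repeated for four weeks, injects a one-day spike, and compares day-over-day ratios with week-over-week ratios (interval = 7); `ratio_series` is a hypothetical helper.

```python
def ratio_series(values: list[float], interval: int) -> list[float]:
    """X_n / X_{n-interval} for every n that has a predecessor interval steps back."""
    return [values[n] / values[n - interval] for n in range(interval, len(values))]


weekly = [10, 12, 15, 14, 13, 6, 5]  # hypothetical daily volumes (millions), one week
series = weekly * 4                  # four identical weeks: 28 days
series[17] *= 3                      # inject a one-day spike on day 17

day_over_day = ratio_series(series, 1)    # still swings with the weekday pattern
week_over_week = ratio_series(series, 7)  # weekday pattern cancels out

# Everything is 1.0 except 3.0 at day 17 (the spike) and 1/3 at day 24,
# where day 24 is compared against the spiked day 17.
print([round(r, 2) for r in week_over_week])
```

Note the echo at day 24: a single spike also appears one interval later as a reciprocal dip, which a detector built on these ratios needs to account for when attributing outliers to time points.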
Fig. 5 is a graph of a time series with a trend provided by an embodiment of the present application. As shown in Fig. 5, curve 3 is the data volume of the target data table in the database over time, i.e., the initial time series, with data volume in millions and time in days; curve 3 shows fluctuating growth with a trend. For this type of time series, both Scheme 1 and Scheme 2 of the prior art produce misjudgments, and Scheme 2's assumption that the whole time series follows a normal distribution clearly does not apply. Curve 4 is the change rate of the initial time series as proposed in this application, and it can be seen that curve 4 removes the interference of the trend.
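Why the change rate removes a trend can be seen with a steadily growing series. This sketch uses invented data (2% daily growth, not the data behind Fig. 5): the day-over-day differences keep rising with the level of the series, so any fixed difference threshold drifts, while the ratios stay constant.

```python
growth = [100 * 1.02 ** t for t in range(30)]  # hypothetical steady 2%/day growth

differences = [growth[n] - growth[n - 1] for n in range(1, len(growth))]
ratios = [growth[n] / growth[n - 1] for n in range(1, len(growth))]

print(round(differences[0], 2), round(differences[-1], 2))  # differences keep growing
print(round(min(ratios), 6), round(max(ratios), 6))         # ratios stay at 1.02
```

Real trends are rarely exactly geometric, but as long as growth is roughly proportional to the current level, the ratios stay in a narrow band that a single abnormal interval can cover.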
S203, calculating the logarithm of each change rate with the preset base, and combining the logarithms into a first time series in the chronological order of the corresponding change rates.
In this step, unlike the prior-art Extreme Studentized Deviate (ESD) test, which assumes that the time series follows a normal distribution, the present application removes periodic or trend interference by using the change rate, so that the object being examined changes from the initial time series to the sequence of change rates. The inventors found that, across actual usage scenarios, the change rate does not follow a normal distribution but is better approximated by a skewed distribution: its values are generally concentrated in a certain region or on one side of the mean, rather than distributed symmetrically on both sides of it. Taking the logarithm of the change rate therefore eliminates the effect of this skew, i.e., it removes the departure from normality, or from the assumptions of linear regression, caused by the heteroscedasticity of the change rate.
The first time series formed in this way solves the technical problem that abnormal outlier judgment is not accurate enough because the prior art does not consider the skewness of the change rate.
S204, determining whether each value in the first time series falls within an abnormal interval.
In this step, the abnormal interval corresponds to the probability distribution followed by the first time series. For example, if the first time series follows a lognormal distribution, the probability quantiles can be determined by setting a corresponding probability threshold.
In this embodiment, if yes, S205 is executed; if not, the process returns to S201.
Specifically, a pre-trained state distribution model analyzes and partitions the distribution of the change rate of the data volume. In other words, there is a region on the change-rate distribution plot in which the probability of a change rate occurring is greater than or equal to a preset probability threshold.
In this embodiment, the preset probability threshold is divided into an upper threshold and a lower threshold, each corresponding to a value on the horizontal axis of the change-rate distribution plot, i.e., a probability quantile. The step therefore specifically includes:
determining the probability quantiles in the skewed distribution model according to the preset probability threshold;
determining the abnormal interval according to the probability quantiles;
determining whether each value in the first time series falls within the abnormal interval;
if yes, determining that value to be an abnormal outlier.
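The four steps above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: it assumes the change rates are lognormal, so their logarithms (the first time series) are normal, and it fits the normal parameters directly from the values being tested instead of using a pre-trained model. The function names `abnormal_interval` and `find_outliers` are hypothetical. The two probability quantiles are placed symmetrically at (1 − p)/2 and (1 + p)/2, and any value outside them is treated as falling into the abnormal interval.

```python
from statistics import NormalDist

def abnormal_interval(log_rates, p_threshold=0.99):
    """Return the (lower, upper) probability quantiles of the normal range.

    ASSUMPTION: ln(B_i) is normal, i.e. the change rate B_i is lognormal;
    mean and standard deviation are estimated from the samples themselves.
    """
    dist = NormalDist.from_samples(log_rates)    # needs at least two values
    lower = dist.inv_cdf((1 - p_threshold) / 2)  # 0.5% quantile for p = 0.99
    upper = dist.inv_cdf((1 + p_threshold) / 2)  # 99.5% quantile for p = 0.99
    return lower, upper

def find_outliers(log_rates, p_threshold=0.99):
    """Indices of values that fall into the abnormal interval (outside the quantiles)."""
    lower, upper = abnormal_interval(log_rates, p_threshold)
    return [i for i, v in enumerate(log_rates) if v < lower or v > upper]

sample = [0.001, -0.001] * 20 + [0.05]  # illustrative values, not from the source
print(find_outliers(sample))  # → [40]
```

Because the logarithm is monotonic, checking ln(B_i) against normal quantiles q is equivalent to checking B_i against the lognormal quantiles exp(q).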
S205, outputting corresponding alarm information according to the anomaly type of the abnormal outlier.
In this step, the anomaly types include at least one of: single-point impact anomalies, periodic fluctuation anomalies, and trend fluctuation anomalies. Correspondingly, outputting the alarm information includes:
when only one type of anomaly occurs, recording the anomaly type in a scheduled or immediate notification;
when two or more types of anomaly occur, immediately sending alarm information to operation and maintenance personnel.
In this embodiment, the alarm level is divided into four levels:
No anomaly: N/A, i.e., no processing and no alarm.
Notification level: INFO, recorded only as a notification; the alarm route is configured by operation and maintenance personnel.
Warning level: WARNING, a data anomaly is highly likely; engine operation and maintenance personnel need to pay attention, and an alarm is issued.
Error level: ERROR, the data is determined to be abnormal and erroneous, and an alarm must be sent.
The assignment of the four levels above to each combination of check results is shown in Table 2:
| Numerical impact | Periodicity | Trend | Alarm level |
|---|---|---|---|
| Normal | Normal | Normal | N/A |
| Abnormal | Normal | Normal | INFO |
| Normal | Abnormal | Normal | INFO |
| Normal | Normal | Abnormal | INFO |
| Abnormal | Abnormal | Normal | WARNING |
| Normal | Abnormal | Abnormal | WARNING |
| Abnormal | Normal | Abnormal | WARNING |
| Abnormal | Abnormal | Abnormal | ERROR |
Table 2
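Table 2 depends only on how many of the three checks report an anomaly (none, one, two, or three), so it can be encoded as a simple lookup. The sketch below is illustrative; `alarm_level` and `needs_alarm` are hypothetical names, and treating WARNING and ERROR as the levels that send an alarm follows the level descriptions above.

```python
def alarm_level(impact_abnormal: bool, periodic_abnormal: bool, trend_abnormal: bool) -> str:
    """Map the three check results to an alarm level following Table 2."""
    count = int(impact_abnormal) + int(periodic_abnormal) + int(trend_abnormal)
    return ("N/A", "INFO", "WARNING", "ERROR")[count]

def needs_alarm(level: str) -> bool:
    """WARNING and ERROR send an alarm; INFO is only recorded; N/A does nothing."""
    return level in ("WARNING", "ERROR")

print(alarm_level(True, False, True))  # → WARNING
```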
The embodiment of the present application provides a data volume anomaly detection method, which includes: obtaining an initial time series containing the data volume of at least one target data table in a database at different time points; calculating a plurality of change rates of the initial time series according to a preset time interval; calculating the logarithm of each change rate using a preset base; combining the logarithmic values into a first time series in the time order of their change rates; determining whether each value in the first time series falls within an abnormal interval; if so, determining that the first time series contains an abnormal outlier; and outputting corresponding alarm information according to the anomaly type of the abnormal outlier. The method supports both periodicity and trend checks and can detect all outliers across the whole time series, rather than only judging whether the data at the current time point is an outlier. It thus solves the technical problems that existing data anomaly monitoring cannot identify anomalies at consecutive time points and cannot be applied to trending, periodic, or seasonal time series.
Fig. 6 is a schematic flow chart of another data amount abnormality detection method according to the present application. As shown in fig. 6, the data amount abnormality detection method specifically includes the steps of:
S601, acquiring an initial time series.
In this step, the initial time series includes: the data volume of at least one target data table in the database at different time points.
In this embodiment, for ease of understanding, assume that the data amounts (in millions) of the target data table acquired at each sampling time in the initial time series X are as follows:
X=(122.2,123.3,123.1,124.5,123.7,122.9,122.1,121.7,122.6,123.1,122.5,123.4,123,124.6,123.8,122.6,122.3,121.8,122.7,123.2,122.2,123.3,123.1,124.5,123.7,122.9,122.1,121.7,122.6,123.1,122.2,123.3,123.1,124.5,123.7,122.9,122.1,121.7,122.6,129.1).
Fig. 7 is a graph of the initial time series provided by an embodiment of the present application. As shown in Fig. 7, the initial time series is periodic and also shows a trend within each period.
S602, calculating a plurality of change rates of the initial time sequence according to a preset time interval.
In this step, the respective rates of change of the initial time series are found from formula (1), which is as follows:
[Formula (1): image not reproduced]
wherein interval is a preset time interval.
S603, calculating the logarithm of each change rate using the preset base.
In this step, the preset base is greater than zero and not equal to 1, for example the natural constant e or 10. Taking the logarithm removes the heteroscedasticity among the change rates. The inventors found that, in practice, the sequence of change rates does not have the symmetry of a normal distribution; instead it concentrates in a certain region and is asymmetrically skewed. The logarithm exploits this skew and lays the foundation for subsequently identifying abnormal outliers with the lognormal distribution, which is itself a skewed distribution.
S604, arranging the logarithmic values in the time order of their corresponding change rates to obtain the first time series.
For steps S603 and S604, specifically, assume that the preset base is $a$ ($a > 0$). Any element $X'_i$ of the first time series $X'$ can then be calculated using equation (2):
$$X'_i = \log_a B_i \qquad (2)$$
where $B_i$ is the change rate.
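Steps S602 to S604 can be sketched as below. Formula (1) is not reproduced in the text, so the sketch ASSUMES the change rate is the ratio B_i = X_i / X_(i−interval); a ratio stays positive for positive data volumes, which keeps the logarithm in equation (2) defined. The helper names `change_rates` and `first_time_series` are hypothetical. Python's `math.log(x, base)` computes log_a directly.

```python
import math

def change_rates(x, interval):
    """Change rates of the initial time series over a preset interval.

    ASSUMPTION: formula (1) is taken as the ratio x[i] / x[i - interval];
    the source's own formula (1) is not reproduced in the text.
    """
    if interval < 1:
        raise ValueError("interval must be at least 1")
    return [x[i] / x[i - interval] for i in range(interval, len(x))]

def first_time_series(x, interval=1, base=math.e):
    """Equation (2): X'_i = log_a(B_i) with a = base (a > 0, a != 1)."""
    if base <= 0 or base == 1:
        raise ValueError("base must be positive and different from 1")
    return [math.log(b, base) for b in change_rates(x, interval)]

# The first `interval` points have no earlier point to compare with,
# so the first time series is `interval` elements shorter than x.
print(first_time_series([100, 110, 121], interval=1))
```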
It should be noted that the preset base number may take different values according to different bias distribution models, or may be set to different values according to different detected objects.
For ease of understanding, taking the natural constant e as the preset base, the first time series is $X' = (X'_k, X'_{k+1}, \ldots, X'_{n-2}, X'_{n-1}, X'_n)$, where, by equation (2), $X'_i = \ln B_i$.
It should be noted that k is counted from the first point lying one full interval into the initial time series X, because no change rate is calculated for the points of X before that.
Fig. 8 is a graph of the first time series corresponding to the initial time series shown in Fig. 7, according to an embodiment of the present application. As shown in Fig. 8, the periodic and trend fluctuations of the initial time series are effectively filtered out by the logarithmic change rate, because the change rate more closely follows a skewed distribution.
S605, determining probability quantiles in the skewed distribution model according to a preset probability threshold, and determining abnormal intervals according to the probability quantiles.
In this embodiment, the skewed distribution model includes a lognormal distribution model.
S606, applying a moving average algorithm to the first time series with the moving average period to obtain a second time series.
The effect of this step is to smooth the normal fluctuations of the first time series in order to more clearly find abnormal outliers.
Specifically, the moving average converter (moveaveragetimeseries converter) can output a new log moving average sequence $X''$, i.e., the second time series, computed from the first time series $X'$ with the moving average period $\text{Period}$. Each element $X''_i$ is the average of the $\text{Period}$ consecutive points of $X'$ ending at $X'_i$:
$$X''_i = \frac{1}{\text{Period}} \sum_{j=i-\text{Period}+1}^{i} X'_j$$
which yields the new sequence $X'' = (X''_m, X''_{m+1}, \ldots, X''_{n-2}, X''_{n-1}, X''_n)$.
Note that the first points of $X'$ do not have a full window of $\text{Period}$ values, so $m$ is counted from the end of the first $\text{Period}$ window of the first time series $X'$.
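The moving average in S606 can be computed with a running window sum, so each new average costs one addition and one subtraction. This is a minimal sketch of the formula above; the function name `moving_average` is hypothetical, and the running sum is a common implementation choice, not something the source specifies.

```python
def moving_average(values, period):
    """Second time series: X''_i = (1/Period) * sum of X'_j for j = i-Period+1 .. i."""
    if not 1 <= period <= len(values):
        raise ValueError("period must be between 1 and len(values)")
    window = sum(values[:period])                  # sum of the first full window
    result = [window / period]
    for i in range(period, len(values)):
        window += values[i] - values[i - period]   # slide the window one point forward
        result.append(window / period)
    return result

print(moving_average([1, 2, 3, 4, 5], period=2))  # → [1.5, 2.5, 3.5, 4.5]
```

The output has len(values) − Period + 1 elements, which is why the index m starts later than k.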
S607, determining whether each moving average in the second time series falls within the abnormal interval.
In this step, if so, the value is determined to be an abnormal outlier and S608 is executed; if not, the next value is checked. If the whole second time series has been checked without finding an abnormal outlier, the process returns to S601.
Fig. 9 is a graph of the second time series provided by an embodiment of the present application. As shown in Fig. 9, the log change rates lie essentially within ±0.005, and the last data point in the graph, which falls outside this range, is an abnormal outlier.
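Putting steps S602 to S607 together, the following sketch runs on the example series X from S601 with the Single Detect parameters given in this description (interval 1, period 5, P_Threshold 0.99). It makes two assumptions that the source does not confirm: the change rate is taken as the ratio X_i / X_(i−interval), and the abnormal interval is derived by fitting a normal distribution to the second time series itself. The function name `detect_outliers` is hypothetical, and the sketch is not guaranteed to reproduce Fig. 9 exactly.

```python
import math
from statistics import NormalDist

def detect_outliers(x, interval, period, p_threshold):
    # S602-S604 (ASSUMPTION: B_i = x[i] / x[i - interval]); natural log as the base
    first = [math.log(x[i] / x[i - interval]) for i in range(interval, len(x))]
    # S606: moving averages over `period` consecutive points of the first series
    second = [sum(first[j - period + 1 : j + 1]) / period
              for j in range(period - 1, len(first))]
    # S605 (ASSUMPTION): normal fitted to the second series, two-sided quantiles
    dist = NormalDist.from_samples(second)
    lower = dist.inv_cdf((1 - p_threshold) / 2)
    upper = dist.inv_cdf((1 + p_threshold) / 2)
    # S607: second[j] lines up with x[j + interval + period - 1]
    offset = interval + period - 1
    return [j + offset for j, v in enumerate(second) if v < lower or v > upper]

X = [122.2, 123.3, 123.1, 124.5, 123.7, 122.9, 122.1, 121.7, 122.6, 123.1,
     122.5, 123.4, 123, 124.6, 123.8, 122.6, 122.3, 121.8, 122.7, 123.2,
     122.2, 123.3, 123.1, 124.5, 123.7, 122.9, 122.1, 121.7, 122.6, 123.1,
     122.2, 123.3, 123.1, 124.5, 123.7, 122.9, 122.1, 121.7, 122.6, 129.1]

print(detect_outliers(X, interval=1, period=5, p_threshold=0.99))  # → [39]
```

Under these assumptions only index 39, the final value 129.1, is flagged, which agrees with the description of Fig. 9.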
S608, outputting corresponding alarm information according to the anomaly type of the abnormal outlier.
In this step, the anomaly types include at least one of: single-point impact anomalies, periodic fluctuation anomalies, and trend fluctuation anomalies.
In this embodiment, (1) the preset time interval corresponding to the single-point impact anomaly is at least one time unit, and the moving average period is at least twice the preset time interval;
(2) the preset time interval corresponding to the periodic fluctuation anomaly is greater than or equal to two time units, and the moving average period is greater than the preset time interval;
(3) the preset time interval corresponding to the trend fluctuation anomaly is at least one time unit, and the moving average period is greater than the preset time interval and is at least 7 to 10 time units.
The time unit includes: minutes, hours, days, weeks, months, quarters, years.
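Constraints (1) to (3) can be expressed as a small validation helper. This is a hypothetical sketch: the anomaly type labels are invented for illustration, interval and period are assumed to be expressed in the same time unit, and the minimum trend period is a parameter (default 7) because the text only says "at least 7 to 10 time units".

```python
def parameters_valid(anomaly_type, interval, period, min_trend_period=7):
    """Check constraints (1)-(3); interval and period share one time unit.

    The type labels are hypothetical; min_trend_period is an assumed knob
    for the 'at least 7 to 10 time units' rule.
    """
    if interval < 1:
        return False
    if anomaly_type == "single_point":
        return period >= 2 * interval
    if anomaly_type == "periodic":
        return interval >= 2 and period > interval
    if anomaly_type == "trend":
        return period > interval and period >= min_trend_period
    raise ValueError(f"unknown anomaly type: {anomaly_type}")

print(parameters_valid("periodic", interval=7, period=14))  # → True
```

The Single Detect (interval 1, period 5), Frequency Detect (interval 7, period 14), and Trend Detect (interval 1, period 50) examples in this description all pass these checks.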
Specifically, to identify data volume anomalies of the target data table more accurately, the periodicity and trend of the data are considered together, and the time series data is checked for each of these situations. Outlier fluctuations are divided into three cases:
Single-point impact anomaly: the data at the current time point fluctuates sharply compared with the data at the previous time point.
Periodic fluctuation anomaly: within a period, an aperiodic abnormal fluctuation point occurs.
Trend fluctuation anomaly: in a trending (rising or falling) time series, an abnormal fluctuation point that clearly deviates from the trend occurs.
In this embodiment, a calculation instruction module in the real-time alarm platform sends the calculation parameters for each scenario to the fluctuation detector as a JSON instruction, and the fluctuation detector identifies the different anomaly types.
For example, in the single-point impact anomaly scenario (Single Detect), adjacent point values are compared with each other, so the calculation instruction is, e.g., {"P_Threshold": 0.99, "period": 5, "interval": 1}.
In the periodic fluctuation anomaly scenario (Frequency Detect), the target is an abnormal fluctuation point within the periodic pattern, so the interval should represent the cycle frequency. For example, if the data follows a weekly cycle, the interval is 7 days and the moving average period is larger than the interval. The calculation instruction is, e.g., {"P_Threshold": 0.99, "period": 14, "interval": 7}.
In the trend fluctuation anomaly scenario (Trend Detect), the target is an abnormal fluctuation point in a rising or falling trend, and the trend is captured by the moving average period. For example, for a growth trend lasting one month, the moving average period is 30 days, and the calculation instruction is, e.g., {"P_Threshold": 0.99, "period": 50, "interval": 1}.
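A minimal sketch of how the fluctuation detector might parse these JSON instructions, using only the standard library. The key names ("P_Threshold", "period", "interval") are assumed from the examples above; the `DetectInstruction` dataclass and `parse_instruction` function are hypothetical. Values are coerced with `float` and `int` so that either numeric or string-encoded values are accepted.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class DetectInstruction:
    p_threshold: float  # probability threshold used for the quantiles
    period: int         # moving average period
    interval: int       # preset time interval for the change rate

def parse_instruction(text: str) -> DetectInstruction:
    """Parse a calculation instruction such as the Frequency Detect example."""
    raw = json.loads(text)
    # float()/int() accept both JSON numbers and numeric strings
    return DetectInstruction(
        p_threshold=float(raw["P_Threshold"]),
        period=int(raw["period"]),
        interval=int(raw["interval"]),
    )

print(parse_instruction('{"P_Threshold": 0.99, "period": 14, "interval": 7}'))
```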
Correspondingly, outputting the alarm information includes:
when only one type of anomaly occurs, recording the anomaly type in a scheduled or immediate notification;
when two or more types of anomaly occur, immediately sending alarm information to operation and maintenance personnel.
Fig. 10 is a schematic structural diagram of a data amount abnormality detection apparatus according to an embodiment of the present application. The data amount abnormality detection apparatus 1000 may be implemented by software, hardware, or a combination of both.
As shown in fig. 10, the data amount abnormality detection apparatus 1000 includes:
an obtaining module 1001, configured to obtain an initial time sequence, where the initial time sequence includes: data volume of at least one target data table in the database at different time points;
a processing module 1002 configured to:
calculating a plurality of change rates of the initial time sequence according to a preset time interval;
calculating logarithmic values of the change rates according to preset base numbers;
combining the logarithmic values into a first time sequence according to the time sequence corresponding to the change rate;
judging whether each data in the first time sequence falls into an abnormal interval or not;
if yes, determining that the first time sequence has an abnormal outlier;
and outputting corresponding alarm information according to the abnormal type corresponding to the abnormal outlier.
In one possible design, the preset base corresponds to the predicted distribution form followed by the change rate, and the preset base includes the natural constant e.
In one possible design, the processing module 1002 is configured to:
calculating a plurality of moving average values corresponding to the first time sequence according to the moving average period, and combining all the moving average values into a second time sequence;
and judging whether each moving average value in the second time series falls into an abnormal interval.
In one possible design, the anomaly types include at least one of: single-point impact anomalies, periodic fluctuation anomalies, and trend fluctuation anomalies;
correspondingly, the processing module 1002 is configured to:
when only one type of anomaly occurs, record the anomaly type in a scheduled or immediate notification;
when two or more types of anomaly occur, immediately send alarm information to operation and maintenance personnel.
In one possible design, the preset time interval corresponding to the single-point impact anomaly is at least one time unit, and the moving average period is at least twice the preset time interval, where the time unit includes: minutes, hours, days, weeks, months, quarters, years.
In one possible design, the preset time interval corresponding to the periodic fluctuation anomaly is greater than or equal to two time units, the moving average period is greater than the preset time interval, and the time units include: minutes, hours, days, weeks, months, quarters, years.
In one possible design, the preset time interval corresponding to the trend fluctuation anomaly is at least one time unit, the moving average period is greater than the preset time interval and is at least 7 to 10 time units, and the time unit includes: minutes, hours, days, weeks, months, quarters, years.
It should be noted that the apparatus provided in the embodiment shown in fig. 10 can execute the method provided in any of the above method embodiments, and the specific implementation principle, technical features, term interpretation and technical effects thereof are similar and will not be described herein again.
Fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in Fig. 11, the electronic device 1100 may include at least one processor 1101 and a memory 1102. Fig. 11 takes one processor as an example.
The memory 1102 stores programs. In particular, the program may include program code including computer operating instructions.
The memory 1102 may include high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.
The processor 1101 is configured to execute computer-executable instructions stored by the memory 1102 to implement the methods described in the above method embodiments.
The processor 1101 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.
Optionally, the memory 1102 may be independent of the processor 1101 or integrated with it. When the memory 1102 is a device independent of the processor 1101, the electronic device 1100 may further include:
a bus 1103 for connecting the processor 1101 and the memory 1102. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on; depicting a single bus does not mean that there is only one bus or only one type of bus.
Optionally, in a specific implementation, if the memory 1102 and the processor 1101 are integrated on a single chip, the memory 1102 and the processor 1101 may communicate through an internal interface.
An embodiment of the present application further provides a computer-readable storage medium, which may include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk. In particular, the computer-readable storage medium stores program instructions for the methods in the above method embodiments.
An embodiment of the present application further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the computer program implements the method in the foregoing method embodiments.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements that have been described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (11)

1. A data amount abnormality detection method is characterized by comprising:
obtaining an initial time series, the initial time series comprising: data volume of at least one target data table in the database at different time points;
calculating a plurality of change rates of the initial time sequence according to a preset time interval;
calculating the logarithm value of each change rate according to a preset base number;
combining the logarithmic values into a first time sequence according to the time sequence corresponding to the change rate;
judging whether each data in the first time sequence falls into an abnormal interval or not;
if yes, determining that the first time sequence has an abnormal outlier;
and outputting corresponding alarm information according to the abnormal type corresponding to the abnormal outlier.
2. The method according to claim 1, wherein the predetermined base number corresponds to a predicted distribution form to which the change rate is subjected, and the predetermined base number includes a natural constant e.
3. The method according to claim 1 or 2, wherein the determining whether each data in the first time series falls into an abnormal interval includes:
calculating a plurality of moving average values corresponding to the first time sequence according to a moving average period, and combining the moving average values into a second time sequence;
and judging whether each moving average value in the second time series falls into the abnormal interval.
4. The data amount abnormality detection method according to claim 3, characterized in that the abnormality type includes: at least one of single-point impact anomalies, periodic fluctuation anomalies, and trending fluctuation anomalies;
correspondingly, the outputting the corresponding alarm information includes:
when only one type of anomaly occurs, recording the anomaly type in a scheduled or immediate notification;
and when two or more types of anomaly occur, immediately sending the alarm information to operation and maintenance personnel.
5. The method according to claim 4, wherein the preset time interval corresponding to the single-point impact anomaly is at least one time unit, the moving average period is at least twice as long as the preset time interval, and the time unit includes: minutes, hours, days, weeks, months, quarters, years.
6. The method according to claim 4, wherein the preset time interval corresponding to the periodic fluctuation anomaly is greater than or equal to two time units, the moving average period is greater than the preset time interval, and the time units include: minutes, hours, days, weeks, months, quarters, years.
7. The data amount abnormality detection method according to claim 4, wherein the preset time interval corresponding to the trending fluctuation abnormality is at least one time unit, the moving average period is greater than the preset time interval and is at least 7 to 10 time units, and the time unit includes: minutes, hours, days, weeks, months, quarters, years.
8. An apparatus for detecting an abnormality in data amount, comprising:
an obtaining module, configured to obtain an initial time sequence, where the initial time sequence includes: data volume of at least one target data table in the database at different time points;
a processing module to:
calculating a plurality of change rates of the initial time sequence according to a preset time interval;
calculating the logarithm value of each change rate according to a preset base number;
combining the logarithmic values into a first time sequence according to the time sequence corresponding to the change rate;
judging whether each data in the first time sequence falls into an abnormal interval or not;
if yes, determining that the first time sequence has an abnormal outlier;
and outputting corresponding alarm information according to the abnormal type corresponding to the abnormal outlier.
9. An electronic device, comprising:
a processor; and
a memory for storing a computer program for the processor;
wherein the processor is configured to perform the data amount abnormality detection method of any one of claims 1 to 7 via execution of the computer program.
10. A computer-readable storage medium on which a computer program is stored, the computer program realizing the data amount abnormality detection method according to any one of claims 1 to 7 when executed by a processor.
11. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the data volume anomaly detection method according to any one of claims 1 to 7.
CN202210695713.9A 2022-06-20 2022-06-20 Data amount abnormality detection method, device, medium, and program product Pending CN114996257A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210695713.9A CN114996257A (en) 2022-06-20 2022-06-20 Data amount abnormality detection method, device, medium, and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210695713.9A CN114996257A (en) 2022-06-20 2022-06-20 Data amount abnormality detection method, device, medium, and program product

Publications (1)

Publication Number Publication Date
CN114996257A true CN114996257A (en) 2022-09-02

Family

ID=83034765

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210695713.9A Pending CN114996257A (en) 2022-06-20 2022-06-20 Data amount abnormality detection method, device, medium, and program product

Country Status (1)

Country Link
CN (1) CN114996257A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117370906A (en) * 2023-08-21 2024-01-09 长江生态环保集团有限公司 Tube explosion detection and performance evaluation method based on single-point and time sequence anomaly detection
CN117560232A (en) * 2024-01-12 2024-02-13 深圳市纽创信安科技开发有限公司 Detection device and chip

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117370906A (en) * 2023-08-21 2024-01-09 长江生态环保集团有限公司 Tube explosion detection and performance evaluation method based on single-point and time sequence anomaly detection
CN117560232A (en) * 2024-01-12 2024-02-13 深圳市纽创信安科技开发有限公司 Detection device and chip
CN117560232B (en) * 2024-01-12 2024-04-02 深圳市纽创信安科技开发有限公司 Detection device and chip

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination