CN102945320A - Time series data abnormity detection method and device - Google Patents

Info

Publication number
CN102945320A
CN102945320A CN2012104211430A CN201210421143A CN102945320A CN 102945320 A CN102945320 A CN 102945320A CN 2012104211430 A CN2012104211430 A CN 2012104211430A CN 201210421143 A CN201210421143 A CN 201210421143A CN 102945320 A CN102945320 A CN 102945320A
Authority
CN
China
Prior art keywords
data
time series
neighbor node
series data
sigma
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012104211430A
Other languages
Chinese (zh)
Inventor
余宇峰
朱跃龙
万定生
李士进
张建新
杨方
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Hehai Technology Co Ltd
Hohai University HHU
Original Assignee
Nanjing Hehai Technology Co Ltd
Hohai University HHU
Application filed by Nanjing Hehai Technology Co Ltd, Hohai University HHU filed Critical Nanjing Hehai Technology Co Ltd
Priority to CN2012104211430A priority Critical patent/CN102945320A/en
Publication of CN102945320A publication Critical patent/CN102945320A/en
Pending legal-status Critical Current

Landscapes

  • Testing And Monitoring For Control Systems (AREA)

Abstract

The invention discloses a time series data anomaly detection method comprising the following steps: defining the neighbor nodes of a data point $d_i$ in a time series; calculating the mean of those neighbor nodes; calculating an absolute error $e_i^{(k)}$ and an accumulated change $AC_i$; setting a threshold $\tau$ and comparing $e_i^{(k)}$ and $AC_i$ with it; and marking $d_i$ as an anomaly if $e_i^{(k)} > \tau$ or $AC_i > \tau$, and otherwise keeping the data point. The invention further discloses a time series data anomaly detection device. Because the decision for a data point depends on its neighbor nodes, the method is local in character. The neighbor window width can be adjusted dynamically to suit different time intervals, which keeps the parameter locally optimal in each interval, so anomalous data in the time series can be detected effectively and the method and device are broadly applicable.

Description

Time series data anomaly detection method and device
Technical field
The invention belongs to the field of data management and business support and relates to data quality control in information acquisition and processing, specifically to a method and device for detecting anomalous data in real-time time series.
Background
With the rapid development of computer information technology, represented by the internet, and the widespread use of sensor technology, massive amounts of data have accumulated in production and daily life. Processing these explosively growing data already exceeds human capability. Data mining, an emerging technology that combines statistical methods, database technology, artificial intelligence networks, visualization, and high-performance computing, helps people extract useful information and knowledge in time and improves a system's predictive analysis and decision support capability, and it has therefore been widely adopted.
Anomaly detection is one of the four classes of knowledge discovery tasks in data mining. Its purpose is to find low-probability events or patterns in a data set, that is, data objects (anomalies) whose behavior is clearly inconsistent with the other data or with the model.
An anomaly (also called an outlier or isolated point; the terms are used interchangeably below) is an object in a database or data set that is inconsistent with the other data, or deviates so far from them that it is suspected of having been generated by a different mechanism. When data collected by an information system are used for modeling, anomalies not only prevent the system from being modeled and described effectively but also lower data quality and adversely affect data analysis, management, and decision making. To improve the accuracy and reliability of the information system and to ensure the system model is useful, anomalous data must be identified and handled before modeling.
At present, most anomaly detection methods are statistical, mainly deviation-based, distribution-based, distance-based, and density-based methods. Methods of this type require the data distribution to be known in advance. In addition, most statistics-based anomaly detection algorithms suit only univariate numeric data and do not apply to high-dimensional or time series data. Biologically inspired methods, machine learning methods, and feature-space methods applied to time series anomaly detection are still exploratory and immature; many have limited applicability and obvious defects.
Therefore, a new time series data anomaly detection method is needed to solve these problems.
Summary of the invention
Purpose of the invention: in the prior art, anomalous data in an information system reduce the analysis precision of the system model, so that the model cannot objectively reflect the nature of the system. To address this defect, the invention provides a time series data anomaly detection method that improves the efficiency of anomaly detection in data analysis.
Technical solution: to solve the above technical problem, the time series data anomaly detection method of the invention adopts the following technical solution:
A time series data anomaly detection method, in which the time series is given as $D = \{d_1 = (v_1, t_1), d_2 = (v_2, t_2), \ldots, d_n = (v_n, t_n)\}$ and the time series datum $d_i = (v_i, t_i)$ denotes the observed value $v_i$ at time $t_i$, comprises the following steps:
(1) Define the neighbor nodes $\eta_i^{(k)}$ of data point $d_i$ in the time series, where $k$ is the neighbor window width of $d_i$.
(2) Calculate the mean $m_i^{(k)}$ of the neighbor nodes $\eta_i^{(k)}$ of data point $d_i$.
(3) Calculate the absolute error $e_i^{(k)}$ between data point $d_i$ and the neighbor mean $m_i^{(k)}$, and the accumulated change $AC_i$ between $d_i$ and its neighbors $\eta_i^{(k)}$.
(4) Set the anomaly detection threshold $\tau$ and compare the absolute error $e_i^{(k)}$ and the accumulated change $AC_i$ with it: if $e_i^{(k)} > \tau$ or $AC_i > \tau$, mark $d_i$ as an anomaly; otherwise keep $d_i$.
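The decision rule in steps (2) to (4) can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the neighbor values and weights are passed in by the caller, and $AC_i$ is taken in the weighted-average form given by the formulas later in this description.

```python
from typing import Sequence


def neighbor_mean(neighbor_values: Sequence[float]) -> float:
    """Step (2): mean m_i^(k) of the neighbor values."""
    return sum(neighbor_values) / len(neighbor_values)


def absolute_error(v_i: float, neighbor_values: Sequence[float]) -> float:
    """Step (3): e_i^(k) = |m_i^(k) - v_i|."""
    return abs(neighbor_mean(neighbor_values) - v_i)


def accumulated_change(v_i: float,
                       neighbor_values: Sequence[float],
                       weights: Sequence[float]) -> float:
    """Step (3): weighted average of |v_i - v_j| over the neighbors."""
    numerator = sum(w * abs(v_i - v) for w, v in zip(weights, neighbor_values))
    return numerator / sum(weights)


def is_anomaly(v_i: float,
               neighbor_values: Sequence[float],
               weights: Sequence[float],
               tau: float) -> bool:
    """Step (4): flag d_i when either statistic exceeds the threshold tau."""
    e = absolute_error(v_i, neighbor_values)
    ac = accumulated_change(v_i, neighbor_values, weights)
    return e > tau or ac > tau
```

How the neighbor values are chosen (bilateral or unilateral) and how $\tau$ is obtained are kept outside these functions, so the same rule serves both variants described below.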
Beneficial effects: in the time series anomaly detection method proposed by the invention, whether a data point is anomalous depends on its neighbor nodes. This local view distinguishes the method from earlier anomaly detection methods and is its advantage. The neighbor window width can also be adjusted dynamically to suit different periods, which keeps the parameter locally optimal in each period. The proposed algorithm effectively detects anomalous data in time series and is broadly applicable.
Further, $k$ denotes the neighbor window width and determines how many neighbor nodes take part in computing the mean (or the accumulated change): the larger $k$ is, the more neighbor nodes take part. To find the best value of $k$, candidate values range from 3 to 31 in increments of 2, i.e. $k \in \{3, 5, \ldots, 31\}$.
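As a small illustration, the candidate values of $k$ can be enumerated as follows. The function name and its defaults are hypothetical; the text states only the range and the increment, not how the best value is chosen among the candidates.

```python
def candidate_window_widths(start: int = 3, stop: int = 31, step: int = 2) -> list:
    """Candidate k values: 3, 5, ..., 31 (stop is inclusive)."""
    return list(range(start, stop + 1, step))
```

With the defaults this yields fifteen candidate widths.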
Further, the threshold $\tau$ has two components: the mean change over the period's sequence and the variance of the neighbor nodes. The former describes the overall average level of change in the time series during the period; the latter describes the local fluctuation of the data around the current node $d_i$. The threshold $\tau$ is therefore computed dynamically: it is higher where the observations fluctuate strongly and lower where they fluctuate little. Because it accounts for both the overall state and the local features of the time series and is updated as the neighbors fluctuate, it avoids the harm a preset fixed threshold does to detection efficiency and so improves the efficiency of the algorithm.
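The text names the two components of $\tau$ but does not give a formula for combining them. The sketch below is therefore only one possible reading, under two stated assumptions: the mean change is taken as the average absolute difference between consecutive observations in the period, and the two components are simply added. Both function names are hypothetical.

```python
from statistics import pvariance
from typing import Sequence


def mean_change(period_values: Sequence[float]) -> float:
    """Average absolute change between consecutive observations (assumed form)."""
    diffs = [abs(b - a) for a, b in zip(period_values, period_values[1:])]
    return sum(diffs) / len(diffs)


def dynamic_threshold(period_values: Sequence[float],
                      neighbor_values: Sequence[float]) -> float:
    """tau = global mean change + local neighbor variance (assumed combination)."""
    return mean_change(period_values) + pvariance(neighbor_values)
```

Under this reading, a flat neighborhood contributes nothing to $\tau$, while a strongly fluctuating neighborhood raises it, which matches the qualitative behavior described above.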
Further, the neighbor nodes $\eta_i^{(k)}$ can be defined as bilateral neighbor nodes, $\eta_i^{(k)} = \{d_{i-k}, \ldots, d_{i-1}, d_{i+1}, \ldots, d_{i+k}\}$, where $2k$ is the neighbor window width of $d_i$ (from $i-k$ to $i+k$, excluding $d_i$ itself).
Further, when the neighbor nodes $\eta_i^{(k)}$ are bilateral, the mean $m_i^{(k)}$, the absolute error $e_i^{(k)}$, and the accumulated change $AC_i$ are calculated as follows:
$$m_i^{(k)} = \Big( \sum_{s=1}^{k} v_{i-s} + \sum_{t=1}^{k} v_{i+t} \Big) / (2k)$$
$$e_i^{(k)} = | m_i^{(k)} - v_i |$$
$$AC_i = \Big( \sum_{s=1}^{k} w_s \, | v_i - v_{i-s} | + \sum_{t=1}^{k} w'_t \, | v_i - v_{i+t} | \Big) / \Big( \sum_{s=1}^{k} w_s + \sum_{t=1}^{k} w'_t \Big)$$
where $w_k, w_{k-1}, \ldots, w_1, w'_1, w'_2, \ldots, w'_k$ form the weight vector of the neighbor nodes.
In bilateral neighbor anomaly detection, the right neighbors have not yet been examined and may themselves contain anomalies. Using only left neighbors makes it possible to exclude anomalies that have already been detected, which makes the detection result more accurate and more meaningful. The bilateral method can therefore be improved by a unilateral (one-sided) neighbor method. The unilateral algorithm has the same steps and decision rule as the bilateral one, but defines only the left neighbor nodes of data point $d_i$.
Further, the neighbor nodes $\eta_i^{(k)}$ can be defined as unilateral neighbor nodes, $\eta_i^{(k)} = \{d_{i-2k}, d_{i-2k+1}, \ldots, d_{i-1}\}$, where $2k$ is the neighbor window width of $d_i$ (from $i-2k$ to $i-1$).
Further, when the neighbor nodes $\eta_i^{(k)}$ are unilateral, the mean $m_i^{(k)}$, the absolute error $e_i^{(k)}$, and the accumulated change $AC_i$ are calculated as follows:
$$m_i^{(k)} = \Big( \sum_{j=1}^{2k} v_{i-j} \Big) / (2k)$$
$$e_i^{(k)} = | m_i^{(k)} - v_i |$$
$$AC_i = \Big( \sum_{j=1}^{2k} w_j \, | v_i - v_{i-j} | \Big) / \Big( \sum_{j=1}^{2k} w_j \Big)$$
where $w_1, w_2, \ldots, w_{2k}$ form the weight vector of the neighbor nodes.
The invention also discloses a time series data anomaly detection device, which adopts the following technical solution:
A time series data anomaly detection device comprises an input module, an anomaly detection module, and an output module. The input module provides the time series data set required for anomaly detection, the anomaly detection module performs anomaly detection using the time series data anomaly detection method described above, and the output module outputs the anomalous data set according to the detection result of the anomaly detection module.
Further, the anomaly detection module comprises a data preprocessing component, a computation component, and an analysis component. The data preprocessing component receives the data collected by the time series data input module and preprocesses them; preprocessing consists of selecting the data point to be evaluated and defining its neighbor node set $\eta_i^{(k)}$. The computation component calculates $e_i^{(k)}$ and $AC_i$ from the preprocessed data. The analysis component compares $e_i^{(k)}$ and $AC_i$ with the given threshold $\tau$ and, from the comparison, determines whether the data under test are anomalous.
The time series data anomaly detection device of the invention has a simple structure. Whether a data point is anomalous depends on its neighbor nodes; this local view distinguishes the device from earlier anomaly detection approaches and is its advantage. The neighbor window width can also be adjusted dynamically to suit different periods, which keeps the parameter locally optimal in each period. The device effectively detects anomalous data in time series and is broadly applicable.
Brief description of the drawings
Fig. 1 is a flowchart of a specific embodiment of the bilateral time series data anomaly detection method of the invention;
Fig. 2 is a flowchart of a specific embodiment of the unilateral time series data anomaly detection method of the invention;
Fig. 3 is a structural diagram of the time series data anomaly detection device;
Fig. 4 is a diagram of the distribution of time series data that contain anomalies.
Detailed description
The invention is further described below with reference to the drawings and specific embodiments. These embodiments only illustrate the invention and do not limit its scope; after reading the invention, modifications of equivalent form made by those skilled in the art all fall within the scope defined by the claims of this application.
In the time series data anomaly detection method of the invention, the time series is given as $D = \{d_1 = (v_1, t_1), d_2 = (v_2, t_2), \ldots, d_n = (v_n, t_n)\}$ and the time series datum $d_i = (v_i, t_i)$ denotes the observed value $v_i$ at time $t_i$. The method comprises the following steps:
(1) Define the neighbor nodes $\eta_i^{(k)}$ of data point $d_i$ in the time series, where $k$ is the neighbor window width of $d_i$. It determines how many neighbor nodes take part in computing the mean (or the accumulated change): the larger $k$ is, the more neighbor nodes take part. To find the best value of $k$, candidate values range from 3 to 31 in increments of 2, i.e. $k \in \{3, 5, \ldots, 31\}$.
(2) Calculate the mean $m_i^{(k)}$ of the neighbor nodes $\eta_i^{(k)}$ of data point $d_i$.
(3) Calculate the absolute error $e_i^{(k)}$ between data point $d_i$ and the neighbor mean $m_i^{(k)}$, and the accumulated change $AC_i$ between $d_i$ and its neighbors $\eta_i^{(k)}$.
(4) Set the anomaly detection threshold $\tau$ and compare the absolute error $e_i^{(k)}$ and the accumulated change $AC_i$ with it: if $e_i^{(k)} > \tau$ or $AC_i > \tau$, mark $d_i$ as an anomaly; otherwise keep $d_i$. The threshold $\tau$ has two components: the mean change over the period's sequence and the variance of the neighbor nodes. The former describes the overall average level of change in the time series during the period; the latter describes the local fluctuation of the data around the current node $d_i$. The threshold $\tau$ is therefore computed dynamically: it is higher where the observations fluctuate strongly and lower where they fluctuate little. Because it accounts for both the overall state and the local features of the time series and is updated as the neighbors fluctuate, it avoids the harm a preset fixed threshold does to detection efficiency and so improves the efficiency of the algorithm.
The neighbor nodes $\eta_i^{(k)}$ can be defined as unilateral neighbor nodes, $\eta_i^{(k)} = \{d_{i-2k}, d_{i-2k+1}, \ldots, d_{i-1}\}$, where $2k$ is the neighbor window width of $d_i$ (from $i-2k$ to $i-1$). When $\eta_i^{(k)}$ is unilateral, the mean $m_i^{(k)}$, the absolute error $e_i^{(k)}$, and the accumulated change $AC_i$ are calculated as follows:
$$m_i^{(k)} = \Big( \sum_{j=1}^{2k} v_{i-j} \Big) / (2k)$$
$$e_i^{(k)} = | m_i^{(k)} - v_i |$$
$$AC_i = \Big( \sum_{j=1}^{2k} w_j \, | v_i - v_{i-j} | \Big) / \Big( \sum_{j=1}^{2k} w_j \Big)$$
where $w_1, w_2, \ldots, w_{2k}$ form the weight vector of the neighbor nodes.
The neighbor nodes $\eta_i^{(k)}$ can also be defined as bilateral neighbor nodes, $\eta_i^{(k)} = \{d_{i-k}, \ldots, d_{i-1}, d_{i+1}, \ldots, d_{i+k}\}$, where $2k$ is the neighbor window width of $d_i$ (from $i-k$ to $i+k$, excluding $d_i$ itself). When $\eta_i^{(k)}$ is bilateral, the mean $m_i^{(k)}$, the absolute error $e_i^{(k)}$, and the accumulated change $AC_i$ are calculated as follows:
$$m_i^{(k)} = \Big( \sum_{s=1}^{k} v_{i-s} + \sum_{t=1}^{k} v_{i+t} \Big) / (2k)$$
$$e_i^{(k)} = | m_i^{(k)} - v_i |$$
$$AC_i = \Big( \sum_{s=1}^{k} w_s \, | v_i - v_{i-s} | + \sum_{t=1}^{k} w'_t \, | v_i - v_{i+t} | \Big) / \Big( \sum_{s=1}^{k} w_s + \sum_{t=1}^{k} w'_t \Big)$$
where $w_k, w_{k-1}, \ldots, w_1, w'_1, w'_2, \ldots, w'_k$ form the weight vector of the neighbor nodes.
The invention also discloses a time series data anomaly detection device comprising an input module, an anomaly detection module, and an output module. The input module provides the time series data set required for anomaly detection, the anomaly detection module performs anomaly detection using the time series data anomaly detection method described above, and the output module outputs the anomalous data set according to the detection result of the anomaly detection module. The anomaly detection module comprises a data preprocessing component, a computation component, and an analysis component. The data preprocessing component receives the data collected by the time series data input module and preprocesses them; preprocessing consists of selecting the data point to be evaluated and defining its neighbor node set $\eta_i^{(k)}$. The computation component calculates $e_i^{(k)}$ and $AC_i$ from the preprocessed data. The analysis component compares $e_i^{(k)}$ and $AC_i$ with the given threshold $\tau$ and, from the comparison, determines whether the data under test are anomalous.
Embodiment 1
Referring to Fig. 1, a preferred embodiment of the bilateral neighbor node time series data anomaly detection method proceeds as follows:
Step S101: receive the time series data collected by the data input module, such as daily mean water level, daily stock price fluctuation, or residents' daily power consumption. Let the collected data set be $D = \langle d_1 = (v_1, t_1), d_2 = (v_2, t_2), \ldots, d_n = (v_n, t_n) \rangle$, where $d_i = (v_i, t_i)$ denotes the observed value $v_i$ at time $t_i$.
Step S102: select the data point $d_i$ to be tested and define its neighbor nodes $\eta_i^{(k)}$, where $\eta_i^{(k)}$ is the set of bilateral neighbor nodes:
$$\eta_i^{(k)} = \{ d_{i-k}, \ldots, d_{i-1}, d_{i+1}, \ldots, d_{i+k} \} \qquad (1)$$
where $2k$ is the neighbor window width of $d_i$ (from $i-k$ to $i+k$, excluding $d_i$ itself).
Step S103: calculate the absolute error $e_i^{(k)}$ between the data point under test $d_i$ and the mean $m_i^{(k)}$ of its bilateral neighbor nodes $\eta_i^{(k)}$, and the accumulated change $AC_i$ between $d_i$ and its neighbors $\eta_i^{(k)}$. The quantities $m_i^{(k)}$, $e_i^{(k)}$, and $AC_i$ are given by formulas (2), (3), and (4), respectively:
$$m_i^{(k)} = \Big( \sum_{s=1}^{k} v_{i-s} + \sum_{t=1}^{k} v_{i+t} \Big) / (2k) \qquad (2)$$
$$e_i^{(k)} = | m_i^{(k)} - v_i | \qquad (3)$$
$$AC_i = \Big( \sum_{s=1}^{k} w_s \, | v_i - v_{i-s} | + \sum_{t=1}^{k} w'_t \, | v_i - v_{i+t} | \Big) / \Big( \sum_{s=1}^{k} w_s + \sum_{t=1}^{k} w'_t \Big) \qquad (4)$$
In formula (4), $w_k, w_{k-1}, \ldots, w_1, w'_1, w'_2, \ldots, w'_k$ form the weight vector of the neighbors; the closer a neighbor is to $d_i$, the larger its weight. Because the neighbor window of $d_i$ is symmetric in the bilateral mean detection method, the weight vector $\langle w_k, w_{k-1}, \ldots, w_1, w'_1, \ldots, w'_k \rangle$ is, for ease of calculation, generally set to $\langle 1, 2, \ldots, k, k, \ldots, 2, 1 \rangle$.
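Formulas (2) to (4) with the default weights can be sketched as follows. This is an illustrative reading rather than a reference implementation: the function name is hypothetical, indices are zero-based (the text counts from 1), and the default weight assignment is expressed as $w_s = k + 1 - s$ and $w'_t = k + 1 - t$.

```python
from typing import Sequence, Tuple


def bilateral_stats(values: Sequence[float], i: int, k: int) -> Tuple[float, float, float]:
    """Return (m, e, AC) for values[i] using k neighbors on each side."""
    if i - k < 0 or i + k >= len(values):
        raise IndexError("point does not have k neighbors on both sides")
    v_i = values[i]
    left = [values[i - s] for s in range(1, k + 1)]    # v_{i-1}, ..., v_{i-k}
    right = [values[i + t] for t in range(1, k + 1)]   # v_{i+1}, ..., v_{i+k}
    weights = [k + 1 - s for s in range(1, k + 1)]     # nearest neighbor weighs k
    m = (sum(left) + sum(right)) / (2 * k)                         # formula (2)
    e = abs(m - v_i)                                               # formula (3)
    numerator = (sum(w * abs(v_i - v) for w, v in zip(weights, left))
                 + sum(w * abs(v_i - v) for w, v in zip(weights, right)))
    ac = numerator / (2 * sum(weights))                            # formula (4)
    return m, e, ac
```

For the series 1, 2, 3, 4, 5 with `i = 2` and `k = 2`, the neighbor mean equals the point itself, so the absolute error is zero while the accumulated change is 8/6, which shows that the two statistics measure different things.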
Step S104: compare the $e_i^{(k)}$ and $AC_i$ calculated in step S103 with the given threshold $\tau$ and decide from the comparison whether the data point under test is anomalous. If $e_i^{(k)} > \tau$ or $AC_i > \tau$, proceed to step S105: mark data point $d_i$ as an anomaly and handle it with a suitable method. Otherwise, proceed to step S106: keep data point $d_i$ as normal data for subsequent analysis and processing.
Step S107: determine whether all data in the given data set have been checked. If not, step S109 produces the next data point $d_{i+1}$ and applies the method of this embodiment to it. Otherwise, step S108 generates, from the detection results, the anomalous data set $O$ and the "clean" data set $D'$ that remains after anomaly handling.
In the bilateral mean detection method, the right neighbors have not yet been examined and may themselves contain anomalies. Using only left neighbors makes it possible to exclude anomalies that have already been detected, which makes the detection result more accurate and more meaningful. The bilateral anomaly detection method can therefore be improved by a unilateral one.
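The loop over steps S102 to S109 for the bilateral case might look like the sketch below. Several choices here are assumptions made only for illustration: the threshold is a fixed number supplied by the caller rather than the dynamic $\tau$ of the invention, points with fewer than $k$ neighbors on either side are kept without being tested, and "handling" an anomaly just means moving it into the set $O$. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def detect_bilateral(values: Sequence[float], k: int, tau: float
                     ) -> Tuple[List[Tuple[int, float]], List[Tuple[int, float]]]:
    """Scan the series once; return (O, D') as lists of (index, value)."""
    anomalies, clean = [], []
    weights = [k + 1 - s for s in range(1, k + 1)]
    for i, v in enumerate(values):
        if i < k or i + k >= len(values):
            clean.append((i, v))  # boundary point: kept untested (assumption)
            continue
        left = [values[i - s] for s in range(1, k + 1)]
        right = [values[i + t] for t in range(1, k + 1)]
        m = (sum(left) + sum(right)) / (2 * k)
        e = abs(m - v)
        ac = (sum(w * abs(v - x) for w, x in zip(weights, left))
              + sum(w * abs(v - x) for w, x in zip(weights, right))) / (2 * sum(weights))
        if e > tau or ac > tau:
            anomalies.append((i, v))   # step S105
        else:
            clean.append((i, v))       # step S106
    return anomalies, clean            # step S108
```

On the series 1, 1, 1, 1, 9, 1, 1, 1, 1 with `k = 2` and `tau = 3.0`, only the spike at index 4 is flagged, but its normal neighbors at indices 2, 3, 5, and 6 each show an absolute error of 2 because the spike sits inside their windows. That contamination is the weakness of the bilateral method noted above.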
Embodiment 2
Referring to Fig. 2, a preferred embodiment of the unilateral neighbor node time series data anomaly detection method proceeds as follows:
Step S201: receive the time series data collected by the data input module, such as daily mean water level, daily stock price fluctuation, or residents' daily power consumption. Let the collected data set be $D = \langle d_1 = (v_1, t_1), d_2 = (v_2, t_2), \ldots, d_n = (v_n, t_n) \rangle$, where $d_i = (v_i, t_i)$ denotes the observed value $v_i$ at time $t_i$.
Step S202: select the data point $d_i$ to be tested and define its neighbor nodes $\eta_i^{(k)}$, where $\eta_i^{(k)}$ is the set of unilateral neighbor nodes:
$$\eta_i^{(k)} = \{ d_{i-2k}, d_{i-2k+1}, \ldots, d_{i-1} \} \qquad (5)$$
where $2k$ is the neighbor window width of $d_i$ (from $i-2k$ to $i-1$).
Step S203: calculate the absolute error $e_i^{(k)}$ between the data point under test $d_i$ and the mean $m_i^{(k)}$ of its unilateral neighbor nodes $\eta_i^{(k)}$, and the accumulated change $AC_i$ between $d_i$ and its neighbors $\eta_i^{(k)}$. The quantities $m_i^{(k)}$, $e_i^{(k)}$, and $AC_i$ are given by formulas (6), (7), and (8), respectively:
$$m_i^{(k)} = \Big( \sum_{j=1}^{2k} v_{i-j} \Big) / (2k) \qquad (6)$$
$$e_i^{(k)} = | m_i^{(k)} - v_i | \qquad (7)$$
$$AC_i = \Big( \sum_{j=1}^{2k} w_j \, | v_i - v_{i-j} | \Big) / \Big( \sum_{j=1}^{2k} w_j \Big) \qquad (8)$$
In the unilateral detection algorithm the neighbors are not symmetric about $d_i$, and the weight vector $\langle w_1, w_2, \ldots, w_{2k} \rangle$ is generally set to $\langle 2k, 2k-1, \ldots, 1 \rangle$.
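Formulas (6) to (8) with the default weights can be sketched in the same way. The function name is hypothetical, indices are zero-based, and the default weight assignment is expressed as $w_j = 2k + 1 - j$, so the nearest left neighbor receives the largest weight.

```python
from typing import Sequence, Tuple


def unilateral_stats(values: Sequence[float], i: int, k: int) -> Tuple[float, float, float]:
    """Return (m, e, AC) for values[i] using its 2k left neighbors."""
    if i - 2 * k < 0:
        raise IndexError("point does not have 2k left neighbors")
    v_i = values[i]
    left = [values[i - j] for j in range(1, 2 * k + 1)]     # v_{i-1}, ..., v_{i-2k}
    weights = [2 * k + 1 - j for j in range(1, 2 * k + 1)]  # 2k, 2k-1, ..., 1
    m = sum(left) / (2 * k)                                              # formula (6)
    e = abs(m - v_i)                                                     # formula (7)
    ac = sum(w * abs(v_i - v) for w, v in zip(weights, left)) / sum(weights)  # (8)
    return m, e, ac
```

For the series 1, 2, 3, 4, 5 with `i = 4` and `k = 2`, the left-neighbor mean is 2.5, the absolute error is 2.5, and the accumulated change is 20/10 = 2.0.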
Step S204: compare the $e_i^{(k)}$ and $AC_i$ calculated in step S203 with the given threshold $\tau$ and decide from the comparison whether the data point under test is anomalous. If $e_i^{(k)} > \tau$ or $AC_i > \tau$, proceed to step S205: mark data point $d_i$ as an anomaly and handle it with a suitable method. Otherwise, proceed to step S206: keep data point $d_i$ as normal data for subsequent analysis and processing.
Step S207: determine whether all data in the given data set have been checked. If not, proceed to step S209, which produces the next data point $d_{i+1}$ and applies the method of this embodiment to it. Otherwise, proceed to step S208, which generates, from the detection results, the anomalous data set $O$ and the "clean" data set $D'$ that remains after anomaly handling.
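A sketch of the unilateral loop over steps S202 to S209 follows. It makes three assumptions that the text does not spell out: flagged points are dropped from the pool of left neighbors (one way to "exclude anomalies that have already been detected"), the first $2k$ points are accepted as normal because they lack enough left neighbors, and the threshold is a fixed number supplied by the caller. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def detect_unilateral(values: Sequence[float], k: int, tau: float
                      ) -> Tuple[List[Tuple[int, float]], List[float]]:
    """Scan left to right; return (O as (index, value) pairs, D' as values)."""
    anomalies: List[Tuple[int, float]] = []
    kept: List[float] = []  # values accepted as normal so far
    weights = [2 * k + 1 - j for j in range(1, 2 * k + 1)]
    for i, v in enumerate(values):
        if len(kept) < 2 * k:
            kept.append(v)  # warm-up: too few left neighbors (assumption)
            continue
        left = kept[-2 * k:][::-1]  # the 2k most recent normal values, nearest first
        m = sum(left) / (2 * k)
        e = abs(m - v)
        ac = sum(w * abs(v - x) for w, x in zip(weights, left)) / sum(weights)
        if e > tau or ac > tau:
            anomalies.append((i, v))   # step S205
        else:
            kept.append(v)             # step S206
    return anomalies, kept             # step S208
```

Because the flagged value never enters `kept`, the points that follow a spike are compared only with earlier normal values, so one anomaly does not distort the statistics of its successors.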
Referring to Fig. 3, which corresponds to the anomaly detection algorithms above, the invention provides a time series anomaly detection device 300 comprising an input module 301, an anomaly detection module 302, and a time series anomaly output module 303. The input module 301 holds a typical time series data set awaiting anomaly detection, such as daily stock market fluctuation per unit time, daily power consumption, or water level at a hydrometric station. The input module 301 prepares this data set and passes it to the anomaly detection module 302. The anomaly detection module 302 is the core of the device: it takes the time series data collected by the input module 301, performs anomaly detection with the time series data anomaly detection method of the invention, and presents the detection result through the output module 303. It receives the data set collected by the input module 301 and carries out preprocessing, calculation, and analysis to determine whether the data under test are anomalous. The time series anomaly output module 303 outputs the anomalous data detected by the anomaly detection module 302. The anomaly detection module 302 comprises a data preprocessing component 304, a computation component 305, and an analysis component 306. The data preprocessing component 304 receives the data collected by the input module 301 and preprocesses them; preprocessing mainly consists of selecting the data point to be evaluated and defining its neighbor node set $\eta_i^{(k)}$. For the bilateral and unilateral anomaly detection algorithms, $\eta_i^{(k)}$ is obtained from formulas (1) and (5), respectively. The computation component 305 applies the algorithm of the invention to the preprocessed data to calculate $e_i^{(k)}$ and $AC_i$, using formulas (2)-(4) or (6)-(8). The analysis component 306 compares $e_i^{(k)}$ and $AC_i$ with the given threshold $\tau$ and, from the comparison, determines whether the data under test are anomalous.
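The module structure of Fig. 3 can be mirrored in code as below. This is a structural sketch only: the class and method names are hypothetical, the detection module uses the bilateral formulas with the default weights, and the threshold is fixed at construction time rather than computed dynamically.

```python
class InputModule:
    """Module 301: supplies the time series to be checked."""
    def __init__(self, values):
        self._values = list(values)

    def read(self):
        return list(self._values)


class DetectionModule:
    """Module 302: preprocessing (304), computation (305), analysis (306)."""
    def __init__(self, k, tau):
        self.k = k
        self.tau = tau
        self.weights = [k + 1 - s for s in range(1, k + 1)]

    def preprocess(self, values, i):
        """Component 304: define the bilateral neighbor set, formula (1)."""
        if i < self.k or i + self.k >= len(values):
            return None  # not enough neighbors; point is skipped (assumption)
        left = [values[i - s] for s in range(1, self.k + 1)]
        right = [values[i + t] for t in range(1, self.k + 1)]
        return left, right

    def compute(self, v, left, right):
        """Component 305: formulas (2)-(4)."""
        m = (sum(left) + sum(right)) / (2 * self.k)
        e = abs(m - v)
        num = (sum(w * abs(v - x) for w, x in zip(self.weights, left))
               + sum(w * abs(v - x) for w, x in zip(self.weights, right)))
        return e, num / (2 * sum(self.weights))

    def analyze(self, e, ac):
        """Component 306: compare both statistics with tau."""
        return e > self.tau or ac > self.tau

    def run(self, values):
        anomalies = []
        for i, v in enumerate(values):
            neighbors = self.preprocess(values, i)
            if neighbors is None:
                continue
            e, ac = self.compute(v, *neighbors)
            if self.analyze(e, ac):
                anomalies.append((i, v))
        return anomalies


class OutputModule:
    """Module 303: reports the detected anomalies."""
    def emit(self, anomalies):
        return [f"index {i}: value {v}" for i, v in anomalies]
```

A typical call chain is `OutputModule().emit(DetectionModule(k=2, tau=3.0).run(InputModule(series).read()))`. Keeping the three components as separate methods makes it straightforward to swap in the unilateral neighbor definition or a dynamic threshold without touching the other parts.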
Result verification
Hydrological phenomena vary with time, and this process of change is called a hydrological process. Hydrological data are a discrete record of hydrological processes and are divided into types of hydrological time series according to the physical quantity they describe; common quantities include flow, water level, rainfall, and evaporation. This test uses the 1993 daily mean water level data from a Taihu Lake discharge station to check the validity of the time series anomaly detection method of the invention. The algorithm starts with $k = 3$ and increases $k$ in steps of 2. For every point it calculates the absolute error $e_i^{(k)}$ between the point and the mean $m_i^{(k)}$ of its neighbor nodes $\eta_i^{(k)}$, and the accumulated change $AC_i$ between $d_i$ and its neighbors $\eta_i^{(k)}$. Fig. 4 shows the distribution of the data after anomaly detection, with the anomalous data marked by circles. The figure shows that the time series data anomaly detection method of the invention effectively detects the anomalous data points.

Claims (9)

1. A time series data anomaly detection method, in which the time series is given as $D = \{d_1 = (v_1, t_1), d_2 = (v_2, t_2), \ldots, d_n = (v_n, t_n)\}$ and the time series datum $d_i = (v_i, t_i)$ denotes the observed value $v_i$ at time $t_i$, characterized in that it comprises the following steps:
(1) defining the neighbor nodes $\eta_i^{(k)}$ of data point $d_i$ in the time series, where $k$ is the neighbor window width of $d_i$;
(2) calculating the mean $m_i^{(k)}$ of the neighbor nodes $\eta_i^{(k)}$ of data point $d_i$;
(3) calculating the absolute error $e_i^{(k)}$ between data point $d_i$ and the neighbor mean $m_i^{(k)}$, and the accumulated change $AC_i$ between $d_i$ and its neighbors $\eta_i^{(k)}$;
(4) setting the anomaly detection threshold $\tau$ and comparing the absolute error $e_i^{(k)}$ and the accumulated change $AC_i$ with it: if $e_i^{(k)} > \tau$ or $AC_i > \tau$, marking $d_i$ as an anomaly; otherwise keeping $d_i$.
2. The time series data anomaly detection method of claim 1, characterized in that $k$ takes values in $\{3, 5, \ldots, 31\}$.
3. The time series data anomaly detection method of claim 1, characterized in that the threshold $\tau$ has two components: the mean change over the period's sequence and the variance of the neighbor nodes.
4. The time series data anomaly detection method of claim 1, characterized in that the neighbor nodes $\eta_i^{(k)}$ are unilateral neighbor nodes, $\eta_i^{(k)} = \{d_{i-2k}, d_{i-2k+1}, \ldots, d_{i-1}\}$, where $2k$ is the neighbor window width of $d_i$ (from $i-2k$ to $i-1$).
5. The time series data anomaly detection method as claimed in claim 4, characterized in that the mean $m_i^{(k)}$, the absolute error $e_i^{(k)}$, and the accumulated variation $AC_i$ are calculated respectively by the following formulas:
$m_i^{(k)} = \left( \sum_{s=1}^{k} v_{i-s} + \sum_{t=1}^{k} v_{i+t} \right) / (2k)$;
$e_i^{(k)} = \left| m_i^{(k)} - v_i \right|$;
$AC_i = \left( \sum_{s=1}^{k} w_s \, |v_i - v_{i-s}| + \sum_{t=1}^{k} w'_t \, |v_i - v_{i+t}| \right) / \left( \sum_{s=1}^{k} w_s + \sum_{t=1}^{k} w'_t \right)$,
where $w_k, w_{k-1}, \ldots, w_1, w'_1, w'_2, \ldots, w'_k$ denote the weight vector of the neighbor nodes.
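To make the formulas of claim 5 concrete, the following sketch evaluates the mean, the absolute error, and $AC_i$ for a single index, following the formulas as written. The function name `window_stats` and the example weights (larger for nearer neighbors) are illustrative assumptions; the claim does not prescribe weight values.

```python
from typing import Sequence, Tuple


def window_stats(v: Sequence[float], i: int, k: int,
                 w: Sequence[float], w_prime: Sequence[float]) -> Tuple[float, float, float]:
    """Return (m, e, AC) for index i.

    w[s-1] weights v[i-s] and w_prime[t-1] weights v[i+t], for s, t = 1..k.
    """
    if len(w) != k or len(w_prime) != k:
        raise ValueError("need k weights on each side")
    if i - k < 0 or i + k >= len(v):
        raise IndexError("index i needs k neighbors on each side")
    left = [v[i - s] for s in range(1, k + 1)]
    right = [v[i + t] for t in range(1, k + 1)]
    # Mean of the 2k neighbors, then its absolute distance from v[i].
    m = (sum(left) + sum(right)) / (2 * k)
    e = abs(m - v[i])
    # Weighted average of the absolute differences between v[i] and each neighbor.
    weighted = (sum(ws * abs(v[i] - x) for ws, x in zip(w, left))
                + sum(wt * abs(v[i] - x) for wt, x in zip(w_prime, right)))
    ac = weighted / (sum(w) + sum(w_prime))
    return m, e, ac


v = [10.0, 11.0, 10.0, 20.0, 11.0, 10.0, 12.0]
m, e, ac = window_stats(v, i=3, k=2, w=[2.0, 1.0], w_prime=[2.0, 1.0])
print(m, e, ac)  # prints 10.5 9.5 9.5
```

Here the neighbor mean is $42 / 4 = 10.5$, so the absolute error is $|10.5 - 20| = 9.5$, and the weighted sum $2 \cdot 10 + 1 \cdot 9 + 2 \cdot 9 + 1 \cdot 10 = 57$ divided by the total weight $6$ gives $AC_i = 9.5$.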
6. The time series data anomaly detection method as claimed in claim 1, characterized in that the neighbor nodes are two-sided neighbor nodes, namely the points $\{d_{i-k}, \ldots, d_{i-1}, d_{i+1}, \ldots, d_{i+k}\}$ on both sides of $d_i$, where $2k$ is the neighbor node window width of $d_i$ (from $i-k$ to $i+k$, excluding $d_i$ itself).
7. The time series data anomaly detection method as claimed in claim 6, characterized in that the mean $m_i^{(k)}$ of the two-sided neighbor nodes of data point $d_i$, the absolute error $e_i^{(k)}$, and the accumulated variation $AC_i$ are calculated respectively by the following formulas:
$m_i^{(k)} = \left( \sum_{j=1}^{2k} v_{i-j} \right) / (2k)$;
$e_i^{(k)} = \left| m_i^{(k)} - v_i \right|$;
$AC_i = \left( \sum_{j=1}^{2k} w_j \, |v_i - v_{i-j}| \right) / \left( \sum_{j=1}^{2k} w_j \right)$,
where $w_k, w_{k-1}, \ldots, w_1, w'_1, w'_2, \ldots, w'_k$ denote the weight vector of the neighbor nodes.
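The formulas of claim 7, as written, use only the $2k$ values that precede $v_i$ (indices $i-1$ down to $i-2k$), so they can be evaluated as each observation arrives. The sketch below shows one way to do that with a fixed-length buffer. The class name `TrailingWindowDetector`, the default uniform weights, and the choice to keep flagged values out of the buffer are assumptions, not part of the claims.

```python
from collections import deque
from typing import Deque, Optional, Sequence


class TrailingWindowDetector:
    """Scores each new value against the previous 2k values (hypothetical class)."""

    def __init__(self, k: int, tau: float, weights: Optional[Sequence[float]] = None):
        self.size = 2 * k
        self.tau = tau
        # weights[j-1] applies to v[i-j]; uniform weights are an assumption.
        self.weights = list(weights) if weights is not None else [1.0] * self.size
        if len(self.weights) != self.size:
            raise ValueError("need exactly 2k weights")
        self.history: Deque[float] = deque(maxlen=self.size)

    def update(self, value: float) -> bool:
        """Return True if value is flagged; nothing is scored until the window is full."""
        if len(self.history) < self.size:
            self.history.append(value)
            return False
        past = list(reversed(self.history))  # past[j-1] == v[i-j]
        m = sum(past) / self.size
        e = abs(m - value)
        ac = sum(w * abs(value - p) for w, p in zip(self.weights, past)) / sum(self.weights)
        is_anomaly = e > self.tau or ac > self.tau
        if not is_anomaly:
            # Assumption: flagged values are kept out of the window.
            self.history.append(value)
        return is_anomaly


detector = TrailingWindowDetector(k=2, tau=3.0)
flags = [detector.update(x) for x in [5.0, 5.0, 6.0, 5.0, 15.0, 5.0]]
print(flags)  # prints [False, False, False, False, True, False]
```

Keeping flagged values out of the buffer stops a single spike from distorting later means, but it also means a genuine level shift would keep being flagged; a real deployment would have to decide how to handle that case.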
8. A time series data anomaly detection device, characterized in that it comprises an input module, an anomaly detection module, and an output module, wherein the input module provides the time series data set required for anomaly detection, the anomaly detection module performs anomaly detection using the time series data anomaly detection method as claimed in any one of claims 1-7, and the output module outputs the anomalous data set according to the detection result of the anomaly detection module.
9. The time series data anomaly detection device as claimed in claim 8, characterized in that the anomaly detection module comprises a data preprocessing component, a computation component, and an analysis component, wherein the data preprocessing component receives the data collected by the time series data input module and preprocesses it, the preprocessing consisting of selecting the data point to be evaluated and defining its neighbor node set; the computation component calculates $e_i^{(k)}$ and $AC_i$ from the preprocessed data; and the analysis component compares the calculated $e_i^{(k)}$ and $AC_i$ with the given threshold $\tau$ and determines from the comparison whether the data under test is a definite anomaly.
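The module structure of claims 8 and 9 can be pictured as a small pipeline. The following Python sketch is only an illustration under stated assumptions: the names `DetectionModule`, `input_module`, and `output_module`, the hard-coded sample values, the two-sided window, and the uniform weights in $AC_i$ are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (observed value v_i, timestamp t_i)


@dataclass
class DetectionModule:
    k: int
    tau: float

    def preprocess(self, data: List[Point]) -> List[Tuple[int, List[float]]]:
        """Preprocessing component: pick points to evaluate and their neighbor sets."""
        selected = []
        for i in range(self.k, len(data) - self.k):
            window = data[i - self.k:i] + data[i + 1:i + self.k + 1]
            selected.append((i, [v for v, _ in window]))
        return selected

    def compute(self, value: float, nbrs: List[float]) -> Tuple[float, float]:
        """Computation component: absolute error and accumulated variation."""
        e = abs(sum(nbrs) / len(nbrs) - value)
        ac = sum(abs(value - n) for n in nbrs) / len(nbrs)  # uniform weights assumed
        return e, ac

    def analyze(self, e: float, ac: float) -> bool:
        """Analysis component: compare both quantities with the threshold."""
        return e > self.tau or ac > self.tau

    def run(self, data: List[Point]) -> List[Point]:
        return [data[i] for i, nbrs in self.preprocess(data)
                if self.analyze(*self.compute(data[i][0], nbrs))]


def input_module() -> List[Point]:
    """Stand-in data source; the values are made up for this example."""
    values = [3.0, 3.1, 2.9, 8.0, 3.0, 3.2, 3.1]
    return [(v, float(t)) for t, v in enumerate(values)]


def output_module(anomalies: List[Point]) -> None:
    for v, t in anomalies:
        print(f"anomaly at t={t}: v={v}")


anomalies = DetectionModule(k=2, tau=2.5).run(input_module())
output_module(anomalies)  # prints "anomaly at t=3.0: v=8.0"
```

Separating the three components this way means the neighbor definition or the threshold rule can be swapped without touching the input and output sides.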
CN2012104211430A 2012-10-29 2012-10-29 Time series data abnormity detection method and device Pending CN102945320A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012104211430A CN102945320A (en) 2012-10-29 2012-10-29 Time series data abnormity detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2012104211430A CN102945320A (en) 2012-10-29 2012-10-29 Time series data abnormity detection method and device

Publications (1)

Publication Number Publication Date
CN102945320A true CN102945320A (en) 2013-02-27

Family

ID=47728263

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012104211430A Pending CN102945320A (en) 2012-10-29 2012-10-29 Time series data abnormity detection method and device

Country Status (1)

Country Link
CN (1) CN102945320A (en)

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
周大镯 等: "《时间序列异常检测》", 《计算机工程与应用》 *
杨正一: "《误差理论与测量不确定度》", 30 April 2000, 石油工业出版社 *
林森: "《时间序列异常检测的研究与应用》", 《万方学位论文全文数据库》 *
裴丽鹊: "《一种基于滑动窗口的时间序列异常检测算法》", 《巢湖学院学报》 *
詹艳艳 等: "《时间序列异常模式的k-均距异常因子检测》", 《计算机工程与应用》 *

Cited By (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103648111B (en) * 2013-12-12 2017-12-05 京信通信系统(中国)有限公司 Disturb the method and system of alarm
CN106104496A (en) * 2014-03-18 2016-11-09 微软技术许可有限责任公司 The abnormality detection not being subjected to supervision for arbitrary sequence
CN106104496B (en) * 2014-03-18 2019-07-30 微软技术许可有限责任公司 The abnormality detection not being subjected to supervision for arbitrary sequence
CN105335592A (en) * 2014-06-25 2016-02-17 国际商业机器公司 Method and equipment for generating data in missing section of time data sequence
CN104123448B (en) * 2014-07-14 2017-05-17 南京理工大学 Multi-data-stream anomaly detection method based on context
CN104123448A (en) * 2014-07-14 2014-10-29 南京理工大学 Multi-data-stream anomaly detection method based on context
CN106156470A (en) * 2015-04-16 2016-11-23 腾讯科技(深圳)有限公司 A kind of time series abnormality detection mask method and system
CN106156470B (en) * 2015-04-16 2020-10-23 腾讯科技(深圳)有限公司 Time series abnormity detection and labeling method and system
CN104915846A (en) * 2015-06-18 2015-09-16 北京京东尚科信息技术有限公司 Electronic commerce time sequence data anomaly detection method and system
CN105653129A (en) * 2015-12-29 2016-06-08 江苏飞尚安全监测咨询有限公司 Classic algorithm based real-time signal discrimination and correction method
CN106295683A (en) * 2016-08-01 2017-01-04 上海理工大学 A kind of outlier detection method of time series data based on sharpness
WO2018086025A1 (en) * 2016-11-10 2018-05-17 Nokia Technologies Oy Node identification in distributed adaptive networks
CN106548035A (en) * 2016-11-24 2017-03-29 腾讯科技(深圳)有限公司 A kind of diagnostic method and device of data exception
CN106548035B (en) * 2016-11-24 2019-08-06 腾讯科技(深圳)有限公司 A kind of diagnostic method and device of data exception
CN107346301A (en) * 2016-12-02 2017-11-14 西交利物浦大学 Water quality monitoring noise data real-time detection method based on Double time window checking
CN107346301B (en) * 2016-12-02 2020-09-04 西交利物浦大学 Water quality monitoring noise data real-time detection method based on double-time-window verification
CN108205432A (en) * 2016-12-16 2018-06-26 中国航天科工飞航技术研究院 A kind of real-time eliminating method of observation experiment data outliers
CN108205432B (en) * 2016-12-16 2020-08-21 中国航天科工飞航技术研究院 Real-time elimination method for observation experiment data abnormal value
CN106951680A (en) * 2017-02-21 2017-07-14 河海大学 A kind of Hydrological Time Series abnormal patterns detection method
CN106971058A (en) * 2017-02-21 2017-07-21 河海大学 A kind of pumping station operation monitoring data abnormal point detecting method
CN107528722B (en) * 2017-07-06 2020-10-23 创新先进技术有限公司 Method and device for detecting abnormal point in time sequence
CN107528722A (en) * 2017-07-06 2017-12-29 阿里巴巴集团控股有限公司 Abnormal point detecting method and device in a kind of time series
CN110474862B (en) * 2018-05-10 2021-08-13 中移(苏州)软件技术有限公司 Network traffic anomaly detection method and device
CN110474862A (en) * 2018-05-10 2019-11-19 中移(苏州)软件技术有限公司 A kind of network flow abnormal detecting method and device
CN109063947A (en) * 2018-06-11 2018-12-21 阿里巴巴集团控股有限公司 A kind of abnormality recognition method of time series, device and service server
CN109189827B (en) * 2018-08-16 2022-04-15 创新先进技术有限公司 Time sequence processing method and device and electronic equipment
CN109189827A (en) * 2018-08-16 2019-01-11 阿里巴巴集团控股有限公司 Time Series Processing method and apparatus, electronic equipment
CN109800858B (en) * 2018-12-21 2021-03-05 东软集团股份有限公司 Application system abnormality detection method and device, readable storage medium and electronic equipment
CN109800858A (en) * 2018-12-21 2019-05-24 东软集团股份有限公司 Data exception detection method, device, readable storage medium storing program for executing and electronic equipment
CN109861857A (en) * 2019-01-28 2019-06-07 网联清算有限公司 Fault detection method and device
CN111984827A (en) * 2019-05-24 2020-11-24 上海东方富联科技有限公司 Door sensor data anomaly detection method and system, storage medium and terminal
CN110166464A (en) * 2019-05-27 2019-08-23 北京信息科技大学 A kind of detection method and system of content center network interest extensive aggression
CN110166464B (en) * 2019-05-27 2021-10-15 北京信息科技大学 Method and system for detecting content-centric network interest flooding attack
CN110312225A (en) * 2019-07-30 2019-10-08 平顶山学院 A kind of wireless sensor hardware device
CN110631650A (en) * 2019-09-23 2019-12-31 杭州鸿泉物联网技术股份有限公司 Data cleaning method and device based on time series data self-increment characteristics
CN111191676A (en) * 2019-12-09 2020-05-22 国网辽宁省电力有限公司电力科学研究院 Power consumption data trend anomaly analysis method based on backtraceable dynamic window model
CN112461340A (en) * 2020-12-03 2021-03-09 上海普适导航科技股份有限公司 Fault correcting and detecting method and device for water level meter
CN112461340B (en) * 2020-12-03 2024-06-28 上海普适导航科技股份有限公司 Fault correcting and detecting method and device for water level meter
CN114822040B (en) * 2022-06-23 2022-11-11 南京城建隧桥智慧管理有限公司 Good neighbor set construction method for assisting mobile node position anomaly detection
CN114822040A (en) * 2022-06-23 2022-07-29 南京城建隧桥智慧管理有限公司 Good neighbor set construction method for assisting mobile node position anomaly detection
CN115001853A (en) * 2022-07-18 2022-09-02 山东云天安全技术有限公司 Abnormal data identification method and device, storage medium and computer equipment
CN115001853B (en) * 2022-07-18 2022-11-04 山东云天安全技术有限公司 Abnormal data identification method and device, storage medium and computer equipment
CN117612694A (en) * 2023-12-04 2024-02-27 西安好博士医疗科技有限公司 Data recognition method and system for thermal therapy machine based on data feedback
CN117540325A (en) * 2024-01-05 2024-02-09 杭银消费金融股份有限公司 Business database anomaly detection method and system based on data variation capture
CN117540325B (en) * 2024-01-05 2024-04-26 杭银消费金融股份有限公司 Business database anomaly detection method and system based on data variation capture

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20130227