CN110569890A - Hydrological data abnormal mode detection method based on similarity measurement - Google Patents
- Publication number
- CN110569890A (application number CN201910784182.9A)
- Authority
- CN
- China
- Prior art keywords
- mode
- meta
- sequence
- similarity
- time
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Testing And Monitoring For Control Systems (AREA)
Abstract
The invention discloses a method for detecting abnormal patterns in hydrological data based on similarity measurement. The KPRA-PLR algorithm (key-point-based piecewise linear representation) segments the hydrological data at key points, fits a straight line to each subsequence with the PLR algorithm, and represents each subsequence by the slope $a_i$ of its line and its time interval $\Delta t_i$. Each segmented subsequence is called a meta-pattern, and adjacent meta-patterns are combined into sequence patterns. A weighted distance and the SDTW algorithm are used to measure the similarity of meta-patterns and of sequence patterns, respectively. The anomaly score of each sequence pattern is then computed as the reciprocal of the average distance between that pattern and the other patterns. Taking the anomaly scores of the sequence patterns $S_x$ as data points, a local outlier factor (LOF) is computed according to the k-nearest-neighbor local detection principle. Abnormal patterns detected with this similarity measure are more accurate, and the method provides a new technique for detecting abnormal patterns in hydrological data.
Description
Technical Field
The invention belongs to the field of hydrological detection technology, and in particular relates to a method for detecting abnormal patterns in hydrological data based on similarity measurement.
Background
A hydrological time series is a sequence of observations of physical quantities (water level, flow rate, rainfall, etc.) recorded by an observation system in chronological order. It is a common and complex data type that objectively records the observation information obtained by the observation system. In China, with the advance of water-conservancy informatization, hydrological time series can be transmitted to a water-conservancy information sharing platform, processed by staff, and stored in the national hydrological database. Owing to measurement errors of the acquisition equipment, manual operating errors, changes in the evolution of the hydrological regime itself, and other factors, hydrological time series contain some abnormal data.
Anomaly detection aims to mine unusual data hidden in large data sets, that is, potentially meaningful information that differs markedly from the rest of the data. Methods for detecting abnormal data in hydrological time series are still at an exploratory stage, and the industry lacks a mature anomaly detection scheme; conventional methods for detecting abnormal points in hydrological time series include the box-plot method and Benford's law.
Disclosure of Invention
Purpose of the invention: to address the low accuracy and low speed of anomaly detection in the prior art, the invention provides a method for detecting abnormal patterns in hydrological data based on similarity measurement, which provides a theoretical basis for detecting such patterns.
Technical solution: a method for detecting abnormal patterns in hydrological data based on similarity measurement. The method builds on the KPRA-PLR algorithm (key-point-based piecewise linear representation), a time-series pattern similarity measurement method, and a k-nearest-neighbor local anomaly detection algorithm, and comprises the following steps:
(1) Key-point-based piecewise linear representation (KPRA-PLR): first, the original hydrological time series is segmented at key points, as defined below, into subsequences; then a straight line is fitted to each subsequence with the PLR algorithm to obtain its slope $a_i$, and the intersection of each pair of adjacent lines is solved to obtain the start time $t_{il}$ and end time $t_{ir}$ of each subsequence. Each subsequence is represented by the triple $(a_i, t_{il}, t_{ir})$, called a meta-pattern;
(2) Time-series pattern similarity measurement: adjacent meta-patterns are combined into sequence patterns. To measure the similarity of two patterns, a similarity distance function is defined between them, so that their similarity is expressed as a distance; the average similarity of each pattern to the other patterns gives that pattern's anomaly score;
(3) k-nearest-neighbor local anomaly detection: the k-nearest-neighbor distances between patterns are computed from their anomaly scores, yielding the k-nearest-neighbor reachability distance, local reachability density, and local anomaly factor of each pattern; the pattern with the largest local anomaly factor is the abnormal pattern.
Further, the step (1) comprises the following processes:
(11) Let $S = \{s_1, s_2, \ldots, s_m\}$ be the set of all local extreme points of the time series X. If an extreme point $s_k$ satisfies $|s_k - s_{k-1}| < \varepsilon$ and $|t_k - t_{k-1}| < \delta$, where ε and δ are given constants and $t_k$ is the observation time of the hydrological quantity at $s_k$, then $s_k$ is designated a non-important local extreme point; the remaining extreme points in S are the key points;
(12) Following the piecewise linear representation algorithm, a straight line is fitted by least squares to the points between each pair of adjacent key points to obtain its slope, and the intersection of adjacent lines is solved to obtain the start time $t_{il}$ and end time $t_{ir}$ of each subsequence. Each subsequence is represented as $(a_i, t_{il}, t_{ir})$, called a meta-pattern;
(13) The constants ε and δ suited to the current data are determined by tuning them until the fitted lines closely match the shape of the original curve.
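A minimal Python sketch of the key-point rule in (11), under stated assumptions: the first and last samples count as extreme points, plateaus are not treated as extrema, and each extreme point $s_k$ is compared with the preceding extreme point $s_{k-1}$ in S (not with the previously retained key point). The function names `local_extrema` and `key_points` are hypothetical, not from the patent.

```python
from typing import List, Sequence


def local_extrema(x: Sequence[float]) -> List[int]:
    """Indices of local maxima/minima; endpoints are included (assumption)."""
    n = len(x)
    if n < 3:
        return list(range(n))
    idx = [0]
    for i in range(1, n - 1):
        # A sign change in the first difference marks a local max or min.
        if (x[i] - x[i - 1]) * (x[i + 1] - x[i]) < 0:
            idx.append(i)
    idx.append(n - 1)
    return idx


def key_points(x: Sequence[float], t: Sequence[float],
               eps: float, delta: float) -> List[int]:
    """Indices of key points: extreme points that are not 'non-important'."""
    ext = local_extrema(x)
    if not ext:
        return []
    keys = [ext[0]]
    for prev, cur in zip(ext, ext[1:]):
        # s_k is non-important if both its value change and its time gap
        # relative to the previous extreme point s_{k-1} are small.
        non_important = abs(x[cur] - x[prev]) < eps and abs(t[cur] - t[prev]) < delta
        if not non_important:
            keys.append(cur)
    if keys[-1] != ext[-1]:
        keys.append(ext[-1])  # keep the series end so segmentation covers it
    return keys
```

For example, `key_points([1, 3, 2, 2.1, 2, 5, 1], list(range(7)), eps=0.5, delta=2)` drops the small wiggle at indices 3 and 4 and returns `[0, 1, 2, 5, 6]`.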
In the time-series pattern similarity measurement of step (2), separate similarity distance functions are defined for meta-patterns and for sequence patterns, as follows:
(21) The similarity between two meta-patterns is measured with a weighted distance. Let $S_x = \{M_1, M_2, M_3, \ldots, M_n\}$ be the pattern sequence of time series X obtained by the PLR algorithm, and take any two meta-patterns $M_i = (a_i, \Delta t_i)$ and $M_j = (a_j, \Delta t_j)$ in $S_x$ ($i, j = 1, 2, 3, \ldots, n$). Their weighted distance is

$$D_{wei}(M_i, M_j) = \beta\,|a_i - a_j| + (1-\beta)\,|\Delta t_i - \Delta t_j|$$

where $a_i$ is the slope of the fitted line, $\Delta t_i = t_{ir} - t_{il}$ is the time interval spanned by meta-pattern $M_i$, $D_{wei}(M_i, M_j)$ is the weighted distance between meta-patterns $M_i$ and $M_j$, and β is a weight with $0 \le \beta \le 1$;
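The weighted distance in (21) translates directly into code. This sketch treats a meta-pattern as a `(slope, duration)` tuple; the name `d_wei` mirrors the formula, and the default β = 0.5 is an assumption, since the text only requires $0 \le \beta \le 1$.

```python
from typing import Tuple

# A meta-pattern reduced to (slope a_i, duration Δt_i).
MetaPattern = Tuple[float, float]


def d_wei(m_i: MetaPattern, m_j: MetaPattern, beta: float = 0.5) -> float:
    """Weighted distance beta*|a_i - a_j| + (1 - beta)*|Δt_i - Δt_j|."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    (a_i, dt_i), (a_j, dt_j) = m_i, m_j
    return beta * abs(a_i - a_j) + (1.0 - beta) * abs(dt_i - dt_j)
```

Using two entries from Table 2, `d_wei((33.0, 3), (-22.5, 4))` gives 0.5 · 55.5 + 0.5 · 1 = 28.25. Because slopes (flow per hour) and durations (hours) live on different scales, β effectively decides which of the two dominates.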
(22) Every two adjacent meta-patterns are combined into a sequence pattern. The similarity of sequence patterns is measured with the dynamic time warping (DTW) distance, built on top of the meta-pattern similarity;
Let the time series $X = (x_1, x_2, x_3, \ldots, x_n)$ and $Y = (y_1, y_2, \ldots, y_m)$ be reduced in dimension and linearly represented by the KPRA-PLR algorithm, giving the pattern sequences

$S_x = \{M_1, M_2, M_3, \ldots, M_p\}$,

$S_y = \{N_1, N_2, N_3, \ldots, N_t\}$,

where $M_i = (a_i, \Delta t_i)$ ($i = 1, 2, 3, \ldots, p$) is a meta-pattern of $S_x$ and $N_j = (a_j, \Delta t_j)$ ($j = 1, 2, 3, \ldots, t$) is a meta-pattern of $S_y$. The distance matrix between the two pattern sequences is defined as:

$$DM = (a_{ij})_{p \times t}$$
where $a_{ij} = D_{wei}(M_i, N_j)$ is the weighted distance between the i-th meta-pattern $M_i$ of $S_x$ and the j-th meta-pattern $N_j$ of $S_y$. The value of $a_{ij}$ reflects the similarity of $M_i$ and $N_j$: the more similar they are, the closer $a_{ij}$ is to 0; the more they differ, the larger $a_{ij}$ is;
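Building DM is a double loop over the two pattern sequences. This sketch is self-contained, so it repeats the weighted distance inline; `distance_matrix` is a hypothetical name and β = 0.5 is again an assumed default.

```python
from typing import List, Sequence, Tuple

MetaPattern = Tuple[float, float]  # (slope a, duration Δt)


def distance_matrix(s_x: Sequence[MetaPattern], s_y: Sequence[MetaPattern],
                    beta: float = 0.5) -> List[List[float]]:
    """DM = (a_ij) of size p x t with a_ij = D_wei(M_i, N_j)."""
    def d_wei(m: MetaPattern, n: MetaPattern) -> float:
        return beta * abs(m[0] - n[0]) + (1.0 - beta) * abs(m[1] - n[1])
    # Row i holds the distances from M_i to every N_j.
    return [[d_wei(m, n) for n in s_y] for m in s_x]
```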
(23) Analogous to the DTW distance between time series, the optimal warping path $W = \{w_1, w_2, \ldots, w_K\}$ is searched for in the distance matrix $DM = (a_{ij})_{p \times t}$, and the path with the minimum cumulative distance is found. This minimum is called the DTW distance between the pattern sequences $S_x$ and $S_y$, abbreviated $D_{dtw}(S_x, S_y)$:

$$D_{dtw}(S_x, S_y) = \min_{W} \sum_{k=1}^{K} w_k$$

where each $w_k$ is an element $a_{ij}$ of DM, i.e., the distance between two meta-patterns.
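The minimum cumulative path can be found with the usual DTW dynamic program over DM. This sketch assumes the standard step pattern (advance in $S_x$, in $S_y$, or in both) with no path-length normalization; the text does not specify either, and `dtw_from_matrix` is a hypothetical name.

```python
import math
from typing import List, Sequence


def dtw_from_matrix(dm: Sequence[Sequence[float]]) -> float:
    """Minimum cumulative distance over warping paths through DM."""
    p, t = len(dm), len(dm[0])
    # acc[i][j] = cheapest path cost from (1, 1) to (i, j); row/column 0 are sentinels.
    acc: List[List[float]] = [[math.inf] * (t + 1) for _ in range(p + 1)]
    acc[0][0] = 0.0
    for i in range(1, p + 1):
        for j in range(1, t + 1):
            acc[i][j] = dm[i - 1][j - 1] + min(acc[i - 1][j],      # advance in S_x only
                                               acc[i][j - 1],      # advance in S_y only
                                               acc[i - 1][j - 1])  # advance in both
    return acc[p][t]
```

For `dm = [[0.0, 2.0], [3.0, 1.0]]` the diagonal path costs 0 + 1, so the function returns `1.0`.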
In the k-nearest-neighbor local anomaly detection algorithm of step (3), the k-nearest-neighbor distances between patterns are computed from their anomaly score values. The calculation proceeds as follows:
For a positive integer k, the k-nearest-neighbor reachability distance from instance x to instance t is

$$r_{dist}(x,t) = \max\{k_{distance}(t),\ d(x,t)\}$$

where $k_{distance}(t)$ is the distance from t to its k-th nearest neighbor and $d(x,t)$ is the distance between x and t.
The local reachability density of instance x is defined as the reciprocal of the average reachability distance from x to its k nearest neighbors $N_k(x)$:

$$lrd(x) = \left( \frac{1}{|N_k(x)|} \sum_{t \in N_k(x)} r_{dist}(x,t) \right)^{-1}$$
In the data point set D, the local anomaly factor LOF(x) of instance x is defined as:

$$LOF(x) = \frac{1}{|N_k(x)|} \sum_{t \in N_k(x)} \frac{lrd(t)}{lrd(x)}$$
Here, the anomaly score of each sequence pattern is computed from the DTW-based distances; each instance x is the anomaly score of one sequence pattern, $r_{dist}(x,t)$ is the reachability distance between such scores, the local reachability density is computed with the formula above, and LOF denotes the local anomaly factor.
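A compact sketch of the three formulas above, applied as in the text to one-dimensional anomaly scores with $d(x,t) = |x - t|$. Assumptions: exactly k neighbors are used (ties at the k-th neighbor are not expanded as in the original LOF definition), and a tiny floor avoids division by zero when scores are duplicated. `lof_scores` is a hypothetical name.

```python
from typing import List, Sequence


def lof_scores(values: Sequence[float], k: int) -> List[float]:
    """Local outlier factor of each 1-D value, with d(x, t) = |x - t|."""
    n = len(values)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(values)")

    def d(a: int, b: int) -> float:
        return abs(values[a] - values[b])

    # k nearest neighbours N_k(x) and k-distance of every point
    # (simplification: exactly k neighbours, ties at the k-th are not expanded).
    neighbours: List[List[int]] = []
    k_dist: List[float] = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: d(i, j))
        neighbours.append(others[:k])
        k_dist.append(d(i, others[k - 1]))

    def r_dist(x: int, t: int) -> float:
        # reachability distance r_dist(x, t) = max{k-distance(t), d(x, t)}
        return max(k_dist[t], d(x, t))

    # local reachability density: reciprocal of the mean reachability distance
    lrd: List[float] = []
    for i in range(n):
        mean_reach = sum(r_dist(i, t) for t in neighbours[i]) / k
        lrd.append(1.0 / max(mean_reach, 1e-12))  # floor guards duplicate values

    # LOF(x): mean ratio of the neighbours' density to x's own density
    return [sum(lrd[t] for t in neighbours[i]) / (k * lrd[i]) for i in range(n)]
```

Ranking the result in descending order, e.g. `sorted(range(len(lof)), key=lambda i: lof[i], reverse=True)`, gives the Top-k selection used in the embodiment. For the scores `[1.0, 1.1, 1.2, 0.9, 5.0]` with `k=2`, the isolated value 5.0 ranks first while the clustered values have LOF close to 1.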
Furthermore, to address the singularity problem of the DTW method, an improved method based on the spatial and temporal dimensions is used: the spatial dimension (numerical value) and the temporal dimension (gradient) are combined by adaptive weighted summation, without introducing an additional weight coefficient, and the minimum cumulative path distance is then computed.
Specifically, the singularity problem of DTW is handled by using the SDTW algorithm as the time-series similarity measure, as follows:
Define the feature $F_s(x(i))$ as:
where max(|Δx|) is the maximum absolute gradient over all time points of the series, the gradient at a point being the difference between adjacent points, e.g., x(i) - x(i-1). Dividing by max(|Δx|) constrains the gradient at every point to [-1, 1], so that the temporal and spatial dimensions can be combined as a ratio. The distance d(i, j) between elements of the matrix is then:
$$d(i,j) = \left(F_s(x(i)) - F_s(y(j))\right)^2$$
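The exact definition of $F_s(x(i))$ is not reproduced in this text, so this sketch covers only the two pieces that are described: scaling each gradient x(i) - x(i-1) by max(|Δx|) into [-1, 1], and the squared-difference element distance d(i, j) computed from feature values supplied by the caller. Giving the first point a gradient of 0 is an assumption, and both function names are hypothetical.

```python
from typing import List, Sequence


def normalized_gradient(x: Sequence[float]) -> List[float]:
    """Gradient x(i) - x(i-1) scaled by max(|Δx|) so every value lies in [-1, 1].

    The first point has no predecessor; giving it gradient 0 is an assumption.
    """
    diffs = [0.0] + [x[i] - x[i - 1] for i in range(1, len(x))]
    m = max((abs(v) for v in diffs), default=0.0)
    if m == 0.0:
        return [0.0] * len(x)  # constant series: no gradient information
    return [v / m for v in diffs]


def element_distances(fx: Sequence[float], fy: Sequence[float]) -> List[List[float]]:
    """d(i, j) = (F_s(x(i)) - F_s(y(j)))**2 for precomputed feature values."""
    return [[(a - b) ** 2 for b in fy] for a in fx]
```

For x = [1, 3, 2, 6] the raw gradients are [0, 2, -1, 4], so `normalized_gradient` returns `[0.0, 0.5, -0.25, 1.0]`. The resulting d(i, j) matrix would then take the place of the plain pointwise distance in an ordinary DTW accumulation.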
Beneficial effects: the invention discloses a scheme for detecting abnormal patterns in hydrological data in which the KPRA-PLR algorithm segments the original series, a weighted distance measures the similarity of meta-patterns, an improved DTW algorithm measures the similarity of sequence patterns, and a k-nearest-neighbor local anomaly detection algorithm detects abnormal patterns from these similarity measurements. Compared with the commonly used symbolic SAX method for abnormal pattern detection, it requires neither fixed segmentation nor a fixed number of pattern categories.
Description of Drawings
FIG. 1 is a schematic view showing the overall structure of the process of the present invention;
FIG. 2 is a schematic diagram of a KPRA-PLR algorithm fitting line in the present invention;
FIG. 3 is a schematic diagram of the flow variation in patterns 185-195 of the embodiment;
FIG. 4 is a schematic diagram of the flow variation in patterns 282-287 of the embodiment;
FIG. 5 is a schematic diagram of the flow variation in patterns 290-302 of the embodiment;
FIG. 6 is a schematic diagram of the flow variation in patterns 363-370 of the embodiment.
Detailed Description
For the purpose of illustrating the technical solutions disclosed in the present invention in detail, the following description is further provided with reference to the accompanying drawings and specific embodiments.
The method of the invention for detecting abnormal patterns in hydrological data based on similarity measurement comprises three modules, as shown in FIG. 1: the KPRA-PLR algorithm (key-point-based piecewise linear representation), the time-series pattern similarity measurement method, and the k-nearest-neighbor principle. In the embodiment, hourly flood-season flow data (in m³/s) from the gantry hydrological station are used, and experiments are carried out on the measured flood-season hourly flow of this station from 2000/6/1 6:00:00 to 2015/9/30 17:39:00.
First, the KPRA-PLR algorithm is applied. Using the key-point definition with the same thresholds ε and δ for both kinds of hydrological data, the key points of the historical hydrological time series (flood-season data from 2000 to 2015) and of the real-time hydrological time series (flood-season data from 2016 to 2017) are obtained and counted: the real-time data yield 189 key points and the historical data 2844 key points. Each key point is written as [t, i], where t is the time-series index and i is the observed value of the series at that time. Part of the key points of the real-time hydrological series is shown in Table 1 below.
Table 1 Key points (partial listing)
[1,159] | [4,258] | [8,168] | [11,244] | [12,232] | [14,220] |
[29,188] | [32,184] | [35,216] | [38,183] | [43,256] | [46,214] |
[50,205] | [52,276] | [61,173] | [64,406] | [65,344] | [67,537] |
[73,330] | [75,421] | [89,142] | [96,164] | [105,141] | [111,160] |
[120,387] | [121,511] | [122,475] | [128,615] | [131,960] | [142,273] |
[148,768] | [149,747] | [150,780] | [151,705] | [155,556] | [165,147] |
[174,396] | [177,651] | [180,476] | [181,554] | [184,331] | [190,234] |
[194,1160] | [195,1300] | [196,1150] | [197,125] | [198,122] | [89,142] |
… | … | … | … | … | … |
[605,296] | [606,266] | [610,185] | [612,181] | [615,451] | [618,360] |
[621,580] | [624,308] | [627,525] | [631,388] | [635,808] | [639,490] |
[640,514] | [643,332] | [645,377] | [649,232] | [651,226] | [653,215] |
[654,266] | [657,184] | [658,201] | [664,480] | [666,301] | [668,426] |
[670,301] | [672,499] | [675,306] | [677,322] | [678,310] | [680,322] |
The original time series is segmented at the key points into subsequences, and each subsequence is represented with the PLR algorithm as a meta-pattern. By definition, adjacent meta-patterns are combined into sequence patterns, and different ways of combining them give different sequence patterns; in this embodiment, every two adjacent meta-patterns are combined into one sequence pattern. A meta-pattern is written $M_i = (a_i, \Delta t_i)$, where $a_i$ is the slope of the line and represents the trend of the curve, and $\Delta t_i$ is the length of the line and represents the duration of the curve segment.
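The segmentation and fitting step can be sketched as follows: an ordinary least-squares line is fitted to the samples between each pair of adjacent key points (neighboring segments share their key point), and each segment boundary is placed at the intersection of adjacent lines. Assumptions: the series endpoints serve as the outer boundaries, parallel adjacent lines fall back to the shared key point's time, and `fit_line` and `meta_patterns` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def fit_line(ts: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = a*t + b; returns (a, b)."""
    n = len(ts)
    mt, my = sum(ts) / n, sum(ys) / n
    sxx = sum((t - mt) ** 2 for t in ts)
    sxy = sum((t - mt) * (y - my) for t, y in zip(ts, ys))
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mt


def meta_patterns(t: Sequence[float], x: Sequence[float],
                  key_idx: Sequence[int]) -> List[Tuple[float, float, float]]:
    """(a_i, t_il, t_ir) for each segment between adjacent key points."""
    lines = [fit_line(t[lo:hi + 1], x[lo:hi + 1])
             for lo, hi in zip(key_idx, key_idx[1:])]
    bounds = [t[key_idx[0]]]
    for (a1, b1), (a2, b2), k in zip(lines, lines[1:], key_idx[1:-1]):
        if a1 != a2:
            bounds.append((b2 - b1) / (a1 - a2))  # intersection of adjacent lines
        else:
            bounds.append(t[k])  # parallel lines: use the shared key point
    bounds.append(t[key_idx[-1]])
    return [(a, bounds[i], bounds[i + 1]) for i, (a, _) in enumerate(lines)]
```

For a triangular pulse `t = [0, 1, 2, 3, 4]`, `x = [0, 2, 4, 2, 0]` with key points `[0, 2, 4]`, the lines y = 2t and y = -2t + 8 meet at t = 2, giving `[(2.0, 0, 2.0), (-2.0, 2.0, 4)]`; converting each triple with `(a, r - l)` yields the $(a_i, \Delta t_i)$ form used in Table 2.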
Statistics: the real-time hydrological series yields 188 meta-patterns, and pairing adjacent meta-patterns gives 94 sequence patterns; the historical hydrological series yields 2844 meta-patterns, which form 1422 sequence patterns. Part of the resulting sequence patterns of the real-time data is shown in Table 2 below.
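The counts above (188 meta-patterns to 94 sequence patterns, 2844 to 1422) suggest that adjacent meta-patterns are paired without overlap, as (M1, M2), (M3, M4), and so on. This short sketch makes that reading explicit; the non-overlapping pairing is inferred from the counts rather than stated, and `sequence_patterns` is a hypothetical name.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sequence_patterns(meta: Sequence[T]) -> List[Tuple[T, T]]:
    """Pair meta-patterns as (M1, M2), (M3, M4), ...; an odd last one is dropped."""
    return [(meta[i], meta[i + 1]) for i in range(0, len(meta) - 1, 2)]
```

With this pairing, `len(sequence_patterns(list(range(188))))` is 94 and `len(sequence_patterns(list(range(2844))))` is 1422, matching the statistics.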
Table 2 Sequence patterns (partial listing)
[33.0,3] | [-22.5,4] | [25.33,3] | [-12.0,1] | [-6.0,2] | [-1.2,5] |
[-2.6,10] | [-1.32,3] | [10.67,3] | [-11.0,3] | [14.6,5] | [-14.0,3] |
[15.0,2] | [-19.5,2] | [35.5,2] | [-11.43,9] | [77.67,3] | [-62.0,1] |
[96.5,2] | [-43.25,4] | [-17.0,2] | [45.5,2] | [-19.9,14] | [3.14,7] |
[-2.54,9] | [3.17,6] | [42.25,8] | [-111.0,1] | [124.0,1] | [-36.0,1] |
[23.33,6] | [115.0,3] | [-62.44,11] | [82.5,6] | [-21.0,1] | [33.0,1] |
[-75.0,1] | [-37.25,4] | [91.4,10] | [-119.3,9] | [85.0,3] | [-58.3,3] |
[78.0,1] | [-74.32,3] | [334.83,6] | [-295.0,4] | [140.0,1] | [-150,1] |
[100.0,1] | [-30.0,1] | [13.33,3] | [75.0,2] | [-85.5,6] | [32.0,1] |
… | … | … | … | … | … |
[-15.0,4] | [20.13,8] | [-18.0,6] | [18.5,4] | [-4.0,4] | [-23.3,3] |
[15.0,4] | [24.67,3] | [-19.0,6] | [13.33,3] | [-13.0,1] | [27.3,3] |
[-30.0,1] | [-20.25,4] | [-2.0,2] | [90.0,3] | [-30.32,3] | [73.3,3] |
[-90.6,3] | [72.33,3] | [-34.25,4] | [105.0,4] | [-79.5,4] | [24.0,1] |
[-60.66,3] | [22.5,2] | [-36.25,4] | [-3.0,2] | [-5.5,2] | [51.0,1] |
[99.0,2] | [8.0,2] | [-12.0,1] | [6.0,2] |
For the time-series pattern similarity measurement, the experiment in this embodiment mainly mines abnormal patterns in the real-time hydrological data. The real-time data are segmented with the KPRA-PLR algorithm and represented as meta-patterns, and pairs of adjacent meta-patterns form 94 sequence patterns in total; the historical data are processed in the same way, giving 1422 sequence patterns in total. The distance between each real-time sequence pattern and each of the 1422 historical sequence patterns is then computed; part of the similarity measurement results is shown in Table 3 below.
Table 3 Similarity measurement results (rows: real-time sequence patterns; columns: historical sequence patterns)
No. | 1 | 2 | 3 | … | 1420 | 1421 | 1422 |
---|---|---|---|---|---|---|---|
1 | 9.4 | 11.91 | 11.5 | … | 69.83 | 3.83 | 52.15 |
2 | 11.91 | 12.68 | 16.41 | … | 75.74 | 12.08 | 71.76 |
3 | 11.5 | 16.41 | 6.35 | … | 80.33 | 9.67 | 25.35 |
4 | 25.03 | 28.12 | 17.13 | … | 93.86 | 23.2 | 24.88 |
5 | 8.83 | 8.33 | 8.08 | … | 77.66 | 7.0 | 6.43 |
6 | 50.665 | 56.575 | 60.165 | … | 20.165 | 53.495 | 54.815 |
7 | 18.67 | 24.58 | 29.17 | … | 53.16 | 19.5 | 22.82 |
8 | 27.74 | 30.83 | 21.24 | … | 96.57 | 24.91 | 27.59 |
9 | 26.685 | 23.985 | 18.395 | … | 95.725 | 25.065 | 21.745 |
10 | 33.895 | 31.985 | 26.395 | … | 102.725 | 32.065 | 28.745 |
11 | 33.58 | 35.67 | 24.08 | … | 102.41 | 31.75 | 32.43 |
12 | 12.83 | 23.74 | 11.33 | … | 73.0 | 13.66 | 12.98 |
… | … | … | … | … | … | … | … |
85 | 66.58 | 69.67 | 59.08 | … | 135.41 | 62.75 | 66.43 |
86 | 55.41 | 58.5 | 48.91 | … | 124.24 | 52.58 | 55.26 |
87 | 30.955 | 34.045 | 23.455 | … | 99.785 | 28.125 | 30.805 |
88 | 42.08 | 45.17 | 36.58 | … | 110.91 | 38.25 | 41.93 |
89 | 36.49 | 39.58 | 29.99 | … | 105.32 | 32.66 | 37.34 |
90 | 58.67 | 64.58 | 65.17 | … | 41.34 | 60.5 | 68.82 |
91 | 51.17 | 57.08 | 61.67 | … | 19.66 | 53.0 | 56.32 |
92 | 69.83 | 75.74 | 80.33 | … | 54.9 | 72.66 | 74.98 |
93 | 3.83 | 12.08 | 9.67 | … | 72.66 | 56.4 | 5.68 |
94 | 5.15 | 10.76 | 11.35 | … | 73.98 | 10.68 | 9.4 |
Under the k-nearest-neighbor principle, the SDTW algorithm is used to compute the similarity between the sequence patterns of the real-time hydrological data and those of the historical hydrological data; the results are shown in Table 3 above. The anomaly score of each sequence pattern in the real-time data is then computed as the reciprocal of the average distance between that pattern and the 1422 sequence patterns obtained from the historical data. With the anomaly scores computed, a similarity-based point anomaly detection algorithm can be used to detect abnormal patterns. Two approaches are possible: the k-nearest-neighbor principle and clustering.
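The anomaly score step reduces each row of the real-time-versus-historical distance table to a single number. This sketch takes one row of distances per real-time pattern (as in Table 3) and returns the reciprocal of each row mean; `anomaly_scores` is a hypothetical name, and returning infinity for an all-zero row is an assumption.

```python
from typing import List, Sequence


def anomaly_scores(dist_rows: Sequence[Sequence[float]]) -> List[float]:
    """Reciprocal of the mean distance from each real-time pattern to all historical ones."""
    scores = []
    for row in dist_rows:
        mean = sum(row) / len(row)
        # A pattern far from every historical pattern gets a small score.
        scores.append(1.0 / mean if mean > 0 else float("inf"))
    return scores
```

For two patterns with distance rows `[2.0, 4.0]` and `[1.0, 1.0]`, the scores are 1/3 and 1.0; these scalar scores are the inputs to the local anomaly factor computation.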
In this embodiment, local anomaly factors are computed with the k-nearest-neighbor method, and the sequence patterns with the Top-k largest local anomaly factors are selected as abnormal patterns. The key parameter of the k-nearest-neighbor principle is the number of nearest neighbors k; the local anomaly factors are computed while adjusting k, and the Top-k largest local anomaly factors are shown in Table 4 below.
Table 4 Top-k local anomaly factors
Analysis of the experimental results: ranking the local anomaly factors in descending order, patterns 185-195 rank first and patterns 282-287 second. FIGS. 3 and 4 show that the hourly flow at the station surges within 2 hours; the slope $a_i$ of the sequence pattern is large, reflecting rapid growth followed by a decline. Such sequence patterns do occur in the 2000-2015 flow data, but only a few times, and few of them resemble these patterns in both trend and duration. In the two sequence patterns of FIGS. 3 and 4, the rising phase of the flow lasts only 2 hours, which is too short; after it, patterns 185-195 take too long to rise and fall, whereas patterns 282-287 rise and fall too quickly. Patterns 185-195 and 282-287 are therefore identified as abnormal patterns.
In the same descending ranking, patterns 290-302 rank third and patterns 363-370 fourth. FIGS. 5 and 6 show the flow rising slowly for a short time and then falling over a long period. The slope $a_i$ of the sequence pattern is negative, reflecting the degree of decline, and the pattern as a whole lasts too long. According to the key-point definition, 1520 is not a key point, so a fall-rise trend appears: a sudden rise occurs in the first hour, after which the flow drops rapidly from 2640 to 755. Very few patterns in the 2000-2015 flow data resemble patterns 290-302 in trend and duration. In the sequence pattern of FIG. 6, the flow rises slowly for 1 hour and then declines for a long time; this trend rarely occurs and the decline lasts unusually long, so the pattern is unusual. Patterns 290-302 and 363-370 are therefore identified as abnormal patterns.
Claims (6)
1. A method for detecting abnormal patterns in hydrological data based on similarity measurement, characterized in that the method builds on the KPRA-PLR algorithm (key-point-based piecewise linear representation), a time-series pattern similarity measurement method, and a k-nearest-neighbor local anomaly detection algorithm, and comprises the following steps:
(1) key-point-based piecewise linear representation (KPRA-PLR): first, the original hydrological time series is segmented at key points, as defined below, into subsequences; then a straight line is fitted to each subsequence with the PLR algorithm to obtain its slope $a_i$, and the intersection of each pair of adjacent lines is solved to obtain the start time $t_{il}$ and end time $t_{ir}$ of each subsequence. Each subsequence is represented by the triple $(a_i, t_{il}, t_{ir})$, called a meta-pattern;
(2) time-series pattern similarity measurement: adjacent meta-patterns are combined into sequence patterns. To measure the similarity of two patterns, a similarity distance function is defined between them, so that their similarity is expressed as a distance; the average similarity of each pattern to the other patterns gives that pattern's anomaly score;
(3) k-nearest-neighbor local anomaly detection: the k-nearest-neighbor distances between patterns are computed from their anomaly scores, yielding the k-nearest-neighbor reachability distance, local reachability density, and local anomaly factor of each pattern; the pattern with the largest local anomaly factor is the abnormal pattern.
2. The method for detecting abnormal patterns of hydrologic data based on similarity measurement according to claim 1, characterized in that: the step (1) comprises the following processes:
(11) Let $S = \{s_1, s_2, \ldots, s_m\}$ be the set of all local extreme points of the time series X. If an extreme point $s_k$ satisfies $|s_k - s_{k-1}| < \varepsilon$ and $|t_k - t_{k-1}| < \delta$, where ε and δ are given constants and $t_k$ is the observation time of the hydrological quantity at $s_k$, then $s_k$ is designated a non-important local extreme point; the remaining extreme points in S are the key points;
(12) following the piecewise linear representation algorithm, a straight line is fitted by least squares to the points between each pair of adjacent key points to obtain its slope, and the intersection of adjacent lines is solved to obtain the start time $t_{il}$ and end time $t_{ir}$ of each subsequence. Each subsequence is represented as $(a_i, t_{il}, t_{ir})$, called a meta-pattern;
(13) the constants ε and δ suited to the current data are determined by tuning them until the fitted lines closely match the shape of the original curve.
3. The method for detecting abnormal patterns of hydrologic data based on similarity measurement according to claim 1, characterized in that: in the time-series pattern similarity measurement of step (2), separate similarity distance functions are defined for meta-patterns and for sequence patterns, as follows:
(21) The similarity between two meta-patterns is measured with a weighted distance. Let $S_x = \{M_1, M_2, M_3, \ldots, M_n\}$ be the pattern sequence of time series X obtained by the PLR algorithm, and take any two meta-patterns $M_i = (a_i, \Delta t_i)$ and $M_j = (a_j, \Delta t_j)$ in $S_x$ ($i, j = 1, 2, 3, \ldots, n$). Their weighted distance is

$$D_{wei}(M_i, M_j) = \beta\,|a_i - a_j| + (1-\beta)\,|\Delta t_i - \Delta t_j|$$

where $a_i$ is the slope of the fitted line, $\Delta t_i = t_{ir} - t_{il}$ is the time interval spanned by meta-pattern $M_i$, $D_{wei}(M_i, M_j)$ is the weighted distance between meta-patterns $M_i$ and $M_j$, and β is a weight with $0 \le \beta \le 1$;
(22) every two adjacent meta-patterns are combined into a sequence pattern. The similarity of sequence patterns is measured with the dynamic time warping (DTW) distance, built on top of the meta-pattern similarity;
Let the time series $X = (x_1, x_2, x_3, \ldots, x_n)$ and $Y = (y_1, y_2, \ldots, y_m)$ be reduced in dimension and linearly represented by the KPRA-PLR algorithm, giving the pattern sequences

$S_x = \{M_1, M_2, M_3, \ldots, M_p\}$,

$S_y = \{N_1, N_2, N_3, \ldots, N_t\}$,

where $M_i = (a_i, \Delta t_i)$ ($i = 1, 2, 3, \ldots, p$) is a meta-pattern of $S_x$ and $N_j = (a_j, \Delta t_j)$ ($j = 1, 2, 3, \ldots, t$) is a meta-pattern of $S_y$. The distance matrix between the two pattern sequences is defined as:

$$DM = (a_{ij})_{p \times t}$$
where $a_{ij} = D_{wei}(M_i, N_j)$ is the weighted distance between the i-th meta-pattern $M_i$ of $S_x$ and the j-th meta-pattern $N_j$ of $S_y$. The value of $a_{ij}$ reflects the similarity of $M_i$ and $N_j$: the more similar they are, the closer $a_{ij}$ is to 0; the more they differ, the larger $a_{ij}$ is;
(23) analogous to the DTW distance between time series, the optimal warping path $W = \{w_1, w_2, \ldots, w_K\}$ is searched for in the distance matrix $DM = (a_{ij})_{p \times t}$, and the path with the minimum cumulative distance is found. This minimum is called the DTW distance between the pattern sequences $S_x$ and $S_y$, abbreviated $D_{dtw}(S_x, S_y)$:

$$D_{dtw}(S_x, S_y) = \min_{W} \sum_{k=1}^{K} w_k$$

where each $w_k$ is an element $a_{ij}$ of DM, i.e., the distance between two meta-patterns.
4. The method for detecting abnormal patterns of hydrologic data based on similarity measurement according to claim 1, characterized in that: in the k-nearest-neighbor local anomaly detection algorithm of step (3), the k-nearest-neighbor distances between patterns are computed from their anomaly score values, as follows:
for a positive integer k, the k-nearest-neighbor reachability distance from instance x to instance t is

$$r_{dist}(x,t) = \max\{k_{distance}(t),\ d(x,t)\}$$

where $k_{distance}(t)$ is the distance from t to its k-th nearest neighbor and $d(x,t)$ is the distance between x and t.
The local reachability density of instance x is defined as the reciprocal of the average reachability distance from x to its k nearest neighbors $N_k(x)$:

$$lrd(x) = \left( \frac{1}{|N_k(x)|} \sum_{t \in N_k(x)} r_{dist}(x,t) \right)^{-1}$$
In the data point set D, the local anomaly factor LOF(x) of instance x is defined as:

$$LOF(x) = \frac{1}{|N_k(x)|} \sum_{t \in N_k(x)} \frac{lrd(t)}{lrd(x)}$$
wherein the anomaly score of each sequence pattern is computed from the DTW-based distances; each instance x is the anomaly score of one sequence pattern, $r_{dist}(x,t)$ is the reachability distance between such scores, the local reachability density is computed with the formula above, and LOF denotes the local anomaly factor.
5. The method for detecting abnormal patterns of hydrologic data based on similarity measurement according to claim 3, characterized in that: to address the singularity problem of the DTW method, an improved method based on the spatial and temporal dimensions is used, in which the spatial dimension and the temporal dimension are combined by adaptive weighted summation, without introducing an additional weight coefficient, and the minimum cumulative path distance is then computed.
6. The method for detecting abnormal patterns of hydrologic data based on similarity measurement according to claim 5, characterized in that: the singularity problem of the DTW method is handled by using the SDTW algorithm as the time-series similarity measure, as follows:
Define the feature $F_s(x(i))$ as:
where max(|Δx|) is the maximum absolute gradient over all time points of the series, the gradient at a point being the difference between adjacent points, e.g., x(i) - x(i-1). Dividing by max(|Δx|) constrains the gradient at every point to [-1, 1], so that the temporal and spatial dimensions can be combined as a ratio. The distance d(i, j) between elements of the matrix is then:
$$d(i,j) = \left(F_s(x(i)) - F_s(y(j))\right)^2$$
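A minimal Python sketch of the SDTW distance, under loudly stated assumptions: the formula for $F_s$ is missing from this text, so `fs_feature` is a hypothetical stand-in that uses only the normalized gradient $(x(i) - x(i-1)) / \max(|\Delta x|)$ described above (the patent's $F_s$ may also carry a time-dimension term that cannot be recovered here), and the first point, which has no predecessor, is assigned gradient 0. The accumulation is plain DTW over $d(i,j)$ with steps (i-1, j), (i, j-1) and (i-1, j-1) and no warping window. `fs_feature` and `sdtw` are illustrative names, not taken from the patent.

```python
def fs_feature(x):
    """Hypothetical F_s: gradient at each point divided by max(|dx|),
    so every value lies in [-1, 1]. The first point gets gradient 0."""
    grads = [0.0] + [x[i] - x[i - 1] for i in range(1, len(x))]
    max_abs = max(abs(g) for g in grads) or 1.0  # flat series: avoid dividing by 0
    return [g / max_abs for g in grads]


def sdtw(x, y):
    """Minimum accumulated path distance with
    d(i, j) = (F_s(x(i)) - F_s(y(j)))^2 (F_s assumed as above)."""
    fx, fy = fs_feature(x), fs_feature(y)
    n, m = len(fx), len(fy)
    inf = float("inf")
    # D[i][j]: minimum accumulated distance aligning fx[:i] with fy[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (fx[i - 1] - fy[j - 1]) ** 2
            D[i][j] = d + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

Because each series is scaled by its own $\max(|\Delta x|)$, `sdtw([0, 1, 2, 3], [10, 20, 30, 40])` returns 0.0: the two ramps have identical shape once their gradients are normalized, even though their amplitudes differ tenfold.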
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910784182.9A CN110569890A (en) | 2019-08-23 | 2019-08-23 | Hydrological data abnormal mode detection method based on similarity measurement |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110569890A | 2019-12-13 |
Family ID: 68775898
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201910784182.9A (CN110569890A) | Withdrawn | 2019-08-23 | 2019-08-23 | Hydrological data abnormal mode detection method based on similarity measurement |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110569890A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112183721A (en) * | 2020-09-16 | 2021-01-05 | 河海大学 | Construction method of combined hydrological prediction model based on self-adaptive differential evolution |
CN112597539A (en) * | 2020-12-28 | 2021-04-02 | 上海观安信息技术股份有限公司 | Unsupervised learning-based time series anomaly detection method and system |
CN112966017A (en) * | 2021-03-01 | 2021-06-15 | 北京青萌数海科技有限公司 | Abnormal subsequence detection method with indefinite length in time sequence |
CN113190406A (en) * | 2021-04-30 | 2021-07-30 | 上海爱数信息技术股份有限公司 | IT entity group anomaly detection method under cloud native observability |
CN115729981A (en) * | 2022-11-29 | 2023-03-03 | 中国长江电力股份有限公司 | Similar water regime data mining method based on editing distance and application thereof |
CN116429220A (en) * | 2023-06-14 | 2023-07-14 | 济宁安泰矿山设备制造有限公司 | Hydraulic engineering anomaly detection method |
CN116703485A (en) * | 2023-08-04 | 2023-09-05 | 山东创亿智慧信息科技发展有限责任公司 | Advertisement accurate marketing method and system based on big data |
CN116957634A (en) * | 2023-09-19 | 2023-10-27 | 贵昌集团有限公司 | Information intelligent acquisition processing method for electronic commerce platform |
2019-08-23 | CN | CN201910784182.9A (CN110569890A) | Not active (withdrawn)
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112183721B (en) * | 2020-09-16 | 2022-04-26 | 河海大学 | Construction method of combined hydrological prediction model based on self-adaptive differential evolution |
CN112183721A (en) * | 2020-09-16 | 2021-01-05 | 河海大学 | Construction method of combined hydrological prediction model based on self-adaptive differential evolution |
CN112597539A (en) * | 2020-12-28 | 2021-04-02 | 上海观安信息技术股份有限公司 | Unsupervised learning-based time series anomaly detection method and system |
CN112966017B (en) * | 2021-03-01 | 2023-11-14 | 北京青萌数海科技有限公司 | Abnormal subsequence detection method for indefinite length in time sequence |
CN112966017A (en) * | 2021-03-01 | 2021-06-15 | 北京青萌数海科技有限公司 | Abnormal subsequence detection method with indefinite length in time sequence |
CN113190406A (en) * | 2021-04-30 | 2021-07-30 | 上海爱数信息技术股份有限公司 | IT entity group anomaly detection method under cloud native observability |
CN115729981A (en) * | 2022-11-29 | 2023-03-03 | 中国长江电力股份有限公司 | Similar water regime data mining method based on editing distance and application thereof |
CN115729981B (en) * | 2022-11-29 | 2024-02-13 | 中国长江电力股份有限公司 | Editing distance-based similar water condition data mining method and application thereof |
CN116429220A (en) * | 2023-06-14 | 2023-07-14 | 济宁安泰矿山设备制造有限公司 | Hydraulic engineering anomaly detection method |
CN116429220B (en) * | 2023-06-14 | 2023-09-26 | 济宁安泰矿山设备制造有限公司 | Hydraulic engineering anomaly detection method |
CN116703485B (en) * | 2023-08-04 | 2023-10-20 | 山东创亿智慧信息科技发展有限责任公司 | Advertisement accurate marketing method and system based on big data |
CN116703485A (en) * | 2023-08-04 | 2023-09-05 | 山东创亿智慧信息科技发展有限责任公司 | Advertisement accurate marketing method and system based on big data |
CN116957634A (en) * | 2023-09-19 | 2023-10-27 | 贵昌集团有限公司 | Information intelligent acquisition processing method for electronic commerce platform |
CN116957634B (en) * | 2023-09-19 | 2023-11-21 | 贵昌集团有限公司 | Information intelligent acquisition processing method for electronic commerce platform |
Similar Documents
Publication | Title |
---|---|
CN110569890A (en) | Hydrological data abnormal mode detection method based on similarity measurement | |
CN108038044B (en) | Anomaly detection method for continuous monitored object | |
CN110018670B (en) | Industrial process abnormal working condition prediction method based on dynamic association rule mining | |
CN111709465B (en) | Intelligent identification method for rough difference of dam safety monitoring data | |
CN112911627B (en) | Wireless network performance detection method, device and storage medium | |
CN108447057B (en) | SAR image change detection method based on significance and depth convolution network | |
CN110851931A (en) | Optimal arrangement method for flow monitoring points of urban water supply pipe network | |
CN109977546B (en) | Four-dimensional track online anomaly detection method based on unsupervised learning | |
CN114792044B (en) | Intelligent early warning method and system for subsidence of foundation pit adjacent to ground surface with coupling spatial characteristics | |
CN105512206A (en) | Outlier detection method based on clustering | |
CN106951680A (en) | A kind of Hydrological Time Series abnormal patterns detection method | |
CN107764458A (en) | A kind of aircraft handing characteristics curve generation method | |
CN118211082B (en) | Oil filter element residual life prediction method and system based on data analysis | |
CN111122162A (en) | Industrial system fault detection method based on Euclidean distance multi-scale fuzzy sample entropy | |
CN115130499A (en) | Vibration signal-based online measurement and prediction method for slip ratio of bearing retainer under variable working conditions | |
CN112380992A (en) | Method and device for evaluating and optimizing accuracy of monitoring data in machining process | |
CN114819260A (en) | Dynamic generation method of hydrologic time series prediction model | |
CN114387332B (en) | Pipeline thickness measuring method and device | |
CN117688498B (en) | Ship comprehensive safety state monitoring system based on ship-shore cooperation | |
CN112711052B (en) | GNSS coordinate sequence step detection improvement method and system based on continuous t test | |
CN117150244B (en) | Intelligent power distribution cabinet state monitoring method and system based on electrical parameter analysis | |
CN118013277A (en) | Multi-model combined runoff forecasting method with time-varying weight | |
CN110780342A (en) | Rock slope deformation early warning method | |
CN113705738A (en) | Engineering equipment bearing degradation assessment method | |
CN113553232B (en) | Technology for carrying out unsupervised anomaly detection on operation and maintenance data through online matrix image |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20191213 |