CN110569890A - Hydrological data abnormal mode detection method based on similarity measurement - Google Patents

Info

Publication number
CN110569890A
Authority
CN
China
Prior art keywords
mode
meta
sequence
similarity
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201910784182.9A
Other languages
Chinese (zh)
Inventor
万定生
张祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hohai University HHU
Original Assignee
Hohai University HHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hohai University HHU
Priority to CN201910784182.9A
Publication of CN110569890A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The invention discloses a method for detecting abnormal patterns in hydrological data based on similarity measurement. Using the key-point-based piecewise linear representation algorithm KPRA-PLR, the hydrological data are segmented according to the definition of key points, a straight line is fitted to each subsequence by the PLR algorithm, and each subsequence is represented by the slope $a_i$ of its line and its time interval $\Delta t_i$. Each subsequence after segmentation is called a meta-pattern, and adjacent meta-patterns are combined to obtain a sequence pattern. A weighted distance and the SDTW algorithm are used to measure the similarity of meta-patterns and of sequence patterns respectively, and an anomaly score is then calculated for each sequence pattern, namely the reciprocal of the average distance between that pattern and the other patterns. The anomaly score is taken as the k-nearest-neighbor distance of the sequence pattern $S_x$, and the local anomaly factor LOF is calculated according to the k-nearest-neighbor local detection principle. Abnormal patterns detected with this similarity measurement method are more accurate, and the method provides a new technique for detecting abnormal patterns in hydrological data.

Description

Hydrological data abnormal mode detection method based on similarity measurement
Technical Field
The invention belongs to the field of hydrological detection technology, and in particular relates to a hydrological data abnormal pattern detection method based on similarity measurement.
Background
A hydrological time series is a series of observed values of physical quantities (water level, flow, rainfall, etc.) acquired by an observation system in time order. It is a common, complex data type that objectively records the information obtained by the observation system. In China, with the development of water conservancy informatization, hydrological time series are transmitted to a water conservancy information sharing platform, processed by staff and stored in the national hydrological database. Owing to measurement errors of the acquisition equipment, manual operation errors, changes in the evolution of the hydrological regime itself and other factors, hydrological time series contain some abnormal data.
Anomaly detection mainly aims at mining unusual data hidden in a large data set, that is, potentially meaningful information that differs markedly from the other data. Methods and techniques for detecting abnormal data in hydrological time series are still at an exploratory stage and the industry has no mature anomaly detection scheme; conventional methods for detecting abnormal points in hydrological time series include the box-plot method and Benford's law.
Disclosure of Invention
Purpose of the invention: to address the low precision and low speed of anomaly detection in the prior art, the invention provides a method for detecting abnormal patterns in hydrological data based on similarity measurement, which provides a theoretical basis for detecting such patterns.
Technical scheme: a hydrological data abnormal pattern detection method based on similarity measurement. The method is based on the key-point-based piecewise linear representation algorithm KPRA-PLR, a time series pattern similarity measurement method and a k-nearest-neighbor local anomaly detection algorithm, and comprises the following steps:
(1) Key-point-based piecewise linear representation algorithm KPRA-PLR: the original hydrological time series is first segmented according to the definition of key points to obtain the subsequences; a straight line is then fitted to each subsequence by the PLR algorithm to obtain its slope, and the intersection points between adjacent straight lines are solved to obtain the start time $t_{il}$ and end time $t_{ir}$ of each subsequence. Each subsequence is represented by $(a_i, t_{il}, t_{ir})$, which is called a meta-pattern;
(2) Time series pattern similarity measurement method: adjacent meta-patterns are combined into a sequence pattern. To measure the similarity of two patterns, a similarity distance function between them is established and the distance is used to represent their similarity; the average similarity of each pattern represents the anomaly score of that pattern;
(3) k-nearest-neighbor local anomaly detection algorithm: the anomaly score is used as the k-nearest-neighbor distance of the pattern to obtain the k-nearest-neighbor local reachable distance, the local reachable density and the local anomaly factor; the pattern with the largest local anomaly factor is the abnormal pattern.
Further, the step (1) comprises the following processes:
(11) Definition of key points: let $S = \{s_1, s_2, \ldots, s_m\}$ be the set of all local extreme points of the time series X. If an extreme point $s_k$ satisfies $|s_k - s_{k-1}| < \varepsilon$ and $|t_k - t_{k-1}| < \delta$, where $\varepsilon$ and $\delta$ are given constants and $t_k$ is the observation time of the hydrological quantity $s_k$, then $s_k$ is called a non-important local extreme point; the other extreme points in the set S are the key points;
(12) Then, according to the piecewise linear representation algorithm, a straight line is fitted by the least squares method to the subsequence between two adjacent key points to obtain the corresponding slope, and the intersection points between adjacent straight lines are solved to obtain the start time $t_{il}$ and end time $t_{ir}$ of each subsequence. The pattern of each subsequence is written $(a_i, t_{il}, t_{ir})$ and is called a meta-pattern;
(13) The parameters are adjusted, and the values of the constants $\varepsilon$ and $\delta$ suited to the current data are determined from how closely the shape of the fitted lines matches the original curve.
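Steps (11) and (12) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the series endpoints are always kept as segment boundaries (an assumption), and each meta-pattern is reduced to (slope, duration) using the key point times instead of the line intersection times $t_{il}$ and $t_{ir}$.

```python
import numpy as np

def local_extrema(values):
    """Indices of local maxima/minima; the two endpoints are included (assumption)."""
    idx = [0]
    for k in range(1, len(values) - 1):
        left = values[k] - values[k - 1]
        right = values[k + 1] - values[k]
        if left * right < 0:  # the first difference changes sign at k
            idx.append(k)
    idx.append(len(values) - 1)
    return idx

def key_points(times, values, eps, delta):
    """Drop extrema close to the previous extremum in value (< eps) and time (< delta)."""
    ext = local_extrema(values)
    keep = [ext[0]]
    for prev, cur in zip(ext, ext[1:]):
        non_important = (abs(values[cur] - values[prev]) < eps
                         and abs(times[cur] - times[prev]) < delta)
        if not non_important:
            keep.append(cur)
    if keep[-1] != ext[-1]:  # always keep the last point so segments cover the series
        keep.append(ext[-1])
    return keep

def meta_patterns(times, values, kp):
    """Least-squares slope and duration of each segment between adjacent key points."""
    patterns = []
    for lo, hi in zip(kp, kp[1:]):
        t = np.asarray(times[lo:hi + 1], dtype=float)
        v = np.asarray(values[lo:hi + 1], dtype=float)
        slope = np.polyfit(t, v, 1)[0]  # degree-1 fit, leading coefficient is the slope
        patterns.append((float(slope), float(t[-1] - t[0])))
    return patterns

# Toy series: the small wiggle around t = 3..4 is filtered out as non-important.
times = list(range(9))
values = [1, 3, 5, 4.9, 5.1, 2, 0, 2, 4]
kp = key_points(times, values, eps=0.5, delta=3)
print(kp, meta_patterns(times, values, kp))
```

In practice `eps` and `delta` would be tuned as described in step (13), by comparing the fitted lines with the original curve.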
In the time series pattern similarity measurement method of step (2), similarity distance functions are established separately for meta-patterns and for sequence patterns, as follows:
(21) The similarity between two meta-patterns is measured by a weighted distance. The PLR algorithm gives the pattern sequence of the time series X, $S_x = \{M_1, M_2, M_3, \ldots, M_n\}$. For any two meta-patterns $M_i = (a_i, \Delta t_i)$ and $M_j = (a_j, \Delta t_j)$ in $S_x$ $(i, j = 1, 2, 3, \ldots, n)$, the weighted distance is written:
$D_{wei}(M_i, M_j) = \beta \cdot |a_i - a_j| + (1 - \beta) \cdot |\Delta t_i - \Delta t_j|$
where $a_i$ is the slope of the fitted line, $\Delta t_i$ is the time interval of the segment, $M_i$ is a meta-pattern, $D_{wei}(M_i, M_j)$ is the weighted distance between the meta-patterns $M_i$ and $M_j$, and $0 \le \beta \le 1$;
(22) Every two adjacent meta-patterns are combined to obtain a sequence pattern. The similarity of sequence patterns is measured with a dynamic time warping distance built on the meta-pattern similarity;
Let the time series $X = (x_1, x_2, x_3, \ldots, x_n)$ and $Y = (y_1, y_2, \ldots, y_m)$ be reduced in dimension and linearly represented by the KPRA-PLR algorithm, giving the pattern sequences:
$S_x = \{M_1, M_2, M_3, \ldots, M_p\}$,
$S_y = \{N_1, N_2, N_3, \ldots, N_t\}$,
where $i = 1, 2, 3, \ldots, p$ and $j = 1, 2, 3, \ldots, t$, $M_i = (a_i, \Delta t_i)$ is a meta-pattern of $S_x$,
and $N_j = (a_j, \Delta t_j)$ is a meta-pattern of $S_y$. The distance matrix between the two pattern sequences is:
$DM = (a_{ij})_{p \times t}$
where $a_{ij} = D_{wei}(M_i, N_j)$ is the weighted distance between the i-th meta-pattern $M_i$ of $S_x$ and the j-th meta-pattern $N_j$ of $S_y$. The value of $a_{ij}$ represents the degree of similarity between $M_i$ and $N_j$: the more similar they are, the closer $a_{ij}$ is to 0; the more they differ, the larger $a_{ij}$ is;
(23) As with the dynamic time warping distance between time series, the best warping path $W = \{w_1, w_2, \ldots, w_k\}$ is searched for in the distance matrix $DM = (a_{ij})_{p \times t}$, that is, the warping path with the minimum cumulative distance. This minimum is called the dynamic time warping distance of the pattern sequences $S_x$ and $S_y$, abbreviated $D_{dtw}(S_x, S_y)$:
$D_{dtw}(S_x, S_y) = \min_{W} \sum_{i=1}^{k} w_i$
where each $w_i$ is the distance between the two meta-patterns matched at that step of the path.
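As a rough sketch of steps (21) to (23), the weighted distance and the warping search over the matrix DM might look like this. The function names, the default `beta=0.5` and the three-neighbor step pattern (down, right, diagonal) are assumptions made for illustration; the text does not state the step pattern or whether the cumulative distance is normalized by path length.

```python
def weighted_distance(m, n, beta=0.5):
    """D_wei between meta-patterns m = (slope, duration) and n = (slope, duration)."""
    return beta * abs(m[0] - n[0]) + (1.0 - beta) * abs(m[1] - n[1])

def pattern_dtw(sx, sy, beta=0.5):
    """Minimum cumulative weighted distance over warping paths through DM."""
    p, t = len(sx), len(sy)
    inf = float("inf")
    # acc[i][j] holds the best cumulative cost of aligning sx[:i] with sy[:j].
    acc = [[inf] * (t + 1) for _ in range(p + 1)]
    acc[0][0] = 0.0
    for i in range(1, p + 1):
        for j in range(1, t + 1):
            cost = weighted_distance(sx[i - 1], sy[j - 1], beta)  # entry a_ij of DM
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[p][t]

# Two short pattern sequences built from values that appear in Table 2.
sx = [(33.0, 3), (-22.5, 4)]
sy = [(25.33, 3), (-12.0, 1)]
print(pattern_dtw(sx, sy))
```

Because each sequence pattern in the embodiment contains only two meta-patterns, the accumulation table stays tiny; the same routine also handles pattern sequences of unequal length.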
In the k-nearest-neighbor local anomaly detection algorithm of step (3), the anomaly score is taken as the k-nearest-neighbor distance of the pattern, and the calculation proceeds as follows:
Let k be a positive integer. The k-nearest-neighbor reachable distance between instance x and instance t is:
$r_{dist}(x, t) = \max\{k_{distance}(t),\, d(x, t)\}$
The local reachable density of instance x is defined as the inverse of the average reachable distance from x to its k nearest neighbors $N_k(x)$:
$lrd_k(x) = 1 \Big/ \left( \frac{1}{|N_k(x)|} \sum_{t \in N_k(x)} r_{dist}(x, t) \right)$
In the data point set D, the local anomaly factor LOF(x) of instance x can be defined as:
$LOF_k(x) = \frac{1}{|N_k(x)|} \sum_{t \in N_k(x)} \frac{lrd_k(t)}{lrd_k(x)}$
Here the anomaly score of each sequence pattern is calculated according to the DTW algorithm, $r_{dist}(x, t)$ is computed from the anomaly scores of the sequence patterns, instance x represents the anomaly score of one sequence pattern, the local reachable density is calculated by the formula above, and LOF denotes the local anomaly factor.
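The three quantities above can be computed from a pairwise distance matrix along the following lines. This is a hedged sketch of the standard local outlier factor calculation, not code from the patent: the function name is hypothetical, exactly k neighbors are used even when distances tie, and distinct instances are assumed so that no reachable distance averages to zero.

```python
import numpy as np

def lof_scores(dist, k):
    """Local anomaly factor of each instance, given a symmetric distance matrix."""
    d = np.asarray(dist, dtype=float).copy()
    n = d.shape[0]
    np.fill_diagonal(d, np.inf)                 # an instance is not its own neighbor
    order = np.argsort(d, axis=1)
    knn = order[:, :k]                          # indices of the k nearest neighbors
    rows = np.arange(n)
    k_distance = d[rows, order[:, k - 1]]       # distance to the k-th nearest neighbor
    # r_dist(x, t) = max(k_distance(t), d(x, t)) for every neighbor t of x
    reach = np.maximum(k_distance[knn], d[rows[:, None], knn])
    lrd = 1.0 / reach.mean(axis=1)              # local reachable density
    return lrd[knn].mean(axis=1) / lrd          # mean neighbor density over own density

# Five one-dimensional scores; the last one sits far away from the rest.
scores = np.array([0.0, 1.0, 2.0, 3.0, 10.0])
dist = np.abs(scores[:, None] - scores[None, :])
print(lof_scores(dist, k=2))
```

Values near 1 indicate an instance whose neighborhood is about as dense as its neighbors' neighborhoods; values well above 1 indicate a locally sparse, and therefore suspicious, instance.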
Furthermore, to address the singular point problem of the DTW method, an improved method based on the space dimension and the time dimension is used: the space dimension (numerical value) and the time dimension (gradient) are weighted and summed with self-adaptive weights, without introducing an additional weighting coefficient, and the minimum distance of the cumulative path is then calculated.
Furthermore, to handle the singular point problem of the DTW method, the SDTW algorithm is used as the time series similarity measure. The specific process is as follows:
A feature $F_s(x(i))$ is defined for each point of the sequence.
In its definition, $\max(|\Delta x|)$ denotes the maximum gradient over all time points in the time series, and the gradient at a point is the difference between adjacent points, for example $x(i) - x(i-1)$. Dividing by $\max(|\Delta x|)$ constrains the gradient at each point to $[-1, 1]$, so that the time dimension and the space dimension are combined as a ratio. The distance $d(i, j)$ between elements in the matrix is then:
$d(i, j) = (F_s(x(i)) - F_s(y(j)))^2$
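The two pieces of SDTW that the text states explicitly, the gradient scaled into $[-1, 1]$ and the squared feature difference $d(i, j)$, can be sketched as below. The exact formula for $F_s$ is not reproduced in the text, so this sketch deliberately does not guess it: `element_distances` takes precomputed feature values supplied by the caller. The function names and the choice of a zero gradient at the first point are assumptions.

```python
import numpy as np

def normalised_gradient(x):
    """Backward difference x(i) - x(i-1), divided by max|dx| so it lies in [-1, 1]."""
    x = np.asarray(x, dtype=float)
    grad = np.diff(x, prepend=x[0])   # gradient at the first point is set to 0 (assumption)
    peak = np.max(np.abs(grad))
    return grad / peak if peak > 0 else grad

def element_distances(fx, fy):
    """Matrix of d(i, j) = (F_s(x(i)) - F_s(y(j)))**2 for precomputed feature values."""
    fx = np.asarray(fx, dtype=float)
    fy = np.asarray(fy, dtype=float)
    return (fx[:, None] - fy[None, :]) ** 2

print(normalised_gradient([1, 3, 2, 6]))
print(element_distances([1, 2], [1, 4]))
```

The resulting matrix plays the same role as DM in step (23): a warping path with minimum cumulative distance is searched for over it.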
Beneficial effects: the invention discloses a hydrological data abnormal pattern detection scheme in which the KPRA-PLR algorithm segments the original sequence, a weighted distance measures the similarity of meta-patterns, an improved DTW algorithm measures the similarity of sequence patterns, and a k-nearest-neighbor local anomaly detection algorithm detects abnormal patterns on the basis of the similarity measurement. Compared with the commonly used symbolic SAX method for abnormal pattern detection, it requires neither fixed-length segmentation nor a fixed number of pattern categories.
Drawings
FIG. 1 is a schematic view of the overall structure of the method of the present invention;
FIG. 2 is a schematic diagram of the lines fitted by the KPRA-PLR algorithm in the present invention;
FIG. 3 is a schematic diagram of the flow variation of pattern 185-195 in the embodiment;
FIG. 4 is a schematic diagram of the flow variation of pattern 282-287 in the embodiment;
FIG. 5 is a schematic diagram of the flow variation of pattern 290-302 in the embodiment;
FIG. 6 is a schematic diagram of the flow variation of pattern 363-370 in the embodiment.
Detailed Description
For the purpose of illustrating the technical solutions disclosed in the present invention in detail, the following description is further provided with reference to the accompanying drawings and specific embodiments.
The invention relates to a method for detecting abnormal patterns in hydrological data based on similarity measurement. It comprises three modules, namely the key-point-based piecewise linear representation algorithm KPRA-PLR, a time series pattern similarity measurement method and the k-nearest-neighbor principle, as shown in FIG. 1. In the embodiment, hourly flood-season flow data (unit: m³/s) of the gantry hydrological station are selected, and the experiments use the measured flood-season hourly flow data of the station from 2000/6/1 6:00:00 to 2015/9/30 17:39:00.
First, in the key-point-based piecewise linear representation algorithm KPRA-PLR, the same thresholds ε and δ are used to reduce the dimensionality of both kinds of hydrological data according to the key point definition, giving the key points of the historical hydrological time series (flood-season data from 2000 to 2015) and of the real-time hydrological time series (flood-season data from 2016 to 2017). According to the statistics, 189 key points are obtained from the real-time hydrological data and 2844 from the historical hydrological data. Key points are written in the format [t, i], where t is the index of the point in the time series and i is the observed value at that time. Some of the key points obtained from the real-time hydrological time series are listed in Table 1 below.
TABLE 1 Key points demonstration
[1,159] [4,258] [8,168] [11,244] [12,232] [14,220]
[29,188] [32,184] [35,216] [38,183] [43,256] [46,214]
[50,205] [52,276] [61,173] [64,406] [65,344] [67,537]
[73,330] [75,421] [89,142] [96,164] [105,141] [111,160]
[120,387] [121,511] [122,475] [128,615] [131,960] [142,273]
[148,768] [149,747] [150,780] [151,705] [155,556] [165,147]
[174,396] [177,651] [180,476] [181,554] [184,331] [190,234]
[194,1160] [195,1300] [196,1150] [197,125] [198,122] [89,142]
[605,296] [606,266] [610,185] [612,181] [615,451] [618,360]
[621,580] [624,308] [627,525] [631,388] [635,808] [639,490]
[640,514] [643,332] [645,377] [649,232] [651,226] [653,215]
[654,266] [657,184] [658,201] [664,480] [666,301] [668,426]
[670,301] [672,499] [675,306] [677,322] [678,310] [680,322]
The original time series is segmented at the key points to obtain subsequences, and each subsequence is represented by the PLR algorithm as a single meta-pattern. By definition, adjacent meta-patterns are combined into sequence patterns, and different ways of combining give different sequence patterns; in this embodiment each sequence pattern is formed from two adjacent meta-patterns. A meta-pattern is written $M_i = (a_i, \Delta t_i)$, where $a_i$ is the slope of the straight line and represents the trend of the curve, and $\Delta t_i$ is the length of the straight line and represents the length of the curve.
According to the statistics, the real-time hydrological sequence yields 188 meta-patterns, which form 94 sequence patterns of two adjacent meta-patterns each; the historical hydrological sequence yields 2844 meta-patterns, which form 1422 sequence patterns. Part of the sequence patterns obtained from the real-time data are shown in Table 2 below.
TABLE 2 partial sequence mode Table
[33.0,3] [-22.5,4] [25.33,3] [-12.0,1] [-6.0,2] [-1.2,5]
[-2.6,10] [-1.32,3] [10.67,3] [-11.0,3] [14.6,5] [-14.0,3]
[15.0,2] [-19.5,2] [35.5,2] [-11.43,9] [77.67,3] [-62.0,1]
[96.5,2] [-43.25,4] [-17.0,2] [45.5,2] [-19.9,14] [3.14,7]
[-2.54,9] [3.17,6] [42.25,8] [-111.0,1] [124.0,1] [-36.0,1]
[23.33,6] [115.0,3] [-62.44,11] [82.5,6] [-21.0,1] [33.0,1]
[-75.0,1] [-37.25,4] [91.4,10] [-119.3,9] [85.0,3] [-58.3,3]
[78.0,1] [-74.32,3] [334.83,6] [-295.0,4] [140.0,1] [-150,1]
[100.0,1] [-30.0,1] [13.33,3] [75.0,2] [-85.5,6] [32.0,1]
[-15.0,4] [20.13,8] [-18.0,6] [18.5,4] [-4.0,4] [-23.3,3]
[15.0,4] [24.67,3] [-19.0,6] [13.33,3] [-13.0,1] [27.3,3]
[-30.0,1] [-20.25,4] [-2.0,2] [90.0,3] [-30.32,3] [73.3,3]
[-90.6,3] [72.33,3] [-34.25,4] [105.0,4] [-79.5,4] [24.0,1]
[-60.66,3] [22.5,2] [-36.25,4] [-3.0,2] [-5.5,2] [51.0,1]
[99.0,2] [8.0,2] [-12.0,1] [6.0,2]
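To make the link between Table 1 and Table 2 concrete, the sketch below derives meta-patterns from the first row of key points in Table 1 and then pairs them into sequence patterns. It uses the slope between consecutive key points as a simplification of the least-squares fit, which appears to reproduce the first five entries of Table 2. The function names are hypothetical, and the non-overlapping pairing is an assumption inferred from the stated counts (188 meta-patterns giving 94 sequence patterns).

```python
# First row of Table 1: [t, i] = [time index, observed flow].
key_points = [(1, 159), (4, 258), (8, 168), (11, 244), (12, 232), (14, 220)]

def to_meta_patterns(points):
    """(slope, duration) of the segment between each pair of consecutive key points."""
    out = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        dt = t1 - t0
        out.append((round((v1 - v0) / dt, 2), dt))
    return out

def to_sequence_patterns(meta):
    """Group non-overlapping adjacent pairs (M1, M2), (M3, M4), ...; a trailing odd one is dropped."""
    return [tuple(meta[i:i + 2]) for i in range(0, len(meta) - 1, 2)]

meta = to_meta_patterns(key_points)
print(meta)
print(to_sequence_patterns(meta))
```

With these six key points the five meta-patterns come out as (33.0, 3), (-22.5, 4), (25.33, 3), (-12.0, 1) and (-6.0, 2), matching the start of Table 2.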
In the time series pattern similarity measurement method, the experiment in this embodiment mainly mines abnormal patterns in the real-time hydrological data. The real-time data are therefore segmented by the KPRA-PLR algorithm and represented as meta-patterns, and every two adjacent meta-patterns form a sequence pattern, giving 94 sequence patterns in total; the historical hydrological data are processed in the same way, giving 1422 sequence patterns in total. The distance between each real-time sequence pattern and each of the 1422 sequence patterns obtained from the historical data is then calculated pairwise; part of the similarity measurement results are shown in Table 3 below.
Table 3 Similarity measurement results
Serial number   1        2        3        …   1420      1421     1422
1               9.4      11.91    11.5     …   69.83     3.83     52.15
2               11.91    12.68    16.41    …   75.74     12.08    71.76
3               11.5     16.41    6.35     …   80.33     9.67     25.35
4               25.03    28.12    17.13    …   93.86     23.2     24.88
5               8.83     8.33     8.08     …   77.66     7.0      6.43
6               50.665   56.575   60.165   …   20.165    53.495   54.815
7               18.67    24.58    29.17    …   53.16     19.5     22.82
8               27.74    30.83    21.24    …   96.57     24.91    27.59
9               26.685   23.985   18.395   …   95.725    25.065   21.745
10              33.895   31.985   26.395   …   102.725   32.065   28.745
11              33.58    35.67    24.08    …   102.41    31.75    32.43
12              12.83    23.74    11.33    …   73.0      13.66    12.98
…
85              66.58    69.67    59.08    …   135.41    62.75    66.43
86              55.41    58.5     48.91    …   124.24    52.58    55.26
87              30.955   34.045   23.455   …   99.785    28.125   30.805
88              42.08    45.17    36.58    …   110.91    38.25    41.93
89              36.49    39.58    29.99    …   105.32    32.66    37.34
90              58.67    64.58    65.17    …   41.34     60.5     68.82
91              51.17    57.08    61.67    …   19.66     53.0     56.32
92              69.83    75.74    80.33    …   54.9      72.66    74.98
93              3.83     12.08    9.67     …   72.66     56.4     5.68
94              5.15     10.76    11.35    …   73.98     10.68    9.4
In the k-nearest-neighbor principle, the SDTW algorithm is used to calculate the similarity between the sequence patterns obtained from the real-time hydrological data and those obtained from the historical hydrological data; the similarity measurement results are shown in Table 3 above. The anomaly score of each sequence pattern in the real-time data is then calculated as the reciprocal of its average distance to the 1422 sequence patterns obtained from the historical data. Once the anomaly scores are available, a similarity-based point anomaly detection algorithm can be used to detect abnormal patterns. Two methods are available: one based on the k-nearest-neighbor principle and one based on clustering.
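The anomaly score described here reduces to a one-line calculation per real-time sequence pattern. The sketch below is only an illustration with toy distances: the function name is hypothetical, and returning infinity for a zero mean distance is an assumption added to avoid division by zero.

```python
def anomaly_score(distances):
    """Reciprocal of the mean distance from one pattern to the historical patterns."""
    mean = sum(distances) / len(distances)
    return 1.0 / mean if mean > 0 else float("inf")

# One row of a distance table such as Table 3 would be passed in full (1422 values);
# three toy values are used here.
print(anomaly_score([2.0, 4.0, 6.0]))
```

Applying this function to each of the 94 rows gives the 94 scores that are then passed to the k-nearest-neighbor step.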
In this embodiment the local anomaly factors are calculated with the k-nearest-neighbor method, and the sequence patterns with the Top-k largest local anomaly factors are selected as abnormal patterns. The key parameter of the k-nearest-neighbor principle is the number of nearest neighbors k; the corresponding local anomaly factors are calculated while k is adjusted, and the Top-k largest local anomaly factors are shown in Table 4 below.
TABLE 4 Top-k local anomaly factor Table
Analysis of the experimental results: when the local anomaly factors are sorted from largest to smallest, pattern 185-195 ranks first and pattern 282-287 second. FIGS. 3 and 4 show that the hourly flow at the station surges within 2 hours; the slope $a_i$ of the sequence pattern is large, reflecting a rapid rise followed by a decline. Sequence patterns with a similar trend and duration do exist in the flow data from 2000 to 2015, but they occur only a few times. For the two sequence patterns shown in FIGS. 3 and 4, the flow rises over only about 2 hours, which is unusually short; in addition, the rise-and-fall time of pattern 185-195 is unusually long and that of pattern 282-287 is unusually short. Patterns 185-195 and 282-287 are therefore judged to be abnormal patterns.
When the local anomaly factors are sorted from largest to smallest, pattern 290-302 ranks third and pattern 363-370 fourth. FIGS. 5 and 6 show that the flow rises slowly for a short time and then falls for a long time; the slope $a_i$ of the sequence pattern is negative, reflecting the extent of the decline, and the overall duration of the sequence pattern is unusually long. According to the key point definition, 1520 is not a key point, and a fall-then-rise trend appears: the flow surges in the first hour and then drops rapidly from 2640 to 755. In the flow data from 2000 to 2015, very few patterns have a downward trend and duration similar to those of pattern 290-302. In the sequence pattern of FIG. 6, the flow rises slowly for 1 hour and then declines for a long time; this trend rarely occurs and the decline lasts unusually long, so the pattern is unusual. Patterns 290-302 and 363-370 are therefore judged to be abnormal patterns.

Claims (6)

1. A hydrological data abnormal pattern detection method based on similarity measurement, characterized in that the method is based on the key-point-based piecewise linear representation algorithm KPRA-PLR, a time series pattern similarity measurement method and a k-nearest-neighbor local anomaly detection algorithm, and comprises the following steps:
(1) Key-point-based piecewise linear representation algorithm KPRA-PLR: the original hydrological time series is first segmented according to the definition of key points to obtain the subsequences; a straight line is then fitted to each subsequence by the PLR algorithm to obtain its slope, and the intersection points between adjacent straight lines are solved to obtain the start time $t_{il}$ and end time $t_{ir}$ of each subsequence. Each subsequence is represented by $(a_i, t_{il}, t_{ir})$, which is called a meta-pattern;
(2) Time series pattern similarity measurement method: adjacent meta-patterns are combined into a sequence pattern. To measure the similarity of two patterns, a similarity distance function between them is established and the distance is used to represent their similarity; the average similarity of each pattern represents the anomaly score of that pattern;
(3) k-nearest-neighbor local anomaly detection algorithm: the anomaly score is used as the k-nearest-neighbor distance of the pattern to obtain the k-nearest-neighbor local reachable distance, the local reachable density and the local anomaly factor; the pattern with the largest local anomaly factor is the abnormal pattern.
2. The method for detecting abnormal patterns in hydrological data based on similarity measurement according to claim 1, characterized in that step (1) comprises the following process:
(11) Definition of key points: let $S = \{s_1, s_2, \ldots, s_m\}$ be the set of all local extreme points of the time series X. If an extreme point $s_k$ satisfies $|s_k - s_{k-1}| < \varepsilon$ and $|t_k - t_{k-1}| < \delta$, where $\varepsilon$ and $\delta$ are given constants and $t_k$ is the observation time of the hydrological quantity $s_k$, then $s_k$ is called a non-important local extreme point; the other extreme points in the set S are the key points;
(12) According to the piecewise linear representation algorithm, a straight line is fitted by the least squares method to the subsequence between two adjacent key points to obtain the corresponding slope, and the intersection points between adjacent straight lines are solved to obtain the start time $t_{il}$ and end time $t_{ir}$ of each subsequence. The pattern of each subsequence is written $(a_i, t_{il}, t_{ir})$ and is called a meta-pattern;
(13) The parameters are adjusted, and the values of the constants $\varepsilon$ and $\delta$ suited to the current data are determined from how closely the shape of the fitted lines matches the original curve.
3. The method for detecting abnormal patterns in hydrological data based on similarity measurement according to claim 1, characterized in that, in the time series pattern similarity measurement method of step (2), similarity distance functions are established separately for meta-patterns and for sequence patterns, as follows:
(21) The similarity between two meta-patterns is measured by a weighted distance. The PLR algorithm gives the pattern sequence of the time series X, $S_x = \{M_1, M_2, M_3, \ldots, M_n\}$. For any two meta-patterns $M_i = (a_i, \Delta t_i)$ and $M_j = (a_j, \Delta t_j)$ in $S_x$ $(i, j = 1, 2, 3, \ldots, n)$, the weighted distance is written:
$D_{wei}(M_i, M_j) = \beta \cdot |a_i - a_j| + (1 - \beta) \cdot |\Delta t_i - \Delta t_j|$
where $a_i$ is the slope of the fitted line, $\Delta t_i$ is the time interval of the segment, $M_i$ is a meta-pattern, $D_{wei}(M_i, M_j)$ is the weighted distance between the meta-patterns $M_i$ and $M_j$, and $0 \le \beta \le 1$;
(22) Every two adjacent meta-patterns are combined to obtain a sequence pattern. The similarity of sequence patterns is measured with a dynamic time warping distance built on the meta-pattern similarity;
Let the time series $X = (x_1, x_2, x_3, \ldots, x_n)$ and $Y = (y_1, y_2, \ldots, y_m)$ be reduced in dimension and linearly represented by the KPRA-PLR algorithm, giving the pattern sequences:
$S_x = \{M_1, M_2, M_3, \ldots, M_p\}$,
$S_y = \{N_1, N_2, N_3, \ldots, N_t\}$,
where $i = 1, 2, 3, \ldots, p$ and $j = 1, 2, 3, \ldots, t$, $M_i = (a_i, \Delta t_i)$ is a meta-pattern of $S_x$ and $N_j = (a_j, \Delta t_j)$ is a meta-pattern of $S_y$. The distance matrix between the two pattern sequences is:
$DM = (a_{ij})_{p \times t}$
where $a_{ij} = D_{wei}(M_i, N_j)$ is the weighted distance between the i-th meta-pattern $M_i$ of $S_x$ and the j-th meta-pattern $N_j$ of $S_y$. The value of $a_{ij}$ represents the degree of similarity between $M_i$ and $N_j$: the more similar they are, the closer $a_{ij}$ is to 0; the more they differ, the larger $a_{ij}$ is;
(23) As with the dynamic time warping distance between time series, the best warping path $W = \{w_1, w_2, \ldots, w_k\}$ is searched for in the distance matrix $DM = (a_{ij})_{p \times t}$, that is, the warping path with the minimum cumulative distance. This minimum is called the dynamic time warping distance of the pattern sequences $S_x$ and $S_y$, abbreviated $D_{dtw}(S_x, S_y)$:
$D_{dtw}(S_x, S_y) = \min_{W} \sum_{i=1}^{k} w_i$
where each $w_i$ is the distance between the two meta-patterns matched at that step of the path.
4. The method for detecting abnormal patterns in hydrological data based on similarity measurement according to claim 1, characterized in that, in the k-nearest-neighbor local anomaly detection algorithm of step (3), the anomaly score is taken as the k-nearest-neighbor distance of the pattern, and the specific process is as follows:
Let k be a positive integer. The k-nearest-neighbor reachable distance between instance x and instance t is:
$r_{dist}(x, t) = \max\{k_{distance}(t),\, d(x, t)\}$
The local reachable density of instance x is defined as the inverse of the average reachable distance from x to its k nearest neighbors $N_k(x)$:
$lrd_k(x) = 1 \Big/ \left( \frac{1}{|N_k(x)|} \sum_{t \in N_k(x)} r_{dist}(x, t) \right)$
In the data point set D, the local anomaly factor LOF(x) of instance x can be defined as:
$LOF_k(x) = \frac{1}{|N_k(x)|} \sum_{t \in N_k(x)} \frac{lrd_k(t)}{lrd_k(x)}$
Here the anomaly score of each sequence pattern is calculated according to the DTW algorithm, $r_{dist}(x, t)$ is computed from the anomaly scores of the sequence patterns, instance x represents the anomaly score of one sequence pattern, the local reachable density is calculated by the formula above, and LOF denotes the local anomaly factor.
5. the method for detecting abnormal patterns of hydrologic data based on similarity measurement according to claim 3, characterized in that: for the singular point problem existing in the DTW method, based on the improved method of the space dimension and the time dimension, the extra weight coefficient is not introduced to carry out weighted summation on the space dimension and the time dimension in a self-adaptive weight mode, and then the minimum distance of the accumulation path is calculated.
6. the method for detecting abnormal patterns of hydrologic data based on similarity measurement according to claim 5, characterized in that: the singular point problem in the DTW method comprises data processing based on an SDTW algorithm as a time series similarity measurement method, and the specific process is as follows:
Defining feature Fs(x (i)) is:
Wherein max (| Δ x |) represents the maximum gradient at all time points in the time sequence, and the gradient at a certain point is represented by the difference between adjacent points, such as x (i) -x (i-1); max (Deltax) has the function of constraining the gradient size of each point in the time sequence to [ -1,1], so that the time dimension and the space dimension are combined in a ratio mode to obtain an expression of the distance d (i, j) between elements in the matrix, wherein the expression is as follows:
$$d(i,j)=\left(F_s(x(i))-F_s(y(j))\right)^2$$
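As a rough sketch of how the element distance d(i, j) feeds the minimum accumulated path distance, the code below runs a standard DTW recursion over two feature sequences. The defining formula for F_s is not reproduced in this text, so the sketch deliberately takes already-computed feature values as input. The helper `normalized_gradient` only illustrates the max(|Δx|) normalization described above and is not F_s itself. Both function names are hypothetical.

```python
import numpy as np

def normalized_gradient(x):
    """Gradient x(i) - x(i-1) divided by max(|Δx|), so values lie in [-1, 1].

    The first point has no predecessor and is given gradient 0 (an assumption).
    """
    x = np.asarray(x, dtype=float)
    grad = np.diff(x, prepend=x[0])
    scale = np.max(np.abs(grad))
    return grad / scale if scale > 0 else grad

def dtw_min_cumulative_distance(fx, fy):
    """Minimum accumulated path distance between two feature sequences.

    fx, fy: 1-D sequences of feature values F_s(x(i)) and F_s(y(j)),
            assumed to be computed by the caller.
    """
    fx = np.asarray(fx, dtype=float)
    fy = np.asarray(fy, dtype=float)
    n, m = len(fx), len(fy)

    # Element distance matrix: d(i, j) = (F_s(x(i)) - F_s(y(j)))^2.
    d = (fx[:, None] - fy[None, :]) ** 2

    # Accumulated cost with a padded border so the recursion needs no special cases.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = d[i - 1, j - 1] + min(
                acc[i - 1, j],      # step in x only
                acc[i, j - 1],      # step in y only
                acc[i - 1, j - 1],  # step in both
            )
    return float(acc[n, m])
```

Passing the raw series as `fx` and `fy` reduces this to ordinary DTW with squared differences; substituting the method's own F_s values would give the SDTW-style distance described in the claim.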
CN201910784182.9A 2019-08-23 2019-08-23 Hydrological data abnormal mode detection method based on similarity measurement Withdrawn CN110569890A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910784182.9A CN110569890A (en) 2019-08-23 2019-08-23 Hydrological data abnormal mode detection method based on similarity measurement

Publications (1)

Publication Number Publication Date
CN110569890A true CN110569890A (en) 2019-12-13

Family

ID=68775898

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910784182.9A Withdrawn CN110569890A (en) 2019-08-23 2019-08-23 Hydrological data abnormal mode detection method based on similarity measurement

Country Status (1)

Country Link
CN (1) CN110569890A (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183721B (en) * 2020-09-16 2022-04-26 河海大学 Construction method of combined hydrological prediction model based on self-adaptive differential evolution
CN112183721A (en) * 2020-09-16 2021-01-05 河海大学 Construction method of combined hydrological prediction model based on self-adaptive differential evolution
CN112597539A (en) * 2020-12-28 2021-04-02 上海观安信息技术股份有限公司 Unsupervised learning-based time series anomaly detection method and system
CN112966017B (en) * 2021-03-01 2023-11-14 北京青萌数海科技有限公司 Abnormal subsequence detection method for indefinite length in time sequence
CN112966017A (en) * 2021-03-01 2021-06-15 北京青萌数海科技有限公司 Abnormal subsequence detection method with indefinite length in time sequence
CN113190406A (en) * 2021-04-30 2021-07-30 上海爱数信息技术股份有限公司 IT entity group anomaly detection method under cloud native observability
CN115729981A (en) * 2022-11-29 2023-03-03 中国长江电力股份有限公司 Similar water regime data mining method based on editing distance and application thereof
CN115729981B (en) * 2022-11-29 2024-02-13 中国长江电力股份有限公司 Editing distance-based similar water condition data mining method and application thereof
CN116429220A (en) * 2023-06-14 2023-07-14 济宁安泰矿山设备制造有限公司 Hydraulic engineering anomaly detection method
CN116429220B (en) * 2023-06-14 2023-09-26 济宁安泰矿山设备制造有限公司 Hydraulic engineering anomaly detection method
CN116703485B (en) * 2023-08-04 2023-10-20 山东创亿智慧信息科技发展有限责任公司 Advertisement accurate marketing method and system based on big data
CN116703485A (en) * 2023-08-04 2023-09-05 山东创亿智慧信息科技发展有限责任公司 Advertisement accurate marketing method and system based on big data
CN116957634A (en) * 2023-09-19 2023-10-27 贵昌集团有限公司 Information intelligent acquisition processing method for electronic commerce platform
CN116957634B (en) * 2023-09-19 2023-11-21 贵昌集团有限公司 Information intelligent acquisition processing method for electronic commerce platform

Similar Documents

Publication Publication Date Title
CN110569890A (en) Hydrological data abnormal mode detection method based on similarity measurement
CN108038044B (en) Anomaly detection method for continuous monitored object
CN110018670B (en) Industrial process abnormal working condition prediction method based on dynamic association rule mining
CN111709465B (en) Intelligent identification method for rough difference of dam safety monitoring data
CN112911627B (en) Wireless network performance detection method, device and storage medium
CN108447057B (en) SAR image change detection method based on significance and depth convolution network
CN110851931A (en) Optimal arrangement method for flow monitoring points of urban water supply pipe network
CN109977546B (en) Four-dimensional track online anomaly detection method based on unsupervised learning
CN114792044B (en) Intelligent early warning method and system for subsidence of foundation pit adjacent to ground surface with coupling spatial characteristics
CN105512206A (en) Outlier detection method based on clustering
CN106951680A (en) A kind of Hydrological Time Series abnormal patterns detection method
CN107764458A (en) A kind of aircraft handing characteristics curve generation method
CN118211082B (en) Oil filter element residual life prediction method and system based on data analysis
CN111122162A (en) Industrial system fault detection method based on Euclidean distance multi-scale fuzzy sample entropy
CN115130499A (en) Vibration signal-based online measurement and prediction method for slip ratio of bearing retainer under variable working conditions
CN112380992A (en) Method and device for evaluating and optimizing accuracy of monitoring data in machining process
CN114819260A (en) Dynamic generation method of hydrologic time series prediction model
CN114387332B (en) Pipeline thickness measuring method and device
CN117688498B (en) Ship comprehensive safety state monitoring system based on ship-shore cooperation
CN112711052B (en) GNSS coordinate sequence step detection improvement method and system based on continuous t test
CN117150244B (en) Intelligent power distribution cabinet state monitoring method and system based on electrical parameter analysis
CN118013277A (en) Multi-model combined runoff forecasting method with time-varying weight
CN110780342A (en) Rock slope deformation early warning method
CN113705738A (en) Engineering equipment bearing degradation assessment method
CN113553232B (en) Technology for carrying out unsupervised anomaly detection on operation and maintenance data through online matrix image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20191213