CN114020730A - Method for detecting and repairing abnormal value in water environment monitoring data - Google Patents

Info

Publication number
CN114020730A
Authority
CN
China
Prior art keywords
value
water environment
subsequence
data
environment monitoring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111255406.0A
Other languages
Chinese (zh)
Inventor
宋金玲
黄达
黄立明
康燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei Normal University of Science and Technology
Original Assignee
Hebei Normal University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hebei Normal University of Science and Technology
Priority to CN202111255406.0A
Publication of CN114020730A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21: Design, administration or maintenance of databases
    • G06F16/215: Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Testing Or Calibration Of Command Recording Devices (AREA)

Abstract

The invention relates to the technical field of water environment monitoring, and in particular to a method for detecting and repairing abnormal values in water environment monitoring data. The method comprises the following steps: (1) detecting abnormal values by performing short-term cyclic comparison on the water environment monitoring data using a sliding window; (2) dynamically dividing the water environment monitoring data into subsequences; (3) for the subsequence containing an abnormal value, calculating its similarity to each of the other subsequences and selecting the subsequence with the maximum similarity; and (4) repairing the abnormal value according to the change trend of the data at the corresponding position in the most similar subsequence. The method repairs abnormal values in water environment monitoring data without depending on any model: it detects abnormal values by short-term cyclic comparison and repairs them based on subsequence similarity, so that the repaired data is smoother, its short-term trend is more evident, and the data quality is effectively improved.

Description

Method for detecting and repairing abnormal value in water environment monitoring data
Technical Field
The invention relates to the technical field of water environment monitoring, in particular to a method for detecting and repairing abnormal values in water environment monitoring data.
Background
As big data applications continue to deepen, data quality has become a key problem affecting their development. Water environment monitoring data consists of time series of various water environment indexes (water temperature, pH value, dissolved oxygen, conductivity, turbidity, ammonia nitrogen, permanganate index, total phosphorus, total nitrogen, chlorophyll, blue-green algae, and the like) measured by sensors, and is typical big data. Owing to factors such as sensor faults and data transmission and reading errors, abnormal values appear in the water environment monitoring data and reduce its quality. Detecting and repairing abnormal values in time is therefore important for later applications such as knowledge acquisition and predictive modeling of water environment indexes.
To improve the usability of data, researchers have repaired abnormal values in big data using fitting methods and prediction methods. However, water environment monitoring data is strongly affected by external influences (such as pollution and weather), so a model fitted to its long-term trend has a large error; moreover, the abnormal values in water environment monitoring generally include missing values, which makes prediction methods ineffective. Abnormal value repair must be preceded by abnormal value detection. Traditional abnormal value detection techniques are mainly based on classification, distance, clustering, statistics, information theory, and the like, but a more targeted abnormal value detection method is needed for a specific application scenario.
Disclosure of Invention
In order to solve the problems, the invention provides a method for detecting and repairing an abnormal value in water environment monitoring data, which aims at repairing the abnormal value of the water environment monitoring data without depending on any model, adopts a short-term cyclic comparison method to detect the abnormal value, and repairs the abnormal value based on subsequence similarity, so that the repaired data has better smoothness, the short-term trend of the data is more obvious, and the data quality is effectively improved.
The invention is realized by adopting the following technical scheme:
a method for detecting and repairing abnormal values in water environment monitoring data comprises the following steps:
(1) detecting abnormal values by performing short-term cyclic comparison on the water environment monitoring data by using a sliding window;
(2) dynamically dividing water environment monitoring data into subsequences;
(3) respectively calculating the similarity with other subsequences aiming at the subsequence with the abnormal value and selecting the subsequence with the maximum similarity;
(4) and repairing the abnormal value according to the change trend of the data of the corresponding position in the most similar subsequence.
In one embodiment, the method for detecting an abnormal value in step (1) is specifically as follows: each monitored value $v_i$ of the time series $V$ is compared one by one with the values in its k-nearest neighbor window, and the number of points, countnum, whose difference exceeds the threshold $\varepsilon$ is recorded; after all comparisons are finished, $v_i$ is judged to be an abnormal value or not according to whether countnum is greater than $\tau$, and if $v_i$ is an abnormal value, its position is recorded. Here $V = (v_1, v_2, \dots, v_n)$ is the water environment monitoring data, the point $v_i$ ($1 \le i \le n$) is the monitored value corresponding to time $t_i$, and
$$N_k(v_i) = \{v_{i-k}, \dots, v_{i-1}, v_{i+1}, \dots, v_{i+k}\}$$
denotes the k-nearest neighbor window of the point $v_i$. If in $N_k(v_i)$
$$\left|\{\, v_j \in N_k(v_i) : v_i - v_j > \varepsilon \,\}\right| > \tau$$
or
$$\left|\{\, v_j \in N_k(v_i) : v_i - v_j < -\varepsilon \,\}\right| > \tau$$
holds, then $v_i$ is judged to be an outlier, where $\varepsilon$ is the difference threshold and $\tau$ is the quantity threshold.
In one embodiment, dynamically dividing the water environment monitoring data into subsequences in step (2) is specifically as follows: the sequence is divided dynamically based on a maximum distance threshold. First, a flag variable flag is set, where the value 1 indicates that the subsequence is in a rising state, the value -1 indicates a falling state, and the value 0 indicates a steady state; flag is assigned an initial value according to the initial trend of the data sequence. Then, for each monitored value $v_i$, the difference from the previous value, $e_i = v_i - v_{i-1}$, is calculated together with the maximum distance of the subsequence so far. While the maximum distance does not exceed the threshold, flag is set to 0 if the sign of $e_i$ is opposite to flag; once the maximum distance exceeds the threshold, the overall trend and variation of the subsequence curve are judged from flag and $e_i$, and the subsequence is divided off and stored accordingly. Finally, the initialization for dividing the next subsequence is carried out.
In one embodiment, the shape distance used in step (3) to calculate the similarity with other subsequences is given by formula (1):
$$D(V_1, V_2) = \sum_{x} \left|M_{1x} - M_{2x}\right| \times \left|A_{1x} - A_{2x}\right| \qquad (1)$$
where $V_1, V_2$ are two time series; $M_{1x}, M_{2x}$ are the modes of the $x$-th point of $V_1, V_2$ respectively, taking values in $\{-3, -2, -1, 0, 1, 2, 3\}$, which represent the seven modes {accelerated descent, horizontal descent, decelerated descent, invariant, decelerated ascent, horizontal ascent, accelerated ascent}; and $A_{1x}, A_{2x}$ are the amplitude changes of the $x$-th point of $V_1, V_2$ respectively (i.e., $A_x = v_{x+1} - v_x$).
As can be seen from formula (1), the smaller the shape distance, the greater the similarity between the two time series, so the reciprocal of $D(V_i, V_j)$ is taken as the similarity of the subsequences, as shown in formula (2). When the subsequences differ in size, the short sequence is rolled along the long sequence, the similarity with each equal-length segment of the long sequence is calculated, and the maximum similarity obtained is taken as the final similarity of the two subsequences:
$$\mathrm{sim}(V_i, V_j) = 1 / D(V_i, V_j) \qquad (2)$$
In one embodiment, the method for repairing the abnormal value in step (4) is as follows: the abnormal value $v_i$ is repaired according to the change trend of the corresponding value in the most similar subsequence $V_j$, i.e., $v_i = v_{i-1} + (v_j - v_{j-1})$.
Compared with the prior art, the invention has the following beneficial effects. To address the problem of repairing abnormal values in water environment monitoring data, abnormal values are detected by performing short-term cyclic comparison on the data using a sliding window; the data is then dynamically divided into a plurality of subsequences, and the abnormal values are repaired on the basis of comparing the similarity of the subsequences. The experimental results show that the method is superior to conventional prediction-based and fitting-based repair methods: the repaired data is smoother, its short-term trend is more evident, and the data quality of the monitoring data is effectively improved.
Drawings
FIG. 1 is the curve of the pH data set;
FIG. 2 is the curve of the dissolved oxygen data set;
FIG. 3 shows the result of the abnormal value detection method on the pH data set;
FIG. 4 shows the result of the abnormal value detection method on the dissolved oxygen data set;
FIG. 5 shows the abnormal value repair result for the pH data set;
FIG. 6 shows the abnormal value repair result for the dissolved oxygen data set;
FIG. 7 compares the repair results of the various methods on the pH data set;
FIG. 8 compares the repair results of the various methods on the dissolved oxygen data set.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein.
The monitoring indexes of a water environment mainly include water temperature, pH value, dissolved oxygen, conductivity, turbidity, ammonia nitrogen, permanganate index, total phosphorus, total nitrogen, chlorophyll, blue-green algae, and the like. Because the water environment changes dynamically, the monitored values of these indexes change continuously over time. During online monitoring, the sensors return the monitored value of each index at fixed time intervals, so discrete time series data is obtained for each index; a separate time series is obtained for each monitored water environment index.
Definition 1. Water environment monitoring data is the time series of the monitored values of a given water environment index, denoted $V = (v_1, v_2, \dots, v_n)$, where $v_i$ ($1 \le i \le n$) is the monitored value corresponding to time $t_i$.
The water environment is sensitive to external influences, so the monitored values of all indexes fluctuate up and down under normal conditions. When the time interval is small, however, adjacent monitored values are close to each other. Even if the monitored value at some moment rises or falls sharply relative to the previous values because of an external influence on the water environment, the subsequent monitored values remain close to it, so abnormal values cannot be identified merely from the size of a rise or fall. Abnormal values are burr points (spikes): they deviate significantly from the other nearby monitored values, i.e., within a certain period of time an abnormal value is much larger (or smaller) than the other nearby values. Based on this analysis, an abnormal value in water environment monitoring data is defined as follows: within the k-nearest neighbor time window of a monitored value, if the number of differences between that value and the other monitored values that are greater than a threshold exceeds a certain number (or the number of differences that are less than the negative threshold exceeds a certain number), then the value is an outlier.
Definition 2 (abnormal values in water environment monitoring data). Given water environment monitoring data $V = (v_1, v_2, \dots, v_n)$, where the point $v_i$ ($1 \le i \le n$) is the monitored value corresponding to time $t_i$, let
$$N_k(v_i) = \{v_{i-k}, \dots, v_{i-1}, v_{i+1}, \dots, v_{i+k}\}$$
denote the k-nearest neighbor window of the point $v_i$. If in $N_k(v_i)$
$$\left|\{\, v_j \in N_k(v_i) : v_i - v_j > \varepsilon \,\}\right| > \tau$$
or
$$\left|\{\, v_j \in N_k(v_i) : v_i - v_j < -\varepsilon \,\}\right| > \tau$$
holds, then $v_i$ is judged to be an outlier, where $\varepsilon$ is the difference threshold and $\tau$ is the quantity threshold.
In this definition, the k-nearest neighbor window is a bidirectional time window. This prevents a monitored value from being misjudged as an abnormal value when the water environment index changes greatly under the influence of external factors, because such a value differs greatly from the values in the preceding time window but is close to the values in the following time window. Whether a monitored value $v_i$ is judged to be an abnormal value depends critically on the size k of the nearest neighbor window, the difference threshold $\varepsilon$, and the quantity threshold $\tau$. Different values of k, $\varepsilon$, and $\tau$ can therefore be selected for different water environment monitoring indexes, so that abnormal values can be detected for each of them.
The abnormal value restoration of the water environment monitoring data is essentially to find a value closest to the true value of the position of the abnormal value by adopting a certain method, and replace the abnormal value in the original sequence with the value, thereby obtaining a restored data sequence.
Definition 3 (repair sequence of water environment monitoring data). Given water environment monitoring data $V = (v_1, v_2, \dots, v_n)$, where the point $v_i$ ($1 \le i \le n$) is the monitored value corresponding to time $t_i$: if $v_i$ is an abnormal value and $v_i'$ is the value obtained by repairing $v_i$ with some method, then the sequence $(v_1, v_2, \dots, v_i', \dots, v_n)$ is a repair sequence of the water environment monitoring data, where $v_i'$ should be as close as possible to the true value of $v_i$.
According to the idea of solving the problem of repairing the abnormal value of the water environment monitoring data, the invention provides a method for detecting and repairing the abnormal value in the water environment monitoring data, which comprises the following steps:
(1) detecting abnormal values by performing short-term cyclic comparison on the water environment monitoring data by using a sliding window;
According to the definition of abnormal values in water environment monitoring data given above, once the values of k, $\varepsilon$, and $\tau$ are given, a sliding window can be used to compare each monitored value $v_i$ of the time series $V$ one by one with the values in its k-nearest neighbor window and to record the number of points, countnum, whose difference exceeds the threshold $\varepsilon$. After all comparisons are finished, $v_i$ is judged to be an abnormal value or not according to whether countnum is greater than $\tau$; if $v_i$ is an abnormal value, its position is recorded.
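The sliding-window check of step (1) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function name `detect_outliers`, the window of up to k values on each side of the point, and the truncation of the window at the ends of the series are illustrative choices that the text does not specify, and missing values are assumed to have been handled beforehand.

```python
from typing import List


def detect_outliers(values: List[float], k: int, eps: float, tau: int) -> List[int]:
    """Return the positions of values judged abnormal by short-term cyclic comparison.

    Assumption: the k-nearest neighbor window holds up to k values before and
    up to k values after position i, truncated at the ends of the series.
    """
    n = len(values)
    positions = []
    for i in range(n):
        above = 0  # neighbors that v_i exceeds by more than eps
        below = 0  # neighbors that exceed v_i by more than eps
        for j in range(max(0, i - k), min(n, i + k + 1)):
            if j == i:
                continue
            diff = values[i] - values[j]
            if diff > eps:
                above += 1
            elif diff < -eps:
                below += 1
        # countnum is compared with tau separately for each direction
        if above > tau or below > tau:
            positions.append(i)
    return positions


# A single spike is flagged; a lasting level shift is not, because the
# following half of the window agrees with the shifted values.
spike = [7.0, 7.1, 7.0, 7.2, 9.5, 7.1, 7.0, 7.2, 7.1]
shift = [7.0] * 5 + [8.0] * 5
print(detect_outliers(spike, k=3, eps=0.5, tau=4))  # [4]
print(detect_outliers(shift, k=3, eps=0.5, tau=4))  # []
```

The second call illustrates why a bidirectional window is used: a value that differs from its predecessors but matches its successors never collects more than k differing neighbors.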
(2) Dynamically dividing water environment monitoring data into subsequences;
according to the purpose of abnormal value repair, how to make the repair value closer to the true value of the monitoring index at the moment is a key problem. By observing the characteristics of the water environment monitoring data, the data change curves of the subsequences in different time periods are found to have high similarity, so if a subsequence with the highest similarity to the subsequence in which the abnormal value is located is found, the abnormal value can be repaired according to the change trend of the corresponding value in the most similar subsequence, and the repaired value is close to the real value as much as possible. Through the analysis, the restoration of the abnormal value of the water environment monitoring data needs the following three steps: firstly, dividing water environment monitoring data into subsequences; then, respectively calculating the similarity with other subsequences aiming at the subsequence where the abnormal value is located and selecting the subsequence with the maximum similarity; and finally, repairing the abnormal value according to the change trend of the data of the corresponding position in the most similar subsequence.
In water environment monitoring data, data segments within a short time range show obvious regularity and are generally in a rising, falling, or relatively stable state, but their scale is not fixed. A fixed-size dividing method therefore cannot be used, and the data sequence must be divided dynamically according to how the data changes. The invention divides the sequence dynamically based on a maximum distance threshold: a data segment whose maximum distance is less than the threshold is considered to be in a stable state, whereas a continuously rising (or falling) data segment is not limited by the threshold (it is not divided even if its maximum distance exceeds the threshold), so that each subsequence curve is rising, falling, or relatively stable. This dynamic division method not only reduces the number of subsequences but also makes the trend of each subsequence more evident.
To indicate the trend of a subsequence, a flag variable flag may be set, where the value 1 indicates that the subsequence is in a rising state, the value -1 indicates a falling state, and the value 0 indicates a steady state. From the difference of adjacent data ($v_i - v_{i-1}$) and the current value of flag, it can be determined whether the trend of the data sequence has changed; combined with the maximum distance threshold, this allows the subsequences to be divided dynamically. The dynamic division of the water environment monitoring data therefore proceeds as follows. First, flag is assigned an initial value according to the initial trend of the data sequence. Then, for each monitored value $v_i$, the difference from the previous value, $e_i = v_i - v_{i-1}$, is calculated together with the maximum distance of the subsequence so far. While the maximum distance does not exceed the threshold, flag is set to 0 if the sign of $e_i$ is opposite to flag; once the maximum distance exceeds the threshold, the overall trend and variation of the subsequence curve are judged from flag and $e_i$, and the subsequence is divided off and stored accordingly. Finally, the initialization for dividing the next subsequence is carried out. To avoid the influence of abnormal values, they are ignored when the subsequences are divided, so that each abnormal value is contained in some subsequence.
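One possible reading of this division procedure is sketched below in Python. Several points are assumptions made for illustration, since the text leaves them open: the "maximum distance" of a subsequence is taken to be the difference between its largest and smallest value, a subsequence is cut when the threshold is exceeded and the current step does not continue a rising or falling run, and the names `segment` and `max_dist` are hypothetical. Abnormal values are assumed to have been skipped beforehand.

```python
from typing import List, Tuple


def sign(x: float) -> int:
    """Return 1, -1 or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def segment(values: List[float], max_dist: float) -> List[Tuple[int, int, int]]:
    """Split values into (start, end_exclusive, flag) subsequences.

    flag is 1 (rising), -1 (falling) or 0 (steady).
    Assumption: maximum distance = max - min of the current subsequence.
    """
    n = len(values)
    if n == 0:
        return []
    segments = []
    start = 0
    flag = sign(values[1] - values[0]) if n > 1 else 0  # initial trend
    lo = hi = values[0]
    for i in range(1, n):
        e = values[i] - values[i - 1]
        new_lo, new_hi = min(lo, values[i]), max(hi, values[i])
        if new_hi - new_lo <= max_dist:
            # Within the threshold: a reversal turns the subsequence steady.
            if flag != 0 and sign(e) == -flag:
                flag = 0
            lo, hi = new_lo, new_hi
        elif flag != 0 and sign(e) == flag:
            # A continuously rising or falling run is not limited by the threshold.
            lo, hi = new_lo, new_hi
        else:
            # Threshold exceeded and the trend is broken: store and re-initialize.
            segments.append((start, i, flag))
            start = i
            lo = hi = values[i]
            flag = sign(values[i + 1] - values[i]) if i + 1 < n else 0
    segments.append((start, n, flag))
    return segments


data = [1.0, 1.1, 1.0, 1.1, 2.0, 3.0, 4.0, 3.0, 2.0]
print(segment(data, max_dist=0.5))  # [(0, 4, 0), (4, 7, 1), (7, 9, -1)]
```

On this small series the sketch yields one steady, one rising, and one falling subsequence, and the stored flag is what later allows similarity to be computed only between subsequences with the same trend.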
(3) Respectively calculating the similarity with other subsequences aiming at the subsequence with the abnormal value and selecting the subsequence with the maximum similarity;
In step (2), the water environment monitoring data is divided into a plurality of subsequences, whose similarity can then be compared, laying the foundation for repairing abnormal values. The similarity of subsequences is curve similarity: the subsequences must not only be similar in shape (similar change trend) but also be closest in distance. The similarity of subsequences therefore cannot be measured simply by the Euclidean distance, which can neither distinguish shape similarity nor reflect the similarity of the amplitude of dynamic trend changes. The invention uses a shape distance algorithm to measure the similarity of subsequences. Its principle is to take the product of the mode distance and the amplitude-change distance at each point as the distance between two subsequences, as shown in formula (1):
$$D(V_1, V_2) = \sum_{x} \left|M_{1x} - M_{2x}\right| \times \left|A_{1x} - A_{2x}\right| \qquad (1)$$
where $V_1, V_2$ are two time series; $M_{1x}, M_{2x}$ are the modes of the $x$-th point of $V_1, V_2$ respectively, taking values in $\{-3, -2, -1, 0, 1, 2, 3\}$, which represent the seven modes {accelerated descent, horizontal descent, decelerated descent, invariant, decelerated ascent, horizontal ascent, accelerated ascent}; and $A_{1x}, A_{2x}$ are the amplitude changes of the $x$-th point of $V_1, V_2$ respectively (i.e., $A_x = v_{x+1} - v_x$).
From formula (1), the smaller the shape distance, the greater the similarity between the two time series, so the reciprocal of $D(V_i, V_j)$ is taken as the similarity of the subsequences, as shown in formula (2). When the subsequences differ in size, the short sequence is rolled along the long sequence, the similarity with each equal-length segment of the long sequence is calculated, and the maximum similarity obtained is taken as the final similarity of the two subsequences.
$$\mathrm{sim}(V_i, V_j) = 1 / D(V_i, V_j) \qquad (2)$$
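The shape distance and the rolling similarity can be sketched in Python as follows. The text does not say how the mode of a point is derived, so the rule in `modes` (comparing the size of each amplitude change with the previous one) is a hypothetical stand-in, as are all function names. The sketch also assumes that the per-point products are summed and that a distance of zero is treated as infinite similarity, a case the text does not address.

```python
import math
from typing import List, Sequence


def amplitude_changes(v: Sequence[float]) -> List[float]:
    """A_x = v_{x+1} - v_x for every point that has a successor."""
    return [v[x + 1] - v[x] for x in range(len(v) - 1)]


def modes(amps: Sequence[float], tol: float = 1e-9) -> List[int]:
    """Hypothetical mode rule (not given in the source text).

    |A| grows versus the previous step -> accelerated (3), equal or first
    step -> horizontal (2), shrinks -> decelerated (1); negated for descent;
    0 when the value does not change.
    """
    out = []
    prev = None
    for a in amps:
        if abs(a) <= tol:
            m = 0
        else:
            if prev is None or abs(abs(a) - abs(prev)) <= tol:
                strength = 2
            elif abs(a) > abs(prev):
                strength = 3
            else:
                strength = 1
            m = strength if a > 0 else -strength
        out.append(m)
        prev = a
    return out


def shape_distance(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Sum over points of |M1x - M2x| * |A1x - A2x| for equal-length series."""
    if len(v1) != len(v2) or len(v1) < 2:
        raise ValueError("series must have equal length of at least 2")
    a1, a2 = amplitude_changes(v1), amplitude_changes(v2)
    m1, m2 = modes(a1), modes(a2)
    return sum(abs(p - q) * abs(a - b) for p, q, a, b in zip(m1, m2, a1, a2))


def similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
    """sim = 1 / D, rolling the shorter series along the longer one."""
    short, long_ = (v1, v2) if len(v1) <= len(v2) else (v2, v1)
    best = 0.0
    for s in range(len(long_) - len(short) + 1):
        d = shape_distance(short, long_[s:s + len(short)])
        # Assumption: identical shapes (D = 0) count as infinitely similar.
        best = max(best, math.inf if d == 0 else 1.0 / d)
    return best


print(shape_distance([1.0, 2.0, 4.0], [4.0, 2.0, 1.0]))  # 24.0
```

In the printed example, a rising and a falling series differ in mode by 4 at both points and in amplitude change by 3, which gives 4 × 3 + 4 × 3 = 24 and a similarity of 1/24.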
(4) Repairing the abnormal value according to the change trend of the data of the corresponding position in the most similar subsequence.
Abnormal values in the water environment monitoring data are repaired according to the subsequence with the maximum similarity. To repair an abnormal value $v_i$, the subsequence $V_i$ in which $v_i$ is located must first be found; then, using the subsequence similarity calculation method of step (3), the similarity between $V_i$ and each of the other subsequences is calculated, and the subsequence $V_j$ with the maximum similarity is selected. Because only subsequences with the same overall trend have a high similarity, and the trend of each subsequence is stored when the subsequences are divided in step (2), similarity needs to be calculated only for the subsequences that have the same trend as $V_i$. Finally, the abnormal value $v_i$ is repaired according to the change trend of the corresponding value in the subsequence $V_j$, i.e., $v_i = v_{i-1} + (v_j - v_{j-1})$.
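The repair rule $v_i = v_{i-1} + (v_j - v_{j-1})$ can be illustrated with a short Python sketch. The function name `repair_outlier` and its arguments are hypothetical; in particular, the caller is assumed to supply the index j of the "corresponding position" in the most similar subsequence, since the text does not spell out how that position is mapped.

```python
from typing import List, Sequence


def repair_outlier(series: Sequence[float], i: int,
                   donor: Sequence[float], j: int) -> List[float]:
    """Return a copy of series with the abnormal value at i repaired.

    The repaired value continues from series[i - 1] with the same step that
    the most similar subsequence (donor) takes from donor[j - 1] to donor[j].
    """
    if i < 1 or j < 1:
        raise ValueError("both positions need a predecessor")
    repaired = list(series)
    repaired[i] = repaired[i - 1] + (donor[j] - donor[j - 1])
    return repaired


# The spike 9.5 is replaced by 7.5 + (7.0 - 6.5) = 8.0.
print(repair_outlier([7.0, 7.5, 9.5, 8.5], 2, [6.0, 6.5, 7.0, 7.5], 2))
# [7.0, 7.5, 8.0, 8.5]
```

Because the repaired value is built from the preceding value of the series itself, the sketch needs a valid predecessor; how consecutive abnormal values or an abnormal first value are handled is not described in the text and is left out here.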
Example 1
In order to verify the effectiveness of the detection and repair method for the abnormal values of the water environment monitoring data, provided by the invention, the water environment monitoring data of a certain water station is selected for experiment, and the repair result of the abnormal values is analyzed and discussed.
1. Data preparation
Two water quality factors monitored online, pH value and dissolved oxygen (in mg/l), are selected for algorithm verification. The online monitoring data of the water station for 2018 and 2019 is used, with a time interval of 1 day; the curves of the two data sets are shown in FIG. 1 and FIG. 2. As can be seen from FIG. 1 and FIG. 2, the time series curves of pH and dissolved oxygen are smooth overall, but contain some clearly suspicious abnormal points (missing values, outliers, etc.), which may seriously affect later data analysis if they are not repaired in time.
2. Evaluation of results
First, the abnormal value detection algorithm is evaluated on the two data sets. Because pH and dissolved oxygen have different data characteristics, the nearest neighbor window size k, the difference threshold $\varepsilon$, and the quantity threshold $\tau$ used by the detection algorithm also differ between them.
After repeated comparative experiments, the final values of k, $\varepsilon$, and $\tau$ for the abnormal value detection algorithm are set as follows: k = 3, $\varepsilon$ = 0.5, $\tau$ = 4 for the pH data set, and k = 3, $\varepsilon$ = 0.9, $\tau$ = 3 for the dissolved oxygen data set. With these settings, 11 abnormal values are detected in the pH data set and 21 in the dissolved oxygen data set, and the detection produces neither FP (false positive: a normal value detected as abnormal) nor FN (false negative: an abnormal value detected as normal) results. FIG. 3 and FIG. 4 show the results of the abnormal value detection algorithm; the data points marked by dots are the abnormal points, which clearly deviate from their neighboring points. The detection results show that the proposed detection algorithm successfully finds the abnormal values that deviate markedly from their neighbors in the water environment monitoring data sets, while preserving as far as possible the data fluctuations caused by changes in the water environment, which benefits later data analysis.
Then, the abnormal value repair algorithm is evaluated on the two data sets. In the subsequence division stage, the pH data set uses the maximum distance threshold [Figure BDA0003323972920000081] and is divided into 101 subsequences, and the dissolved oxygen data set uses the maximum distance threshold [Figure BDA0003323972920000082] and is divided into 104 subsequences. The sizes of the resulting subsequences are moderate, which shows that the given subsequence division method is effective. In the stage of repairing abnormal values using subsequence similarity, a subsequence with maximum similarity is found for every subsequence that contains an abnormal value. The repair results are shown in FIG. 5 and FIG. 6: the repaired curves are smoother and contain no points that deviate markedly from their neighbors, which indicates that repairing data based on subsequence similarity is feasible and that the abnormal value repair method is effective.
3. Comparative experiment of abnormal value repair method
To further validate the effectiveness of the abnormal value repair method, the method of the invention is compared with prediction-based and fitting-based abnormal value repair methods on the experimental data sets; the repair results of the three methods are shown in FIG. 7 and FIG. 8. As can be seen from FIG. 7 and FIG. 8: the repaired data set obtained with the prediction method deviates from the original data set, and in particular the predictions for originally missing values lie far from the original curve, which shows that the prediction method is strongly affected by missing values; the repaired data obtained with the fitting method shows a number of bulges relative to the original curve, because the fitted polynomial reflects only the long-term trend of the original data and individual fitted values differ considerably from the original data; the repaired data set obtained with the method of the invention coincides with most of the original data set, the regions around abnormal values become smoother, and the short-term trend of the data is more evident. The comparative experiment shows that the prediction-based and fitting-based repair methods are not suitable for water environment monitoring data, whereas the proposed repair method effectively improves the data quality of water environment monitoring data while preserving its original characteristics.
4. Conclusion
For the abnormal values present in water environment monitoring data, abnormal values are detected using the short-term cyclic comparison method and repaired according to subsequence similarity. The experimental results show that the proposed abnormal value repair method is superior to the conventional prediction-based and fitting-based repair methods.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (5)

1. A method for detecting and repairing abnormal values in water environment monitoring data is characterized by comprising the following steps:
(1) detecting abnormal values by performing short-term cyclic comparison on the water environment monitoring data by using a sliding window;
(2) dynamically dividing water environment monitoring data into subsequences;
(3) for the subsequence containing the abnormal value, calculating its similarity with each of the other subsequences and selecting the subsequence with the maximum similarity; and
(4) repairing the abnormal value according to the change trend of the data at the corresponding position in the most similar subsequence.
2. The method for detecting and repairing abnormal values in water environment monitoring data according to claim 1, wherein the method for detecting abnormal values in the step (1) is specifically as follows: for each monitored value v_i in the time series V, differences are calculated one by one against the points in its k-nearest-neighbor window, and the number of points, countnum, whose difference exceeds a threshold ε is recorded; after all comparisons are finished, whether v_i is an abnormal value is determined according to whether countnum is greater than τ, and if v_i is an abnormal value, its position is recorded; wherein V = (v_1, v_2, …, v_n) is the water environment monitoring data, the center point v_i (1 ≤ i ≤ n) is the monitored value corresponding to time t_i, and [Figure FDA0003323972910000011] [Figure FDA0003323972910000012] represents the k-nearest-neighbor window of point v_i; if, in [Figure FDA0003323972910000013], there exists [Figure FDA0003323972910000014] or [Figure FDA0003323972910000015], then v_i is judged to be an outlier, where ε is the difference threshold and τ is the quantity threshold.
3. The method for detecting and repairing abnormal values in water environment monitoring data according to claim 1, wherein the step (2) of dynamically dividing the water environment monitoring data into subsequences specifically comprises: dynamically dividing the sequence based on a maximum distance threshold [Figure FDA0003323972910000016]; firstly, setting a flag variable flag, wherein a value of 1 represents that the subsequence is in an ascending state, a value of -1 represents that the subsequence is in a descending state, and a value of 0 represents that the subsequence is in a stable state; assigning an initial value to flag according to the initial trend of the data sequence; then, for each monitored value v_i, calculating the difference from the previous value, e_i = v_i - v_{i-1}, and calculating the maximum distance of the subsequence up to the present point; when the maximum distance does not exceed the threshold [Figure FDA0003323972910000017], setting flag to 0 if the sign of e_i is opposite to that of flag; when the maximum distance is greater than the threshold [Figure FDA0003323972910000018], judging the overall trend and the variation of the subsequence curve according to flag and e_i, thereby dividing and storing the subsequence; and finally performing the initialization operation for dividing the next subsequence.
4. The method for detecting and repairing abnormal values in water environment monitoring data according to claim 1, wherein the similarity with other subsequences in the step (3) is calculated from the shape distance given by formula (1):
Figure FDA0003323972910000021
wherein V_1, V_2 are two time series; M_1x, M_2x are the modes of the x-th point of the time series V_1 and V_2 respectively, with value range {-3, -2, -1, 0, 1, 2, 3}, representing the seven modes {accelerated descent, horizontal descent, decelerated descent, invariant, decelerated ascent, horizontal ascent, accelerated ascent} respectively; A_1x, A_2x are the amplitude changes of the x-th point of the time series V_1 and V_2 respectively (i.e., A_x = v_{x+1} - v_x);
As can be seen from formula (1), the smaller the shape distance, the greater the similarity between two time series; the reciprocal of D(V_i, V_j) is therefore taken as the similarity of the subsequences, as shown in formula (2); where the subsequences differ in length, the similarity between the shorter sequence and each equal-length segment of the longer sequence can be calculated in a rolling manner, and the maximum similarity obtained is taken as the final similarity of the two subsequences;
sim(V_i, V_j) = 1/D(V_i, V_j) (2).
5. The method for detecting and repairing abnormal values in water environment monitoring data according to claim 1, wherein the method for repairing abnormal values in the step (4) comprises: repairing the abnormal value v_i according to the change trend of the corresponding value in the most similar subsequence V_j, i.e., v_i = v_{i-1} + (v_j - v_{j-1}).
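The rolling similarity comparison of claim 4 and the repair rule of claim 5 can be sketched as follows. This is only an illustration under stated assumptions: formula (1) is available only as an image, so `amplitude_distance` is a hypothetical stand-in that compares amplitude changes A_x = v_{x+1} - v_x and ignores the mode term M; all function names are invented for this sketch; and treating a zero distance as infinite similarity is an assumption, since the original does not say how D = 0 is handled.

```python
from typing import Callable, Sequence, Tuple

Distance = Callable[[Sequence[float], Sequence[float]], float]


def amplitude_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical stand-in for D in formula (1), NOT the patented formula:
    sum of absolute differences between amplitude changes A_x = v_{x+1} - v_x."""
    da = [a[x + 1] - a[x] for x in range(len(a) - 1)]
    db = [b[x + 1] - b[x] for x in range(len(b) - 1)]
    return sum(abs(p - q) for p, q in zip(da, db))


def rolling_similarity(
    s1: Sequence[float], s2: Sequence[float], dist: Distance = amplitude_distance
) -> Tuple[float, int]:
    """Slide the shorter sequence over the longer one and return
    (maximum similarity, offset of the best-matching segment in the longer one)."""
    short, long_ = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    best_sim, best_off = -1.0, 0
    for off in range(len(long_) - len(short) + 1):
        d = dist(short, long_[off:off + len(short)])
        # Formula (2): sim = 1 / D; a zero distance is treated as infinite similarity (assumption)
        sim = float("inf") if d == 0 else 1.0 / d
        if sim > best_sim:
            best_sim, best_off = sim, off
    return best_sim, best_off


def repair_point(values: Sequence[float], i: int, similar: Sequence[float], j: int) -> float:
    """Claim 5 repair rule: v_i = v_{i-1} + (v_j - v_{j-1}), where j is the
    position in the most similar subsequence that corresponds to i."""
    return values[i - 1] + (similar[j] - similar[j - 1])
```

For example, `repair_point([7.0, 7.1, 12.0, 7.3], 2, [6.0, 6.1, 6.3, 6.4], 2)` replaces the spike with the previous value plus the step observed in the similar subsequence, giving a value of about 7.3. Passing the distance as a parameter keeps the rolling comparison independent of the stand-in, so the shape distance of formula (1) could be substituted once its exact form is known.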
CN202111255406.0A 2021-10-27 2021-10-27 Method for detecting and repairing abnormal value in water environment monitoring data Pending CN114020730A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111255406.0A CN114020730A (en) 2021-10-27 2021-10-27 Method for detecting and repairing abnormal value in water environment monitoring data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111255406.0A CN114020730A (en) 2021-10-27 2021-10-27 Method for detecting and repairing abnormal value in water environment monitoring data

Publications (1)

Publication Number Publication Date
CN114020730A true CN114020730A (en) 2022-02-08

Family

ID=80058170

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111255406.0A Pending CN114020730A (en) 2021-10-27 2021-10-27 Method for detecting and repairing abnormal value in water environment monitoring data

Country Status (1)

Country Link
CN (1) CN114020730A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114154122A (en) * 2022-02-09 2022-03-08 杭州开闳流体科技有限公司 Flow measurement data quality evaluation method, device and application
CN114491383A (en) * 2022-04-15 2022-05-13 江西飞尚科技有限公司 Abnormal data processing method and system for bridge monitoring
CN114707570A (en) * 2022-02-22 2022-07-05 南通大学 Method for rapidly detecting abnormal value of time sequence
CN116990465A (en) * 2023-09-25 2023-11-03 北京金水永利科技有限公司 Air quality data abnormity early warning method and system thereof
CN117172598A (en) * 2023-09-05 2023-12-05 中国长江电力股份有限公司 Basin water ecology fish monitoring management system based on cloud computing
CN117829381A (en) * 2024-03-05 2024-04-05 成都农业科技职业学院 Agricultural greenhouse data optimization acquisition system based on Internet of things

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114154122A (en) * 2022-02-09 2022-03-08 杭州开闳流体科技有限公司 Flow measurement data quality evaluation method, device and application
CN114707570A (en) * 2022-02-22 2022-07-05 南通大学 Method for rapidly detecting abnormal value of time sequence
CN114707570B (en) * 2022-02-22 2024-05-24 南通大学 Method for rapidly detecting abnormal value of time sequence
CN114491383A (en) * 2022-04-15 2022-05-13 江西飞尚科技有限公司 Abnormal data processing method and system for bridge monitoring
CN114491383B (en) * 2022-04-15 2022-09-16 江西飞尚科技有限公司 Abnormal data processing method and system for bridge monitoring
CN117172598A (en) * 2023-09-05 2023-12-05 中国长江电力股份有限公司 Basin water ecology fish monitoring management system based on cloud computing
CN117172598B (en) * 2023-09-05 2024-05-28 中国长江电力股份有限公司 Basin water ecology fish monitoring management system based on cloud computing
CN116990465A (en) * 2023-09-25 2023-11-03 北京金水永利科技有限公司 Air quality data abnormity early warning method and system thereof
CN116990465B (en) * 2023-09-25 2023-12-19 北京金水永利科技有限公司 Air quality data abnormity early warning method and system thereof
CN117829381A (en) * 2024-03-05 2024-04-05 成都农业科技职业学院 Agricultural greenhouse data optimization acquisition system based on Internet of things
CN117829381B (en) * 2024-03-05 2024-05-14 成都农业科技职业学院 Agricultural greenhouse data optimization acquisition system based on Internet of things

Similar Documents

Publication Publication Date Title
CN114020730A (en) Method for detecting and repairing abnormal value in water environment monitoring data
CN107092582B (en) Online abnormal value detection and confidence evaluation method based on residual posterior
CN103473540B (en) The modeling of intelligent transportation system track of vehicle increment type and online method for detecting abnormality
CN108667684B (en) Data flow anomaly detection method based on local vector dot product density
CN107682319A (en) A kind of method of data flow anomaly detection and multiple-authentication based on enhanced angle Outlier factor
CN109325060B (en) Time series stream data fast searching method based on data characteristics
CN112987675A (en) Method, device, computer equipment and medium for anomaly detection
CN107332691B (en) Method for detecting fault node of wireless sensor network
Lee et al. Studies on the GAN-based anomaly detection methods for the time series data
CN106951680A (en) A kind of Hydrological Time Series abnormal patterns detection method
CN114386324A (en) Ultra-short-term wind power segmented prediction method based on turning period identification
CN110633643A (en) Abnormal behavior detection method and system for smart community
CN106411921A (en) Multi-step attack prediction method based on cause-and-effect Byesian network
CN103778647A (en) Multi-target tracking method based on layered hypergraph optimization
KR102169452B1 (en) METHOD FOR ENSURING STABILITY OF DATA COLLECTED IN IoT WEATHER ENVIRONMENT
CN107978147B (en) KNN algorithm-based traffic flow abnormal data bidirectional detection and restoration method
TWI715230B (en) Missing data compensation method, missing data compensation system, and non-transitory computer-readable medium
CN112766429B (en) Method, device, computer equipment and medium for anomaly detection
CN110569855A (en) Long-time target tracking algorithm based on correlation filtering and feature point matching fusion
CN105678409A (en) Adaptive and distribution-free time series abnormal point detection method
CN111611961A (en) Harmonic anomaly identification method based on variable point segmentation and sequence clustering
CN112131575A (en) Concept drift detection method based on classification error rate and consistency prediction
CN117312769A (en) BiLSTM-based method for detecting abnormality of time sequence data of Internet of things
CN107562778B (en) Outlier mining method based on deviation features
CN111612531B (en) Click fraud detection method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination