CN117216703A - Water delivery pipe network operation data anomaly detection and correction method - Google Patents

Info

Publication number
CN117216703A
Authority
CN
China
Prior art keywords
abnormal
sample
data
value
detecting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311215125.1A
Other languages
Chinese (zh)
Inventor
李江
金波
李守俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Pute Fluid Control Co ltd
Original Assignee
Xi'an Pute Fluid Control Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an Pute Fluid Control Co ltd
Priority to CN202311215125.1A
Publication of CN117216703A

Landscapes

  • Testing Of Devices, Machine Parts, Or Other Structures Thereof (AREA)

Abstract

The application relates to a method for detecting and correcting abnormal operation data of a water delivery pipe network. The method comprises five steps: first, checking sample data with local models and labeling the samples; second, building an LOF abnormal data detection model; third, testing on test sample data and detecting the abnormal parameters of abnormal samples; fourth, detecting abnormal data in actual measurements; and fifth, correcting the abnormal parameters with a neighborhood mean method. A suitable amount of labeled sample data is obtained through local model checks on the water delivery pipe network, an LOF abnormal data detection model is built by training, abnormal samples in the monitoring data are identified, the specific abnormal parameters are determined by Z-scores, and the abnormal parameters are corrected with a k-neighborhood mean method. For measured high-dimensional, multi-parameter data, the method is simple to model and fast to compute, and it supports accurate analysis of the pipe network operating state, real-time simulation, state monitoring and fault diagnosis.

Description

Water delivery pipe network operation data anomaly detection and correction method
Technical Field
The application relates to the technical field of water delivery pipe network on-line monitoring, in particular to a method for detecting and correcting abnormal data of a water delivery pipe network.
Background
The quality of operation data is critical to real-time simulation, state monitoring and fault diagnosis. If the measured data are themselves wrong and their deviations exceed limits, misjudgment and even misoperation can result. Detecting and correcting anomalies in measured data is therefore of practical significance. Among existing studies of water supply network monitoring data, document [1] proposes an outlier detection method based on interactive identification: online monitoring data are segmented at 15-min intervals, the optimal number of monitoring points is determined by analyzing the spatial topological relations among the monitoring points, and a support vector regression (SVR) model is built on the data of the selected monitoring points so that those data can be identified interactively. Document [2] uses self-identification for quality control of water supply network monitoring data: the data of a monitoring station are divided by time period and season, several hundred autoregressive moving average (ARMA) models are built in total, and the data of each independent node are self-identified. Document [3] uses the Isolation Forest method and the K-means clustering algorithm to identify anomalies in one-dimensional hourly water supply data; the results show that Isolation Forest outperforms K-means in recall, precision and F1 score, but Isolation Forest is not suitable for high-dimensional data. Document [4] applies the local outlier factor (LOF) algorithm and the K-means algorithm to anomaly detection on one-dimensional pressure monitoring data and finds that K-means detects anomalies better on the processed sample data.
These studies rely only on the time-series characteristics of a single parameter, or on the spatio-temporal correlation among a small number of local measuring-point parameters, and secure their detection capability by dividing the data into time periods and blocks and building hundreds of local data models. This approach has clear limitations and is difficult to apply in practice.
In actual system operation, measurement anomalies are few and vary in form. In the vector space of the monitoring parameters, normal samples are numerous and abnormal samples are few. Document [5] detects network anomalies with a one-class support vector machine (One-Class SVM) whose training set is obtained by active learning; the training set cannot be guaranteed to contain only normal data, and if it contains anomalies the accuracy of the method drops. In addition, obtaining abnormal samples is a difficult problem in water pipe network anomaly detection; documents [1, 2, 4] construct abnormal data manually, which is not fully realistic.
There is therefore a need for a method of detecting and correcting anomalies in water delivery pipe network data that handles measured high-dimensional, multi-parameter data, is simple to model, and is fast and efficient to compute.
References:
[1] Liu Shuming, Wu Zhi Peng, Che Zhi Ling. Detection of abnormal values in water supply network data based on interactive identification [J]. Feedwater Drainage, 2015(11): 150-154;
[2] Liu Shuming, Wu Zhi Peng, Che Zhi Ling, et al. Quality control of water supply network monitoring data using self-identification [J]. Journal of Tsinghua University (Natural Science Edition), 2017, 57(9): 999-1003;
[3] Zhang Kai, Cui Guangliang. Use of abnormal data recognition and repair mechanisms in regional water supply prediction schemes [J]. Hydroelectric Energy Science, 2021, 39(7): 53-56;
[4] Yang Qihang. Research on the identification of abnormal detection data of water supply networks. Tianjin: university of Tianjin theory, 2022;
[5] Liu Jing, Gu Lize, Niu Xinxin, et al. Network anomaly detection study based on single-class support vector machine and active learning [J]. Communication Journal, 2015, 36(11): 136-146.
Disclosure of Invention
To address the above problems, the application provides a method for detecting and correcting abnormal data of a water delivery pipe network. The method works on the high-dimensional, multi-parameter vector space containing all monitoring parameters, is simple to model and fast to compute, and realizes online detection and correction of abnormal measured data.
The purpose of the application is achieved as follows: the method for detecting and correcting abnormal data of a water delivery pipe network comprises the following steps:
step one, checking sample data with local models and completing sample labeling: with $a$ monitoring points in total, the high-dimensional measured parameters consist of $a$ flow values and $a$ pressure values, and each sampling moment forms one sample; local model checks are applied, in which $a$ flow soft measurement models and $a$ node pressure regression models are built to verify each sample parameter, the sample and its parameters are marked as normal or abnormal, and a sample label is generated;
step two, building an LOF abnormal data detection model:
1. obtaining the local outlier factor (LOF) value of each sample with the LOF algorithm;
2. letting a temporary threshold Th run from 0 to 5 in steps of 0.1, obtaining the confusion matrix under each temporary threshold from the sample labels of step one, and calculating the true positive rate TPR (sensitivity) and the false positive rate FPR (1 - specificity) one by one;
3. drawing the receiver operating characteristic (ROC) curve from the TPR and FPR values, finding the critical point on the curve closest to the point (0, 1), and taking the corresponding Th value as the LOF threshold; the normal samples whose LOF value is below the threshold, together with the LOF threshold, form the LOF abnormal data detection model;
step three, testing on the test sample data and detecting the abnormal parameters of abnormal samples: the anomaly detection accuracy $A_1$ and the abnormal parameter determination accuracy $A_2$ are calculated in turn; when $A_1$ and $A_2$ both reach 95%, step four is entered, otherwise the method returns to step two;
step four, detecting actual abnormal data: for the node flow and pressure data currently sampled at the monitoring points, calculating the LOF value of each sample, judging from the threshold whether the sample is abnormal, and detecting the abnormal parameters in each abnormal sample;
step five, correcting the abnormal parameters with a neighborhood mean method.
In step one, the flow soft measurement model is as follows. For a node d, $Q_{in} = Q_{out} + S \times \Delta h / \Delta t$, where $Q_{in}$ is the flow into the reservoir (theoretically equal to the flow Q of node d), S is the reservoir bottom area, $Q_{out}$ is the flow out of the reservoir, $\Delta h$ is the liquid level change ($Q_{out}$ and $\Delta h$ are obtained from a flowmeter and a liquid level meter respectively), and $\Delta t$ is the sampling time interval. Let the residual between $Q_{in}$ and Q have mean $\mu = 0$ and standard deviation $\sigma_1$ m³/h; the upper and lower limits of the confidence interval are obtained from the 3σ principle. If, at a given moment, the flow soft measurement model estimates the inflow as $Q_{in} = c$ m³/h, the measured flow Q is normal when it lies within $Q_{in} \pm 3\sigma_1$, i.e. $[c - 3\sigma_1, c + 3\sigma_1]$, and is otherwise judged abnormal.
In step one, the pressure regression model takes the flow of node d as the independent variable x and the pressure of node d as y, giving the regression equation $y = -kx + b$. The standard deviation $\sigma_2$ of the residual between the measured pressure and the regression value is calculated, and the upper and lower limits of the confidence interval are again obtained from the 3σ principle. For a flow value of e m³/h at a given moment, the corresponding pressure interval is $[-ke + b - 3\sigma_2, -ke + b + 3\sigma_2]$. If the pressure of node d at that moment lies within this interval it is normal; otherwise the pressure parameter of node d measured at that moment is judged abnormal.
In step two, the local outlier factor (LOF) value of each sample is obtained with the LOF algorithm as follows:
1. calculating the k-th distance: let d(p, o) be the distance between two points p and o, and let k be the number of neighboring points; $d_k(p)$ denotes the k-th distance of point p, and $d_k(p) = d(p, o)$ means that o is the k-th nearest point to p, not counting p itself;
2. calculating the k-th distance neighborhood: $N_k(p)$ denotes the k-th distance neighborhood of p (the set of all points within the k-th distance of p), and $|N_k(p)|$ is the number of points in that neighborhood;
3. calculating the k-th reachable distance: $R_k(o, p)$ denotes the k-th reachable distance from point o to point p, $R_k(o, p) = \max\{d_k(o), d(p, o)\}$; that is, the reachable distance is at least the k-th distance $d_k(o)$ of point o, and otherwise equals the true distance d(p, o) between o and p;
4. calculating the local reachable density: the local reachable density of p is the inverse of the mean k-th reachable distance from all points in $N_k(p)$ to p, written as:
$$\mathrm{lrd}_k(p) = \left( \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} R_k(o, p) \right)^{-1} \qquad (2)$$
5. calculating the local outlier factor: the local outlier factor is defined through the local relative density, as the average ratio of the local reachable density of each sample point in the k-th neighborhood of p to the local reachable density of p, written as:
$$\mathrm{LOF}_k(p) = \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \frac{\mathrm{lrd}_k(o)}{\mathrm{lrd}_k(p)} \qquad (3)$$
In step two, the confusion matrix is shown in formula (1), with rows giving the actual class (abnormal, normal) and columns the detected class (abnormal, normal):
$$\begin{bmatrix} C_{TP} & C_{FN} \\ C_{FP} & C_{TN} \end{bmatrix} \qquad (1)$$
In step two, the true positive rate and the false positive rate are calculated according to formulas (4) and (5) respectively:
$$\mathrm{TPR} = \frac{C_{TP}}{C_{TP} + C_{FN}} \qquad (4)$$
$$\mathrm{FPR} = \frac{C_{FP}}{C_{FP} + C_{TN}} \qquad (5)$$
In step three, the test sample data are tested as follows:
1. the test sample data are checked with the LOF model and verified with the local models to obtain the confusion matrix parameters, and the anomaly detection accuracy $A_1$ is calculated according to formula (6) to verify the model:
$$A_1 = \frac{C_{TP} + C_{TN}}{C_{TP} + C_{FP} + C_{FN} + C_{TN}} \times 100\% \qquad (6)$$
2. the abnormal parameter of each abnormal sample is detected with the Z-score, and the abnormal parameter determination accuracy $A_2$ is calculated according to formula (8) to check the Z-score. For an abnormal sample p, the mean and standard deviation of each parameter over all points in $N_k(p)$ are calculated first, and then the Z-score of each parameter of sample p is calculated as:
$$Z = \frac{X - \bar{X}}{S} \qquad (7)$$
where X is the original data, $\bar{X}$ is the mean and S is the standard deviation. The parameter with the largest Z-score deviates the most and is determined to be the abnormal parameter. The abnormal parameter determination accuracy $A_2$ is calculated as:
$$A_2 = \frac{C_{NR}}{C_{NA}} \times 100\% \qquad (8)$$
where $C_{NR}$ is the number of abnormal parameters correctly determined by the Z-score and $C_{NA}$ is the total number of abnormal parameters.
The beneficial effects of the application are as follows: a suitable amount of labeled sample data is obtained through local model checks on the water delivery pipe network, an LOF abnormal data detection model is built by training, abnormal samples in the monitoring data are identified, the specific abnormal parameters are determined by Z-scores, and the abnormal parameters are corrected with a k-neighborhood mean method. For measured high-dimensional, multi-parameter data, the method is simple to model and fast to compute, and it supports accurate analysis of the pipe network operating state, real-time simulation, state monitoring and fault diagnosis.
Drawings
The specific structure of the present application is shown in the following drawings and examples:
FIG. 1 is a schematic diagram of a k-neighborhood;
FIG. 2 is an example of k-neighborhood mean correction (k=5);
FIG. 3 is a flow chart of the method of the present application;
FIG. 4 shows the node 23# pressure regression model and the normal pressure data interval;
FIG. 5 is the ROC curve;
FIG. 6 is a comparison of parameter values before and after outlier correction;
FIG. 7 is a comparison of model fitness before and after outlier correction.
Detailed Description
The present application is not limited by the following examples; specific embodiments can be determined according to the technical scheme of the application and the actual situation.
The application is further described below with reference to the examples and the accompanying drawings. Example 1: as shown in FIGS. 1-7, a method for detecting and correcting abnormal data of a water delivery pipe network comprises the following steps:
Step one, checking sample data with local models and completing sample labeling: a water delivery pipe network is generally provided with flow and pressure monitoring points at the terminal water demand nodes and at the network branch nodes in order to sense the hydraulic operating state of the network. The high-dimensional measured parameters consist of $a$ flow values and $a$ pressure values, and each sampling moment forms one sample. Local model checks are applied: $a$ flow soft measurement models and $a$ node pressure regression models are built to verify each sample parameter, and the sample and its parameters are marked as normal or abnormal. Because the reservoir liquid level of a water plant changes with the inlet and outlet flows, a water plant flow soft measurement model is built from the reservoir liquid level and the outlet flow to verify whether the inlet flow data of the water plant are normal. The specific procedure is as follows: for a node d, $Q_{in} = Q_{out} + S \times \Delta h / \Delta t$, where $Q_{in}$ is the flow into the reservoir (theoretically equal to the flow Q of node d), S is the reservoir bottom area, $Q_{out}$ is the flow out of the reservoir, $\Delta h$ is the liquid level change ($Q_{out}$ and $\Delta h$ are obtained from a flowmeter and a liquid level meter respectively), and $\Delta t$ is the sampling time interval. Let the residual between $Q_{in}$ and Q have mean $\mu = 0$ and standard deviation $\sigma_1$ m³/h. If, at a given moment, the flow soft measurement model estimates the inflow as $Q_{in} = c$ m³/h, the measured flow Q is normal when it lies within $Q_{in} \pm 3\sigma_1$, i.e. $[c - 3\sigma_1, c + 3\sigma_1]$, and is otherwise judged abnormal. The node pressure of the water plant is related to its flow, so a node pressure regression model can be established.
With the water plant flow known, a reasonable range for the node pressure is determined from the predicted value of the node pressure regression model and the 3σ principle, and the pressure parameter is verified against it. The specific procedure is as follows: a regression model of the pressure y of node d is built with the flow of node d as the independent variable x, giving the regression equation $y = -kx + b$; the standard deviation $\sigma_2$ of the residual between the measured pressure and the regression value is calculated, and the upper and lower limits of the confidence interval are again obtained from the 3σ principle. For a flow value of e m³/h at a given moment, the corresponding pressure interval is $[-ke + b - 3\sigma_2, -ke + b + 3\sigma_2]$. If the pressure of node d at that moment is not in this interval, the pressure parameter of node d measured at that moment is judged abnormal.
Step two, building an LOF abnormal data detection model. The specific procedure is as follows:
(1) The local outlier factor (LOF) value of each sample is obtained with the LOF algorithm.
(2) A temporary threshold Th is run from 0 to 5 in steps of 0.1; the confusion matrix under each temporary threshold is obtained from the sample labels of step one, as shown in formula (1) with rows giving the actual class (abnormal, normal) and columns the detected class (abnormal, normal), and the true positive rate TPR (sensitivity) and the false positive rate FPR (1 - specificity) are calculated one by one.
$$\begin{bmatrix} C_{TP} & C_{FN} \\ C_{FP} & C_{TN} \end{bmatrix} \qquad (1)$$
(3) The receiver operating characteristic (ROC) curve is drawn from the TPR and FPR values, the critical point closest to the point (0, 1) is found, and the corresponding Th value is taken as the LOF threshold. The normal samples whose LOF value is below the threshold, together with the LOF threshold, form the LOF abnormal data detection model (LOF model for short).
The LOF algorithm calculates the LOF value as follows (described with reference to FIG. 1):
(1) Calculating the k-th distance: let d(p, o) be the distance between two points p and o, and let k be the number of neighboring points; $d_k(p)$ denotes the k-th distance of point p, and $d_k(p) = d(p, o)$ means that o is the k-th nearest point to p, not counting p itself.
(2) Calculating the k-th distance neighborhood: $N_k(p)$ denotes the k-th distance neighborhood of p (the set of all points within the k-th distance of p), and $|N_k(p)|$ is the number of points in that neighborhood.
(3) Calculating the k-th reachable distance: $R_k(o, p)$ denotes the k-th reachable distance from point o to point p, $R_k(o, p) = \max\{d_k(o), d(p, o)\}$; that is, the reachable distance is at least the k-th distance $d_k(o)$ of point o, and otherwise equals the true distance d(p, o) between o and p.
(4) Calculating the local reachable density: the local reachable density of p is the inverse of the mean k-th reachable distance from all points in $N_k(p)$ to p, written as:
$$\mathrm{lrd}_k(p) = \left( \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} R_k(o, p) \right)^{-1} \qquad (2)$$
(5) Calculating the local outlier factor: the local outlier factor is defined through the local relative density, as the average ratio of the local reachable density of each sample point in the k-th neighborhood of p to the local reachable density of p, written as:
$$\mathrm{LOF}_k(p) = \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \frac{\mathrm{lrd}_k(o)}{\mathrm{lrd}_k(p)} \qquad (3)$$
The true positive rate and the false positive rate are calculated according to formulas (4) and (5):
$$\mathrm{TPR} = \frac{C_{TP}}{C_{TP} + C_{FN}} \qquad (4)$$
$$\mathrm{FPR} = \frac{C_{FP}}{C_{FP} + C_{TN}} \qquad (5)$$
Step three, testing on the test sample data and detecting the abnormal parameters of abnormal samples with the Z-score:
(1) The test sample data are checked with the LOF model and verified with the local models to obtain the confusion matrix parameters, and the anomaly detection accuracy $A_1$ is calculated according to formula (6) to verify the model:
$$A_1 = \frac{C_{TP} + C_{TN}}{C_{TP} + C_{FP} + C_{FN} + C_{TN}} \times 100\% \qquad (6)$$
(2) The abnormal parameter of each abnormal sample is detected with the Z-score, and the abnormal parameter determination accuracy $A_2$ is calculated according to formula (8) to check the Z-score. For an abnormal sample p, the mean and standard deviation of each parameter over all points in $N_k(p)$ are calculated first, and then the Z-score of each parameter of sample p is calculated. The calculation formula is:
$$Z = \frac{X - \bar{X}}{S} \qquad (7)$$
where X is the original data, $\bar{X}$ is the mean and S is the standard deviation. The parameter with the largest Z-score deviates the most and is determined to be the abnormal parameter. The abnormal parameter determination accuracy $A_2$ is calculated as:
$$A_2 = \frac{C_{NR}}{C_{NA}} \times 100\% \qquad (8)$$
where $C_{NR}$ is the number of abnormal parameters correctly determined by the Z-score and $C_{NA}$ is the total number of abnormal parameters.
When the detection accuracies $A_1$ and $A_2$ both reach 95%, step four can be entered. Otherwise, the initial value of k is adjusted and the method returns to step two.
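The Z-score check described in step three can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the applicant's implementation: the function name `abnormal_parameter` and the toy numbers are hypothetical, the Z-score is compared by absolute value, and the sample standard deviation (`ddof=1`) is assumed to be nonzero for every parameter.

```python
import numpy as np

def abnormal_parameter(sample, neighbours):
    """Return (index of the parameter with the largest |Z|, all |Z| values).

    sample     : 1-D array, the abnormal sample p
    neighbours : 2-D array, the points of N_k(p), one row per point
    Assumptions (not stated in the source): absolute Z-scores are compared,
    and the sample standard deviation (ddof=1) is nonzero for every parameter.
    """
    sample = np.asarray(sample, dtype=float)
    nb = np.asarray(neighbours, dtype=float)
    mean = nb.mean(axis=0)            # mean of each parameter in the neighbourhood
    std = nb.std(axis=0, ddof=1)      # standard deviation S of each parameter
    z = np.abs(sample - mean) / std   # Z = (X - mean) / S, taken in magnitude
    return int(np.argmax(z)), z

# Hypothetical toy data: 5 neighbours, 3 parameters; the second parameter is off.
toy_neighbours = [
    [10.0, 50.0, 100.0],
    [10.2, 50.5, 100.4],
    [9.8, 49.5, 99.6],
    [10.1, 50.2, 100.2],
    [9.9, 49.8, 99.8],
]
toy_sample = [10.05, 58.0, 100.1]
idx, z_scores = abnormal_parameter(toy_sample, toy_neighbours)
print(idx)
```

With the toy data the second parameter (index 1) is flagged, because its deviation is many standard deviations while the other two stay well below one.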
Step four, detecting actual abnormal data:
For the node flow and pressure data currently sampled at the monitoring points, the LOF value of each sample is calculated, the sample is judged abnormal or not according to the threshold, and the abnormal parameter q in an abnormal sample p is detected by the Z-score.
Step five, correcting the abnormal parameters with the k-neighborhood mean method:
In k-neighborhood mean correction, if the q-th parameter of an abnormal sample p is abnormal, it is replaced by the mean of the q-th parameter over the k normal samples in $N_k(p)$. FIG. 2 illustrates this for a two-dimensional parameter (x, y) with k=5. O is clearly an outlier, and A-E are the 5 normal points in the neighborhood $N_5(O)$ (red circle), where R is the center of the 5 normal points, S is the corrected point, and $x_1$ and $y_1$ are the distances between O and R along the x-axis and y-axis. The Z-score algorithm detects that x(O) is a normal parameter and y(O) is an abnormal parameter, so y(O) is corrected to y(R) = y(S).
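The correction rule can be illustrated with a short Python sketch that mirrors the two-dimensional case of FIG. 2. It is a minimal sketch under assumptions: the function name `k_neighbourhood_mean_correct` and all coordinates are hypothetical, Euclidean distance is assumed, and the neighbourhood is searched using the full parameter vector (including the abnormal parameter), which is one possible reading of the text.

```python
import numpy as np

def k_neighbourhood_mean_correct(sample, q, normal_samples, k=5):
    """Replace parameter q of an abnormal sample by the mean of that
    parameter over its k nearest normal samples (Euclidean distance assumed)."""
    sample = np.asarray(sample, dtype=float)
    normal = np.asarray(normal_samples, dtype=float)
    dist = np.linalg.norm(normal - sample, axis=1)  # distance to every normal sample
    nearest = np.argsort(dist)[:k]                  # indices of the k nearest normal samples
    corrected = sample.copy()
    corrected[q] = normal[nearest, q].mean()        # only the abnormal parameter is replaced
    return corrected

# Hypothetical 2-D data: five normal points A-E near the outlier O, two far away.
normal_points = [
    (1.0, 2.0), (1.2, 2.1), (0.9, 1.9), (1.1, 2.2), (0.8, 1.8),
    (5.0, 5.0), (6.0, 6.0),
]
outlier = (1.0, 3.5)          # x(O) is normal, y(O) is abnormal
fixed = k_neighbourhood_mean_correct(outlier, q=1, normal_samples=normal_points, k=5)
print(fixed)
```

Only the abnormal coordinate changes: x(O) stays at 1.0 and y(O) becomes the mean of y over A-E, which is 2.0 for these toy points.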
Example 2: Example 2 is described with reference to FIGS. 3-7. All data of the JS water delivery pipe network since it went into operation were selected: 15-min sampling data from March to May 2021 plus 5-min sampling data from June 2021, 17472 samples in total, divided into training samples (15725) and test samples (1747) at a ratio of 9:1. A further 17472 samples of 5-min sampling data from July to August 2021 were used for the actual anomaly detection and correction application.
Step 1, checking sample data with local models and completing sample labeling:
In the JS water delivery pipe network, the 42-dimensional measured parameters consist of 21 flow values and 21 pressure values, and each sampling moment forms one sample. Local model checks are applied: 21 flow soft measurement models and 21 node pressure regression models are built to verify each sample parameter, and the sample and its parameters are marked as normal or abnormal. Because there are many nodes, the construction of the flow soft measurement model and the pressure regression model is described using node 23# as an example.
Node 23# flow soft measurement model: $Q_{in} = Q_{out} + S \times \Delta h / \Delta t$, where $Q_{in}$ is the flow into the reservoir (theoretically equal to the node 23# flow Q), S is the reservoir bottom area of 237.8 m², $Q_{out}$ is the flow out of the reservoir, $\Delta h$ is the liquid level change ($Q_{out}$ and $\Delta h$ are obtained from a flowmeter and a liquid level meter respectively), and $\Delta t$ is the sampling time interval. Calculated from the training samples, the residual between $Q_{in}$ and Q has mean $\mu = 0.0$ and standard deviation $\sigma = 0.97$ m³/h. If, at a given moment, the flow soft measurement model estimates the inflow as $Q_{in} = 90.0$ m³/h, the measured flow Q is normal when it lies within $Q_{in} \pm 3\sigma$, i.e. [87.09, 92.91], and is otherwise judged abnormal.
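The node 23# flow check can be reproduced with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and the outflow and level-change values passed to `soft_inflow` are made-up inputs; only S = 237.8 m², σ = 0.97 m³/h and Q_in = 90.0 m³/h come from the text.

```python
def soft_inflow(q_out, area, dh, dt_hours):
    """Soft-measured inflow Q_in = Q_out + S * dh / dt, in m^3/h.

    q_out in m^3/h, area in m^2, dh in m, dt_hours in h.
    """
    return q_out + area * dh / dt_hours

def three_sigma_interval(center, sigma):
    """Lower and upper limits of the 3-sigma band around `center`."""
    return center - 3.0 * sigma, center + 3.0 * sigma

def flow_is_normal(measured, estimated, sigma):
    """True when the measured flow lies inside the 3-sigma band."""
    lower, upper = three_sigma_interval(estimated, sigma)
    return lower <= measured <= upper

S_AREA = 237.8   # reservoir bottom area from the text, m^2
SIGMA_1 = 0.97   # residual standard deviation from the text, m^3/h

# Interval quoted in the text for Q_in = 90.0 m^3/h.
lower, upper = three_sigma_interval(90.0, SIGMA_1)
print(round(lower, 2), round(upper, 2))   # 87.09 92.91

# Hypothetical inputs: 85 m^3/h outflow, 1.75 mm level rise over a 5-min interval.
q_in = soft_inflow(q_out=85.0, area=S_AREA, dh=0.00175, dt_hours=5 / 60)
print(flow_is_normal(91.5, q_in, SIGMA_1), flow_is_normal(95.0, q_in, SIGMA_1))
```

The first print reproduces the interval [87.09, 92.91] given above; the second shows a measured value inside the band being accepted and one outside it being rejected.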
Node 23# pressure regression model: as shown in FIG. 4, a regression model of the pressure y of node 23# is built on the training samples with the flow of node 23# as the independent variable x, giving the regression equation y = -2.45x + 479.61. The standard deviation of the residual between the measured pressure and the regression value is calculated, and the upper and lower limits of the confidence interval are obtained from the 3σ principle. For a flow value of 88.6 m³/h at a given moment, the corresponding pressure interval from these limits is [167.77, 357.32]. If the node 23# pressure at that moment is not in this interval, the node 23# pressure parameter measured at that moment is judged abnormal.
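As a quick arithmetic check of the node 23# pressure band, the following Python sketch evaluates the regression line at 88.6 m³/h and backs out the residual standard deviation implied by the quoted interval. The value of σ_2 is not stated in the text, so the figure derived here (about 31.59, in the pressure unit of FIG. 4) is an inference from the interval width, and the function names are hypothetical.

```python
K_SLOPE = -2.45      # slope of the regression line from the text
B_INTERCEPT = 479.61  # intercept of the regression line from the text

def predicted_pressure(flow):
    """Regression value y = -2.45 * x + 479.61 for a flow x in m^3/h."""
    return K_SLOPE * flow + B_INTERCEPT

def sigma_from_interval(lower, upper):
    """Standard deviation implied by a symmetric 3-sigma interval."""
    return (upper - lower) / 6.0

def pressure_is_normal(flow, pressure, sigma):
    """True when the measured pressure lies within 3 sigma of the regression value."""
    return abs(pressure - predicted_pressure(flow)) <= 3.0 * sigma

center = predicted_pressure(88.6)              # about 262.54
sigma_2 = sigma_from_interval(167.77, 357.32)  # about 31.59 (inferred, not stated)
print(round(center, 2), round(sigma_2, 2))

# Hypothetical measured pressures at the same flow.
print(pressure_is_normal(88.6, 300.0, sigma_2), pressure_is_normal(88.6, 400.0, sigma_2))
```

The regression value of about 262.54 sits at the midpoint of [167.77, 357.32], which is consistent with the interval being a symmetric 3σ band around the fitted line.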
Step 2, building an LOF abnormal data detection model:
When the JS pipe network operates normally and stably, the working condition changes little within a sampling interval. The initial value of k is therefore taken as 5, which balances anomaly detection capability against computational cost. LOF values were calculated for the 15725 training samples, and the results are shown in Table 1.
Table 1 LOF values of training samples
Sequence number    1       2       3       ...     15723    15724    15725
LOF value          4.95    4.91    5.13    ...     19.06    1.33     1.41
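LOF values such as those in Table 1 come from the five calculation steps of the LOF algorithm (k-th distance, k-th distance neighborhood, reachable distance, local reachable density, local outlier factor). The NumPy sketch below follows those steps on a small toy data set. It is a simplified illustration, not the applicant's code: the function name `lof_scores` is hypothetical, Euclidean distance is assumed, ties at the k-th distance are ignored (exactly k neighbours are kept), duplicate points are assumed absent, and the full pairwise distance matrix makes it suitable only for small data sets.

```python
import numpy as np

def lof_scores(X, k=5):
    """Local outlier factor of every row of X (simplified, exactly k neighbours)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances d(p, o); a point is not its own neighbour.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    nn = np.argsort(D, axis=1)[:, :k]        # indices of N_k(p)
    rows = np.arange(n)
    k_dist = D[rows, nn[:, -1]]              # k-th distance d_k(p)
    # Reachable distance R_k(o, p) = max(d_k(o), d(p, o)) for each o in N_k(p).
    reach = np.maximum(k_dist[nn], D[rows[:, None], nn])
    lrd = 1.0 / reach.mean(axis=1)           # local reachable density
    return lrd[nn].mean(axis=1) / lrd        # mean density ratio = LOF

# Toy data: a 5 x 5 unit grid of "normal" points plus one distant point.
grid = [(float(i), float(j)) for i in range(5) for j in range(5)]
points = np.array(grid + [(10.0, 10.0)])
scores = lof_scores(points, k=5)
print(int(np.argmax(scores)), scores[-1] > 2.0)
```

Points inside the grid score close to 1, while the isolated point scores far higher, which is the behaviour the thresholding in step 2 relies on. For data sets of the size used in the example, a library implementation such as scikit-learn's `LocalOutlierFactor` would be the more practical choice.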
Using the normal/abnormal labels of the training set samples from step 1, a temporary threshold Th is run from 0 to 5, the true positive rate TPR (sensitivity) and the false positive rate FPR (1 - specificity) are calculated one by one, and the ROC curve is drawn, as shown in FIG. 5. The threshold Th = 1.4 at the critical point closest to the point (0, 1) is taken as the LOF threshold, and the normal sample set is determined. The area under the ROC curve (AUC) is 0.982.
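The threshold search can be sketched as follows in Python. This is a minimal sketch under assumptions: the function name `select_lof_threshold` and the toy LOF values are hypothetical, a sample is flagged abnormal when its LOF value is greater than or equal to the temporary threshold, both classes are assumed to be present in the labels, and ties in distance are resolved in favour of the smallest threshold.

```python
import numpy as np

def select_lof_threshold(lof, is_abnormal):
    """Scan Th = 0.0, 0.1, ..., 5.0 and return (Th, TPR, FPR) for the ROC
    point closest to (FPR, TPR) = (0, 1)."""
    lof = np.asarray(lof, dtype=float)
    y = np.asarray(is_abnormal, dtype=bool)
    best = None
    for th in np.arange(0, 51) / 10.0:
        pred = lof >= th                      # detected as abnormal
        tp = int(np.sum(pred & y))
        fn = int(np.sum(~pred & y))
        fp = int(np.sum(pred & ~y))
        tn = int(np.sum(~pred & ~y))
        tpr = tp / (tp + fn)                  # sensitivity
        fpr = fp / (fp + tn)                  # 1 - specificity
        dist = float(np.hypot(fpr, 1.0 - tpr))
        if best is None or dist < best[0]:
            best = (dist, float(th), tpr, fpr)
    return best[1], best[2], best[3]

# Hypothetical toy labels: five normal samples followed by three abnormal ones.
toy_lof = [0.92, 1.01, 1.08, 1.17, 1.33, 1.85, 2.50, 3.10]
toy_abnormal = [False] * 5 + [True] * 3
th, tpr, fpr = select_lof_threshold(toy_lof, toy_abnormal)
print(th, tpr, fpr)
```

On the toy data the first threshold that separates the two groups perfectly is selected; the toy values are not taken from the JS pipe network data set and only illustrate the selection rule.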
Step 3, testing the test sample data and detecting the abnormal parameters of the abnormal sample by using the Z fraction
With k=5, the LOF values of the test set samples were calculated, and the results are shown in Table 2.
Table 2 LOF values of test samples
Sequence number    1       2       3       ...     1745    1746    1747
LOF value          1.34    8.71    9.98    ...     0.91    8.55    0.87
From the LOF threshold and local model verification, the confusion matrix parameters are $C_{TP} = 44$, $C_{FP} = 3$, $C_{FN} = 2$ and $C_{TN} = 1698$, and formula (6) gives an abnormal sample detection accuracy $A_1$ of 99.71%, so the detection performance meets the requirement. For each abnormal sample, its 5th distance neighborhood within the normal sample set is found, and the mean and standard deviation of every parameter over the sample points in that neighborhood are calculated. Taking one abnormal sample of the test set as an example, the means of the parameters over all sample points in its 5th neighborhood are {136.55, 67.37, 174.18, …, 505.15, 555.4, 578.1}, the standard deviations are {0.3, 0.09, 0.19, …, 0.27, 1.2, 0.24}, and the Z-scores of the parameters of the sample are {0.34, 0.45, 0.35, …, 0.54, 9.1, 0.49}; the parameter with the largest Z-score is determined to be the abnormal parameter. The test samples contain 46 actual abnormal parameters; the Z-score abnormal parameter detection algorithm identifies 44 of them correctly, with 2 false detections and no missed detections, and formula (8) gives an abnormal parameter determination accuracy $A_2$ of 95.65%.
Thus the detection accuracies $A_1$ and $A_2$ both exceed 95%, and step 4 can be entered.
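The reported accuracies can be checked with a few lines of Python. The sketch assumes that formula (6) is the usual overall accuracy, (C_TP + C_TN) divided by the total sample count, and that formula (8) is the ratio of correctly determined abnormal parameters to all abnormal parameters; both assumptions reproduce the percentages reported for the test set and for the July to August data set. The function names are hypothetical.

```python
def detection_accuracy(c_tp, c_fp, c_fn, c_tn):
    """Assumed form of formula (6): share of correctly classified samples."""
    return (c_tp + c_tn) / (c_tp + c_fp + c_fn + c_tn)

def parameter_accuracy(c_nr, c_na):
    """Assumed form of formula (8): correctly determined / all abnormal parameters."""
    return c_nr / c_na

# Test set: confusion matrix and abnormal parameter counts from the text.
a1_test = detection_accuracy(44, 3, 2, 1698)
a2_test = parameter_accuracy(44, 46)
print(f"{a1_test:.2%} {a2_test:.2%}")      # 99.71% 95.65%

# July to August data set: counts from the text.
a1_real = detection_accuracy(1060, 52, 30, 16330)
a2_real = parameter_accuracy(1060, 1090)
print(f"{a1_real:.2%} {a2_real:.2%}")      # 99.53% 97.25%
```

The confusion matrix totals also match the stated sample counts (1747 test samples and 17472 actual samples), which supports reading the four counts as a complete partition of each data set.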
Step 4, actual abnormal data detection
In this embodiment, the LOF model is used to check the 17472 samples of 5-min sampling data from July to August 2021, and the Z-score is used to locate the abnormal parameters. For this July to August data set, k is taken as 5 and the LOF values of the 17472 samples are calculated; the results are shown in Table 3.
Table 3 LOF values of all samples in actual anomaly detection
Abnormal samples in the monitoring data are detected with the LOF threshold, and abnormal parameters are detected with the Z-score algorithm.
Local model verification gives the confusion matrix parameters $C_{TP} = 1060$, $C_{FP} = 52$, $C_{FN} = 30$ and $C_{TN} = 16330$. The actual anomaly detection samples contain 1090 abnormal parameters, of which 496 are flow anomalies at measuring point 45# and 594 are pressure anomalies at measuring point 15#. The Z-score abnormal parameter detection algorithm correctly determines 1060 abnormal parameters, of which 478 are flow anomalies and 582 are pressure anomalies. Formula (6) gives a classification accuracy $A_1$ = 99.53%, and formula (8) gives an abnormal parameter determination accuracy $A_2$ = 97.25%; both exceed the 95% target.
Step 5, correcting abnormal parameters by using k-neighborhood mean method
And (6) correcting 1090 abnormal parameters by using a k-neighborhood mean method, wherein the correction results of the abnormal flow value of the 45# measuring point and the abnormal pressure value of the 15# measuring point are shown in fig. 6.
To illustrate the effectiveness of the correction data, the corrected data is substituted into a steady-state hydraulic model, and the model fitness Fit is calculated
Wherein s is the number of all pressure measuring points, l represents the first measuring point, P m (l) Representing the simulated pressure value of the first measuring point, P r (l) The measured pressure value at the first measurement point is shown. If Fit is significantly improved, correction is indicatedIs effective. According to the national model checking standard, the fitness Fit<399, indicating that the steady state hydraulic model is in a normal operating state. The model fitness before and after correction was compared, and the result is shown in fig. 7. The k-neighborhood mean method enables the model fitness to be always in a low state, and most of the model fitness Fit<399, the correction effect reaches the standard, and the normal simulation tracking of the model is ensured.
The foregoing description is intended to illustrate the application clearly and is not to be taken in a limiting sense. Obvious changes and modifications derived from the technical solution of the application remain within the protection scope of the application.

Claims (8)

1. A method for detecting and correcting abnormal data of a water delivery pipe network, comprising the following steps:
S1, establishing local models to check the sample data and completing sample labeling: let there be a monitoring points in total (a being the number of monitoring points), the high-dimensional measured parameters consisting of a flow values and a pressure values, with the data at each sampling moment forming one sample; using local model checking, verifying the parameters of each sample by establishing a flow soft measurement models and a node pressure regression models, marking whether each sample and its parameters are normal, and generating sample labels;
S2, building an LOF abnormal data detection model:
(1) obtaining the local outlier factor (LOF) value of each sample using the LOF algorithm;
(2) varying a temporary threshold Th from 0 to 5 in steps of 0.1, obtaining the confusion matrix under each temporary threshold according to the sample labels from S1, and calculating the true positive rate TPR (sensitivity) and the false positive rate FPR (1 − specificity) of each confusion matrix one by one;
(3) drawing the receiver operating characteristic (ROC) curve from the TPR and FPR values, finding the critical point on the ROC curve closest to the point (0, 1), and taking the corresponding Th value as the LOF threshold; the normal samples whose LOF values are smaller than the threshold, together with the LOF threshold, form the LOF abnormal data detection model;
S3, testing the test sample data and detecting the abnormal parameters of the abnormal samples, successively calculating the anomaly detection accuracy A_1 and the abnormal parameter determination accuracy A_2; when A_1 and A_2 both reach 95%, proceeding to S4, otherwise returning to S2;
S4, detecting actual abnormal data: for the node flow and pressure data currently sampled at the monitoring points, calculating the LOF value of each sample, judging whether the sample is abnormal according to the threshold, and detecting the abnormal parameters in each abnormal sample;
S5, correcting the abnormal parameters using the neighborhood mean method.
2. The method for detecting and correcting abnormal data of a water delivery pipe network according to claim 1, wherein in S1 the flow soft measurement model for a node d is Q_in = Q_out + S × Δh/Δt, where Q_in is the flow into the tank, S is the bottom area of the water storage tank, Q_out is the flow out of the tank, Δh is the liquid level change and Δt is the sampling time interval; the residual between Q_in and the measured flow Q is assumed to have mean u = 0 and standard deviation σ_1 m³/h, and the upper and lower limits of the confidence interval are obtained according to the 3σ principle; for a certain moment, the flow soft measurement model estimates the flow into the tank as Q_in = c m³/h; if the measured flow value Q lies within the 3σ interval of Q_in, i.e. [c − 3σ_1, c + 3σ_1], it is normal, otherwise it is judged to be abnormal.
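The check in claim 2 can be sketched in Python as follows. The function names and the numeric values are illustrative assumptions; units are assumed to be m² for S, m for Δh and h for Δt, so that the flows come out in m³/h.

```python
def estimate_inflow(q_out, area, dh, dt):
    """Flow soft measurement: Q_in = Q_out + S * dh / dt."""
    return q_out + area * dh / dt

def flow_is_normal(q_measured, q_in_est, sigma1):
    """3-sigma check: normal if Q lies in [c - 3*sigma1, c + 3*sigma1]."""
    return abs(q_measured - q_in_est) <= 3.0 * sigma1

# Hypothetical tank: outflow 100 m^3/h, bottom area 200 m^2,
# level rises 0.05 m over a 0.5 h sampling interval.
c = estimate_inflow(q_out=100.0, area=200.0, dh=0.05, dt=0.5)
ok_near = flow_is_normal(125.0, c, sigma1=2.0)   # 5 m^3/h away from the estimate
ok_far = flow_is_normal(130.0, c, sigma1=2.0)    # 10 m^3/h away from the estimate
print(c, ok_near, ok_far)
```

With σ_1 = 2 m³/h the tolerance is ±6 m³/h around the estimate, so the first measurement is accepted and the second is flagged as abnormal.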
3. The method for detecting and correcting abnormal data of a water delivery pipe network according to claim 1, wherein in S1 the pressure regression model takes the flow of a node d as the independent variable x and the pressure of the node d as the dependent variable y, giving the analytical formula y = −kx + b; the standard deviation σ_2 of the residual between the measured pressure values and the regression values is calculated, and the upper and lower limits of the confidence interval are likewise obtained according to the 3σ principle; for a flow value of e m³/h at a certain moment, the corresponding pressure interval is [−ke + b − 3σ_2, −ke + b + 3σ_2]; if the pressure value of the node d at that moment lies within the pressure interval, it is normal; otherwise the pressure parameter of the node d measured at that moment is judged to be abnormal.
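A minimal sketch of the pressure regression check in claim 3, using NumPy's least-squares line fit. The synthetic data, the function names and the use of the sample standard deviation (`ddof=1`) are assumptions for illustration; the fitted slope corresponds to −k in the claim's notation.

```python
import numpy as np

def fit_pressure_model(flow, pressure):
    """Fit y = slope * x + b and return (slope, b, sigma2),
    where sigma2 is the standard deviation of the residuals."""
    flow = np.asarray(flow, dtype=float)
    pressure = np.asarray(pressure, dtype=float)
    slope, b = np.polyfit(flow, pressure, 1)      # slope plays the role of -k
    residual = pressure - (slope * flow + b)
    return slope, b, residual.std(ddof=1)

def pressure_is_normal(e, p_measured, slope, b, sigma2):
    """Normal if p lies in [slope*e + b - 3*sigma2, slope*e + b + 3*sigma2]."""
    centre = slope * e + b
    return centre - 3 * sigma2 <= p_measured <= centre + 3 * sigma2

# Synthetic history for node d: pressure falls as flow rises, plus small noise.
rng = np.random.default_rng(0)
flow = np.linspace(50.0, 150.0, 50)
pressure = 0.8 - 0.002 * flow + rng.normal(0.0, 0.005, flow.size)

slope, b, sigma2 = fit_pressure_model(flow, pressure)
print(pressure_is_normal(100.0, 0.60, slope, b, sigma2))  # close to the fitted line
print(pressure_is_normal(100.0, 0.90, slope, b, sigma2))  # far above the fitted line
```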
4. The method for detecting and correcting abnormal data of a water delivery pipe network according to claim 1, wherein in S2, obtaining the local outlier factor LOF value of each sample using the LOF algorithm includes the following steps:
(1) calculating the k-th distance: let the distance between two points p and o be d(p, o) and let k be the number of adjacent points; d_k(p) denotes the k-th distance of the point p, and d_k(p) = d(p, o) means that the point o is the k-th nearest point to p, not counting p itself;
(2) calculating the k-th distance neighborhood: N_k(p) denotes the k-th distance neighborhood of p, and |N_k(p)| denotes the number of points in the neighborhood;
(3) calculating the k-th reachable distance: R_k(o, p) denotes the k-th reachable distance from the point o to the point p, R_k(o, p) = max{d_k(o), d(p, o)};
(4) calculating the local reachable density: the local reachable density is the inverse of the mean of the k-th reachable distances from all points within N_k(p) to the point p, noted as:
$$\mathrm{lrd}_k(p) = 1 \Big/ \left( \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} R_k(o, p) \right)$$
(5) calculating the local outlier factor: the local outlier factor is defined by the local relative density, namely the average of the ratios of the local reachable density of each sample point in the k-th neighborhood of the point p to the local reachable density of the point p, recorded as:
$$\mathrm{LOF}_k(p) = \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \frac{\mathrm{lrd}_k(o)}{\mathrm{lrd}_k(p)}$$
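Steps (1) to (5) can be sketched directly in NumPy. This is an illustrative implementation under stated assumptions, not the applicant's code: distances are Euclidean, ties at the k-th distance are included in the neighborhood, no two samples coincide (otherwise a density can become infinite), and the names `lof_scores`, `lrd` and `k_dist` are this sketch's own.

```python
import numpy as np

def lof_scores(X, k):
    """Local outlier factor of every row of X, following steps (1)-(5)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise distances d(p, o); a point is never its own neighbor.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    # (1) k-th distance d_k(p).
    k_dist = np.sort(D, axis=1)[:, k - 1]
    # (2) k-th distance neighborhood N_k(p), ties included.
    neigh = [np.flatnonzero(D[p] <= k_dist[p]) for p in range(n)]
    # (3)+(4) local reachable density: inverse of the mean of
    # R_k(o, p) = max{d_k(o), d(p, o)} over o in N_k(p).
    lrd = np.array([1.0 / np.mean(np.maximum(k_dist[N], D[p, N]))
                    for p, N in enumerate(neigh)])
    # (5) LOF: mean density of the neighbors divided by the point's own density.
    return np.array([np.mean(lrd[N]) / lrd[p] for p, N in enumerate(neigh)])

# Five clustered points and one isolated point.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [5, 5]]
lof = lof_scores(X, k=3)
print(np.round(lof, 2))
```

Points inside the cluster get values near 1, while the isolated point gets a much larger value, which is the property the threshold Th exploits.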
5. The method for detecting and correcting abnormal data of a water delivery pipe network according to claim 1, wherein in S2 the confusion matrix is shown in formula 1:
6. The method for detecting and correcting abnormal data of a water delivery pipe network according to claim 5, wherein in S2 the true positive rate TPR and the false positive rate FPR are calculated as formula 4 and formula 5, respectively:
$$\mathrm{TPR} = \frac{C_{TP}}{C_{TP} + C_{FN}} \tag{4}$$
$$\mathrm{FPR} = \frac{C_{FP}}{C_{FP} + C_{TN}} \tag{5}$$
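The threshold search built on TPR and FPR can be sketched as follows. It is a sketch under assumptions: TPR and FPR take their standard definitions, a sample is flagged abnormal when its LOF value is not smaller than Th, both classes are present in the labels, and ties in distance are resolved by keeping the smallest Th. The function name and the toy data are hypothetical.

```python
import numpy as np

def select_lof_threshold(lof, is_abnormal):
    """Scan Th = 0, 0.1, ..., 5 and return (Th, TPR, FPR) for the ROC point
    closest to the ideal point (FPR, TPR) = (0, 1)."""
    lof = np.asarray(lof, dtype=float)
    y = np.asarray(is_abnormal, dtype=bool)
    best = None
    for th in np.round(np.arange(0.0, 5.0 + 1e-9, 0.1), 1):
        pred = lof >= th                          # flagged abnormal at this Th
        c_tp = np.sum(pred & y)
        c_fn = np.sum(~pred & y)
        c_fp = np.sum(pred & ~y)
        c_tn = np.sum(~pred & ~y)
        tpr = c_tp / (c_tp + c_fn)
        fpr = c_fp / (c_fp + c_tn)
        dist = np.hypot(fpr, tpr - 1.0)           # distance to the point (0, 1)
        if best is None or dist < best[0]:
            best = (dist, th, tpr, fpr)
    return best[1:]

# Toy LOF values: six normal samples followed by three abnormal ones.
lof = [0.90, 1.00, 1.10, 1.05, 0.95, 1.24, 2.50, 3.10, 1.95]
labels = [False] * 6 + [True] * 3
th, tpr, fpr = select_lof_threshold(lof, labels)
print(th, tpr, fpr)
```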
7. The method for detecting and correcting abnormal data of a water delivery pipe network according to claim 4, wherein in S3 the test sample data are tested as follows:
(1) detecting the sample data using the LOF model, verifying with the local models to obtain the confusion matrix parameters, and calculating the anomaly detection accuracy A_1 according to formula 6 to verify the model effect;
(2) detecting the abnormal parameters of each abnormal sample using the Z score, and calculating the abnormal parameter determination accuracy A_2 according to formula 8 to check the Z-score effect: for an abnormal sample p, first calculating the mean and standard deviation of every parameter over all points in N_k(p), and then calculating the Z score of each parameter of the sample p by
$$Z = \frac{X - \bar{X}}{S}$$
where X is the original data, X̄ is the mean and S is the standard deviation; the parameter with the largest Z score has the largest degree of deviation and is determined to be the abnormal parameter; the abnormal parameter determination accuracy A_2 is calculated as
$$A_2 = \frac{C_{NR}}{C_{NA}} \times 100\% \tag{8}$$
where C_NR is the number of abnormal parameters correctly determined by the Z score, and C_NA is the total number of abnormal parameters.
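The Z-score step can be sketched as follows. The sketch makes assumptions beyond the claim text: the largest Z score is taken by absolute value so that deviations in either direction count, the sample standard deviation (`ddof=1`) is used, and no parameter is constant within the neighborhood. The function name and the data are hypothetical.

```python
import numpy as np

def abnormal_parameter(sample, neighbours):
    """Return (index, z): z holds the Z score of each parameter of `sample`
    against the mean and standard deviation over its neighborhood N_k(p);
    index points at the parameter with the largest |Z|."""
    sample = np.asarray(sample, dtype=float)
    nb = np.asarray(neighbours, dtype=float)
    mean = nb.mean(axis=0)
    std = nb.std(axis=0, ddof=1)
    z = (sample - mean) / std
    return int(np.argmax(np.abs(z))), z

# Columns are (flow, pressure); the sample's pressure is far from its neighbors.
neighbours = [[100, 0.50], [102, 0.52], [98, 0.48], [101, 0.51]]
idx, z = abnormal_parameter([100.5, 0.80], neighbours)
print(idx, np.round(z, 2))
```

Here the flow lies close to the neighborhood mean while the pressure lies many standard deviations away, so the pressure is reported as the abnormal parameter.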
8. The method for detecting and correcting abnormal data of a water delivery pipe network according to claim 7, wherein when A_1 or A_2 is less than 95%, the method returns to S2 and the initial value of the number of adjacent points k is adjusted.
CN202311215125.1A 2023-09-19 2023-09-19 Water delivery pipe network operation data anomaly detection and correction method Pending CN117216703A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311215125.1A CN117216703A (en) 2023-09-19 2023-09-19 Water delivery pipe network operation data anomaly detection and correction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311215125.1A CN117216703A (en) 2023-09-19 2023-09-19 Water delivery pipe network operation data anomaly detection and correction method

Publications (1)

Publication Number Publication Date
CN117216703A true CN117216703A (en) 2023-12-12

Family

ID=89045864

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311215125.1A Pending CN117216703A (en) 2023-09-19 2023-09-19 Water delivery pipe network operation data anomaly detection and correction method

Country Status (1)

Country Link
CN (1) CN117216703A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117789422A (en) * 2024-02-26 2024-03-29 江西依爱弘泰消防安全技术有限公司 Combustible gas alarm control system and method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination