CN117272216B - Data analysis method for automatic flow monitoring station and manual water gauge observation station


Info

Publication number
CN117272216B
Authority
CN
China
Legal status
Active
Application number
CN202311560638.6A
Other languages
Chinese (zh)
Other versions
CN117272216A (en)
Inventor
谭文杰
王鹏
郑勇
蔡建军
黄文峰
Current Assignee
China Building Materials Inspection And Certification Group Hunan Co ltd
Original Assignee
China Building Materials Inspection And Certification Group Hunan Co ltd
Application filed by China Building Materials Inspection And Certification Group Hunan Co ltd
Priority to CN202311560638.6A
Publication of CN117272216A
Application granted
Publication of CN117272216B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Testing Or Calibration Of Command Recording Devices (AREA)

Abstract

The invention relates to the technical field of data processing and provides a data analysis method for an automatic flow monitoring station and a manual water gauge observation station. It aims to improve the accuracy and reliability of the data acquired by the automatic flow monitoring station and the manual water gauge observation station and to enable accurate analysis of the data from both stations.

Description

Data analysis method for automatic flow monitoring station and manual water gauge observation station
Technical Field
The invention relates to the technical field of data processing, in particular to a data analysis method for an automatic flow monitoring station and a manual water gauge observation station.
Background
Automatic flow monitoring measures the water level and flow of irrigation district channels by means of flow rate sensors and water level sensors installed on the channels. In manual water gauge observation, monitoring staff visually read the water gauge and record the readings in an observation record book, thereby keeping track of water level and flow. Irrigation district channels are an important water supply route for farmland irrigation. Monitoring the water level and flow of a channel reveals changes in the amount of water in it in good time, so that the supply and distribution of irrigation water can be managed effectively, farmland receives an appropriate amount of irrigation water, and crop yield and quality are improved. Such monitoring also provides key water resource information for scheduling and planning: with real-time monitoring data, the water supplied to an irrigation district can be adjusted to different time periods and demands, and water resources can be allocated reasonably. Finally, it helps locate leakage points and water waste so that repairs can be made promptly, reducing water loss and achieving water savings.
Because both sensors and manual observation can produce erroneous records, erroneous data must be removed when the automatic monitoring data and the manual observation data are combined. The result of the traditional local outlier factor (LOF) algorithm depends strongly on the number of neighbors k: if k is set too small, normal data are easily judged abnormal, and if k is set too large, abnormal data are easily judged normal. The embodiment of the invention therefore improves the LOF algorithm with an adaptively determined k value and uses it to remove abnormal data from the automatic monitoring data and the manual observation data.
In summary, the invention provides a data analysis method for an automatic flow monitoring station and a manual water gauge observation station that analyzes the automatic monitoring data and the manual observation data to obtain an optimal density index, uses this index in place of the k value of the local outlier factor algorithm, and thereby improves the reliability and accuracy of the automatic monitoring data and the manual observation data.
Disclosure of Invention
To solve the above technical problems, the invention provides a data analysis method for an automatic flow monitoring station and a manual water gauge observation station.
The data analysis method for an automatic flow monitoring station and a manual water gauge observation station according to the invention adopts the following technical scheme:
One embodiment of the invention provides a data analysis method for an automatic flow monitoring station and a manual water gauge observation station, comprising the following steps:
acquiring water level data and flow data from the automatic flow monitoring station and from the manual water gauge observation station at the same moments; taking the water level data and flow data acquired by the automatic flow monitoring station as automatic monitoring data, and the water level data and flow data acquired by the manual water gauge observation station as manual observation data;
taking the water level value and flow value of the automatic monitoring data at each moment as one data point; obtaining neighbor sets according to the distance values between the data points of the automatic monitoring data; obtaining an error adjustment component of each neighbor set from the distribution of the distance values within it; obtaining the error adjustment factor of each neighbor set by combining its error adjustment component; dividing the distance values in each neighbor set into classes according to its error adjustment factor;
obtaining a weight factor for each class in each neighbor set according to the number of distance values the class contains; combining the weight factors of the classes in each neighbor set to obtain the data density distance of that neighbor set; obtaining the equilibrium density distance of the automatic monitoring data from the data density distances of all neighbor sets; obtaining the optimal density of the automatic monitoring data from its equilibrium density distance;
obtaining the abnormal data points of the automatic monitoring data at each moment by combining the optimal density of the automatic monitoring data with the local outlier factor algorithm; deleting the abnormal data points from the automatic monitoring data and the manual observation data; thereby completing the data analysis of the automatic flow monitoring station and the manual water gauge observation station.
Preferably, obtaining the neighbor sets according to the distance values between the data points of the automatic monitoring data comprises:
for each data point of the automatic monitoring data, calculating the distance between that data point and every other data point and sorting these distances in ascending order; taking the smallest distance of every data point, sorted in ascending order, as the first neighbor set; taking the second smallest distance of every data point, sorted in ascending order, as the second neighbor set; and so on, until the (n-1)-th smallest distances of all data points, sorted in ascending order, form the (n-1)-th neighbor set, where n denotes the number of data points in the automatic monitoring data.
Preferably, obtaining the error adjustment component of each neighbor set according to the distribution of the distance values within it comprises:
for each neighbor set, calculating the difference between each distance value and the preceding distance value as a first difference, and the difference between the next distance value and the current distance value as a second difference; calculating the mean of the first and second differences of each distance value; and taking the ratio of the minimum to the maximum of these means over all distance values as the error adjustment component of the neighbor set.
Preferably, the error adjustment factor of each neighbor set is obtained by combining its error adjustment component, and the expression is:
where the quantities in the expression are: the error adjustment factor of the i-th neighbor set; the neighborhood distance difference of the j-th distance value in the i-th neighbor set; the standard deviation of all distance values in the i-th neighbor set; the exponential function with the natural constant as base; the error adjustment component of the i-th neighbor set; an adjusting parameter; and n, the number of data points in the automatic monitoring data.
Preferably, dividing the distance values in each neighbor set into classes according to its error adjustment factor comprises:
for each neighbor set, starting from its first distance value: if the absolute difference between the second and the first distance value is less than or equal to the error adjustment factor of the neighbor set, the first and second distance values are placed in one class and the third distance value is examined next; if the absolute difference between the third and the first distance value is also less than or equal to the error adjustment factor, the first, second and third distance values are placed in one class; if it is greater than the error adjustment factor, only the first and second distance values form a class, and judgment restarts from the third distance value by comparing the fourth distance value with it; this continues until every distance value in the neighbor set has been assigned to a class.
Preferably, obtaining the weight factor of each class in each neighbor set according to the number of distance values the class contains comprises:
counting the number of distance values contained in each class of the neighbor set; calculating the ratio of this number to the total number of distance values in the neighbor set; taking this ratio as the argument of a base-2 logarithm; multiplying the result of the logarithm by the ratio; and taking the reciprocal of the product as the weight factor of the class.
Preferably, the data density distance of each neighbor set is obtained by combining the weight factors of its classes, and the expression is:
where the quantities in the expression are: the data density distance of the i-th neighbor set; the mean of all distance values contained in the L-th class of the i-th neighbor set; the exponential function with the natural constant as base; the weight factor of the L-th class in the i-th neighbor set; and the total number of classes contained in the i-th neighbor set.
Preferably, the equilibrium density distance of the automatic monitoring data is obtained from the data density distances of the neighbor sets, and the expression is:
where the quantities in the expression are: the equilibrium density distance; the data density distance set formed by the data density distances of all neighbor sets; the interquartile range of the data density distance set; the maximum function; the minimum function; the exponential function with the natural constant as base; the data density distance of the i-th neighbor set; and n, the number of data points in the automatic monitoring data.
Preferably, obtaining the optimal density of the automatic monitoring data from its equilibrium density distance comprises:
drawing a circle centered on each data point of the automatic monitoring data with the equilibrium density distance as its radius; counting the number of other data points inside the circle of each data point as the density of that data point; and taking the mean density of all data points of the automatic monitoring data as the optimal density of the automatic monitoring data.
Preferably, obtaining the abnormal data points of the automatic monitoring data at each moment by combining the optimal density of the automatic monitoring data with the local outlier factor algorithm comprises:
taking the optimal density of the automatic monitoring data as the number of neighbors k of the local outlier factor algorithm; performing anomaly detection on the automatic monitoring data with this improved local outlier factor algorithm to obtain the outlier factor of each data point; and taking data points whose outlier factor is greater than or equal to a preset threshold as abnormal data points, and data points whose outlier factor is less than the preset threshold as normal data points.
The invention has at least the following beneficial effects:
according to the invention, the optimal density of the automatic monitoring data is obtained by analyzing the distance distribution and density characteristics of the data points in the automatic monitoring data, and the data analysis of the automatic flow monitoring station and the manual water gauge observation station is carried out in combination with the local outlier factor algorithm, improving the accuracy of the automatic monitoring data and the manual observation data;
furthermore, the method obtains neighbor sets from the distances between data points, derives the error adjustment factor of each neighbor set, classifies the distances in each neighbor set with this factor, and from the classification obtains the equilibrium density distance and the optimal density of the automatic monitoring data. This improves the local outlier factor algorithm and addresses inaccurate abnormal-data detection caused by an improperly set number of neighbors k, which would otherwise impair the reliability of the data from the automatic flow monitoring station and the manual water gauge observation station. The method therefore offers high accuracy and high reliability.
Drawings
To illustrate the embodiments of the invention and the technical solutions and advantages of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without inventive effort.
FIG. 1 is a flow chart of steps of a method for analyzing data of an automatic flow monitoring station and a manual water gauge observation station according to an embodiment of the present invention;
FIG. 2 is a flow chart of the water level and flow data analysis.
Detailed Description
To further describe the technical means adopted by the invention to achieve its intended purpose and their effects, the specific implementation, structure, features and effects of the data analysis method for an automatic flow monitoring station and a manual water gauge observation station according to the invention are described in detail below with reference to the accompanying drawings and preferred embodiments. In the following description, references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, particular features, structures or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The specific scheme of the data analysis method for an automatic flow monitoring station and a manual water gauge observation station provided by the invention is described below with reference to the accompanying drawings.
Referring to FIG. 1, which shows a flowchart of the data analysis method for an automatic flow monitoring station and a manual water gauge observation station according to an embodiment of the invention, the method includes the following steps:
Step S001: collecting the automatic monitoring data and the manual observation data and preprocessing them.
The automatic flow monitoring station and the manual water gauge observation station collect water level data and flow data of the irrigation channel at the same location at consecutive times. Because water level and flow are measured in different units, their numerical values differ considerably. The water level and flow data are therefore mapped to the same scale with a linear mapping method, which removes this difference and facilitates subsequent processing. Linear mapping is a known technique and is not described in detail in this embodiment. The water level and flow data collected by the automatic flow monitoring station are taken as the automatic monitoring data, those collected by the manual water gauge observation station are taken as the manual observation data, and the processed water level and flow data are shown in Table 1.
TABLE 1
In Table 1, the columns are: the time; the water level value collected by the automatic flow monitoring station at that time; the flow value collected by the automatic flow monitoring station at that time; the water level value collected by the manual water gauge observation station at that time; and the flow value collected by the manual water gauge observation station at that time.
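The text does not say which linear mapping is used. As a minimal sketch, assuming min-max normalization onto [0, 1] (one common linear mapping), each series could be rescaled as follows. The function name `min_max_scale` and the sample readings are hypothetical, and whether the manual observation data are scaled with their own extremes or with those of the automatic data is not stated in the source.

```python
from typing import List, Sequence


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Linearly map values onto [0, 1] (min-max normalization).

    Assumption: the patent only says "linear mapping"; min-max scaling is
    used here as one common choice, not as the patent's confirmed method.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical readings from the automatic station (not data from the patent).
water_level_m = [1.20, 1.35, 1.50, 1.30]
flow_m3_s = [4.0, 6.5, 9.0, 5.5]

# One (water level, flow) data point per moment, both on the same [0, 1] scale.
points = list(zip(min_max_scale(water_level_m), min_max_scale(flow_m3_s)))
```

After scaling, a unit change in water level and a unit change in flow contribute comparably to the Euclidean distances used in step S002.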
Step S002: obtaining the neighbor sets of the automatic monitoring data; obtaining the error adjustment factor of each neighbor set; classifying the data of each neighbor set with the error adjustment factor; calculating the data density distance of each neighbor set from the classification; obtaining the equilibrium density distance of the automatic monitoring data; and using the equilibrium density distance to calculate the optimal density of the automatic monitoring data, with which the local outlier factor (LOF) algorithm is improved.
Specifically, in this embodiment the water level data and flow data at each moment are taken as one data point and the neighbor sets are obtained; the error adjustment component and the data density distance of each neighbor set are derived from the distances between data points; from these, the equilibrium density distance and the optimal density of the automatic monitoring data are obtained; and the data analysis of the automatic flow monitoring station and the manual water gauge observation station is completed by combining the optimal density of the automatic monitoring data with the local outlier factor algorithm. The water level and flow data analysis flow is shown in FIG. 2. The optimal density of the automatic monitoring data is constructed as follows.
For the automatic monitoring data, the water level value and flow value at each moment are taken as one data point, and the Euclidean distance between each data point and every remaining data point is calculated; the Euclidean distance is a known technique and is not described here. For $n$ data points, each data point has $n-1$ distances. The smallest of the $n-1$ distances of every data point is selected, and these values, sorted in ascending order, form the first neighbor set, so the first neighbor set contains the minimum distance of each data point. In the same way, the second smallest of the $n-1$ distances of every data point is selected and sorted in ascending order as the second neighbor set, and so on until the $(n-1)$-th neighbor set is obtained. The $i$-th neighbor set is denoted $S_i$.
The distances between a data point and the other data points reflect its density: when a data point's distances are small in each neighbor set, the data point lies in a region of higher density, meaning that more data points lie around it.
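The neighbor-set construction above can be sketched in plain Python as follows. This is a minimal illustration, not the patent's implementation; the function name `neighbor_sets` is hypothetical, and the sets are indexed from 0 here, so element 0 is the first neighbor set.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (scaled water level, scaled flow)


def neighbor_sets(points: Sequence[Point]) -> List[List[float]]:
    """Build the n-1 neighbor sets described in step S002.

    Set r (0-based here) collects the (r+1)-th smallest distance of every
    data point, sorted in ascending order.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two data points")
    # For every point, its n-1 Euclidean distances to the other points, ascending.
    per_point = [
        sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    # Column r of per_point holds each point's (r+1)-th smallest distance.
    return [sorted(row[r] for row in per_point) for r in range(n - 1)]


# Three collinear points: distances are 1 (p0-p1), 2 (p1-p2) and 3 (p0-p2).
example = neighbor_sets([(0.0, 0.0), (0.0, 1.0), (0.0, 3.0)])
```

Each neighbor set holds exactly $n$ values (one per data point), and the construction needs all $n(n-1)/2$ pairwise distances, so it is quadratic in the number of moments.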
When water level and flow are measured in an irrigation channel, the measurement is subject to various influences, so the measured data deviate from the true data within a certain range. To eliminate the influence of these errors and bring the measured data closer to the true data, an error adjustment factor is constructed based on the above analysis; its expression is:
where $d_{i,j}$ denotes the $j$-th distance value in the $i$-th neighbor set, $\Delta_{i,j}$ its neighborhood distance difference, and $\varphi_i$ the error adjustment component of the $i$-th neighbor set, with $|\cdot|$ the absolute value function and $\min(\cdot)$, $\max(\cdot)$ the minimum and maximum functions:
$$\Delta_{i,j} = \frac{\left|d_{i,j} - d_{i,j-1}\right| + \left|d_{i,j+1} - d_{i,j}\right|}{2}, \qquad \varphi_i = \frac{\min_j \Delta_{i,j}}{\max_j \Delta_{i,j}}$$
The error adjustment factor $\theta_i$ of the $i$-th neighbor set is built from $\Delta_{i,j}$, the standard deviation $\sigma_i$ of all distance values in the $i$-th neighbor set, the exponential function $\exp(\cdot)$ with the natural constant as base, the error adjustment component $\varphi_i$, an adjusting parameter $\alpha$, and the number $n$ of data points in the automatic monitoring data. In this embodiment $\alpha$ takes a preset value; implementers may set it according to the actual situation, and this embodiment does not limit it.
The smaller the error adjustment component of the i-th neighbor set, the more dispersed the distance values in that set, reflecting larger differences in local density between the distance values. Larger local density differences also make the standard deviation of all distance values in the i-th neighbor set larger, which makes the corresponding term of the expression smaller. Thus, the more discrete the distribution of distance values in the i-th neighbor set, the smaller the value of its error adjustment component.
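The neighborhood distance difference and the error adjustment component (the min/max ratio of the mean gaps around each distance value) can be sketched as below. The text does not say how the first and last distance values, which have only one neighbor, are handled; this sketch assumes the single available gap is used for them, and it assumes a component of 1.0 when all gaps are zero. Both choices, and the function names, are hypothetical. The error adjustment factor itself is not sketched because its expression is not given in the text.

```python
from typing import List, Sequence


def neighborhood_distance_differences(distances: Sequence[float]) -> List[float]:
    """Mean of the gap to the previous value and the gap to the next value.

    Assumption: the first and last values have only one neighbor, so the
    single available gap is used for them (the patent does not say).
    """
    d = list(distances)
    if len(d) < 2:
        raise ValueError("need at least two distance values")
    gaps = [abs(b - a) for a, b in zip(d, d[1:])]  # gaps[j] = |d[j+1] - d[j]|
    out = []
    for j in range(len(d)):
        around = []
        if j > 0:
            around.append(gaps[j - 1])  # first difference: d[j] - d[j-1]
        if j < len(d) - 1:
            around.append(gaps[j])      # second difference: d[j+1] - d[j]
        out.append(sum(around) / len(around))
    return out


def error_adjustment_component(distances: Sequence[float]) -> float:
    """Ratio of the smallest to the largest neighborhood distance difference."""
    diffs = neighborhood_distance_differences(distances)
    largest = max(diffs)
    if largest == 0:
        # Assumption: all distances are equal, so treat the set as perfectly regular.
        return 1.0
    return min(diffs) / largest
```

For the neighbor set 0.1, 0.2, 0.4, 0.8 the gaps are 0.1, 0.2 and 0.4, the mean gaps are 0.1, 0.15, 0.3 and 0.4, and the component is 0.1 / 0.4 = 0.25: the gaps grow, so the set counts as fairly dispersed.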
The data in each neighbor set are divided into classes using intervals of the form $[d_{i,j}, d_{i,j} + \theta_i]$, where $d_{i,j}$ is the $j$-th distance value of the $i$-th neighbor set and $\theta_i$ its error adjustment factor; the number of classes is denoted $P$. Specifically, starting from the first distance value $d_{i,1}$, it is judged whether the next distance value $d_{i,2}$ lies in $[d_{i,1}, d_{i,1} + \theta_i]$. If it does, the two distance values are placed in one class and the following distance value is judged in the same way; if it does not, the current class is closed and judgment restarts from $d_{i,2}$ with the interval $[d_{i,2}, d_{i,2} + \theta_i]$. For example, for the data 0.2, 0.25, 0.29, 0.6, 0.74, 0.78, 0.8, 0.85 with an error adjustment factor of 0.05, the data are divided into three classes: {0.2, 0.25, 0.29}, {0.6}, and {0.74, 0.78, 0.8, 0.85}.
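The interval classification can be sketched as a single pass over the ascending neighbor set. Comparing each value with the first value of its class, as the claim wording reads, would separate 0.29 from 0.2 in the worked example (0.29 - 0.2 > 0.05), so this sketch assumes each value is compared with the value just before it, which reproduces the example. The function name `classify_distances` is hypothetical.

```python
from typing import List, Sequence


def classify_distances(distances: Sequence[float], factor: float) -> List[List[float]]:
    """Split an ascending neighbor set into classes using the error adjustment factor.

    Assumption: each value is compared with the value just before it, which
    reproduces the worked example; the claim wording compares with the first
    value of the current class instead.
    """
    classes: List[List[float]] = []
    for value in distances:
        if classes and abs(value - classes[-1][-1]) <= factor:
            classes[-1].append(value)  # within [previous, previous + factor]
        else:
            classes.append([value])    # gap too large: start a new class
    return classes


example = classify_distances([0.2, 0.25, 0.29, 0.6, 0.74, 0.78, 0.8, 0.85], 0.05)
# example == [[0.2, 0.25, 0.29], [0.6], [0.74, 0.78, 0.8, 0.85]]
```

The pass is linear in the size of the neighbor set, and the number of resulting lists is the class count $P$ used in the data density distance.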
After the data in each neighbor set have been classified with the error adjustment factor, the number of distance values in each class determines its importance: in general, the more distance values a class contains, the more important it is and the greater its weight in the data density distance. Therefore, for each neighbor set, the ratio of the number of distance values in each class to the number of all distance values in the set is calculated, and the data density distance of the neighbor set is constructed from these ratios; its expression is:
where $D_i$ denotes the data density distance of the $i$-th neighbor set; $w_{i,L}$ the weight factor of the $L$-th class in the $i$-th neighbor set; $P_i$ the total number of classes contained in the $i$-th neighbor set; $\bar{d}_{i,L}$ the mean of all distance values contained in the $L$-th class of the $i$-th neighbor set; $\exp(\cdot)$ the exponential function with the natural constant as base; $p_{i,L}$ the ratio of the number of distance values contained in class $L$ to the number of all distance values in the $i$-th neighbor set; and $\log_2(\cdot)$ the logarithmic function with base 2. Following the definition of the weight factor given above, $w_{i,L} = 1 / \left(p_{i,L} \log_2 p_{i,L}\right)$.
The product of the ratio and its base-2 logarithm carries information about the number of distance values in class L: the larger its value, the fewer distance values class L contains, and the smaller the weight assigned to class L when computing the data density distance of the neighbor set should be. The smaller the weight, the smaller the contribution of class L to the data density distance. Thus, the more scattered the data in a neighbor set, the smaller its data density distance.
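The weight factor as worded in the claims (the reciprocal of the class share times its base-2 logarithm) can be sketched literally. Doing so makes two consequences visible: for any share p below 1, log2(p) is negative, so the weight is negative; and for p = 1, a neighbor set with a single class, the product is zero and the reciprocal is undefined, so this sketch raises an error. The data density distance expression that consumes these weights is not given in the text, so the sketch stops at the weights; the function name is hypothetical.

```python
import math
from typing import List, Sequence


def class_weight_factors(class_sizes: Sequence[int]) -> List[float]:
    """Weight of each class, taken literally from the claims: 1 / (p * log2(p)).

    p is the share of the neighbor set's distance values that fall in the class.
    Taken literally, the weight is negative for 0 < p < 1 and undefined for
    p == 1 (a set with a single class), because log2(1) == 0.
    """
    total = sum(class_sizes)
    weights = []
    for size in class_sizes:
        p = size / total
        product = p * math.log2(p)
        if product == 0:
            raise ValueError("a class holding every distance value has no defined weight")
        weights.append(1.0 / product)
    return weights


# Class sizes from the worked example: 3, 1 and 4 of 8 distance values.
weights = class_weight_factors([3, 1, 4])
```

For the example class sizes the shares are 3/8, 1/8 and 1/2, giving weights of about -1.88, -2.67 and exactly -2.0.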
The data density distances of all neighbor sets of the automatic monitoring data form a data density distance set, denoted $\mathcal{D}$, from which the equilibrium density distance is constructed; its expression is:
where $R$ denotes the equilibrium density distance; $\mathcal{D}$ the data density distance set formed by the data density distances of all neighbor sets; $\mathrm{IQR}(\mathcal{D})$ the interquartile range of $\mathcal{D}$, i.e., the value at the 75th percentile of $\mathcal{D}$ minus the value at the 25th percentile, which is a known technique and is not described further here; $\max(\cdot)$ and $\min(\cdot)$ the maximum and minimum functions; $\exp(\cdot)$ the exponential function with the natural constant as base; $D_i$ the data density distance of the $i$-th neighbor set; and $n$ the number of data points in the automatic monitoring data.
With the range of the data density distance set held constant, its interquartile range reflects how the elements are distributed: the smaller the interquartile range, the more concentrated the elements. If the mean of the set is small, the elements are concentrated near its minimum; if the mean is large, they are concentrated near its maximum. Therefore, the more concentrated the elements of the data density distance set and the closer they lie to its maximum, the larger the calculated equilibrium density distance.
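The equilibrium density distance expression itself is not reproduced in the text, but the spread statistics discussed above (interquartile range, range, mean) can be computed with the Python standard library. This sketch assumes the "inclusive" quartile convention of `statistics.quantiles`; the patent does not state which convention it uses, and the function name is hypothetical.

```python
import statistics
from typing import Dict, Sequence


def density_distance_spread(density_distances: Sequence[float]) -> Dict[str, float]:
    """Spread statistics of the data density distance set.

    Assumption: quartiles use statistics.quantiles(method="inclusive"); the
    patent does not say which quartile convention it uses.
    """
    q1, _, q3 = statistics.quantiles(density_distances, n=4, method="inclusive")
    return {
        "iqr": q3 - q1,  # 75th-percentile value minus 25th-percentile value
        "range": max(density_distances) - min(density_distances),
        "mean": statistics.fmean(density_distances),
    }
```

With a fixed range, a smaller `iqr` signals a more concentrated set, and comparing `mean` with the minimum and maximum shows which end the values cluster toward, matching the reasoning above.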
Taking each data point of the automatic monitoring data as the center and the equilibrium density distance as the radius, a circle is drawn; the number of data points inside the circle of each data point is counted as the density of that data point, and the densities of the $n$ data points are denoted $\rho_1, \rho_2, \dots, \rho_n$. The optimal density of the automatic monitoring data is calculated from these densities as follows:
$$K = \operatorname{round}\left(\frac{1}{n}\sum_{m=1}^{n}\rho_m\right)$$
where $K$ denotes the optimal density of the automatic monitoring data, $\operatorname{round}(\cdot)$ the rounding function, $n$ the number of data points in the automatic monitoring data, and $\rho_m$ the density of the $m$-th data point.
The larger the density values of the data points, the larger the calculated optimal density K.
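The density count and the optimal density K can be sketched as below, taking the equilibrium density distance as an input radius because its expression is not given in the text. Several details are assumptions: points exactly on the circle count as inside, the center point itself is not counted (as in the claim wording "other data points"), "rounding" means round-half-up, and K is clamped to at least 1 so it remains usable as an LOF neighbor count. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_densities(points: Sequence[Point], radius: float) -> List[int]:
    """Count, for each point, the other points within `radius` (boundary included)."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]


def optimal_density(points: Sequence[Point], radius: float) -> int:
    """K = round(mean density); half-up rounding and the floor of 1 are assumptions."""
    densities = point_densities(points, radius)
    mean = sum(densities) / len(densities)
    k = math.floor(mean + 0.5)  # round half up (Python's round() rounds half to even)
    return max(1, k)            # assumption: LOF needs at least one neighbor
```

For four points spaced 0.5 apart on a line with radius 1.0, the densities are 2, 3, 3 and 2, their mean is 2.5, and K becomes 3 under half-up rounding (Python's built-in `round` would give 2).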
The optimal density K is used as the number of neighbors k of the local outlier factor (LOF) algorithm, which improves the algorithm. The original LOF algorithm depends heavily on k: if k is too small, normal data may be judged abnormal, and if k is too large, abnormal data may be judged normal. Because the improved LOF algorithm derives k from the density of the data, it better reflects the local characteristics of the data and can detect abnormal data points in the automatic monitoring data effectively. The automatic monitoring data are then used as input to the LOF algorithm, a known technique, to calculate the outlier factor of each data point.
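Since the text treats LOF as a known technique, here is a minimal pure-Python sketch of the standard LOF score (k-distance, reachability distance, local reachability density) with k intended to be set to the optimal density K. It simplifies the textbook definition by using exactly k nearest neighbors with ties broken by index, and it does not handle exact duplicate points, whose zero reachability distances give infinite densities. The function name is hypothetical; a library implementation such as scikit-learn's `LocalOutlierFactor` (`n_neighbors` parameter) could be used instead.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def lof_scores(points: Sequence[Point], k: int) -> List[float]:
    """Local outlier factor of each point using k neighbors (here k would be K).

    Simplification: exactly k nearest neighbors are used, ties broken by
    index, instead of the full k-distance neighborhood including ties.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest other points, nearest first.
    neighbors = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # k-distance: distance to the k-th nearest neighbor.
    k_distance = [dist[i][neighbors[i][-1]] for i in range(n)]
    # Local reachability density: k divided by the summed reachability distances.
    lrd = []
    for i in range(n):
        reach = sum(max(k_distance[j], dist[i][j]) for j in neighbors[i])
        lrd.append(k / reach if reach > 0 else math.inf)
    # LOF: mean neighbor density divided by the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]
```

Scores near 1 mean a point is about as dense as its neighbors; for four corners of a unit square plus a point at (10, 10) with k = 2, the corners score exactly 1.0 and the distant point scores about 13.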
Step S003: eliminate abnormal data from the automatic monitoring data and the manual observation data according to the outlier factors of all data points of the automatic monitoring data, completing the data analysis of the automatic flow monitoring station and the manual water gauge observation station.
According to the calculated outlier factor of each data point of the automatic monitoring data, a data point whose outlier factor is greater than or equal to the threshold $T$ is judged to be abnormal data, and the automatic monitoring data and manual observation data at the moment of that data point are eliminated; a data point whose outlier factor is less than the threshold $T$ is treated as normal data, and the corresponding automatic monitoring data and manual observation data are retained. The expression of the threshold $T$ involves the density of the automatic monitoring data, and practitioners can set it according to the actual situation; this embodiment does not limit it.
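The elimination rule above can be sketched as follows. This is a minimal illustration under stated assumptions: the automatic and manual records are assumed to be aligned by moment (same list index), the outlier factors are assumed to have been computed beforehand, the function name is hypothetical, and the threshold value used in the example (1.5) is a placeholder, since the patent leaves the threshold to the practitioner.

```python
def filter_paired_records(auto_records, manual_records, outlier_factors, threshold):
    """Drop the automatic and manual records at every moment whose
    automatic-data outlier factor is >= threshold; keep the rest."""
    if not len(auto_records) == len(manual_records) == len(outlier_factors):
        raise ValueError("records and outlier factors must be aligned by moment")
    kept_auto, kept_manual, removed_moments = [], [], []
    for moment, (auto, manual, lof) in enumerate(
        zip(auto_records, manual_records, outlier_factors)
    ):
        if lof >= threshold:
            removed_moments.append(moment)  # abnormal: discard both records
        else:
            kept_auto.append(auto)          # normal: keep both records
            kept_manual.append(manual)
    return kept_auto, kept_manual, removed_moments


# Hypothetical (water level, flow) readings and precomputed outlier factors.
auto = [(1.0, 10.0), (1.1, 10.5), (5.0, 40.0), (1.2, 10.2)]
manual = [(1.0, 10.1), (1.1, 10.4), (1.2, 10.3), (1.2, 10.1)]
factors = [1.01, 0.98, 13.2, 1.03]
kept_auto, kept_manual, removed = filter_paired_records(
    auto, manual, factors, threshold=1.5
)
print(removed)  # [2]
```

Note that the decision is driven only by the automatic monitoring data; the manual record at the same moment is removed alongside it so that the two retained series stay aligned.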
All normal automatic monitoring data and manual observation data are provided to data processing personnel to complete the data analysis, thereby realizing the data analysis of the automatic flow monitoring station and the manual water gauge observation station.
In summary, the embodiment of the invention addresses the problem that an improperly set nearest-neighbor value $k$ makes abnormal data detection inaccurate and thereby degrades the reliability of the data from the automatic flow monitoring station and the manual water gauge observation station; by combining a density-derived $k$ with the local outlier factor algorithm, it improves the accuracy of both the automatic monitoring data and the manual observation data.
It should be noted that the numbering of the embodiments of the present invention is for description only and does not indicate that any embodiment is better or worse than another. The foregoing description is directed to specific embodiments of this specification. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or a sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In this specification, the embodiments are described progressively: for identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it; any modification, equivalent replacement, improvement, etc. made within the principles of the present invention shall fall within the scope of protection of the present invention.

Claims (8)

1. A data analysis method for an automatic flow monitoring station and a manual water gauge observation station, characterized by comprising the following steps:
acquiring water level data and flow data of the automatic flow monitoring station and of the manual water gauge observation station at the same moments; taking the water level data and flow data acquired by the automatic flow monitoring station as automatic monitoring data, and taking the water level data and flow data acquired by the manual water gauge observation station as manual observation data;
taking the water level data and flow data of the automatic monitoring data at each moment as one data point; acquiring the neighbor groups according to the range of distance values between the data points of the automatic monitoring data; obtaining an error adjustment component of each neighbor group according to the distribution of the distance values in the neighbor group; combining the error adjustment components to obtain an error adjustment factor of each neighbor group; and dividing the distance values in each neighbor group into classes according to the error adjustment factor of the neighbor group;
obtaining a weight factor of each class in each neighbor group according to the number of distance values contained in the class; combining the weight factors of the classes in each neighbor group to obtain a data density distance of the neighbor group; obtaining an equilibrium density distance of the automatic monitoring data according to the data density distances of the neighbor groups; and obtaining an optimal density of the automatic monitoring data by combining the equilibrium density distance;
acquiring abnormal data points of the automatic monitoring data at each moment by combining the optimal density of the automatic monitoring data with a local outlier factor algorithm; deleting the abnormal data points from the automatic monitoring data and the manual observation data; and completing the data analysis of the automatic flow monitoring station and the manual water gauge observation station;
wherein the error adjustment component of each neighbor group is obtained according to the distribution of the distance values in the neighbor group as follows: for each neighbor group, calculate the difference between each distance value and the preceding distance value, recorded as a first difference, and the difference between the following distance value and the current distance value, recorded as a second difference; calculate the mean of the first and second differences of each distance value; and take the ratio of the minimum to the maximum of these means over all distance values as the error adjustment component of the neighbor group;
and the error adjustment factor of each neighbor group is obtained by combining the error adjustment components of the neighbor groups, with the expression:
where $\varepsilon_i$ denotes the error adjustment factor of the $i$-th neighbor group; $\Delta_{i,j}$ denotes the neighborhood distance difference of the $j$-th distance value in the $i$-th neighbor group; $\sigma_i$ denotes the standard deviation of all distance values in the $i$-th neighbor group; $\exp(\cdot)$ denotes the exponential function with the natural constant as its base; $\gamma_i$ denotes the error adjustment component of the $i$-th neighbor group; $\alpha$ is a regulating parameter; and $n$ denotes the number of data points in the automatic monitoring data.
2. The data analysis method for an automatic flow monitoring station and a manual water gauge observation station according to claim 1, wherein acquiring each neighbor group according to the range of distance values between the data points of the automatic monitoring data comprises:
for each data point of the automatic monitoring data, calculating the distance between that data point and each of the other data points, and sorting these distances in ascending order; the smallest distances of all data points, sorted in ascending order, form the first neighbor group; the second-smallest distances of all data points, sorted in ascending order, form the second neighbor group; and so on, until the $(n-1)$-th smallest distances of all data points, sorted in ascending order, form the $(n-1)$-th neighbor group, where $n$ denotes the number of data points in the automatic monitoring data.
3. The data analysis method for an automatic flow monitoring station and a manual water gauge observation station according to claim 1, wherein dividing the distance values in each neighbor group into classes according to the error adjustment factor of the neighbor group comprises:
for each neighbor group, starting from the first distance value: if the absolute difference between the second distance value and the first distance value is less than or equal to the error adjustment factor of the neighbor group, the first and second distance values are assigned to one class and the third distance value is examined next; if the absolute difference between the third distance value and the first distance value is also less than or equal to the error adjustment factor, the first, second and third distance values are assigned to one class; if it is greater than the error adjustment factor, the first and second distance values form one class, and the third distance value becomes the start of a new class, against which the fourth distance value is then compared; this continues until every distance value in the neighbor group has been assigned to a class.
4. The data analysis method for an automatic flow monitoring station and a manual water gauge observation station according to claim 1, wherein obtaining the weight factor of each class in each neighbor group according to the number of distance values contained in the class comprises:
counting the number of distance values contained in each class of each neighbor group, and calculating the ratio of this number to the total number of distance values in the neighbor group; taking this ratio as the argument of a base-2 logarithm, multiplying the result of the logarithm by the ratio, and taking the reciprocal of the product as the weight factor of the class.
5. The data analysis method for an automatic flow monitoring station and a manual water gauge observation station according to claim 1, wherein the data density distance of each neighbor group is obtained by combining the weight factors of the classes in the neighbor group, with the expression:
where $E_i$ denotes the data density distance of the $i$-th neighbor group; $\bar{d}_{i,j}$ denotes the mean of all distance values contained in the $j$-th class of the $i$-th neighbor group; $\exp(\cdot)$ denotes the exponential function with the natural constant as its base; $w_{i,j}$ denotes the weight factor of the $j$-th class in the $i$-th neighbor group; and $m_i$ denotes the total number of classes contained in the $i$-th neighbor group.
6. The data analysis method for an automatic flow monitoring station and a manual water gauge observation station according to claim 1, wherein the equilibrium density distance of the automatic monitoring data is obtained from the data density distances of the neighbor groups, with the expression:
where $D$ denotes the equilibrium density distance; $S$ denotes the set of the data density distances of all neighbor groups; $\operatorname{IQR}(\cdot)$ denotes the interquartile range; $\max(\cdot)$ denotes the maximum function; $\min(\cdot)$ denotes the minimum function; $\exp(\cdot)$ denotes the exponential function with the natural constant as its base; $E_i$ denotes the data density distance of the $i$-th neighbor group; and $n$ denotes the number of data points in the automatic monitoring data.
7. The data analysis method for an automatic flow monitoring station and a manual water gauge observation station according to claim 1, wherein obtaining the optimal density of the automatic monitoring data by combining the equilibrium density distance of the automatic monitoring data comprises:
taking each data point of the automatic monitoring data as the center and the equilibrium density distance of the automatic monitoring data as the radius, drawing a circle; counting the number of other data points contained in the circle of each data point and taking this number as the density of that data point; and taking the mean of the densities of all data points of the automatic monitoring data as the optimal density of the automatic monitoring data.
8. The data analysis method for an automatic flow monitoring station and a manual water gauge observation station according to claim 1, wherein acquiring the abnormal data points of the automatic monitoring data at each moment by combining the optimal density of the automatic monitoring data with a local outlier factor algorithm comprises:
taking the optimal density of the automatic monitoring data as the k-nearest-neighbor value of the local outlier factor algorithm, and performing anomaly detection on the automatic monitoring data with the improved local outlier factor algorithm to obtain the outlier factor of each data point; a data point whose outlier factor is greater than or equal to a preset threshold is taken as an abnormal data point, and a data point whose outlier factor is less than the preset threshold is taken as a normal data point.
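The parts of claims 1 to 3 that are fully stated in words (building the neighbor groups, computing the error adjustment component, and dividing a neighbor group into classes) can be sketched in Python as below. This is an illustrative reading, not the patentee's code: the function names are hypothetical, points are assumed to be (water level, flow) pairs with Euclidean distance, and the handling of the first and last distance values (which have only one neighbor) and of a group whose values are all equal are assumptions of this sketch. Because the expression for the error adjustment factor is not recoverable from this text, the factor is passed in as a parameter.

```python
import math


def neighbor_groups(points):
    """Claim 2: group g (0-based) holds the (g+1)-th smallest distance of
    every data point, sorted ascending; n points give n-1 groups of n values."""
    n = len(points)
    per_point = [
        sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    return [sorted(per_point[i][g] for i in range(n)) for g in range(n - 1)]


def error_adjustment_component(group):
    """Claim 1: ratio of the minimum to the maximum of the per-value mean
    of the first (backward) and second (forward) differences.

    Assumption: the first and last values use the single difference
    available; a group of identical values returns 1.0 (hypothetical).
    """
    last = len(group) - 1
    means = []
    for j in range(len(group)):
        diffs = []
        if j > 0:
            diffs.append(group[j] - group[j - 1])  # first difference
        if j < last:
            diffs.append(group[j + 1] - group[j])  # second difference
        means.append(sum(diffs) / len(diffs))
    top = max(means)
    return min(means) / top if top > 0 else 1.0


def divide_into_classes(group, factor):
    """Claim 3: walk the sorted group; a value joins the current class while
    it lies within `factor` of that class's first value, else starts a new one."""
    classes = [[group[0]]]
    for d in group[1:]:
        if abs(d - classes[-1][0]) <= factor:
            classes[-1].append(d)
        else:
            classes.append([d])
    return classes


pts = [(0.0, 0.0), (0.0, 1.0), (0.0, 3.0)]
print(neighbor_groups(pts))  # [[1.0, 1.0, 2.0], [2.0, 3.0, 3.0]]
print(error_adjustment_component([1.0, 2.0, 4.0]))  # 0.5
print(divide_into_classes([1.0, 1.2, 1.5, 3.0, 3.1], factor=0.5))
# [[1.0, 1.2, 1.5], [3.0, 3.1]]
```

The sketch requires at least two data points so that every neighbor group is non-empty and every distance value has at least one adjacent value to difference against.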
CN202311560638.6A 2023-11-22 2023-11-22 Data analysis method for automatic flow monitoring station and manual water gauge observation station Active CN117272216B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311560638.6A CN117272216B (en) 2023-11-22 2023-11-22 Data analysis method for automatic flow monitoring station and manual water gauge observation station

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311560638.6A CN117272216B (en) 2023-11-22 2023-11-22 Data analysis method for automatic flow monitoring station and manual water gauge observation station

Publications (2)

Publication Number Publication Date
CN117272216A CN117272216A (en) 2023-12-22
CN117272216B true CN117272216B (en) 2024-02-09

Family

ID=89203021

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311560638.6A Active CN117272216B (en) 2023-11-22 2023-11-22 Data analysis method for automatic flow monitoring station and manual water gauge observation station

Country Status (1)

Country Link
CN (1) CN117272216B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117807375A (en) * 2024-02-27 2024-04-02 成都秦川物联网科技股份有限公司 Ultrasonic water meter noise processing method, system and equipment based on Internet of Things

Citations (11)

Publication number Priority date Publication date Assignee Title
CN107545273A (en) * 2017-07-06 2018-01-05 北京航空航天大学 A kind of local outlier detection method based on density
CN107679215A (en) * 2017-10-19 2018-02-09 西安交通大学 A kind of outlier detection method based on barycenter
CN109597745A (en) * 2018-12-06 2019-04-09 中科恒运股份有限公司 Method for processing abnormal data and device
CN110807474A (en) * 2019-10-12 2020-02-18 腾讯科技(深圳)有限公司 Clustering method and device, storage medium and electronic equipment
US10789507B2 (en) * 2018-03-30 2020-09-29 Walmart Apollo, Llc Relative density-based clustering and anomaly detection system
WO2022262869A1 (en) * 2021-06-18 2022-12-22 工业互联网创新中心(上海)有限公司 Data processing method and apparatus, network device, and storage medium
CN115628776A (en) * 2022-10-25 2023-01-20 杭州电子科技大学 Water supply pipe network abnormal data detection method
CN115931055A (en) * 2023-01-06 2023-04-07 长江信达软件技术(武汉)有限责任公司 Rural water supply operation diagnosis method and system based on big data analysis
WO2023057434A1 (en) * 2021-10-04 2023-04-13 University Of Malta Method and flight data analyzer for identifying anomalous flight data and method of maintaining an aircraft
CN116484307A (en) * 2023-06-21 2023-07-25 深圳市魔样科技有限公司 Cloud computing-based intelligent ring remote control method
CN116502170A (en) * 2023-06-29 2023-07-28 山东浩坤润土水利设备有限公司济宁经济开发区分公司 Agricultural water conservancy monitoring method and related device based on cloud platform

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US11755932B2 (en) * 2020-04-23 2023-09-12 Actimize Ltd. Online unsupervised anomaly detection
US20220129772A1 (en) * 2020-10-27 2022-04-28 Airis Labs Sdn Bhd System and method having the artificial intelligence (ai) algorithm of k-nearest neighbors (k-nn)

Non-Patent Citations (6)

Title
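A Feature-Based Procedure for Detecting Technical Outliers in Water-Quality Data From In Situ Sensors; Priyanga Dilini Talagala et al.; Water Resources Research; Vol. 55, No. 11; pp. 8547-8568 *
Local and global outlier detection algorithms in unsupervised approach: a review; Ayad Mohammed Jabbar; Iraqi Journal for Electrical and Electronic Engineering; Vol. 17, No. 1; pp. 76-87 *
一种改进的LOF异常点检测算法 (An improved LOF outlier detection algorithm); 周鹏 et al.; 计算机技术与发展 (Computer Technology and Development); Vol. 27, No. 12; pp. 115-118 *
基于局部密度的快速离群点检测算法 (A fast outlier detection algorithm based on local density); 邹云峰 et al.; 计算机应用 (Journal of Computer Applications); Vol. 37, No. 10; pp. 2932-2937 *
基于局部离群点检测和标准差方法的锂离子电池组早期故障诊断 (Early fault diagnosis of lithium-ion battery packs based on local outlier detection and the standard deviation method); 李纪伟 et al.; 储能科学与技术 (Energy Storage Science and Technology); Vol. 12, No. 9; pp. 2917-2926 *
基于机器学习的配电网异常缺失数据动态清洗方法 (A machine-learning-based dynamic cleaning method for abnormal and missing data in distribution networks); 梅玉杰 et al.; 电力系统保护与控制 (Power System Protection and Control); Vol. 51, No. 7; pp. 158-169 *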

Also Published As

Publication number Publication date
CN117272216A (en) 2023-12-22

Similar Documents

Publication Publication Date Title
CN117272216B (en) Data analysis method for automatic flow monitoring station and manual water gauge observation station
CN112381476B (en) Method and device for determining electric energy meter with abnormal state
CN110070282B (en) Low-voltage transformer area line loss influence factor analysis method based on comprehensive relevance
CN105403245A (en) Sunlight greenhouse wireless sensor multi-data fusion method
CN116431975B (en) Environment monitoring method and system for data center
CN114168906B (en) Mapping geographic information data acquisition system based on cloud computing
CN113887908A (en) Bridge risk assessment method considering subjective and objective cross fusion weight
CN111998918A (en) Error correction method, error correction device and flow sensing system
CN116148753A (en) Intelligent electric energy meter operation error monitoring system
CN109523077B (en) Wind power prediction method
Oukil et al. A DEA cross-efficiency inclusive methodology for assessing water quality: A Composite Water Quality Index
CN117469603B (en) Multi-water-plant water supply system pressure optimal control method based on big data learning
CN117148803B (en) Adjusting control method for automatic centering width adjusting assembly line
CN117348831A (en) Picture adjustment method and system for liquid crystal display screen
CN109886288B (en) State evaluation method and device for power transformer
CN117154716A (en) Planning method and system for accessing distributed power supply into power distribution network
CN115034693B (en) Biological information data security management method, system and storage medium based on Internet of things
CN107305563B (en) Abnormal data detection method and system based on distance
CN114189313B (en) Ammeter data reconstruction method and device
CN113608506A (en) Intelligent detection device for alumina operation index
CN113673759A (en) Real-time marshalling method and terminal for hydrological data
CN110852516A (en) Data quality judging method based on big data information entropy traffic flow detection equipment
CN117421686B (en) Water and fertilizer integrated irrigation dosage data collection method
CN109918220A (en) A kind of anomaly data detection determination method for parameter and determining device
CN109933579A (en) A kind of part k nearest neighbor missing values interpolation system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant