WO2022160682A1 - Water quality monitoring data analysis method and apparatus, device, and storage medium - Google Patents

Water quality monitoring data analysis method and apparatus, device, and storage medium

Info

Publication number
WO2022160682A1
Authority
WO
WIPO (PCT)
Prior art keywords
water quality
quality monitoring
monitoring data
abnormal
data
Application number
PCT/CN2021/114248
Other languages
English (en)
French (fr)
Inventor
张子秋
蒙良庆
胡石泉
雷曼琴
曾海霞
符岳辉
Original Assignee
力合科技(湖南)股份有限公司
Application filed by 力合科技(湖南)股份有限公司
Publication of WO2022160682A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01K MEASURING TEMPERATURE; MEASURING QUANTITY OF HEAT; THERMALLY-SENSITIVE ELEMENTS NOT OTHERWISE PROVIDED FOR
    • G01K13/00 Thermometers specially adapted for specific purposes
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N31/00 Investigating or analysing non-biological materials by the use of the chemical methods specified in the subgroup; Apparatus specially adapted for such methods
    • G01N31/16 Investigating or analysing non-biological materials by the use of the chemical methods specified in the subgroup; Apparatus specially adapted for such methods using titration
    • G01N31/18 Burettes specially adapted for titration
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/18 Water
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis

Definitions

  • The invention relates to the technical field of water quality monitoring, and in particular to a water quality monitoring data analysis method and apparatus, a device, and a computer storage medium.
  • Water pollution is mainly caused by pollutants produced by human activities; harmful chemical substances often reduce or destroy the use value of water. Sewage contains acids, alkalis, and oxidants; compounds of copper, cadmium, mercury, and arsenic; and organic poisons such as benzene, dichloroethane, and ethylene glycol, which poison aquatic organisms, affect drinking water sources, and spoil the landscape of scenic areas. When microorganisms decompose the organic matter in sewage, they consume the oxygen in the water, which harms aquatic life.
  • Abnormal data are often included in the acquired monitoring data.
  • Some data anomalies originate from changes in water quality, and some from faults in the monitoring instruments themselves. Efficiently identifying the real cause of abnormal data is the primary condition for judging the authenticity of monitoring data, and it is also the cornerstone of subsequent cause analysis and remediation.
  • The embodiments of the present invention provide a water quality monitoring data analysis method, apparatus, device, and computer storage medium that are more accurate and efficient and that improve the timeliness and pertinence of responses to water pollution problems.
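  • A first aspect of the embodiments of the present invention provides a water quality monitoring data analysis method, including: acquiring water quality monitoring data within a preset period; performing abnormal point analysis on the water quality monitoring data and marking its abnormal values; determining a reference time period according to the distribution of the abnormal values, and performing correlation analysis between the water quality monitoring data of the reference time period and the water quality monitoring data of an upstream site; and determining, according to the correlation analysis result, the time at which pollution occurred at the upstream site.
  • A second aspect provides a water quality monitoring data analysis apparatus, including:
  • an acquisition module, configured to acquire water quality monitoring data within a preset period;
  • an abnormal value extraction module, configured to perform abnormal point analysis on the water quality monitoring data and mark its abnormal values;
  • a correlation analysis module, configured to determine a reference time period according to the distribution of the abnormal values, and perform correlation analysis between the water quality monitoring data of the reference time period and the water quality monitoring data of the upstream site;
  • a determination module, configured to determine, according to the correlation analysis result, the time at which pollution occurred at the upstream site.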
  • A third aspect further provides a water quality monitoring device, including a processor and a memory, the memory storing a computer program executable by the processor; when the processor executes the computer program, the water quality monitoring data analysis method of any embodiment of the present application is implemented.
  • A fourth aspect further provides a computer storage medium storing a computer program; when a controller executes the computer program, the water quality monitoring data analysis method of any embodiment of the present application is implemented.
  • By performing abnormal point analysis on the water quality monitoring data and marking its abnormal values, the method, apparatus, device, and computer storage medium of the above embodiments can efficiently identify the real cause of abnormal data and judge the authenticity of the monitoring data. They shorten the data identification cycle and the time from problem discovery to resolution, addressing key technical problems in current environmental management. Using historical data as the basis for judgment and algorithms as the means of identifying outliers, they remedy inefficiencies in water station operation and maintenance management and improve the overall effectiveness of online monitoring data.
  • FIG. 1 is a flowchart of a method for analyzing water quality monitoring data in an embodiment of the present invention.
  • FIG. 2 is a flowchart of a method for analyzing water quality monitoring data in another embodiment of the present invention.
  • FIG. 3 is a schematic diagram of determining abnormal values by their deviation from a regression line in a water quality monitoring data analysis method according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of identifying abnormal values in a time series in a water quality monitoring data analysis method according to an embodiment of the present invention.
  • FIG. 5 is a flowchart of a method for analyzing water quality monitoring data in another embodiment of the present invention.
  • FIG. 6 shows a linear regression graph and a trend graph of two features, turbidity and total phosphorus, for September 2019 in an optional specific example.
  • FIG. 7 shows a linear regression graph and a trend graph of two features, turbidity and total phosphorus, for March 2020 in an optional specific example.
  • FIG. 8 is a comparison of data from upstream and downstream sites at an interval of 3 periods in an optional specific example.
  • FIG. 9 is a schematic structural diagram of a water quality monitoring data analysis apparatus in another embodiment of the present invention.
  • FIG. 10 is a schematic structural diagram of a water quality monitoring device according to an embodiment of the present invention.
  • In one embodiment, a method for analyzing water quality monitoring data includes the following steps:
  • S101 Acquire water quality monitoring data within a preset period.
  • The water quality monitoring data may be collected manually at each site, or may be data for different variable parameters collected automatically at each site at a preset frequency.
  • The preset period may be any set duration, such as one year, half a year, a quarter, or a month, and may be chosen according to the amount of data available within that duration.
  • The water quality monitoring data include multiple different variables that characterize water quality.
  • The variable parameters may include different parameters that characterize water quality, such as water temperature, pH value, dissolved oxygen, conductivity, turbidity, permanganate index (CODMn), ammonia nitrogen, total phosphorus, total nitrogen, chlorophyll a, and algal density.
  • CODMn: permanganate index.
  • The water quality monitoring data can be sent remotely to a water quality monitoring data center for storage, where water quality monitoring reviewers can analyze them.
  • Acquiring the water quality monitoring data within the preset period may mean acquiring data for different variable parameters collected by the same site within a set historical period.
  • S103 Perform abnormal point analysis on the water quality monitoring data, and mark abnormal values of the water quality monitoring data.
  • An abnormal point is a data point collected at a time that indicates a water quality problem.
  • By acquiring the water quality monitoring data, analyzing their abnormal points, and marking their abnormal values, the method can identify abnormal values within a given time and space and automatically screen and pre-judge suspicious data. It thereby gains the ability to judge and analyze abnormal data, providing not only various analysis mechanisms for a single monitoring indicator but also combined analysis of multiple monitoring indicators.
  • S105 Determine a reference time period according to the distribution of the abnormal values, and perform correlation analysis between the water quality monitoring data of the reference time period and the water quality monitoring data of the upstream station.
  • For example, if the abnormal time points cluster around the end of a month, taking the 15 days before and after the month-end approximately covers all of that month's abnormal times, and the upstream/downstream site correlation analysis is then performed within this abnormal period.
  • Determining a reference time period according to the distribution of the abnormal values, and performing correlation analysis between the water quality monitoring data of the reference time period and the water quality monitoring data of the upstream site, includes:
  • The calculation is equivalent to deleting the NaN (missing) values that appear at the start of the lagged series.
  • Computing the correlation at a lag of k periods makes it possible to detect more accurately that pollution at the upstream site affects the downstream site after a delay.
  • Analyzing the correlation between the variable parameters in the water quality monitoring data of the current site and those of the upstream site at different intervals includes:
  • determining a reference time period according to the marked distribution of the current site's abnormal values, and performing correlation analysis between the current site's water quality monitoring data in the reference time period and the upstream site's water quality monitoring data at different intervals.
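  • The lag-k correlation described above can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation: it assumes the downstream and upstream series are already aligned on the same equally spaced time grid, uses the Pearson coefficient, and represents missing readings as `math.nan`. The names `pearson` and `lagged_corr` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (must not be constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def lagged_corr(downstream, upstream, k):
    """Correlate downstream[t] with upstream[t - k] (upstream leads by k periods).

    Shifting the upstream series by k periods leaves k leading NaN slots;
    those pairs, and any pair containing a missing reading, are dropped.
    """
    shifted = [math.nan] * k + list(upstream[: len(upstream) - k])
    pairs = [
        (d, u)
        for d, u in zip(downstream, shifted)
        if not (math.isnan(d) or math.isnan(u))
    ]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)
```

Scanning k over a small range (for example 0 to 3 periods) and comparing the coefficients shows which delay best explains the downstream anomaly.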
  • In this water quality monitoring data analysis method, the water quality monitoring data are transmitted from the site to a remote data center. After the data center stores the data, abnormal point analysis is performed on them to mark their abnormal values. This efficiently identifies the real cause of abnormal data, judges the authenticity of the monitoring data, shortens the data identification cycle and the time from problem discovery to resolution, and addresses key technical problems in current environmental management. Using historical data as the basis for judgment and algorithms as the means of identifying outliers remedies inefficiencies in water station operation and maintenance management and improves the overall effectiveness of online monitoring data.
  • Step S103, performing abnormal point analysis on the water quality monitoring data and marking the abnormal values of the water quality monitoring data, includes:
  • decomposing the water quality monitoring data with a time series decomposition algorithm (Seasonal and Trend decomposition using Loess, STL) to detect abnormal points in the time series.
  • STL: Seasonal and Trend decomposition using Loess.
  • $x_t$ denotes the observation of the series at time $t$.
  • The series is decomposed into three parts: the seasonal term $S_X$, the trend term $T_X$, and the residual term $R_X$, so that $X = S_X + T_X + R_X$. The seasonal term captures periodic information in the time series, and the trend term captures how the series changes over time. After the seasonal and trend terms are subtracted, outliers are mined from the residual term:
  • $R_X = X - T_X - S_X$
  • The seasonal term can be replaced by the mean of the corresponding observations in each sub-period. For example, if a year is taken as the period, the seasonal value for January is the mean of all January values in the current data. The length of the sub-period is determined by the granularity of the current data.
  • The trend term can be determined in one of three ways: (1) as the moving average of the time series; (2) as the median of the time series; or (3) as the median absolute deviation (MAD) of the time series.
  • The median absolute deviation can be calculated as follows: $\mathrm{MAD} = \operatorname{median}_i\left(\left|x_i - \operatorname{median}_j(x_j)\right|\right)$
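  • The seasonal/trend/residual decomposition described above can be sketched with the Python standard library. This is a simplified classical decomposition, not the LOESS smoothing that full STL uses: the trend is a centred moving mean or moving median (the first two trend options above), and, as an assumption to avoid counting the series level twice, the seasonal term averages the detrended values at each phase of the sub-period. The MAD helper follows the median-of-absolute-deviations definition. All function names and window lengths are illustrative.

```python
import statistics


def trend_component(x, window, method="mean"):
    """Trend T_X: centred moving mean or moving median (window should be odd)."""
    half = window // 2
    agg = statistics.fmean if method == "mean" else statistics.median
    # Windows are truncated at both ends of the series.
    return [agg(x[max(0, t - half):t + half + 1]) for t in range(len(x))]


def seasonal_component(detrended, period):
    """Seasonal S_X: mean of all detrended values sharing the same phase."""
    means = [statistics.fmean(detrended[i::period]) for i in range(period)]
    return [means[t % period] for t in range(len(detrended))]


def residual_component(x, period, window, method="mean"):
    """Residual R_X = X - T_X - S_X."""
    trend = trend_component(x, window, method)
    detrended = [xi - ti for xi, ti in zip(x, trend)]
    seasonal = seasonal_component(detrended, period)
    return [d - s for d, s in zip(detrended, seasonal)]


def mad(values):
    """Median absolute deviation: median(|x_i - median(x)|)."""
    m = statistics.median(values)
    return statistics.median(abs(v - m) for v in values)
```

For hourly data with a daily cycle, `residual_component(series, period=24, window=25)` would treat a day as the sub-period; the residuals are then handed to an outlier test such as ESD.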
  • ESD: Extreme Studentized Deviate.
  • $t_{p,\,n-k-1}$ is the upper critical value of the t-distribution with $n-k-1$ degrees of freedom at significance level $p$.
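  • The ESD test can be sketched as Rosner's generalized ESD procedure; which ESD variant the applicant uses is an assumption. For $k = 1, \ldots, r$ it removes the most extreme remaining point, computes $R_k = \max_i |x_i - \bar{x}| / s$, and compares it with $\lambda_k = \frac{(n-k)\,t_{p,\,n-k-1}}{\sqrt{(n-k-1+t_{p,\,n-k-1}^2)(n-k+1)}}$ with $p = 1 - \frac{\alpha}{2(n-k+1)}$; the number of outliers is the largest $k$ for which $R_k > \lambda_k$. To stay dependency-free, the t quantile is obtained from the regularized incomplete beta function by bisection (with SciPy installed, `scipy.stats.t.ppf` would serve instead). The significance level `alpha` and all names are illustrative.

```python
import math

_TINY = 1e-300


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    fix = lambda v: v if abs(v) > _TINY else _TINY
    c, d = 1.0, 1.0 / fix(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((a + m2 - 1.0) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + m2 + 1.0))):
            d = 1.0 / fix(1.0 + aa * d)
            c = fix(1.0 + aa / c)
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_ppf(p, df):
    """Quantile of Student's t for p > 0.5, found by bisection."""
    target = 1.0 - p  # upper-tail probability P(T > t)
    sf = lambda t: 0.5 * _betai(df / 2.0, 0.5, df / (df + t * t))
    lo, hi = 0.0, 1.0
    while sf(hi) > target:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if sf(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def generalized_esd(x, max_outliers, alpha=0.05):
    """Rosner's generalized ESD test; returns indices of detected outliers."""
    n = len(x)
    data = list(enumerate(x))
    candidates, n_out = [], 0
    for k in range(1, max_outliers + 1):
        if n - k - 1 < 1:
            break  # not enough points left for the t-distribution
        values = [v for _, v in data]
        mean = sum(values) / len(values)
        sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
        if sd == 0.0:
            break
        idx, val = max(data, key=lambda iv: abs(iv[1] - mean))
        r_k = abs(val - mean) / sd
        t = t_ppf(1.0 - alpha / (2.0 * (n - k + 1)), n - k - 1)
        lam_k = (n - k) * t / math.sqrt((n - k - 1 + t * t) * (n - k + 1))
        candidates.append(idx)
        data.remove((idx, val))
        if r_k > lam_k:
            n_out = k
    return candidates[:n_out]
```

Applied to the residual term of the decomposition, the returned indices are the time points to mark as abnormal.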
  • The system extracts the raw data from the database, decomposes them with STL, computes features of the decomposed series, and extracts outliers. The outliers marked by STL filtering are then removed, and the remaining data are re-examined with linear regression, which identifies abnormal data from the pairwise relationships between variables. When two variables show a correlation that departs from their usual relationship, both are marked as a pair.
  • A secondary outlier analysis is performed with linear regression, so that the time series method and the linear regression method are used together to identify and evaluate the data along two dimensions, change over time and the relationships between variables, which improves the accuracy of water quality monitoring data analysis.
  • Using linear regression to perform secondary abnormal point analysis on the water quality monitoring data after the abnormal values are removed, so as to mark the abnormal values of the water quality monitoring data, includes:
  • identifying abnormal data segments according to the change trend of the correlation, and extracting and marking the abnormal values corresponding to those segments.
  • The correlation can be characterized by the value of the correlation coefficient.
  • Denoting the two variable parameters by $x$ and $y$, the correlation coefficient $\rho$ between them can be computed as:
    $$\rho = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$
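  • Identifying abnormal data segments from changes in the correlation can be sketched as a rolling-window Pearson coefficient. The window length and the correlation floor below which a window is flagged are hypothetical parameters, not values given in this description; overlapping flagged windows are merged into segments given as end-exclusive index ranges.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either window is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation undefined for a constant window
    return sxy / math.sqrt(sxx * syy)


def low_correlation_segments(x, y, window, floor):
    """Return merged (start, end) index ranges whose rolling correlation < floor."""
    flagged = []
    for start in range(len(x) - window + 1):
        r = pearson(x[start:start + window], y[start:start + window])
        if r < floor:
            flagged.append((start, start + window))
    # Merge overlapping flagged windows into contiguous segments.
    segments = []
    for s, e in flagged:
        if segments and s <= segments[-1][1]:
            segments[-1] = (segments[-1][0], max(segments[-1][1], e))
        else:
            segments.append((s, e))
    return segments
```

Values inside a returned segment are the candidates to extract, mark, and mask before the regression step.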
  • Performing abnormal point analysis on the water quality monitoring data and marking its abnormal values further includes:
  • identifying outliers by Cook's distance, and marking data points whose Cook's distance exceeds a threshold as outliers.
  • After abnormal data segments are identified from the change trend of the correlation and the corresponding abnormal values are extracted and marked, those abnormal values are masked, and linear regression analysis is performed on the masked data.
  • Performing linear regression analysis on the masked data includes: taking two variable parameters whose correlation coefficient exceeds a threshold as a variable parameter group; performing linear regression analysis on the group; computing Cook's distance from the error vector, the mean squared error, and the parameter estimates obtained from the regression; identifying outliers by Cook's distance; and marking points whose Cook's distance exceeds the threshold as outliers.
  • A scatter plot and a regression line can be drawn, and the Cook's distance of each data point calculated.
  • The Cook's distance threshold is 5 times the mean Cook's distance over all data points.
  • Cook's distance can be calculated as follows:
    $$D_i = \frac{\hat{\varepsilon}_i^{2}}{p\,\mathrm{MSE}}\cdot\frac{h_{ii}}{(1-h_{ii})^{2}}$$
  • where $h_{ii}$ is the $i$-th diagonal element of the hat matrix $H$ (the leverage of the $i$-th observation), $\hat{\varepsilon}_i$ is the $i$-th component of the error vector $\hat{\varepsilon}$, $p$ is the number of regression parameters, and MSE is the mean squared error of the regression.
  • Taking as an example a variable parameter group whose correlation exceeds the threshold and that contains two variable parameters $x$ and $y$, linear regression analysis on $x$ and $y$ establishes the one-dimensional model $y_i = b_0 + b_1 x_i + \varepsilon_i$, i.e. $Y = XB + \varepsilon$.
  • The estimate of $B$ is: $\hat{B} = (X^{\top}X)^{-1}X^{\top}Y$
  • The error vector $\varepsilon$ is estimated as $\hat{\varepsilon} = (I - H)Y$ with $H = X(X^{\top}X)^{-1}X^{\top}$, where $I$ is the identity matrix.
  • The corresponding scatter plot and regression line can be drawn from these results; performing linear regression on a variable parameter group whose correlation exceeds the threshold thus establishes a one-dimensional model.
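  • For a single predictor, the leverage has the closed form $h_{ii} = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_j (x_j - \bar{x})^2}$, so Cook's distance and the 5-times-the-mean threshold can be sketched without matrix algebra. This is an illustrative implementation with $p = 2$ (intercept and slope), using the standard Cook's distance formula; the function names are hypothetical.

```python
def cooks_distance(x, y):
    """Cook's distance for each point of the simple regression y = b0 + b1*x."""
    n, p = len(x), 2  # p = number of fitted parameters (intercept, slope)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # error vector
    mse = sum(e * e for e in resid) / (n - p)              # mean squared error
    lev = [1.0 / n + (xi - mx) ** 2 / sxx for xi in x]     # diagonal of H
    return [e * e / (p * mse) * h / (1.0 - h) ** 2 for e, h in zip(resid, lev)]


def flag_by_cooks(x, y, factor=5.0):
    """Indices whose Cook's distance exceeds `factor` times the mean distance."""
    d = cooks_distance(x, y)
    threshold = factor * sum(d) / len(d)
    return [i for i, di in enumerate(d) if di > threshold]
```

Here `x` and `y` would be the two strongly correlated variable parameters, for example turbidity and total phosphorus; the returned indices are the observations to mark as outliers.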
  • Determining a reference time period according to the distribution of the abnormal values, and performing correlation analysis between the water quality monitoring data of the reference time period and the water quality monitoring data of the upstream station, includes:
  • compiling all marked abnormal values, taking each marked abnormal value as a pile point, expanding a time window centered on the pile point according to the set time frequency, and determining the reference time period from the time window;
  • performing correlation analysis between the water quality monitoring data of the reference time period and the water quality monitoring data of the upstream station.
  • This improves the accuracy and efficiency of the analysis.
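  • Building reference periods from pile points can be sketched as follows. As an assumption, each marked abnormal time is itself a pile point, the window extends a fixed half-width on each side (the default of 16 hours matches the worked example later in this description), and overlapping windows are merged so each reference period is analyzed once. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def reference_periods(anomaly_times, half_width=timedelta(hours=16)):
    """Merge windows of +/- half_width centred on each abnormal time (pile point)."""
    windows = []
    for pile in sorted(anomaly_times):
        start, end = pile - half_width, pile + half_width
        if windows and start <= windows[-1][1]:
            # Overlaps the previous window: extend it instead of starting a new one.
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows


periods = reference_periods([datetime(2020, 3, 1, 8), datetime(2020, 3, 1, 20)])
```

The two nearby anomalies in the usage line collapse into a single reference period, from 16:00 on 29 February to 12:00 on 2 March 2020.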
  • Determining the time at which pollution occurred at the upstream site includes:
  • performing correlation analysis between the water quality monitoring data of the reference time period and the upstream station data at different intervals, and determining the time at which pollution occurred at the upstream station from the correlation coefficients obtained at the different intervals.
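  • Choosing the interval and turning it into an occurrence time can be sketched as below. The assumptions are ours: one value per period, an upstream series that starts `max_lag` periods before the reference window, the Pearson coefficient, and the estimated occurrence time taken as the start of the reference window shifted back by the best lag. The defaults (3 periods of 4 hours, threshold 0.8) mirror the worked example later in this description; the function name is hypothetical.

```python
import math
from datetime import datetime, timedelta


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def estimate_upstream_onset(reference, upstream, ref_start, max_lag=3,
                            period=timedelta(hours=4), threshold=0.8):
    """Return (lag, r, onset) for the best-correlated lag, or None if r < threshold.

    upstream[i] is the upstream value at ref_start + (i - max_lag) * period,
    so upstream must hold len(reference) + max_lag values.
    """
    scores = []
    for k in range(max_lag + 1):
        # reference[t] is paired with the upstream value k periods earlier.
        lagged = upstream[max_lag - k:max_lag - k + len(reference)]
        scores.append((pearson(reference, lagged), k))
    r, k = max(scores)
    if r < threshold:
        return None  # no interval is convincingly correlated
    return k, r, ref_start - k * period
```

Returning `None` below the threshold keeps weakly correlated station pairs from being reported as pollution sources.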
  • The method further includes:
  • forming updated variable parameters from the change between values k time steps apart in the variable parameter vector;
  • The correlation coefficient between different variable parameters at the same site may be low even though, in practice, their trends of change are visibly related.
  • Updated variable parameters are therefore formed from the change between values k time steps apart in the variable parameter vector, and correlation analysis is then performed on the updated variable parameters.
  • For example, the updated variable parameters can be formed from first-differenced data, which are then used to calculate the correlation coefficient.
  • Let the two variable parameters be $(x_1, x_2, \ldots, x_n)$ and $(y_1, y_2, \ldots, y_n)$. The first difference is the change between values one time step apart in the original series, so the updated variable parameters are $(x_2 - x_1, \ldots, x_n - x_{n-1})$ and $(y_2 - y_1, \ldots, y_n - y_{n-1})$; these new data are used for the subsequent correlation coefficient calculation and regression analysis.
  • The rate of change at an interval of 1 can also be used for the subsequent analysis, i.e. the new data are $\left(\frac{x_2 - x_1}{x_1}, \ldots, \frac{x_n - x_{n-1}}{x_{n-1}}\right)$ and $\left(\frac{y_2 - y_1}{y_1}, \ldots, \frac{y_n - y_{n-1}}{y_{n-1}}\right)$.
  • The change at an interval of k can also be used, i.e. the new data are $(x_{k+1} - x_1, \ldots, x_n - x_{n-k})$ and $(y_{k+1} - y_1, \ldots, y_n - y_{n-k})$, for the subsequent correlation coefficient calculation and regression analysis.
  • In practice, the change at an interval of 1 is mainly used for the subsequent correlation coefficient calculation and regression analysis.
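  • The interval-k change and the interval-1 rate of change can be sketched as follows, together with a small synthetic example of why differencing helps: two series with opposite trends but identical short-term fluctuations are negatively correlated as raw values yet perfectly correlated after first differencing. The function names and data are illustrative only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def diff(series, k=1):
    """Change at interval k: (x[k] - x[0], ..., x[-1] - x[-1 - k])."""
    return [series[i] - series[i - k] for i in range(k, len(series))]


def rate_of_change(series):
    """Relative change at interval 1: (x[i] - x[i-1]) / x[i-1]; values must be non-zero."""
    return [(series[i] - series[i - 1]) / series[i - 1] for i in range(1, len(series))]


# Opposite trends (+1 and -1 per step) sharing the same 0/3 fluctuation.
noise = [0, 3, 0, 3, 0, 3, 0, 3]
x = [i + n for i, n in enumerate(noise)]
y = [-i + n for i, n in enumerate(noise)]
print(round(pearson(x, y), 3))              # raw values: negative
print(round(pearson(diff(x), diff(y)), 3))  # first differences: 1.0
```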
  • Acquiring water quality monitoring data within a preset period includes: acquiring water quality monitoring data corresponding to multiple variable parameters of the same site within the preset period.
  • The variable parameters may include different parameters that characterize water quality, such as water temperature, pH, dissolved oxygen, conductivity, turbidity, permanganate index (CODMn), ammonia nitrogen, total phosphorus, total nitrogen, chlorophyll a, and algal density.
  • The water quality monitoring data analysis method further includes:
  • screening the water quality monitoring data according to the upper and lower abnormal-value limits for each variable parameter and according to set relationships between variable parameters, to obtain valid water quality monitoring data.
  • Take as an example variable parameters including water temperature, pH value, dissolved oxygen, electrical conductivity, turbidity, permanganate index (CODMn), ammonia nitrogen, total phosphorus, total nitrogen, chlorophyll a, and algal density.
  • The set variable parameter relationships include: (a) total nitrogen is greater than ammonia nitrogen; (b) chemical oxygen demand is greater than the permanganate index; (c) chemical oxygen demand is greater than biochemical oxygen demand.
  • The upper and lower abnormal-value limits for each variable parameter, and the set relationships, can be determined from empirical values obtained in manual review.
  • Screening the water quality monitoring data against these limits and relationships removes data that should not take part in the mathematical analysis, which reduces noise and improves the accuracy of the analysis.
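  • The screening step can be sketched as a record-level filter. The three relationships are taken from the list above; the numeric limits are hypothetical placeholders, since real limits come from empirical values in manual review, and the parameter keys (`cod`, `bod`, and so on) are invented for illustration. A rule is skipped when either of its parameters is missing from the record.

```python
# Hypothetical (lower, upper) limits for illustration only; real values come
# from empirical manual-review experience.
LIMITS = {
    "ph": (0.0, 14.0),
    "dissolved_oxygen": (0.0, 20.0),  # mg/L
    "turbidity": (0.0, 4000.0),       # NTU
}

# Set relationships: the first parameter must be greater than the second.
RELATIONS = [
    ("total_nitrogen", "ammonia_nitrogen"),
    ("cod", "codmn"),  # chemical oxygen demand > permanganate index
    ("cod", "bod"),    # chemical oxygen demand > biochemical oxygen demand
]


def is_valid(record):
    """True if the record passes every applicable limit and relationship."""
    for name, (lower, upper) in LIMITS.items():
        value = record.get(name)
        if value is not None and not lower <= value <= upper:
            return False
    for larger, smaller in RELATIONS:
        if larger in record and smaller in record and not record[larger] > record[smaller]:
            return False
    return True


def screen(records):
    """Keep only valid records for the subsequent mathematical analysis."""
    return [r for r in records if is_valid(r)]
```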
  • The water quality monitoring data analysis method includes the following steps:
  • S11 Perform STL decomposition on the water quality monitoring data, compute and extract features of the decomposed series, and mark and extract the outliers in the time series features;
  • The method uses both the time series method and the linear regression method, identifying and evaluating the data from two dimensions, change over time and the relationships between variables, and combines several outlier identification techniques to achieve targeted
  • The system's outlier identification function can build a recognition model from existing monitoring data and apply it to the monitoring section where the station is located,
  • achieving dual identification through remote prediction and on-site verification and ensuring timely, targeted responses to abnormal environmental events.
  • The linear regression and time series algorithms identify and judge outliers from two starting points: time and the internal relationships between variables.
  • In use, the two methods are applied alternately according to on-site conditions, and their parameters are optimized for specific situations.
  • Quality judgment is built into the statistical model algorithm, and data identification for relatively stable monitoring sections has reached the level of quantitative identification.
  • The model results obtained with the water quality monitoring data analysis method of the embodiments of the present application are as follows:
  • The experimental data are monitoring data from the sites of a certain water system.
  • There are 4 stations in this water system, denoted station A, station B, station C, and station D.
  • Each station has 8 variable parameter features: water temperature (°C), pH (dimensionless), dissolved oxygen (mg/L), conductivity (μS/cm), turbidity (NTU), permanganate index (mg/L), ammonia nitrogen (mg/L), and total phosphorus (mg/L).
  • The time frequency for selecting pile points is D, i.e. each day is treated as a pile point; 3 lag periods are calculated; the correlation coefficient threshold is set to 0.8; and the time window around each abnormal point is 16 hours, taking the 16 hours before and after the pile point as the window length. Each period is 4 hours long.
  • The dissolved oxygen results for station pair A-B are selected for display. The dissolved oxygen analysis results are as follows:
  • An embodiment of the present application further provides a water quality monitoring data analysis apparatus, including: an acquisition module 11, configured to acquire water quality monitoring data within a preset period; an abnormal value extraction module 12, configured to perform abnormal point analysis on the water quality monitoring data and mark its abnormal values; a correlation analysis module 13, configured to determine a reference time period according to the distribution of the abnormal values and perform correlation analysis between the water quality monitoring data of the reference time period and the water quality monitoring data of the upstream site; and a determination module 14, configured to determine, according to the correlation analysis result, the time at which pollution occurred at the upstream site.
  • The abnormal value extraction module 12 is specifically configured to: decompose the water quality monitoring data with a time series decomposition algorithm and mark the outliers in the decomposed series as abnormal values; remove those values; and perform secondary abnormal point analysis on the remaining water quality monitoring data with linear regression to mark the abnormal values of the water quality monitoring data.
  • The abnormal value extraction module 12 is further configured to analyze the correlation between the variable parameters in the water quality monitoring data after the abnormal values are removed, identify abnormal data segments from the change trend of that correlation, and extract and mark the abnormal values corresponding to those segments.
  • The abnormal value extraction module 12 is also configured to mask the abnormal values, perform linear regression analysis on the masked data, identify outliers by Cook's distance, and mark points whose Cook's distance exceeds the threshold as outliers.
  • The correlation analysis module 13 is specifically configured to compile all marked abnormal values, take each marked abnormal value as a pile point, expand a time window centered on the pile point according to the set time frequency, determine the reference time period from the time window, and perform correlation analysis between the water quality monitoring data of the reference time period and the water quality monitoring data of the upstream station.
  • The correlation analysis module 13 is further configured to perform correlation analysis between the water quality monitoring data of the reference time period and the upstream station data at different intervals, and to determine from the correlation coefficients at those intervals when pollution occurred at the upstream station.
  • The acquisition module 11 is specifically configured to acquire water quality monitoring data corresponding to multiple variable parameters of the same site within a preset period.
  • Another embodiment of the present application further provides a water quality monitoring device, including a processor 51 and a memory 52, the memory 52 storing a computer program executable by the processor; when the processor 51 executes the computer program, the steps of the water quality monitoring data analysis method of any embodiment of the present application are implemented.
  • Nonvolatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Chemical & Material Sciences (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Pure & Applied Mathematics (AREA)
  • Pathology (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Immunology (AREA)
  • Evolutionary Biology (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Food Science & Technology (AREA)
  • Medicinal Chemistry (AREA)
  • Algebra (AREA)
  • Probability & Statistics with Applications (AREA)
  • Operations Research (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • Molecular Biology (AREA)
  • Testing And Monitoring For Control Systems (AREA)
  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

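A water quality monitoring data analysis method and apparatus, a device, and a computer storage medium. The water quality monitoring method includes: acquiring water quality monitoring data within a preset period (S101); performing outlier analysis on the water quality monitoring data and marking abnormal values in the water quality monitoring data (S103); determining a reference time period according to the distribution of the abnormal values, and performing correlation analysis between the water quality monitoring data in the reference time period and the water quality monitoring data of an upstream station (S105); and determining, according to the correlation analysis result, the time at which pollution occurred at the upstream station (S107).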

Description

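Water Quality Monitoring Data Analysis Method and Apparatus, Device, and Storage Medium
Technical Field
The present invention relates to the technical field of water quality monitoring, and in particular to a water quality monitoring data analysis method and apparatus, a device, and a computer storage medium.
Background
Water pollution is mainly caused by pollutants produced by human activities, typically harmful chemical substances that reduce or destroy the usability of water. Wastewater contains acids, alkalis, and oxidants; compounds of copper, cadmium, mercury, arsenic, and similar elements; and organic toxicants such as benzene, dichloroethane, and ethylene glycol, which can kill aquatic organisms, affect drinking water sources, and spoil the landscape of scenic areas. When organic matter in wastewater is decomposed by microorganisms, it consumes the oxygen dissolved in the water and threatens aquatic life; once the dissolved oxygen is exhausted, the organic matter decomposes anaerobically, producing foul-smelling gases such as hydrogen sulfide and mercaptans and further degrading water quality. Water pollution has therefore become a major threat to human survival and a major obstacle to human health and to sustainable economic and social development.
With the development of online water quality monitoring technology, large volumes of water quality monitoring data can now be acquired. However, the data are still audited mainly by hand, and as the volume of online data grows, the manual auditing workload rises sharply, making efficient identification of abnormal values and subsequent on-site investigation more difficult. Moreover, because manual auditing relies on experience and visual inspection, auditors facing large datasets easily make erroneous or missed judgments and need a long time; when an anomaly occurs on site, it is hard to make a preliminary judgment promptly, so investigation tends to lag.
In addition, abnormal data are frequently recorded during data acquisition. Abnormal data may originate from genuine changes in water quality or from faults in the monitoring instruments themselves. Efficiently identifying the real cause of abnormal data is the prerequisite for judging the authenticity of monitoring data, and the foundation for subsequent cause analysis and remediation.
Summary
To solve the existing technical problems, embodiments of the present invention provide a water quality monitoring data analysis method and apparatus, a device, and a computer storage medium that are more accurate and efficient and that improve the timeliness and specificity of responses to water pollution problems.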
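A first aspect of the embodiments of the present invention provides a water quality monitoring data analysis method, including:
acquiring water quality monitoring data within a preset period;
performing outlier analysis on the water quality monitoring data, and marking abnormal values in the water quality monitoring data;
determining a reference time period according to the distribution of the abnormal values, and performing correlation analysis between the water quality monitoring data in the reference time period and the water quality monitoring data of an upstream station;
determining, according to the correlation analysis result, the time at which pollution occurred at the upstream station.
A second aspect of the embodiments of the present application provides a water quality monitoring data analysis apparatus, including:
an acquisition module, configured to acquire water quality monitoring data within a preset period;
an abnormal value extraction module, configured to perform outlier analysis on the water quality monitoring data and mark abnormal values in the water quality monitoring data;
a correlation analysis module, configured to determine a reference time period according to the distribution of the abnormal values, and to perform correlation analysis between the water quality monitoring data in the reference time period and the water quality monitoring data of an upstream station;
a determination module, configured to determine, according to the correlation analysis result, the time at which pollution occurred at the upstream station.
A third aspect of the embodiments of the present application further provides a water quality monitoring device, including a processor and a memory, where the memory stores a computer program executable by the processor, and the computer program, when executed by the processor, implements the water quality monitoring data analysis method according to any embodiment of the present application.
A fourth aspect of the embodiments of the present application further provides a computer storage medium storing a computer program that, when executed by a controller, implements the water quality monitoring data analysis method according to any embodiment of the present application.
In the water quality monitoring data analysis method and apparatus, water quality monitoring device, and computer storage medium provided by the above embodiments, outlier analysis is performed on the water quality monitoring data and abnormal values are marked, so that the real cause of abnormal data can be identified efficiently and the authenticity of the monitoring data can be judged. This shortens the data identification cycle and the time from discovering a problem to solving it, addressing a key technical difficulty in current environmental management. By using historical data as the basis for judgment and algorithms as the means of identifying abnormal values, the approach addresses the efficiency shortfall in the operation and maintenance management of water stations and improves the overall validity of online monitoring data.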
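Brief Description of the Drawings
FIG. 1 is a flowchart of a water quality monitoring data analysis method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a water quality monitoring data analysis method according to another embodiment of the present invention;
FIG. 3 is a schematic diagram of determining abnormal values by their deviation from the regression line in the water quality monitoring data analysis method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of marking abnormal values in a time series in the water quality monitoring data analysis method according to an embodiment of the present invention;
FIG. 5 is a flowchart of a water quality monitoring data analysis method according to another embodiment of the present invention;
FIG. 6 shows the linear regression plot and trend plot of the two features turbidity and total phosphorus for September 2019 in an optional specific example;
FIG. 7 shows the linear regression plot and trend plot of the two features turbidity and total phosphorus for March 2020 in an optional specific example;
FIG. 8 compares data from the upstream and downstream stations offset by 3 periods in an optional specific example;
FIG. 9 is a schematic structural diagram of a water quality monitoring data analysis apparatus according to another embodiment of the present invention;
FIG. 10 is a schematic structural diagram of a water quality monitoring device according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention are described in further detail below with reference to the accompanying drawings and specific embodiments.
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the art to which the present invention belongs. The terms used in the description of the present invention are for the purpose of describing specific embodiments only and are not intended to limit the scope of protection of the present invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
In the following description, references to "some embodiments" describe subsets of all possible embodiments; it should be understood that "some embodiments" may refer to the same subset or to different subsets of all possible embodiments, and that they may be combined with each other where no conflict arises.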
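Referring to FIG. 1, an embodiment of the present invention provides a water quality monitoring data analysis method, including the following steps:
S101: acquire water quality monitoring data within a preset period.
The water quality monitoring data may include data collected manually at each station, or data for different variable parameters collected automatically at each station at a preset frequency. The preset period may be any set duration, such as one year, half a year, a quarter, or a month, and may be chosen according to the amount of data in that duration.
Optionally, the water quality monitoring data include multiple different variables that characterize water quality. The variable parameters may include parameters such as water temperature, pH, dissolved oxygen, conductivity, turbidity, permanganate index (CODMn), ammonia nitrogen, total phosphorus, total nitrogen, chlorophyll a, and algal density. After collection, the water quality monitoring data may be sent remotely to a water quality monitoring data center for storage, where water quality auditors can analyze them. Acquiring the water quality monitoring data within the preset period may consist of acquiring the data for different variable parameters collected at the same station within a set historical time period.
S103: perform outlier analysis on the water quality monitoring data, and mark abnormal values in the water quality monitoring data.
An outlier is the water quality data collected at a time point that indicates a water quality problem. Performing outlier analysis on the acquired water quality monitoring data and marking abnormal values makes it possible to identify abnormal values within a given time and spatial range and to screen and prejudge suspicious data automatically. This provides the ability to analyze and judge abnormal data, offering not only various analysis mechanisms for a single monitoring indicator but also combined analysis of multiple indicators.
S105: determine a reference time period according to the distribution of the abnormal values, and perform correlation analysis between the water quality monitoring data in the reference time period and the water quality monitoring data of an upstream station.
When the pairwise correlations of variable parameters between the current station's and the upstream station's water quality monitoring data, offset by different numbers of periods, are analyzed to determine when pollution occurred at the upstream station, using all of the data gives only a weak signal. The analysis is therefore restricted to the time periods in which anomalies occurred, and the correlation at a lag of k periods is examined to judge how the upstream and downstream stations influence each other when an anomaly occurs. Based on the distribution of the marked abnormal values, a time window is expanded around each abnormal value taken as an anchor point to determine the reference time period, and the correlation between the data in the reference time period and the data k periods apart is computed, which improves the accuracy and efficiency of the analysis.
S107: determine, according to the correlation analysis result, the time at which pollution occurred at the upstream station.
For example, with one day as the unit, each abnormal time point is approximated to the nearest month end. After this step, clusters of abnormal time points all become month-end times, and taking 15 days before and after each month end covers all the abnormal times approximated to it. The upstream/downstream correlation analysis described above is then performed within these abnormal time periods.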
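The month-end approximation in the example above can be sketched as follows. This is a minimal illustration assuming daily `datetime.date` timestamps; the function names are hypothetical, and ties are resolved toward the end of the current month, which the text does not specify.

```python
import calendar
from datetime import date, timedelta
from typing import Iterable, List, Tuple


def month_end(d: date) -> date:
    """Last day of the month containing d."""
    return date(d.year, d.month, calendar.monthrange(d.year, d.month)[1])


def nearest_month_end(d: date) -> date:
    """Snap d to the closer of this month's end and the previous month's end."""
    this_end = month_end(d)
    prev_end = date(d.year, d.month, 1) - timedelta(days=1)
    # On a tie, prefer the end of the current month (an assumption).
    return this_end if (this_end - d) <= (d - prev_end) else prev_end


def month_end_windows(anomaly_days: Iterable[date], half_width_days: int = 15) -> List[Tuple[date, date]]:
    """Collapse anomaly days onto month ends and return (start, end) windows around each."""
    anchors = sorted({nearest_month_end(d) for d in anomaly_days})
    half = timedelta(days=half_width_days)
    return [(a - half, a + half) for a in anchors]
```

For anomalies on 2019-09-03, 2019-09-25, and 2019-10-02, the first snaps to 2019-08-31 and the other two to 2019-09-30, giving two 30-day windows in which the station-to-station correlation would then be computed.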
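Optionally, determining a reference time period according to the distribution of the abnormal values, and performing correlation analysis between the water quality monitoring data in the reference time period and the water quality monitoring data of the upstream station, includes:
acquiring water quality monitoring data for different variable parameters of the upstream station;
analyzing the pairwise correlations of variable parameters between the water quality monitoring data of the current station and of the upstream station offset by different numbers of periods.
Let the time series of a variable parameter at the upstream station be $x = (x_1, x_2, \dots)$ and that of the same variable parameter at the downstream station be $y = (y_1, y_2, \dots)$. Taking an offset of $k$ periods as an example, the indicator series of station $x$ is lagged by $k$ periods to obtain $x(k) = (\mathrm{na}, \dots, \mathrm{na}, x_1, x_2, \dots)$, in which $x_1$ first appears at position $k+1$. The correlation coefficient between $x(k)$ and $y$ is then computed, which is equivalent to deleting the first $k$ samples, where $\mathrm{na}$ appears. The lag-$k$ correlation reveals more precisely how pollution at the upstream station affects the downstream station after some time has passed.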
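A minimal sketch of the lag-$k$ correlation follows. It assumes both stations are sampled on the same regular time grid without missing values; `pearson` and `lagged_correlation` are illustrative names, not functions from the source.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation; NaN if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(upstream: Sequence[float], downstream: Sequence[float], k: int) -> float:
    """Correlation between x(k) (upstream lagged by k periods) and y.

    x(k) = (na, ..., na, x1, x2, ...) has k leading gaps; dropping those k
    positions pairs x_1..x_{n-k} with y_{k+1}..y_n.
    """
    n = min(len(upstream), len(downstream))
    if not 0 <= k <= n - 2:
        raise ValueError("k must leave at least two overlapping samples")
    return pearson(upstream[: n - k], downstream[k:n])
```

If the downstream series is the upstream series delayed by two periods, `lagged_correlation(up, down, 2)` returns 1.0, while smaller lags give lower values.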
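Analyzing the pairwise correlations of variable parameters between the water quality monitoring data of the current station and of the upstream station offset by different numbers of periods includes:
determining a reference time period according to the distribution of the marked abnormal values of the current station, and performing correlation analysis between the current station's water quality monitoring data in the reference time period and the upstream station's water quality monitoring data offset by different numbers of periods.
In the above embodiment, the water quality monitoring data are transmitted from the site to a remote data center. After the remote data center stores the data in a database, outlier analysis is performed on the water quality monitoring data and abnormal values are marked, so that the real cause of abnormal data can be identified efficiently and the authenticity of the monitoring data can be judged. This shortens the data identification cycle and the time from discovering a problem to solving it, addressing a key technical difficulty in current environmental management. By using historical data as the basis for judgment and algorithms as the means of identifying abnormal values, the approach addresses the efficiency shortfall in the operation and maintenance management of water stations and improves the overall validity of online monitoring data.
In some embodiments, referring to FIG. 2, step S103 (performing outlier analysis on the water quality monitoring data and marking abnormal values in the water quality monitoring data) includes:
S1031: decompose the water quality monitoring data with a time series decomposition algorithm, and mark outliers in the decomposed data series as abnormal values.
In this embodiment, the water quality monitoring data are decomposed with the time series decomposition algorithm STL (Seasonal-Trend decomposition using Loess) to detect outliers in the time series. For a time series $X$, $x_t$ denotes its observation at time $t$. The series is decomposed into three components: the seasonal component $S_X$, the trend component $T_X$, and the residual component $R_X$. The seasonal component captures the periodic information in the series, and the trend component captures how the series changes over time. Outliers are then mined from the residual left after the seasonal and trend components are subtracted: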
$$R_X = X - T_X - S_X$$
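The seasonal component can be replaced by the mean of the series values at the corresponding position of each sub-cycle. For example, with one year as the sub-cycle, the seasonal value for January is the mean of all January values in the current data. The sub-cycle length must be chosen according to the granularity of the data. The trend component can be determined in one of three ways: first, using the moving average of the time series; second, using the median of the time series; third, using the median absolute deviation (MAD) of the time series. The MAD can be computed as: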
$$\mathrm{MAD} = \operatorname{median}_i\left(\left|x_i - \operatorname{median}_j(x_j)\right|\right)$$
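The simplified decomposition described above can be sketched as follows. This is not full STL, which estimates the trend and seasonal components with LOESS smoothing; it is a minimal illustration assuming a regularly sampled series with a known sub-cycle length `period`, using the median as a constant trend (the second option above) and slot-wise means as the seasonal component. The slot means are taken after subtracting the trend, an assumption made here so that the overall level is not removed twice. Function names are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def simple_decompose(x: Sequence[float], period: int) -> Tuple[float, List[float], List[float]]:
    """Return (trend, seasonal, residual) with R_X = X - T_X - S_X."""
    if period < 1 or len(x) < period:
        raise ValueError("need at least one full sub-cycle")
    trend = statistics.median(x)  # median used as a constant trend level
    detrended = [v - trend for v in x]
    # Mean of each sub-cycle slot, e.g. all Januaries when the sub-cycle is a year.
    slot_means = [statistics.mean(detrended[i::period]) for i in range(period)]
    seasonal = [slot_means[i % period] for i in range(len(x))]
    residual = [v - trend - s for v, s in zip(x, seasonal)]
    return trend, seasonal, residual


def mad(x: Sequence[float]) -> float:
    """Median absolute deviation: median_i(|x_i - median_j(x_j)|)."""
    m = statistics.median(x)
    return statistics.median([abs(v - m) for v in x])
```

The residual list is what an outlier test such as ESD would then be applied to; for production use, a LOESS-based STL implementation would replace `simple_decompose`.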
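After the seasonal and trend components have been subtracted from the time series, outliers in the residual can be detected with the ESD (Extreme Studentized Deviate) test, which works as follows.
ESD assumes that the data contain at most $k$ outliers. The statistic $C_k$ is computed from the data as:
$$C_k = \frac{\max_i \left|x_i - \bar{x}\right|}{s}$$
When computing $C_1$, the mean $\bar{x}$ and standard deviation $s$ of the $n$ data points are computed first:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$
After $C_1$ is computed, the data point that maximizes $C_1$ is removed and $C_2$ is computed from the remaining $n-1$ points; this is iterated until $C_k$ is obtained. $C_1, \dots, C_k$ are compared with the critical values $\lambda_1, \dots, \lambda_k$, and the largest $i$ such that $C_i > \lambda_i$ is selected. The data points that maximized the first $i$ statistics are then taken as outliers, giving $i$ outliers in total. The critical value $\lambda_k$ is computed as:
$$\lambda_k = \frac{(n-k)\, t_{p,\,n-k-1}}{\sqrt{\left(n-k-1+t_{p,\,n-k-1}^{2}\right)(n-k+1)}}$$
where $t_{p,n-k-1}$ is the upper critical value of the t distribution with $n-k-1$ degrees of freedom at significance level $p$.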
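The ESD procedure above (Rosner's generalized ESD test) can be sketched in pure Python as follows. Two assumptions are made: the level used for each critical value is $p = 1 - \alpha/(2(n-i+1))$, the usual two-sided choice, because the source gives no formula for $p$; and the Student's t quantile is computed by a small numerical routine, `t_quantile` (hypothetical), so the sketch has no third-party dependencies. In practice a library routine such as SciPy's `scipy.stats.t.ppf` would normally be used instead.

```python
import math
import statistics
from typing import List, Sequence


def t_quantile(q: float, df: int, steps: int = 400) -> float:
    """Return t with P(T <= t) = q for Student's t (q > 0.5), by bisection.

    Substituting t = sqrt(df) * tan(u) gives
    P(0 <= T <= t) = c * integral_0^u cos(v)**(df - 1) dv, integrated by Simpson's rule.
    """
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)

    def half_cdf(u: float) -> float:
        h = u / steps
        total = 1.0 + math.cos(u) ** (df - 1)
        for j in range(1, steps):
            total += (4 if j % 2 else 2) * math.cos(j * h) ** (df - 1)
        return c * total * h / 3

    lo, hi = 0.0, math.pi / 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if half_cdf(mid) < q - 0.5:
            lo = mid
        else:
            hi = mid
    return math.sqrt(df) * math.tan((lo + hi) / 2)


def generalized_esd(x: Sequence[float], max_outliers: int, alpha: float = 0.05) -> List[int]:
    """Return indices of outliers in x found by the generalized ESD test."""
    n = len(x)
    max_outliers = min(max_outliers, n - 2)  # keep n - i - 1 >= 1 degrees of freedom
    remaining = list(enumerate(x))
    candidates: List[int] = []
    num_outliers = 0
    for i in range(1, max_outliers + 1):
        values = [v for _, v in remaining]
        mean, sd = statistics.mean(values), statistics.stdev(values)
        if sd == 0:
            break
        # C_i: largest studentized deviation among the remaining points.
        idx, val = max(remaining, key=lambda pair: abs(pair[1] - mean))
        c_i = abs(val - mean) / sd
        # Critical value lambda_i.
        p = 1 - alpha / (2 * (n - i + 1))
        t = t_quantile(p, n - i - 1)
        lam = (n - i) * t / math.sqrt((n - i - 1 + t * t) * (n - i + 1))
        candidates.append(idx)
        remaining = [pair for pair in remaining if pair[0] != idx]
        if c_i > lam:
            num_outliers = i  # keep the largest i with C_i > lambda_i
    return sorted(candidates[:num_outliers])
```

For example, `generalized_esd([10, 11, 9, 10, 12, 10, 11, 9, 10, 50], 2)` flags only index 9: its $C_1 \approx 2.84$ exceeds $\lambda_1 \approx 2.29$, while the next candidate's $C_2 \approx 1.83$ stays below $\lambda_2$.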
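S1033: remove the marked abnormal values.
S1035: perform a second outlier analysis, using linear regression, on the water quality monitoring data from which the abnormal values have been removed, to mark abnormal values in the water quality monitoring data.
The system extracts the raw data from the database, performs STL decomposition, and computes features from the decomposed series. After STL filtering, the marked abnormal values are removed and the data are examined a second time with linear regression, which identifies abnormal data through the pairwise relationships between variables. When the relationship between two variables departs from its usual pattern, the pair is marked.
In the above embodiment, abnormal values are first identified with the time series decomposition algorithm, and a second outlier analysis is then performed with linear regression after those values are removed. Using both time series and linear regression methods identifies abnormal data along two dimensions, temporal change and inter-variable relationships, which improves the accuracy of water quality monitoring data analysis.
In some embodiments, performing the second outlier analysis with linear regression on the water quality monitoring data from which the abnormal values have been removed, to mark abnormal values in the water quality monitoring data, includes:
analyzing the pairwise correlations between variable parameters in the water quality monitoring data from which the abnormal values have been removed;
identifying abnormal data segments according to the trend of the correlations, and extracting and marking the abnormal values corresponding to the abnormal data segments.
The correlation can be characterized by the value of the correlation coefficient. Denoting two variable parameters by $x$ and $y$, the correlation coefficient $\rho$ between them can be expressed as:
$$\rho = \frac{\sum_{i=1}^{n}\left(x_i-\bar{x}\right)\left(y_i-\bar{y}\right)}{\sqrt{\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2}}$$
Curves are drawn from the pairwise correlations between variable parameters in the water quality monitoring data. When the trend of the water quality monitoring data in the time series does not match the trend of the correlation, the corresponding segment can be identified as an abnormal data segment, and its abnormal values are extracted.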
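One possible reading of this step, sketched under stated assumptions: the overall correlation of a parameter pair is compared with its correlation inside a sliding window, and windows whose local correlation falls more than `drop` below the overall value are flagged as abnormal segments. The window length, the `drop` margin, and the function names are hypothetical choices for illustration; the source does not specify how the trend comparison is quantified.

```python
import math
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation; NaN if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def abnormal_segments(
    x: Sequence[float], y: Sequence[float], window: int, drop: float = 0.5
) -> Tuple[float, List[Tuple[int, int]]]:
    """Return the overall correlation and [start, end) windows whose local
    correlation is more than `drop` below it."""
    overall = pearson(x, y)
    flagged: List[Tuple[int, int]] = []
    for start in range(len(x) - window + 1):
        r = pearson(x[start:start + window], y[start:start + window])
        if not math.isnan(r) and overall - r > drop:
            flagged.append((start, start + window))
    return overall, flagged
```

With `x = [0, 1, ..., 9]` and `y` equal to `x` except that the last three values are reversed, the overall correlation is about 0.95 and only the window covering indices 7 to 9 (local correlation -1) is flagged.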
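In some embodiments, performing outlier analysis on the water quality monitoring data and marking abnormal values in the water quality monitoring data further includes:
masking the abnormal values, and performing linear regression analysis on the masked data;
identifying outliers by Cook's distance, and marking outliers whose Cook's distance exceeds a threshold as abnormal values.
After abnormal data segments are identified according to the trend of the correlations and the corresponding abnormal values are extracted and marked, the abnormal values are also masked and linear regression analysis is performed on the masked data. Optionally, this involves taking two variable parameters with a large correlation coefficient as a variable parameter group whose correlation exceeds a threshold, performing linear regression analysis on the group, computing Cook's distance from the residual vector, mean squared error, and hat-matrix diagonal values determined by the regression, identifying outliers by Cook's distance, and marking outliers whose Cook's distance exceeds the threshold as abnormal values.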
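Referring to FIG. 3 and FIG. 4, linear regression analysis yields a scatter plot and a regression line, and the Cook's distance of each data point is computed; the larger the Cook's distance, the further the point deviates from the regression line. Outliers whose Cook's distance exceeds the threshold are marked as abnormal values. Optionally, the Cook's distance threshold is 5 times the mean Cook's distance of all data points. Cook's distance can be computed as:
$$D_i = \frac{\hat{\varepsilon}_i^{2}}{p\cdot\mathrm{MSE}}\cdot\frac{h_i}{\left(1-h_i\right)^{2}}$$
where $h_i$ is the $i$-th diagonal element of the matrix $H$ defined below (the leverage of the $i$-th observation), $\hat{\varepsilon}_i$ is the $i$-th component of the residual vector $\hat{\varepsilon} = y - \hat{y}$, $p$ is the number of model parameters, and MSE is the mean squared error.
Taking a variable parameter group with correlation above the threshold that consists of two variable parameters $x$ and $y$ as an example, linear regression analysis of $x$ and $y$ establishes the one-dimensional model:
$$y_i = a + b x_i + \varepsilon_i$$
where $a$ and $b$ are the linear parameters. Taking the values of $x$ and $y$ at each time point as samples, one indicator is written as the $n$-dimensional vector $y$, and the other indicator together with an all-ones vector forms the $n \times p$ matrix $X = (\mathbf{1}, x)$, here with $p = 2$. Writing the linear parameter vector as $B = (a, b)'$ and the $n$-dimensional error vector as $\varepsilon$, the model in matrix form is $y = XB + \varepsilon$.
The estimate of $B$ is:
$$\hat{B} = \left(X'X\right)^{-1}X'y$$
The estimate of $y$ is:
$$\hat{y} = X\hat{B} = X\left(X'X\right)^{-1}X'y$$
Writing $H = X(X'X)^{-1}X'$, this becomes:
$$\hat{y} = Hy$$
The value $h_i$ used in Cook's distance is the $i$-th diagonal element of the matrix $H$.
The estimate of the error vector $\varepsilon$ is:
$$\hat{\varepsilon} = y - \hat{y} = \left(I - H\right)y$$
where $I$ is the identity matrix.
The mean squared error is:
$$\mathrm{MSE} = \frac{\hat{\varepsilon}'\hat{\varepsilon}}{n-p} = \frac{1}{n-p}\sum_{i=1}^{n}\hat{\varepsilon}_i^{2}$$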
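For the one-dimensional model $y_i = a + b x_i + \varepsilon_i$, the matrix formulas reduce to closed forms, so Cook's distance can be sketched without a linear-algebra library. The sketch assumes $p = 2$, uses the leverage $h_i = 1/n + (x_i-\bar{x})^2/S_{xx}$ (the diagonal of $H$ for a simple regression with an intercept), divides the residual sum of squares by $n-p$, and applies the optional threshold of 5 times the mean Cook's distance; the function name is hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def cooks_distance_flags(
    x: Sequence[float], y: Sequence[float], factor: float = 5.0
) -> Tuple[float, float, List[float], List[int]]:
    """Fit y = a + b*x by OLS; return (a, b, D, indices with D_i > factor * mean(D))."""
    n, p = len(x), 2
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((u - mx) ** 2 for u in x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [v - (a + b * u) for u, v in zip(x, y)]       # residual vector (I - H) y
    mse = sum(e * e for e in resid) / (n - p)             # mean squared error
    lev = [1 / n + (u - mx) ** 2 / sxx for u in x]        # h_i, diagonal of H
    d = [e * e / (p * mse) * h / (1 - h) ** 2 for e, h in zip(resid, lev)]
    limit = factor * statistics.mean(d)
    return a, b, d, [i for i, di in enumerate(d) if di > limit]
```

With `x = 0..19`, `y = x`, and 20 added to the point at index 10, only index 10 exceeds the 5 × mean threshold. Note that with very few points the 5 × mean rule can never trigger, since a single $D_i$ cannot exceed five times a mean that includes it when $n \le 5$.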
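In the above embodiment, the corresponding scatter plot and regression line can be drawn from these calculations. A one-dimensional model is established by linear regression analysis of the variable parameter groups whose correlation exceeds the threshold; with this model, water quality monitoring data acquired in update periods are monitored in real time, suspicious data are screened and prejudged automatically, and abnormal data are captured, enabling intelligent water quality monitoring.
In some embodiments, determining a reference time period according to the distribution of the abnormal values, and performing correlation analysis between the water quality monitoring data in the reference time period and the water quality monitoring data of the upstream station, includes:
collating all marked abnormal values, taking each marked abnormal value as an anchor point, expanding a time window centered on the anchor point according to a set time frequency, and determining the reference time period from the time window;
performing correlation analysis between the water quality monitoring data in the reference time period and the water quality monitoring data of the upstream station.
Expanding a time window around each marked abnormal value taken as an anchor point, according to the distribution of the abnormal values, determines the reference time period; computing the correlation between the data in the reference time period and the data $k$ periods apart improves the accuracy and efficiency of the analysis.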
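The anchor-point windowing can be sketched as below, assuming timestamped anomalies given as `datetime` objects. Anomalies are first floored to the anchor frequency (one day, matching the frequency "D" in the dissolved-oxygen example later in this description), each anchor is widened by a fixed half-width (16 hours in that example), and overlapping windows are merged into reference periods. Merging overlapping windows and the function names are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

EPOCH = datetime(1970, 1, 1)


def anchor_points(anomaly_times: Iterable[datetime], freq: timedelta = timedelta(days=1)) -> List[datetime]:
    """Floor each anomaly time to the anchor frequency and de-duplicate."""
    return sorted({EPOCH + ((t - EPOCH) // freq) * freq for t in anomaly_times})


def reference_periods(
    anchors: List[datetime], half_width: timedelta = timedelta(hours=16)
) -> List[Tuple[datetime, datetime]]:
    """Widen each (sorted) anchor by +/- half_width and merge overlapping windows."""
    periods: List[Tuple[datetime, datetime]] = []
    for a in anchors:
        start, end = a - half_width, a + half_width
        if periods and start <= periods[-1][1]:
            periods[-1] = (periods[-1][0], max(periods[-1][1], end))
        else:
            periods.append((start, end))
    return periods
```

Anomalies on 1 and 2 March 2020 produce overlapping windows that merge into one period from 29 February 08:00 to 2 March 16:00, while an isolated anomaly on 10 March keeps its own window.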
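In some embodiments, determining, according to the correlation analysis result, the time at which pollution occurred at the upstream station includes:
performing correlation analysis between the water quality monitoring data in the reference time period and the upstream station's data offset by different numbers of periods, and determining the time at which pollution occurred at the upstream station according to the correlation coefficients obtained for the different offsets.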
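Choosing the offset can be sketched as follows, assuming regularly sampled series indexed by period and a reference period given as downstream indices `[start, end)`. The lag with the highest correlation that also reaches the threshold is returned; the estimated upstream pollution time is then the downstream anomaly time minus $k$ periods. The default threshold of 0.8 follows the example later in this description; the function names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation; NaN if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def best_lag(
    upstream: Sequence[float],
    downstream: Sequence[float],
    start: int,
    end: int,
    max_lag: int,
    threshold: float = 0.8,
) -> Optional[Tuple[int, float]]:
    """Return (k, r) for the lag k in 0..max_lag whose correlation over the
    downstream reference period [start, end) is highest and >= threshold."""
    y = downstream[start:end]
    best: Optional[Tuple[int, float]] = None
    for k in range(max_lag + 1):
        if start - k < 0 or end - k > len(upstream):
            continue  # the lagged upstream slice would fall outside the data
        r = pearson(upstream[start - k:end - k], y)
        if not math.isnan(r) and r >= threshold and (best is None or r > best[1]):
            best = (k, r)
    return best
```

With 4-hour periods, as in the dissolved-oxygen example later in this description, a best lag of k = 3 corresponds to an upstream event about 12 hours before the downstream anomaly.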
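When the pairwise correlations of variable parameters between the current station's and the upstream station's water quality monitoring data, offset by different numbers of periods, are analyzed to determine when pollution occurred at the upstream station, using all of the data gives only a weak signal. The analysis is therefore restricted to the time periods in which anomalies occurred, and the correlation at a lag of k periods is examined to judge how the upstream and downstream stations influence each other when an anomaly occurs. Based on the distribution of the marked abnormal values, a time window is expanded around each abnormal value taken as an anchor point to determine the reference time period, and the correlation between the data in the reference time period and the data k periods apart is computed, which improves the accuracy and efficiency of the analysis.
For example, with one day as the unit, each abnormal time point is approximated to the nearest month end. After this step, clusters of abnormal time points all become month-end times, and taking 15 days before and after each month end covers all the abnormal times approximated to it. The upstream/downstream correlation analysis described above is then performed within these abnormal time periods.
Optionally, the method further includes:
for variable parameters whose correlation is below the threshold, forming updated variable parameters from the changes between pairs of values $n$ time steps apart in the variable parameter vectors;
analyzing the pairwise correlations of the updated variable parameters, and returning to the step of performing linear regression analysis on variable parameter groups whose correlation exceeds the threshold.
In practice, the correlation coefficient between different variable parameters at the same station may be low even though their trends visibly move together. For variable parameters whose correlation is below the threshold, updated variable parameters are formed from the changes between values $n$ time steps apart, and correlation analysis is repeated on them. For example, the correlation coefficient can be computed on first-differenced data. For two variable parameters $(x_1, x_2, \dots, x_n)$ and $(y_1, y_2, \dots, y_n)$, the first difference is the change between values one time step apart, giving the updated variable parameters $(x_2 - x_1, \dots, x_n - x_{n-1})$ and $(y_2 - y_1, \dots, y_n - y_{n-1})$, which are then used for the subsequent correlation and regression analysis.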
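Similarly, the rate of change over one time step can be used for subsequent analysis, i.e. the new data are
$$\left(\frac{x_2 - x_1}{x_1}, \dots, \frac{x_n - x_{n-1}}{x_{n-1}}\right)$$
and
$$\left(\frac{y_2 - y_1}{y_1}, \dots, \frac{y_n - y_{n-1}}{y_{n-1}}\right)$$
Further, the change over $k$ time steps can be used, i.e. the new data $(x_{k+1} - x_1, \dots, x_n - x_{n-k})$ and $(y_{k+1} - y_1, \dots, y_n - y_{n-k})$, for the subsequent correlation and regression analysis. This embodiment mainly uses the change over one time step for the subsequent correlation and regression analysis.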
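The transformed series described above can be sketched as follows. The $k$-step difference follows the text directly; the rate-of-change form $(x_{i+1}-x_i)/x_i$ is an assumed reading of the formula, which appears only as an image in the source, and it requires non-zero values. Function names are hypothetical.

```python
from typing import List, Sequence


def lag_difference(x: Sequence[float], k: int = 1) -> List[float]:
    """Change over k time steps: (x_{k+1} - x_1, ..., x_n - x_{n-k})."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return [x[i] - x[i - k] for i in range(k, len(x))]


def rate_of_change(x: Sequence[float]) -> List[float]:
    """Relative change over one time step: (x_{i+1} - x_i) / x_i (assumed form)."""
    return [(x[i] - x[i - 1]) / x[i - 1] for i in range(1, len(x))]
```

The transformed series of two parameters would then be passed to the same correlation and regression steps as the raw data; note that each transform shortens the series by $k$ samples.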
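Acquiring the water quality monitoring data within the preset period includes:
acquiring water quality monitoring data for multiple variable parameters of the same station within the preset period.
Optionally, the variable parameters may include parameters characterizing water quality such as water temperature, pH, dissolved oxygen, conductivity, turbidity, permanganate index (CODMn), ammonia nitrogen, total phosphorus, total nitrogen, chlorophyll a, and algal density.
Optionally, the water quality monitoring data analysis method further includes:
screening the water quality monitoring data according to the upper and lower abnormal-value limits of each variable parameter and the set relationships between variable parameters, to obtain valid water quality monitoring data.
Take variable parameters including water temperature, pH, dissolved oxygen, conductivity, turbidity, permanganate index (CODMn), ammonia nitrogen, total phosphorus, total nitrogen, chlorophyll a, and algal density as an example. The upper and lower abnormal-value limits for these variable parameters are given in Table 1 below: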
[Table 1: upper and lower abnormal-value limits for each variable parameter; table image not reproduced]
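The set relationships between variable parameters include: a) total nitrogen is greater than ammonia nitrogen; b) chemical oxygen demand is greater than the permanganate index; c) chemical oxygen demand is greater than biochemical oxygen demand.
The upper and lower limits for each variable parameter and the set relationships can be determined from the empirical values used in manual auditing. Screening the water quality monitoring data against them excludes some data from the mathematical analysis, reducing noise and improving the accuracy of the analysis.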
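The screening rule can be sketched as below. Because Table 1 is not reproduced here, the limits are supplied by the caller; the parameter keys (`TN`, `NH3N`, `COD`, `CODMn`, `BOD5`) and the function name are hypothetical labels, and a relationship is checked only when both of its parameters are present in the record.

```python
from typing import Dict, Mapping, Tuple

# (larger, smaller) pairs from the text: TN > NH3-N, COD > CODMn, COD > BOD.
RELATIONS = [("TN", "NH3N"), ("COD", "CODMn"), ("COD", "BOD5")]


def is_valid_record(record: Mapping[str, float], limits: Dict[str, Tuple[float, float]]) -> bool:
    """True if every value lies within its (low, high) limits and all set relationships hold."""
    for name, value in record.items():
        if name in limits:
            low, high = limits[name]
            if not low <= value <= high:
                return False
    for larger, smaller in RELATIONS:
        if larger in record and smaller in record and not record[larger] > record[smaller]:
            return False
    return True
```

A record in which total nitrogen is below ammonia nitrogen, for example, would be dropped before the decomposition and regression steps.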
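For a fuller overall understanding of the embodiments of the water quality monitoring data analysis method, referring to FIG. 5, an optional specific example of the method includes the following steps:
S11: perform STL decomposition on the water quality monitoring data, compute features from the decomposed series, and mark and extract the outliers that appear in the temporal features;
S12: after STL filtering, remove the marked abnormal values and examine the data a second time with linear regression, using the pairwise relationships between variables to identify abnormal data; when a relationship departs from its usual pattern, mark the pair;
S13: mask the abnormal values, perform linear regression analysis on the masked data, identify outliers by their Cook's distance, and pick out the outliers;
S14: collate all marked abnormal values, expand a time window around each marked abnormal value taken as an anchor point, perform correlation analysis between the data in that time period and the upstream station's data, use the correlation to determine the period following the abnormal values, and thereby infer the likely time at which pollution occurred upstream.
The water quality monitoring data analysis method provided by the embodiments of the present application uses both time series and linear regression methods to identify abnormal data along two dimensions, temporal change and inter-variable relationships, combining several outlier identification techniques to achieve targeted outlier identification. An identification model can be built from existing monitoring data and applied to the corresponding monitoring section, so that abnormal values can be prejudged through remote data diagnosis and appropriate emergency measures taken. This provides dual identification, remote prejudgment plus on-site verification, ensuring timely and targeted responses to environmental incidents. By combining linear regression and time series algorithms, abnormal values are identified and judged from the perspectives of time and of the internal relationships among variables; the two methods are applied alternately according to on-site conditions, their parameters are optimized for specific situations, and empirical judgment is built into the statistical models, achieving quantitative identification of data for relatively stable monitoring sections.
In an optional specific example, the model results obtained with the water quality monitoring data analysis method of the embodiments of the present application include:
1) Abnormal value identification
For the algorithm that detects abnormal values among different indicators at the same station, the experimental data are monitoring data from the stations of a certain river system. The river system has 4 stations, denoted station A, station B, station C, and station D. Each station has 8 variable parameter features: water temperature (°C), pH (dimensionless), dissolved oxygen (mg/L), conductivity (μS/cm), turbidity (NTU), permanganate index (mg/L), ammonia nitrogen (mg/L), and total phosphorus (mg/L).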
[Table image not reproduced]
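2) Single-station, multi-parameter correlation
The results are illustrated below using the two features turbidity and total phosphorus at station A for 2019 to 2020.
The linear regression plot and trend plot of turbidity and total phosphorus for September 2019 are shown in FIG. 6 (in September 2019, turbidity had no missing data and total phosphorus had 11 missing records; the correlation coefficient between them was 0.83434). The black points on either side of the line are the outliers detected by the time series anomaly detection algorithm and those marked by Cook's distance. The regression relationship between turbidity and total phosphorus is:
Total phosphorus (mg/L) = 0.023564 + 0.000136 × turbidity (NTU)
The linear regression plot and trend plot of turbidity and total phosphorus for March 2020 are shown in FIG. 7 (in March 2020, neither turbidity nor total phosphorus had missing data; the correlation coefficient between them was 0.67543):
The regression relationship between turbidity and total phosphorus is:
Total phosphorus (mg/L) = 0.012672 + 0.000278 × turbidity (NTU)
The figures show that the algorithm marks the great majority of the outliers, which supports its effectiveness.
3) Multi-station, single-parameter correlation
As an example, the time frequency for selecting anchor points is D (one anchor point per day), the number of periods computed is 3, the correlation coefficient threshold is 0.8, and the time window for each abnormal point is 16 hours, i.e. 16 hours before and after each anchor point. Each period lasts 4 hours. The analysis results for dissolved oxygen between stations A and B are shown below:
Upstream station: A
Downstream station: B
Analyzed attribute: dissolved oxygen (mg/L)
Number of abnormal points found: 110
Number found with an offset of 0 periods: 23
Number found with an offset of 1 period: 29
Number found with an offset of 2 periods: 55
Number found with an offset of 3 periods: 80
Most of the results likewise show an offset of 3 periods between the two stations. As shown in FIG. 8, data for some of the anchor points are plotted, with the downstream data offset by 12 hours (3 periods of 4 hours each). The figure shows a strong correlation between the two stations at this offset, which supports the effectiveness of the algorithm.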
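In another aspect of the embodiments of the present application, referring to FIG. 9, a water quality monitoring data analysis apparatus is further provided, including: an acquisition module 11, configured to acquire water quality monitoring data within a preset period; an abnormal value extraction module 12, configured to perform outlier analysis on the water quality monitoring data and mark abnormal values in the water quality monitoring data; a correlation analysis module 13, configured to determine a reference time period according to the distribution of the abnormal values and to perform correlation analysis between the water quality monitoring data in the reference time period and the water quality monitoring data of an upstream station; and a determination module 14, configured to determine, according to the correlation analysis result, the time at which pollution occurred at the upstream station.
The abnormal value extraction module 12 is specifically configured to decompose the water quality monitoring data with a time series decomposition algorithm and mark outliers in the decomposed data series as abnormal values; remove the marked abnormal values; and perform a second outlier analysis, using linear regression, on the water quality monitoring data from which the abnormal values have been removed, to mark abnormal values in the water quality monitoring data.
The abnormal value extraction module 12 is further configured to analyze the pairwise correlations between variable parameters in the water quality monitoring data from which the abnormal values have been removed, identify abnormal data segments according to the trend of the correlations, and extract and mark the abnormal values corresponding to the abnormal data segments.
The abnormal value extraction module 12 is further configured to mask the abnormal values, perform linear regression analysis on the masked data, identify outliers by Cook's distance, and mark outliers whose Cook's distance exceeds a threshold as abnormal values.
The correlation analysis module 13 is specifically configured to collate all marked abnormal values, take each marked abnormal value as an anchor point, expand a time window centered on the anchor point according to a set time frequency, determine the reference time period from the time window, and perform correlation analysis between the water quality monitoring data in the reference time period and the water quality monitoring data of the upstream station.
The correlation analysis module 13 is further configured to perform correlation analysis between the water quality monitoring data in the reference time period and the upstream station's data offset by different numbers of periods, and to determine the time at which pollution occurred at the upstream station according to the correlation coefficients obtained for the different offsets.
The acquisition module 11 is specifically configured to acquire water quality monitoring data for multiple variable parameters of the same station within the preset period.
It should be noted that the division of the water quality monitoring data analysis apparatus of the above embodiment into the above program modules is only an example; in practice, the processing may be assigned to different program modules as needed, i.e. the internal structure of the apparatus may be divided into different program modules to complete all or part of the method steps described above. In addition, the water quality monitoring data analysis apparatus and the water quality monitoring data analysis method embodiments provided above share the same concept; see the method embodiments for the specific implementation, which is not repeated here.
Referring to FIG. 10, in another aspect of the embodiments of the present application, a water quality monitoring device is further provided, including a processor 51 and a memory 52, where the memory 52 stores a computer program executable by the processor, and the computer program, when executed by the processor 51, implements the steps of the water quality monitoring data analysis method provided by any embodiment of the present application.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to a memory, storage, database, or other medium used in the embodiments of the present application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The above are merely specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any variation or substitution readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. The scope of protection of the present invention shall therefore be subject to the scope of protection of the claims.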

Claims (10)

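  1. A water quality monitoring data analysis method, characterized by comprising:
    acquiring water quality monitoring data within a preset period;
    performing outlier analysis on the water quality monitoring data, and marking abnormal values in the water quality monitoring data;
    determining a reference time period according to the distribution of the abnormal values, and performing correlation analysis between the water quality monitoring data in the reference time period and water quality monitoring data of an upstream station;
    determining, according to a result of the correlation analysis, a time at which pollution occurred at the upstream station.
  2. The water quality monitoring data analysis method according to claim 1, characterized in that performing outlier analysis on the water quality monitoring data and marking abnormal values in the water quality monitoring data comprises:
    decomposing the water quality monitoring data with a time series decomposition algorithm, and marking outliers in the decomposed data series as abnormal values;
    removing the marked abnormal values;
    performing a second outlier analysis, using linear regression, on the water quality monitoring data from which the abnormal values have been removed, to mark abnormal values in the water quality monitoring data.
  3. The water quality monitoring data analysis method according to claim 2, characterized in that performing the second outlier analysis, using linear regression, on the water quality monitoring data from which the abnormal values have been removed, to mark abnormal values in the water quality monitoring data, comprises:
    analyzing pairwise correlations between variable parameters in the water quality monitoring data from which the abnormal values have been removed;
    identifying abnormal data segments according to the trend of the correlations, and extracting and marking the abnormal values corresponding to the abnormal data segments.
  4. The water quality monitoring data analysis method according to claim 2, characterized in that performing outlier analysis on the water quality monitoring data and marking abnormal values in the water quality monitoring data further comprises:
    masking the abnormal values, and performing linear regression analysis on the masked data;
    identifying outliers by Cook's distance, and marking outliers whose Cook's distance is greater than a threshold as abnormal values.
  5. The water quality monitoring data analysis method according to claim 1, characterized in that determining a reference time period according to the distribution of the abnormal values, and performing correlation analysis between the water quality monitoring data in the reference time period and the water quality monitoring data of the upstream station, comprises:
    collating all marked abnormal values, taking each marked abnormal value as an anchor point, expanding a time window centered on the anchor point according to a set time frequency, and determining the reference time period according to the time window;
    performing correlation analysis between the water quality monitoring data in the reference time period and the water quality monitoring data of the upstream station.
  6. The water quality monitoring data analysis method according to claim 5, characterized in that determining, according to the result of the correlation analysis, the time at which pollution occurred at the upstream station comprises:
    performing correlation analysis between the water quality monitoring data in the reference time period and data of the upstream station offset by different numbers of periods, and determining the time at which pollution occurred at the upstream station according to the correlation coefficients with the upstream station data offset by different numbers of periods.
  7. The water quality monitoring data analysis method according to claim 1, characterized in that acquiring water quality monitoring data within a preset period comprises:
    acquiring water quality monitoring data corresponding to multiple variable parameters of the same station within the preset period.
  8. A water quality monitoring data analysis apparatus, characterized by comprising:
    an acquisition module, configured to acquire water quality monitoring data within a preset period;
    an abnormal value extraction module, configured to perform outlier analysis on the water quality monitoring data and mark abnormal values in the water quality monitoring data;
    a correlation analysis module, configured to determine a reference time period according to the distribution of the abnormal values, and to perform correlation analysis between the water quality monitoring data in the reference time period and water quality monitoring data of an upstream station;
    a determination module, configured to determine, according to a result of the correlation analysis, a time at which pollution occurred at the upstream station.
  9. A water quality monitoring device, characterized by comprising a processor and a memory, wherein the memory stores a computer program executable by the processor, and the computer program, when executed by the processor, implements the water quality monitoring data analysis method according to any one of claims 1 to 7.
  10. A computer storage medium, characterized in that a computer program is stored on the computer storage medium, and the computer program, when executed by a controller, implements the water quality monitoring data analysis method according to any one of claims 1 to 7.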
PCT/CN2021/114248 2021-01-27 2021-08-24 Water quality monitoring data analysis method and apparatus, device, and storage medium WO2022160682A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110113746.3 2021-01-27
CN202110113746.3A CN114818238A (zh) 2021-01-27 2021-01-27 Water quality monitoring data analysis method and apparatus, device, and storage medium

Publications (1)

Publication Number Publication Date
WO2022160682A1 true WO2022160682A1 (zh) 2022-08-04

Family

ID=82525002

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/114248 WO2022160682A1 (zh) 2021-01-27 2021-08-24 Water quality monitoring data analysis method and apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN114818238A (zh)
WO (1) WO2022160682A1 (zh)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
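CN116185306A (zh) * 2023-04-24 2023-05-30 山东爱福地生物股份有限公司 Data storage method for a sewage treatment system using Potamogeton crispus
CN116561525A (zh) * 2023-07-07 2023-08-08 四川君安天源精酿啤酒有限公司 IoT-based intelligent monitoring method for craft beer brewing data
CN116662864A (zh) * 2023-06-14 2023-08-29 同济大学 Rolling data cleaning method for online water quality and hydrodynamic monitoring data
CN117009771A (zh) * 2023-09-26 2023-11-07 中国环境科学研究院 Water pollution degree detection method and system suitable for park cities
CN117290559A (zh) * 2023-11-22 2023-12-26 山东贵玉复合材料有限公司 Water treatment agent content monitoring method and system
CN117309067A (zh) * 2023-11-30 2023-12-29 长春职业技术学院 Real-time water resource monitoring method, system, and electronic device
CN117312617A (zh) * 2023-11-29 2023-12-29 山东优控智能技术有限公司 Real-time sewage treatment method and system based on sewage data monitoring
CN117349777A (zh) * 2023-12-04 2024-01-05 安徽新宇环保科技股份有限公司 Intelligent system and method for identifying the authenticity of online water environment monitoring data
CN117342689A (zh) * 2023-12-06 2024-01-05 安徽新宇环保科技股份有限公司 Intelligent denitrification method and system for sewage plants
CN117349611A (zh) * 2023-12-06 2024-01-05 山东清控生态环境产业发展有限公司 Water quality fluctuation meter monitoring method based on big data analysis
CN117373556A (zh) * 2023-12-04 2024-01-09 山东清控生态环境产业发展有限公司 Source tracing instrument and system based on multidimensional data
CN117786584A (zh) * 2024-02-27 2024-03-29 西安中创博远网络科技有限公司 Livestock water source pollution monitoring and early warning method and system based on big data analysis
CN117830031A (zh) * 2024-03-05 2024-04-05 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Method for predicting water turbidity at the ends of a water supply network, and related device
CN117875797A (zh) * 2024-03-12 2024-04-12 广东华宸建设工程质量检测有限公司 Collaborative supervision method and system for construction projects
CN117892248A (zh) * 2024-03-15 2024-04-16 山东鲁新国合节能环保科技有限公司 Method for monitoring abnormal data in the internal circulation of sintering flue gas
CN118071029A (zh) * 2024-04-17 2024-05-24 江西省水务集团有限公司 Water affairs monitoring and governance management platform based on digital water quality information
CN118130744A (zh) * 2024-05-08 2024-06-04 芯视界(北京)科技有限公司 Drainage network monitoring method and apparatus, electronic device, and storage medium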

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
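CN117236902B (zh) * 2023-11-08 2024-04-12 北京英视睿达科技股份有限公司 Edge-computing-based reporting method and system for water quality monitoring
CN117964024A (zh) * 2024-04-02 2024-05-03 车泊喜智能科技(山东)有限公司 Artificial-intelligence-based control system for car wash wastewater purification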

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180082199A1 (en) * 2016-09-21 2018-03-22 International Business Machines Corporation System, method and computer program product for pollution source attribution
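CN108132340A (zh) * 2017-12-14 2018-06-08 浙江大学 Multi-sensor-fusion upstream and downstream pollution early warning system and method for river channels
CN109613197A (zh) * 2019-01-15 2019-04-12 太仓中科信息技术研究院 Water quality monitoring, early warning, and feedback response method based on a river water network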

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
GUO JIANQING ,LI YAN ,WANG HONGSHENG ,ZHOU HONGFEI: "The New Method for Determining the Pollution Parameters under the Condition of Instantaneous Injection of Pollutant in River", JOURNAL OF HYDROELECTRIC ENGINEERING, vol. 26, no. 4, 25 August 2007 (2007-08-25), pages 61 - 65, XP055954900 *
WEI YUAN: "Research on Water Quality Abnormal Detection Based on Time and Spatial Correlation Analysis in Distribution System", CHINESE MASTER'S THESES FULL-TEXT DATABASE, 27 January 2016 (2016-01-27), pages 1 - 99, XP055954904 *
WEI YUAN;FENG TIAN-HENG;HUANG PING-JIE;HOU DI-BO;ZHANG GUANG-XIN: "Contamination Event Detection Method based on Dynamic Correlation Analysis of Multiple Water Quality Parameters", JOURNAL OF ZHEJIANG UNIVERSITY(ENGINEERING SCIENCE), vol. 50, no. 7, 15 July 2016 (2016-07-15), pages 1402 - 1409, XP055954902, ISSN: 1008-973x, DOI: 10.3785/j.issn.1008-973x.2016.07.025 *

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
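CN116185306B (zh) * 2023-04-24 2023-07-14 山东爱福地生物股份有限公司 Data storage method for a sewage treatment system using Potamogeton crispus
CN116185306A (zh) * 2023-04-24 2023-05-30 山东爱福地生物股份有限公司 Data storage method for a sewage treatment system using Potamogeton crispus
CN116662864B (zh) * 2023-06-14 2024-04-23 同济大学 Rolling data cleaning method for online water quality and hydrodynamic monitoring data
CN116662864A (zh) * 2023-06-14 2023-08-29 同济大学 Rolling data cleaning method for online water quality and hydrodynamic monitoring data
CN116561525A (zh) * 2023-07-07 2023-08-08 四川君安天源精酿啤酒有限公司 IoT-based intelligent monitoring method for craft beer brewing data
CN116561525B (zh) * 2023-07-07 2023-09-12 四川君安天源精酿啤酒有限公司 IoT-based intelligent monitoring method for craft beer brewing data
CN117009771A (zh) * 2023-09-26 2023-11-07 中国环境科学研究院 Water pollution degree detection method and system suitable for park cities
CN117009771B (zh) * 2023-09-26 2023-12-26 中国环境科学研究院 Water pollution degree detection method and system suitable for park cities
CN117290559A (zh) * 2023-11-22 2023-12-26 山东贵玉复合材料有限公司 Water treatment agent content monitoring method and system
CN117290559B (zh) * 2023-11-22 2024-03-01 山东贵玉复合材料有限公司 Water treatment agent content monitoring method and system
CN117312617B (zh) * 2023-11-29 2024-04-12 山东优控智能技术有限公司 Real-time sewage treatment method and system based on sewage data monitoring
CN117312617A (zh) * 2023-11-29 2023-12-29 山东优控智能技术有限公司 Real-time sewage treatment method and system based on sewage data monitoring
CN117309067A (zh) * 2023-11-30 2023-12-29 长春职业技术学院 Real-time water resource monitoring method, system, and electronic device
CN117309067B (zh) * 2023-11-30 2024-02-09 长春职业技术学院 Real-time water resource monitoring method, system, and electronic device
CN117349777A (zh) * 2023-12-04 2024-01-05 安徽新宇环保科技股份有限公司 Intelligent system and method for identifying the authenticity of online water environment monitoring data
CN117373556A (zh) * 2023-12-04 2024-01-09 山东清控生态环境产业发展有限公司 Source tracing instrument and system based on multidimensional data
CN117373556B (zh) * 2023-12-04 2024-02-13 山东清控生态环境产业发展有限公司 Source tracing instrument and system based on multidimensional data
CN117349777B (zh) * 2023-12-04 2024-02-23 安徽新宇环保科技股份有限公司 Intelligent system and method for identifying the authenticity of online water environment monitoring data
CN117342689B (zh) * 2023-12-06 2024-02-02 安徽新宇环保科技股份有限公司 Intelligent denitrification method and system for sewage plants
CN117349611B (zh) * 2023-12-06 2024-03-08 山东清控生态环境产业发展有限公司 Water quality fluctuation meter monitoring method based on big data analysis
CN117349611A (zh) * 2023-12-06 2024-01-05 山东清控生态环境产业发展有限公司 Water quality fluctuation meter monitoring method based on big data analysis
CN117342689A (zh) * 2023-12-06 2024-01-05 安徽新宇环保科技股份有限公司 Intelligent denitrification method and system for sewage plants
CN117786584A (zh) * 2024-02-27 2024-03-29 西安中创博远网络科技有限公司 Livestock water source pollution monitoring and early warning method and system based on big data analysis
CN117786584B (zh) * 2024-02-27 2024-04-30 西安中创博远网络科技有限公司 Livestock water source pollution monitoring and early warning method and system based on big data analysis
CN117830031A (zh) * 2024-03-05 2024-04-05 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Method for predicting water turbidity at the ends of a water supply network, and related device
CN117875797A (zh) * 2024-03-12 2024-04-12 广东华宸建设工程质量检测有限公司 Collaborative supervision method and system for construction projects
CN117892248A (zh) * 2024-03-15 2024-04-16 山东鲁新国合节能环保科技有限公司 Method for monitoring abnormal data in the internal circulation of sintering flue gas
CN117892248B (zh) * 2024-03-15 2024-05-28 山东鲁新国合节能环保科技有限公司 Method for monitoring abnormal data in the internal circulation of sintering flue gas
CN118071029A (zh) * 2024-04-17 2024-05-24 江西省水务集团有限公司 Water affairs monitoring and governance management platform based on digital water quality information
CN118130744A (zh) * 2024-05-08 2024-06-04 芯视界(北京)科技有限公司 Drainage network monitoring method and apparatus, electronic device, and storage medium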

Also Published As

Publication number Publication date
CN114818238A (zh) 2022-07-29

Similar Documents

Publication Publication Date Title
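WO2022160682A1 (zh) Water quality monitoring data analysis method and apparatus, device, and storage medium
CN114817851A (zh) Water quality monitoring method and device
CN110245880A (zh) Method for identifying falsification of online monitoring data from pollution sources
CN105956788A (zh) Dynamic management and control method for the cost of power transmission and transformation projects
CN112529234A (zh) Deep-learning-based surface water quality prediction method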
Xu et al. Multivariate time series forecasting based on causal inference with transfer entropy and graph neural network
CN112633779A (zh) Method for evaluating the credibility of environmental monitoring data
Doshi et al. Tisat: Time series anomaly transformer
Marvin et al. A data-driven approach to forecasting ground-level ozone concentration
CN104699979B (zh) Complex-network-based chaotic time series prediction method for algal blooms in urban lakes and reservoirs
Shukla et al. Criminal Combat: Crime Analysis and Prediction Using Machine Learning
CN108876062B (zh) Big data method and device for intelligent prediction of criminal incidents
Zhou et al. Assessing uncertainty propagation in hybrid models for daily streamflow simulation based on arbitrary polynomial chaos expansion
Qian et al. A new nonlinear risk assessment model based on an improved projection pursuit
AT&T
Lakshan et al. An enhanced ensemble model for crime occurrence prediction using machine learning
Muslikh et al. Systematic literature review of data distribution in preprocessing stage with focus on outliers
He et al. Anomaly Detection in Species Distribution Patterns: A Spatio-Temporal Approach for Biodiversity Conservation
El Khansa et al. Prominent discord discovery with matrix profile: application to climate data insights
Thamrin et al. Application of Long-Short Term Memory for Accurate Biochemical Oxygen Demand Prediction in Rivers through Water Quality Parameters
Khan et al. Air quality forecasting based on machine and deep learning models: an IoT application
Chen et al. Using Scan Statistics for Cluster Detection: Recognizing Real Bandwagons
CN116089520B (zh) Fault identification method based on blockchain and big data, and general-purpose computing node
Li et al. Detecting Key Offenders from Crime Incidents via Attributed Heterogeneous Network Learning
CN117494063B (zh) Enterprise carbon emission monitoring method under a new-type power system

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21922277

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: PCT application non-entry in European phase

Ref document number: 21922277

Country of ref document: EP

Kind code of ref document: A1