CN113723704B - Water quality rapid prediction method based on continuous and graded mixed data - Google Patents
- Publication number
- CN113723704B CN202111044820.7A
- Authority
- CN
- China
- Prior art keywords
- basin
- sub
- water quality
- data
- parameters
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A20/00—Water conservation; Efficient water supply; Efficient water use
- Y02A20/152—Water filtration
Abstract
The present invention provides a rapid water quality prediction method based on continuous and graded mixed data. The method uses the classification grades of parameters such as basin land use area, GDP and population to predict water quality changes in different areas of a basin. It builds a deep learning neural network to quickly compute surface water quality changes and their spatial distribution under different grades of socio-economic development in the basin, as a basis for basin water environment governance. Compared with the prior art, the beneficial effects are as follows: the method avoids the problem of missing socio-economic statistics when socio-economic data are used for water quality prediction; it does not need exact socio-economic data and can predict from socio-economic classification grade data alone, which makes it more practical. In addition, because the prediction is performed by a deep learning neural network, it computes faster and applies more widely than traditional numerical simulation methods.
Description
Technical Field
The invention relates to the technical field of water quality prediction, and in particular to a rapid water quality prediction method based on continuous and graded mixed data.
Background Art
With socio-economic development and the growth of human pollutant discharge, water pollution has become an important factor restricting the sustainable development of Chinese society and people's pursuit of a high quality of life. As socio-economic ties between regions strengthen and climate change takes effect, the evolution of China's water environment shows complex, cross-regional changes driven by multiple coupled factors. Water quality prediction and water environment governance for a single water sample or river reach can no longer meet requirements, and comprehensive water quality prediction over large spatial ranges with multiple influencing factors is urgently needed.
Traditional water quality prediction methods mainly comprise numerical simulation methods and mathematical statistics methods. Because basin hydrodynamic processes and pollutant hydrochemical processes are complex, the migration and transformation mechanisms of water pollutants are not fully understood, so the accuracy of numerical simulation is limited. Numerical simulation is also computationally expensive, which makes large-scale application difficult. Mathematical statistics methods cost less computation than numerical simulation, but they impose strict requirements on the temporal ordering, continuity and normal distribution of the data, are relatively difficult to model, and have limited applicability.
Meanwhile, existing deep-learning-based water quality prediction methods are mainly based on recurrent neural networks, such as the long short-term memory (LSTM) model. They generally predict future water quality from historical water quality data and often consider only the trend of a single water quality indicator or the interactions among indicators. They do not account for external factors such as socio-economic conditions, so they give decision makers little support in water environment governance decisions and socio-economic development planning, and their practicality is limited.
In addition, because the human socio-economic system is complex, many socio-economic indicators are related to water quality, including GDP, population, grain output, cultivated land area, sewage discharge and degree of sewage treatment. Basin socio-economic statistics are often missing or incomplete for a given period or region. The present invention uses socio-economic classification grade data for water quality prediction, which avoids the problem of missing socio-economic statistics: exact socio-economic indicator values are not required, and water quality can be predicted from the grades of the indicators alone, which is more practical.
Summary of the Invention
To overcome the shortcomings of the water quality prediction methods described in the background art, namely their narrow scope, limited applicability, and low practicality when only a single indicator is used, the present invention provides a rapid water quality prediction method based on continuous and graded mixed data. The method incorporates socio-economic classification grade data and has the advantages of fast computation and high operating efficiency.
To achieve the above purpose, the technical solution of the invention's method for rapid basin water quality prediction based on deep learning of classified data is as follows:
A rapid water quality prediction method based on continuous and graded mixed data, comprising the following steps:
S1. Determine the extent of the basin and its internal divisions, the internal divisions comprising sub-basin divisions and administrative district divisions;
S2. Determine the meteorological parameters of the basin within the prediction period, then convert the meteorological parameters into sub-basin meteorological data; the meteorological parameters comprise the rainfall, evaporation and temperature observed at meteorological point i at time t;
S3. Determine the socio-economic parameters of the sub-basins within the prediction period and generate the sub-basin socio-economic parameter set;
S4. Determine the sub-basin water quality parameters;
S5. Classify the data in the sub-basin socio-economic parameter set into grades;
S6. One-hot encode the graded socio-economic parameters to form the one-hot matrix of the socio-economic parameters;
S7. Standardize the meteorological parameters and the water quality parameters;
S8. Split the one-hot matrix of the socio-economic parameters, the standardized sub-basin meteorological data and the standardized water quality parameters into a training set and a test set;
S9. Build a fully connected deep learning neural network and define the loss function and the iterative optimization algorithm;
S10. Feed the training set into the deep learning neural network to obtain a trained network, then feed the test set into the trained network to obtain predicted water quality parameters; compare the predicted water quality parameters with the measured ones, adjust the model parameters, and finally store the deep learning neural network that meets the accuracy requirement together with its parameter data;
S11. Select the meteorological and socio-economic data of the basin for a given period, feed them into the deep learning neural network stored in S10, and predict the water quality parameters of all sub-basins for that period.
Further, step S2 more specifically comprises:
S21. Obtain rainfall, evaporation and temperature data for the prediction period from the meteorological stations;
S22. Create a spatial distribution map of the meteorological stations from their latitude and longitude coordinates, and import the rainfall, evaporation and temperature data of every station and every time period into the attribute table of the station distribution map;
S23. With the whole basin as the boundary, spatially interpolate the meteorological data of the stations by Kriging interpolation, generate raster data of basin rainfall, evaporation and temperature for every time period, and store them in a geospatial database;
S24. For each sub-basin, compute the mean of the rainfall, evaporation and temperature values of all raster cells within the sub-basin to obtain the sub-basin meteorological parameters.
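Step S24 amounts to a zonal mean over a raster. The following is a minimal NumPy sketch, assuming the interpolated raster and a sub-basin ID raster are already aligned arrays of the same shape. The function name `zonal_mean`, the use of 0 as "outside the basin" and the toy arrays are illustrative assumptions; the source describes a GIS workflow rather than this code.

```python
import numpy as np

def zonal_mean(values: np.ndarray, zones: np.ndarray) -> dict:
    """Mean of `values` over the cells of each zone.

    Zone id 0 is treated as "outside the basin" (an assumption of this sketch),
    and NaN cells are ignored.
    """
    result = {}
    for zone_id in np.unique(zones):
        if zone_id == 0:
            continue
        cells = values[zones == zone_id]
        cells = cells[~np.isnan(cells)]
        if cells.size:
            result[int(zone_id)] = float(cells.mean())
    return result

# Toy rainfall raster for one day and the matching sub-basin ID raster.
rain = np.array([[1.0, 2.0, 3.0],
                 [4.0, np.nan, 6.0]])
zones = np.array([[1, 1, 2],
                  [1, 2, 2]])

print(zonal_mean(rain, zones))
```

Running the same function once per variable and per day would yield one rainfall, evaporation and temperature value per sub-basin and time period.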
Further, the socio-economic parameters comprise land use area, permanent population, GDP, per capita GDP, population density, grain output and urban sewage treatment rate.
Further, step S3 more specifically comprises:
S31. Obtain the socio-economic parameters for the prediction period: land use area, permanent population, GDP, per capita GDP, population density, grain output and urban sewage treatment rate;
S32. Convert the socio-economic parameters of S31 into the corresponding sub-basin parameters;
S33. Generate the sub-basin socio-economic parameter set.
Further, the water quality parameters comprise the surface water concentrations of dissolved oxygen, CODMn, ammonia nitrogen, total phosphorus and total nitrogen; step S4 more specifically comprises:
S41. Determine the latitude and longitude coordinates of the river water quality monitoring stations in all sub-basins to create a spatial distribution map of the stations, then import the water quality indicator data of every station and every time period into the attribute table of the station distribution map;
S42. Compute the water quality distribution along the river channels of the sub-basins;
S43. For each sub-basin, compute the mean river water quality within the sub-basin as the sub-basin water quality parameter.
Further, step S42 more specifically comprises:
S421. Collect river network data for rivers of grade five and above across the whole basin;
S422. With the basin river network as the boundary, spatially interpolate the station water quality data of every time period by inverse distance weighting, generate water quality raster data along the rivers for every time period, and store them in a geospatial database.
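Inverse distance weighting, as named in S422, can be sketched as follows. This is a simplified illustration: it uses straight-line distance and a power of 2, both assumptions, and evaluates the estimate at a list of target points that stand in for river raster cells. The function name `idw` and the sample coordinates are hypothetical.

```python
import numpy as np

def idw(station_xy, station_values, target_xy, power=2.0):
    """Inverse-distance-weighted estimate at each target point."""
    station_xy = np.asarray(station_xy, dtype=float)
    station_values = np.asarray(station_values, dtype=float)
    target_xy = np.asarray(target_xy, dtype=float)
    # Pairwise distances, shape (n_targets, n_stations).
    dist = np.linalg.norm(target_xy[:, None, :] - station_xy[None, :, :], axis=2)
    out = np.empty(len(target_xy))
    for i, row in enumerate(dist):
        exact = row == 0.0
        if exact.any():
            # A target that coincides with a station takes the station value.
            out[i] = station_values[exact][0]
        else:
            weights = 1.0 / row ** power
            out[i] = np.sum(weights * station_values) / np.sum(weights)
    return out

stations = [(0.0, 0.0), (2.0, 0.0)]      # two monitoring stations
codmn = [2.0, 4.0]                        # observed values at the stations
river_cells = [(1.0, 0.0), (0.0, 0.0)]    # points along the river
print(idw(stations, codmn, river_cells))
```

Restricting the target points to cells on the river network is what keeps the interpolated surface "along the rivers"; a distance measured along the channel rather than in a straight line would be a further refinement not shown here.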
Further, the land use area comprises the areas of cultivated land, forest land, grassland, water area, urban and residential land, and unused land. Step S5 more specifically comprises: set grade thresholds for the areas of cultivated land, forest land, grassland, water area, urban and residential land and unused land, and for the socio-economic parameters permanent population, GDP, grain output, per capita GDP, population density and urban sewage treatment rate; then compute the grade value of each socio-economic variable to form the socio-economic grade variables.
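Grading by thresholds, as in step S5, can be illustrated with `numpy.digitize`. The threshold values below are invented for the example, and the convention that a value equal to a threshold falls into the higher grade is an assumption; the source does not give thresholds or boundary rules.

```python
import numpy as np

def to_grade(values, thresholds):
    """Map continuous values to 1-based grades.

    `thresholds` must be increasing. Values below the first threshold get
    grade 1; each threshold reached raises the grade by one.
    """
    return np.digitize(values, thresholds) + 1

# Hypothetical thresholds for one variable, giving 4 grades.
gdp_thresholds = [100.0, 500.0, 1000.0]
gdp_values = np.array([50.0, 100.0, 750.0, 2000.0])

print(to_grade(gdp_values, gdp_thresholds))
```

With N-1 thresholds a variable has N grades, which is the length of the one-hot row vector built in the next step.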
Further, step S6 more specifically comprises:
S61. Convert the socio-economic grade variable of every sub-basin and every time period into a 0-1 row vector. The length of the row vector is the total number of grades N of that socio-economic variable, that is, the vector has 1 row and N columns. The n-th column, corresponding to the grade value n of the variable, takes the value 1; all other elements take the value 0;
S62. Build the one-hot 0-1 matrix of the socio-economic grade variables.
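The row-vector construction in S61 and the stacking in S62 can be sketched in a few lines of NumPy. The function name `one_hot` is illustrative; grades are 1-based as in the text.

```python
import numpy as np

def one_hot(grades, n_grades):
    """Stack one 1-by-N row vector per sample; column n (1-based) is 1 for grade n."""
    grades = np.asarray(grades, dtype=int)
    if grades.min() < 1 or grades.max() > n_grades:
        raise ValueError("grades must lie in 1..n_grades")
    matrix = np.zeros((grades.size, n_grades), dtype=int)
    matrix[np.arange(grades.size), grades - 1] = 1
    return matrix

# Grades of one variable for three (sub-basin, time period) samples, N = 3.
print(one_hot([1, 3, 2], n_grades=3))
```

Encoding each graded variable this way and concatenating the blocks column-wise gives the full 0-1 matrix of S62.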
Further, step S8 more specifically comprises: merge the sub-basin meteorological data and the one-hot 0-1 matrix of the socio-economic data into a single numeric matrix, which serves as the input matrix of the deep learning neural network; convert the sub-basin water quality parameters into a numeric matrix, which serves as the output label matrix of the network; then use 70% of the rows of the input matrix and the output label matrix as the training set and the remaining 30% as the test set.
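The merge and the 70/30 split can be sketched as below. Two points are assumptions of this sketch rather than statements from the source: standardization is taken to be a column-wise z-score, and rows are assigned to the two sets by a seeded random shuffle. All array contents are random placeholders.

```python
import numpy as np

def standardize(a):
    """Column-wise z-score (assumed form of the standardization in S7)."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0)

def split_rows(x, y, train_fraction=0.7, seed=0):
    """Shuffle row indices once, then cut them into training and test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    cut = int(round(train_fraction * len(x)))
    train, test = order[:cut], order[cut:]
    return x[train], y[train], x[test], y[test]

rng = np.random.default_rng(1)
weather = standardize(rng.normal(size=(10, 3)))                # rainfall, evaporation, temperature
grade_block = np.eye(4, dtype=int)[rng.integers(0, 4, 10)]     # one graded variable, 4 grades
x = np.hstack([weather, grade_block])                          # input matrix
y = standardize(rng.normal(size=(10, 5)))                      # five water quality indicators

x_train, y_train, x_test, y_test = split_rows(x, y)
print(x_train.shape, x_test.shape)
```

Using the same row permutation for inputs and labels keeps each input row paired with its label row after the split.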
Further, the parameters of the deep learning neural network comprise the number of hidden layers k, the number of nodes n in each hidden layer, the activation function, the loss function, the accuracy function, the learning rate, the batch size, the maximum number of iterations and the random node dropout rate. The network uses four hidden layers; each layer uses the SELU activation function; the fourth hidden layer uses random node dropout.
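The architecture described above (four fully connected hidden layers, SELU activations, dropout on the fourth hidden layer) can be illustrated with a forward pass in plain NumPy. Layer widths, the weight initialization, the dropout rate and the use of standard inverted dropout are all assumptions of this sketch; the source names none of them, and a real implementation would normally use a deep learning framework with a trainable optimizer.

```python
import numpy as np

# Standard SELU constants.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(x):
    """Scaled exponential linear unit."""
    negative_part = SELU_ALPHA * (np.exp(np.minimum(x, 0.0)) - 1.0)
    return SELU_SCALE * np.where(x > 0, x, negative_part)

def init_params(n_in, hidden, n_out, rng):
    """One (weights, bias) pair per layer; widths in `hidden` are hypothetical."""
    sizes = [n_in, *hidden, n_out]
    return [(rng.normal(0.0, np.sqrt(1.0 / a), size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(x, params, drop_rate=0.0, rng=None):
    """Forward pass; dropout is applied only after the last hidden layer."""
    h = x
    n_hidden = len(params) - 1
    for idx, (w, b) in enumerate(params[:-1], start=1):
        h = selu(h @ w + b)
        if idx == n_hidden and drop_rate > 0.0 and rng is not None:
            keep = rng.random(h.shape) >= drop_rate
            h = h * keep / (1.0 - drop_rate)
    w, b = params[-1]
    return h @ w + b  # linear output: one column per water quality indicator

rng = np.random.default_rng(0)
params = init_params(n_in=7, hidden=[16, 16, 16, 16], n_out=5, rng=rng)
batch = np.ones((2, 7))
print(forward(batch, params).shape)                      # inference
print(forward(batch, params, drop_rate=0.2, rng=rng).shape)  # training-style pass
```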
The loss function and iterative optimization algorithm are defined more specifically as follows: the loss function is the mean squared error (MSE), and accuracy is measured by the mean absolute error (MAE), computed as:
$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(z_i - y_i\right)^2, \qquad \mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|z_i - y_i\right|$$
where $z_i$ is the actual value, $y_i$ is the predicted value, and $m$ is the number of samples.
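The two metrics named here, in their standard form, can be computed as in the short sketch below. The function names and sample values are illustrative.

```python
import numpy as np

def mse(actual, predicted):
    """Mean squared error between actual and predicted values."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean((actual - predicted) ** 2))

def mae(actual, predicted):
    """Mean absolute error between actual and predicted values."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(actual - predicted)))

z = [1.0, 2.0, 3.0]   # actual values
y = [1.0, 2.0, 5.0]   # predicted values
print(mse(z, y), mae(z, y))
```

Because MSE squares each error, a single large miss dominates it, while MAE reports the typical error in the original units of the indicator.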
Compared with the prior art, the advantages and beneficial effects of the present invention are as follows. The invention uses basin socio-economic classification grade data such as land use area, GDP and population to predict water quality over a large area at basin scale. This overcomes the small computational extent and slow computation caused by relying on numerical simulation in traditional water quality prediction. It also avoids the strict requirement on the continuity of input data imposed by traditional statistical methods and by the recurrent neural networks commonly used today, and makes it easier to quantify the relationship between discrete socio-economic grade data and continuously varying water quality data, which improves the practicality of the prediction. In addition, the invention avoids the problem of missing socio-economic statistics when socio-economic data are used for water quality prediction: exact socio-economic data are not needed, and prediction can be made from socio-economic classification grade data alone. The approach has a clear theoretical meaning, is simple to operate, and is easy to apply in practical water quality management.
Brief Description of the Drawings
FIG. 1 is a flow chart of the rapid basin water quality prediction method of the present invention;
FIG. 2 is a scatter plot of predicted versus measured CODMn values;
FIG. 3 is a spatial distribution map of predicted CODMn values on a given day.
Detailed Description
The accompanying drawings are for illustration only and shall not be construed as limiting this patent. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort shall fall within the scope of protection of the present invention.
The technical solution of the present invention is further described below with reference to FIGS. 1 to 3 and an embodiment.
A rapid water quality prediction method based on continuous and graded mixed data, as shown in FIG. 1, comprises the following steps:
S1. Determine the extent of the basin and its internal divisions, the internal divisions comprising sub-basin divisions and administrative district divisions;
Following the water resources zoning of China in the National Comprehensive Water Resources Plan, the Pearl River region is selected as the embodiment basin. NASA SRTM 90-meter digital elevation data are used to extract the digital elevation data of the embodiment basin and to delineate catchments, which yields 138 sub-basins in the Pearl River region. The national prefecture-level city administrative district map (SHP format) of the National Geomatics Center of China (http://www.ngcc.cn/ngcc/) is used to extract the 75 prefecture-level city administrative districts within the Pearl River region and to draw the prefecture-level city distribution map of the region.
S2. Determine the meteorological parameters of the basin within the prediction period, the meteorological parameters comprising the rainfall, evaporation and temperature observed at meteorological point i at time t; then convert the meteorological parameters into sub-basin meteorological data. This comprises the following steps:
S21. From the Daily Dataset of China Surface Climate Data (V3.0) provided by the China Meteorological Data Network, extract the daily mean rainfall, evaporation and temperature data of 69 meteorological stations in the embodiment basin from January 1, 2020 to April 30, 2021;
S22. From the latitude and longitude coordinates of the meteorological stations, create a spatial distribution map of the stations with ArcGIS software, and import the rainfall, evaporation and temperature data of every station and every time period into the attribute table of the station distribution map;
S23. With the Pearl River region as the boundary, spatially interpolate the daily meteorological data of the stations by Kriging interpolation. A batch program for Kriging interpolation is written to generate raster data of rainfall, evaporation and temperature for the Pearl River region for every time period (day), which are stored in a geospatial database. This gives 486 raster maps each for rainfall, evaporation and temperature, one map per day.
S24. A raster processing program is written to extract, with the sub-basins as boundaries, the rainfall, evaporation and temperature data of every sub-basin of the Pearl River region for every time period (day) and to store them in Excel files. There is one Excel file per sub-basin, 138 files in total, and each file contains the data of 3 variables (rainfall, evaporation, temperature) for 486 time periods (486 rows, 3 columns).
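The figure of 486 time periods follows from the date range given in S21. A quick check with Python's standard library, with the per-sub-basin table shape written out as an illustrative tuple:

```python
from datetime import date

start = date(2020, 1, 1)
end = date(2021, 4, 30)

# Both end dates are included, hence the +1.
n_days = (end - start).days + 1

n_sub_basins = 138
n_variables = 3  # rainfall, evaporation, temperature

table_shape = (n_days, n_variables)        # one table per sub-basin
total_values = n_sub_basins * n_days * n_variables

print(n_days, table_shape, total_values)
```

The year 2020 is a leap year (366 days), and January through April 2021 adds 120 days, which gives the 486 rows per file.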
S3、确定子流域在预测时段内的社会经济参数,生成子流域的社会经济参数集合;上述社会经济参数包括土地利用面积、常住人口、GDP、人均GDP、人口密度、粮食产量、城市污水处理率。其中,所述土地利用面积包括耕地、林地、草地、水域、城市和居民用地、未利用土地的利用面积。步骤S3具体包括以下步骤:S3, determine the socio-economic parameters of the sub-basin during the forecast period, and generate a set of socio-economic parameters of the sub-basin; the above socio-economic parameters include land use area, permanent population, GDP, per capita GDP, population density, food production, and urban sewage treatment rate. The land use area includes cultivated land, forest land, grassland, water area, urban and residential land, and unused land. Step S3 specifically includes the following steps:
S31、获取预测时段内土地利用面积、常住人口、GDP、人均GDP、人口密度、粮食产量、城市污水处理率的社会经济参数;更具体为:上述参数可通过以下方式获得:采用卫星遥感数据,获取预测时段内整个流域范围每年的土地利用数据;通过区域统计年鉴,获得预测时段内每一年、各行政区的常住人口、GDP和粮食产量;通过区域统计年鉴,获得预测时段内每一年、各行政区的人均GDP、人口密度和城市污水处理率。S31. Obtain socioeconomic parameters such as land use area, permanent population, GDP, per capita GDP, population density, food production, and urban sewage treatment rate during the forecast period. More specifically, the above parameters can be obtained in the following ways: using satellite remote sensing data to obtain annual land use data for the entire river basin during the forecast period; obtaining the permanent population, GDP, and food production of each administrative region for each year during the forecast period through regional statistical yearbooks; obtaining per capita GDP, population density, and urban sewage treatment rate of each administrative region for each year during the forecast period through regional statistical yearbooks.
S32. Convert the socioeconomic parameters obtained in S31 into the corresponding sub-basin parameters:
(1) Determine the land-use parameters of each sub-basin at each time step. Reclassify the basin land-use data into cultivated land, forest land, grassland, water bodies, urban and residential land, and unused land to produce a land-use raster for the basin. Then, following the sub-basin delineation, compute the area of each of the six land-use classes within every sub-basin: the area of class d in sub-basin j in year y is the area of one grid cell, AreaS, multiplied by the number of grid cells of class d in that sub-basin in year y, and the area at time t equals the area for the year y that contains t.
Here j is the sub-basin index; d is the land-use class index, covering cultivated land (d = 1), forest land (d = 2), grassland (d = 3), water bodies (d = 4), urban and residential land (d = 5), and unused land (d = 6); t is the time step (day) and y is the year; AreaS is the area of one grid cell in sub-basin j.
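The area computation in step (1) can be sketched as follows. This is a minimal illustration rather than the patent's processing program: the function names, the flat list of cell class codes, and the 1 km² cell area are assumptions made for the example.

```python
from collections import Counter

# Land-use class codes d = 1..6 as listed in the text.
LAND_CLASSES = (1, 2, 3, 4, 5, 6)

def class_areas(cell_classes, cell_area):
    """Area per class = area of one grid cell * number of cells of that class."""
    counts = Counter(cell_classes)
    return {d: counts[d] * cell_area for d in LAND_CLASSES}

def daily_areas(yearly_areas, days):
    """Each day inherits the areas of the year that contains it.

    days is a list of (day_label, year) pairs.
    """
    return {day: yearly_areas[year] for day, year in days}

# Hypothetical sub-basin made of seven 1 km^2 cells in 2020.
cells_2020 = [1, 1, 2, 2, 2, 4, 5]
yearly = {2020: class_areas(cells_2020, cell_area=1.0)}
daily = daily_areas(yearly, [("2020-01-01", 2020), ("2020-01-02", 2020)])
print(daily["2020-01-02"])
```

Classes with no cells in the sub-basin get an area of zero, so every sub-basin yields the same six columns.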
(2) Determine the resident population, GDP, and grain output of each sub-basin over the prediction period. The area-weighted average method converts the socioeconomic data of the administrative districts into the corresponding sub-basin parameters: the value for sub-basin j is the sum, over all N districts, of each district's value multiplied by the fraction of that district's area lying inside sub-basin j.
Here k is the index of an administrative district in the basin and N is the total number of districts; j is the sub-basin index; e is the index of the socioeconomic parameter, with e = 1 for resident population, e = 2 for GDP, and e = 3 for grain output. The inputs are the values of these three parameters for district k in year y; the outputs are the corresponding values for sub-basin j in year y and at time t, where time t falls within year y. The weight is the ratio of the area of district k lying inside sub-basin j to the total area of district k.
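A small sketch of the area-weighted conversion described above, under the assumption that the overlap areas between districts and the sub-basin have already been computed in a GIS step. The function name, dictionary layout, and the numbers are hypothetical.

```python
def to_sub_basin(district_values, overlap_area, district_area):
    """Area-weighted transfer of district totals to one sub-basin.

    district_values[k]: yearly total of district k (population, GDP or grain output)
    overlap_area[k]:    area of district k lying inside the sub-basin
    district_area[k]:   total area of district k
    """
    total = 0.0
    for k, value in district_values.items():
        share = overlap_area.get(k, 0.0) / district_area[k]
        total += share * value
    return total

# Hypothetical districts A and B: a quarter of A and all of B lie in the sub-basin.
population = {"A": 1000.0, "B": 400.0}
inside = {"A": 25.0, "B": 50.0}
area = {"A": 100.0, "B": 50.0}
sub_basin_population = to_sub_basin(population, inside, area)
print(sub_basin_population)  # 0.25 * 1000 + 1.0 * 400 = 650.0
```

Districts that do not touch the sub-basin contribute a share of zero, so the sum can safely run over every district in the basin.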
(3) Determine the per-capita GDP, population density, and urban sewage treatment rate of each sub-basin over the prediction period. These three quantities are ratio-type data, so the area-weighted average method cannot be used to compute the sub-basin parameters. Instead, the topological relationship between administrative districts and sub-basins is constructed, and the per-capita GDP, population density, and urban sewage treatment rate of each sub-basin are computed as the arithmetic mean of the values of the districts associated with it.
Here k is the index of an administrative district in the basin and N is the total number of districts; j is the sub-basin index; a is the index of the socioeconomic parameter, with a = 1 for per-capita GDP, a = 2 for population density, and a = 3 for urban sewage treatment rate. The inputs are the values of these three parameters for district k in year y; the outputs are the corresponding values for sub-basin j in year y and at time t, where time t falls within year y.
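The arithmetic-mean conversion for ratio-type indicators might look like the sketch below. It assumes the topological relationship is available as a mapping from each sub-basin to the districts that overlap it; that mapping, the function name, and the sample rates are illustrative only.

```python
def sub_basin_ratio(district_ratio, districts_in_sub_basin):
    """Arithmetic mean of a ratio-type indicator over the districts linked to a sub-basin."""
    values = [district_ratio[k] for k in districts_in_sub_basin]
    if not values:
        raise ValueError("sub-basin has no associated district")
    return sum(values) / len(values)

# Hypothetical sewage treatment rates and district/sub-basin topology.
sewage_rate = {"A": 0.75, "B": 0.25, "C": 0.5}
topology = {"j1": ["A", "B"], "j2": ["C"]}
rate_j1 = sub_basin_ratio(sewage_rate, topology["j1"])
print(rate_j1)
```

Averaging instead of summing keeps the result on the same scale as the district values, which is why ratios are handled separately from totals such as population.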
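Concretely, steps S31 and S32 were carried out as follows. Annual land-use data at 1 km resolution provided by the Landsat 8 satellite were used to extract the land-use data of the Pearl River region for 2020 and 2021, giving two raster maps. A land-use data processing program was written that, for each of the 138 sub-basins of the Pearl River region, extracts the yearly areas of cultivated land, forest land, grassland, water bodies, urban and residential land, and unused land. The yearly sub-basin land-use data were then converted to daily data, with each day's area for each class set equal to the value for the year containing that day. The daily areas of the six land-use classes were stored in Excel files, one file per sub-basin and 138 files in total; each file holds 6 variables (the six land-use areas) for 486 time steps (486 rows, 6 columns).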
The CNKI yearbook database was used to collect the resident population, GDP, per-capita GDP, population density, grain output, and urban sewage treatment rate of the 75 prefecture-level cities in the Pearl River region for 2020 and 2021. A socioeconomic data processing program was written that, using the area-weighted average method and the arithmetic mean method together with the spatial topological relationship between prefecture-level cities and sub-basins, converts the socioeconomic data of the 75 cities into socioeconomic data for the 138 sub-basins. The yearly sub-basin socioeconomic data were converted to daily data, with each day's value set equal to the value for the year containing that day. The data were stored in Excel files, one file per sub-basin and 138 files in total; each file holds 6 variables (the six socioeconomic parameters) for 486 time steps (486 rows, 6 columns).
S33. Generate the socioeconomic parameter set of each sub-basin.
S4. Determine the sub-basin water quality parameters, namely the surface-water concentrations of dissolved oxygen, CODMn (permanganate index), ammonia nitrogen, total phosphorus, and total nitrogen. National water quality monitoring data from the National Surface Water Quality Automatic Monitoring Real-time Data Release System of the China National Environmental Monitoring Center were used to extract the daily mean water quality data of 204 stations in the Pearl River region from 1 January 2020 to 30 April 2021. Step S4 comprises the following steps:
S41. Determine the latitude and longitude of the river water quality monitoring stations across the whole basin (all sub-basins), use ArcGIS to produce a spatial distribution map of the stations, and import the water quality data of every station at every time step into the attribute table of that map.
S42. Compute the water quality distribution along the river channels of each sub-basin.
Surface-water quality is a property of the river water body: it is spatially continuous along the river but is not spatially correlated across the basin surface. Sub-basin water quality is therefore not computed by spatial interpolation over the basin surface; instead, interpolation along the river lines is used to compute the water quality distribution in the sub-basin channels. The steps are:
(1) Collect river network data (SHP format) for all rivers of class 5 and above in the basin.
(2) Using the basin river network as the boundary, spatially interpolate the station water quality data at each time step with the inverse distance weighting method to produce water quality rasters along the rivers for each time step, and store them in a geospatial database.
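For readers unfamiliar with inverse distance weighting, the sketch below estimates water quality at river cells from station values. It is a simplified illustration under stated assumptions: straight-line distances, a power of 2, and planar coordinates are all choices made for the example, whereas the text only says that the interpolation is bounded by the river network.

```python
import math

def idw(target, stations, power=2.0):
    """Inverse-distance-weighted estimate at target from (x, y, value) stations."""
    num = den = 0.0
    for x, y, value in stations:
        dist = math.hypot(target[0] - x, target[1] - y)
        if dist == 0.0:
            return value  # target coincides with a station
        weight = 1.0 / dist ** power
        num += weight * value
        den += weight
    return num / den

# Two hypothetical stations on a straight reach, and three river cells between them.
stations = [(0.0, 0.0, 2.0), (4.0, 0.0, 6.0)]
river_cells = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
estimates = [idw(cell, stations) for cell in river_cells]
print(estimates)
```

Only cells that lie on the river network are evaluated here, which mirrors the idea of restricting the interpolation to the channels rather than the whole basin surface.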
S43. For each sub-basin, compute the mean of the river water quality within its extent and use it as the sub-basin water quality parameter: the value for sub-basin j at time t is the sum of the water quality values of all river-channel grid cells in the sub-basin divided by the number of those cells.
Here j is the sub-basin index, t is the time step, and s is the index of a grid cell located on a river channel in sub-basin j; w is the index of the water quality parameter, with w = 1 for dissolved oxygen, w = 2 for CODMn, w = 3 for ammonia nitrogen, w = 4 for total phosphorus, and w = 5 for total nitrogen. The inputs are the water quality value at river-channel cell s in sub-basin j and the total number of river-channel cells in sub-basin j; the output is the water quality of sub-basin j at time t.
The results are stored with one Excel file per sub-basin, 138 files in total; each file holds 5 variables (the five water quality indicators) for 486 time steps (486 rows, 5 columns).
S5. Grade the data in each sub-basin's socioeconomic parameter set.
The socioeconomic parameter set of each sub-basin contains 12 variables: the areas of cultivated land, forest land, grassland, water bodies, urban and residential land, and unused land; resident population, GDP, and grain output; and per-capita GDP, population density, and urban sewage treatment rate. Grade thresholds are set for each variable, and the grade value of each variable is obtained by comparing its value with those thresholds.
Here j is the sub-basin index and t is the time step; d is the land-use class index (d = 1 cultivated land, d = 2 forest land, d = 3 grassland, d = 4 water bodies, d = 5 urban and residential land, d = 6 unused land); e indexes the first group of socioeconomic parameters (e = 1 resident population, e = 2 GDP, e = 3 grain output); and a indexes the second group (a = 1 per-capita GDP, a = 2 population density, a = 3 urban sewage treatment rate). For land class d, the inputs are the area of class-d land in sub-basin j at time t and the grading thresholds for class-d area, N is the number of grades, and the output is the grade value of that area. For parameter e, the inputs are its value in sub-basin j at time t and its grading thresholds, L is the number of grades, and the output is its grade value. For parameter a, the inputs are its value in sub-basin j at time t and its grading thresholds, V is the number of grades, and the output is its grade value.
Grading is based on the maximum and minimum of each socioeconomic variable across the sub-basins. By area, cultivated land is divided into 4 grades, forest land into 5, grassland into 5, water bodies into 3, urban and residential land into 3, and unused land into 3. Resident population is divided into 4 grades, GDP and grain output into 4 grades each, and per-capita GDP, population density, and urban sewage treatment rate into 3 grades each.
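The grading step can be illustrated with a short sketch. The text states only that thresholds are chosen with reference to each variable's maximum and minimum, so the equal-width thresholds, the boundary convention (a value equal to a threshold falls in the higher grade), and the sample range of 0 to 400 are assumptions of this example.

```python
from bisect import bisect_right

def equal_width_thresholds(vmin, vmax, n_grades):
    """Interior thresholds splitting [vmin, vmax] into n_grades equal bins (assumed scheme)."""
    step = (vmax - vmin) / n_grades
    return [vmin + step * i for i in range(1, n_grades)]

def grade(value, thresholds):
    """Grade in 1..len(thresholds)+1 for ascending interior thresholds."""
    return bisect_right(thresholds, value) + 1

# Cultivated land uses 4 grades in the text; the area range here is hypothetical.
cultivated_thresholds = equal_width_thresholds(0.0, 400.0, 4)
grades = [grade(v, cultivated_thresholds) for v in (50.0, 100.0, 250.0, 400.0)]
print(cultivated_thresholds, grades)
```

Because only the grade is passed on to the network, a sub-basin whose exact statistic is missing can still be used as long as its grade is known.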
S6. One-hot encode the graded socioeconomic parameters to form the one-hot matrix of socioeconomic parameters.
S61. Convert each socioeconomic grade variable, for every sub-basin and time step, into a 0-1 row vector. In the row vector, the n-th column, which corresponds to the variable's grade value n, is set to 1, and all other elements are set to 0:
$$ca_{1,n} = 1 \tag{12}$$
Here j is the sub-basin index, t is the time step, and d is the land-use class index. The grade value of the class-d land area in sub-basin j at time t equals n; the 0-1 row vector corresponding to that grade value has N columns, where N is the total number of grades for class-d land area, and $ca_{1,n}$ is the value in its n-th column, which is 1. The 0-1 row vectors of the other socioeconomic grade variables are computed in the same way.
S62. Build the one-hot 0-1 matrix of the socioeconomic grade variables.
The 0-1 row vectors for the grade values of resident population, GDP, and grain output and those for the grade values of per-capita GDP, population density, and urban sewage treatment rate are concatenated, together with those of the six land-use areas, into a single matrix.
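The encoding in S61 and S62 can be sketched as below. The grade counts follow the grading listed earlier in the description (4, 5, 5, 3, 3, 3 for the land classes, 4 each for population, GDP, and grain output, 3 each for per-capita GDP, population density, and sewage treatment rate); the function names, the variable ordering, and the sample grades are assumptions.

```python
# Number of grades per variable, in the order the 12 variables are listed in the text.
GRADE_COUNTS = [4, 5, 5, 3, 3, 3, 4, 4, 4, 3, 3, 3]

def one_hot(grade, n_grades):
    """0-1 row vector with a 1 in column `grade` (1-based) and 0 elsewhere."""
    if not 1 <= grade <= n_grades:
        raise ValueError("grade out of range")
    row = [0] * n_grades
    row[grade - 1] = 1
    return row

def encode(grades):
    """Concatenate the one-hot vectors of all 12 grade variables into one row."""
    row = []
    for g, n in zip(grades, GRADE_COUNTS):
        row.extend(one_hot(g, n))
    return row

# Hypothetical grades for one sub-basin on one day.
encoded = encode([2, 5, 1, 3, 1, 2, 4, 1, 2, 3, 1, 1])
print(len(encoded), sum(encoded))
```

Each encoded row contains exactly one 1 per variable, so its sum always equals the number of variables.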
S7. Standardize the meteorological parameters and water quality parameters with min-max scaling: the minimum of the indicator is subtracted from each value, and the result is divided by the difference between the indicator's maximum and minimum.
Here the input is a meteorological or water quality parameter of sub-basin j at time t; the maximum and minimum of each indicator are taken over all times t and all sub-basins j; the output is the standardized data.
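A minimal sketch of this min-max standardization, assuming the values of one indicator for all sub-basins and all time steps have been flattened into a single list. The function name and the handling of a constant column are choices made for the example.

```python
def min_max_normalize(values):
    """Scale values to [0, 1] using the global minimum and maximum of the indicator.

    Returns the scaled values together with (minimum, maximum) so that
    predictions can later be mapped back to physical units.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: assumed convention, map everything to 0.
        return [0.0 for _ in values], lo, hi
    return [(v - lo) / (hi - lo) for v in values], lo, hi

# Hypothetical daily rainfall values pooled over sub-basins and days.
rainfall = [0.0, 5.0, 20.0, 10.0]
scaled, rain_min, rain_max = min_max_normalize(rainfall)
print(scaled, rain_min, rain_max)
```

Keeping the minimum and maximum alongside the scaled data matters because the same pair is needed again when predictions are converted back.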
S8. Split the one-hot matrix of socioeconomic parameters and the standardized sub-basin meteorological data and water quality parameters into a training set and a test set.
The continuous sub-basin meteorological data and the one-hot 0-1 matrix of the socioeconomic data are merged into a data matrix with t×j rows and 3+N+L+V columns, which serves as the input matrix for deep learning and is denoted TableX. The continuous water quality data are arranged into a data matrix with t×j rows, which serves as the output label matrix and is denoted TableY. Then 70% of the rows of the input matrix and the label matrix form the training set, and the remaining 30% form the test set.
More specifically, the meteorological parameters and the socioeconomic 0-1 matrix of each sub-basin are combined into one data set. The data are stored in Excel files, one file per sub-basin and 138 files in total. Each file contains 44 columns of data for 486 time steps. The variables are the continuous variables rainfall, evaporation, and air temperature, together with the 0-1 row vectors of the grade data for the areas of cultivated land, forest land, grassland, water bodies, urban and residential land, and unused land, and for resident population, GDP, per-capita GDP, population density, grain output, and urban sewage treatment rate. The data of the 138 Excel files are merged into one data matrix, denoted TableX, which is the input data matrix for deep learning.
The sub-basin water quality parameters are likewise combined into one data set: the water quality data of the 138 sub-basins are merged into one data matrix. Its columns are the 5 water quality variables; its rows cover 138 sub-basins and 486 days, 67,068 rows in total. This matrix is denoted TableY and is the output label matrix for deep learning.
TableX and TableY are standardized with min-max scaling: the minimum of the column is subtracted from each entry, and the result is divided by the difference between the column's maximum and minimum. TableX and TableY are then split into a training part and a test part: 70% of the rows form the training set (TrainX and TrainY) and 30% of the rows form the test set (TestX and TestY).
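The 70/30 split can be sketched as follows. The text does not say whether rows are taken in order or at random, so the seeded shuffle here is an assumption, as are the function name and the toy tables.

```python
import random

def split_rows(table_x, table_y, train_fraction=0.7, seed=0):
    """Split paired rows of TableX and TableY into training and test parts."""
    if len(table_x) != len(table_y):
        raise ValueError("TableX and TableY must have the same number of rows")
    indices = list(range(len(table_x)))
    random.Random(seed).shuffle(indices)  # assumed: random rather than sequential split
    n_train = round(train_fraction * len(indices))
    train_idx, test_idx = indices[:n_train], indices[n_train:]
    train_x = [table_x[i] for i in train_idx]
    train_y = [table_y[i] for i in train_idx]
    test_x = [table_x[i] for i in test_idx]
    test_y = [table_y[i] for i in test_idx]
    return train_x, train_y, test_x, test_y

# Toy tables with 10 rows; row i of TableY belongs to row i of TableX.
table_x = [[i] for i in range(10)]
table_y = [[2 * i] for i in range(10)]
train_x, train_y, test_x, test_y = split_rows(table_x, table_y)
print(len(train_x), len(test_x))
```

Shuffling one index list and applying it to both tables keeps every input row aligned with its label row.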
S9. Build a fully connected deep neural network (DNN) and define the loss function and the iterative optimization algorithm.
The network parameters comprise the number of hidden layers k, the number of nodes n in each hidden layer, the activation function, the loss function (loss), the accuracy metric (metrics), the learning rate, the batch size, the maximum number of epochs, and the dropout ratio. The network has four hidden layers with 128, 64, 32, and 16 nodes respectively; every layer uses the SELU (scaled exponential linear units) activation function. Dropout is applied at the fourth hidden layer. In this embodiment the dropout rate is 20%, the learning rate is 0.001, the batch size is 64, and the maximum number of epochs is 1000.
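To make the architecture concrete, the sketch below runs a forward pass through a 128-64-32-16 fully connected network with SELU activations, using only the standard library. It is an inference-time illustration and not the patent's implementation: the weight initialization, the linear output layer with 5 outputs, and the input width of 8 are assumptions, and dropout is omitted because it acts only during training.

```python
import math
import random

# Standard SELU constants.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(x):
    """Scaled exponential linear unit."""
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))

def init_layer(n_in, n_out, rng):
    """Random weights with variance 1/n_in (assumed initialization) and zero biases."""
    std = math.sqrt(1.0 / n_in)
    weights = [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(x, layer, activation=None):
    """One fully connected layer: weights * x + bias, then optional activation."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(o) for o in out] if activation else out

def build(n_inputs, n_outputs=5, hidden=(128, 64, 32, 16), seed=0):
    """Create the layer stack: four hidden layers plus an output layer."""
    rng = random.Random(seed)
    sizes = [n_inputs, *hidden, n_outputs]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

def predict(layers, x):
    """Forward pass: SELU on hidden layers, linear output (assumed)."""
    for layer in layers[:-1]:
        x = dense(x, layer, selu)
    return dense(x, layers[-1])

layers = build(n_inputs=8)  # hypothetical input width
output = predict(layers, [0.5] * 8)
print(len(output))
```

In practice a deep learning framework would supply these layers together with dropout, the Adam optimizer, and batching; the plain-Python version is meant only to show the shape of the computation.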
The loss function and the iterative optimization algorithm are defined as follows. The loss function is the mean squared error (MSE), and the accuracy metric is the mean absolute error (MAE), computed as shown below. Training is optimized with the Adam algorithm.
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(z_i - y_i)^2$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert z_i - y_i\rvert$$
Here $z_i$ is the actual value, $y_i$ is the predicted value, and n is the number of samples. The smaller the MSE and MAE, the closer the model's predictions are to the true values and the higher the model's accuracy.
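The two error measures can be computed directly from paired lists of actual and predicted values, as in this short sketch. The function names and sample numbers are illustrative.

```python
def mse(actual, predicted):
    """Mean squared error between actual values z_i and predictions y_i."""
    return sum((z - y) ** 2 for z, y in zip(actual, predicted)) / len(actual)

def mae(actual, predicted):
    """Mean absolute error between actual values z_i and predictions y_i."""
    return sum(abs(z - y) for z, y in zip(actual, predicted)) / len(actual)

# Hypothetical standardized water quality values.
z = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.0, 3.0, 6.0]
print(mse(z, y), mae(z, y))  # 1.0 0.5
```

MSE penalizes the single large miss more heavily than MAE does, which is why the two values differ for the same data.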
S10. Feed the training set into the deep neural network to obtain a trained network, then feed the test set into the trained network to obtain predicted water quality parameters. Compare the predicted water quality parameters with the measured ones, adjust the model parameters, and finally store the deep neural network and its parameter data once they meet the accuracy requirement.
More specifically, TrainX is used as the input of the DNN and TrainY as the output labels to train the network. The weights of the DNN are updated iteratively on the training data until the model converges. The trained DNN is then used for water quality prediction.
TestX is fed into the DNN to predict the corresponding water quality parameters. The predictions are de-standardized to obtain values in physical units, and TestY is de-standardized to obtain the measured water quality values. A scatter plot of the de-standardized predictions against the measured values is then drawn, and the square of the correlation coefficient r (r²) is computed; a larger r² indicates a more accurate prediction. The predicted and measured surface-water quality values are compared repeatedly, the corresponding MSE and r² are computed, and the model parameters are adjusted according to these values and the model is retrained to ensure accuracy. The DNN that meets the accuracy requirement is stored together with its weights and other parameter data to provide decision support for managers. The de-standardization formula is:
$$y'_i = y_{\min} + y_i \cdot (y_{\max} - y_{\min}) \tag{20}$$
Here $y'_i$ is the restored predicted value, $y_i$ is the predicted value output by the DNN, and $y_{\min}$ and $y_{\max}$ are the minimum and maximum used in the standardization.
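The de-standardization in equation (20) and the r² check can be sketched as follows. The function names and the sample values are hypothetical, and r² is computed here as the squared Pearson correlation between measured and predicted values, matching the description of it as the square of the correlation coefficient.

```python
def denormalize(y_norm, y_min, y_max):
    """Equation (20): map network outputs in [0, 1] back to physical units."""
    return [y_min + y * (y_max - y_min) for y in y_norm]

def r_squared(observed, predicted):
    """Square of the Pearson correlation coefficient between two series."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)

# Hypothetical network outputs for one indicator with range 2.0 to 10.0 mg/L.
predicted = denormalize([0.0, 0.5, 1.0], y_min=2.0, y_max=10.0)
measured = [2.0, 6.5, 9.5]
r2 = r_squared(measured, predicted)
print(predicted, r2)
```

Both series must be in the same units before they are compared, which is why TestY is de-standardized as well as the predictions.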
S11. Select the meteorological and socioeconomic data of the basin for a given period and feed them into the deep neural network stored in S10 to predict the water quality parameters of all sub-basins for that period, namely the concentrations of dissolved oxygen, CODMn, ammonia nitrogen, total phosphorus, and total nitrogen. For each water quality indicator, draw the spatial distribution map of sub-basin water quality and convert it to raster format. Using the basin river network as the boundary, extract the water quality values of the grid cells located on river channels to obtain the water quality distribution map of the river network. In this embodiment, the predicted versus measured CODMn values and the spatial distribution of predicted CODMn on one day are shown in Figures 2 and 3 respectively.
Compared with the prior art, this embodiment has the following benefits. Taking the Pearl River basin as an example, it uses graded socioeconomic data (land-use area, resident population, GDP, per-capita GDP, population density, grain output, and urban sewage treatment rate) to predict water quality over a large, basin-scale area. This overcomes the small computational extent and slow computation of traditional water quality prediction based on numerical simulation. It also avoids the strict requirement for continuous input data imposed by traditional statistical methods and by the recurrent neural networks commonly used for water quality prediction, so the quantitative relationship between discrete graded socioeconomic data and continuously varying water quality data can be computed more conveniently, which makes the prediction more practical. In addition, the invention avoids the problem of missing socioeconomic statistics when socioeconomic data are used for water quality prediction: accurate socioeconomic data are not required, and prediction can be carried out with graded socioeconomic data alone. The method has a clear theoretical basis, is simple to operate, and is easy to apply in practical water quality management.
It should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solution may be modified or replaced with equivalents without departing from its spirit and scope, and all such modifications and replacements fall within the scope of the claims of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111044820.7A CN113723704B (en) | 2021-09-07 | 2021-09-07 | Water quality rapid prediction method based on continuous and graded mixed data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113723704A (en) | 2021-11-30 |
CN113723704B (en) | 2023-04-18 |
Family
ID=78682340
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111044820.7A Active CN113723704B (en) | 2021-09-07 | 2021-09-07 | Water quality rapid prediction method based on continuous and graded mixed data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113723704B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109740680A (en) * | 2019-01-07 | 2019-05-10 | 深圳中创华安科技有限公司 | A kind of classification method and system of mixing value attribute examination & approval data |
CN111387938A (en) * | 2020-02-04 | 2020-07-10 | 华东理工大学 | A system for predicting the risk of death in patients with heart failure based on feature-rearranged one-dimensional convolutional neural network |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2002337963A1 (en) * | 2001-10-22 | 2003-05-06 | Emery, Coppola, J. , Jr. | Neural network based predication and optimization for groundwater / surface water system |
CN109242203A (en) * | 2018-09-30 | 2019-01-18 | 中冶华天南京工程技术有限公司 | A kind of water quality prediction of river and water quality impact factors assessment method |
US11544530B2 (en) * | 2018-10-29 | 2023-01-03 | Nec Corporation | Self-attentive attributed network embedding |
CN112529234A (en) * | 2019-09-18 | 2021-03-19 | 上海交通大学 | Surface water quality prediction method based on deep learning |
CN112633645B (en) * | 2020-12-10 | 2023-05-02 | 长江水利委员会长江科学院 | Social and economic benefit accounting method for water resource mining and efficient utilization effects of river source arid region |
CN113011992B (en) * | 2021-03-19 | 2024-03-15 | 中国水利水电科学研究院 | River basin agricultural non-point source pollution river entering coefficient measuring and calculating method based on standard data |
Also Published As
Publication number | Publication date |
---|---|
CN113723704A (en) | 2021-11-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||