CN116401962A - Method for deriving an optimal feature scheme for a water quality model - Google Patents

Method for deriving an optimal feature scheme for a water quality model

Info

Publication number
CN116401962A
Authority
CN
China
Prior art keywords
water quality
data
grid
representing
quality model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310668458.3A
Other languages
Chinese (zh)
Inventor
童山琳
陈杰
夏瑞
许崇育
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU
Priority to CN202310668458.3A
Publication of CN116401962A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00: Computer-aided design [CAD]
    • G06F30/20: Design optimisation, verification or simulation
    • G06F30/27: Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A20/00: Water conservation; Efficient water supply; Efficient water use
    • Y02A20/152: Water filtration

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Geometry (AREA)
  • Computer Hardware Design (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method for deriving an optimal feature scheme for a water quality model, comprising the following steps: determining the environmental features that affect water quality, acquiring meteorological data, socioeconomic data, land use type data and night light intensity data for the study basin, and preprocessing each dataset; collecting water quality observation data and spatially interpolating them from the site scale to the grid scale to obtain water quality grid data; calculating the correlation coefficients between the environmental feature variables and the water quality grid data together with their significance, constructing a water quality model, and optimizing the model hyperparameters using the feature data; based on the trained water quality model, calculating the SHAP value of each environmental feature grid by grid using the SHAP method; and aggregating the absolute SHAP values of all grids to calculate the global importance of the environmental features and their input order, thereby formulating an optimal feature scheme for the water quality model. The invention can identify the importance of the various environmental features in the process of water quality change and thus provides an optimal feature scheme that improves the simulation effect.

Description

Method for deriving an optimal feature scheme for a water quality model
Technical Field
The invention belongs to the technical field of surface water environments, and particularly relates to a method for deriving an optimal feature scheme for a water quality model.
Background
Deterioration of water quality is a global problem. More than two-thirds of the world's population is at serious risk of water shortage, and deteriorating water quality is an important contributor. To clarify the complex coupling between water quality change and meteorological conditions, anthropogenic emissions and land management, a data processing tool capable of analyzing multivariate nonlinear information is needed to reveal how the various environmental features influence water quality, so that water quality can be predicted under different environmental conditions.
Machine learning algorithms are widely used to capture the nonlinear effects of important environmental features on water quality change. For example, algorithms such as random forests and gradient boosting machines have been widely used to simulate the spatio-temporal distribution of water quality and to explain the importance of environmental features. However, machine learning algorithms have two drawbacks when revealing feature importance. First, robustness is insufficient: changing the input order of the environmental features can change the calculated feature importance. Second, interpretability is insufficient: the algorithms measure the importance of environmental features over the whole dataset, so they can reveal only the global importance of each feature across the entire basin and cannot explain feature importance in local areas of the basin. Therefore, to robustly reveal how water quality responds to changes in environmental features and thereby optimize water quality simulation, a method is needed that improves the robustness and interpretability of the calculated environmental feature importance.
Disclosure of Invention
The invention aims to provide a method for deriving an optimal feature scheme for a water quality model that addresses the shortcomings of the prior art. The method solves the technical problem that feature importance results are insufficiently stable and interpretable when water quality models are built with machine learning algorithms, and it can identify the importance of the various environmental features in the process of water quality change, thereby providing an optimal feature scheme that improves the simulation effect.
To solve the above technical problem, the invention adopts the following technical scheme:
A method for deriving an optimal feature scheme for a water quality model comprises the following steps:
Step 1: determining the environmental features that affect the annual trend of water quality, acquiring meteorological data, socioeconomic data, land use type data and night light intensity data for the study basin, and downscaling each type of environmental feature data;
Step 2: collecting water quality observation data in the study basin, and spatially interpolating them from the site scale to the grid scale to obtain water quality grid data;
Step 3: calculating the correlation coefficients between the environmental feature variables and the water quality grid data together with their significance, constructing a water quality model, selecting the sample data in which the environmental feature variables and the water quality grid data are significantly correlated, dividing the sample data into a training set and a test set, using the training set to train the model and tune the hyperparameters of the water quality model, and verifying with the test set to obtain an optimal water quality model;
Step 4: based on the optimal water quality model obtained in step 3, calculating the SHAP value of each environmental feature grid by grid using the SHAP method;
Step 5: aggregating the absolute SHAP values of all grids to calculate the global importance of the environmental features and their input order, thereby formulating an optimal feature scheme for the water quality model.
Further, in step 1, the environmental features affecting the annual trend of water quality are determined by consulting computer databases and library literature in combination with field surveys.
Further, the data in step 1 are processed as follows:
Step 1.1: collecting meteorological data, population data, land use type data and night light intensity data, and resampling them to unify the spatial resolution of all datasets;
Step 1.2: collecting gross domestic product (GDP) data for the study basin, and allocating the basin's GDP to each grid using night light intensity as the allocation coefficient to obtain grid-scale GDP data.
Further, the grid-scale GDP data are calculated as:
$$GDP_i = \frac{NLI_i}{\sum_{k=1}^{n} NLI_k} \times GDP$$
where $GDP_i$ is the GDP value of grid $i$; $NLI_i$ is the night light intensity of grid $i$; $n$ is the total number of grids in the study area at the unified resolution; and $GDP$ is the economic data for the study area published by the National Bureau of Statistics.
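The allocation described above can be illustrated with a minimal Python sketch. It assumes the night light intensities are already available as one value per grid cell; the function name `allocate_gdp` and the numbers are hypothetical and are not taken from the patent.

```python
def allocate_gdp(nli, basin_gdp):
    """Allocate a basin-wide GDP total to grid cells in proportion to
    each cell's night light intensity (NLI)."""
    total_nli = sum(nli)
    if total_nli <= 0:
        raise ValueError("total night light intensity must be positive")
    return [basin_gdp * value / total_nli for value in nli]


# Hypothetical example: four grid cells and a basin GDP of 100
# (for instance in units of 100 million yuan).
grid_gdp = allocate_gdp([10.0, 30.0, 0.0, 60.0], 100.0)
```

Because the weights sum to one, the grid-scale values add up to the basin total, and a cell with no night light receives no GDP.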
Further, step 2 specifically comprises the following sub-steps:
Step 2.1: according to the basin in which a grid is located, selecting the water quality monitoring sections of the same basin to provide water quality reference values for grid interpolation;
Step 2.2: according to the physical flow direction of the river, interpolating the site-scale data to the grid scale by inverse distance weighting (IDW) with a weight coefficient of 1, calculated as:
$$W_{i,j} = \frac{1/d_{i,j}}{\sum_{k=1}^{n} 1/d_{k,j}}$$
$$C_j = \sum_{i=1}^{n} W_{i,j}\, C_i$$
where $W_{i,j}$ is the dimensionless weight of the concentration at monitoring section $i$ in the interpolation for grid $j$; $d_{i,j}$ is the horizontal straight-line distance between grid point $j$ to be interpolated and section $i$; $i$ indexes the water quality monitoring sections in the basin; $n$ is the total number of water quality monitoring sections in the basin; $C_j$ is the pollution indicator concentration of grid $j$; and $C_i$ is the pollution indicator concentration at water quality monitoring section $i$.
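As a rough illustration of the inverse distance weighting in step 2.2, the following Python sketch interpolates one grid point from a handful of monitoring sections. It assumes planar coordinates (for example projected coordinates in km) and treats the weight coefficient as a distance exponent of 1; the function name, the coordinates and the concentrations are hypothetical.

```python
import math


def idw_interpolate(grid_xy, sections, power=1.0):
    """Inverse distance weighting for one grid point.

    sections is a list of ((x, y), concentration) pairs for the monitoring
    sections of the same basin; distances are straight-line distances.
    """
    weighted = []
    for (sx, sy), concentration in sections:
        distance = math.hypot(grid_xy[0] - sx, grid_xy[1] - sy)
        if distance == 0.0:
            # The grid point coincides with a section: use its value directly.
            return concentration
        weighted.append((1.0 / distance ** power, concentration))
    total_weight = sum(w for w, _ in weighted)
    return sum(w * c for w, c in weighted) / total_weight


# Hypothetical example: two sections 1 km and 3 km away from the grid point.
c_grid = idw_interpolate((0.0, 0.0), [((1.0, 0.0), 1.0), ((3.0, 0.0), 2.0)])
```

The nearer section receives three times the weight of the farther one, so the interpolated value lies closer to its concentration.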
Further, step 3 comprises the following sub-steps:
Step 3.1: calculating the correlation coefficients between the water quality grid data and the environmental feature data using the Pearson correlation coefficient method, and calculating their significance;
Step 3.2: constructing an XGBoost model as the water quality model, and randomly dividing the samples in which the water quality grid data and the environmental feature data are significantly correlated into two subsets, one used as the training subset and the other as the test subset;
Step 3.3: inputting the training set into the machine learning algorithm for training, and optimizing the hyperparameters of the water quality model by Bayesian optimization during training;
Step 3.4: setting the optimized hyperparameters in the water quality model, inputting the training set into the hyperparameter-optimized model, validating the model by ten-fold cross-validation, and finally further verifying the validated model with the test set to obtain the optimal water quality model.
Further, in step 3.1, a correlation is regarded as significant when its significance level does not exceed 0.05.
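The correlation screening of step 3.1 can be sketched in plain Python as follows. The patent does not state which significance test is used; this sketch assumes the usual t statistic for a Pearson coefficient and, as a simplification, replaces the Student t distribution with a normal approximation that is only reasonable for large samples. The function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for r from n samples.

    Assumption: the t statistic is compared against a standard normal
    distribution instead of Student's t with n - 2 degrees of freedom.
    """
    if abs(r) >= 1.0:
        return 0.0
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))


def is_significant(r, n, alpha=0.05):
    """Apply the 0.05 threshold described in the text."""
    return approx_p_value(r, n) <= alpha
```

A feature would be retained for model construction only when `is_significant` returns `True` for its correlation with the gridded water quality data.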
Further, in step 3.4, the explained variance score (EVS) is used as the evaluation index for the hyperparameters, calculated as:
$$EVS = 1 - \frac{Var(y - \hat{y})}{Var(y)}$$
where $EVS$ is the explained variance score; $y$ is the observed value from the test subset; $\hat{y}$ is the simulated value from the water quality model; and $Var$ denotes the variance.
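A minimal Python sketch of the explained variance score follows, assuming the standard definition (one minus the variance of the residuals divided by the variance of the observations). The function names and the sample values are hypothetical.

```python
def variance(values):
    """Population variance of a sequence of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def explained_variance_score(observed, simulated):
    """EVS = 1 - Var(observed - simulated) / Var(observed)."""
    residuals = [o - s for o, s in zip(observed, simulated)]
    var_observed = variance(observed)
    if var_observed == 0.0:
        raise ValueError("observed values must not be constant")
    return 1.0 - variance(residuals) / var_observed


# Hypothetical observed and simulated concentrations (mg/L).
evs = explained_variance_score([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

Note that this score ignores a constant bias: a simulation that is offset from the observations by a fixed amount still scores 1, so it is usually read alongside an error measure.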
Further, the SHAP value of each feature is calculated grid by grid using the SHAP method, with the formula:
$$\phi_i = \sum_{S \subseteq \{1,\dots,F\} \setminus \{i\}} \frac{|S|!\,(F - |S| - 1)!}{F!} \left[ f(S \cup \{i\}) - f(S) \right]$$
where $\phi_i$ is the SHAP value of the $i$-th feature in the grid, i.e. its average marginal contribution; $F$ is the total number of features; $S$ ranges over all possible feature subsets that exclude feature $i$; $\frac{|S|!\,(F - |S| - 1)!}{F!}$ is the weight factor obtained by counting the permutations and combinations of subset $S$; $f(S)$ is the expected output value given subset $S$; and $f(S \cup \{i\})$ is the expected output value given subset $S$ together with feature $i$.
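To make the weighting in the Shapley formula concrete, the sketch below enumerates every subset explicitly for a toy value function. This brute-force approach is an illustration only: its cost grows exponentially with the number of features, and the patent does not say how the SHAP values are computed for the XGBoost model (tree-specific algorithms exist for that purpose). The names `shapley_values` and `toy_value` are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating all subsets S that exclude feature i.

    value_fn maps a frozenset of feature indices to the expected model output
    when only those features are known.
    """
    phi = []
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        total = 0.0
        for size in range(len(others) + 1):
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value_fn(s | {i}) - value_fn(s))
        phi.append(total)
    return phi


def toy_value(subset):
    """Toy output: two additive effects plus an interaction of features 0 and 2."""
    value = 0.0
    if 0 in subset:
        value += 2.0
    if 1 in subset:
        value -= 1.0
    if 0 in subset and 2 in subset:
        value += 4.0
    return value


phi = shapley_values(toy_value, 3)
```

In this toy case the interaction term is split evenly between features 0 and 2, and the three values sum to the difference between the output with all features and the output with none.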
Further, step 5 specifically comprises:
Step 5.1: taking the absolute value of each feature's SHAP values calculated in step 4, and averaging the absolute SHAP values over all grids to obtain the global importance of the feature;
Step 5.2: identifying the key environmental features according to the importance of each feature, determining the order in which the environmental features are input into the machine learning algorithm according to their importance, and formulating the optimal feature scheme with which the machine learning algorithm constructs the water quality model.
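The aggregation in step 5 amounts to a mean of absolute values followed by a sort. A minimal Python sketch, assuming the per-grid SHAP values are stored as one row per grid with one column per feature (the feature names and numbers are hypothetical):

```python
def rank_features(shap_by_grid, feature_names):
    """Mean absolute SHAP value per feature and the resulting input order."""
    n_grids = len(shap_by_grid)
    importance = {
        name: sum(abs(row[k]) for row in shap_by_grid) / n_grids
        for k, name in enumerate(feature_names)
    }
    # Most important feature first.
    order = sorted(importance, key=importance.get, reverse=True)
    return importance, order


# Hypothetical SHAP values for three grids and three features.
names = ["temperature", "precipitation", "gdp"]
rows = [
    [0.4, -0.1, 0.0],
    [-0.6, 0.3, 0.1],
    [0.2, -0.2, -0.05],
]
importance, order = rank_features(rows, names)
```

Taking absolute values before averaging keeps positive and negative contributions in different grids from cancelling each other out.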
Compared with the prior art, the invention has the following beneficial effects:
1. the importance of environmental features is evaluated from SHAP values, which accounts not only for the influence of each single feature but also for possible synergistic effects between features, and overcomes the multicollinearity problem among multiple features;
2. the invention calculates feature importance with the SHAP method and assigns each environmental feature in each grid a SHAP value, which is the importance of that feature in that grid; the sign relationship between the SHAP value and the water quality concentration can be used to explain whether each environmental feature promotes or inhibits water quality change, so that the trend of water pollution, its degree of response to the various environmental features and the spatial differences in that response are displayed comprehensively, which greatly improves the interpretability of water quality models based on machine learning algorithms;
3. among the various feature selection and combination schemes, the key features influencing water quality change and their input order are determined by aggregating the absolute SHAP values of all grids and comparing their means, and an optimal feature input scheme is constructed to train the water quality model, which reduces model training time and the errors caused by information gain while preserving the training effect.
Drawings
FIG. 1 is a flow chart of a method for deriving an optimal feature scheme for a water quality model according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the mean absolute SHAP values used to analyze the global importance of the features in an embodiment of the present invention.
Detailed Description
The technical solutions of the embodiments of the present invention are described clearly and completely below. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art from the embodiments of the invention without inventive effort fall within the scope of the invention.
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.
The invention will be further illustrated, but is not limited, by the following examples.
As shown in FIG. 1, an embodiment of the invention provides a method for deriving an optimal feature scheme for a water quality model, comprising the following steps:
Step 1: determining the environmental features that affect the annual trend of water quality, acquiring meteorological data, socioeconomic data, land use type data and night light intensity data for the study basin, and downscaling each type of environmental feature data; in this embodiment, this step comprises the following sub-steps:
Step 1.1: consulting computer databases and library literature, combined with field surveys, to determine the environmental features that affect the annual trend of water quality in the basin, namely meteorological data, underlying-surface land use data, socioeconomic data and night light intensity data;
Step 1.2: collecting grid-scale precipitation and air temperature data, and interpolating them with the nearest-neighbor method to obtain precipitation and air temperature data for the Jialing River basin; in this embodiment, the unified spatial resolution is 0.25°;
Step 1.3: collecting grid-scale land use data, population data and night light intensity data, and using a GIS tool to resample these high-resolution data to a spatial resolution of 0.25°; then calculating the proportions of forest area, shrub area, urban area and cropland area in each grid;
Step 1.4: collecting the gross domestic product (GDP) data published by the National Bureau of Statistics of China, and using the night light intensity data as the spatial allocation coefficient to obtain grid-scale GDP data at 0.25° resolution:
$$GDP_i = \frac{NLI_i}{\sum_{k=1}^{n} NLI_k} \times GDP \tag{1}$$
where $GDP_i$ is the GDP value of grid $i$, in units of 100 million yuan; $NLI_i$ is the night light intensity of grid $i$, dimensionless; $n$ is the total number of grids in the Jialing River basin at 0.25° resolution; and $GDP$ is the economic data for the Jialing River basin published by the National Bureau of Statistics, in units of 100 million yuan.
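The land cover proportions of step 1.3 can be pictured with the small Python sketch below. It assumes that the high-resolution land use pixels falling inside one 0.25° cell have already been extracted as a two-dimensional block of class labels; the patent performs this step with a GIS tool, and the function name and class labels here are hypothetical.

```python
from collections import Counter


def land_use_fractions(class_block,
                       classes=("forest", "shrub", "urban", "cropland")):
    """Share of each land use class among the fine pixels of one coarse cell."""
    pixels = [label for row in class_block for label in row]
    counts = Counter(pixels)
    return {name: counts.get(name, 0) / len(pixels) for name in classes}


# Hypothetical 2 x 2 block of fine pixels inside one coarse grid cell.
fractions = land_use_fractions([["forest", "forest"], ["urban", "water"]])
```

Classes outside the four tracked ones (here `"water"`) still count toward the pixel total, so the four proportions need not sum to one.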
Step 2: collecting water quality observation data in the study basin, and spatially interpolating them from the site scale to the grid scale to obtain grid data;
In this embodiment, sampled ammonia nitrogen concentration data are obtained for the study area, the Jialing River basin; the water quality indicators include, but are not limited to, ammonia nitrogen. According to the basin in which a grid is located, the water quality monitoring sections of the same sub-basin are selected to provide water quality reference values for grid interpolation;
According to the physical characteristic that water flows in one dimension from upstream to downstream, the site-scale data are interpolated to grid data with a spatial resolution of 0.25° × 0.25° by inverse distance weighting with a coefficient of 1:
$$W_{i,j} = \frac{1/d_{i,j}}{\sum_{k=1}^{n} 1/d_{k,j}} \tag{2}$$
$$C_j = \sum_{i=1}^{n} W_{i,j}\, C_i \tag{3}$$
where $W_{i,j}$ is the dimensionless weight of the concentration at monitoring section $i$ in the interpolation for grid $j$; $d_{i,j}$ is the horizontal straight-line distance between grid point $j$ to be interpolated and section $i$, in km; $i$ indexes the water quality monitoring sections in the basin; $n$ is the total number of water quality monitoring sections in the basin; $C_j$ is the pollution indicator concentration of grid $j$, in mg/L; and $C_i$ is the pollution indicator concentration at water quality monitoring section $i$, in mg/L.
Step 3: calculating the correlation coefficients between the environmental feature variables and the water quality grid data together with their significance, constructing a water quality model, selecting the environmental feature variables and water quality grid data that are significantly correlated, dividing them into a training set and a test set, training the model parameters of the water quality model with the training set, and verifying with the test set to obtain the optimal model parameters;
Step 3.1: using the Pearson correlation coefficient method to calculate the correlation coefficients between the gridded water quality indicator data and the environmental feature variables, calculating their significance, and checking whether they are significant at the 0.05 level; the correlation coefficient is calculated as:
$$R = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{4}$$
where $R$ is the correlation coefficient; $n$ is the total number of grids in the study basin; $x_i$ is the ammonia nitrogen concentration of the $i$-th grid, in mg/L; $\bar{x}$ is the mean ammonia nitrogen concentration, in mg/L; $y_i$ is the value of the environmental feature variable in the $i$-th grid; and $\bar{y}$ is the mean of the environmental feature variable. In this embodiment, the calculated correlation coefficients between the grid-scale ammonia nitrogen concentration and total annual precipitation, annual mean air temperature, population, GDP, forest area proportion, shrub area proportion, cropland area proportion and urban area proportion are 0.45, 0.42, 0.25, -0.01, -0.16, 0.11, 0.05 and 0.05, respectively, and the significance levels do not exceed 0.05;
Step 3.2: constructing the water quality model with the XGBoost algorithm, and randomly dividing the samples in which the water quality grid data and the environmental feature data are significantly correlated into two subsets, one used as the training subset and the other as the test subset; in this embodiment, 90% of the samples are selected as the training subset and 10% as the test subset; the mathematical expression of the XGBoost model is:
Figure SMS_17 (5)
where Figure SMS_18 and Figure SMS_19 denote the predicted values for grid $i$ at boosting iterations $t$ and $t-1$, respectively; $x_i$ denotes the input sample data of grid $i$; Figure SMS_20 denotes the score of the $i$-th grid sample in the $k$-th tree; and Figure SMS_21 denotes the optimization objective function, which measures how well the model fits the training data;
Step 3.3: inputting the training set into the water quality model for training, and optimizing the hyperparameters of the water quality model by Bayesian optimization during training;
Step 3.4: setting the optimized hyperparameters in the water quality model, inputting the training set into the hyperparameter-optimized model, validating the model by ten-fold cross-validation, and finally further verifying the validated model with the test set to obtain the optimal water quality model; in this embodiment, the simulation effect of the water quality model is evaluated with the explained variance score (EVS) during test set verification, calculated as:
$$EVS = 1 - \frac{Var(y - \hat{y})}{Var(y)} \tag{6}$$
where $EVS$ is the explained variance score; $y$ is the observed ammonia nitrogen concentration from the test subset; $\hat{y}$ is the ammonia nitrogen concentration simulated by the XGBoost water quality model; and $Var$ denotes the variance.
In this embodiment, the trained water quality model for simulating ammonia nitrogen concentration achieves an explained variance score of 0.835, and the optimal hyperparameter values are: nrounds = 355, maximum depth = 18, subsamples = 0.8576, learning rate = 0.9872.
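The data partitioning behind steps 3.2 and 3.4 (a random 90%/10% split followed by ten-fold cross-validation on the training subset) can be sketched in plain Python as below. Model fitting and Bayesian hyperparameter optimization are deliberately left out; the function names, the seed and the sample count are hypothetical.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.1, seed=0):
    """Randomly split sample indices into training and test subsets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=10):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    folds = [indices[f::k] for f in range(k)]
    for f in range(k):
        validation = folds[f]
        training = [idx for g, fold in enumerate(folds) if g != f
                    for idx in fold]
        yield training, validation


# Hypothetical example with 200 grid samples.
train_idx, test_idx = train_test_split_indices(200)
folds = list(k_fold_indices(train_idx, k=10))
```

Each candidate hyperparameter setting would be scored on the ten validation folds, and the held-out test subset would be used only once, for the final check of the selected model.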
Step 4: calculating the importance of the environmental feature variables in the optimized water quality model using the SHAP (SHapley Additive exPlanations) method, using the calculated SHAP values to quantify the contribution of each environmental feature variable to the prediction, and revealing the positive or negative influence of feature changes on water quality change;
The SHAP method analyzes each environmental feature variable in combination with subsets of the other features: the difference between the simulated ammonia nitrogen concentration with and without the feature gives the marginal effect of the feature in the grid, which is taken as the importance of the feature; the SHAP value of each feature is calculated grid by grid to obtain the positive or negative contribution of each feature to the predicted ammonia nitrogen concentration in that grid;
Specifically, the SHAP value of each environmental feature variable is calculated as:
$$\phi_i = \sum_{S \subseteq \{1,\dots,F\} \setminus \{i\}} \frac{|S|!\,(F - |S| - 1)!}{F!} \left[ f(S \cup \{i\}) - f(S) \right] \tag{7}$$
where $\phi_i$ is the SHAP value of the $i$-th feature in the grid, i.e. its average marginal contribution; $F$ is the total number of features; $S$ ranges over all possible feature subsets that exclude feature $i$; $\frac{|S|!\,(F - |S| - 1)!}{F!}$ is the weight factor obtained by counting the permutations and combinations of subset $S$; $f(S)$ is the expected output value given subset $S$; and $f(S \cup \{i\})$ is the expected output value given subset $S$ together with feature $i$.
Step 5: determining the key environmental feature variables and their input order according to the SHAP values of the environmental feature variables, and formulating the optimal feature scheme for the water quality model;
Step 5.1: calculating the mean of the absolute SHAP values over all grids as the global importance of each feature;
Displaying the |SHAP| value of each grid explains the importance of the environmental feature variables to ammonia nitrogen concentration change in different areas, which improves the interpretability of water quality models built with machine learning algorithms;
The importance of each feature variable is obtained by aggregating the mean |SHAP| over all grids, which keeps the global importance result consistent with the feature importance of the individual grids, so that an interpretable causal relationship between environmental feature variables and water quality change that better matches human understanding is constructed;
Step 5.2: identifying the key environmental feature variables according to the importance of each feature variable, and determining the order in which the environmental feature variables are input into the water quality model according to their importance; the result is shown in FIG. 2, which gives the optimal feature variable scheme for constructing the water quality model.
Specifically, in this embodiment, the optimal feature variables for simulating ammonia nitrogen concentration in the Jialing River basin, in input order, are: annual mean air temperature, cropland proportion, forest proportion, total annual precipitation, shrub proportion, urban proportion, population and GDP.
The foregoing merely illustrates preferred embodiments of the present invention and is not intended to limit its embodiments or scope. Those skilled in the art should appreciate that equivalent substitutions and obvious variations made using the teachings of the present invention are intended to be included within its scope.

Claims (10)

1. A method for deriving an optimal feature scheme for a water quality model, characterized by comprising the following steps:
Step 1: determining the environmental features that affect the annual trend of water quality, acquiring meteorological data, socioeconomic data, land use type data and night light intensity data for the study basin, and downscaling each type of environmental feature data;
Step 2: collecting water quality observation data in the study basin, and spatially interpolating them from the site scale to the grid scale to obtain water quality grid data;
Step 3: calculating the correlation coefficients between the environmental feature variables and the water quality grid data together with their significance, constructing a water quality model, selecting the sample data in which the environmental feature variables and the water quality grid data are significantly correlated, dividing the sample data into a training set and a test set, using the training set to train the model and tune the hyperparameters of the water quality model, and verifying with the test set to obtain an optimal water quality model;
Step 4: based on the optimal water quality model obtained in step 3, calculating the SHAP value of each environmental feature grid by grid using the SHAP method;
Step 5: aggregating the absolute SHAP values of all grids to calculate the global importance of the environmental features and their input order, thereby formulating an optimal feature scheme for the water quality model.
2. The method for deriving an optimal feature scheme for a water quality model according to claim 1, wherein in step 1 the environmental features affecting the annual trend of water quality are determined by consulting computer databases and library literature in combination with field surveys.
3. The method for deriving an optimal feature scheme for a water quality model according to claim 1, wherein the data in step 1 are processed as follows:
Step 1.1: collecting meteorological data, population data, land use type data and night light intensity data, and resampling them to unify the spatial resolution of all datasets;
Step 1.2: collecting gross domestic product (GDP) data for the study basin, and allocating the basin's GDP to each grid using night light intensity as the allocation coefficient to obtain grid-scale GDP data.
4. The method for deriving the optimal characteristic scheme of the water quality model according to claim 3, wherein the grid-scale GDP data are calculated as:
$$GDP_i = \frac{NLI_i}{\sum_{k=1}^{n} NLI_k} \times GDP$$
wherein: $GDP_i$ is the GDP value of grid $i$; $NLI_i$ is the nighttime light intensity value of grid $i$; $n$ is the total number of grids in the study area at the unified resolution; $GDP$ is the economic data of the study area published by the National Bureau of Statistics.
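The proportional allocation of step 1.2 can be sketched in a few lines. This is only an illustrative sketch: the function name `disaggregate_gdp` and the sample numbers are hypothetical, and it assumes the nighttime light intensities have already been resampled to the unified grid.

```python
def disaggregate_gdp(total_gdp, nli):
    """Allocate a basin-level GDP total to grid cells in proportion to
    each cell's nighttime light intensity (NLI)."""
    nli_sum = sum(nli)
    if nli_sum <= 0:
        raise ValueError("sum of nighttime light intensities must be positive")
    return [total_gdp * value / nli_sum for value in nli]


# Hypothetical example: a basin GDP of 1000 spread over four grid cells.
grid_gdp = disaggregate_gdp(1000.0, [0.0, 10.0, 30.0, 60.0])
print(grid_gdp)
```

Because each cell receives a share of the total, the grid values always sum back to the basin GDP, and unlit cells receive zero.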
5. The method for deriving the optimal characteristic scheme of the water quality model according to claim 1, wherein step 2 specifically comprises the following sub-steps:
step 2.1: according to the basin in which a grid is located, selecting the water quality monitoring sections of the same basin to provide the water quality reference values for interpolating that grid;
step 2.2: according to the physical flow direction characteristics of the river, interpolating the site-scale data to the grid scale using inverse distance weighting with a weight coefficient of 1, calculated as:
$$W_{i,j} = \frac{1/d_{i,j}}{\sum_{k=1}^{n} 1/d_{k,j}}$$
$$C_j = \sum_{i=1}^{n} W_{i,j}\, C_i$$
wherein: $W_{i,j}$ represents the dimensionless weight of the water quality concentration value of monitoring section $i$ in the interpolation of grid $j$; $d_{i,j}$ represents the horizontal straight-line distance between the grid point $j$ to be interpolated and section $i$; $i$ represents the $i$-th water quality monitoring section in the basin; $n$ represents the total number of water quality monitoring sections in the basin; $C_j$ represents the pollution index concentration of grid $j$; $C_i$ represents the pollution index concentration of water quality monitoring section $i$.
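As a rough illustration of step 2.2, the sketch below interpolates one grid point from the monitoring sections of its basin with inverse distance weights of power 1. The function name `idw_interpolate`, the planar coordinates and the sample concentrations are assumptions made for the example; the selection of same-basin sections (step 2.1) and any flow-direction handling are not modelled here.

```python
import math


def idw_interpolate(grid_xy, sections):
    """Inverse distance weighting (power 1) for a single grid point.

    sections: list of ((x, y), concentration) for the monitoring sections
    of the same basin as the grid point.
    """
    inverse_distances = []
    for (sx, sy), concentration in sections:
        d = math.hypot(grid_xy[0] - sx, grid_xy[1] - sy)
        if d == 0.0:
            # The grid point coincides with a section: use its value directly.
            return concentration
        inverse_distances.append((1.0 / d, concentration))
    total = sum(w for w, _ in inverse_distances)
    # Normalised weights sum to 1, so the result is a weighted average.
    return sum(w / total * c for w, c in inverse_distances)


# Hypothetical example: two sections, 1 and 2 distance units from the grid point.
c_grid = idw_interpolate((1.0, 0.0), [((0.0, 0.0), 2.0), ((3.0, 0.0), 5.0)])
print(c_grid)
```

The nearer section (distance 1) receives weight 2/3 and the farther one (distance 2) weight 1/3, so the interpolated value lies closer to the nearer observation.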
6. The method for deriving the optimal characteristic scheme of the water quality model according to claim 1, wherein step 3 comprises the following sub-steps:
step 3.1: calculating the correlation coefficients between the water quality grid data and the environmental characteristic data using the Pearson correlation coefficient method, and calculating the significance of each correlation;
step 3.2: constructing an XGBoost model as the water quality model, and randomly dividing the samples in which the water quality grid data and the environmental characteristic data are significantly correlated into two subsets, one used as the training subset and the other as the test subset;
step 3.3: inputting the training set into the machine learning algorithm for training, and optimizing the hyperparameters of the water quality model by Bayesian optimization during training;
step 3.4: setting the optimized hyperparameters in the water quality model, training the hyperparameter-optimized water quality model on the training set, validating it by ten-fold cross-validation, and finally further validating the cross-validated water quality model on the test set, thereby selecting the optimal water quality model.
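The data handling in steps 3.2 and 3.4 (random split, then ten-fold cross-validation on the training subset) can be sketched independently of the learner. Everything named here is hypothetical: the 20% test fraction is an assumption (the claims do not state a ratio), and `fit` and `score` are placeholder callables standing in for XGBoost training and the evaluation metric.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Randomly divide samples into (training subset, test subset)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, val_idx) index lists for k-fold cross-validation."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def cross_validate(fit, score, data, k=10):
    """Average validation score of a model over k folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(data), k):
        model = fit([data[i] for i in train_idx])
        scores.append(score(model, [data[i] for i in val_idx]))
    return sum(scores) / len(scores)


samples = list(range(100))  # placeholder sample identifiers
train_set, test_set = train_test_split(samples)
print(len(train_set), len(test_set))
```

The test subset is held out of the cross-validation loop entirely, so it is only touched in the final validation of step 3.4.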
7. The method according to claim 6, wherein in step 3.1, a correlation is regarded as significant when its significance level is not more than 0.05.
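The screening of steps 3.1 and claim 7 can be illustrated with a standard-library sketch. It assumes the usual two-sided t-test for a Pearson coefficient (statistic t = r·sqrt((n-2)/(1-r²)) with n-2 degrees of freedom); the claims do not specify the test, and all function names and sample values below are hypothetical. The p-value is obtained by numerically integrating the Student t density after the substitution x = sqrt(df)·tan(θ).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def two_sided_t_pvalue(t, df, steps=2000):
    """P(|T| > |t|) for Student's t with df degrees of freedom.

    With x = sqrt(df) * tan(theta) the CDF becomes an integral of
    cos(theta)**(df - 1) over a bounded interval (Simpson's rule, even steps).
    """
    theta_max = math.atan(abs(t) / math.sqrt(df))
    h = theta_max / steps
    total = 0.0
    for k in range(steps + 1):
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * math.cos(k * h) ** (df - 1)
    integral = total * h / 3.0
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    inside = 2.0 * math.exp(log_norm) / math.sqrt(math.pi) * integral
    return max(0.0, 1.0 - inside)


def correlation_test(x, y, alpha=0.05):
    """Return (r, p, significant) for the pair of variables."""
    r = pearson_r(x, y)
    n = len(x)
    if abs(r) >= 1.0:
        return r, 0.0, True
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    p = two_sided_t_pvalue(t, n - 2)
    return r, p, p <= alpha


# Hypothetical feature and water quality values for eight grid cells.
feature = [1, 2, 3, 4, 5, 6, 7, 8]
quality = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1]
r, p, keep = correlation_test(feature, quality)
print(round(r, 3), keep)
```

Only features for which `keep` is true would be carried into the sample set used in step 3.2.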
8. The method for deriving the optimal characteristic scheme of the water quality model according to claim 6, wherein in step 3.4, the explained variance score is used as the evaluation index for the hyperparameters, calculated as:
$$EVS = 1 - \frac{Var(y - \hat{y})}{Var(y)}$$
wherein: $EVS$ represents the explained variance score; $y$ represents the observed values from the test subset; $\hat{y}$ represents the simulated values from the water quality model; $Var$ represents the variance.
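A minimal sketch of the evaluation index, assuming the conventional definition of the explained variance score, EVS = 1 - Var(y - ŷ) / Var(y), with population variances. The function names and sample values are illustrative only.

```python
def variance(values):
    """Population variance of a sequence of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def explained_variance_score(y_true, y_pred):
    """EVS = 1 - Var(y - y_hat) / Var(y)."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - variance(residuals) / variance(y_true)


# Hypothetical observed and simulated concentrations.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.0, 2.0, 3.0, 5.0]
evs = explained_variance_score(observed, simulated)
print(evs)
```

Note that under this definition a constant bias in the simulated values does not lower the score, because only the variance of the residuals enters the formula.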
9. The method for deriving the optimal characteristic scheme of the water quality model according to claim 1, wherein the SHAP method is used to calculate the SHAP value of each feature grid by grid, with the following calculation formula:
$$\phi_i = \sum_{S \subseteq \{1,\dots,F\} \setminus \{i\}} \frac{|S|!\,(F - |S| - 1)!}{F!} \left[ f_{S \cup \{i\}}(x_{S \cup \{i\}}) - f_S(x_S) \right]$$
wherein: $\phi_i$ represents the SHAP value of the $i$-th feature in the grid, i.e., its average marginal contribution; $F$ represents the total number of features; $S$ ranges over all possible feature subsets that exclude feature $i$; $\frac{|S|!\,(F - |S| - 1)!}{F!}$ represents the weight factor obtained from the permutations and combinations of subset $S$; $f_S(x_S)$ represents the expected output value given subset $S$; $f_{S \cup \{i\}}(x_{S \cup \{i\}})$ represents the expected output value given subset $S$ together with feature $i$.
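To make the weighted marginal contribution concrete, the sketch below computes exact Shapley values for one grid cell by enumerating every feature subset. It is a toy illustration, not the implementation used in the patent: the expected output for a subset is approximated by replacing the features outside the subset with baseline values (an assumption of this sketch), and the model, inputs and names are hypothetical. Enumeration is exponential in the number of features, so practical work on tree models normally relies on a dedicated SHAP library instead.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) by enumerating feature subsets.

    Features outside a subset S are set to their baseline value, which is
    this sketch's stand-in for the expected output given S.
    """
    n = len(x)

    def f(subset):
        masked = [x[k] if k in subset else baseline[k] for k in range(n)]
        return model(masked)

    phi = []
    for i in range(n):
        others = [k for k in range(n) if k != i]
        value = 0.0
        for size in range(n):
            # Weight |S|! (F - |S| - 1)! / F! for subsets of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                value += weight * (f(s | {i}) - f(s))
        phi.append(value)
    return phi


# Hypothetical model with one interaction term between features 0 and 2.
def toy_model(v):
    return 2 * v[0] + 3 * v[1] + v[0] * v[2]


phi = shapley_values(toy_model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
print(phi)
```

The values sum to the difference between the model output at `x` and at the baseline, and the interaction term is split evenly between the two features involved.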
10. The method for deriving the optimal characteristic scheme of the water quality model according to claim 1, wherein step 5 specifically comprises:
step 5.1: taking the absolute value of the SHAP value of each feature calculated in step 4, and averaging the absolute SHAP values over all grids to obtain the global importance of the feature;
step 5.2: identifying the key environmental features according to the importance of each feature, determining the order in which the environmental features are input into the machine learning algorithm according to that importance, and thereby formulating the optimal characteristic scheme for constructing the water quality model with the machine learning algorithm.
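Steps 5.1 and 5.2 amount to a mean of absolute SHAP values per feature followed by a sort. A minimal sketch, with hypothetical feature names and SHAP values, and assuming one row of SHAP values per grid cell:

```python
def global_importance(shap_by_grid, feature_names):
    """Rank features by the mean absolute SHAP value over all grid cells.

    shap_by_grid: one row per grid cell, one SHAP value per feature.
    Returns (feature, importance) pairs, most important first.
    """
    n_grids = len(shap_by_grid)
    means = [
        sum(abs(row[k]) for row in shap_by_grid) / n_grids
        for k in range(len(feature_names))
    ]
    return sorted(zip(feature_names, means), key=lambda item: item[1], reverse=True)


# Hypothetical SHAP values for three grid cells and three features.
names = ["precipitation", "population", "gdp"]
shap_rows = [
    [0.2, -0.5, 0.1],
    [-0.4, 0.3, 0.1],
    [0.0, -0.7, -0.1],
]
ranking = global_importance(shap_rows, names)
print([name for name, _ in ranking])
```

Taking absolute values first keeps positive and negative contributions in different grids from cancelling, and the resulting order is the feature input order referred to in step 5.2.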
CN202310668458.3A 2023-06-07 2023-06-07 Method for pushing optimal characteristic scheme of water quality model Pending CN116401962A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310668458.3A CN116401962A (en) 2023-06-07 2023-06-07 Method for pushing optimal characteristic scheme of water quality model

Publications (1)

Publication Number Publication Date
CN116401962A true CN116401962A (en) 2023-07-07

Family

ID=87009066

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310668458.3A Pending CN116401962A (en) 2023-06-07 2023-06-07 Method for pushing optimal characteristic scheme of water quality model

Country Status (1)

Country Link
CN (1) CN116401962A (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115757604A (en) * 2022-11-25 2023-03-07 河南理工大学 GDP space-time evolution analysis method based on noctilucent image data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LOUIS DE MESNARD: "Pollution models and inverse distance weighting: Some critical remarks", COMPUTERS&GEOSCIENCES, pages 459 - 469 *
SHANLIN TONG, ET AL: "A novel framework to improve the consistency of water quality attribution from natural and anthropogenic factors", JOURNAL OF ENVIRONMENTAL MANAGEMENT, pages 1 - 10 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116933982A (en) * 2023-09-15 2023-10-24 北京金水永利科技有限公司 Method and system for evaluating influence of rainfall on river water quality
CN116933982B (en) * 2023-09-15 2023-11-28 北京金水永利科技有限公司 Method and system for evaluating influence of rainfall on river water quality

Similar Documents

Publication Publication Date Title
CN112905560B (en) Air pollution prediction method based on multi-source time-space big data deep fusion
Wang et al. Spatial economic dependency in the Environmental Kuznets Curve of carbon dioxide: The case of China
Li et al. An extended cellular automaton using case‐based reasoning for simulating urban development in a large complex region
Bezak et al. Reconstruction of past rainfall erosivity and trend detection based on the REDES database and reanalysis rainfall
CN104764868B (en) A kind of soil organic matter Forecasting Methodology based on Geographical Weighted Regression
Chen et al. Stochastic generation of daily precipitation amounts: review and evaluation of different models
CN114254802B (en) Prediction method for vegetation coverage space-time change under climate change drive
CN109800921B (en) Regional winter wheat yield estimation method based on remote sensing phenological assimilation and particle swarm optimization
CN116401962A (en) Method for pushing optimal characteristic scheme of water quality model
CN117116382A (en) Water quality space-time prediction method and system for water-bearing lake under influence of diversion engineering
Verma et al. Comparative analysis of CMIP5 and CMIP6 in conjunction with the hydrological processes of reservoir catchment, Chhattisgarh, India
CN114357737B (en) Agent optimization calibration method for time-varying parameters of large-scale hydrologic model
CN115345069A (en) Lake water volume estimation method based on maximum water depth record and machine learning
Sweeney et al. Statistical challenges in estimating past climate changes
CN117078114B (en) Water quality evaluation method and system for water-bearing lakes under influence of diversion engineering
CN110716998A (en) Method for spatializing fine-scale population data
CN113901348A (en) Oncomelania snail distribution influence factor identification and prediction method based on mathematical model
Zhang et al. A weighted ensemble of regional climate projections for exploring the spatiotemporal evolution of multidimensional drought risks in a changing climate
Fuentes et al. Statistical assessment of numerical models
CN115510763A (en) Air pollutant concentration prediction method and system based on data-driven exploration
CN113610436A (en) Disaster-bearing body dynamic vulnerability assessment method and system
Lu et al. Auto station precipitation data making up using an improved neuro net
CN117993305B (en) Dynamic evaluation method for river basin land utilization and soil erosion relation
CN118072865B (en) Organic degradation material distribution model construction method and system
CN103955953A (en) Terrain collaborative variable selection method for digital soil cartography

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication; Application publication date: 20230707