CN117111092A - High-spatial-resolution remote sensing water quality detection method based on machine learning - Google Patents

Info

Publication number
CN117111092A
Authority
CN
China
Prior art keywords
water quality
water
index
remote sensing
red
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310885732.2A
Other languages
Chinese (zh)
Inventor
宋挺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Mengbole Information Technology Co ltd
Original Assignee
Suzhou Mengbole Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Mengbole Information Technology Co ltd
Priority to CN202310885732.2A
Publication of CN117111092A

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00 Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/88 Lidar systems specially adapted for specific applications
    • G01S17/89 Lidar systems specially adapted for specific applications for mapping or imaging
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/55 Specular reflectivity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Pathology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Immunology (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Electromagnetism (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Medical Informatics (AREA)
  • Remote Sensing (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

The invention relates to a high-spatial-resolution remote sensing water quality detection method based on machine learning. Water pixels free of cloud, algal bloom and aquatic plants are screened from satellite data; features such as the floating algae index, the water cleaning index and the FUI color index are introduced and combined with water quality monitoring data from automatic stations; and a random forest machine learning algorithm is used to perform remote sensing inversion of water quality parameters, including total phosphorus, total nitrogen, permanganate index, turbidity, conductivity and blue algae (cyanobacteria) density, for water bodies such as lakes and rivers in a study area. The results are presented as charts of single-day remote sensing water quality monitoring, long-time-series statistics and month-by-month monitoring, so that remote sensing water quality monitoring can be analyzed rapidly, accurately and automatically. The invention achieves long-term, effective remote sensing monitoring of multiple water quality parameters, has clear advantages for inland water bodies, especially small and medium-sized lakes and rivers, and supports real-time, rapid remote sensing water quality monitoring and the protection of water bodies at larger scales.

Description

High-spatial-resolution remote sensing water quality detection method based on machine learning
Technical Field
The invention belongs to the technical field of water detection, and particularly relates to a high-spatial-resolution remote sensing water quality detection method based on machine learning.
Background
Water crises rank fifth among the top ten global risks and mainly comprise water pollution and water scarcity. Water pollution is a complex problem that severely constrains economic development and economic policy-making and threatens public health. Water quality monitoring is an important basis for water quality assessment and water pollution control. As water pollution grows more serious, the need for dynamic monitoring and assessment of water pollution becomes more urgent, and accurate, rapid and low-cost water quality monitoring methods are particularly important.
Traditional water quality monitoring obtains accurate in-situ water quality parameters from sampling points, but the analysis process is complex, and the timeliness and frequency of the data cannot meet management and decision-making needs; in particular, it offers very limited support for sudden, large-scale water environment events and for tracing the sources of water pollution. Automatic water quality monitoring can observe a target water body in real time and in all weather, and effectively reflects the current state of water quality and its changes, but the stations are expensive to build, so the cost is extremely high for large water areas that require many monitoring points.
Satellite remote sensing is large-scale, high-frequency and low-cost, which makes it well suited to water quality monitoring. Compared with ground monitoring it is more practical and economical, it can effectively make up for the shortcomings of traditional and automatic monitoring, and its results can easily be integrated into geographic information systems.
Research on remote sensing inversion of water quality parameters mainly covers water-color parameters such as suspended matter, yellow substance and chlorophyll a, and non-water-color parameters such as total phosphorus, total nitrogen, conductivity and permanganate index. The key to remote sensing water quality inversion is analyzing the relationship between the water-leaving radiance received by the satellite sensor and the concentration of the water quality parameter. Remote-sensing-based inversion models generally fall into four types: bio-optical models, empirical models, semi-empirical/semi-analytical models and machine learning models. Since water quality remote sensing inversion was first applied, most inversion studies have targeted water-color parameters that have an optical mechanism and a clear spectral response. Non-water-color parameters such as nitrogen and phosphorus have no distinct spectral features, so conventional remote sensing inversion methods achieve limited accuracy for them and are difficult to generalize.
In recent years, machine learning algorithms have proven to have powerful feature recognition and learning capabilities, and complex networks and structures can be used to capture rich features of input data and obtain explicit relationships to output variables. Therefore, the method can effectively capture the spectral characteristics of different water bodies, comprehensively analyze the potential relation between the spectral characteristics and the water quality parameters, and provide good technical support for large-scale and long-term water quality parameter inversion of satellites.
In addition, traditional remote sensing approaches suffer from large data download volumes, long processing times and high storage requirements, which have hindered wider adoption of long-time-series remote sensing monitoring of water quality parameters. There is currently an urgent need for long-term, large-scale remote sensing water quality monitoring, and achieving it for inland water bodies, especially small water bodies, is a technical problem that remains to be solved.
Disclosure of Invention
The invention aims to provide a high-spatial-resolution remote sensing water quality detection method based on machine learning that performs rapid remote sensing inversion and monitoring of water quality parameters, including total phosphorus, total nitrogen, permanganate index, turbidity, conductivity and blue algae density, for lakes, rivers and similar water bodies, so that high-spatial-resolution remote sensing water quality inversion and monitoring can be analyzed rapidly and automatically. The method addresses the poor universality, large data processing volume and long processing cycle of remote sensing water quality inversion in the prior art, is particularly suited to rapid water quality monitoring over large areas and long time series, and is better suited to remote sensing water quality detection of water bodies with smaller areas.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
A high-spatial-resolution remote sensing water quality detection method based on machine learning comprises the following steps:
S1, data acquisition: obtaining, from the Sentinel-2 satellite, bottom-of-atmosphere reflectance data of the region to be detected that have undergone radiometric calibration and atmospheric correction;
S2, data processing: performing cloud pixel processing and bloom and aquatic plant pixel processing on the data from S1 and extracting water pixels; introducing, based on the reflectance of each band, the modified normalized difference water index, the normalized difference vegetation index, the floating algae index, the black and odorous water difference index, the improved black and odorous ratio index, the water cleaning index, the FUI color index, and the Index1 and Index2 band combinations as characteristic variables for model training; and matching them in time and space with ground monitoring points to form an effective data sample set;
S3, data training and model construction: dividing the sample set of each water quality parameter into a training set, a verification set and a test set in a 7:2:1 ratio, performing random forest training and modeling, tuning parameters with program running efficiency and test-set accuracy as the guiding criteria, and finally determining the feature set and model parameters for each water quality parameter.
Preferably, the cloud pixel processing in S2 further includes the following step:
S21, using the single-band threshold $R_{rs}(664\,\mathrm{nm}) > 0.2$ as a cloud mask to filter out cloud interference.
Preferably, the bloom and aquatic plant pixel processing in S2 includes the following step:
S22, filtering out bloom and aquatic plant interference in the region to be detected based on the normalized difference vegetation index (NDVI) with a threshold of 0; the index also participates in machine learning model construction as a characteristic variable. The NDVI is calculated as:
$$\mathrm{NDVI} = \frac{R_{rs}(\mathrm{nir}) - R_{rs}(\mathrm{red})}{R_{rs}(\mathrm{nir}) + R_{rs}(\mathrm{red})} \tag{1}$$
where $R_{rs}(\mathrm{red})$ and $R_{rs}(\mathrm{nir})$ are the reflectances of the red and near-infrared bands, corresponding to B4 (665 nm) and B8 (833 nm) of Sentinel-2, respectively.
Preferably, extracting the water pixels in S2 includes the following step:
S23, extracting the water boundary of the region to be detected as a vector based on the modified normalized difference water index (MNDWI) with a threshold of 0; the index also participates in machine learning model construction as a characteristic variable. The MNDWI is calculated as:
$$\mathrm{MNDWI} = \frac{R_{rs}(\mathrm{green}) - R_{rs}(\mathrm{swir})}{R_{rs}(\mathrm{green}) + R_{rs}(\mathrm{swir})} \tag{2}$$
where $R_{rs}(\mathrm{green})$ and $R_{rs}(\mathrm{swir})$ are the reflectances of the green and short-wave infrared bands, corresponding to B3 (560 nm) and B11 (1613 nm) of Sentinel-2, respectively.
Preferably, the processing of the floating algae index characteristic variable introduced in S2 includes the following step:
S24, using the floating algae index (FAI) as one of the input features for model training, calculated as:
$$\mathrm{FAI} = R_{rs}(\mathrm{nir}) - R'_{rs}(\mathrm{nir})$$
where $R'_{rs}(\mathrm{nir})$ is the baseline reflectance at the near-infrared band derived from the red and short-wave infrared bands, and $R_{rs}(\mathrm{red})$, $R_{rs}(\mathrm{nir})$ and $R_{rs}(\mathrm{swir})$ are the reflectances of the red, near-infrared and short-wave infrared bands, corresponding to B4 (664 nm), B8 (833 nm) and B11 (1613 nm) of Sentinel-2, respectively.
Preferably, the processing of the black and odorous water difference index, water cleaning index and improved black and odorous ratio index characteristic variables introduced in S2 includes the following step:
S25, the black and odorous water difference index, the water cleaning index and the improved black and odorous ratio index are indexes established for identifying abnormal water color, denoted DBWI, WCI and BOI respectively; the calculation formulas corresponding to Sentinel-2 are:
$$\mathrm{DBWI} = \frac{R_{rs}(\mathrm{green}) - R_{rs}(\mathrm{red})}{R_{rs}(\mathrm{green}) + R_{rs}(\mathrm{red})}$$
$$\mathrm{BOI} = \frac{R_{rs}(\mathrm{green}) - R_{rs}(\mathrm{red})}{R_{rs}(\mathrm{blue}) + R_{rs}(\mathrm{green}) + R_{rs}(\mathrm{red})} \tag{4}$$
where $R_{rs}(\mathrm{blue})$, $R_{rs}(\mathrm{green})$ and $R_{rs}(\mathrm{red})$ are the reflectances of the blue, green and red bands, corresponding to B2 (492.1 nm), B3 (559 nm) and B4 (665 nm) of Sentinel-2, respectively.
Preferably, the processing of the Index1 and Index2 characteristic variables introduced in S2 includes the following step:
S26, introducing two custom indexes, Index1 and Index2, as input features for model training, calculated as:
$$\mathrm{Index1} = \frac{R_{rs}(\mathrm{green})}{R_{rs}(\mathrm{blue})}$$
$$\mathrm{Index2} = \frac{R_{rs}(\mathrm{green})}{R_{rs}(\mathrm{red})} \tag{5}$$
where $R_{rs}(\mathrm{blue})$, $R_{rs}(\mathrm{green})$ and $R_{rs}(\mathrm{red})$ are the reflectances of the blue, green and red bands, corresponding to B2 (492.1 nm), B3 (559 nm) and B4 (665 nm) of Sentinel-2, respectively.
Preferably, the processing of the FUI color index characteristic variable introduced in S2 includes the following step:
S27, using the CIE 1931 XYZ color space system, the B11 (1613 nm), B8 (833 nm) and B2 (442 nm) bands of the Sentinel-2 MSI are assigned to the B, G and R channels, respectively, and the CIE chromaticity coordinates (x, y) are then calculated by normalizing X, Y and Z to between 0 and 1, where the conversion from RGB to X, Y, Z can be expressed as:
$$\begin{aligned} X &= 2.7689\,R + 1.7517\,G + 1.1302\,B \\ Y &= 1.0000\,R + 4.5907\,G + 0.0601\,B \\ Z &= 0.0000\,R + 0.0565\,G + 5.5934\,B \end{aligned} \tag{6}$$
The CIE chromaticity coordinates x, y, z are the normalized tristimulus values calculated from equation (6):
$$x = \frac{X}{X+Y+Z}, \qquad y = \frac{Y}{X+Y+Z}, \qquad z = \frac{Z}{X+Y+Z}$$
Finally, the hue angle α of any pair of coordinates (x', y') of the upwelling radiance spectrum is calculated:
$$\alpha = \mathrm{ARCTAN2}(x', y') \times \frac{180}{\pi} + 180$$
where ARCTAN2 is the four-quadrant arctangent function, which gives α a range of -180° to 180°; adding 180° converts the range to 0° to 360°. An increase in the hue angle α indicates a color change from blue to red.
Preferably, the time and space matching between the remote sensing features in S2 and the ground monitoring points includes the following step:
S28, screening the data to obtain a high-quality satellite-ground synchronous training data set: for each automatic station, the monitoring record at or closest to the Beijing time of the Sentinel-2 overpass is selected, and, with the same date as the screening condition, the time-matched remote sensing feature set is combined with the ground monitoring data to form the training data set.
Preferably, the data training and model construction in S3 includes the following step:
S31, training and constructing the model for each water quality parameter using the random forest algorithm in machine learning: randomly sampled data are input into decision trees, and the final output is obtained by voting. The number of decision trees is the number of trees used to build the forest, that is, the number of bootstrap resamplings. The sample set of each water quality parameter is divided into a training set, a verification set and a test set in a 7:2:1 ratio, and the characteristic variables and the number of decision trees for each water quality parameter are finally determined through testing, screening and parameter tuning, weighing program running efficiency against test-set inversion accuracy, wherein the number of decision trees is shown in the following table:
The invention has at least the following beneficial effects: it achieves rapid, accurate and automatic analysis for high-spatial-resolution remote sensing water quality detection and long-term, effective detection of large-scale remote sensing water quality; it has clear advantages for inland water bodies, especially small and medium-sized lakes and rivers; and it supports real-time, rapid monitoring of large-scale remote sensing water quality and the protection of water bodies.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 shows the spatial distribution of the ground monitoring points used for machine learning training in the present invention.
FIG. 2 illustrates the intermediate parameters of the FUI color index in the present invention.
FIG. 3 shows the accuracy evaluation results for the test samples of each water quality parameter in the present invention.
FIG. 4 shows an example single-day monitoring result for total phosphorus in the present invention.
FIG. 5 shows an example single-day monitoring result for total nitrogen in the present invention.
FIG. 6 shows an example single-day monitoring result for permanganate index in the present invention.
Detailed Description
The invention provides a high-spatial-resolution remote sensing water quality detection method based on machine learning. The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Machine learning algorithms have proven to have powerful feature recognition and learning capabilities, and can use complex networks and structures to capture rich features of input data and obtain explicit relationships to output variables. Therefore, the method can effectively capture the spectral characteristics of different water bodies, comprehensively analyze the potential relation between the spectral characteristics and the water quality parameters, and provide good technical support for large-scale and long-term water quality parameter inversion of satellites.
In environmental monitoring scenarios that require large-scale processing of satellite remote sensing data, thematic cloud platforms perform better: data sets can be called online, which avoids the traditional data download and preprocessing work and is particularly suitable for long-time-series, large-scale remote sensing applications. Therefore, in view of the urgent need for long-term, large-scale remote sensing water quality monitoring and the latest developments in cloud computing, the invention combines water quality monitoring data from automatic stations with a random forest machine learning algorithm to carry out long-time-series monitoring and analysis of multiple water quality parameters for water bodies such as lakes and rivers in a study area, and provides a detection method for large-scale, high-spatial-resolution remote sensing water quality monitoring of inland water bodies, especially small water bodies.
Sentinel-2 is a high-resolution multispectral imaging satellite that carries a multispectral instrument (MSI) covering 13 spectral bands. The MSI payload offers high spatial resolution (10 m), a short revisit period (5 days or better) and rich spectral coverage (13 bands from the visible and near infrared to the short-wave infrared), and provides a good data source for large-scale, high-spatial-resolution remote sensing water quality monitoring.
The high-spatial-resolution remote sensing water quality detection method based on machine learning is described below with reference to FIGS. 1-6 and comprises the following steps:
S1, data acquisition: obtaining, via satellite, bottom-of-atmosphere reflectance data of the region to be detected that have undergone radiometric calibration and atmospheric correction. The MultiSpectral Instrument (MSI) Level-2A product is used; the L2A data product provides radiometrically calibrated and atmospherically corrected bottom-of-atmosphere reflectance data (Bottom-of-Atmosphere corrected reflectance) with a spatial resolution of 10 m and a revisit period of up to 5 days.
S2, data processing: performing cloud pixel processing, bloom and aquatic plant pixel processing and water pixel extraction on the data from S1; introducing multiple characteristic variables based on the reflectance of each band for model training; and matching them in time and space with ground monitoring points to form an effective data sample set.
The data processing comprises the following steps:
S21, using the single-band threshold $R_{rs}(664\,\mathrm{nm}) > 0.2$ as a cloud mask.
S22, extracting the water boundary of the region to be detected as a vector based on the modified normalized difference water index (MNDWI) with a threshold of 0; the index also participates in machine learning model construction as a characteristic variable. The MNDWI is calculated as:
$$\mathrm{MNDWI} = \frac{R_{rs}(\mathrm{green}) - R_{rs}(\mathrm{swir})}{R_{rs}(\mathrm{green}) + R_{rs}(\mathrm{swir})} \tag{1}$$
where $R_{rs}(\mathrm{green})$ and $R_{rs}(\mathrm{swir})$ are the reflectances of the green and short-wave infrared bands, corresponding to B3 (560 nm) and B11 (1613 nm) of Sentinel-2, respectively.
S23, filtering out bloom and aquatic plant interference in the region to be detected based on the normalized difference vegetation index (NDVI) with a threshold of 0; the index also participates in machine learning model construction as a characteristic variable. The NDVI is calculated as:
$$\mathrm{NDVI} = \frac{R_{rs}(\mathrm{nir}) - R_{rs}(\mathrm{red})}{R_{rs}(\mathrm{nir}) + R_{rs}(\mathrm{red})} \tag{2}$$
where $R_{rs}(\mathrm{red})$ and $R_{rs}(\mathrm{nir})$ are the reflectances of the red and near-infrared bands, corresponding to B4 (665 nm) and B8 (833 nm) of Sentinel-2, respectively.
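The screening in S21 to S23 can be sketched per pixel as below. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and the threshold directions (cloud if the red reflectance exceeds 0.2, water if MNDWI > 0, bloom or aquatic plants if NDVI > 0) are assumptions, since the text gives only the threshold values.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or None when the denominator is zero."""
    total = a + b
    return None if total == 0 else (a - b) / total


def is_clear_water_pixel(red, green, nir, swir):
    """Apply the S21-S23 screens to one pixel of reflectance values.

    Assumed threshold directions (the text only states the values):
      - cloud: red-band reflectance (about 664 nm) > 0.2 is masked out
      - water: MNDWI > 0 is kept
      - bloom / aquatic plants: NDVI > 0 is masked out
    """
    if red > 0.2:                                  # S21 cloud mask
        return False
    mndwi = normalized_difference(green, swir)     # water extraction index
    ndvi = normalized_difference(nir, red)         # bloom / plant filter index
    if mndwi is None or ndvi is None:
        return False
    return mndwi > 0 and ndvi <= 0


# Hypothetical reflectance values for illustration only.
print(is_clear_water_pixel(red=0.02, green=0.04, nir=0.01, swir=0.005))
```

In practice the same logic would be applied to whole image arrays rather than single pixels; the scalar form is used here only to keep the sketch dependency-free.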
S24, using the floating algae index (FAI) as one of the input features for model training, calculated as:
$$\mathrm{FAI} = R_{rs}(\mathrm{nir}) - R'_{rs}(\mathrm{nir})$$
where $R'_{rs}(\mathrm{nir})$ is the baseline reflectance at the near-infrared band derived from the red and short-wave infrared bands, and $R_{rs}(\mathrm{red})$, $R_{rs}(\mathrm{nir})$ and $R_{rs}(\mathrm{swir})$ are the reflectances of the red, near-infrared and short-wave infrared bands, corresponding to B4 (664 nm), B8 (833 nm) and B11 (1613 nm) of Sentinel-2, respectively.
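The baseline term of the FAI is not spelled out in the text. The sketch below assumes the commonly used definition, in which the baseline is a linear interpolation between the red and short-wave infrared reflectances evaluated at the near-infrared wavelength; that interpolation, the constant names and the function name are assumptions for illustration.

```python
# Central wavelengths (nm) as quoted in the text for B4, B8 and B11.
RED_NM, NIR_NM, SWIR_NM = 664.0, 833.0, 1613.0


def floating_algae_index(red, nir, swir):
    """FAI = R(nir) - R'(nir).

    ASSUMPTION: R'(nir) is the red-to-SWIR linear baseline evaluated at the
    NIR wavelength; the source text does not state the baseline formula.
    """
    baseline = red + (swir - red) * (NIR_NM - RED_NM) / (SWIR_NM - RED_NM)
    return nir - baseline


# Hypothetical reflectances: a strong NIR peak above the baseline gives FAI > 0.
print(floating_algae_index(red=0.02, nir=0.10, swir=0.01))
```

Under this assumption, a spectrum that lies exactly on the red-to-SWIR line gives an FAI of zero, and floating algae, which raise the near-infrared reflectance, give positive values.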
S25, processing the introduced DBWI, WCI and BOI characteristic variables comprises the following steps:
DBWI, WCI and BOI are all indexes established for identifying abnormal water color, and the calculation formulas corresponding to Sentinel-2 are:
$$\mathrm{DBWI} = \frac{R_{rs}(\mathrm{green}) - R_{rs}(\mathrm{red})}{R_{rs}(\mathrm{green}) + R_{rs}(\mathrm{red})}$$
$$\mathrm{BOI} = \frac{R_{rs}(\mathrm{green}) - R_{rs}(\mathrm{red})}{R_{rs}(\mathrm{blue}) + R_{rs}(\mathrm{green}) + R_{rs}(\mathrm{red})} \tag{4}$$
where $R_{rs}(\mathrm{blue})$, $R_{rs}(\mathrm{green})$ and $R_{rs}(\mathrm{red})$ are the reflectances of the blue, green and red bands, corresponding to B2 (492.1 nm), B3 (559 nm) and B4 (665 nm) of Sentinel-2, respectively.
S26, introducing two custom indexes, Index1 and Index2, as input features for model training, calculated as:
$$\mathrm{Index1} = \frac{R_{rs}(\mathrm{green})}{R_{rs}(\mathrm{blue})}$$
$$\mathrm{Index2} = \frac{R_{rs}(\mathrm{green})}{R_{rs}(\mathrm{red})} \tag{5}$$
where $R_{rs}(\mathrm{blue})$, $R_{rs}(\mathrm{green})$ and $R_{rs}(\mathrm{red})$ are the reflectances of the blue, green and red bands, corresponding to B2 (492.1 nm), B3 (559 nm) and B4 (665 nm) of Sentinel-2, respectively.
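The visible-band features of S25 and S26 can be computed together from the blue, green and red reflectances, as in the sketch below. The function name and dictionary layout are hypothetical, the inputs are assumed to be strictly positive, and WCI is left out because its formula is not given in the text.

```python
def visible_band_features(blue, green, red):
    """Return DBWI, BOI, Index1 and Index2 for one pixel.

    Assumes strictly positive reflectances so that no denominator is zero.
    """
    return {
        "DBWI": (green - red) / (green + red),
        "BOI": (green - red) / (blue + green + red),
        "Index1": green / blue,
        "Index2": green / red,
    }


# Hypothetical reflectances for illustration only.
features = visible_band_features(blue=0.02, green=0.04, red=0.02)
print(features)
```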
S27, processing the introduced FUI color index characteristic variable comprises the following steps:
Using the CIE 1931 XYZ color space system, the B11 (1613 nm), B8 (833 nm) and B2 (442 nm) bands of the Sentinel-2 MSI are assigned to the B, G and R channels, respectively, and the CIE chromaticity coordinates (x, y) are then calculated by normalizing X, Y and Z to between 0 and 1, where the conversion from RGB to X, Y, Z can be expressed as:
$$\begin{aligned} X &= 2.7689\,R + 1.7517\,G + 1.1302\,B \\ Y &= 1.0000\,R + 4.5907\,G + 0.0601\,B \\ Z &= 0.0000\,R + 0.0565\,G + 5.5934\,B \end{aligned} \tag{6}$$
The CIE chromaticity coordinates x, y, z are the normalized tristimulus values calculated from equation (6):
$$x = \frac{X}{X+Y+Z}, \qquad y = \frac{Y}{X+Y+Z}, \qquad z = \frac{Z}{X+Y+Z}$$
Finally, the hue angle α of any pair of coordinates (x', y') of the upwelling radiance spectrum is calculated:
$$\alpha = \mathrm{ARCTAN2}(x', y') \times \frac{180}{\pi} + 180$$
where ARCTAN2 is the four-quadrant arctangent function, which gives α a range of -180° to 180°; adding 180° converts the range to 0° to 360°. An increase in the hue angle α indicates a color change from blue to red.
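The chain from channel values to hue angle in S27 can be sketched as follows, using the coefficients of equation (6). Several points are assumptions rather than statements of the method: (x', y') is taken to be the chromaticity offset from the equal-energy white point (1/3, 1/3), ARCTAN2(x', y') is mapped to `math.atan2(x', y')` in that argument order, and the function names are hypothetical.

```python
import math


def chromaticity(r, g, b):
    """Convert channel values to CIE chromaticity (x, y, z) via equation (6)."""
    X = 2.7689 * r + 1.7517 * g + 1.1302 * b
    Y = 1.0000 * r + 4.5907 * g + 0.0601 * b
    Z = 0.0000 * r + 0.0565 * g + 5.5934 * b
    total = X + Y + Z
    return X / total, Y / total, Z / total


def hue_angle(x, y, white=1 / 3):
    """Hue angle in degrees.

    ASSUMPTIONS: (x', y') is the offset from the white point (1/3, 1/3), and
    the ARCTAN2 argument order follows the text literally; the missing
    equation in the source may use a different convention.
    """
    x_off, y_off = x - white, y - white
    # atan2 returns a value in (-180, 180] degrees; adding 180 shifts it to (0, 360].
    return math.degrees(math.atan2(x_off, y_off)) + 180.0


# Hypothetical channel values for illustration only.
x, y, z = chromaticity(0.03, 0.05, 0.02)
print(round(x + y + z, 6), hue_angle(x, y))
```

Equal channel values map to a chromaticity very close to (1/3, 1/3), which is why the white point is a natural origin for the angle under this assumption.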
S28, the data are screened to obtain a high-quality satellite-ground synchronous training data set. Because the Sentinel-2 overpass occurs at about 10:30 Beijing time, in this embodiment each automatic station selects the monitoring record at or closest to that time, and, with the same date as the screening condition, the time-matched remote sensing feature set is combined with the ground monitoring data to form the training data set.
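The satellite-ground time matching in S28 can be sketched as a nearest-record search restricted to the overpass date. The function name, the record layout and the `max_gap` tolerance are hypothetical; the text says only that the record at or near the overpass time on the same date is used.

```python
from datetime import datetime, timedelta


def match_station_record(records, overpass, max_gap=timedelta(hours=1)):
    """Pick the station record closest in time to the satellite overpass.

    records:  list of (timestamp, value) tuples in Beijing time
    overpass: datetime of the overpass in Beijing time
    max_gap:  ASSUMED tolerance; the source does not state one
    Returns the best (timestamp, value) tuple, or None if nothing qualifies.
    """
    same_day = [rec for rec in records if rec[0].date() == overpass.date()]
    if not same_day:
        return None
    best = min(same_day, key=lambda rec: abs(rec[0] - overpass))
    return best if abs(best[0] - overpass) <= max_gap else None


# Hypothetical station readings for illustration only.
records = [
    (datetime(2023, 5, 1, 8, 0), 0.071),
    (datetime(2023, 5, 1, 10, 0), 0.068),
    (datetime(2023, 5, 1, 12, 0), 0.074),
]
print(match_station_record(records, datetime(2023, 5, 1, 10, 30)))
```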
S3, data training and model construction: the sample set of each water quality parameter is divided into a training set, a verification set and a test set in a 7:2:1 ratio, random forest training and modeling are performed, parameters are tuned with program running efficiency and test-set accuracy as the guiding criteria, and the feature set and model parameters for each water quality parameter are finally determined.
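A 7:2:1 division of a sample set can be sketched as below. Shuffling before the split, the fixed seed and the function name are assumptions; the text states only the ratio.

```python
import random


def split_7_2_1(samples, seed=0):
    """Shuffle and split samples into training, verification and test sets.

    Integer arithmetic is used for the cut points to avoid floating-point
    rounding; any remainder goes to the test set.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # ASSUMED: random shuffle before splitting
    n = len(shuffled)
    n_train = n * 7 // 10
    n_val = n * 2 // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


train, val, test = split_7_2_1(range(100))
print(len(train), len(val), len(test))
```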
Specifically, the sample training and model construction includes the following steps:
The models for the water quality parameters are trained and constructed using the random forest algorithm in machine learning. Random forest regression is an ensemble learning method: randomly sampled data are input into multiple weak learners (decision trees), and the final output is obtained by voting (for regression, by averaging the trees' predictions). Random forest regression learns quickly, handles large data sets efficiently and is robust to noise in the data. Two factors are important in random forest modeling: the choice of characteristic variables and the number of decision trees, where the number of decision trees is the number of trees used to build the forest, that is, the number of bootstrap resamplings.
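The two ingredients named above, bootstrap resampling and aggregation over many trees, can be illustrated with the toy sketch below. It is deliberately not a full random forest: each "tree" is a single-split stump on one randomly chosen feature, and all names and defaults are hypothetical. A production model would normally rely on an established library implementation.

```python
import random
from statistics import mean


def fit_stump(X, y, feature):
    """Fit the best single-threshold split on one feature (least squared error)."""
    best = None
    values = sorted(set(row[feature] for row in X))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [yi for row, yi in zip(X, y) if row[feature] <= t]
        right = [yi for row, yi in zip(X, y) if row[feature] > t]
        m_left, m_right = mean(left), mean(right)
        sse = sum((v - m_left) ** 2 for v in left) + sum((v - m_right) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, m_left, m_right)
    if best is None:                        # feature is constant in this resample
        m = mean(y)
        return (feature, float("inf"), m, m)
    return (feature, best[1], best[2], best[3])


def fit_forest(X, y, n_trees=50, seed=0):
    """Train n_trees stumps, each on its own bootstrap resample of the data."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # sample n rows with replacement
        feature = rng.randrange(n_features)          # random feature for this stump
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def predict(forest, row):
    """Average the stump predictions (the regression analogue of voting)."""
    return mean(m_left if row[f] <= t else m_right for f, t, m_left, m_right in forest)


# Hypothetical one-feature data: low targets for small x, high targets for large x.
X = [[float(i)] for i in range(20)]
y = [0.0] * 10 + [10.0] * 10
forest = fit_forest(X, y, n_trees=25, seed=1)
print(predict(forest, [2.0]), predict(forest, [17.0]))
```

Here `n_trees` plays the role of the number of decision trees discussed in the text: one bootstrap resample is drawn per tree.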
The sample set of each water quality parameter is divided into a training set, a verification set and a test set in a 7:2:1 ratio, and the characteristic variables and the number of decision trees for each water quality parameter are finally determined through repeated testing, screening and parameter tuning, weighing program running efficiency against test-set inversion accuracy, wherein the number of decision trees is shown in the following table:
To verify the accuracy of the method, water quality detection was carried out with it; the spatial distribution of the ground monitoring points is shown in FIG. 1.
The relevant data comparison follows.
As shown in FIGS. 3-6, after random forest model training was completed for each water quality parameter, the accuracy of each model was evaluated on the test set. The random forest algorithm performs well in inverting the water quality parameters. In the test results, $R^2$ is highest for turbidity (0.6036), followed by total phosphorus (0.5188), then total nitrogen, permanganate index and conductivity (0.3404, 0.2706 and 0.2699, respectively), and is lowest for blue algae density (0.1994). MAPE is lowest for conductivity (14.11%), followed in order by permanganate index, total nitrogen, total phosphorus and turbidity (27.31%, 31.20%, 45.34% and 59.32%, respectively), and is highest for blue algae density (118.68%). In terms of RMSE/mean, conductivity is lowest (20.84%), followed in order by permanganate index, total nitrogen, total phosphorus and turbidity (28.54%, 35.15%, 35.97% and 55.57%, respectively), and blue algae density is highest (79.67%). The ground data for blue algae (cyanobacteria) density are mainly provided by YSI 6600 multiparameter water quality instruments at buoy stations; the phycocyanin fluorescence method they use is less stable than the detection methods for the other parameters, and phytoplankton migrate horizontally and vertically frequently over short periods, so the test results for blue algae density are relatively poor, with overestimation in the low-value range and higher dispersion. Overall, the random forest algorithm inverts the water quality parameters well, responds well in both low-value and high-value ranges, and therefore has value for operational application.
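The three accuracy measures quoted above can be computed as in the sketch below. The definitions used are the common ones and are an assumption here: $R^2$ is taken as the coefficient of determination (the source may instead use a squared correlation), MAPE as the mean absolute percentage error, and RMSE/mean as the root-mean-square error divided by the mean observation. Function names are hypothetical.

```python
import math


def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def mape(obs, pred):
    """Mean absolute percentage error, in percent (observations must be nonzero)."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)


def rmse_over_mean(obs, pred):
    """Root-mean-square error divided by the mean observation, in percent."""
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))
    return 100.0 * rmse / (sum(obs) / len(obs))


# Hypothetical observed and predicted values for illustration only.
obs, pred = [2.0, 4.0], [1.0, 5.0]
print(r_squared(obs, pred), mape(obs, pred), rmse_over_mean(obs, pred))
```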
The foregoing shows and describes the basic principles, principal features and advantages of the invention. Those skilled in the art will understand that the invention is not limited to the embodiments above, which, together with the description, merely illustrate its principles; various changes and modifications may be made without departing from the spirit and scope of the invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (10)

1. A high-spatial-resolution remote sensing water quality detection method based on machine learning, characterized by comprising the following steps:
S1, data acquisition: obtaining, from the Sentinel-2 satellite, bottom-of-atmosphere reflectance data of the region to be detected that have undergone radiometric calibration and atmospheric correction;
S2, data processing: processing the cloud pixels and the bloom and aquatic plant pixels in the data of S1 and extracting the water pixels; based on the reflectance of each band, introducing the modified normalized difference water index, the normalized difference vegetation index, the floating algae index, the black and odorous water body difference index, the improved black and odorous ratio index, the water cleaning index, the FUI color index, and the Index1 and Index2 band combinations as feature variables for model training; and matching them in time and space with the ground monitoring points to form a valid data sample set;
S3, data training and model construction: dividing the sample set of each water quality parameter into a training set, a validation set and a test set in the ratio 7:2:1, carrying out random forest training and modeling, tuning the parameters with program running efficiency and test set accuracy as the criteria, and finally determining the feature set and model parameters of each water quality parameter.
2. The machine learning-based high spatial resolution remote sensing water quality detection method according to claim 1, wherein the cloud pixel processing in S2 further comprises the following step:
S21, using the single-band threshold $R_{rs}(664\,\mathrm{nm}) > 0.2$ as a cloud mask to filter out cloud interference.
3. The machine learning-based high spatial resolution remote sensing water quality detection method according to claim 2, wherein the bloom and aquatic plant pixel processing in S2 comprises the following step:
S22, filtering out bloom and aquatic plant interference in the region to be detected based on the normalized difference vegetation index, with the threshold set to 0; the index also participates in building the machine learning model as a feature variable, and its formula is:
$$\mathrm{NDVI} = \frac{R_{rs}(\mathrm{nir}) - R_{rs}(\mathrm{red})}{R_{rs}(\mathrm{nir}) + R_{rs}(\mathrm{red})} \tag{1}$$
where $R_{rs}(\mathrm{red})$ and $R_{rs}(\mathrm{nir})$ are the reflectances of the red and near-infrared bands, corresponding to B4 (665 nm) and B8 (833 nm) of Sentinel-2, respectively.
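Equation (1) can be sketched with a small reusable helper. The helper name `normalized_difference`, the sample reflectances, and the reading that NDVI greater than 0 marks bloom or aquatic plants are assumptions; the claim only states that the threshold is 0.

```python
import numpy as np

def normalized_difference(a, b):
    """Compute (a - b) / (a + b), returning NaN where the denominator is zero."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

# Hypothetical reflectances for three pixels (B8 = nir, B4 = red).
nir = np.array([0.02, 0.30, 0.05])
red = np.array([0.04, 0.10, 0.05])
ndvi = normalized_difference(nir, red)

# ASSUMPTION: pixels above the 0 threshold are treated as bloom or aquatic plants.
bloom_or_plant = ndvi > 0
```

The same helper applies to any other index of the form (a - b) / (a + b); the NDVI array itself can also be kept as a feature column, as S22 describes.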
4. The machine learning-based high spatial resolution remote sensing water quality detection method according to claim 3, wherein the extraction of water pixels in S2 comprises the following step:
S23, extracting the water boundary of the region to be detected as a vector based on the modified normalized difference water index, with the threshold set to 0; the index also participates in building the machine learning model as a feature variable, and its formula is:
$$\mathrm{MNDWI} = \frac{R_{rs}(\mathrm{green}) - R_{rs}(\mathrm{swir})}{R_{rs}(\mathrm{green}) + R_{rs}(\mathrm{swir})} \tag{2}$$
where $R_{rs}(\mathrm{green})$ and $R_{rs}(\mathrm{swir})$ are the reflectances of the green and short-wave infrared bands, corresponding to B3 (560 nm) and B11 (1613 nm) of Sentinel-2, respectively.
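Steps S21 to S23 together decide which pixels are kept as clear open water. The sketch below combines the three thresholds into one boolean mask. The function name `water_pixel_mask`, the sample pixels, and the direction of the NDVI and MNDWI comparisons (greater than 0 meaning vegetation and water, respectively) are assumptions, since the claims state only the threshold values.

```python
import numpy as np

def water_pixel_mask(green, red, nir, swir):
    """Boolean mask of pixels kept as clear open water."""
    green, red, nir, swir = (np.asarray(b, dtype=float) for b in (green, red, nir, swir))
    with np.errstate(divide="ignore", invalid="ignore"):
        ndvi = (nir - red) / (nir + red)
        mndwi = (green - swir) / (green + swir)
    cloud = red > 0.2            # S21: single-band cloud threshold on the red band
    bloom_or_plant = ndvi > 0    # S22: ASSUMED direction of the 0 threshold
    water = mndwi > 0            # S23: ASSUMED direction of the 0 threshold
    return water & ~cloud & ~bloom_or_plant

# Four hypothetical pixels: clear water, cloud, algal bloom, land.
green = np.array([0.06, 0.50, 0.08, 0.10])
red = np.array([0.04, 0.50, 0.05, 0.12])
nir = np.array([0.02, 0.45, 0.25, 0.11])
swir = np.array([0.01, 0.30, 0.02, 0.25])
mask = water_pixel_mask(green, red, nir, swir)
```

Only the first pixel passes all three tests; the others are rejected by the cloud, vegetation and water tests in turn.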
5. The machine learning-based high spatial resolution remote sensing water quality detection method according to claim 4, wherein the introduction of the floating algae index feature variable in S2 comprises the following step:
S24, using the floating algae index as one of the input features for model training, the formula of the floating algae index being:
$$\mathrm{FAI} = R_{rs}(\mathrm{nir}) - R'_{rc}(\mathrm{nir})$$
where $R_{rs}(\mathrm{red})$, $R_{rs}(\mathrm{nir})$ and $R_{rs}(\mathrm{swir})$ are the reflectances of the red, near-infrared and short-wave infrared bands, corresponding to B4 (664 nm), B8 (833 nm) and B11 (1613 nm) of Sentinel-2, respectively.
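The claim does not spell out the baseline term subtracted from the near-infrared reflectance. The sketch below assumes the commonly used definition of the floating algae index, in which the baseline is a linear interpolation between the red and short-wave infrared reflectances evaluated at the near-infrared wavelength. That baseline, the function name `floating_algae_index`, and the sample values are assumptions and are not taken from the patent text.

```python
import numpy as np

# Central wavelengths (nm) quoted in the claim for Sentinel-2 B4, B8 and B11.
LAMBDA_RED, LAMBDA_NIR, LAMBDA_SWIR = 664.0, 833.0, 1613.0

def floating_algae_index(red, nir, swir):
    """FAI = nir reflectance minus a red-to-swir baseline at the nir wavelength."""
    red, nir, swir = (np.asarray(b, dtype=float) for b in (red, nir, swir))
    # ASSUMPTION: the baseline is the straight line through the red and swir
    # reflectances, evaluated at the nir wavelength.
    weight = (LAMBDA_NIR - LAMBDA_RED) / (LAMBDA_SWIR - LAMBDA_RED)
    baseline = red + (swir - red) * weight
    return nir - baseline

fai_bloom = float(floating_algae_index(0.04, 0.10, 0.01))  # nir peak above the baseline
fai_flat = float(floating_algae_index(0.05, 0.05, 0.05))   # spectrally flat pixel
```

Under this assumption a spectrally flat pixel gives an index of zero, and a near-infrared peak such as floating algae produce gives a positive value.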
6. The machine learning-based high spatial resolution remote sensing water quality detection method according to claim 5, wherein the introduction of the black and odorous water body difference index, water cleaning index and improved black and odorous ratio index feature variables in S2 comprises the following step:
S25, the black and odorous water body difference index, the water cleaning index and the improved black and odorous ratio index, denoted DBWI, WCI and BOI respectively, are indices established to identify water color anomalies; the calculation formulas for Sentinel-2 are as follows:
$$\mathrm{DBWI} = \frac{R_{rs}(\mathrm{green}) - R_{rs}(\mathrm{red})}{R_{rs}(\mathrm{green}) + R_{rs}(\mathrm{red})}$$
$$\mathrm{BOI} = \frac{R_{rs}(\mathrm{green}) - R_{rs}(\mathrm{red})}{R_{rs}(\mathrm{blue}) + R_{rs}(\mathrm{green}) + R_{rs}(\mathrm{red})} \tag{4}$$
where $R_{rs}(\mathrm{blue})$, $R_{rs}(\mathrm{green})$ and $R_{rs}(\mathrm{red})$ are the reflectances of the blue, green and red bands, corresponding to B2 (492.1 nm), B3 (559 nm) and B4 (665 nm) of Sentinel-2, respectively.
7. The machine learning-based high spatial resolution remote sensing water quality detection method according to claim 6, wherein the introduction of the Index1 and Index2 feature variables in S2 comprises the following step:
S26, introducing two custom indices, Index1 and Index2, as input features for model training, the formulas of Index1 and Index2 being:
$$\mathrm{Index1} = \frac{R_{rs}(\mathrm{green})}{R_{rs}(\mathrm{blue})}$$
$$\mathrm{Index2} = \frac{R_{rs}(\mathrm{green})}{R_{rs}(\mathrm{red})} \tag{5}$$
where $R_{rs}(\mathrm{blue})$, $R_{rs}(\mathrm{green})$ and $R_{rs}(\mathrm{red})$ are the reflectances of the blue, green and red bands, corresponding to B2 (492.1 nm), B3 (559 nm) and B4 (665 nm) of Sentinel-2, respectively.
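The visible-band features of claims 6 and 7 all use the same three reflectances, so they can be computed in one pass, as in the following minimal sketch. The function name `water_colour_features` and the sample reflectances are illustrative. WCI is left out because its formula does not appear in the text.

```python
import numpy as np

def water_colour_features(blue, green, red):
    """DBWI, BOI, Index1 and Index2 from blue, green and red reflectance."""
    blue, green, red = (np.asarray(b, dtype=float) for b in (blue, green, red))
    return {
        "DBWI": (green - red) / (green + red),
        "BOI": (green - red) / (blue + green + red),
        "Index1": green / blue,
        "Index2": green / red,
    }

# Hypothetical reflectances for a single water pixel (B2, B3, B4).
features = water_colour_features(0.03, 0.06, 0.04)
```

Passing whole band arrays instead of scalars returns one feature array per index, which can then be stacked into the model input.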
8. The machine learning-based high spatial resolution remote sensing water quality detection method according to claim 7, wherein the introduction of the FUI color index feature variable in S2 comprises the following step:
S27, using the CIE 1931 XYZ color space system, the B11 (1613 nm), B8 (833 nm) and B2 (442 nm) band combination of the Sentinel-2 MSI is assigned to the B, G and R channels, respectively, and the CIE chromaticity coordinates (x, y) are then calculated by normalizing X, Y and Z to between 0 and 1, where the conversion from RGB to X, Y, Z can be expressed as:
$$\begin{aligned} X &= 2.7689\,R + 1.7517\,G + 1.1302\,B \\ Y &= 1.0000\,R + 4.5907\,G + 0.0601\,B \\ Z &= 0.0000\,R + 0.0565\,G + 5.5934\,B \end{aligned} \tag{6}$$
The CIE chromaticity coordinates x, y and z are the tristimulus values of equation (6) normalized by their sum:
$$x = \frac{X}{X+Y+Z}, \qquad y = \frac{Y}{X+Y+Z}, \qquad z = \frac{Z}{X+Y+Z}$$
Finally, the hue angle α of any coordinate pair (x', y') of the upwelling radiance spectrum is calculated with ARCTAN2, a four-quadrant arctangent function whose output ranges from -180° to 180°; 180° is added to convert the range to 0° to 360°. An increase in the hue angle α indicates a color change from blue to red.
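The hue angle calculation can be sketched as follows. The matrix coefficients are those of equation (6). Two points are assumptions because the hue angle formula itself does not appear in the text: (x', y') are taken as offsets from the white point (1/3, 1/3), and ARCTAN2 is taken to receive x' as its first argument. That argument order is chosen because it makes the angle increase from blue to red, as the claim describes. The function name `hue_angle` and the pure-channel inputs are illustrative.

```python
import numpy as np

# RGB -> XYZ coefficients copied from equation (6).
RGB_TO_XYZ = np.array([
    [2.7689, 1.7517, 1.1302],
    [1.0000, 4.5907, 0.0601],
    [0.0000, 0.0565, 5.5934],
])

def hue_angle(r, g, b):
    """Hue angle in degrees (0 to 360) for one R, G, B triplet."""
    X, Y, Z = RGB_TO_XYZ @ np.array([r, g, b], dtype=float)
    total = X + Y + Z
    x, y = X / total, Y / total  # chromaticity coordinates
    # ASSUMPTION: (x', y') are offsets from the white point (1/3, 1/3) and
    # ARCTAN2 takes x' first; this order makes alpha grow from blue to red.
    alpha = np.degrees(np.arctan2(x - 1.0 / 3.0, y - 1.0 / 3.0)) + 180.0
    return float(alpha)

# Pure-channel inputs, used only to check the ordering of the angles.
blue_like = hue_angle(0.0, 0.0, 1.0)
green_like = hue_angle(0.0, 1.0, 0.0)
red_like = hue_angle(1.0, 0.0, 0.0)
```

The function takes channel values rather than band names, so the band-to-channel assignment stated in S27 is left to the caller.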
9. The machine learning-based high spatial resolution remote sensing water quality detection method according to claim 8, wherein the time and space matching of each remote sensing feature with the ground monitoring points in S2 comprises the following step:
S28, screening the data to obtain a high-quality, satellite-ground synchronous training data set: for the automatic stations, the monitoring data at or closest to the Beijing time of the Sentinel-2 overpass are selected, and, with the same date as the screening condition, the time-matched remote sensing feature set and ground monitoring data are combined to form the training data set.
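One way to sketch the matching in S28 is a nearest-in-time join per station followed by a same-date filter, here with pandas `merge_asof`. The column names (`station`, `time`, `ndvi`, `turbidity`), the function name `match_same_day` and the sample records are hypothetical, and both tables are assumed to carry timestamps in Beijing time.

```python
import pandas as pd

def match_same_day(sat, ground):
    """Pair each satellite record with the nearest-in-time ground record
    from the same station, keeping only pairs that fall on the same date."""
    sat = sat.sort_values("time")
    ground = ground.sort_values("time").assign(ground_time=lambda d: d["time"])
    matched = pd.merge_asof(sat, ground, on="time", by="station", direction="nearest")
    same_day = matched["time"].dt.date == matched["ground_time"].dt.date
    return matched[same_day].reset_index(drop=True)

# Hypothetical overpass features and station measurements.
sat = pd.DataFrame({
    "station": ["A", "B"],
    "time": pd.to_datetime(["2023-05-01 10:50", "2023-05-01 10:50"]),
    "ndvi": [-0.20, -0.10],
})
ground = pd.DataFrame({
    "station": ["A", "A", "B"],
    "time": pd.to_datetime(["2023-05-01 08:00", "2023-05-01 12:00", "2023-04-30 20:00"]),
    "turbidity": [10.0, 12.0, 30.0],
})
pairs = match_same_day(sat, ground)
```

In this example station A is paired with its 12:00 measurement, the closest to the overpass, while station B is dropped because its only measurement falls on the previous day.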
10. The machine learning-based high spatial resolution remote sensing water quality detection method according to claim 9, wherein the data training and model construction in S3 comprises the following step:
S31, training and constructing the model of each water quality parameter with the random forest algorithm in machine learning, in which randomly sampled data are input into decision trees and the final output is obtained by voting;
the number of decision trees is the number of trees used to build the forest, that is, the number of bootstrap resamplings; the sample set of each water quality parameter is divided into a training set, a validation set and a test set in the ratio 7:2:1, and, weighing program running efficiency against inversion accuracy on the test set, the feature variables and the number of decision trees of each water quality parameter are finally determined through testing, screening and parameter tuning, the number of decision trees being shown in the following table:
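The 7:2:1 split and the choice of the number of decision trees can be sketched with scikit-learn as follows. Several points are assumptions: the data are synthetic stand-ins, the candidate tree counts are a hypothetical grid, `RandomForestRegressor` is used because the water quality parameters are continuous (a regression forest averages its trees rather than taking a vote), and the tree count is selected on the validation set before the test set is scored once.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def split_7_2_1(n_samples, seed=0):
    """Return shuffled index arrays for the training, validation and test sets."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    n_train = int(round(0.7 * n_samples))
    n_val = int(round(0.2 * n_samples))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# Synthetic stand-in for the matched feature set and one water quality parameter.
rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = 3.0 * X[:, 0] + rng.normal(0.0, 0.1, 200)

train_idx, val_idx, test_idx = split_7_2_1(len(y))

best_n, best_score = None, -np.inf
for n_trees in (50, 100, 200):  # hypothetical candidate tree counts
    model = RandomForestRegressor(n_estimators=n_trees, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    score = model.score(X[val_idx], y[val_idx])  # R^2 on the validation set
    if score > best_score:
        best_n, best_score = n_trees, score

final_model = RandomForestRegressor(n_estimators=best_n, random_state=0)
final_model.fit(X[train_idx], y[train_idx])
test_r2 = final_model.score(X[test_idx], y[test_idx])
```

A larger tree count raises the training cost roughly in proportion, which is the efficiency side of the trade-off the claim mentions.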
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310885732.2A CN117111092A (en) 2023-07-19 2023-07-19 High-spatial-resolution remote sensing water quality detection method based on machine learning

Publications (1)

Publication Number Publication Date
CN117111092A (en) 2023-11-24

Family

ID=88810034

Country Status (1)

Country Link
CN (1) CN117111092A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117373024A (en) * 2023-12-07 2024-01-09 潍坊市海洋发展研究院 Method, device, electronic equipment and computer readable medium for generating annotation image
CN117373024B (en) * 2023-12-07 2024-03-08 潍坊市海洋发展研究院 Method, device, electronic equipment and computer readable medium for generating annotation image
CN117541951A (en) * 2024-01-10 2024-02-09 深圳块织类脑智能科技有限公司 Polluted water body detection method based on unmanned aerial vehicle multispectral remote sensing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination