CN117111092A - High-spatial-resolution remote sensing water quality detection method based on machine learning
- Publication number: CN117111092A
- Application number: CN202310885732.2A
- Authority: CN (China)
- Legal status: Pending (as listed by Google; this is an assumption, not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G01S17/89: Lidar systems specially adapted for mapping or imaging
- G01N21/25: Colour; spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
- G01N21/55: Specular reflectivity
- G06F18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/24323: Tree-organised classifiers
- G06N20/00: Machine learning
Abstract
The invention relates to a high-spatial-resolution remote sensing water quality detection method based on machine learning. Cloud-free water pixels without algal blooms or aquatic plants are screened from satellite data; combined with water quality data from automatic monitoring stations, multiple features such as the floating algae index, the water cleanliness index and the FUI color index are introduced; and a random forest machine learning algorithm is used for remote sensing inversion of water quality parameters, including total phosphorus, total nitrogen, permanganate index, turbidity, conductivity and cyanobacteria (blue-green algae) density, for lakes, rivers and other water bodies in the study area. The results are presented as charts of single-day remote sensing water quality monitoring, long-time-series statistics and month-by-month monitoring, enabling rapid, accurate and automated analysis of remote sensing water quality. The invention achieves effective long-term remote sensing monitoring of multiple water quality parameters, offers clear advantages for inland water bodies, especially small and medium-sized lakes and rivers, and supports real-time, rapid remote sensing water quality monitoring and water body protection at larger scales.
Description
Technical Field
The invention belongs to the technical field of water quality detection, and in particular relates to a high-spatial-resolution remote sensing water quality detection method based on machine learning.
Background
The water crisis ranks fifth among the world's ten major risks and mainly comprises water pollution and water resource shortage. Water pollution is a complex problem that severely constrains economic development and policy-making and threatens public health. Water quality monitoring is an important basis for water quality assessment and pollution control; as water pollution grows more severe, the need for dynamic monitoring and assessment becomes more urgent, which makes accurate, rapid and low-cost water quality monitoring methods particularly important.
Traditional water quality monitoring obtains accurate water quality parameters in situ at designated sampling points, but the analysis process is complex, and the timeliness and frequency of the data cannot meet the needs of management and decision-making; support for sudden, large-scale water environment incidents and for tracing the sources of water pollution is especially limited. Automatic water quality monitoring can track the target water body in real time and in all weather, effectively reflecting the current state and changes of water quality, but it is expensive to build, and the cost becomes extremely high for large water areas that require many monitoring points.
Satellite remote sensing offers large coverage, high frequency and low cost, which makes it well suited to water quality monitoring. Compared with ground monitoring it is more practical and economical, it can effectively overcome the shortcomings of traditional and automatic monitoring, and it integrates easily with geographic information.
Research on remote sensing inversion of water quality parameters covers water-color parameters such as suspended matter, yellow substances and chlorophyll a, and non-water-color parameters such as total phosphorus, total nitrogen, conductivity and permanganate index. The key to remote sensing water quality inversion is to analyze the relationship between the water-leaving radiance received by the satellite sensor and the concentration of the water quality parameter. Remote-sensing-based inversion methods generally fall into four types: bio-optical models, empirical models, semi-empirical/semi-analytical models and machine learning models. Since water quality remote sensing inversion came into use, most studies have targeted water-color parameters, which have a clear optical mechanism and a distinct spectral response. Non-water-color parameters such as nitrogen and phosphorus have no obvious spectral features, so conventional remote sensing inversion methods achieve limited accuracy for them and generalize poorly.
In recent years, machine learning algorithms have proven to have powerful feature recognition and learning capabilities: they can use complex networks and structures to capture rich features of the input data and establish explicit relationships with the output variables. They can therefore effectively capture the spectral characteristics of different water bodies, analyze the latent relationships between these characteristics and water quality parameters, and provide sound technical support for large-scale, long-term satellite inversion of water quality parameters.
In addition, traditional remote sensing approaches suffer from large data download volumes, long processing times and high storage requirements, which have prevented long-time-series remote sensing monitoring of water quality parameters from being adopted more widely. There is currently an urgent need for long-term, large-scale remote sensing water quality monitoring, and achieving such monitoring for inland water bodies, especially small ones, is a technical problem that remains to be solved.
Disclosure of Invention
The invention aims to provide a high-spatial-resolution remote sensing water quality detection method based on machine learning that performs rapid remote sensing inversion and monitoring of water quality parameters, including total phosphorus, total nitrogen, permanganate index, turbidity, conductivity and cyanobacteria density, for lakes, rivers and similar water bodies, enabling rapid, automated high-spatial-resolution remote sensing water quality inversion and monitoring.
The method overcomes the poor generality, heavy data processing load and long workflow of prior-art remote sensing water quality inversion. It is particularly suited to rapid water quality monitoring over large areas and long time series, and is especially advantageous for remote sensing water quality detection of smaller water bodies.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
A high-spatial-resolution remote sensing water quality detection method based on machine learning comprises the following steps:
S1. Data acquisition: radiometrically calibrated and atmospherically corrected bottom-of-atmosphere reflectance data of the region to be detected are obtained from the Sentinel-2 satellite;
S2. Data processing: cloud, algal bloom and aquatic plant pixels in the S1 data are processed and water pixels are extracted; based on the reflectance of each band, the modified normalized difference water index, normalized difference vegetation index, floating algae index, black and odorous water difference index, improved black and odorous ratio index, water cleanliness index, FUI color index, and the Index1 and Index2 band combinations are introduced as feature variables for model training; and the features are matched in time and space with ground monitoring points to form a valid data sample set;
S3. Data training and model construction: the sample set of each water quality parameter is divided into a training set, a validation set and a test set at a ratio of 7:2:1; random forest training and modeling are carried out; parameters are tuned with computational efficiency and test-set accuracy as the criteria; and finally the feature set and model parameters of each water quality parameter are determined.
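The 7:2:1 division in S3 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function name `split_7_2_1` and the seeded shuffle are assumptions, and the text does not say how samples are assigned to each subset.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_7_2_1(samples: Sequence[T], seed: int = 42) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle samples and split them into training, validation and test sets at 7:2:1."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # fixed seed so the split is reproducible
    n = len(items)
    n_train = n * 7 // 10  # integer arithmetic avoids float rounding surprises
    n_val = n * 2 // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the remainder (about 10%) becomes the test set
    return train, val, test


train, val, test = split_7_2_1(range(100))
print(len(train), len(val), len(test))  # 70 20 10
```

Putting the rounding remainder into the test set is one convenient choice; any scheme that keeps the three subsets disjoint would serve.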
Preferably, the cloud pixel processing in S2 further includes the following steps:
S21. A single-band threshold $R_{rs}(664\,\mathrm{nm}) > 0.2$ is used as the cloud mask to filter out cloud interference.
Preferably, the S2 bloom and aquatic plant pixel treatment comprises the steps of:
S22. Interference from algal blooms and aquatic plants in the region to be detected is filtered using the normalized difference vegetation index (NDVI) with a threshold of 0; the NDVI is also used as a feature variable in building the machine learning model. The NDVI is defined as:
$$\mathrm{NDVI}=\frac{R_{rs}(\mathrm{nir})-R_{rs}(\mathrm{red})}{R_{rs}(\mathrm{nir})+R_{rs}(\mathrm{red})} \tag{1}$$
where $R_{rs}(\mathrm{red})$ and $R_{rs}(\mathrm{nir})$ are the reflectances of the red and near-infrared bands, corresponding to Sentinel-2 B4 (665 nm) and B8 (833 nm), respectively.
Preferably, the extracting the water body pixels in the S2 includes the following steps:
S23. The water boundary of the region to be detected is extracted as a vector using the modified normalized difference water index (MNDWI) with a threshold of 0; the MNDWI is also used as a feature variable in building the machine learning model. The MNDWI is defined as:
$$\mathrm{MNDWI}=\frac{R_{rs}(\mathrm{green})-R_{rs}(\mathrm{swir})}{R_{rs}(\mathrm{green})+R_{rs}(\mathrm{swir})} \tag{2}$$
where $R_{rs}(\mathrm{green})$ and $R_{rs}(\mathrm{swir})$ are the reflectances of the green and short-wave infrared bands, corresponding to Sentinel-2 B3 (560 nm) and B11 (1613 nm), respectively.
Preferably, the processing of the floating algae index feature variable introduced in S2 comprises the following steps:
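S24. The floating algae index (FAI) is used as one of the input features for model training. It is defined as:
$$\mathrm{FAI}=R_{rs}(\mathrm{nir})-R'_{rs}(\mathrm{nir}),\qquad R'_{rs}(\mathrm{nir})=R_{rs}(\mathrm{red})+\left[R_{rs}(\mathrm{swir})-R_{rs}(\mathrm{red})\right]\frac{\lambda_{\mathrm{nir}}-\lambda_{\mathrm{red}}}{\lambda_{\mathrm{swir}}-\lambda_{\mathrm{red}}} \tag{3}$$
where $R'_{rs}(\mathrm{nir})$ is the baseline reflectance at the near-infrared band, linearly interpolated between the red and short-wave infrared bands; $R_{rs}(\mathrm{red})$, $R_{rs}(\mathrm{nir})$ and $R_{rs}(\mathrm{swir})$ are the reflectances of the red, near-infrared and short-wave infrared bands, corresponding to Sentinel-2 B4 (664 nm), B8 (833 nm) and B11 (1613 nm), respectively; and $\lambda$ denotes the band center wavelengths.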
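The baseline-subtraction form of the FAI can be sketched in Python as follows. This is a minimal illustration assuming per-pixel reflectances that are already co-registered; the constant and function names are hypothetical, and the wavelengths are the ones quoted in the text for B4, B8 and B11.

```python
# Band center wavelengths (nm) quoted in the text for Sentinel-2 B4, B8 and B11.
LAMBDA_RED = 664.0
LAMBDA_NIR = 833.0
LAMBDA_SWIR = 1613.0


def floating_algae_index(r_red: float, r_nir: float, r_swir: float) -> float:
    """FAI: NIR reflectance minus a red-SWIR linear baseline evaluated at the NIR wavelength.

    Also works element-wise if NumPy arrays are passed instead of floats.
    """
    weight = (LAMBDA_NIR - LAMBDA_RED) / (LAMBDA_SWIR - LAMBDA_RED)
    baseline_nir = r_red + (r_swir - r_red) * weight  # interpolated baseline R'_rs(nir)
    return r_nir - baseline_nir


# A flat spectrum gives FAI = 0; a NIR peak above the baseline gives a positive FAI.
print(floating_algae_index(0.02, 0.02, 0.02))
print(floating_algae_index(0.02, 0.05, 0.02))
```

Because the baseline is interpolated rather than fixed, a uniform brightness offset across all three bands leaves the FAI unchanged.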
Preferably, the processing of the black and odorous water body difference value index, the water body cleaning index and the improved black and odorous ratio index characteristic variable introduced in the step S2 comprises the following steps:
S25. The black and odorous water difference index (DBWI), the water cleanliness index (WCI) and the improved black and odorous ratio index (BOI) are indices designed to identify abnormal water color. Their Sentinel-2 formulations are:
$$\begin{aligned}\mathrm{DBWI}&=\frac{R_{rs}(\mathrm{green})-R_{rs}(\mathrm{red})}{R_{rs}(\mathrm{green})+R_{rs}(\mathrm{red})}\\ \mathrm{BOI}&=\frac{R_{rs}(\mathrm{green})-R_{rs}(\mathrm{red})}{R_{rs}(\mathrm{blue})+R_{rs}(\mathrm{green})+R_{rs}(\mathrm{red})}\end{aligned} \tag{4}$$
where $R_{rs}(\mathrm{blue})$, $R_{rs}(\mathrm{green})$ and $R_{rs}(\mathrm{red})$ are the reflectances of the blue, green and red bands, corresponding to Sentinel-2 B2 (492.1 nm), B3 (559 nm) and B4 (665 nm), respectively.
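The two water-color indices whose formulas appear in equation (4) can be sketched in Python as follows. The function names are hypothetical, the inputs are made-up reflectances rather than measurements, and WCI is omitted because its formula is not given in the text.

```python
def dbwi(r_green: float, r_red: float) -> float:
    """Black and odorous water difference index as written in Eq. (4): (G - R) / (G + R)."""
    return (r_green - r_red) / (r_green + r_red)


def boi(r_blue: float, r_green: float, r_red: float) -> float:
    """Improved black and odorous ratio index as written in Eq. (4): (G - R) / (B + G + R)."""
    return (r_green - r_red) / (r_blue + r_green + r_red)


# Illustrative reflectances for Sentinel-2 B2, B3 and B4 (made-up values).
print(dbwi(0.03, 0.01), boi(0.02, 0.03, 0.01))
```

Both functions divide by a sum of reflectances, so callers should mask pixels where that sum is zero before applying them.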
Preferably, the Index1 and Index2 characteristic variable processes introduced in S2 include the following steps:
S26. Two custom indices, Index1 and Index2, are used as input features for model training. They are defined as:
$$\mathrm{Index1}=\frac{R_{rs}(\mathrm{green})}{R_{rs}(\mathrm{blue})},\qquad \mathrm{Index2}=\frac{R_{rs}(\mathrm{green})}{R_{rs}(\mathrm{red})} \tag{5}$$
where $R_{rs}(\mathrm{blue})$, $R_{rs}(\mathrm{green})$ and $R_{rs}(\mathrm{red})$ are the reflectances of the blue, green and red bands, corresponding to Sentinel-2 B2 (492.1 nm), B3 (559 nm) and B4 (665 nm), respectively.
Preferably, the processing of the FUI color index feature variable introduced in S2 includes the following steps:
S27. Using the CIE 1931 XYZ color space system, the Sentinel-2 MSI bands B11 (1613 nm), B8 (833 nm) and B2 (442 nm) are assigned to the B, G and R channels, respectively, and X, Y and Z are then normalized to between 0 and 1 to obtain the CIE chromaticity coordinates (x, y). The conversion from RGB to X, Y, Z is:
$$\begin{aligned}X&=2.7689R+1.7517G+1.1302B\\ Y&=1.0000R+4.5907G+0.0601B\\ Z&=0.0000R+0.0565G+5.5934B\end{aligned} \tag{6}$$
The CIE chromaticity coordinates x, y, z are the normalized tristimulus values from equation (6):
$$x=\frac{X}{X+Y+Z},\qquad y=\frac{Y}{X+Y+Z},\qquad z=\frac{Z}{X+Y+Z} \tag{7}$$
Finally, the hue angle α corresponding to the coordinate pair (x′, y′) of the upwelling radiance spectrum is calculated. ARCTAN2 is the four-quadrant arctangent function, which returns α in the range -180° to 180°; adding 180° converts the range to 0° to 360°. An increase in the hue angle α indicates a color shift from blue toward red.
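Equations (6) and (7) can be sketched in Python as follows. This is a minimal illustration with hypothetical function names: the caller is assumed to supply the R, G and B channel values already assigned from the bands as described above. The hue-angle step is deliberately left out because its exact formula is not reproduced in the text.

```python
from typing import Tuple


def rgb_to_xyz(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Tristimulus values X, Y, Z from the R, G, B channels using the coefficients of Eq. (6)."""
    x = 2.7689 * r + 1.7517 * g + 1.1302 * b
    y = 1.0000 * r + 4.5907 * g + 0.0601 * b
    z = 0.0000 * r + 0.0565 * g + 5.5934 * b
    return x, y, z


def chromaticity(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """CIE chromaticity coordinates (Eq. 7): each tristimulus value divided by X + Y + Z."""
    big_x, big_y, big_z = rgb_to_xyz(r, g, b)
    total = big_x + big_y + big_z
    if total == 0:
        raise ValueError("X + Y + Z is zero; chromaticity is undefined")
    return big_x / total, big_y / total, big_z / total


x, y, z = chromaticity(0.02, 0.03, 0.01)
print(round(x, 4), round(y, 4), round(z, 4))
```

Because each row of equation (6) sums to roughly the same value, equal R, G and B inputs land close to the white point (1/3, 1/3).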
Preferably, the temporal and spatial matching between each remote sensing feature in S2 and the ground monitoring points comprises the following steps:
S28. The data are screened to obtain a high-quality, synchronous satellite-ground training data set: for each automatic station, monitoring data recorded at or near the Beijing time of the Sentinel-2 overpass are selected, and, using the same date as the screening condition, the temporally matched remote sensing feature set is combined with the ground monitoring data to form the training data set.
Preferably, the data training and model construction in S3 comprise the following steps:
S31. Models of the water quality parameters are trained and constructed with the random forest algorithm in machine learning: randomly sampled data are fed into the decision trees, and their outputs are aggregated by voting to obtain the final output. The number of decision trees equals the number of bootstrap resamples used to construct the forest. The sample set of each water quality parameter is divided into a training set, a validation set and a test set at a ratio of 7:2:1, and the feature variables and the number of decision trees for each water quality parameter are finally determined through testing, screening and parameter tuning, weighing computational efficiency against test-set inversion accuracy. The number of decision trees is shown in the following table:
The invention has at least the following beneficial effects: it enables rapid, accurate and automated analysis for high-spatial-resolution remote sensing water quality detection and effective long-term detection of large-scale remote sensing water quality; it offers clear advantages for inland water bodies, especially small and medium-sized lakes and rivers; and it supports real-time, rapid large-scale remote sensing water quality monitoring and water body protection.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 shows the spatial distribution of the ground monitoring points used for machine learning training in the present invention.
FIG. 2 shows intermediate parameters of the FUI color index of the present invention.
FIG. 3 shows the accuracy evaluation results for the test samples of each water quality parameter in the present invention.
FIG. 4 shows an example of single-day total phosphorus monitoring results in the present invention.
FIG. 5 shows an example of single-day total nitrogen monitoring results in the present invention.
FIG. 6 shows an example of single-day permanganate index monitoring results in the present invention.
Detailed Description
The invention provides a high-spatial-resolution remote sensing water quality detection method based on machine learning. The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Machine learning algorithms have proven to have powerful feature recognition and learning capabilities: they can use complex networks and structures to capture rich features of the input data and establish explicit relationships with the output variables. They can therefore effectively capture the spectral characteristics of different water bodies, analyze the latent relationships between these characteristics and water quality parameters, and provide sound technical support for large-scale, long-term satellite inversion of water quality parameters.
In environmental monitoring scenarios that require processing large volumes of satellite remote sensing data, dedicated cloud platforms perform better: many data sets can be accessed online, which avoids conventional data downloading and preprocessing and makes them especially suitable for long-time-series, large-scale remote sensing applications. Therefore, motivated by the urgent need for long-term, large-scale remote sensing water quality monitoring and by recent advances in cloud computing, the invention combines automatic-station water quality monitoring data with a random forest machine learning algorithm to carry out long-time-series monitoring and analysis of multiple water quality parameters of lakes, rivers and other water bodies in the study area, and provides a detection method for large-scale, high-spatial-resolution remote sensing water quality monitoring of inland water bodies, in particular small water bodies.
Sentinel-2 is a high-resolution multispectral imaging satellite mission that carries a MultiSpectral Instrument (MSI) covering 13 spectral bands. The MSI offers high spatial resolution (10 m), a short revisit period (5 days or better) and rich spectral information (13 bands spanning the visible, near-infrared and short-wave infrared), providing a good data source for large-scale, high-spatial-resolution remote sensing water quality monitoring.
With reference to FIGS. 1-6, the specific steps of the high-spatial-resolution remote sensing water quality detection method based on machine learning are as follows:
S1. Data acquisition: radiometrically calibrated and atmospherically corrected bottom-of-atmosphere reflectance data of the region to be detected are obtained from the satellite. Specifically, the MultiSpectral Instrument Level-2A (L2A) product is used; it provides radiometrically calibrated, atmospherically corrected bottom-of-atmosphere reflectance (Bottom-of-Atmosphere corrected reflectance) at a spatial resolution of 10 m with a revisit period of up to 5 days.
S2. Data processing: cloud, algal bloom and aquatic plant pixels in the S1 data are processed and water pixels are extracted; multiple feature variables derived from the reflectance of each band are introduced for model training; and the features are matched in time and space with ground monitoring points to form a valid data sample database.
The data processing comprises the following steps:
S21. A single-band threshold $R_{rs}(664\,\mathrm{nm}) > 0.2$ is used as the cloud mask.
S22. The water boundary of the region to be detected is extracted as a vector using the modified normalized difference water index (MNDWI) with a threshold of 0; the MNDWI is also used as a feature variable in building the machine learning model. The MNDWI is defined as:
$$\mathrm{MNDWI}=\frac{R_{rs}(\mathrm{green})-R_{rs}(\mathrm{swir})}{R_{rs}(\mathrm{green})+R_{rs}(\mathrm{swir})} \tag{1}$$
where $R_{rs}(\mathrm{green})$ and $R_{rs}(\mathrm{swir})$ are the reflectances of the green and short-wave infrared bands, corresponding to Sentinel-2 B3 (560 nm) and B11 (1613 nm), respectively.
S23. Interference from algal blooms and aquatic plants in the region to be detected is filtered using the normalized difference vegetation index (NDVI) with a threshold of 0; the NDVI is also used as a feature variable in building the machine learning model. The NDVI is defined as:
$$\mathrm{NDVI}=\frac{R_{rs}(\mathrm{nir})-R_{rs}(\mathrm{red})}{R_{rs}(\mathrm{nir})+R_{rs}(\mathrm{red})} \tag{2}$$
where $R_{rs}(\mathrm{red})$ and $R_{rs}(\mathrm{nir})$ are the reflectances of the red and near-infrared bands, corresponding to Sentinel-2 B4 (665 nm) and B8 (833 nm), respectively.
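Steps S21 to S23 can be combined into a single per-pixel screening mask, sketched below with NumPy. This is a minimal illustration under stated assumptions: the band arrays are taken to be co-registered on one grid (B11 resampled to 10 m), and the threshold directions (MNDWI above 0 means water, NDVI above 0 means bloom or aquatic plants) are an assumed reading of "threshold set to 0". The function names are hypothetical and the sample reflectances are made up.

```python
import numpy as np


def normalized_difference(a, b) -> np.ndarray:
    """(a - b) / (a + b), returning NaN where a + b == 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom != 0, (a - b) / denom, np.nan)


def valid_water_mask(green, red, nir, swir,
                     cloud_thr: float = 0.2, mndwi_thr: float = 0.0,
                     ndvi_thr: float = 0.0) -> np.ndarray:
    """True where a pixel is cloud-free water without bloom or aquatic plants."""
    red = np.asarray(red, dtype=float)
    cloud = red > cloud_thr                      # S21: R_rs(664 nm) > 0.2 marks cloud
    mndwi = normalized_difference(green, swir)   # S22: MNDWI from B3 and B11
    ndvi = normalized_difference(nir, red)       # S23: NDVI from B8 and B4
    is_water = mndwi > mndwi_thr                 # assumed: MNDWI above 0 means water
    is_vegetated = ndvi > ndvi_thr               # assumed: NDVI above 0 means bloom/plants
    return is_water & ~cloud & ~is_vegetated     # comparisons with NaN evaluate to False


green = np.array([0.05, 0.05, 0.05, 0.05])
red = np.array([0.02, 0.25, 0.02, 0.02])
nir = np.array([0.01, 0.01, 0.08, 0.01])
swir = np.array([0.005, 0.005, 0.005, 0.10])
print(valid_water_mask(green, red, nir, swir))
```

In the sample, the second pixel is rejected as cloud, the third as vegetated and the fourth as non-water, leaving only the first as a valid water pixel.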
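S24. The floating algae index (FAI) is used as one of the input features for model training. It is defined as:
$$\mathrm{FAI}=R_{rs}(\mathrm{nir})-R'_{rs}(\mathrm{nir}),\qquad R'_{rs}(\mathrm{nir})=R_{rs}(\mathrm{red})+\left[R_{rs}(\mathrm{swir})-R_{rs}(\mathrm{red})\right]\frac{\lambda_{\mathrm{nir}}-\lambda_{\mathrm{red}}}{\lambda_{\mathrm{swir}}-\lambda_{\mathrm{red}}} \tag{3}$$
where $R'_{rs}(\mathrm{nir})$ is the baseline reflectance at the near-infrared band, linearly interpolated between the red and short-wave infrared bands; $R_{rs}(\mathrm{red})$, $R_{rs}(\mathrm{nir})$ and $R_{rs}(\mathrm{swir})$ are the reflectances of the red, near-infrared and short-wave infrared bands, corresponding to Sentinel-2 B4 (664 nm), B8 (833 nm) and B11 (1613 nm), respectively; and $\lambda$ denotes the band center wavelengths.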
S25. The introduced DBWI, WCI and BOI feature variables are processed as follows.
DBWI, WCI and BOI are all indices designed to identify abnormal water color. Their Sentinel-2 formulations are:
$$\begin{aligned}\mathrm{DBWI}&=\frac{R_{rs}(\mathrm{green})-R_{rs}(\mathrm{red})}{R_{rs}(\mathrm{green})+R_{rs}(\mathrm{red})}\\ \mathrm{BOI}&=\frac{R_{rs}(\mathrm{green})-R_{rs}(\mathrm{red})}{R_{rs}(\mathrm{blue})+R_{rs}(\mathrm{green})+R_{rs}(\mathrm{red})}\end{aligned} \tag{4}$$
where $R_{rs}(\mathrm{blue})$, $R_{rs}(\mathrm{green})$ and $R_{rs}(\mathrm{red})$ are the reflectances of the blue, green and red bands, corresponding to Sentinel-2 B2 (492.1 nm), B3 (559 nm) and B4 (665 nm), respectively.
S26. Two custom indices, Index1 and Index2, are used as input features for model training. They are defined as:
$$\mathrm{Index1}=\frac{R_{rs}(\mathrm{green})}{R_{rs}(\mathrm{blue})},\qquad \mathrm{Index2}=\frac{R_{rs}(\mathrm{green})}{R_{rs}(\mathrm{red})} \tag{5}$$
where $R_{rs}(\mathrm{blue})$, $R_{rs}(\mathrm{green})$ and $R_{rs}(\mathrm{red})$ are the reflectances of the blue, green and red bands, corresponding to Sentinel-2 B2 (492.1 nm), B3 (559 nm) and B4 (665 nm), respectively.
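Equation (5) can be computed for many pixels at once and stacked as model input columns, as in the NumPy sketch below. The function name `ratio_features` and the column layout are assumptions for illustration; the reflectance values are made up.

```python
import numpy as np


def ratio_features(blue, green, red) -> np.ndarray:
    """Stack Index1 = G/B and Index2 = G/R into an (n_pixels, 2) feature array."""
    blue, green, red = (np.asarray(v, dtype=float) for v in (blue, green, red))
    index1 = green / blue  # Index1 from Eq. (5)
    index2 = green / red   # Index2 from Eq. (5)
    return np.column_stack([index1, index2])


feats = ratio_features([0.02, 0.04], [0.04, 0.04], [0.01, 0.02])
print(feats)
```

Each row is one pixel, so the result can be concatenated column-wise with the other per-pixel features (MNDWI, NDVI, FAI, DBWI, BOI and so on) before training.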
S27, processing of the introduced FUI color index characteristic variable comprises the following steps:
Using the CIE 1931 XYZ color space system, bands B11 (1613 nm), B8 (833 nm) and B2 (442 nm) of the Sentinel-2 MSI are assigned to the B, G and R channels, respectively, and the CIE chromaticity coordinates (x, y) are then calculated by normalizing X, Y and Z to between 0 and 1. The conversion from RGB to X, Y, Z can be expressed as:
$$
\begin{aligned}
X&=2.7689R+1.7517G+1.1302B\\
Y&=1.0000R+4.5907G+0.0601B\\
Z&=0.0000R+0.0565G+5.5934B
\end{aligned}
\tag{6}
$$
The CIE chromaticity coordinates x, y, z are the tristimulus values from equation (6) normalized by their sum:
$$x=\frac{X}{X+Y+Z},\qquad y=\frac{Y}{X+Y+Z},\qquad z=\frac{Z}{X+Y+Z}$$
Finally, the hue angle α is calculated for the chromaticity coordinates (x', y') of the upwelling radiance spectrum:
Here ARCTAN2 is the four-quadrant arctangent function, which gives α in the range -180° to 180°; adding 180° converts the range to 0° to 360°. An increase in the hue angle α indicates a color change from blue to red.
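The color chain of S27 (RGB to XYZ via equation (6), normalization to chromaticity, then the hue angle) can be sketched as below. The hue-angle formula itself is not reproduced in this text, so two details are assumptions: (x', y') are taken relative to the equal-energy white point (1/3, 1/3), and the angle is computed with Python's `math.atan2(y', x')` argument order before the 180° shift described above. The sketch takes generic R, G, B channel values; which bands feed each channel follows the text. All function names are hypothetical.

```python
import math

# RGB -> XYZ coefficients from equation (6).
RGB_TO_XYZ = (
    (2.7689, 1.7517, 1.1302),
    (1.0000, 4.5907, 0.0601),
    (0.0000, 0.0565, 5.5934),
)


def rgb_to_xyz(r: float, g: float, b: float) -> tuple:
    """Apply equation (6) to one pixel's R, G, B channel values."""
    return tuple(cr * r + cg * g + cb * b for cr, cg, cb in RGB_TO_XYZ)


def chromaticity(r: float, g: float, b: float) -> tuple:
    """Chromaticity (x, y): tristimulus values divided by X + Y + Z."""
    x_t, y_t, z_t = rgb_to_xyz(r, g, b)
    total = x_t + y_t + z_t
    return x_t / total, y_t / total


def hue_angle(x: float, y: float, white: tuple = (1 / 3, 1 / 3)) -> float:
    """Hue angle in degrees, shifted from [-180, 180] to [0, 360] by adding 180.

    Assumptions: (x', y') are measured from the white point (1/3, 1/3), and
    atan2 uses the (y', x') argument order of Python's math.atan2.
    """
    x_p, y_p = x - white[0], y - white[1]
    return math.degrees(math.atan2(y_p, x_p)) + 180.0
```

With equal R, G and B inputs the chromaticity lands essentially on the white point, since each row of equation (6) sums to about 5.65.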
S28, the data must be screened to obtain a high-quality satellite-ground synchronous training data set. Since Sentinel-2 passes overhead at 10:30 Beijing time, in this embodiment the monitoring data recorded by the automatic stations at or nearest to that time are selected; using the same date as the screening condition, the time-matched remote sensing feature set is combined with the ground monitoring data to form the training data set.
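The S28 matching can be sketched as follows, assuming each station record carries a station ID, a Beijing-time timestamp and a measured value, and that the pixel features at each station's location have already been extracted. All names are hypothetical, and because the text states no maximum time gap, none is applied.

```python
from dataclasses import dataclass
from datetime import date, datetime, time

# Sentinel-2 overpass time in Beijing time, as stated in the text.
OVERPASS_TIME = time(10, 30)


@dataclass
class StationRecord:
    """Hypothetical automatic-station record (timestamp in Beijing time)."""
    station_id: str
    timestamp: datetime
    value: float


def match_to_overpass(records, scene_date: date) -> dict:
    """For each station, keep the same-date record closest to the overpass time."""
    target = datetime.combine(scene_date, OVERPASS_TIME)
    best = {}  # station_id -> (time gap in seconds, record)
    for rec in records:
        if rec.timestamp.date() != scene_date:  # same-date screening condition
            continue
        gap = abs((rec.timestamp - target).total_seconds())
        if rec.station_id not in best or gap < best[rec.station_id][0]:
            best[rec.station_id] = (gap, rec)
    return {sid: rec for sid, (_, rec) in best.items()}


def build_samples(pixel_features: dict, matched: dict) -> list:
    """Pair each station's pixel feature vector with its matched ground value."""
    return [(pixel_features[sid], rec.value)
            for sid, rec in matched.items() if sid in pixel_features]
```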
S3, data training and model construction: the sample set of each water quality parameter is divided into a training set, a validation set and a test set in a 7:2:1 ratio, random forest models are trained, parameters are tuned with program efficiency and test-set accuracy as the criteria, and the feature set and model parameters of each water quality parameter are finally determined.
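The 7:2:1 division can be sketched as a seeded random shuffle followed by slicing. The text gives only the ratio, so random (rather than stratified or time-ordered) splitting and the rounding rule are assumptions; the function name is hypothetical.

```python
import random


def split_7_2_1(samples: list, seed: int = 0) -> tuple:
    """Shuffle samples and split them into training, validation and test sets (7:2:1)."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = round(n * 0.7)
    n_val = round(n * 0.2)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```

The test set takes whatever remains after the training and validation slices, so every sample is used exactly once.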
Specifically, the sample training and model construction includes the following steps:
The water quality parameter models are trained and constructed with the random forest algorithm. Random forest regression is an ensemble learning method: randomly resampled data are fed into multiple weak learners (decision trees), and their outputs are combined by voting (for regression, by averaging the tree predictions) to give the final output. Random forest regression learns quickly, processes large data sets efficiently and is relatively robust to noise in the data. In random forest modeling, two important factors are the choice of feature variables and the number of decision trees that make up the forest, which equals the number of bootstrap resamples.
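The text does not name an implementation; in practice a library class such as scikit-learn's `RandomForestRegressor` would normally be used. To make the mechanics above concrete (bootstrap resampling, one tree per resample, a random feature subset at each split, and averaging for regression), here is a from-scratch, stdlib-only sketch. It is illustrative, not the authors' model; all names and defaults are assumptions, including the p/3 feature-subset default borrowed from R's `randomForest` regression setting.

```python
import random
from statistics import mean


def _best_split(X, y, feature_ids):
    """Return (sse, feature, threshold) minimising the children's summed squared error."""
    best = None
    for f in feature_ids:
        for thr in sorted({row[f] for row in X})[:-1]:  # keep both children non-empty
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            m_l, m_r = mean(left), mean(right)
            sse = sum((t - m_l) ** 2 for t in left) + sum((t - m_r) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr)
    return best


def _build_tree(X, y, rng, max_features, depth):
    """Grow a regression tree; leaves hold the mean target of their samples."""
    if depth == 0 or len(y) < 2 or len(set(y)) == 1:
        return mean(y)
    feature_ids = rng.sample(range(len(X[0])), max_features)  # random subset per split
    split = _best_split(X, y, feature_ids)
    if split is None:  # all candidate features are constant in this node
        return mean(y)
    _, f, thr = split
    left_X = [row for row in X if row[f] <= thr]
    left_y = [t for row, t in zip(X, y) if row[f] <= thr]
    right_X = [row for row in X if row[f] > thr]
    right_y = [t for row, t in zip(X, y) if row[f] > thr]
    return (f, thr,
            _build_tree(left_X, left_y, rng, max_features, depth - 1),
            _build_tree(right_X, right_y, rng, max_features, depth - 1))


def _predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node


class RandomForestRegressor:
    """Minimal random forest: bootstrap resampling + random feature subsets + averaging."""

    def __init__(self, n_trees=50, max_features=None, max_depth=8, seed=0):
        self.n_trees = n_trees  # number of trees = number of bootstrap resamples
        self.max_features = max_features
        self.max_depth = max_depth
        self.rng = random.Random(seed)
        self.trees = []

    def fit(self, X, y):
        n, n_feat = len(X), len(X[0])
        k = self.max_features or max(1, n_feat // 3)  # p/3 default (assumption)
        self.trees = []
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(n) for _ in range(n)]  # bootstrap with replacement
            self.trees.append(_build_tree([X[i] for i in idx], [y[i] for i in idx],
                                          self.rng, k, self.max_depth))
        return self

    def predict(self, x):
        return mean(_predict_tree(t, x) for t in self.trees)  # average of tree outputs
```

In this sketch `n_trees` is the quantity the text tunes per water quality parameter; feature selection would correspond to choosing which columns of `X` are passed in.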
The sample set of each water quality parameter is divided into a training set, a validation set and a test set in a 7:2:1 ratio. After repeated testing, screening and parameter tuning that weigh program efficiency against inversion accuracy on the test set, the feature variables and the number of decision trees for each water quality parameter are finally determined; the numbers of decision trees are shown in the following table:
To verify the accuracy of the method, water quality was retrieved with it and compared against the ground monitoring points, whose spatial distribution is shown in Fig. 1.
The related data are compared below.
As shown in 3-6, after the random forest model of each water quality parameter was trained, the accuracy of each model was evaluated on the test set. The machine learning random forest algorithm performs well in retrieving the water quality parameters. In the test results, turbidity has the highest R² (0.6036), followed by total phosphorus (R² = 0.5188), then total nitrogen, permanganate index and conductivity (R² = 0.3404, 0.2706 and 0.2699, respectively); cyanobacteria density is relatively low (R² = 0.1994). Conductivity has the lowest MAPE (14.11%), followed in order by permanganate index, total nitrogen, total phosphorus and turbidity (27.31%, 31.20%, 45.34% and 59.32%, respectively), while cyanobacteria density has the highest MAPE at 118.68%. In terms of RMSE/mean, conductivity is again the lowest (20.84%), followed in order by permanganate index, total nitrogen, total phosphorus and turbidity (28.54%, 35.15%, 35.97% and 55.57%, respectively), and cyanobacteria density is the highest at 79.67%.
The ground data for cyanobacteria density are provided mainly by the YSI6600 multiparameter water quality sondes at the buoy stations. The phycocyanin fluorescence method they use is less stable than the methods used for the other parameters, and phytoplankton migrate horizontally and vertically more frequently over short periods. As a result, the cyanobacteria density test results are relatively poor, with overestimation in low-value areas in particular and higher scatter. Overall, the machine learning random forest algorithm performs well in retrieving the water quality parameters, responds well in both low-value and high-value areas, and therefore has value for operational application.
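The three accuracy measures quoted above can be computed as below. The text does not define them, so the sketch assumes R² is the coefficient of determination (1 - SS_res/SS_tot; some studies instead report the squared Pearson correlation), MAPE is the mean of |observed - predicted| / observed in percent, and RMSE/mean is the RMSE divided by the mean observed value, in percent. Function names are hypothetical.

```python
import math
from statistics import mean


def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    m = mean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - m) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def mape(obs, pred):
    """Mean absolute percentage error (%); observed values must be non-zero."""
    return 100.0 * mean(abs((o - p) / o) for o, p in zip(obs, pred))


def rmse_over_mean(obs, pred):
    """Root-mean-square error divided by the mean observed value (%)."""
    rmse = math.sqrt(mean((o - p) ** 2 for o, p in zip(obs, pred)))
    return 100.0 * rmse / mean(obs)
```

Because MAPE divides by each observation, small observed values inflate it, which is consistent with the large MAPE reported for cyanobacteria density given its overestimation in low-value areas.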
The foregoing has shown and described the basic principles, principal features and advantages of the invention. Those skilled in the art will understand that the present invention is not limited to the embodiments described above, which merely illustrate its principles, and that various changes and modifications may be made without departing from the spirit and scope of the invention. The scope of the invention is defined by the appended claims and their equivalents.
Claims (10)
1. A machine-learning-based high-spatial-resolution remote sensing water quality detection method, characterized by comprising the following steps:
S1, data acquisition: obtaining, from the Sentinel-2 satellite, bottom-of-atmosphere reflectance data of the region to be detected that have undergone radiometric calibration and atmospheric correction;
S2, data processing: processing cloud pixels and algal-bloom and aquatic-plant pixels in the data from S1 and extracting water pixels; based on the reflectance of each band, introducing the modified normalized difference water index, the normalized difference vegetation index, the floating algae index, the black and odorous water difference index, the improved black and odorous ratio index, the water cleaning index, the FUI color index, and the Index1 and Index2 band combinations as characteristic variables for model training; and matching them in time and space with ground monitoring points to form a valid data sample set;
S3, data training and model construction: dividing the sample set of each water quality parameter into a training set, a validation set and a test set in a 7:2:1 ratio, performing random forest training and modeling, tuning parameters with program efficiency and test-set accuracy as the criteria, and finally determining the feature set and model parameters of each water quality parameter.
2. The machine learning-based high spatial resolution remote sensing water quality detection method according to claim 1, wherein the S2 cloud pixel processing further comprises the steps of:
S21, using the single-band threshold $R_{rs}(664\,\mathrm{nm}) > 0.2$ as a cloud mask to filter out cloud interference.
3. The machine learning-based high spatial resolution remote sensing water quality detection method according to claim 2, wherein the S2 bloom and aquatic plant pixel processing comprises the steps of:
S22, filtering algal-bloom and aquatic-plant interference in the region to be detected based on the normalized difference vegetation index (NDVI) with a threshold of 0, where NDVI is also used as a characteristic variable in constructing the machine learning model, and NDVI is calculated as:
$$\mathrm{NDVI}=\frac{R_{rs}(\mathrm{nir})-R_{rs}(\mathrm{red})}{R_{rs}(\mathrm{nir})+R_{rs}(\mathrm{red})} \tag{1}$$
where $R_{rs}(\mathrm{red})$ and $R_{rs}(\mathrm{nir})$ are the reflectances of the red and near-infrared bands, corresponding to Sentinel-2 bands B4 (665 nm) and B8 (833 nm).
4. The method for detecting water quality by high spatial resolution remote sensing based on machine learning according to claim 3, wherein the step of extracting water pixels in S2 comprises the following steps:
S23, extracting the water-body boundary of the region to be detected as vectors based on the modified normalized difference water index (MNDWI) with a threshold of 0, where MNDWI is also used as a characteristic variable in constructing the machine learning model, and MNDWI is calculated as:
$$\mathrm{MNDWI}=\frac{R_{rs}(\mathrm{green})-R_{rs}(\mathrm{swir})}{R_{rs}(\mathrm{green})+R_{rs}(\mathrm{swir})} \tag{2}$$
where $R_{rs}(\mathrm{green})$ and $R_{rs}(\mathrm{swir})$ are the reflectances of the green and short-wave infrared bands, corresponding to Sentinel-2 bands B3 (560 nm) and B11 (1613 nm).
5. The machine learning-based high spatial resolution remote sensing water quality detection method according to claim 4, wherein the processing of the floating algae index characteristic variable introduced in S2 comprises the following steps:
S24, using the floating algae index (FAI) as one of the input features for model training, where FAI is calculated as:
$$\mathrm{FAI}=R_{rs}(\mathrm{nir})-R'_{rc}(\mathrm{nir})$$
where $R_{rs}(\mathrm{red})$, $R_{rs}(\mathrm{nir})$ and $R_{rs}(\mathrm{swir})$ are the reflectances of the red, near-infrared and short-wave infrared bands, corresponding to Sentinel-2 bands B4 (664 nm), B8 (833 nm) and B11 (1613 nm).
6. The machine learning-based high spatial resolution remote sensing water quality detection method according to claim 5, wherein the black and odorous water body difference value index, the water body cleaning index and the improved black and odorous ratio index feature variable processing introduced in S2 comprises the following steps:
S25, the black and odorous water difference index (DBWI), the water cleaning index (WCI) and the improved black and odorous ratio index (BOI) are indexes designed to identify abnormal water color, and their Sentinel-2 formulas are as follows:
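$$
\begin{aligned}
\mathrm{DBWI}&=\frac{R_{rs}(\mathrm{green})-R_{rs}(\mathrm{red})}{R_{rs}(\mathrm{green})+R_{rs}(\mathrm{red})}\\
\mathrm{BOI}&=\frac{R_{rs}(\mathrm{green})-R_{rs}(\mathrm{red})}{R_{rs}(\mathrm{blue})+R_{rs}(\mathrm{green})+R_{rs}(\mathrm{red})}
\end{aligned}
\tag{4}
$$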
where $R_{rs}(\mathrm{blue})$, $R_{rs}(\mathrm{green})$ and $R_{rs}(\mathrm{red})$ are the reflectances of the blue, green and red bands, corresponding to Sentinel-2 bands B2 (492.1 nm), B3 (559 nm) and B4 (665 nm).
7. The machine learning-based high spatial resolution remote sensing water quality detection method according to claim 6, wherein the Index1 and Index2 feature variable processing introduced in S2 comprises the following steps:
S26, introducing two custom indexes, Index1 and Index2, as input features for model training, defined as:
$$
\mathrm{Index1}=\frac{R_{rs}(\mathrm{green})}{R_{rs}(\mathrm{blue})},\qquad
\mathrm{Index2}=\frac{R_{rs}(\mathrm{green})}{R_{rs}(\mathrm{red})}
\tag{5}
$$
where $R_{rs}(\mathrm{blue})$, $R_{rs}(\mathrm{green})$ and $R_{rs}(\mathrm{red})$ are the reflectances of the blue, green and red bands, corresponding to Sentinel-2 bands B2 (492.1 nm), B3 (559 nm) and B4 (665 nm).
8. The machine learning-based high spatial resolution remote sensing water quality detection method of claim 7, wherein the FUI color index feature variable processing introduced in S2 comprises the steps of:
S27, using the CIE 1931 XYZ color space system, bands B11 (1613 nm), B8 (833 nm) and B2 (442 nm) of the Sentinel-2 MSI are assigned to the B, G and R channels, respectively, and the CIE chromaticity coordinates (x, y) are then calculated by normalizing X, Y and Z to between 0 and 1, where the conversion from RGB to X, Y, Z can be expressed as:
$$
\begin{aligned}
X&=2.7689R+1.7517G+1.1302B\\
Y&=1.0000R+4.5907G+0.0601B\\
Z&=0.0000R+0.0565G+5.5934B
\end{aligned}
\tag{6}
$$
the CIE chromaticity coordinates x, y, z are the tristimulus values from equation (6) normalized by their sum:
$$x=\frac{X}{X+Y+Z},\qquad y=\frac{Y}{X+Y+Z},\qquad z=\frac{Z}{X+Y+Z}$$
finally, the hue angle α is calculated for the chromaticity coordinates (x', y') of the upwelling radiance spectrum:
wherein ARCTAN2 is the four-quadrant arctangent function, which gives α in the range -180° to 180°; adding 180° converts the range to 0° to 360°, and an increase in the hue angle α indicates a color change from blue to red.
9. The machine learning-based high-spatial resolution remote sensing water quality detection method as set forth in claim 8, wherein the time and space matching processing of each remote sensing feature in S2 with the ground monitoring point comprises the following steps:
S28, screening the data to obtain a high-quality satellite-ground synchronous training data set, wherein the automatic stations select monitoring data recorded at or near the Beijing time of the Sentinel-2 overpass, and, with the same date as the screening condition, the time-matched remote sensing feature set is combined with the ground monitoring data to form the training data set.
10. The machine learning-based high spatial resolution remote sensing water quality detection method according to claim 9, wherein the data training and model construction in S3 comprises the steps of:
S31, training and constructing the water quality parameter models with the random forest algorithm in machine learning, in which randomly resampled data are input into the decision trees and the final output is obtained by voting;
the number of decision trees is the number of decision trees making up the forest, i.e., the number of bootstrap resamples; the sample set of each water quality parameter is divided into a training set, a validation set and a test set in a 7:2:1 ratio, and the feature variables and number of decision trees of each water quality parameter are finally determined through testing, screening and parameter tuning that weigh program efficiency against test-set inversion accuracy, the numbers of decision trees being shown in the following table:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310885732.2A CN117111092A (en) | 2023-07-19 | 2023-07-19 | High-spatial-resolution remote sensing water quality detection method based on machine learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117111092A true CN117111092A (en) | 2023-11-24 |
Family
ID=88810034
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310885732.2A Pending CN117111092A (en) | 2023-07-19 | 2023-07-19 | High-spatial-resolution remote sensing water quality detection method based on machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117111092A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117373024A (en) * | 2023-12-07 | 2024-01-09 | 潍坊市海洋发展研究院 | Method, device, electronic equipment and computer readable medium for generating annotation image |
CN117373024B (en) * | 2023-12-07 | 2024-03-08 | 潍坊市海洋发展研究院 | Method, device, electronic equipment and computer readable medium for generating annotation image |
CN117541951A (en) * | 2024-01-10 | 2024-02-09 | 深圳块织类脑智能科技有限公司 | Polluted water body detection method based on unmanned aerial vehicle multispectral remote sensing |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN117111092A (en) | High-spatial-resolution remote sensing water quality detection method based on machine learning | |
CN112051222A (en) | River and lake water quality monitoring method based on high-resolution satellite image | |
Heenkenda et al. | Quantifying mangrove chlorophyll from high spatial resolution imagery | |
CN111368817B (en) | Method and system for quantitatively evaluating thermal effect based on earth surface type | |
Roelfsema et al. | Spatial distribution of benthic microalgae on coral reefs determined by remote sensing | |
Chen et al. | Aboveground biomass of salt-marsh vegetation in coastal wetlands: Sample expansion of in situ hyperspectral and Sentinel-2 data using a generative adversarial network | |
Concepcion et al. | Estimation of photosynthetic growth signature at the canopy scale using new genetic algorithm-modified visible band triangular greenness index | |
CN115481368B (en) | Vegetation coverage estimation method based on full remote sensing machine learning | |
CN110987955A (en) | Urban black and odorous water body grading method based on decision tree | |
CN111178169A (en) | Urban surface covering fine classification method and device based on remote sensing image | |
CN112733596A (en) | Forest resource change monitoring method based on medium and high spatial resolution remote sensing image fusion and application | |
CN114781537B (en) | Sea entry and drainage port suspected pollution discharge identification method based on high-resolution satellite image | |
CN114219847B (en) | Method and system for determining crop planting area based on phenological characteristics and storage medium | |
CN109300133B (en) | Urban river network area water body extraction method | |
CN112329790B (en) | Quick extraction method for urban impervious surface information | |
CN116482317B (en) | Lake water nutrition state real-time monitoring method, system, equipment and medium | |
CN117115077A (en) | Lake cyanobacteria bloom detection method | |
Liu et al. | Trophic state assessment of optically diverse lakes using Sentinel-3-derived trophic level index | |
CN114612794B (en) | Remote sensing identification method for ground cover and planting structure of finely divided agricultural area | |
CN114813651A (en) | Remote sensing water quality inversion method combining difference learning rate and spectrum geometric characteristics | |
CN116148188A (en) | Air-space-ground integrated lake water quality tracing method, system, equipment and storage medium | |
CN114199880A (en) | Citrus disease and insect pest real-time detection method based on edge calculation | |
CN117372710A (en) | Forest gap extraction method based on Sentinel-2MSI remote sensing image | |
CN117035066A (en) | Ground surface temperature downscaling method coupling geographic weighting and random forest | |
CN116721385A (en) | Machine learning-based RGB camera data cyanobacteria bloom monitoring method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |