CN104359847A - Method and device for acquiring centroid set used for representing typical water category - Google Patents

Info

Publication number
CN104359847A
Authority
CN
China
Prior art keywords
data
water surface
surface sampling
water
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410742576.5A
Other languages
Chinese (zh)
Other versions
CN104359847B (en)
Inventor
张兵
申茜
李俊生
张方方
叶虎平
吴艳红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Remote Sensing and Digital Earth of CAS
Original Assignee
Institute of Remote Sensing and Digital Earth of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Remote Sensing and Digital Earth of CAS filed Critical Institute of Remote Sensing and Digital Earth of CAS
Priority to CN201410742576.5A priority Critical patent/CN104359847B/en
Publication of CN104359847A publication Critical patent/CN104359847A/en
Application granted granted Critical
Publication of CN104359847B publication Critical patent/CN104359847B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

The embodiment of the invention discloses a method and a device for acquiring a centroid set representing a typical water body category. The method comprises: acquiring water body sample data; reducing the dimensionality of the reflectance spectral data of each water surface sampling point to obtain two principal component data of the reflectance spectrum of each sampling point; determining the number of categories C; performing fuzzy classification on the water surface sampling points at least twice according to the two principal component data of the reflectance spectrum of each sampling point, each fuzzy classification being based on a different distance measure; and intersecting the initial centroid sets corresponding to the ith category obtained from all the fuzzy classifications to obtain the centroid set corresponding to the ith category, where i = 1, 2, ..., C. The method and the device improve the stability of the centroid set corresponding to the ith category and make the centroid set more representative.

Description

Method and device for acquiring centroid set representing typical water body category
Technical Field
The invention relates to the technical field of remote sensing, in particular to a method and a device for acquiring a centroid set representing typical water body categories.
Background
Remote sensing inversion of the water quality parameters of optically complex water bodies has long been difficult. At present, many studies adopt a classify-then-invert strategy: remote sensing pixels are classified first, and a different inversion strategy is then applied to each type of pixel, in order to improve inversion accuracy. Remote sensing pixel classification can be divided into hard classification and fuzzy classification. Fuzzy classification improves the smoothness and stability of the inversion result and is a main research direction in the field.
Fuzzy classification of remote sensing pixels calculates the weight coefficient with which each pixel belongs to each category, and the pixels are then classified according to these weight coefficients to obtain a plurality of centroid sets. The weight coefficient of a pixel for a category is determined by the distance between the pixel and the centroid pixel of that category: the smaller the distance, the more similar the pixel is considered to be to the centroid pixel, and the larger its weight coefficient for the category represented by that centroid pixel. Therefore, how to obtain the most effective centroid set representing typical water body categories is an important research topic when a fuzzy classification method is used to invert the water quality parameters of inland water bodies.
In the course of implementing the invention, the inventors found that the centroid sets representing typical water body categories obtained by current fuzzy classification have poor stability.
Disclosure of Invention
The invention aims to provide a method and a device for acquiring a centroid set representing a typical water body class, so as to improve the stability of the centroid set representing the typical water body class.
In order to achieve the purpose, the invention provides the following technical scheme:
a method of obtaining a set of centroids representative of a typical water body class, comprising:
acquiring water body sample data, wherein the water body sample data comprises remote sensing reflectivity spectrum data of a plurality of water surface sampling points;
performing dimensionality reduction on the reflectivity spectrum data of a plurality of wave bands of each water surface sampling point through principal component analysis (PCA) transformation to obtain two principal component data of the reflectivity spectrum of each water surface sampling point;
determining the number C of categories;
performing a fuzzy classification process at least twice, each fuzzy classification process being based on a different distance measure; the fuzzy classification process divides the water surface sampling points into C classes according to the two principal component data of the reflectivity spectrum of each water surface sampling point to obtain C initial centroid sets;
intersecting the initial centroid sets corresponding to the ith category obtained from each fuzzy classification process to obtain the centroid set corresponding to the ith category, where i = 1, 2, ..., C.
Preferably, in the above method, determining the number of categories C comprises:
calculating the BIC index of the plurality of water surface sampling points under different classification numbers according to the two principal component data of the reflectivity spectrum of each water surface sampling point;
and determining the classification number corresponding to the minimum BIC index as the category number C.
Preferably, in the above method, determining the number of categories C comprises:
calculating the Dunn's index of the plurality of water surface sampling points under different classification numbers according to the two principal component data of the reflectivity spectrum of each water surface sampling point;
and determining the classification number corresponding to the maximum Dunn's index as the category number C.
Preferably, in the above method, the water body sample data further includes water environment variables of the plurality of water surface sampling points, the water environment variables including water quality parameters and inherent optical quantities of different components of the water body; after obtaining the centroid set corresponding to the ith category, the method further comprises:
displaying the water environment variables of all samples in the centroid set corresponding to the ith category;
when a removal instruction triggered by a user is received, determining target sample data according to the identifier of the sample data carried in the removal instruction, deleting the target sample data, and performing again the step of performing the fuzzy classification process at least twice.
Preferably, in the above method, the sampling region of the water body sample data includes at least one of: inland waters and near-shore waters.
An apparatus for acquiring a set of centroids representative of a typical class of water, comprising:
a sample acquisition module, configured to acquire water body sample data, the water body sample data comprising remote sensing reflectivity spectrum data of a plurality of water surface sampling points;
a dimension reduction module, configured to perform dimensionality reduction on the reflectivity spectrum data of a plurality of wave bands of each water surface sampling point through principal component analysis (PCA) transformation to obtain two principal component data of the reflectivity spectrum of each water surface sampling point;
a determining module, configured to determine the number of categories C;
a classification module, configured to perform a fuzzy classification process at least twice, each fuzzy classification process being based on a different distance measure; the fuzzy classification process divides the water surface sampling points into C classes according to the two principal component data of the reflectivity spectrum of each water surface sampling point to obtain C initial centroid sets;
a centroid set acquisition module, configured to intersect the initial centroid sets corresponding to the ith category obtained from each fuzzy classification process to obtain the centroid set corresponding to the ith category, where i = 1, 2, ..., C.
Preferably, in the above apparatus, the determining module comprises:
a first calculation unit, configured to calculate the BIC index of the plurality of water surface sampling points under different classification numbers according to the two principal component data of the reflectivity spectrum of each water surface sampling point;
and a first determining unit, configured to determine the classification number corresponding to the minimum BIC index as the category number C.
Preferably, in the above apparatus, the determining module comprises:
a second calculation unit, configured to calculate the Dunn's index of the plurality of water surface sampling points under different classification numbers according to the two principal component data of the reflectivity spectrum of each water surface sampling point;
and a second determining unit, configured to determine the classification number corresponding to the maximum Dunn's index as the category number C.
Preferably, in the above apparatus, the water body sample data further includes water environment variables of the plurality of water surface sampling points, the water environment variables including water quality parameters and inherent optical quantities of different components of the water body; the apparatus further comprises:
a display module, configured to display the water environment variables of all samples in the centroid set corresponding to the ith category;
and a deleting module, configured to, when a removal instruction triggered by a user is received, determine target sample data according to the identifier of the sample data carried in the removal instruction, delete the target sample data, and generate a trigger instruction instructing the classification module to perform again the step of performing the fuzzy classification process at least twice.
Preferably, the sample acquisition module is specifically configured to acquire water body sample data, the water body sample data comprising remote sensing reflectivity spectrum data of a plurality of water surface sampling points; the sampling region of the water body sample data includes at least one of: inland waters and near-shore waters.
According to the above scheme, the method and the device for acquiring a centroid set representing a typical water body category acquire water body sample data, reduce the dimensionality of the reflectivity spectrum data of each water surface sampling point to obtain two principal component data of the reflectivity spectrum of each sampling point, determine the category number, and perform fuzzy classification on the water surface sampling points at least twice according to the two principal component data of each sampling point, with each fuzzy classification based on a different distance measure. The initial centroid sets corresponding to the ith class obtained from each fuzzy classification process are then intersected to obtain the centroid set corresponding to the ith class, where i = 1, 2, ..., C. This improves the stability of the centroid set corresponding to the ith category and makes the centroid set more representative.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flowchart of an implementation of a method for obtaining a centroid set representing a typical water body category according to an embodiment of the present application;
FIG. 2 is a flow chart of an implementation of determining the number of categories C according to an embodiment of the present disclosure;
FIG. 3 is a flow chart of another implementation of determining the number of categories C according to an embodiment of the present disclosure;
FIG. 4 is a flowchart of another implementation of a method for obtaining a centroid set representing a typical water body category according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an apparatus for acquiring a centroid set representing a typical water body category according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a determining module provided in an embodiment of the present application;
FIG. 7 is another schematic structural diagram of a determining module provided in an embodiment of the present application;
FIG. 8 is another schematic structural diagram of an apparatus for acquiring a centroid set representing a typical water body category according to an embodiment of the present application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be practiced otherwise than as specifically illustrated.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An implementation flowchart of a method for acquiring a centroid set representing a typical water body category provided by an embodiment of the present application is shown in FIG. 1, and may include:
Step S11: acquiring water body sample data, wherein the water body sample data comprises remote sensing reflectivity spectrum data of a plurality of water surface sampling points;
The remote sensing reflectivity spectrum data of the plurality of water surface sampling points may be remote sensing reflectance spectra obtained by actual measurement at each sampling point. The remote sensing reflectance spectrum of an optically complex water body is measured with the above-water measurement method, which is currently the common method for measuring the remote sensing reflectance spectrum of a water body and can remove the influence of skylight on the spectrum. One remote sensing reflectance spectrum can be acquired at each site (i.e., each sampling point). The remote sensing reflectance spectrum describes how the reflectance of light varies with wavelength; for example, it may cover at least 350 nm to 900 nm at 1 nm intervals, i.e., 551 bands, with each reflectance value being a real number between 0 and 1. In the embodiment of the invention, the spectral ranges and spectral resolutions (i.e., band intervals) of different sampling points are the same.
The remote sensing reflectivity spectrum data of the plurality of water surface sampling points may also come from an image, acquired by a remote sensor, that covers the optically complex water body. In general, a remote sensing image has M rows, N columns and L bands, giving Num = M × N pixels, and the L band values of each pixel may be regarded as the reflectance spectrum of the water surface point corresponding to that pixel.
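As a minimal sketch of how such an image could be turned into per-pixel spectra, the following NumPy snippet flattens an M × N × L array into Num = M × N rows of L band values. The function name `image_to_spectra` and the synthetic cube are illustrative assumptions, not part of the described method.

```python
import numpy as np

def image_to_spectra(image):
    """Flatten an (M, N, L) reflectance image into a (Num, L) array, Num = M * N."""
    image = np.asarray(image, dtype=float)
    if image.ndim != 3:
        raise ValueError("expected an array of shape (M, N, L)")
    m, n, l = image.shape
    # Row-major flattening: pixel (row r, column c) becomes row r * N + c.
    return image.reshape(m * n, l)

# Synthetic cube with M=2 rows, N=3 columns, L=4 bands (made-up values).
cube = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
spectra = image_to_spectra(cube)
```

Each row of `spectra` can then be treated like the spectrum of one measured sampling point.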
Step S12: performing dimensionality reduction on the reflectivity spectrum data of a plurality of wave bands of each water surface sampling point through principal component analysis transformation (i.e., PCA transformation) to obtain two principal component data of the reflectivity spectrum of each water surface sampling point;
In the embodiment of the invention, the reflectivity spectrum data of the L wave bands of each sampling point is expressed by two principal components through the PCA transformation. That is, the reflectivity spectrum data of each sampling point is reduced in dimension before classification, which reduces both the data volume and the amount of computation.
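The dimensionality reduction in step S12 can be sketched as follows. This is one common way to compute a PCA projection (singular value decomposition of the mean-centered data); the source does not specify an implementation, and the function name and random demo data are assumptions.

```python
import numpy as np

def first_two_principal_components(spectra):
    """Project (Num, L) reflectance spectra onto their first two principal components."""
    x = np.asarray(spectra, dtype=float)
    centered = x - x.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal axes,
    # ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

# Demo with random numbers standing in for 30 spectra of 551 bands each.
rng = np.random.default_rng(0)
demo_spectra = rng.random((30, 551))
scores = first_two_principal_components(demo_spectra)
```

Each row of `scores` holds the two principal component values of one sampling point, which are the inputs to the later classification steps.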
Step S13: determining the number C of categories;
in the embodiment of the invention, fuzzy classification can be carried out by adopting a fuzzy clustering algorithm. The fuzzy clustering is one of the unsupervised classifications, and the unsupervised classification requires the number of classes to be input in advance. Only if the number of classes is known can unsupervised classification of the sample be performed.
The number of categories C may be determined empirically by the researcher.
Step S14: performing a fuzzy classification process at least twice, each fuzzy classification process being based on a different distance measure; the fuzzy classification process divides the water surface sampling points into C classes according to the two principal component data of the reflectivity spectrum of each water surface sampling point to obtain C initial centroid sets;
Each time the fuzzy classification process is executed, C initial centroid sets are obtained. Each initial centroid set represents a typical water body category and comprises a plurality of water body samples that are similar or identical to that typical water body category.
Optionally, fuzzy c-means clustering (FCM) may be used to perform fuzzy classification on the water body sample data. In the embodiment of the present invention, the fuzzy classification is not limited to the FCM clustering algorithm; other fuzzy clustering algorithms, such as the improved fuzzy clustering algorithm EFC-md (evolution fuzzy clustering with fuzzy distances), may also be used. Any fuzzy clustering algorithm based on a distance measure is suitable for the embodiment of the present invention.
Distance is a simple and effective index for measuring the similarity of data. In the fuzzy classification process, the weight coefficient with which a pixel belongs to each class is determined according to the distance between the pixel and the centroid pixel of each class.
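To illustrate how distances can be turned into weight coefficients, the sketch below uses the standard fuzzy c-means membership formula, in which the weight for class i is proportional to d_i^(-2/(q-1)) for a fuzzifier q > 1. The source does not state which formula or fuzzifier it uses; q = 2 and the function name are assumptions.

```python
def membership_weights(distances, fuzzifier=2.0):
    """Return FCM-style weights for one sample from its distances to the C centroids."""
    if fuzzifier <= 1.0:
        raise ValueError("fuzzifier must be greater than 1")
    # A sample lying exactly on one or more centroids belongs fully to those classes.
    zeros = [d == 0.0 for d in distances]
    if any(zeros):
        hits = sum(zeros)
        return [1.0 / hits if z else 0.0 for z in zeros]
    power = 2.0 / (fuzzifier - 1.0)
    inverse = [d ** -power for d in distances]
    total = sum(inverse)
    # Smaller distance -> larger weight; the weights sum to 1.
    return [v / total for v in inverse]

weights = membership_weights([1.0, 2.0, 4.0])
```

With these example distances the nearest centroid receives the largest weight, matching the behaviour described in the text.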
In the embodiment of the present invention, the distance measure used in the jth fuzzy classification process is different from the distance measures used in the previous j-1 fuzzy classification processes, where j = 1, 2, 3, ..., J, J is the total number of times the fuzzy classification process is performed, and J is greater than or equal to 2. That is, the distance measures used in any two fuzzy classification processes are different.
Optionally, in the embodiment of the present invention, the distance measure may be, but is not limited to, one of the following: Euclidean distance (Euc), cosine distance (SAD), OPD divergence (orthogonal projection divergence), TD divergence (transformed divergence), Mahalanobis distance, and the like.
Preferably, the fuzzy classification process may be performed four times. In that case, one distance measure is used each time, so four distance measures are used in total. The selected distance measures may be: Euclidean distance, cosine distance, OPD divergence and TD divergence. Of course, in the embodiment of the present invention, the selection is not limited to these four; any four of the five distance measures listed above may be used.
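The repeated classification with different distance measures might be sketched as below. Several points are assumptions rather than statements from the source: the loop is a simplified fuzzy c-means with a pluggable distance, the centroid update is the weighted mean (exact for Euclidean distance, only a heuristic for other measures), initial centroids are evenly spaced samples, and each sample is placed in the initial centroid set of its highest-weight class. Only Euclidean and cosine distance are shown.

```python
import numpy as np

def euclidean(points, center):
    return np.linalg.norm(points - center, axis=1)

def cosine_distance(points, center):
    num = points @ center
    den = np.linalg.norm(points, axis=1) * np.linalg.norm(center)
    return 1.0 - num / np.maximum(den, 1e-12)

def fuzzy_classify(points, n_classes, distance, fuzzifier=2.0, n_iter=50):
    """Return (weights, sets): a (Num, C) weight matrix and C sets of sample indices."""
    points = np.asarray(points, dtype=float)
    # Assumption: deterministic start from evenly spaced samples.
    start = np.linspace(0, len(points) - 1, n_classes).astype(int)
    centers = points[start]
    power = 2.0 / (fuzzifier - 1.0)
    for _ in range(n_iter):
        d = np.stack([distance(points, c) for c in centers], axis=1)
        d = np.maximum(d, 1e-12)  # guard against division by zero
        inv = d ** -power
        weights = inv / inv.sum(axis=1, keepdims=True)
        wm = weights ** fuzzifier
        # Weighted-mean update (heuristic for non-Euclidean distances).
        centers = (wm.T @ points) / wm.sum(axis=0)[:, None]
    # Assumption: a sample joins the set of the class with its largest weight.
    labels = weights.argmax(axis=1)
    sets = [set(np.flatnonzero(labels == i).tolist()) for i in range(n_classes)]
    return weights, sets

# Synthetic two-component data: two well-separated groups of 20 samples each.
rng = np.random.default_rng(0)
group_a = rng.normal([10.0, 1.0], 0.3, size=(20, 2))
group_b = rng.normal([1.0, 10.0], 0.3, size=(20, 2))
points = np.vstack([group_a, group_b])

w_euc, sets_euc = fuzzy_classify(points, 2, euclidean)
w_cos, sets_cos = fuzzy_classify(points, 2, cosine_distance)
```

Because both runs start from the same samples, class i in one run corresponds to class i in the other here; in general the classes from different runs would need to be matched before their sets are compared.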
Step S15: intersecting the initial centroid sets corresponding to the ith category obtained from each fuzzy classification process to obtain the centroid set corresponding to the ith category, where i = 1, 2, ..., C.
Assuming that the C initial centroid sets obtained by the jth fuzzy classification are $U_{j1}, U_{j2}, \ldots, U_{jC}$, the centroid set $U_i$ corresponding to the ith class is $U_i = U_{1i} \cap U_{2i} \cap \cdots \cap U_{Ji}$, where $U_{ji}$ ($j = 1, 2, 3, \ldots, J$; $J$ is the total number of times the fuzzy classification process is performed, $J \geq 2$) is the initial centroid set corresponding to the ith class obtained by the jth fuzzy classification.
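The intersection in step S15 can be illustrated with plain Python sets. The sample identifiers and the nested-list layout (`runs[j][i]` is the initial centroid set of class i from run j) are hypothetical, and the sketch assumes the classes of different runs have already been matched to each other.

```python
from functools import reduce

def intersect_centroid_sets(runs):
    """runs[j][i] is the initial centroid set of class i from run j; return C intersections."""
    if len(runs) < 2:
        raise ValueError("at least two fuzzy classification runs are required")
    n_classes = len(runs[0])
    if any(len(run) != n_classes for run in runs):
        raise ValueError("every run must yield the same number of classes")
    # Keep only the samples that every run assigned to class i.
    return [
        reduce(set.intersection, (set(run[i]) for run in runs))
        for i in range(n_classes)
    ]

runs = [
    [{"s1", "s2", "s3"}, {"s4", "s5"}],  # e.g. run based on Euclidean distance
    [{"s1", "s2"}, {"s3", "s4", "s5"}],  # e.g. run based on cosine distance
]
final_sets = intersect_centroid_sets(runs)
```

Sample `s3` is dropped from both classes because the two runs disagree on it, which is how the intersection keeps only samples whose class is stable across distance measures.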
By analyzing the environmental parameters of each sample in the centroid set corresponding to the ith class, the typical water body represented by the centroid set corresponding to the ith class can be determined. How to determine what typical water body the centroid set corresponding to the ith category represents belongs to the common general knowledge in the art, and is not described here in detail.
Averaging the reflectivity spectra of the samples in the centroid set corresponding to the ith class yields the centroid spectrum representing that typical water body class.
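A minimal sketch of this averaging, assuming the spectra are stored in a dictionary keyed by a hypothetical sample identifier and that all spectra share the same bands:

```python
def centroid_spectrum(spectra_by_id, centroid_set):
    """Average, band by band, the spectra of the samples in one centroid set."""
    members = [spectra_by_id[sample_id] for sample_id in sorted(centroid_set)]
    if not members:
        raise ValueError("the centroid set is empty")
    # zip(*members) groups the values of all member spectra band by band.
    return [sum(band) / len(members) for band in zip(*members)]

# Made-up three-band spectra for illustration only.
spectra_by_id = {
    "s1": [0.02, 0.04, 0.03],
    "s2": [0.04, 0.06, 0.05],
    "s3": [0.10, 0.20, 0.30],
}
mean_spectrum = centroid_spectrum(spectra_by_id, {"s1", "s2"})
```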
In summary, the method for acquiring a centroid set representing a typical water body category provided by the embodiment of the invention acquires water body sample data, reduces the dimensionality of the reflectivity spectrum data of each water surface sampling point to obtain two principal component data of the reflectivity spectrum of each sampling point, determines the category number, and performs fuzzy classification on the water surface sampling points at least twice according to the two principal component data of each sampling point, with each fuzzy classification based on a different distance measure. The initial centroid sets corresponding to the ith class obtained from each fuzzy classification process are then intersected to obtain the centroid set corresponding to the ith class, where i = 1, 2, ..., C. This improves the stability of the centroid set corresponding to the ith category and makes the centroid set more representative.
Preferably, because determining the category number C empirically is not sufficiently objective, the embodiment of the present invention proposes that the category number C be determined from the sample data itself.
Optionally, an implementation flowchart of determining the category number C is shown in fig. 2, and may include:
Step S21: calculating the BIC index of the plurality of water surface sampling points under different classification numbers according to the two principal component data of the reflectivity spectrum of each water surface sampling point;
The BIC index is an index for evaluating the effectiveness of fuzzy clustering. In the embodiment of the invention, the BIC index is calculated from the principal component data obtained after dimensionality reduction of each water surface sampling point, rather than from the originally acquired reflectivity spectrum data, which reduces the amount of computation needed to calculate the BIC index.
In the embodiment of the invention, the classification number is changed from 2 to 16 one by one, and the BIC index of the sample data is calculated once every time the classification number is changed. When calculating the BIC index, the sample data after PCA transformation is used for calculation. The BIC index may be calculated according to the following formula:
$$\mathrm{BIC} = \sum_{i=1}^{m} \left( n_i \log n_i - n_i \log n - \frac{n_i d}{2} \log(2\pi) - \frac{n_i}{2} \log \Sigma_i - \frac{n_i - m}{2} \right) - \frac{1}{2} m \log n$$
where $n_i$ is the number of samples in the ith category; $d$ is the data dimension of each water surface sampling point, here equal to 2; $m$ is the number of classifications; $n$ is the total number of water surface sampling points; and $\Sigma_i$ is the maximum likelihood estimate of the variance of the ith class, in which $x_j$ is the jth sample point within the ith class and $C_i$ is the centroid of the ith class.
Step S22: determining the classification number corresponding to the minimum BIC index as the category number C.
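The BIC expression and the selection in step S22 could be evaluated as in the sketch below. The explicit form of the variance estimate is not given in the text; the code assumes it is the mean squared Euclidean distance of the class samples to the class centroid, and it takes a hard partition (a list of point lists) as input. Both choices, and all names, are assumptions.

```python
import math

def bic_index(clusters, dim=2):
    """Evaluate the BIC expression for a partition given as lists of dim-dimensional points."""
    m = len(clusters)
    n = sum(len(points) for points in clusters)
    total = 0.0
    for points in clusters:
        n_i = len(points)
        if n_i == 0:
            raise ValueError("empty classes are not supported")
        center = [sum(p[k] for p in points) / n_i for k in range(dim)]
        squared = sum(sum((p[k] - center[k]) ** 2 for k in range(dim)) for p in points)
        var_i = squared / n_i  # assumed form of the variance estimate
        if var_i <= 0.0:
            raise ValueError("each class needs a positive variance estimate")
        total += (
            n_i * math.log(n_i)
            - n_i * math.log(n)
            - n_i * dim / 2.0 * math.log(2.0 * math.pi)
            - n_i / 2.0 * math.log(var_i)
            - (n_i - m) / 2.0
        )
    return total - 0.5 * m * math.log(n)

def select_class_number(bic_by_count):
    """Step S22: pick the classification number with the minimum BIC index."""
    return min(bic_by_count, key=bic_by_count.get)

single = bic_index([[(0.0, 0.0), (2.0, 0.0)]])
# Made-up index values, used only to show the selection rule.
chosen = select_class_number({2: 5.0, 3: -1.0, 4: 2.0})
```

In practice `bic_by_count` would hold one value per classification number from 2 to 16, each computed from a classification of the PCA-transformed samples.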
In the embodiment of the invention, the optimal class number C is determined based on the sample, so that the stability of the obtained centroid set is further enhanced.
Optionally, another implementation flowchart for determining the category number C is shown in fig. 3, and may include:
Step S31: calculating the Dunn's index of the plurality of water surface sampling points under different classification numbers according to the two principal component data of the reflectivity spectrum of each water surface sampling point;
The Dunn's index is another index for evaluating the effectiveness of fuzzy clustering. In the embodiment of the invention, the Dunn's index is calculated from the principal component data obtained after dimensionality reduction of each water surface sampling point, rather than from the originally acquired reflectivity spectrum data, which reduces the amount of computation needed to calculate the Dunn's index.
In the embodiment of the invention, the classification number is changed from 2 to 16 one by one, and the Dunn's index of the sample data is calculated once for each classification number. The Dunn's index is calculated from the sample data after the PCA transformation.
Step S32: determining the classification number corresponding to the maximum Dunn's index as the category number C.
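The text does not give a formula for this index. Assuming it refers to the standard Dunn index (the smallest distance between samples of different classes divided by the largest within-class diameter, with larger values indicating better separation), a minimal sketch on a hard partition could look like this; the function name and demo points are illustrative.

```python
import math
from itertools import combinations

def dunn_index(clusters):
    """Smallest between-class distance divided by the largest within-class diameter."""
    if len(clusters) < 2 or any(len(c) == 0 for c in clusters):
        raise ValueError("need at least two non-empty classes")
    # Largest distance between two samples of the same class.
    diameter = max(
        (math.dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    if diameter == 0.0:
        raise ValueError("all classes are single points; the index is undefined")
    # Smallest distance between two samples of different classes.
    separation = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    return separation / diameter

demo = [[(0.0, 0.0), (1.0, 0.0)], [(5.0, 0.0), (6.0, 0.0)]]
value = dunn_index(demo)
```

Under this assumption, step S32 would compute the index for each classification number from 2 to 16 and keep the number with the largest value.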
In the embodiment of the invention, the optimal class number C is determined based on the sample, so that the stability of the obtained centroid set is further enhanced.
Further, the obtained water body sample data also includes water environment variables of the plurality of water surface sampling points, the water environment variables including water quality parameters and inherent optical quantities of different components of the water body. The water quality parameters may include: chlorophyll a concentration, total particulate matter concentration, organic particulate matter concentration, inorganic particulate matter concentration, dissolved organic carbon concentration and the like. The inherent optical quantities of different components of the water body may include: the beam attenuation coefficient of the particles and the CDOM, the absorption coefficient of the particles, the absorption coefficient of the non-algal particles, the absorption coefficient of phytoplankton, the scattering coefficient of the particles and the like.
FIG. 4 is a flowchart of another implementation of the method for obtaining a centroid set representing typical water body categories provided in the present application. On the basis of the embodiment shown in FIG. 1, after the centroid set corresponding to the ith category is obtained, the method further includes:
Step S41: displaying the water environment variables of all samples in the centroid set corresponding to the ith category;
Step S42: when a removal instruction triggered by a user is received, determining target sample data according to the identifier of the sample data carried in the removal instruction, deleting the target sample data, and performing again the step of performing the fuzzy classification process at least twice.
Researchers (i.e., users) can judge from experience whether the water environment variables of the sampling points in the ith centroid set are obviously abnormal. When a researcher judges that the water environment variables of a sampling point in the ith centroid set are obviously abnormal, the researcher triggers a removal instruction for deleting the obviously abnormal sample data, and the removal instruction carries the identifier of the sample data to be removed.
To avoid the situation in which different objects share the same spectrum, in the embodiment of the invention researchers analyze the measured environmental parameters of the samples in the same centroid set. If an abnormality is found, the sample data of the obviously abnormal sampling points is removed, and steps S14 and S15 are executed again to obtain a new centroid set. This is repeated until the water environment variables of the sampling points in every centroid set show no obvious abnormality. Avoiding the same-spectrum-from-different-objects situation ensures that the classified water bodies represent physically meaningful types.
When a user-triggered determination instruction is received, the centroid spectrum of the centroid set of the ith class may be output. Other operation results, such as the ith centroid set and the like, can also be output according to the requirements of the user.
Optionally, in the above embodiment, to give the determined centroid set a wider range of application, water body sample data is collected from a plurality of water areas. Specifically, in the embodiment of the present invention, the sampling region of the water body sample data may include, but is not limited to, at least one of the following: inland waters, near-shore waters, and the like. The inland waters may include rivers, lakes, ponds, small dams, reservoirs and other water areas. In the embodiment of the invention, sampling may be carried out in only one inland water area or in two or more inland water areas.
For example, water body sample data can be collected at Lake Taihu, the Three Gorges Reservoir, Dianchi Lake, Chaohu Lake and the Yellow River estuary.
Corresponding to the method embodiment, the present application further provides an apparatus for acquiring a centroid set representing typical water body categories. A schematic structural diagram of the apparatus is shown in fig. 5, and the apparatus may include:
a sample acquisition module 51, a dimension reduction module 52, a determination module 53, a classification module 54 and a centroid set acquisition module 55; wherein,
the sample acquisition module 51 is configured to acquire water sample data, where the water sample data includes remote sensing reflectance spectrum data of a plurality of water surface sampling points;
The remote sensing reflectance spectrum data of the plurality of water surface sampling points may be remote sensing reflectance spectra measured in situ at each sampling point. The remote sensing reflectance spectrum of an optically complex water body is measured with the above-water measurement method, which is currently the general method for measuring the remote sensing reflectance spectrum of a water body and removes the influence of skylight on the spectrum. One remote sensing reflectance spectrum is acquired at each station (that is, each sampling point). A remote sensing reflectance spectrum describes how reflectance varies with wavelength; for example, it may cover at least 350 nm to 900 nm at 1 nm intervals, giving 551 bands, with each reflectance value being a real number between 0 and 1. In the embodiment of the invention, the spectral range and spectral resolution (that is, the band interval) are the same for all sampling points.
The remote sensing reflectance spectrum data of the plurality of water surface sampling points may also be an image, acquired by a remote sensor, that covers the optically complex water body. In general, a remote sensing image has M rows, N columns and L bands, so it contains M × N = Num pixels, and the L band values of each pixel may be regarded as the reflectance spectrum of the water surface point corresponding to that pixel.
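As a non-limiting illustration of treating each pixel as a sampling point, the following sketch flattens an image with M rows, N columns and L bands into Num = M × N spectra. The function name `image_to_spectra`, the use of NumPy, and the row-major pixel ordering are assumptions made for this example and are not specified by the embodiment.

```python
import numpy as np

def image_to_spectra(image: np.ndarray) -> np.ndarray:
    """Flatten an (M, N, L) image into a (Num, L) array of per-pixel spectra."""
    if image.ndim != 3:
        raise ValueError("expected an array of shape (M, N, L)")
    m, n, l = image.shape
    # Row-major order: pixel (row, col) becomes row index row * N + col.
    return image.reshape(m * n, l)

# Toy image: 2 rows, 3 columns, 4 bands (values are placeholders).
image = np.arange(24, dtype=float).reshape(2, 3, 4)
spectra = image_to_spectra(image)
```

Each row of `spectra` can then be handled in the same way as a spectrum measured at a field station.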
The dimension reduction module 52 is configured to perform dimension reduction on the reflectance spectrum data of a plurality of wave bands of each water surface sampling point through principal component analysis and transformation to obtain two principal component data of the reflectance spectrum of each water surface sampling point;
In the embodiment of the invention, the reflectance spectrum data of the L bands of each sampling point are represented by two principal components after the PCA transformation. That is, before classification, the dimension of the reflectance spectrum data of each sampling point is reduced, which reduces both the data volume and the amount of calculation.
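The dimension reduction described above can be sketched as follows. This is a minimal example that assumes PCA is computed from the singular value decomposition of the mean-centered spectra; the function name `first_two_principal_components` and the random placeholder data are hypothetical, and the embodiment does not prescribe a particular PCA implementation.

```python
import numpy as np

def first_two_principal_components(spectra: np.ndarray) -> np.ndarray:
    """Project (num_points, L) reflectance spectra onto their first two PCs."""
    centered = spectra - spectra.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

rng = np.random.default_rng(0)
spectra = rng.random((30, 551))   # placeholder: 30 points, 551 bands
scores = first_two_principal_components(spectra)
```

Each sampling point is then described by two numbers instead of L band values, which is what the later classification steps operate on.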
The determining module 53 is configured to determine the number C of categories;
In the embodiment of the invention, fuzzy classification can be carried out with a fuzzy clustering algorithm. Fuzzy clustering is a form of unsupervised classification, and unsupervised classification requires the number of classes to be supplied in advance. Only when the number of classes is known can the samples be classified without supervision.
The number of categories C may be determined empirically by the researcher.
The classification module 54 is configured to perform the fuzzy classification process at least twice, each execution being based on a different distance measure; the fuzzy classification process divides the water surface sampling points into C classes according to the two principal component data of the reflectance spectrum of each water surface sampling point to obtain C initial centroid sets;
each time the fuzzy classification process is executed, C initial centroid sets are obtained.
Optionally, fuzzy c-means clustering (FCM) may be used to perform fuzzy classification on the water body sample data. Of course, in the embodiment of the present invention, the fuzzy classification is not limited to the FCM clustering algorithm; other fuzzy clustering algorithms, such as the improved fuzzy clustering algorithm EFC-md (evolution fuzzy clustering with fuzzy distances), may be used. Any fuzzy clustering algorithm based on a distance measure is suitable for the embodiment of the present invention.
Distance is a simple and effective index for measuring the similarity of data. In the fuzzy classification process, the weight coefficient of a pixel for each class is determined by the distance between the pixel and the centroid pixel of that class.
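The relation between distance and weight coefficient can be sketched with a basic fuzzy c-means loop. This is only an illustration under stated assumptions: the fuzzifier `m = 2`, the fixed iteration count, the caller-supplied `init_centers`, and the hard assignment by largest weight are all choices made for this example, and the weighted-mean centroid update is the one derived for the Euclidean distance.

```python
import numpy as np

def euclidean(points, centers):
    """Pairwise Euclidean distances, shape (num_points, num_centers)."""
    return np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)

def memberships(points, centers, distance, m):
    """FCM weight coefficients: closer centroids get larger weights."""
    d = np.maximum(distance(points, centers), 1e-12)  # guard against d == 0
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)       # each row sums to 1

def fuzzy_c_means(points, init_centers, distance=euclidean, m=2.0, iters=50):
    """Alternate membership and centroid updates for a fixed number of steps."""
    centers = np.asarray(init_centers, dtype=float)
    for _ in range(iters):
        w = memberships(points, centers, distance, m) ** m
        centers = (w.T @ points) / w.sum(axis=0)[:, None]
    return memberships(points, centers, distance, m), centers

# Toy principal-component scores: two well-separated groups of three points.
pcs = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]])
u, centers = fuzzy_c_means(pcs, init_centers=pcs[[0, 3]])
labels = u.argmax(axis=1)   # hard assignment: class with the largest weight
```

Passing a different function as `distance` changes how the weights are computed, which is one way the same routine could be reused for each run of the classification process.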
In the embodiment of the present invention, the distance measure used in the j-th fuzzy classification process is different from the distance measures used in the previous j-1 fuzzy classification processes, where j = 1, 2, 3, …, J, J is the total number of times the fuzzy classification process is performed, and J is greater than or equal to 2. That is, the distance measures used in any two fuzzy classification processes are different.
Optionally, in the embodiment of the present invention, the distance measure may be, but is not limited to, one of the following: Euclidean distance (Euc), cosine distance (SAD), orthogonal projection divergence (OPD), transformed divergence (TD), Mahalanobis distance, and the like.
Preferably, the fuzzy classification process may be performed four times. In that case, one distance measure is used each time, so four distance measures are used in total. The selected distance measures may be: Euclidean distance, cosine distance, OPD divergence and TD divergence. Of course, in the embodiment of the present invention, the distance measures are not limited to these four and may be any four of the five distance measures listed above.
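Two of the listed measures can be written down directly. The sketch below shows the Euclidean distance and, as an assumption about what "cosine distance (SAD)" denotes, the spectral angle between two vectors; the function names are hypothetical. OPD divergence, TD divergence and Mahalanobis distance are omitted because their exact forms are not given in this section.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def spectral_angle_distance(a, b):
    """Angle in radians between two non-zero vectors; ignores overall scale."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, dot / norm)))

d_euc = euclidean_distance([0.0, 0.0], [3.0, 4.0])
d_sad = spectral_angle_distance([1.0, 0.0], [0.0, 1.0])
```

The two measures respond differently to the same data: scaling a vector changes its Euclidean distance to other vectors but not its spectral angle, which is one reason runs based on different measures may group samples differently.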
The centroid set acquisition module 55 is configured to intersect the initial centroid sets corresponding to the i-th category obtained from each execution of the fuzzy classification process, so as to obtain the centroid set corresponding to the i-th category; wherein i = 1, 2, …, C.
Assuming that the C initial centroid sets obtained by the j-th fuzzy classification are $U_{j1}, U_{j2}, \ldots, U_{jC}$, the centroid set $U_i$ corresponding to the i-th class is $U_i = U_{1i} \cap U_{2i} \cap \cdots \cap U_{Ji}$, where $U_{ji}$ ($j = 1, 2, \ldots, J$; $J$ is the total number of times the fuzzy classification process is performed, $J \geq 2$) is the initial centroid set corresponding to the i-th class obtained by the j-th fuzzy classification.
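The intersection can be illustrated with ordinary sets of sample identifiers. In this sketch `runs[j][i]` stands for the initial centroid set of category i from run j; the identifiers are invented for the example, and it is assumed that category i denotes the same class in every run, a correspondence that this section does not describe how to establish.

```python
from functools import reduce

def intersect_centroid_sets(runs):
    """runs[j][i] is the set of sample IDs for category i in run j."""
    num_categories = len(runs[0])
    return [
        reduce(set.intersection, (run[i] for run in runs))
        for i in range(num_categories)
    ]

runs = [
    [{1, 2, 3, 4}, {5, 6, 7}],   # run 1, e.g. Euclidean distance
    [{1, 2, 3}, {4, 5, 6, 7}],   # run 2, e.g. cosine distance
    [{1, 2, 3, 5}, {6, 7}],      # run 3, a third distance measure
]
stable_sets = intersect_centroid_sets(runs)
```

Samples 4 and 5 change category depending on the distance measure, so they drop out; only samples that every run places in the same category remain.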
By analyzing the environmental parameters of each sample in the centroid set corresponding to the ith class, the typical water body represented by the centroid set corresponding to the ith class can be determined. How to determine what typical water body the centroid set corresponding to the ith category represents belongs to the common general knowledge in the art, and is not described here in detail.
Averaging the reflectance spectra in the centroid set corresponding to the i-th class yields the centroid spectrum representing that typical water body class.
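The averaging step can be sketched as a band-by-band mean over the members of one centroid set. The dictionary `spectra_by_id`, the three-band spectra and the function name are placeholders chosen for this example.

```python
def centroid_spectrum(spectra_by_id, centroid_set):
    """Band-by-band mean of the spectra whose IDs are in the centroid set."""
    members = [spectra_by_id[sample_id] for sample_id in sorted(centroid_set)]
    if not members:
        raise ValueError("centroid set is empty")
    return [sum(band) / len(members) for band in zip(*members)]

# Placeholder spectra with three bands each (real spectra have L bands).
spectra_by_id = {
    1: [0.02, 0.04, 0.06],
    2: [0.04, 0.06, 0.08],
    3: [0.50, 0.50, 0.50],
}
mean_spectrum = centroid_spectrum(spectra_by_id, {1, 2})
```

Sample 3 is not in the centroid set and therefore has no influence on the result.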
The apparatus for acquiring a centroid set representing typical water body categories provided by the embodiment of the invention acquires water body sample data; performs dimensionality reduction on the reflectance spectrum data of each water surface sampling point to obtain two principal component data of the reflectance spectrum of each sampling point; determines the number of categories; and performs fuzzy classification on the water surface sampling points at least twice according to the two principal component data of the reflectance spectrum of each water surface sampling point, wherein each fuzzy classification is based on a different distance measure. The initial centroid sets corresponding to the i-th class obtained from each execution of the fuzzy classification process are intersected to obtain the centroid set corresponding to the i-th class, where i = 1, 2, …, C. This improves the stability of the centroid set corresponding to the i-th class and makes the centroid set more representative.
Optionally, a schematic structural diagram of the determining module 53 is shown in fig. 6, and may include:
a first calculation unit 61 and a first determination unit 62; wherein,
the first calculating unit 61 is used for calculating the BIC indexes of the water surface sampling points under different classification numbers according to the two principal component data of the reflectance spectrum of each water surface sampling point;
the BIC index is an index for evaluating the effectiveness of fuzzy clustering. According to the embodiment of the invention, the BIC index is calculated by using the principal component data obtained after dimensionality reduction of each water surface sampling point instead of the originally acquired reflectivity spectrum data of the water surface sampling point, so that the calculated amount of calculating the BIC index is reduced.
In the embodiment of the invention, the classification number is changed from 2 to 16 one by one, and the BIC index of the sample data is calculated once every time the classification number is changed. When calculating the BIC index, the sample data after PCA transformation is used for calculation.
The first determining unit 62 is configured to determine the classification number corresponding to the minimum BIC index as the category number C.
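The selection rule can be sketched independently of how the index is computed. In the example below the index values are made-up numbers used only to show the mechanics; the helper name `select_category_number` and its `prefer` argument are hypothetical, and this section does not give the BIC formula itself.

```python
def select_category_number(index_by_count, prefer="min"):
    """Return the classification number whose index value is best."""
    if prefer not in ("min", "max"):
        raise ValueError("prefer must be 'min' or 'max'")
    chooser = min if prefer == "min" else max
    return chooser(index_by_count, key=index_by_count.get)

# Made-up BIC values for classification numbers 2 to 6 (illustration only).
bic_by_count = {2: 412.0, 3: 380.5, 4: 371.2, 5: 376.9, 6: 390.3}
best_c = select_category_number(bic_by_count, prefer="min")
```

In the embodiment the table would cover classification numbers 2 to 16 and would be filled from the principal component data rather than typed in by hand.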
In the embodiment of the invention, the optimal class number C is determined based on the sample, so that the stability of the obtained centroid set is further enhanced.
Optionally, another schematic structural diagram of the determining module 53 is shown in fig. 7, and may include:
a second calculation unit 71 and a second determination unit 72; wherein,
the second calculating unit 71 is configured to calculate Dunn's indexes of the plurality of water surface sampling points under different classification numbers according to the two principal component data of the reflectance spectrum of each water surface sampling point;
Dunn's index is another indicator for evaluating the effectiveness of fuzzy clustering. In the embodiment of the invention, Dunn's index is calculated from the principal component data obtained after dimensionality reduction of each water surface sampling point instead of the originally acquired reflectance spectrum data, which reduces the amount of calculation needed for Dunn's index.
In the embodiment of the invention, the classification number is changed from 2 to 16 one by one, and Dunn's index of the sample data is calculated once for each classification number. Dunn's index is calculated from the sample data after the PCA transformation.
The second determining unit 72 is configured to determine the classification number corresponding to the maximum Dunn's index as the category number C.
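For illustration, Dunn's index is commonly defined as the smallest distance between points of different clusters divided by the largest distance between points of the same cluster, so larger values indicate compact, well-separated clusters. The sketch below follows that common definition with Euclidean distance and hard labels; these are assumptions for the example, since this section does not state the exact formula used.

```python
import math
from itertools import combinations

def dunn_index(points, labels):
    """Smallest between-cluster distance over largest within-cluster distance."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    # Largest distance between two points of the same cluster.
    max_diameter = max(
        (math.dist(p, q) for members in clusters.values()
         for p, q in combinations(members, 2)),
        default=0.0,
    )
    # Smallest distance between two points of different clusters.
    min_separation = min(
        math.dist(p, q)
        for a, b in combinations(clusters.values(), 2)
        for p in a for q in b
    )
    return min_separation / max_diameter if max_diameter > 0 else math.inf

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
good = dunn_index(points, [0, 0, 1, 1])   # groups match the geometry
poor = dunn_index(points, [0, 1, 0, 1])   # groups cut across the geometry
```

Evaluating such an index for each classification number from 2 to 16 and keeping the number with the largest value corresponds to the rule applied by the second determining unit.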
In the embodiment of the invention, the optimal class number C is determined based on the sample, so that the stability of the obtained centroid set is further enhanced.
Further, in the embodiment of the present invention, the water body sample data may further include water environment variables of the plurality of water surface sampling points, the water environment variables including: water quality parameters and inherent optical quantities of different components of the water body. The water quality parameters may include: chlorophyll a concentration, total particulate matter concentration, organic particulate matter concentration, inorganic particulate matter concentration, dissolved organic carbon concentration and the like. The inherent optical quantities of different components of a water body may include: the beam attenuation coefficient of particles and CDOM, the absorption coefficient of particles, the absorption coefficient of non-algal particles, the absorption coefficient of phytoplankton, the scattering coefficient of particles and the like.
On the basis of the embodiment shown in fig. 5, another structural schematic diagram of the apparatus for acquiring a centroid set representing a typical water body category provided by the present application is shown in fig. 8, and may further include:
a display module 81 and a deletion module 82; wherein,
the display module 81 is used for displaying the water body environment variable parameters of each sample in the centroid set corresponding to the ith category;
the deleting module 82 is configured to, when a removal instruction triggered by a user is received, determine target sample data according to the identifier of the sample data carried in the removal instruction, delete the target sample data, and generate a trigger instruction that instructs the classification module to perform again the step of executing the fuzzy classification process at least twice.
Whether the water environment variables of the sampling points in the i-th centroid set are obviously abnormal can be judged by researchers (that is, users) from experience. When a researcher judges that the water environment variables of a sampling point in the i-th centroid set are obviously abnormal, the researcher triggers a removal instruction to delete the abnormal sample data; the removal instruction carries the identifier of the sample data to be removed.
To avoid the phenomenon of different objects sharing the same spectrum, in the embodiment of the present invention a researcher analyzes the measured environmental parameters of the samples in the same centroid set. If any parameters are abnormal, the classification module 54 and the centroid set acquisition module 55 are triggered to run again to obtain a new centroid set. This is repeated until the water environment variables of the sampling points in every centroid set show no obvious abnormality. Because water bodies of different types that happen to share a spectrum are thereby kept out of the same class, each classified water body type has physical significance.
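The deletion step can be sketched as filtering a sample table by the identifiers carried in the removal instruction. The sample identifiers, the dictionary layout and the function name `remove_samples` are hypothetical; after the removal, the classification and intersection steps would be run again on the remaining samples.

```python
def remove_samples(samples, ids_to_remove):
    """Return a copy of the sample table without the flagged sample IDs."""
    ids_to_remove = set(ids_to_remove)
    missing = ids_to_remove - set(samples)
    if missing:
        raise KeyError(f"unknown sample IDs: {sorted(missing)}")
    return {sid: data for sid, data in samples.items()
            if sid not in ids_to_remove}

# Placeholder table: sample ID -> two principal component values.
samples = {"S01": (0.1, 0.2), "S02": (0.4, 0.1), "S03": (0.9, 0.7)}
remaining = remove_samples(samples, {"S02"})
```

Returning a new table rather than editing the original in place keeps the full data set available if the researcher later decides that a removal was a mistake.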
In the above embodiment, optionally, the sample acquisition module is specifically configured to acquire water body sample data, where the water body sample data includes remote sensing reflectance spectrum data of a plurality of water surface sampling points; the sampling region of the water body sample data may include, but is not limited to, at least one of the following: inland waters, nearshore waters, and the like. The inland waters may include rivers, lakes, ponds, pond dams, reservoirs and other water areas. In the embodiment of the invention, sampling may be carried out in only one inland water area or in two or more inland water areas.
For example, water body sample data can be collected at Lake Taihu, the Three Gorges Reservoir, Dianchi Lake, Chaohu Lake and the Yellow River estuary.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of obtaining a set of centroids representative of a typical water body class, comprising:
acquiring water body sample data, wherein the water body sample data comprises remote sensing reflectivity spectrum data of a plurality of water surface sampling points;
performing dimensionality reduction on the reflectivity spectrum data of a plurality of wave bands of each water surface sampling point through principal component analysis and transformation to obtain two principal component data of the reflectivity spectrum of each water surface sampling point;
determining the number C of categories;
performing a fuzzy classification process at least twice, wherein each execution of the fuzzy classification process is based on a different distance measure; the fuzzy classification process divides the water surface sampling points into C classes according to the two principal component data of the reflectivity spectrum of each water surface sampling point to obtain C initial centroid sets;
intersecting the initial centroid sets corresponding to the i-th category obtained from each execution of the fuzzy classification process to obtain a centroid set corresponding to the i-th category; wherein i = 1, 2, …, C.
2. The method of claim 1, wherein the determining the number of classes C comprises:
calculating the BIC indexes of the water surface sampling points under different classification numbers according to the two principal component data of the reflectivity spectrum of each water surface sampling point;
and determining the classification number corresponding to the minimum BIC index as the class number C.
3. The method of claim 1, wherein the determining the number of classes C comprises:
calculating Dunn's indexes of the plurality of water surface sampling points under different classification numbers according to the two principal component data of the reflectivity spectrum of each water surface sampling point;
and determining the classification number corresponding to the maximum Dunn's index as the category number C.
4. The method of claim 1, wherein the water body sample data further comprises water environment variables of the plurality of water surface sampling points, the water environment variables comprising: water quality parameters and inherent optical quantities of different components of the water body; after obtaining the centroid set corresponding to the i-th category, the method further comprises:
displaying the water body environment variable parameters of all samples in the centroid set corresponding to the ith category;
when a removal instruction triggered by a user is received, determining target sample data according to the identifier of the sample data carried in the removal instruction, deleting the target sample data, and performing again the step of executing the fuzzy classification process at least twice.
5. The method of claim 1, wherein the sampling region of water body sample data comprises at least one of: inland waters and near-shore waters.
6. An apparatus for acquiring a set of centroids representative of a typical class of water, comprising:
the sample acquisition module is used for acquiring water body sample data, the water body sample data comprising remote sensing reflectivity spectrum data of a plurality of water surface sampling points;
the dimension reduction module is used for reducing the dimension of the reflectivity spectrum data of a plurality of wave bands of each water surface sampling point through principal component analysis and transformation to obtain two principal component data of the reflectivity spectrum of each water surface sampling point;
a determining module for determining the number of categories C;
the classification module is used for executing a fuzzy classification process at least twice, each execution of the fuzzy classification process being based on a different distance measure; the fuzzy classification process divides the acquired water surface sampling points into C classes according to the two principal component data of the reflectivity spectrum of each water surface sampling point to obtain C initial centroid sets;
the centroid set acquisition module is used for intersecting the initial centroid sets corresponding to the i-th category obtained from each execution of the fuzzy classification process to obtain a centroid set corresponding to the i-th category; wherein i = 1, 2, …, C.
7. The apparatus of claim 6, wherein the determining module comprises:
the first calculation unit is used for calculating the BIC indexes of the water surface sampling points under different classification numbers according to the two principal component data of the reflectivity spectrum of each water surface sampling point;
and the first determining unit is used for determining the classification number corresponding to the minimum BIC index as the category number C.
8. The apparatus of claim 6, wherein the determining module comprises:
the second calculation unit is used for calculating Dunn's indexes of the plurality of water surface sampling points under different classification numbers according to the two principal component data of the reflectivity spectrum of each water surface sampling point;
and the second determining unit is used for determining the classification number corresponding to the maximum Dunn's index as the category number C.
9. The apparatus of claim 6, wherein the water body sample data further comprises water environment variables of the plurality of water surface sampling points, the water environment variables comprising: water quality parameters and inherent optical quantities of different components of the water body; the apparatus further comprises:
the display module is used for displaying the water body environment variable parameters of all samples in the centroid set corresponding to the ith category;
and the deleting module is used for, when a removal instruction triggered by a user is received, determining target sample data according to the identifier of the sample data carried in the removal instruction, deleting the target sample data, and generating a trigger instruction that instructs the classification module to perform again the step of executing the fuzzy classification process at least twice.
10. The device of claim 6, wherein the sample acquisition module is specifically configured to acquire water sample data comprising remote sensing reflectance spectral data of a plurality of water surface sampling points; the sampling region of the water body sample data comprises at least one of the following regions: inland waters and near-shore waters.
CN201410742576.5A 2014-12-08 2014-12-08 Method and device for acquiring centroid set used for representing typical water category Active CN104359847B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410742576.5A CN104359847B (en) 2014-12-08 2014-12-08 Method and device for acquiring centroid set used for representing typical water category

Publications (2)

Publication Number Publication Date
CN104359847A true CN104359847A (en) 2015-02-18
CN104359847B CN104359847B (en) 2017-02-22

Family

ID=52527130

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410742576.5A Active CN104359847B (en) 2014-12-08 2014-12-08 Method and device for acquiring centroid set used for representing typical water category

Country Status (1)

Country Link
CN (1) CN104359847B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05332922A (en) * 1992-03-31 1993-12-17 Shimadzu Corp Measuring method of cluster of water
CN101403796A (en) * 2008-11-18 2009-04-08 北京交通大学 City ground impermeability degree analyzing and drawing method
CN103983584A (en) * 2014-05-30 2014-08-13 中国科学院遥感与数字地球研究所 Retrieval method and retrieval device of chlorophyll a concentration of inland case II water

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
KUN SHI ET AL.: "Remote chlorophyll-a estimates for inland waters based on a cluster-based classification", 《SCIENCE OF THE TOTAL ENVIRONMENT》 *
QIAN SHEN ET AL.: "Classification of Several Optically Complex Waters in China Using in Situ Remote Sensing Reflectance", 《REMOTE SENSING》 *
TIMOTHY S. MOORE ET AL.: "A class-based approach to characterizing and mapping the uncertainty of the MODIS ocean chlorophyll product", 《REMOTE SENSING OF ENVIRONMENT》 *
申茜 等: "湖泊水体固有光学量光谱拟合与分析研究综述", 《遥感信息》 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108020561A (en) * 2016-11-03 2018-05-11 应用材料以色列公司 For the method adaptively sampled in check object and its system
CN112378864A (en) * 2020-10-27 2021-02-19 核工业北京地质研究院 Airborne hyperspectral soil information retrieval method
CN112378864B (en) * 2020-10-27 2024-07-19 核工业北京地质研究院 Airborne hyperspectral soil information inversion method
CN112528559A (en) * 2020-12-04 2021-03-19 广东省科学院广州地理研究所 Chlorophyll a concentration inversion method combining presorting and machine learning
CN112528559B (en) * 2020-12-04 2024-04-23 广东省科学院广州地理研究所 Chlorophyll a concentration inversion method combining pre-classification and machine learning
CN112906531A (en) * 2021-02-07 2021-06-04 清华苏州环境创新研究院 Multi-source remote sensing image space-time fusion method and system based on unsupervised classification
CN113627322A (en) * 2021-08-09 2021-11-09 台州市污染防治工程技术中心 Method and system for eliminating abnormal points and electronic equipment
CN114545416A (en) * 2022-02-25 2022-05-27 中山大学 Object-oriented quantitative precipitation estimation method and device and terminal equipment

Also Published As

Publication number Publication date
CN104359847B (en) 2017-02-22

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant