CN104359847A - Method and device for acquiring centroid set used for representing typical water category - Google Patents
- Publication number
- CN104359847A (CN201410742576.5A)
- Authority
- CN
- China
- Prior art keywords
- water surface
- data
- surface sampling
- water
- classification
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Investigating Or Analysing Materials By Optical Means (AREA)
Description
Technical field
The present invention relates to the technical field of remote sensing, and more specifically to a method and a device for obtaining a centroid set representing a typical water body category.
Background
Remote sensing inversion of water quality parameters in optically complex water bodies has long been difficult. At present, many studies adopt a classify-then-invert strategy: the remote sensing pixels are first classified, and a different inversion strategy is then applied to each category of pixels in order to improve the inversion accuracy. Pixel classification can be either hard or fuzzy. Fuzzy classification improves the smoothness and stability of the inversion results and is the main research direction in this field.
Fuzzy classification of remote sensing pixels calculates, for each pixel, a weight coefficient for each category, classifies the pixels according to these weight coefficients, and yields several centroid sets. The weight coefficient of a pixel for a category is determined by the distance between the pixel and the centroid pixel of that category: the smaller the distance, the more similar the pixel is considered to be to the centroid pixel, and the larger the weight coefficient with which the pixel belongs to the category represented by that centroid pixel. How to obtain the most effective centroid set representing typical water body categories is therefore one of the important research topics when fuzzy classification is used to invert the water quality parameters of inland water bodies.
In the course of implementing the present invention, the inventors found that the centroid sets representing typical water body categories currently obtained through fuzzy classification have poor stability.
Summary of the invention
The purpose of the present invention is to provide a method and a device for obtaining a centroid set representing a typical water body category, so as to improve the stability of that centroid set.
To achieve the above purpose, the present invention provides the following technical solutions:
A method for obtaining a centroid set representing a typical water body category, comprising:
obtaining water body sample data, the water body sample data including remote sensing reflectance spectrum data of several water surface sampling points;
reducing the dimensionality of the reflectance spectrum data of several bands of each water surface sampling point by principal component analysis transformation, to obtain two principal component data of the reflectance spectrum of each water surface sampling point;
determining the number of categories C;
performing a fuzzy classification process at least twice, each fuzzy classification process being based on a different distance measure; the fuzzy classification process divides the several water surface sampling points into C categories according to the two principal component data of the reflectance spectrum of each water surface sampling point, to obtain C initial centroid sets;
intersecting the initial centroid sets corresponding to the i-th category obtained from the individual executions of the fuzzy classification process, to obtain the centroid set corresponding to the i-th category, where i = 1, 2, ..., C.
In the above method, preferably, determining the number of categories C comprises:
calculating, from the two principal component data of the reflectance spectrum of each water surface sampling point, the BIC index of the several water surface sampling points for different numbers of categories;
determining the number of categories at which the BIC index is smallest as the number of categories C.
In the above method, preferably, determining the number of categories C comprises:
calculating, from the two principal component data of the reflectance spectrum of each water surface sampling point, the Dunn's index of the several water surface sampling points for different numbers of categories;
determining the number of categories at which the Dunn's index is largest as the number of categories C.
In the above method, preferably, the water body sample data further includes water body environmental variables of the several water surface sampling points, the water body environmental variables including water quality parameters and inherent optical quantities of the different constituents of the water body; after the centroid set corresponding to the i-th category is obtained, the method further comprises:
displaying the water body environmental variable parameters of each sample in the centroid set corresponding to the i-th category;
when a removal instruction triggered by a user is received, determining target sample data according to the identifier of the sample data carried in the removal instruction, deleting the target sample data, and re-executing the step of performing the fuzzy classification process at least twice.
In the above method, preferably, the sampling area of the water body sample data includes at least one of the following: inland waters and nearshore waters.
A device for obtaining a centroid set representing a typical water body category, comprising:
a sample acquisition module, configured to obtain water body sample data, the water body sample data including remote sensing reflectance spectrum data of several water surface sampling points;
a dimensionality reduction module, configured to reduce the dimensionality of the reflectance spectrum data of several bands of each water surface sampling point by principal component analysis transformation, to obtain two principal component data of the reflectance spectrum of each water surface sampling point;
a determination module, configured to determine the number of categories C;
a classification module, configured to perform a fuzzy classification process at least twice, each fuzzy classification process being based on a different distance measure; the fuzzy classification process divides the obtained water surface sampling points into C categories according to the two principal component data of the reflectance spectrum of each water surface sampling point, to obtain C initial centroid sets;
a centroid set acquisition module, configured to intersect the initial centroid sets corresponding to the i-th category obtained from the individual executions of the fuzzy classification process, to obtain the centroid set corresponding to the i-th category, where i = 1, 2, ..., C.
In the above device, preferably, the determination module comprises:
a first calculation unit, configured to calculate, from the two principal component data of the reflectance spectrum of each water surface sampling point, the BIC index of the several water surface sampling points for different numbers of categories;
a first determination unit, configured to determine the number of categories at which the BIC index is smallest as the number of categories C.
In the above device, preferably, the determination module comprises:
a second calculation unit, configured to calculate, from the two principal component data of the reflectance spectrum of each water surface sampling point, the Dunn's index of the several water surface sampling points for different numbers of categories;
a second determination unit, configured to determine the number of categories at which the Dunn's index is largest as the number of categories C.
In the above device, preferably, the water body sample data further includes water body environmental variables of the several water surface sampling points, the water body environmental variables including water quality parameters and inherent optical quantities of the different constituents of the water body; the device further comprises:
a display module, configured to display the water body environmental variable parameters of each sample in the centroid set corresponding to the i-th category;
a deletion module, configured to, when a removal instruction triggered by a user is received, determine target sample data according to the identifier of the sample data carried in the removal instruction, delete the target sample data, and generate a trigger instruction instructing the classification module to re-execute the step of performing the fuzzy classification process at least twice.
In the above device, preferably, the sample acquisition module is specifically configured to obtain water body sample data, the water body sample data including remote sensing reflectance spectrum data of several water surface sampling points; the sampling area of the water body sample data includes at least one of the following: inland waters and nearshore waters.
As can be seen from the above solutions, the method and device provided by the present application obtain water body sample data; reduce the dimensionality of the reflectance spectrum data of each water surface sampling point to obtain two principal component data of the reflectance spectrum of each sampling point; determine the number of categories; and classify the several water surface sampling points at least twice according to the two principal component data, each fuzzy classification being based on a different distance measure. The initial centroid sets corresponding to the i-th category obtained from the individual executions of the fuzzy classification process are intersected to obtain the centroid set corresponding to the i-th category, where i = 1, 2, ..., C. This improves the stability of the centroid set corresponding to the i-th category and also makes the centroid set more representative.
Description of drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 shows the method for obtaining a centroid set representing a typical water body category provided by an embodiment of the present application;
Fig. 2 is a flowchart of one implementation of determining the number of categories C provided by an embodiment of the present application;
Fig. 3 is a flowchart of another implementation of determining the number of categories C provided by an embodiment of the present application;
Fig. 4 is a flowchart of another implementation of the method for obtaining a centroid set representing a typical water body category provided by an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a device for obtaining a centroid set representing a typical water body category provided by an embodiment of the present application;
Fig. 6 is a schematic structural diagram of the determination module provided by an embodiment of the present application;
Fig. 7 is another schematic structural diagram of the determination module provided by an embodiment of the present application;
Fig. 8 is another schematic structural diagram of a device for obtaining a centroid set representing a typical water body category provided by an embodiment of the present application.
The terms "first", "second", "third", "fourth" and the like (if any) in the description, the claims and the above drawings are used to distinguish similar parts and do not necessarily describe a particular order or sequence. It should be understood that terms used in this way are interchangeable where appropriate, so that the embodiments of the present application described here can be implemented in orders other than those illustrated here.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
One implementation flow of the method for obtaining a centroid set representing a typical water body category provided by an embodiment of the present application may include:
Step S11: obtain water body sample data, the water body sample data including remote sensing reflectance spectrum data of several water surface sampling points;
The remote sensing reflectance spectrum data of the several water surface sampling points may be the remote sensing reflectance spectrum of each sampling point obtained by actual measurement. The remote sensing reflectance spectrum of optically complex water bodies is measured with the "above-water measurement method", which is currently the general method for measuring the remote sensing reflectance spectrum of water bodies and removes the influence of skylight on that spectrum. One remote sensing reflectance spectrum can be obtained at each station (that is, at each sampling point). A remote sensing reflectance spectrum describes how the reflectance of light varies with wavelength. For example, the spectrum may be sampled at 1 nm intervals and cover at least the 551 bands from 350 nm to 900 nm, with each reflectance value a real number between 0 and 1. In the embodiment of the present invention, all sampling points have the same spectral range and the same spectral resolution (that is, band interval).
The remote sensing reflectance spectrum data of the several water surface sampling points may also be an image, acquired by a remote sensor, that covers an optically complex water body. A remote sensing image with M rows, N columns and L bands has M*N = Num pixels in total, and the values of the L bands of each pixel can be regarded as the reflectance spectrum of the water surface point corresponding to that pixel.
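The pixel-as-spectrum view can be illustrated with a minimal sketch. It assumes the image is held as a NumPy array of shape (M, N, L); that layout and the name `image_to_spectra` are illustrative assumptions and are not taken from the patent.

```python
import numpy as np

def image_to_spectra(image: np.ndarray) -> np.ndarray:
    """Flatten an (M, N, L) image into an (M*N, L) matrix of per-pixel spectra."""
    if image.ndim != 3:
        raise ValueError("expected an array of shape (M, N, L)")
    m, n, l = image.shape
    # Row r of the result is the L-band spectrum of pixel (r // N, r % N).
    return image.reshape(m * n, l)

# Hypothetical 2 x 3 image with 4 bands.
image = np.arange(24, dtype=float).reshape(2, 3, 4)
spectra = image_to_spectra(image)
```

Each row of `spectra` then plays the role of one water surface sampling point in the steps that follow.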
Step S12: reduce the dimensionality of the reflectance spectrum data of several bands of each water surface sampling point by principal component analysis transformation (that is, PCA transformation), to obtain two principal component data of the reflectance spectrum of each water surface sampling point;
In the embodiment of the present invention, the PCA transformation represents the reflectance spectrum data of the L bands of each sampling point by two principal components. In other words, the reflectance spectrum data of each sampling point is reduced in dimension before classification, which reduces both the amount of data and the amount of calculation.
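A minimal sketch of this reduction, assuming the spectra are stacked as rows of a (number of points, L) NumPy array and that PCA is computed from the singular value decomposition of the mean-centered data. The function name and the random test data are hypothetical; the patent does not prescribe a particular PCA implementation.

```python
import numpy as np

def pca_two_components(spectra: np.ndarray) -> np.ndarray:
    """Project (num_points, L) spectra onto their first two principal components."""
    centered = spectra - spectra.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

# Hypothetical data: 10 sampling points, 551 bands (350 nm to 900 nm at 1 nm).
rng = np.random.default_rng(0)
spectra = rng.random((10, 551))
scores = pca_two_components(spectra)
```

Here `scores` has one row per sampling point and two columns, which is the input assumed by the classification sketches further on.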
Step S13: determine the number of categories C;
In the embodiment of the present invention, a fuzzy clustering algorithm may be used for fuzzy classification. Fuzzy clustering is a form of unsupervised classification, and unsupervised classification requires the number of categories as input: the samples can be classified without supervision only when the number of categories is known.
The number of categories C may be determined by researchers from experience.
Step S14: perform the fuzzy classification process at least twice, each fuzzy classification process being based on a different distance measure; the fuzzy classification process divides the several water surface sampling points into C categories according to the two principal component data of the reflectance spectrum of each water surface sampling point, to obtain C initial centroid sets;
Each execution of the fuzzy classification process yields C initial centroid sets. Each initial centroid set represents one typical water body category and contains several water body sample data that are similar or identical to that typical water body category.
Optionally, the fuzzy c-means clustering algorithm (fuzzy c-means algorithm, FCM) may be used to perform fuzzy classification on the water body sample data. The embodiment of the present invention is not limited to the FCM clustering algorithm; other fuzzy clustering algorithms may also be used, such as the improved fuzzy clustering algorithm EFC-MD (Evolutionary fuzzy clustering with minkowski distances). Any fuzzy clustering algorithm based on a distance measure is applicable to the embodiment of the present invention.
Distance is a simple and effective indicator of data similarity. In the fuzzy classification process, the weight coefficient with which a pixel belongs to each category is determined from the distance between the pixel and the centroid pixel of each category.
In the embodiment of the present invention, the distance measure used in the j-th fuzzy classification process differs from the distance measures used in the previous j-1 fuzzy classification processes, where j = 1, 2, 3, ..., J, and J is the total number of executions of the fuzzy classification process, J ≥ 2. That is, any two fuzzy classification processes use different distance measures.
Optionally, in the embodiment of the present invention, the distance measure may be, but is not limited to, one of the following: Euclidean distance (Euc), cosine distance (SAD), OPD divergence (orthogonal projection divergence), TD divergence (transformed divergence), Mahalanobis distance, and so on.
Preferably, the fuzzy classification process may be performed four times, with one distance measure per execution and four distance measures in total. The selected distance measures may be the Euclidean distance, the cosine distance, the OPD divergence and the TD divergence. The embodiment of the present invention is not limited to these four; any four of the above five distance measures may be used.
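Repeated classification with different distance measures can be sketched as follows. This is a minimal fuzzy c-means loop under stated assumptions, not the patented implementation: the fuzzifier m = 2, the deterministic initialization, the weighted-mean centroid update for every distance measure, and the reading of an "initial centroid set" as the samples whose largest membership falls in a given class are all assumptions made for illustration. Only two measures are shown (Euclidean distance and a spectral angle standing in for the cosine distance).

```python
import numpy as np

def euclidean(X, V):
    """Pairwise Euclidean distances between samples X (n, d) and centroids V (c, d)."""
    return np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)

def spectral_angle(X, V):
    """Pairwise angles in radians between samples and centroids."""
    num = X @ V.T
    den = np.linalg.norm(X, axis=1)[:, None] * np.linalg.norm(V, axis=1)[None, :]
    return np.arccos(np.clip(num / np.maximum(den, 1e-12), -1.0, 1.0))

def fuzzy_c_means(X, c, distance, m=2.0, n_iter=100, eps=1e-12):
    """Return memberships U of shape (n, c) and centroids V of shape (c, d)."""
    n = X.shape[0]
    # Deterministic start: c samples spread evenly through the data order.
    V = X[np.linspace(0, n - 1, c).astype(int)].copy()
    for _ in range(n_iter):
        d = np.maximum(distance(X, V), eps)      # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)  # memberships sum to 1 per sample
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]    # membership-weighted mean
    return U, V

def initial_centroid_sets(U):
    """Assumed reading: the set for class i holds samples whose largest membership is i."""
    labels = U.argmax(axis=1)
    return [set(np.flatnonzero(labels == i).tolist()) for i in range(U.shape[1])]

# Hypothetical two-component data forming two groups.
X = np.array([[1.0, 0.0], [1.1, 0.05], [0.9, 0.05], [1.0, 0.1],
              [0.0, 1.0], [0.05, 1.1], [0.05, 0.9], [0.1, 1.0]])
runs = {name: initial_centroid_sets(fuzzy_c_means(X, 2, fn)[0])
        for name, fn in [("euclidean", euclidean), ("angle", spectral_angle)]}
```

Because both runs start from the same samples, class i in one run corresponds to class i in the other; with independent random starts the class labels would first have to be matched between runs.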
Step S15: intersect the initial centroid sets corresponding to the i-th category obtained from the individual executions of the fuzzy classification process, to obtain the centroid set corresponding to the i-th category, where i = 1, 2, ..., C.
Suppose the C initial centroid sets obtained from the j-th fuzzy classification are $U_{j1}, U_{j2}, \ldots, U_{jC}$. The centroid set $U_i$ corresponding to the i-th category is then $U_i = U_{1i} \cap U_{2i} \cap \cdots \cap U_{Ji}$, where $U_{ji}$ (j = 1, 2, 3, ..., J; J is the total number of executions of the fuzzy classification process, J ≥ 2) is the initial centroid set corresponding to the i-th category obtained from the j-th fuzzy classification.
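The intersection can be sketched with Python sets. The sketch assumes each initial centroid set is stored as a set of sample indices and that class i denotes the same water body category in every run; the sample indices below are invented for illustration.

```python
from functools import reduce

def intersect_runs(runs):
    """runs[j][i] is the initial centroid set of class i from run j; returns one set per class."""
    n_classes = len(runs[0])
    return [reduce(set.intersection, (run[i] for run in runs))
            for i in range(n_classes)]

# Hypothetical sample indices from J = 3 runs with C = 2 classes.
runs = [
    [{0, 1, 2, 3}, {4, 5, 6, 7}],
    [{0, 1, 2},    {3, 4, 5, 6, 7}],
    [{0, 1, 2, 4}, {3, 5, 6, 7}],
]
centroid_sets = intersect_runs(runs)
```

Samples 3 and 4 change class between runs, so they drop out of both final sets; only samples that every distance measure assigns to the same class remain.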
By analyzing the environmental parameters of each sample in the centroid set corresponding to the i-th category, it can be determined which typical water body that centroid set represents. How to make this determination is common knowledge in the field and is not described further here.
Averaging the reflectance spectra in the centroid set corresponding to the i-th category yields the centroid spectrum representing the typical water body category.
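The averaging step can be sketched as follows, assuming the spectra are rows of a NumPy array and the centroid set is a set of row indices. The function name and the three-band toy spectra are hypothetical.

```python
import numpy as np

def centroid_spectrum(spectra: np.ndarray, centroid_set: set) -> np.ndarray:
    """Band-wise mean of the spectra whose row indices are in centroid_set."""
    if not centroid_set:
        raise ValueError("centroid set is empty")
    rows = sorted(centroid_set)
    return spectra[rows].mean(axis=0)

# Hypothetical spectra with three bands; rows 0 and 1 form the centroid set.
spectra = np.array([[0.1, 0.2, 0.3],
                    [0.3, 0.4, 0.5],
                    [0.9, 0.9, 0.9]])
mean_spectrum = centroid_spectrum(spectra, {0, 1})
```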
The method for obtaining a centroid set representing a typical water body category provided by the embodiment of the present invention obtains water body sample data; reduces the dimensionality of the reflectance spectrum data of each water surface sampling point to obtain two principal component data of the reflectance spectrum of each sampling point; determines the number of categories; and performs fuzzy classification on the several water surface sampling points at least twice according to the two principal component data, each fuzzy classification being based on a different distance measure. The initial centroid sets corresponding to the i-th category obtained from the individual executions of the fuzzy classification process are intersected to obtain the centroid set corresponding to the i-th category, where i = 1, 2, ..., C. This improves the stability of the centroid set corresponding to the i-th category and also makes the centroid set more representative.
Preferably, because determining the number of categories C from experience is not sufficiently objective, the embodiment of the present invention proposes that C may be determined from the sample data itself.
Optionally, a flowchart of one implementation of determining the number of categories C is shown in Fig. 2 and may include:
Step S21: calculate, from the two principal component data of the reflectance spectrum of each water surface sampling point, the BIC index of the several water surface sampling points for different numbers of categories;
The BIC index is an indicator for evaluating the validity of fuzzy clustering. In the embodiment of the present invention, the BIC index is calculated from the principal component data obtained after dimensionality reduction of each water surface sampling point rather than from the originally acquired reflectance spectrum data, which reduces the amount of calculation.
In the embodiment of the present invention, the number of categories is varied one by one from 2 to 16, and the BIC index of the sample data is calculated once for each number of categories, using the sample data after PCA transformation. The BIC index can be calculated according to the following formula:
where $n_i$ is the number of samples in the i-th category; $d$ is the data dimension of each water surface sampling point, here equal to 2; $m$ is the number of categories; $n$ is the total number of water surface sampling points; $\Sigma_i$ is the maximum likelihood estimate of the variance of the i-th category; $x_j$ is the j-th sample point in the i-th category; and $C_i$ is the centroid of the i-th category.
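The patent's own BIC expression is not reproduced in this text, so the following is only a sketch of one common form, BIC = -2 ln(likelihood) + p ln(n), for hard-labeled data modeled as a mixture of spherical Gaussians. The parameter count p, the per-class spherical variance and the function name `bic_spherical` are assumptions and may differ from the formula the patent intends; the sketch is consistent with the text only in that a smaller value is better.

```python
import numpy as np

def bic_spherical(X: np.ndarray, labels: np.ndarray) -> float:
    """BIC = -2 * log-likelihood + p * log(n); smaller is better."""
    n, d = X.shape
    classes = np.unique(labels)
    m = len(classes)
    log_lik = 0.0
    for k in classes:
        pts = X[labels == k]
        n_i = len(pts)
        centroid = pts.mean(axis=0)
        # Maximum likelihood variance of a spherical Gaussian for this class.
        var = max(((pts - centroid) ** 2).sum() / (n_i * d), 1e-12)
        log_lik += (n_i * np.log(n_i / n)
                    - 0.5 * n_i * d * np.log(2.0 * np.pi * var)
                    - 0.5 * n_i * d)
    n_params = (m - 1) + m * d + m  # mixing weights, centroids, variances
    return float(-2.0 * log_lik + n_params * np.log(n))

# Hypothetical two-component data with two well-separated groups.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1]])
bic_two = bic_spherical(X, np.array([0, 0, 0, 0, 1, 1, 1, 1]))
bic_one = bic_spherical(X, np.zeros(8, dtype=int))
```

For this toy data the two-class labeling gives the smaller value, which is the selection rule the method applies when scanning the number of categories from 2 to 16.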
步骤S22:将BIC指数最小时所对应的分类数目确定为类别数目C。Step S22: Determine the number of categories corresponding to the minimum BIC index as the number of categories C.
本发明实施例中,基于样本本身确定最优类别数目C,进一步增强了所得到的质心集的稳定性。In the embodiment of the present invention, the optimal number of categories C is determined based on the samples themselves, which further enhances the stability of the obtained centroid set.
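Steps S21 and S22 can be illustrated with a minimal Python sketch. The BIC formula itself is not reproduced in this text, so the score below is an assumption: it uses one common BIC form that models each category as a spherical Gaussian, with the convention that a lower value is better. The names `bic_score` and `pick_class_count`, the toy data, and the hand-made labelings are all hypothetical; in practice each labeling would come from a clustering run at that candidate number of categories.

```python
import numpy as np

def bic_score(points, labels):
    """BIC-style score of a hard partition; lower is better.

    Assumption: each category is a spherical Gaussian. This is a common
    textbook form, not necessarily the formula used in the patent.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    n, d = points.shape
    classes = np.unique(labels)
    m = len(classes)
    log_lik = 0.0
    for c in classes:
        members = points[labels == c]
        n_i = len(members)
        centroid = members.mean(axis=0)
        # Maximum likelihood estimate of the per-dimension variance.
        var_i = ((members - centroid) ** 2).sum() / (n_i * d) + 1e-12
        log_lik += (n_i * np.log(n_i / n)
                    - 0.5 * n_i * d * np.log(2.0 * np.pi * var_i)
                    - 0.5 * n_i * d)
    n_params = (m - 1) + m * d + m  # mixing weights, centroids, variances
    return -2.0 * log_lik + n_params * np.log(n)

def pick_class_count(points, labelings):
    """labelings maps each candidate number of categories to a label array."""
    scores = {k: bic_score(points, lab) for k, lab in labelings.items()}
    return min(scores, key=scores.get), scores

# Toy two-component data: three tight, well-separated groups of 50 points.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
pcs = np.vstack([c + 0.1 * rng.standard_normal((50, 2)) for c in centers])

labelings = {
    2: np.repeat([0, 1, 1], 50),  # two of the groups merged
    3: np.repeat([0, 1, 2], 50),  # one category per group
}
best, scores = pick_class_count(pcs, labelings)
print(best, scores)
```

The patent sweeps candidates from 2 to 16; the same `pick_class_count` call would apply with one labeling per candidate.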
Optionally, another implementation flowchart for determining the number of categories C is shown in Figure 3 and may include:
Step S31: Using the two principal components of the reflectance spectrum of each water surface sampling point, calculate the Dunn'c index of the water surface sampling points for each candidate number of categories;
The Dunn'c index is another indicator for evaluating the validity of fuzzy clustering. In the embodiment of the present invention, the Dunn'c index is calculated from the principal component data obtained by reducing the dimension of each water surface sampling point, rather than from the originally acquired reflectance spectrum data, which reduces the amount of computation.
In the embodiment of the present invention, the candidate number of categories is varied one by one from 2 to 16, and the Dunn'c index of the sample data is calculated once for each value. The calculation uses the PCA-transformed sample data.
Step S32: Determine the candidate number of categories at which the Dunn'c index is largest as the number of categories C.
In the embodiment of the present invention, the optimal number of categories C is determined from the samples themselves, which further improves the stability of the resulting centroid sets.
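As a rough illustration of steps S31 and S32, the sketch below assumes that the Dunn'c index refers to the classical Dunn index: the smallest distance between points of different categories divided by the largest distance between points of the same category. The text does not give the formula, so this reading, the function name `dunn_index`, and the toy data are assumptions.

```python
import numpy as np

def dunn_index(points, labels):
    """Classical Dunn index of a hard partition; higher is better.

    Needs at least two categories and at least one category with two
    distinct points, otherwise the ratio is undefined.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all sample points.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    same = labels[:, None] == labels[None, :]
    max_diameter = dist[same].max()     # largest within-category distance
    min_separation = dist[~same].min()  # smallest between-category distance
    return min_separation / max_diameter

# Toy two-component data: three tight, well-separated groups of 30 points.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
pcs = np.vstack([c + 0.1 * rng.standard_normal((30, 2)) for c in centers])

good = np.repeat([0, 1, 2], 30)  # one category per group
poor = np.arange(90) % 3         # categories cut across the groups
print(dunn_index(pcs, good), dunn_index(pcs, poor))
```

Computing the index for every candidate from 2 to 16 and keeping the candidate with the largest value would mirror step S32.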
Further, the acquired water body sample data also include the water environment variables of the water surface sampling points. The water environment variables include water quality parameters and the inherent optical properties of the different components of the water body. The water quality parameters may include chlorophyll-a concentration, total particulate matter concentration, organic particulate matter concentration, inorganic particulate matter concentration, dissolved organic carbon concentration, and so on. The inherent optical properties of the different components of the water body may include the beam attenuation coefficient of particulates and CDOM, the CDOM absorption coefficient, the particulate absorption coefficient, the non-algal particulate absorption coefficient, the phytoplankton absorption coefficient, the particulate scattering coefficient, and so on.
Figure 4 is another implementation flowchart of the method provided by the present application for acquiring a centroid set representing a typical water body category. On the basis of the embodiment shown in Figure 1, after the centroid set corresponding to the i-th category is obtained, the method further includes:
Step S41: Display the water environment variable parameters of each sample in the centroid set corresponding to the i-th category;
Step S42: When a removal instruction triggered by the user is received, determine the target sample data according to the sample data identifier carried in the removal instruction, delete the target sample data, and re-execute the step of performing the fuzzy classification process at least twice.
A researcher (i.e., the user) may judge from experience whether the water environment variables of the sampling points in the i-th centroid set are obviously abnormal. If so, the researcher triggers a removal instruction to delete the obviously abnormal sample data; the removal instruction carries the identifier of the sample data to be removed.
To avoid the "different objects, same spectrum" problem, in the embodiment of the present invention the researcher analyzes the measured environmental parameters of the samples within the same centroid set. If the environmental parameters are abnormal, the sample data of the obviously abnormal sampling points are removed and steps S14 and S15 are re-executed to obtain new centroid sets, until the water environment variables of the sampling points in every centroid set show no obvious abnormality. This prevents different water types with similar spectra from being grouped together, so that the types represented by the classified water bodies are physically meaningful.
When a confirmation instruction triggered by the user is received, the centroid spectrum of the centroid set of the i-th category may be output. Other results, such as the i-th centroid set itself, may also be output as the user requires.
In the above embodiments, optionally, to give the determined centroid sets a wider range of application, water body sample data are collected from multiple water areas. Specifically, the sampling area of the water body sample data may include, but is not limited to, at least one of the following: inland waters and nearshore waters. Inland waters may include rivers, lakes, ponds, pond weirs, reservoirs, and similar waters. Sampling may be carried out in only one type of inland water or in two or more types.
For example, water body sample data may be collected at Taihu Lake, the Three Gorges Reservoir, Dianchi Lake, Chaohu Lake, and the Yellow River estuary.
Corresponding to the method embodiment, the present application also provides a device for acquiring a centroid set representing a typical water body category. A schematic structural diagram of the device is shown in Figure 5, and the device may include:
a sample acquisition module 51, a dimensionality reduction module 52, a determination module 53, a classification module 54, and a centroid set acquisition module 55; wherein,
The sample acquisition module 51 is used to acquire water body sample data, which include the remote sensing reflectance spectrum data of several water surface sampling points;
The remote sensing reflectance spectrum data of the water surface sampling points may be the remote sensing reflectance spectra of each sampling point obtained by actual measurement. The remote sensing reflectance spectra of optically complex waters are measured with the "above-water measurement method", which is currently the general method for measuring the remote sensing reflectance spectrum of water bodies and can remove the influence of skylight on the spectrum. One remote sensing reflectance spectrum can be obtained at each station (i.e., each sampling point). A remote sensing reflectance spectrum describes how reflectance varies with wavelength. For example, it may be sampled at 1 nm intervals and cover at least the 551 bands from 350 nm to 900 nm, with each reflectance value being a real number between 0 and 1. In the embodiment of the present invention, the spectral range and spectral resolution (i.e., band interval) are the same for all sampling points.
The remote sensing reflectance spectrum data of the water surface sampling points may also be an image of optically complex waters acquired by a remote sensor. In general, a remote sensing image with M rows, N columns, and L bands has M*N = Num pixels, and the L band values of each pixel can be regarded as the reflectance spectrum of the water surface point corresponding to that pixel.
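The pixel-to-spectrum view described above can be sketched in a few lines of Python with NumPy. The array sizes and variable names are illustrative assumptions; the only point is that an M x N x L image can be reshaped into Num = M*N rows, each holding one L-band spectrum.

```python
import numpy as np

rows, cols, bands = 4, 5, 6  # M, N, L (toy sizes chosen for illustration)
image = np.arange(rows * cols * bands, dtype=float).reshape(rows, cols, bands)

# Flatten the two spatial axes: each row is now one pixel's L-band spectrum.
spectra = image.reshape(rows * cols, bands)

# With the default row-major layout, pixel (r, c) lands at index r * cols + c.
r, c = 2, 3
print(spectra.shape, spectra[r * cols + c])
```

Keeping the index mapping makes it possible to write per-pixel classification results back into the image grid later.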
The dimensionality reduction module 52 is used to reduce the dimension of the multi-band reflectance spectrum data of each water surface sampling point by principal component analysis (PCA) transformation, obtaining two principal components of the reflectance spectrum of each water surface sampling point;
In the embodiment of the present invention, the PCA transformation represents the L-band reflectance spectrum data of each sampling point by two principal components. That is, before classification the reflectance spectrum data of each sampling point are first reduced in dimension, which reduces both the data volume and the amount of computation.
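A minimal sketch of this reduction, assuming a plain PCA computed through the singular value decomposition of the mean-centered spectra. The function name `first_two_components` and the random toy spectra are hypothetical; the text does not specify how the PCA is implemented.

```python
import numpy as np

def first_two_components(spectra):
    """Project each spectrum onto its first two principal components."""
    spectra = np.asarray(spectra, dtype=float)
    centered = spectra - spectra.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

# Toy data: 40 sampling points, 551 bands, reflectance values in [0, 1).
rng = np.random.default_rng(2)
toy_spectra = rng.random((40, 551))
scores = first_two_components(toy_spectra)
print(scores.shape)
```

Each sampling point is then described by two numbers instead of L band values, which is what the later clustering and validity-index steps operate on.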
The determination module 53 is used to determine the number of categories C;
In the embodiment of the present invention, a fuzzy clustering algorithm may be used for fuzzy classification. Fuzzy clustering is a kind of unsupervised classification, and unsupervised classification requires the number of categories to be supplied in advance. The samples can be classified without supervision only once the number of categories is known.
The number of categories C may be determined by researchers from experience.
The classification module 54 is used to perform the fuzzy classification process at least twice, each run being based on a different distance measure; the fuzzy classification process divides the water surface sampling points into C categories according to the two principal components of the reflectance spectrum of each water surface sampling point, yielding C initial centroid sets;
Each run of the fuzzy classification process yields C initial centroid sets.
Optionally, the fuzzy c-means (FCM) clustering algorithm may be used to classify the water body sample data. The embodiment of the present invention is not limited to FCM; other fuzzy clustering algorithms may also be used, such as the improved fuzzy clustering algorithm EFC-MD (evolutionary fuzzy clustering with Minkowski distances). Any fuzzy clustering algorithm based on a distance measure is applicable to the embodiment of the present invention.
Distance is a simple and effective measure of data similarity. During fuzzy classification, the weight coefficient with which a pixel belongs to each category is determined from the distance between that pixel and the centroid pixel of each category.
In the embodiment of the present invention, the distance measure used in the j-th fuzzy classification run differs from those used in the previous j-1 runs, where j = 1, 2, 3, ..., J and J ≥ 2 is the total number of runs. That is, no two fuzzy classification runs use the same distance measure.
Optionally, the distance measure may be, but is not limited to, one of the following: Euclidean distance (Euc), cosine distance (SAD), orthogonal projection divergence (OPD), transformed divergence (TD), Mahalanobis distance, and so on.
Preferably, the fuzzy classification process is performed four times, each time with one distance measure, so that four distance measures are used in total. The selected measures may be Euclidean distance, cosine distance, OPD, and TD. The embodiment of the present invention is not limited to these four; any four of the five distance measures listed above may be used.
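The idea of repeating the fuzzy classification with different distance measures can be sketched as a fuzzy c-means loop with a pluggable distance function. This is a simplified illustration under stated assumptions, not the patent's implementation: the centroid update is the standard FCM weighted mean, which is exact only for Euclidean distance and is reused unchanged for the cosine measure; "cosine distance" is taken here as one minus the cosine similarity; and all function names and the toy data are hypothetical.

```python
import numpy as np

def euclidean(points, centers):
    diff = points[:, None, :] - centers[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def cosine(points, centers):
    num = points @ centers.T
    den = (np.linalg.norm(points, axis=1)[:, None]
           * np.linalg.norm(centers, axis=1)[None, :])
    # Clip guards against tiny negative values caused by rounding.
    return np.clip(1.0 - num / (den + 1e-12), 0.0, None)

def fuzzy_c_means(points, n_classes, distance, fuzzifier=2.0,
                  n_iter=100, seed=0):
    """Return (memberships, centers); memberships has one row per point."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    u = rng.random((len(points), n_classes))
    u /= u.sum(axis=1, keepdims=True)  # each row of memberships sums to 1
    for _ in range(n_iter):
        w = u ** fuzzifier
        centers = (w.T @ points) / w.sum(axis=0)[:, None]
        d = distance(points, centers) + 1e-12  # avoid division by zero
        inv = d ** (-2.0 / (fuzzifier - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
    return u, centers

# Toy two-component data for 60 sampling points.
rng = np.random.default_rng(3)
pcs = rng.random((60, 2)) + 0.1

u_euc, _ = fuzzy_c_means(pcs, 3, euclidean)
u_cos, _ = fuzzy_c_means(pcs, 3, cosine)
print(u_euc[:3], u_cos[:3])
```

Each run yields a membership matrix; how memberships are turned into the C initial centroid sets (for example by thresholding) is not specified in this section and would need to follow the method embodiment.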
The centroid set acquisition module 55 is used to intersect the initial centroid sets corresponding to the i-th category obtained from each run of the fuzzy classification process, obtaining the centroid set corresponding to the i-th category, where i = 1, 2, ..., C.
Suppose the C centroid sets obtained from the j-th fuzzy classification run are U_j1, U_j2, ..., U_jC. Then the centroid set U_i corresponding to the i-th category is U_i = U_1i ∩ U_2i ∩ ... ∩ U_Ji, where U_ji (j = 1, 2, 3, ..., J; J ≥ 2 is the total number of fuzzy classification runs) is the initial centroid set corresponding to the i-th category obtained from the j-th run.
By analyzing the environmental parameters of each sample in the centroid set corresponding to the i-th category, one can determine which typical water body that centroid set represents. How to make this determination is common knowledge in the field and is not repeated here.
Averaging the reflectance spectra in the centroid set corresponding to the i-th category gives the centroid spectrum representing the typical water body category.
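The intersection U_i = U_1i ∩ U_2i ∩ ... ∩ U_Ji and the averaging step can be illustrated with plain Python sets. The sample identifiers, the three-band toy spectra, and the assumption that category labels have already been matched across runs are all hypothetical.

```python
# Sample IDs assigned to category i by three runs, each run using a
# different distance measure (category labels assumed already matched).
initial_sets = [
    {"s01", "s02", "s03", "s05"},
    {"s01", "s02", "s05", "s07"},
    {"s01", "s02", "s04", "s05"},
]

# Only samples that every run placed in category i are kept.
centroid_set = set.intersection(*initial_sets)

# Toy three-band reflectance spectra keyed by sample ID.
spectra = {
    "s01": [0.02, 0.04, 0.06],
    "s02": [0.04, 0.06, 0.08],
    "s05": [0.03, 0.05, 0.07],
}

# Band-by-band mean over the centroid set gives the centroid spectrum.
members = sorted(centroid_set)
centroid_spectrum = [
    sum(spectra[s][b] for s in members) / len(members)
    for b in range(3)
]
print(members, centroid_spectrum)
```

Samples that any single run assigns elsewhere (here s03, s04, and s07) drop out, which is why the intersected set is more stable than the result of any one distance measure.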
The device provided by the embodiment of the present invention for acquiring a centroid set representing a typical water body category acquires water body sample data; reduces the dimension of the reflectance spectrum data of each water surface sampling point to obtain two principal components of each point's reflectance spectrum; determines the number of categories; and performs fuzzy classification of the water surface sampling points at least twice according to those two principal components, with each run based on a different distance measure. The initial centroid sets corresponding to the i-th category obtained from the runs are then intersected to give the centroid set corresponding to the i-th category, where i = 1, 2, ..., C. This improves the stability of the centroid set corresponding to the i-th category and also makes it more representative.
Optionally, a schematic structural diagram of the determination module 53 is shown in Figure 6, and the module may include:
a first calculation unit 61 and a first determination unit 62; wherein,
The first calculation unit 61 is used to calculate, from the two principal components of the reflectance spectrum of each water surface sampling point, the BIC index of the water surface sampling points for each candidate number of categories;
The BIC index is an indicator for evaluating the validity of fuzzy clustering. In the embodiment of the present invention, the BIC index is calculated from the principal component data obtained by reducing the dimension of each water surface sampling point, rather than from the originally acquired reflectance spectrum data, which reduces the amount of computation.
In the embodiment of the present invention, the candidate number of categories is varied one by one from 2 to 16, and the BIC index of the sample data is calculated once for each value. The calculation uses the PCA-transformed sample data.
The first determination unit 62 is used to determine the candidate number of categories at which the BIC index is smallest as the number of categories C.
In the embodiment of the present invention, the optimal number of categories C is determined from the samples themselves, which further improves the stability of the resulting centroid sets.
Optionally, another schematic structural diagram of the determination module 53 is shown in Figure 7, and the module may include:
a second calculation unit 71 and a second determination unit 72; wherein,
The second calculation unit 71 is used to calculate, from the two principal components of the reflectance spectrum of each water surface sampling point, the Dunn'c index of the water surface sampling points for each candidate number of categories;
The Dunn'c index is another indicator for evaluating the validity of fuzzy clustering. In the embodiment of the present invention, the Dunn'c index is calculated from the principal component data obtained by reducing the dimension of each water surface sampling point, rather than from the originally acquired reflectance spectrum data, which reduces the amount of computation.
In the embodiment of the present invention, the candidate number of categories is varied one by one from 2 to 16, and the Dunn'c index of the sample data is calculated once for each value. The calculation uses the PCA-transformed sample data.
The second determination unit 72 is used to determine the candidate number of categories at which the Dunn'c index is largest as the number of categories C.
In the embodiment of the present invention, the optimal number of categories C is determined from the samples themselves, which further improves the stability of the resulting centroid sets.
Further, in the embodiment of the present invention, the water body sample data may also include the water environment variables of the water surface sampling points. The water environment variables include water quality parameters and the inherent optical properties of the different components of the water body. The water quality parameters may include chlorophyll-a concentration, total particulate matter concentration, organic particulate matter concentration, inorganic particulate matter concentration, dissolved organic carbon concentration, and so on. The inherent optical properties of the different components of the water body may include the beam attenuation coefficient of particulates and CDOM, the CDOM absorption coefficient, the particulate absorption coefficient, the non-algal particulate absorption coefficient, the phytoplankton absorption coefficient, the particulate scattering coefficient, and so on.
On the basis of the embodiment shown in Figure 5, another schematic structural diagram of the device provided by the present application for acquiring a centroid set representing a typical water body category is shown in Figure 8, and the device may further include:
a display module 81 and a deletion module 82; wherein,
The display module 81 is used to display the water environment variable parameters of each sample in the centroid set corresponding to the i-th category;
The deletion module 82 is used, when a removal instruction triggered by the user is received, to determine the target sample data according to the sample data identifier carried in the removal instruction, delete the target sample data, and generate a trigger instruction that directs the classification module to re-execute the step of performing the fuzzy classification process at least twice.
A researcher (i.e., the user) may judge from experience whether the water environment variables of the sampling points in the i-th centroid set are obviously abnormal. If so, the researcher triggers a removal instruction to delete the obviously abnormal sample data; the removal instruction carries the identifier of the sample data to be removed.
To avoid the "different objects, same spectrum" problem, in the embodiment of the present invention the researcher analyzes the measured environmental parameters of the samples within the same centroid set. If the environmental parameters are abnormal, the classification module 54 and the centroid set acquisition module 55 are triggered again to obtain new centroid sets, until the water environment variables of the sampling points in every centroid set show no obvious abnormality. This prevents different water types with similar spectra from being grouped together, so that the types represented by the classified water bodies are physically meaningful.
In the above embodiments, optionally, the sample acquisition module is specifically used to acquire water body sample data that include the remote sensing reflectance spectrum data of several water surface sampling points. The sampling area of the water body sample data may include, but is not limited to, at least one of the following: inland waters and nearshore waters. Inland waters may include rivers, lakes, ponds, pond weirs, reservoirs, and similar waters. Sampling may be carried out in only one type of inland water or in two or more types.
For example, water body sample data may be collected at Taihu Lake, the Three Gorges Reservoir, Dianchi Lake, Chaohu Lake, and the Yellow River estuary.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the present invention is not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410742576.5A CN104359847B (en) | 2014-12-08 | 2014-12-08 | Method and device for acquiring centroid set used for representing typical water category |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104359847A (en) | 2015-02-18 |
CN104359847B (en) | 2017-02-22 |
Family
ID=52527130
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410742576.5A Active CN104359847B (en) | 2014-12-08 | 2014-12-08 | Method and device for acquiring centroid set used for representing typical water category |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104359847B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108020561A (en) * | 2016-11-03 | 2018-05-11 | 应用材料以色列公司 | For the method adaptively sampled in check object and its system |
CN112378864A (en) * | 2020-10-27 | 2021-02-19 | 核工业北京地质研究院 | Airborne hyperspectral soil information retrieval method |
CN112528559A (en) * | 2020-12-04 | 2021-03-19 | 广东省科学院广州地理研究所 | Chlorophyll a concentration inversion method combining presorting and machine learning |
CN112906531A (en) * | 2021-02-07 | 2021-06-04 | 清华苏州环境创新研究院 | Multi-source remote sensing image space-time fusion method and system based on unsupervised classification |
CN113627322A (en) * | 2021-08-09 | 2021-11-09 | 台州市污染防治工程技术中心 | Method and system for eliminating abnormal points and electronic equipment |
CN114545416A (en) * | 2022-02-25 | 2022-05-27 | 中山大学 | An object-oriented quantitative precipitation estimation method, device and terminal equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH05332922A (en) * | 1992-03-31 | 1993-12-17 | Shimadzu Corp | Measuring method of cluster of water |
CN101403796A (en) * | 2008-11-18 | 2009-04-08 | 北京交通大学 | City ground impermeability degree analyzing and drawing method |
CN103983584A (en) * | 2014-05-30 | 2014-08-13 | 中国科学院遥感与数字地球研究所 | Retrieval method and retrieval device of chlorophyll a concentration of inland case II water |
Non-Patent Citations (4)
Title |
---|
KUN SHI ET AL.: "Remote chlorophyll-a estimates for inland waters based on a cluster-based classification", 《SCIENCE OF THE TOTAL ENVIRONMENT》 * |
QIAN SHEN ET AL.: "Classification of Several Optically Complex Waters in China Using in Situ Remote Sensing Reflectance", 《REMOTE SENSING》 * |
TIMOTHY S. MOORE ET AL.: "A class-based approach to characterizing and mapping the uncertainty of the MODIS ocean chlorophyll product", 《REMOTE SENSING OF ENVIRONMENT》 * |
申茜 等: "湖泊水体固有光学量光谱拟合与分析研究综述", 《遥感信息》 * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108020561A (en) * | 2016-11-03 | 2018-05-11 | 应用材料以色列公司 | For the method adaptively sampled in check object and its system |
CN112378864A (en) * | 2020-10-27 | 2021-02-19 | 核工业北京地质研究院 | Airborne hyperspectral soil information retrieval method |
CN112378864B (en) * | 2020-10-27 | 2024-07-19 | 核工业北京地质研究院 | Airborne hyperspectral soil information inversion method |
CN112528559A (en) * | 2020-12-04 | 2021-03-19 | 广东省科学院广州地理研究所 | Chlorophyll a concentration inversion method combining presorting and machine learning |
CN112528559B (en) * | 2020-12-04 | 2024-04-23 | 广东省科学院广州地理研究所 | Chlorophyll a concentration inversion method combining pre-classification and machine learning |
CN112906531A (en) * | 2021-02-07 | 2021-06-04 | 清华苏州环境创新研究院 | Multi-source remote sensing image space-time fusion method and system based on unsupervised classification |
CN113627322A (en) * | 2021-08-09 | 2021-11-09 | 台州市污染防治工程技术中心 | Method and system for eliminating abnormal points and electronic equipment |
CN114545416A (en) * | 2022-02-25 | 2022-05-27 | 中山大学 | An object-oriented quantitative precipitation estimation method, device and terminal equipment |
Also Published As
Publication number | Publication date |
---|---|
CN104359847B (en) | 2017-02-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104359847B (en) | Method and device for acquiring centroid set used for representing typical water category | |
Hong et al. | Rapid identification of soil organic matter level via visible and near-infrared spectroscopy: Effects of two-dimensional correlation coefficient and extreme learning machine | |
Jiang et al. | HDCB-Net: A neural network with the hybrid dilated convolution for pixel-level crack detection on concrete bridges | |
CN106124449B (en) | A soil near-infrared spectral analysis prediction method based on deep learning technology | |
Li et al. | A data-driven approach for tool wear recognition and quantitative prediction based on radar map feature fusion | |
US20180144182A1 (en) | Analyzing digital holographic microscopy data for hematology applications | |
CN106443701B (en) | Method for early warning before flood and waterlog based on sequential water range remote sensing image | |
CN107179310B (en) | Raman spectrum characteristic peak recognition method based on robust noise variance evaluation | |
CN112014331A (en) | A detection method, device, equipment and storage medium for water pollution | |
CN112766227B (en) | A hyperspectral remote sensing image classification method, device, equipment and storage medium | |
CN103983584A (en) | Retrieval method and retrieval device of chlorophyll a concentration of inland case II water | |
CN103456011A (en) | Improved hyperspectral RX abnormal detection method by utilization of complementary information | |
CN106290389B (en) | A method for classifying algal bloom and non-algal bloom conditions in MODIS images of eutrophic lakes | |
Yang et al. | Development of automated microplastic identification workflow for Raman micro-imaging and evaluation of the uncertainties during micro-imaging | |
Li et al. | Identification and visualization of environmental microplastics by Raman imaging based on hyperspectral unmixing coupled machine learning | |
Lotter et al. | Identifying plastics with photoluminescence spectroscopy and machine learning | |
Hu et al. | Recognition method of coal and gangue based on multispectral spectral characteristics combined with one-dimensional convolutional neural network | |
Mazni et al. | An investigation into real-time surface crack classification and measurement for structural health monitoring using transfer learning convolutional neural networks and Otsu method | |
CN108827909A (en) | Soil rapid classification method based on visible and near infrared spectrum and multiple targets fusion | |
Li et al. | A novel processing methodology for traffic-speed road surveys using point lasers | |
Tan et al. | Qualitative analysis for microplastics based on GAF coding and IFCNN image fusion enabled FITR spectroscopy method | |
Chianese et al. | Influence of image noise on crack detection performance of deep convolutional neural networks | |
CN110887798B (en) | Nonlinear full-spectrum water turbidity quantitative analysis method based on extreme random tree | |
CN113496218A (en) | Evaluation method and system for hyperspectral remote sensing sensitive band selection mode | |
Bickler | Prospects for machine learning for shell midden analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |