CN106778738A

CN106778738A - Ground object feature extraction method based on decision-theoretic rough sets

Info

Publication number: CN106778738A
Application number: CN201611085439.4A
Authority: CN
Inventors: 谢锋; 李成; 曹世杰
Original assignee: Suzhou University
Current assignee: Suzhou University
Priority date: 2016-11-30
Filing date: 2016-11-30
Publication date: 2017-05-31

Abstract

The present invention relates to a ground object feature extraction method based on decision-theoretic rough sets, comprising the steps of: (1) acquiring multi-source remote sensing data and preprocessing the remote sensing multi-band data to obtain band data vectors; (2) collecting an expert knowledge base and adding the expert knowledge vectors to the band data vectors to form a condition set, the condition set being open; (3) normalizing the condition set; (4) forming a decision set from the true ground classes, converting the decision set into a vector, and aligning and merging it with the condition set to form a remote sensing decision table; (5) processing the decision table with a forward greedy search algorithm to obtain a feature ranking, from which the core features of the ground objects and the decision rules that distinguish different land types are obtained, and extracting ground object features under these decision rules. Because the feature extraction is driven by the decision set, the invention removes noise from the data and improves classification accuracy.

Description

Ground Object Feature Extraction Method Based on Decision-Theoretic Rough Sets

Technical field

The invention relates to image processing, and in particular to a method for extracting ground object features based on decision-theoretic rough sets.

Background art

Ground object feature extraction in remote sensing means extracting the key information that distinguishes different ground objects, such as urban and rural industrial, mining and residential land (urban construction land), water bodies, agricultural land, forest land and bare land; an example is obtaining information on a specific ground object from hyperspectral data. At present, ground object information is extracted from remote sensing images mainly by manual extraction or by computer-aided semi-automatic methods. These methods require a great deal of manual operation, are inefficient, and exploit little of the information contained in hyperspectral and multispectral data.

At present, the German eCognition system performs relatively well at extracting ground object features from remote sensing data. It is based on multi-scale segmentation and requires a classification protocol for each classification; a protocol written for one classification must be modified before it can be reused for another. Such a protocol resembles knowledge support. The system's strength is fine-grained classification, but it depends heavily on classification protocols (it works better when backed by a protocol library), and it is not designed specifically for ground object feature extraction, so its application efficiency is low.

Ground objects reflect or emit electromagnetic waves, i.e. they have a spectral response. Expert knowledge can be formed from this response or by summarizing previous research, for example vegetation indices (VI): parameters formed by combining the responses of two or more bands in a certain form, which are indicative of vegetation growth, biomass and the like. Combining bands in a suitable form can, to a certain extent, remove atmospheric and soil-background interference from the vegetation canopy spectrum, and such combinations are closely related functionally to important biophysical parameters (such as LAI, dry matter, etc.). Commonly used vegetation indices include the normalized difference vegetation index (NDVI), the ratio vegetation index (SR), the enhanced vegetation index (EVI), the green vegetation index (GVI) of the tasseled cap transformation, and the perpendicular vegetation index (PVI) (Chen Shupeng et al., 1990). Beyond vegetation, Qian Lexiang et al. found that water bodies show the inter-band spectral feature (TM2+TM3)>(TM4+TM5) in TM data, and Duggin et al. found that TM4 and TM6 alone can distinguish forest land from other vegetation. Interpretation experience shows that TM1/TM3, TM5/TM7, TM5+TM7, (TM4-TM3)/(TM4+TM3) and similar combinations highlight certain ground object features.
Addition, subtraction, multiplication, division, and linear and nonlinear combinations of data from multiple spectral bands yield values with indicative meaning. This is inherent knowledge hidden in the data set, whereas previous researchers merely tried random combinations based on the literature and experience until they found ones of practical significance.

Because the key features of ground objects involve remote sensing big data and ground objects manifest in complex ways, information extraction becomes harder and extraction quality suffers. Existing methods that extract ground objects from remote sensing images suffer from heavy interference, low efficiency and poor results, and are not designed for high spectral or multispectral resolution images. To overcome these shortcomings, the present invention proposes an efficient method for extracting ground object features from remote sensing images.

Summary of the invention

To solve the above technical problems, an object of the present invention is to provide a ground object feature extraction method based on decision-theoretic rough sets with small error and high classification accuracy.

The ground object feature extraction method based on decision-theoretic rough sets of the present invention comprises the following steps:

(1) acquiring multi-source remote sensing data and preprocessing the remote sensing multi-band data to obtain band data vectors;

(2) collecting an expert knowledge base and adding the expert knowledge vectors to the band data vectors to form a condition set, the condition set being open (new features may be added);

(3) normalizing the condition set;

(4) forming a decision set from the true ground classes, converting the decision set into a vector, and aligning and merging it with the condition set to form a remote sensing decision table;

(5) processing the decision table with a forward greedy search algorithm to obtain a feature ranking; deriving, from the order of importance of the features, the core features of the ground objects and the decision rules for distinguishing different land types; and extracting ground object features under these decision rules.

Further, the specific steps of processing the decision table with the forward greedy search algorithm are as follows:

(11) Create a core feature set R and initialize it to the empty set;

(12) For each feature $a_k$ in the condition set $C - R$, where $k = 1, 2, \ldots$ and $C = \{a_1, a_2, \ldots, a_k, \ldots\}$, compute $P(D_X \mid [x])$, the probability that $[x]$ (the description of x) belongs to $D_X$, where X is a decision class; the feature must satisfy Criterion I: $P(D_X \mid [x]) \ge \alpha$, where α is the confidence level;

(13) Compute the cost function, where $\omega_i$ is the $i$-th class, $i = 1, 2, \ldots, c$, and X is the decision class; the class with the largest P is the most probable (major) class in the boundary region;

(14) Compute the cost function for each unit $i$ ($i = 1, 2, \ldots, n$) in the remote sensing data and sum the results to obtain the total cost;

(15) Compute γ, where |U| is the total number of units; γ is an indicator that combines the confidence level, a cost (risk) measure, and universality;

(16) First compute a γ value for each feature and select the feature with the largest value, subject to the following two conditions:

Criterion II: the total cost does not increase from one iteration to the next;

Criterion III: universality is maximal;

(17) Initialize $\gamma_0$ to 0;

(18) If $|\gamma - \gamma_0| > \Delta$, where Δ is the importance threshold, the selected feature $a_m$ is considered important, is added to the core feature set R, and the loop returns to step (12); once $|\gamma - \gamma_0| \le \Delta$, no further important feature can be extracted, the loop ends, and R is returned;

(19) The order of the features in the core feature set R corresponds to the decision rules.
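The probability in step (12) can be illustrated with the standard rough-set estimate $P(D_X \mid [x]) = |D_X \cap [x]| / |[x]|$. The Python sketch below is an illustration under stated assumptions, not the patented implementation: it assumes the condition values are already discretized, takes $[x]$ to be the equivalence class of units that share identical values on the chosen features, and reads Criterion I as requiring the most probable class of $[x]$ to reach the confidence level α. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def equivalence_classes(rows, features):
    """Group unit indices that share the same values on `features`.

    Each group plays the role of [x], the description of a unit x.
    """
    groups = defaultdict(list)
    for idx, row in enumerate(rows):
        groups[tuple(row[f] for f in features)].append(idx)
    return list(groups.values())


def class_probabilities(block, labels):
    """Estimate P(D_X | [x]) = |D_X ∩ [x]| / |[x]| for each decision class X."""
    counts = Counter(labels[i] for i in block)
    return {cls: n / len(block) for cls, n in counts.items()}


def satisfies_criterion_one(block, labels, alpha):
    """Assumed reading of Criterion I: the best class reaches confidence alpha."""
    return max(class_probabilities(block, labels).values()) >= alpha


# Toy decision table: two discretized features, one decision label per unit.
rows = [(0, 1), (0, 1), (0, 0), (1, 0), (1, 1)]
labels = ["water", "water", "forest", "bare", "bare"]

for block in equivalence_classes(rows, [0]):  # descriptions on feature a_1 only
    print(block, class_probabilities(block, labels),
          satisfies_criterion_one(block, labels, alpha=0.6))
```

Here the block `[0, 1, 2]` is two-thirds "water", so it passes Criterion I at α = 0.6 but would fail at α = 0.7.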

Further, the preprocessing of the remote sensing multi-band data in step (1) includes vector encoding and coordinate registration.

Further, the expert knowledge in step (2) includes vegetation indices, i.e. parameters formed by combining the responses of two or more bands in a certain form, which are indicative of vegetation growth, biomass and the like.

Further, the decision set of true ground classes in step (4) is formed by combining one or more of topographic maps, thematic maps and planning maps with manual field surveys, assigning a class value to each pixel unit, and thereby obtaining the thematic classes of the corresponding area.

With the above scheme, the present invention uses a composite criterion to minimize error. The proposed γ measure is an indicator that combines confidence level, a cost (risk) measure, and universality, and has the following three advantages:

1. It handles both discrete and numerical data easily;

2. The confidence level makes it highly robust to noise;

3. The cost function reflects actual misclassification.

The invention controls the actual error and obtains a core feature set, thereby improving classification accuracy.

The above is only an overview of the technical solutions of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to this description, preferred embodiments are described in detail below with reference to the accompanying drawings.

Description of drawings

Fig. 1 is a flow chart of the present invention;

Fig. 2 is a table comparing the accuracy of four typical classification algorithms when used with the feature extraction method of the present invention.

Detailed description

Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following examples illustrate the present invention but do not limit its scope.

In a ground object feature extraction method based on decision-theoretic rough sets according to a preferred embodiment of the present invention, field surveys are first conducted to understand the characteristics of the ground objects in the area and their surroundings, and the problem is analysed from three perspectives: object space, image space, and the human interpreter. Decision-theoretic rough sets are then used to extract ground object feature sequences and association rules: given a formal context consisting of a set of conditional features (attributes) and a set of decision attributes, a remote sensing decision table is built, and a heuristic search algorithm processes the table to obtain a feature ranking, whose order of importance yields the decision rules. Ground object features are extracted under these decision rules. As shown in Fig. 1, remote sensing data and topographic maps are collected first; the most accurate one is selected as the reference map, and all other data are registered to its coordinate system. The condition set and the decision set are then constructed in two separate steps. The condition set consists of hyperspectral bands, band algebra, band ratios and other expert knowledge, and it is open, so new features can be added; the decision set consists of the true ground classes. The remote sensing decision table is formed by concatenating the condition set and the decision set in image-unit order. Because the decision table may contain many features, selecting the optimal feature subset is an NP-hard problem.
To obtain a satisfactory solution, the present invention implements a forward greedy search algorithm with the following steps:

Step 1: Create the core feature set R and initialize it to the empty set;

Step 2: For each feature $a_k$ in the condition set $C - R$, compute $P(D_X \mid [x])$, where $k = 1, 2, \ldots$, $C = \{a_1, a_2, \ldots, a_k, \ldots\}$, and $P(D_X \mid [x])$ is the probability that $[x]$ (the description of x) belongs to $D_X$ (decision class X). The feature must satisfy Criterion I: $P(D_X \mid [x]) \ge \alpha$, where α is the confidence level;

Step 3: Compute the cost function, where $\omega_i$ is the $i$-th class ($i = 1, 2, \ldots, c$) and X is the decision class. The class with the largest P is the most probable (major) class in the boundary region; if the probability that the equivalence class belongs to a minor class of the boundary region increases, misclassification becomes more likely, so the cost is incremented by 1;

Step 4: Compute the cost function for each unit $i$ ($i = 1, 2, \ldots, n$) in the remote sensing data and sum the results to obtain the total cost;

Step 5: Compute γ, where |U| is the total number of units; the γ measure is an indicator that combines the confidence level, a cost (risk) measure, and universality;

Step 6: First compute a γ value for each candidate feature (k features in total) and select the largest, subject to the following two conditions:

Criterion II: the total cost does not increase from one iteration to the next;

Criterion III: universality is maximal;

Step 7: Initialize $\gamma_0$ to 0;

Step 8: If $|\gamma - \gamma_0| > \Delta$, where Δ is the importance threshold, the selected feature $a_m$ is considered important, is added to the core feature set R, and the loop returns to Step 2; once $|\gamma - \gamma_0| \le \Delta$, no further important feature can be extracted, the loop ends, and R is returned;

Step 9: The order of the features in the core feature set R corresponds to the decision rules.

Through the above steps, the core features of the ground objects and the decision rules for distinguishing different land types are obtained.
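The control flow of Steps 1 to 9 can be sketched in Python as follows. The source text does not reproduce the exact cost and γ formulas, so this sketch substitutes hypothetical stand-ins: a unit costs 1 when its true class differs from the most probable class of its equivalence class (one reading of Step 3), and γ is the share of |U| that lies in α-confident equivalence classes and is labelled correctly by their most probable class. Updating $\gamma_0$ to the last accepted γ is also an assumption. Function names are hypothetical and the condition values are assumed to be discretized.

```python
import math
from collections import Counter, defaultdict


def equivalence_classes(rows, features):
    """Group unit indices that share identical values on `features`."""
    groups = defaultdict(list)
    for idx, row in enumerate(rows):
        groups[tuple(row[f] for f in features)].append(idx)
    return list(groups.values())


def total_cost(blocks, labels):
    """Stand-in for Steps 3-4: count units whose true class differs from the
    most probable class of their equivalence class (cost 1 per such unit)."""
    cost = 0
    for block in blocks:
        counts = Counter(labels[i] for i in block)
        cost += len(block) - max(counts.values())
    return cost


def gamma(blocks, labels, alpha, n_units):
    """Stand-in for Step 5: share of |U| lying in alpha-confident classes
    (Criterion I) and correctly labelled by their most probable class."""
    covered = 0
    for block in blocks:
        top = max(Counter(labels[i] for i in block).values())
        if top / len(block) >= alpha:
            covered += top
    return covered / n_units


def forward_greedy(rows, labels, alpha=0.6, delta=0.01):
    """Return the core feature list R in the order the features were selected."""
    n_features = len(rows[0])
    core = []                      # Step 1: R starts empty
    gamma0 = 0.0                   # Step 7: gamma_0 starts at 0
    prev_cost = math.inf
    while len(core) < n_features:
        best = None                # (gamma, feature, cost)
        for a in range(n_features):
            if a in core:
                continue           # Step 2: only features in C - R
            blocks = equivalence_classes(rows, core + [a])
            cost = total_cost(blocks, labels)
            if cost > prev_cost:
                continue           # Criterion II: total cost must not increase
            g = gamma(blocks, labels, alpha, len(rows))
            if best is None or g > best[0]:
                best = (g, a, cost)    # Criterion III: keep the largest gamma
        if best is None:
            break
        g, a, cost = best
        if abs(g - gamma0) <= delta:
            break                  # Step 8: no further important feature
        core.append(a)
        gamma0, prev_cost = g, cost    # assumption: gamma_0 tracks the last accepted gamma
    return core


# Toy table: feature 0 separates most classes, feature 1 refines it.
table_rows = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 0, 0), (1, 1, 1)]
table_labels = ["a", "a", "b", "c", "c", "c"]
print(forward_greedy(table_rows, table_labels))  # prints [0, 1]
```

On the toy table, feature 0 is chosen first (γ = 5/6), feature 1 then raises γ to 1, and adding feature 2 leaves γ unchanged, so the loop stops. Like any forward greedy search, this can miss features that are only informative in combination.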

The implementation is illustrated below with an example in which ground object features are extracted from a multi-band remote sensing image, yielding thematic vector data (.shp files) that meet the requirements of general-purpose GIS software. The procedure is as follows:

1. Preprocessing of the remote sensing multi-band data, including vector encoding and registration.

The remote sensing spectral vectors use discretized encoding: each band of the remote sensing image is one dimension, so the number of bands determines the vector dimension. All bands except the thermal infrared band are used to form the vector, and the vector elements are strictly aligned position by position, forming the band data vector.
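As a minimal sketch of this encoding, the Python below assumes Landsat TM input (as the TM band names elsewhere in this description suggest), with TM6 as the thermal infrared band, and assumes each band is a flattened list of integer digital numbers already registered to the reference map. It covers only band selection and position-wise alignment; the names `TM_BANDS` and `encode_band_vectors` are hypothetical.

```python
# Landsat TM band names; TM6 is the thermal infrared band and is excluded.
TM_BANDS = ["TM1", "TM2", "TM3", "TM4", "TM5", "TM6", "TM7"]
THERMAL_BANDS = {"TM6"}


def encode_band_vectors(bands):
    """Turn co-registered band rasters into one band data vector per pixel.

    `bands` maps a band name to a flattened list of pixel values; every band
    must cover the same pixels in the same order (i.e. already registered).
    Returns the band order used and the list of per-pixel tuples.
    """
    order = [b for b in TM_BANDS if b in bands and b not in THERMAL_BANDS]
    sizes = {len(bands[b]) for b in order}
    if len(sizes) != 1:
        raise ValueError("bands must be co-registered with equal pixel counts")
    n_pixels = sizes.pop()
    vectors = [tuple(bands[b][i] for b in order) for i in range(n_pixels)]
    return order, vectors


# Two-pixel example with made-up digital numbers.
bands = {name: [10 * k, 10 * k + 1] for k, name in enumerate(TM_BANDS, start=1)}
order, vectors = encode_band_vectors(bands)
print(order)
print(vectors)
```

Keeping the band order fixed in one list is what guarantees that position j of every pixel vector always refers to the same band.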

2. Adding expert knowledge, i.e. knowledge formed by summarizing previous research, such as vegetation indices (VI): parameters formed by combining the responses of two or more bands in a certain form, which are indicative of vegetation growth, biomass and the like. Combining bands in a suitable form can, to a certain extent, remove atmospheric and soil-background interference from the vegetation canopy spectrum, and such combinations are closely related functionally to important biophysical parameters (such as LAI, dry matter, etc.). Commonly used vegetation indices include the normalized difference vegetation index (NDVI), the ratio vegetation index (SR), the enhanced vegetation index (EVI), the green vegetation index (GVI) of the tasseled cap transformation, and the perpendicular vegetation index (PVI) (Chen Shupeng et al., 1990). Beyond vegetation, Qian Lexiang et al. found that water bodies show the inter-band spectral feature (TM2+TM3)>(TM4+TM5) in TM data, and Duggin et al. found that TM4 and TM6 alone can distinguish forest land from other vegetation. Interpretation experience shows that TM1/TM3, TM5/TM7, TM5+TM7, (TM4-TM3)/(TM4+TM3) and similar combinations highlight certain ground object features.

The expert knowledge features TM2+TM3, TM4+TM5, TM2-TM4, TM3-TM7, TM1/TM3 and TM5/TM7 are appended to the band data vector, aligned position by position, to form the condition set.
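A minimal Python sketch of appending these six expert features is shown below. It assumes the band data vector is ordered (TM1, TM2, TM3, TM4, TM5, TM7), i.e. with the thermal band already removed, and it maps a zero denominator in the ratio features to 0.0. That handling is an assumption, because the description does not say how division by zero is treated. The function name is hypothetical.

```python
def add_expert_features(band_vector):
    """Append the six expert-knowledge features to a 6-band TM vector.

    Assumes the vector is ordered (TM1, TM2, TM3, TM4, TM5, TM7), i.e. the
    thermal band TM6 has already been dropped.
    """
    tm1, tm2, tm3, tm4, tm5, tm7 = band_vector

    def ratio(num, den):
        # Assumption: a zero denominator yields 0.0 instead of raising.
        return num / den if den != 0 else 0.0

    expert = (
        tm2 + tm3,          # TM2+TM3
        tm4 + tm5,          # TM4+TM5
        tm2 - tm4,          # TM2-TM4
        tm3 - tm7,          # TM3-TM7
        ratio(tm1, tm3),    # TM1/TM3
        ratio(tm5, tm7),    # TM5/TM7
    )
    return tuple(band_vector) + expert


row = add_expert_features((10, 20, 30, 40, 50, 25))
print(len(row), row)
```

Six bands plus six expert features give 12 values per pixel, which appears to match the "12-dimensional data" row in Fig. 2.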

3. Normalize the condition set. Different indicators often have different dimensions and units, which would distort the results of data analysis. To eliminate these dimensional effects, each feature is normalized as y = (x - MinValue)/(MaxValue - MinValue), where x and y are the values before and after conversion, and MaxValue and MinValue are the maximum and minimum values of the feature.
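The normalization formula can be applied column by column, as in the following Python sketch. It assumes the condition set is a list of equal-length tuples, one per pixel unit, and maps a constant feature (MaxValue equal to MinValue) to 0.0 to avoid division by zero. That choice is an assumption, not part of the described method. The function name is hypothetical.

```python
def min_max_normalize(rows):
    """Rescale every feature (column) of the condition set to [0, 1].

    Implements y = (x - MinValue) / (MaxValue - MinValue) per feature.
    Assumption: a constant feature (MaxValue == MinValue) is mapped to 0.0.
    """
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    return [
        tuple((x - lo) / (hi - lo) if hi > lo else 0.0
              for x, lo, hi in zip(row, mins, maxs))
        for row in rows
    ]


print(min_max_normalize([(0, 5, 2.0), (10, 5, 4.0), (5, 5, 3.0)]))
# [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (0.5, 0.0, 0.5)]
```

MinValue and MaxValue are computed per feature over all units, so band values and ratio features end up on the same [0, 1] scale.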

4. Using other materials (topographic maps, thematic maps, planning maps) together with manual field surveys, determine the thematic classes of the area and assign a class value to each pixel unit to form the decision set. Convert the decision set into a vector, align it with the condition set, and merge the two into the remote sensing decision table.
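The alignment and merging step can be sketched in Python as below. Rough-set equivalence classes need discrete attribute values, but the description does not state how normalized values are discretized. The equal-width binning used here is therefore an assumption, and the function names and the `bins` parameter are hypothetical. The class value of each unit is placed in the last column.

```python
def discretize(value, bins=4):
    """Equal-width binning of a value normalized to [0, 1] (assumed scheme)."""
    return min(int(value * bins), bins - 1)


def build_decision_table(condition_rows, class_values, bins=4):
    """Join each unit's condition vector with its ground-truth class value.

    Both inputs must list the pixel units in the same order; the decision
    attribute is placed in the last column of each row.
    """
    if len(condition_rows) != len(class_values):
        raise ValueError("condition set and decision set must cover the same units")
    return [
        tuple(discretize(v, bins) for v in row) + (cls,)
        for row, cls in zip(condition_rows, class_values)
    ]


table = build_decision_table([(0.0, 1.0), (0.5, 0.3)], [1, 3])
print(table)  # [(0, 3, 1), (2, 1, 3)]
```

The length check enforces the requirement that the condition set and the decision set cover exactly the same pixel units in the same order.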

5. Process the decision table with the forward greedy search algorithm to obtain a feature ranking, and derive, from the order of importance of the features, the decision rules for the ground objects of this area. This yields better classification accuracy than classifying without feature extraction.

As shown in Fig. 2, the first column lists five data sets: the TM bands, the 12-dimensional data, RI, RII and RIII, where RI, RII and RIII are the core feature sets extracted by this method; the second column gives the number of features in each data set. The same four classification algorithms are applied to every data set: the parallelepiped method (Parallelepiped), the Mahalanobis distance method (MahaDist), maximum likelihood (MaxLike) and the support vector machine (SVM). Classification accuracy is reported as overall accuracy (OA) and the kappa coefficient. The results show that the feature reduction (i.e. feature extraction) of the present invention removes noise from the data and improves classification accuracy.
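For reference, OA and the kappa coefficient can be computed from a confusion matrix with their standard definitions: OA is the share of units on the diagonal, and kappa is $(p_o - p_e)/(1 - p_e)$, where $p_e$ is the chance agreement from the row and column totals. The Python sketch below uses a made-up two-class matrix; it does not reproduce the values in Fig. 2. The function name is hypothetical.

```python
def overall_accuracy_and_kappa(confusion):
    """Overall accuracy (OA) and Cohen's kappa from a square confusion matrix.

    confusion[i][j] = number of units of reference class i assigned to class j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    p_o = sum(confusion[i][i] for i in range(k)) / n          # observed agreement = OA
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)  # chance agreement
    if p_e == 1:
        return p_o, float("nan")   # kappa is undefined in this degenerate case
    return p_o, (p_o - p_e) / (1 - p_e)


oa, kappa = overall_accuracy_and_kappa([[45, 5], [10, 40]])
print(round(oa, 3), round(kappa, 3))  # 0.85 0.7
```

Kappa discounts agreement expected by chance, so it separates classifiers more sharply than OA when class sizes are unbalanced.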

The above is only a preferred embodiment of the present invention and is not intended to limit it. Those of ordinary skill in the art can make several improvements and modifications without departing from the technical principle of the present invention, and such improvements and modifications shall also fall within the protection scope of the present invention.

Claims

1. A ground object feature extraction method based on decision-theoretic rough sets, characterized by comprising the following steps:

(1) acquiring multi-source remote sensing data and preprocessing the remote sensing multi-band data to obtain band data vectors;

(2) collecting an expert knowledge base and adding the expert knowledge vectors to the band data vectors to form a condition set, the condition set being open;

(3) normalizing the condition set;

(4) forming a decision set from the true ground classes, converting the decision set into a vector, and aligning and merging it with the condition set to form a remote sensing decision table;

(5) processing the decision table with a forward greedy search algorithm to obtain a feature ranking; deriving, from the order of importance of the features, the core features of the ground objects and the decision rules for distinguishing different land types; and extracting ground object features under these decision rules.

2. The ground object feature extraction method based on decision-theoretic rough sets according to claim 1, characterized in that the specific steps of processing the decision table with the forward greedy search algorithm are as follows:

(11) Create a core feature set R and initialize it to the empty set;

(12) For each feature $a_k$ in the condition set $C - R$, where $k = 1, 2, \ldots$ and $C = \{a_1, a_2, \ldots, a_k, \ldots\}$, compute $P(D_X \mid [x])$, the probability that $[x]$ (the description of x) belongs to $D_X$, where X is a decision class; the feature must satisfy Criterion I: $P(D_X \mid [x]) \ge \alpha$, where α is the confidence level;

(13) Compute the cost function, where $\omega_i$ is the $i$-th class, $i = 1, 2, \ldots, c$, and X is the decision class; the class with the largest P is the most probable (major) class in the boundary region;

(14) Compute the cost function for each unit $i$ ($i = 1, 2, \ldots, n$) in the remote sensing data and sum the results to obtain the total cost;

(15) Compute γ, where |U| is the total number of units; γ is an indicator that combines the confidence level, a cost (risk) measure, and universality;

(16) First compute a γ value for each feature and select the feature with the largest value, subject to the following two conditions:

Criterion II: the total cost does not increase from one iteration to the next;

Criterion III: universality is maximal;

(17) Initialize $\gamma_0$ to 0;

(18) If $|\gamma - \gamma_0| > \Delta$, where Δ is the importance threshold, the selected feature $a_m$ is considered important, is added to the core feature set R, and the loop returns to step (12); once $|\gamma - \gamma_0| \le \Delta$, no further important feature can be extracted, the loop ends, and R is returned;

(19) The order of the features in the core feature set R corresponds to the decision rules.

3. The ground object feature extraction method based on decision-theoretic rough sets according to claim 1, characterized in that the preprocessing of the remote sensing multi-band data in step (1) includes vector encoding and coordinate registration.

4. The ground object feature extraction method based on decision-theoretic rough sets according to claim 1, characterized in that the expert knowledge in step (2) includes vegetation indices, i.e. parameters formed by combining the responses of two or more bands in a certain form, which are indicative of vegetation growth, biomass and the like.

5. The ground object feature extraction method based on decision-theoretic rough sets according to claim 1, characterized in that the decision set of true ground classes in step (4) is formed by combining one or more of a topographic map, a thematic map and a planning map with manual field surveys, assigning a class value to each pixel unit, and thereby obtaining the thematic classes of the corresponding area.