CN104463229B

CN104463229B - Supervised classification method for hyperspectral data based on correlation coefficient redundancy

Info

Publication number: CN104463229B
Application number: CN201410840648.XA
Authority: CN
Inventors: 张淼; 张晔; 沈毅
Original assignee: Harbin Institute of Technology
Current assignee: Harbin Institute of Technology
Priority date: 2014-12-30
Filing date: 2014-12-30
Publication date: 2017-06-27
Anticipated expiration: 2034-12-30
Also published as: CN104463229A

Abstract

A supervised classification method for hyperspectral data based on correlation coefficient redundancy, belonging to the field of remote sensing image information processing. The method comprises the following steps: step one, automatically screen the training sample set required for supervised classification using the correlation coefficient redundancy; step two, optimize the parameter of the SVM kernel function; step three, complete the binary classification tasks on the hyperspectral remote sensing image using the SVM classifier algorithm; step four, realize the multi-class classification task based on the one-to-one strategy. The method assists the automatic screening of training samples by calculating the correlation coefficient redundancy of each single selected sample set, and uses the discarded samples for the automatic optimization of the classifier parameter. This effectively improves the classification accuracy of the SVM classifier algorithm and reduces time consumption by reducing the number of support vectors, making the method more practical for high-accuracy classification of hyperspectral remote sensing images.

Description

Supervised classification method for hyperspectral data based on correlation coefficient redundancy

Technical field

The invention belongs to the field of remote sensing image information processing and relates to a supervised classification method for hyperspectral data that optimizes both the training data and the classifier parameter.

Background technology

Classification of hyperspectral remote sensing images is an important part of hyperspectral remote sensing information processing. Depending on whether prior knowledge of the categories is used, it can be divided into supervised and unsupervised classification. Supervised classification achieves higher accuracy than unsupervised classification and is therefore used for fine classification of remote sensing images. The notion of a category also differs between applications. In hyperspectral image classification research, supervised classification first requires the researcher to select a set of representative pixels for each category as training samples. Because hyperspectral imaging equipment emphasizes spectral resolution, it often sacrifices spatial resolution, so selecting suitable training samples by visual inspection depends heavily on the analyst's experience. In addition, other information, such as field survey data or existing maps, is usually needed to select representative training samples for each category. For supervised classification, samples of the same category need to be homogeneous while also covering a certain range of variance. In practice, therefore, multiple training regions or sets must be selected. If the within-class variance is large, selecting training regions is laborious, and it cannot be determined whether the selected training samples are fully suitable for classifying the image. Selecting and screening training samples is thus a task that depends heavily on the researcher's judgment and is also very time-consuming.

The support vector machine (SVM) algorithm is a classification technique developed on the basis of statistical learning theory. It has unique advantages in small-sample, nonlinear, and high-dimensional pattern classification, and has therefore consistently performed well on hyperspectral data with a large number of bands (typically 100 to 1000). SVM builds on the linear classifier by introducing the structural risk minimization principle, optimization theory, and kernel methods. It is among the most effective and most widely used methods in statistical learning, and can effectively overcome the Hughes phenomenon in hyperspectral data classification (that is, classification accuracy decreasing as the number of bands increases). In addition, SVM can operate directly on high-dimensional data without dimensionality reduction, so classifying with all bands preserves the sufficiency and completeness of the spectral information. Many high-accuracy classification tasks are therefore computed on all bands, often on high-performance workstations.

Summary of the invention

To solve the problem that training samples are difficult to screen effectively in hyperspectral classification methods, the invention provides a supervised classification method for hyperspectral data based on correlation coefficient redundancy. The method assists the automatic screening of training samples by calculating the correlation coefficient redundancy of each single selected sample set, and uses the discarded samples for the automatic optimization of the classifier parameter. This effectively improves the classification accuracy of the SVM classifier algorithm and reduces time consumption by reducing the number of support vectors, making the method more practical for high-accuracy classification of hyperspectral remote sensing images.

Addressing the specific task faced by hyperspectral remote sensing image analysts, the invention provides a classification method that assists the screening of training samples and supplies parameter optimization. To screen out unqualified or suboptimal pixels from the training samples, a correlation coefficient redundancy is proposed that analyzes the amount of nonlinear correlation information among multivariate inputs. With it, the M preselected pixels taken from an E × 1 pixel line-segment sampler can be evaluated as a whole in a single pass, ensuring that the selected pixels maximize the integrated category information. Meanwhile, the discarded training samples are put to use in the parameter selection process of the SVM classifier. Finally, the test samples of interest to the analyst are classified automatically by the SVM classifier algorithm, yielding a fast and accurate supervised classification scheme. The specific implementation steps are as follows:

Step one: automatically screen the training sample set required for supervised classification using the correlation coefficient redundancy:

1) For an acquired hyperspectral remote sensing image of size Row × Column × B, where Row and Column denote the width and length of the image and B denotes its number of bands, the image analyst selects training samples with an E × 1 line-segment sampler;

2) The manually selected training samples are screened automatically: the number of pixels retained each time is set to M, with M < E, so that E-M pixels of the training sample are deleted;

3) The correlation coefficient redundancy of the M pixels P_1, ..., P_M is calculated for every possible selection of M pixels, and the selection with the largest value is chosen, i.e.

$$\{P_1^*, \ldots, P_M^*\} = \arg\max_{\{P_1, \ldots, P_M\}} \mathrm{CCR}(P_1, \ldots, P_M)$$

so that P_1^*, ..., P_M^* are the M pixels retained from the E pixels;

4) The E pixels of this selection are all marked with one determined category label Class^*; the M retained pixels are added to the training sample set as data pairs (P_m^*, Class^*), m = 1, ..., M, and the remaining E-M pixels are likewise added, as data pairs carrying the label Class^*, to the discarded sample set;

5) If more training samples need to be selected, return to 1); otherwise proceed to step two.
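The screening in items 2) to 4) amounts to scoring every M-pixel subset of one E-pixel sample and keeping the best one. The following Python sketch illustrates that loop only. The function name `screen_pixels` and the `score` callback are hypothetical, and `score` merely stands in for the correlation coefficient redundancy, whose calculation is described later in the description.

```python
from itertools import combinations

def screen_pixels(pixels, keep, score):
    """Return (kept, discarded) index tuples for one line-segment sample.

    pixels : list of E pixel vectors (each a list of B band values)
    keep   : M, the number of pixels to retain (requires 0 < M < E)
    score  : callable taking a list of M pixel vectors and returning a number;
             a stand-in for the CCR (hypothetical callback)
    """
    if not 0 < keep < len(pixels):
        raise ValueError("keep must satisfy 0 < M < E")
    # Enumerate every M-subset of the E pixels and keep the highest-scoring one.
    best = max(
        combinations(range(len(pixels)), keep),
        key=lambda idx: score([pixels[i] for i in idx]),
    )
    discarded = tuple(i for i in range(len(pixels)) if i not in best)
    return best, discarded

# Toy usage with a placeholder score (sum of all band values), NOT the CCR.
toy = [[1, 2], [5, 6], [0, 1], [3, 3], [2, 2]]
kept, dropped = screen_pixels(toy, keep=4, score=lambda sub: sum(map(sum, sub)))
```

With E = 5 and M = 4 only five subsets are scored per sample, which is consistent with the statement that screening adds little computation.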

Step two: optimize the parameter of the SVM kernel function:

1) In view of the different characteristics of SVM classifier kernel functions, the radial basis function (RBF), which has a symmetric inner product, is selected as the kernel function of the support vector machine. The RBF has a width parameter σ, and the following values are tried exhaustively for σ: σ ∈ {0.01, 0.1, 1, 10};

2) Optimization scheme: first, from the training sample set obtained in step one, the pixels of one category, i.e. those sharing the same category label value, are selected and their new category label is defined as -1, while all remaining pixels in the training sample set are given the new category label 1; the pixels in the discarded sample set are processed in the same way. This constructs a standard binary classification problem, to which the SVM algorithm based on the RBF kernel function is applied with each candidate value σ ∈ {0.01, 0.1, 1, 10}, and the classification accuracy for that category is calculated. Then the same processing is applied to every category in the training sample set (because this processing computes one category against all other categories, it is also called the one-to-many strategy), giving the classification accuracy for all categories. The classification accuracies of all categories are averaged to obtain the average classification accuracy;

3) The σ value corresponding to the maximum average classification accuracy is selected as the parameter of the SVM classifier algorithm.
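The one-to-many relabeling and averaging in item 2) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the names `one_vs_rest_labels` and `mean_one_vs_rest_accuracy` are hypothetical, and `accuracy_fn` is a hypothetical callback that would train a binary SVM on the relabeled training set and return its accuracy on the relabeled discarded set.

```python
def one_vs_rest_labels(labels, target):
    """Map the target category to -1 and every other category to +1."""
    return [-1 if c == target else 1 for c in labels]

def mean_one_vs_rest_accuracy(train, discarded, categories, accuracy_fn):
    """Average the per-category accuracy over all categories.

    train, discarded : lists of (pixel, category) pairs
    accuracy_fn      : hypothetical callable(train_x, train_y, test_x, test_y)
                       that trains a binary classifier and returns its accuracy
    """
    train_x = [p for p, _ in train]
    test_x = [p for p, _ in discarded]
    scores = []
    for c in categories:
        # Relabel both sets so that category c is -1 and all others are +1.
        train_y = one_vs_rest_labels([k for _, k in train], c)
        test_y = one_vs_rest_labels([k for _, k in discarded], c)
        scores.append(accuracy_fn(train_x, train_y, test_x, test_y))
    return sum(scores) / len(scores)
```

Passing the classifier as a callback keeps the sketch independent of any particular SVM library.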

Step three: complete the binary classification tasks on the hyperspectral remote sensing image using the SVM classifier algorithm:

The pixels of two categories h and k are selected from the training sample set as training data to train the SVM classifier with the known parameter σ. All L test samples P_t (1≤t≤L) are then classified, giving the category estimates of all test samples:

f_{h, k}(P_t), 1≤t≤L;

If the training sample set contains only two categories, step four is not needed and f_{h, k}(P_t) suffices to decide the category of all test samples. Otherwise, another pair of different categories must be selected and step three repeated until all pairwise category combinations have been computed.

Step four: realize the multi-class classification task based on the one-to-one strategy:

Based on the results obtained in step three, the voting principle of the one-to-one strategy is used to make the final decision on the category label of each test sample.

Compared with the prior art, the invention has the following advantages:

1) The invention takes into account the relatively low spatial resolution of hyperspectral images and automatically screens the samples of the line-segment sampler that experts commonly use when determining the training set. This step is executed automatically by the computer after the analyst completes each E × 1 pixel sampling, so in practical terms it imposes no computational burden. After the analyst has chosen all training samples, the method also uses the discarded training samples to provide a suitable parameter for the SVM classifier automatically, so the method as a whole is highly operable.

2) The training sample screening process based on correlation coefficient redundancy proposed by the invention effectively analyzes the overall nonlinear correlation information under the joint action of multiple variables. It can properly evaluate the nonlinear correlation that arises between the pixels at the two ends of the line-segment sampler because of their large spatial distance, and uses this information to evaluate the overall useful category information of the multiple variables, so that the subsequent classifier computation is more efficient and the classification results are more accurate.

3) The invention takes into account the different characteristics of the one-to-many and one-to-one strategies. The former is better suited to the parameter determination stage, where training samples are few and the number of training samples per category is relatively balanced, and it objectively evaluates the classifier's performance on a single category. The latter suits multi-class problems with large volumes of test samples. The one-to-many strategy is therefore applied to parameter optimization and the one-to-one strategy to the final classification of test samples; combining the two strategies improves both the efficiency and the performance of the method.

4) The invention proposes a hyperspectral training sample screening scheme based on correlation coefficient redundancy and combines it with the SVM algorithm to form a supervised classification method for hyperspectral data. Compared with the traditional SVM classification algorithm, it achieves high-accuracy classification while reducing computation time.

Brief description of the drawings

Fig. 1 is the flow chart of the invention;

Fig. 2 is the classification result map of the method of the invention;

Fig. 3 is the classification result map of the standard SVM method.

Specific embodiment

The technical solution of the invention is further described below with reference to the drawings, but is not limited thereto. Any modification or equivalent replacement of the technical solution that does not depart from its spirit and scope falls within the protection scope of the invention.

Specific embodiment one: As shown in Fig. 1, the supervised classification method for hyperspectral data based on correlation coefficient redundancy provided by this embodiment is divided into four steps, as follows:

Step one: automatically screen the training sample set required for supervised classification using the correlation coefficient redundancy.

1) For an acquired hyperspectral remote sensing image of size Row × Column × B, where Row and Column denote the width and length of the image and B denotes its number of bands, the image analyst selects training samples with an E × 1 line-segment sampler. For the subsequent screening to be meaningful, E ≥ 2 is required. A training sample of E pixels is thus selected at a time and represented by the vectors P_m (m = 1, ..., E), each of dimension B. Because each component of P_m holds the information of one band of the pixel, P_m is also called a pixel vector, or simply a pixel;

5) If more training samples need to be selected, return to 1); otherwise proceed to the following steps.

The calculation method of the correlation coefficient redundancy (CCR) used in the above steps is given below:

Before the correlation coefficient redundancy among multiple variables can be calculated, the nonlinear correlation coefficient (NCC) between each pair of variables must be calculated. Consider two variables X and Y, each with B elements (the vector dimension), and let g be the number of states a variable can take. g must be smaller than the number of distinct values among the B elements; otherwise some states would contain no elements, which leads to singular computations. States are assigned as follows. First, the elements of X and of Y are each sorted in ascending order. Then the first B/g values are assigned to the first state, the next B/g values to the second state, and so on; the minimum and maximum of each state are called its state thresholds. Finally, the element pairs (X(1), Y(1)), (X(2), Y(2)), ..., (X(B), Y(B)) of X and Y are placed into a g × g two-dimensional state grid according to the state thresholds determined above.
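The state assignment described above is a rank-based (equal-frequency) binning followed by a two-dimensional count. A minimal Python sketch follows, assuming B is an exact multiple of g and breaking ties by position, neither of which the text specifies. The function names are hypothetical and states are numbered from 0 rather than 1.

```python
def assign_states(values, g):
    """Rank-based binning: the B/g smallest values get state 0, and so on.

    Assumes len(values) is divisible by g; ties are broken by position,
    which is a simplification not discussed in the text.
    """
    b = len(values)
    if b % g != 0:
        raise ValueError("this sketch assumes B is a multiple of g")
    per_state = b // g
    order = sorted(range(b), key=lambda i: values[i])  # indices by ascending value
    states = [0] * b
    for rank, i in enumerate(order):
        states[i] = rank // per_state
    return states

def state_grid(x, y, g):
    """Count the element pairs (X(k), Y(k)) in each cell of the g x g grid."""
    sx, sy = assign_states(x, g), assign_states(y, g)
    grid = [[0] * g for _ in range(g)]
    for i, j in zip(sx, sy):
        grid[i][j] += 1
    return grid
```

For example, with x = [0.1, 0.4, 0.2, 0.9], y = [5, 1, 7, 3] and g = 2, every pair lands off the diagonal, so `state_grid(x, y, 2)` is `[[0, 2], [2, 0]]`.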

After this processing, the probability of any state of X or Y is p_i = 1/g, and the joint probability of X and Y is p_ij = n_ij/B, where n_ij is the number of element pairs in the (i, j)-th cell of the two-dimensional state grid. The nonlinear correlation coefficient (NCC) is defined as:

NCC(X, Y) = H(X) + H(Y) - H(X, Y);

Noting that p_i = 1/g, and taking the entropies H to base g so that H(X) = H(Y) = 1, the NCC simplifies to:

$$\mathrm{NCC}(X, Y) = 2 + \sum_{i=1}^{g} \sum_{j=1}^{g} \frac{n_{ij}}{B} \log_g \frac{n_{ij}}{B}$$

The distribution of the B element pairs (X(1), Y(1)), (X(2), Y(2)), ..., (X(B), Y(B)) of X and Y over the g × g two-dimensional state grid captures the general statistical correlation between the two variables, and can therefore measure the nonlinear correlation between them.
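As a worked illustration, the Python sketch below evaluates the NCC from a g × g grid of pair counts n_ij. It assumes the entropies are taken to base g, which gives H(X) = H(Y) = 1 and NCC = 2 + Σ_i Σ_j (n_ij/B) log_g(n_ij/B), and it treats empty cells as contributing zero. The function name `ncc_from_grid` is hypothetical.

```python
import math

def ncc_from_grid(grid):
    """NCC = 2 + sum_ij (n_ij / B) * log_g(n_ij / B), skipping empty cells."""
    g = len(grid)
    b = sum(sum(row) for row in grid)  # total number of element pairs, B
    total = 0.0
    for row in grid:
        for n in row:
            if n > 0:  # 0 * log(0) is taken as 0
                p = n / b
                total += p * math.log(p, g)
    return 2.0 + total

identical = [[2, 0], [0, 2]]    # all pairs on the diagonal: Y follows X exactly
independent = [[1, 1], [1, 1]]  # pairs spread evenly over the grid
```

Under this base-g convention the two toy grids give the extreme values: `ncc_from_grid(identical)` evaluates to 1 and `ncc_from_grid(independent)` to 0, up to floating-point rounding.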

When there are multiple variables, such as the M pixels P_1, ..., P_M in this invention, the degree of general correlation between each pair of variables can be described by the nonlinear correlation coefficient. The nonlinear correlation coefficient matrix of the M variables under study can therefore be written as:

$$R = \{r_{uv}\}_{M \times M}, \qquad r_{uv} = \mathrm{NCC}(P_u, P_v)$$

where P_u and P_v are any two of the M pixels P_1, ..., P_M, 1≤u≤M, 1≤v≤M, and NCC(P_u, P_v) is the nonlinear correlation coefficient between the pixel variables P_u and P_v. Because a variable is completely correlated with itself, with a nonlinear correlation coefficient of 1, the diagonal elements of R are 1. The other elements satisfy 0 ≤ r_uv ≤ 1 (u ≠ v, u ≤ M, v ≤ M) and represent the degree of correlation between the u-th and v-th variables. When all variables are mutually uncorrelated, R is the identity matrix and the correlation among the variables under study is weakest. When all variables are completely correlated, every element of R equals 1 and the correlation among the variables is strongest. The general correlation among the M variables is thus implicit in the nonlinear correlation coefficient matrix R. To measure it quantitatively, the concept of correlation coefficient redundancy is proposed, defined in terms of the eigenvalues λ_i of the matrix R.

Because R is only M × M, solving its characteristic equation requires little computation, so the training sample screening process added by the invention does not impose a computational burden.
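To make the role of the eigenvalues concrete, consider the smallest case M = 2, with a single off-diagonal coefficient r = NCC(P_1, P_2). This is a worked example of the two limiting cases named above, not the definition of the correlation coefficient redundancy itself.

```latex
R = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}, \qquad
\det(R - \lambda I) = (1 - \lambda)^2 - r^2 = 0
\;\Longrightarrow\; \lambda_{1,2} = 1 \pm r .

% Uncorrelated pixels (r = 0):          \lambda_1 = \lambda_2 = 1   (R is the identity)
% Completely correlated pixels (r = 1): \lambda_1 = 2,\ \lambda_2 = 0
% In both cases \lambda_1 + \lambda_2 = \operatorname{tr}(R) = M = 2.
```

The eigenvalues always sum to M, so the correlation among the pixels shows up only in how unevenly the eigenvalues are spread, which is the quantity that an eigenvalue-based measure can capture.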

The strength of hyperspectral imaging equipment is its very high spectral resolution, typically around 10 nm, which to some extent comes at the cost of lower image resolution. The direct consequence is poor spatial resolution per pixel, that is, each pixel covers a large ground area, so training samples are often selected as the pixels of a continuous region in the form of a line segment. A line-segment selector differs from a circular (or approximately square) selector: the latter guarantees that the spatial distance between pixels in the region does not exceed the selector diameter, whereas the former can leave the pixels at the two ends of the selector far apart. Because of this larger spatial distance, screening cannot rely on linear correlation information alone as the evaluation measure, which is why the hyperspectral training sample screening based on correlation coefficient redundancy was designed. The algorithm relies on the nonlinear correlation coefficient, which can measure both linear and nonlinear correlation information, extends it to multivariate input through matrix operations, and requires little computation.

Step two: optimize the parameter of the SVM kernel function.

1) In view of the different characteristics of SVM classifier kernel functions, the radial basis function (RBF), which has a symmetric inner product, is selected as the kernel function of the support vector machine. The RBF kernel function is applicable to low-dimensional, high-dimensional, small-sample, and large-sample cases alike and has a wide convergence domain, making it an ideal basis function for classification. In the RBF, P is the vector corresponding to the input test pixel, P_i is the vector corresponding to a pixel in the training sample set, and σ is the width parameter, which controls the radial range of the function. Considering the computational load that can be afforded in practice, the following values are tried exhaustively for σ: σ ∈ {0.01, 0.1, 1, 10};

2) The SVM algorithm is trained with the training sample set obtained in the preceding step as training data. Because the SVM classifier is a binary classifier, i.e. it can only distinguish two categories at a time, the following optimization scheme is adopted:

First, the pixels of one category, i.e. those sharing the same category label value, are selected from the training sample set and their new category label is defined as -1, while all remaining pixels in the training sample set are given the new category label 1; the pixels in the discarded sample set are processed in the same way. This constructs a standard binary classification problem, to which the SVM algorithm based on the RBF kernel function can be applied. The classification accuracy over the discarded samples, which serve as test data, is then calculated, giving the classification accuracy for that category. This way of classifying is also called one-to-many strategy SVM classification;

Then the same processing is applied to every category in the training sample set, giving the classification accuracy for all categories. The classification accuracies of all categories are averaged to obtain the average classification accuracy.

Because the training sample set and the discarded sample set have already been obtained, changing the value of σ yields different average classification accuracies, and this value can be used to evaluate the performance of σ.
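The exhaustive search over σ reduces to evaluating the average classification accuracy once per candidate and keeping the best one. A minimal Python sketch follows; `select_sigma` is a hypothetical name and `mean_accuracy` is a hypothetical callback standing in for the one-to-many evaluation on the discarded sample set.

```python
SIGMA_GRID = (0.01, 0.1, 1, 10)  # candidate values stated in the text

def select_sigma(mean_accuracy, grid=SIGMA_GRID):
    """Return the sigma with the highest average accuracy, plus all scores.

    mean_accuracy : hypothetical callable(sigma) -> average one-to-many
                    accuracy measured on the discarded sample set
    """
    scores = {sigma: mean_accuracy(sigma) for sigma in grid}
    best = max(scores, key=scores.get)  # first maximum wins on ties
    return best, scores

# Toy usage with an artificial score that peaks at sigma = 1; it only
# exercises the selection logic and does not represent measured accuracy.
best_sigma, all_scores = select_sigma(lambda s: 1.0 / (1.0 + abs(s - 1)))
```

Because only four values are tried, the cost of the search is four rounds of one-to-many training and testing.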

The SVM classifier algorithm used in the above steps is given below:

Because the RBF kernel function has been specified, the classifier can be expressed as:

$$f(P) = \mathrm{sgn}\left( \sum_{i=1}^{N} \alpha_i y_i K(P, P_i) + b \right)$$

where K(P, P_i) denotes the RBF kernel function, sgn(·) is the sign function, α = (α_1, α_2, ..., α_N) are the Lagrange multipliers, P is the vector corresponding to the input test pixel, P_i is the vector corresponding to a pixel in the training sample set, N is the total number of training samples, b is the threshold, and y_i ∈ {-1, 1} is the new category label, i.e. the numerical value actually substituted for each of the two categories during SVM computation.

SVM adds the constraint of maximizing the class margin to the above problem and converts it into the dual problem of a convex quadratic program, which is solved for α and b. The training samples whose entries in α are nonzero are the support vectors. Classifier training is then complete, and a test sample can be classified by substituting it into the classifier expression. Classifier training also involves a penalty factor that corrects for out-of-bounds samples, but for image applications in the hyperspectral field many experiments indicate that this factor has very little influence on the classification results, so the invention does not search over the penalty factor during parameter optimization.
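The sketch below shows, in Python, how a trained classifier of this kind is evaluated on one pixel. It is an illustration under stated assumptions: the RBF is written in the common form exp(-||P - P_i||² / (2σ²)), whose normalization is an assumed convention, the multipliers are hand-picked rather than obtained from the quadratic program, and a decision value of exactly zero is mapped to +1. All function names are hypothetical.

```python
import math

def rbf_kernel(p, p_i, sigma):
    """Gaussian RBF; the 2*sigma**2 normalization is an assumed convention."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(p, p_i))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def decision_value(p, support, alphas, labels, b, sigma):
    """sum_i alpha_i * y_i * K(p, P_i) + b, before taking the sign."""
    return sum(
        a * y * rbf_kernel(p, p_i, sigma)
        for a, y, p_i in zip(alphas, labels, support)
    ) + b

def classify(p, support, alphas, labels, b, sigma):
    """Return the sign of the decision value as a label in {-1, +1}."""
    return 1 if decision_value(p, support, alphas, labels, b, sigma) >= 0 else -1

# Hand-picked (not trained) multipliers, just to exercise the formula.
support = [[0.0, 0.0], [1.0, 1.0]]
alphas, labels, b = [1.0, 1.0], [-1, 1], 0.0
```

Only samples with nonzero α_i contribute to the sum, which is why fewer support vectors directly shorten the time needed to classify each test pixel.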

Step three: complete the binary classification tasks on the hyperspectral remote sensing image using the SVM classifier algorithm.

Because the SVM classifier is a binary classifier, the pixels of two categories h and k must be selected from the training sample set as training data to train the SVM classifier with the known parameter σ. All L test samples P_t, 1≤t≤L, are then classified, giving the category estimates of all test samples:

f_{h, k}(P_t), 1≤t≤L.

If the training sample set contains only two categories, the subsequent steps of the invention are not needed and f_{h, k}(P_t) suffices to decide the category of all test samples. Otherwise, another pair of different categories must be selected and step three repeated until all pairwise category combinations have been computed.
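Repeating step three "until all pairwise category combinations have been computed" means one binary classifier per unordered pair of categories, that is, w(w-1)/2 classifiers for w categories. A short Python sketch, using the four category names from the PaviaU experiment as example input; the function name `class_pairs` is hypothetical.

```python
from itertools import combinations

def class_pairs(categories):
    """All unordered pairs (h, k) with h != k; there are w*(w-1)/2 of them."""
    return list(combinations(categories, 2))

pairs = class_pairs(["Meadows", "Asphalt", "Trees", "Painted metal sheets"])
```

For four categories this yields six pairs, so step three runs six times.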

Step four: realize the multi-class classification task based on the one-to-one strategy.

The one-to-one strategy constructs a classifier for every pair of categories and runs these classifiers in parallel; the final category of a test sample is decided by voting. The strategy makes the discrimination performed by each SVM simple and performs very well in terms of training time.

Before the decision, the score function F_h(P_t), 1≤t≤L, of test sample P_t must be calculated for each category h. This function tallies the positive and negative outputs of each sub-classifier from step three over the w categories, where w is the total number of categories in the multi-class task, i.e. the number of categories contained in the training sample set.

The final decision of the one-to-one strategy follows the voting principle: the final category label of each test sample is determined from the scores F_h(P_t).
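The voting step can be sketched in Python as below. The sign convention (a positive output of the pair classifier counts as a vote for h, anything else as a vote for k) and the tie-breaking rule (first category encountered wins) are assumptions made for illustration, since the text does not spell them out. The function name `vote` is hypothetical.

```python
from collections import Counter

def vote(pair_outputs):
    """Majority vote over pairwise classifier outputs for one test sample.

    pair_outputs : dict {(h, k): value}; a positive value is counted as a
                   vote for h and any other value as a vote for k (assumed
                   sign convention).
    Ties are resolved in favour of the category that appears first.
    """
    votes = Counter()
    for (h, k), value in pair_outputs.items():
        votes[h] += 0  # register both categories, preserving first-seen order
        votes[k] += 0
        votes[h if value > 0 else k] += 1
    return max(votes, key=votes.get)

# Three categories, three pairwise classifiers: category 3 wins two votes.
label = vote({(1, 2): 1.0, (1, 3): -1.0, (2, 3): -1.0})
```

Each category takes part in w-1 pairwise comparisons, so the highest possible score for a category is w-1 votes.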

Specific embodiment two: This embodiment uses a standard hyperspectral image data set, the PaviaU data set. The data were acquired with the ROSIS hyperspectral imager over the University of Pavia in northern Italy. The data set contains 103 bands, each band image is 610 × 340 pixels, and a ground-truth reference map was produced through field surveys by researchers, so experiments on this data set allow the classification accuracy of a classifier algorithm to be assessed realistically. The computer used in the experiments was configured with an Intel i5 2.5 GHz processor and 4 GB of memory.

Experiment 1: classify the PaviaU data set using the method of the invention.

Four categories with relatively many pixels in the PaviaU image, namely Meadows, Asphalt, Trees, and Painted metal sheets, were chosen for the experiment. Training samples were selected with a line-segment sampler of size 5 × 1, and 4 pixels were retained each time, so each screening has $C_5^4 = 5$ candidate combinations. Each of the 4 categories was sampled 20 times, i.e. 100 pixels were selected as training samples for each category, and after screening 80 training pixels remained for each category. The test samples of the 4 categories comprise 18549, 6531, 2964, and 1245 pixels, respectively.
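The sample counts in this experiment follow directly from the sampler settings. The short Python check below reproduces them; the variable names are introduced here for illustration only.

```python
from math import comb

E, M = 5, 4             # sampler length and pixels kept per sample
classes, draws = 4, 20  # categories and sampling operations per category

subsets_per_draw = comb(E, M)           # candidate subsets per screening
selected_per_class = draws * E          # pixels picked by the analyst
kept_per_class = draws * M              # pixels kept after screening
train_total = classes * kept_per_class  # size of the training sample set
discarded_total = classes * draws * (E - M)  # size of the discarded sample set
```

This gives 5 candidate subsets per screening, 100 selected and 80 retained pixels per category, and overall 320 training pixels and 80 discarded pixels, 400 in total.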

Perform step one: automatically screen the training sample set required for supervised classification using the correlation coefficient redundancy.

For the automatic screening, training samples are selected with a line-segment sampler of size 5 × 1 and the number of pixels retained each time is set to 4, i.e. 1 pixel is deleted automatically. Twenty independent sampling operations are carried out for each of the 4 categories, and the resulting training sample set is:

{(P_1, Class_1), (P_2, Class_2), ..., (P_320, Class_320)}.

The discarded sample set is:

{(P_321, Class_321), (P_322, Class_322), ..., (P_400, Class_400)}.

Perform step two: optimize the parameter of the SVM kernel function.

Each value in {0.01, 0.1, 1, 10} is substituted in turn for the width parameter σ of the classifier's kernel function. For each category in the training sample set above, a binary SVM is computed between that category and all remaining categories, the whole discarded sample set is used as test data, and the classification accuracy for that category is recorded. This yields the classification accuracies of the four categories, whose mean is the average classification accuracy. The average classification accuracy reaches its maximum at σ = 0.1, so the kernel function parameter is set to 0.1.

Perform step three: complete the binary classification tasks on the hyperspectral remote sensing image using the SVM classifier algorithm.

The SVM is trained on each pair of categories in the training sample set {(P_1, Class_1), (P_2, Class_2), ..., (P_320, Class_320)} as training data, all test samples are then classified, and the results are stored for use in the following step.

Because the experiment uses 4 categories in total, step three is performed $C_4^2 = 6$ times in total.

Perform step four: realize the multi-class classification task based on the one-to-one strategy.

Based on the results of the previous step, the voting principle of the one-to-one strategy is used to make the final decision on the category label of each test sample.

Because the PaviaU data come with a ground-truth reference map, the classification accuracy of each category is calculated against it. In addition, the time consumed by each step and the total number of support vectors used by all binary classifiers are recorded.

Experiment 2: a comparison experiment against the classification method proposed by the invention.

The comparison experiment uses the standard one-to-one strategy SVM classifier algorithm, with the width parameter σ taken directly from the optimization result of Experiment 1, σ = 0.1. The comparison experiment also uses the training samples selected in Experiment 1, i.e. 100 training pixels for each of the 4 categories, but without screening. The test samples are identical to those of Experiment 1: the test samples of the 4 categories are classified pixel by pixel and the classification accuracy of each category is obtained. The classification time and the total number of support vectors of Experiment 2 are also recorded.

Fig. 2 and Fig. 3 show the classification result maps of Experiment 1 and Experiment 2, respectively. White points mark misclassified pixels, black regions are correctly classified pixels, and gray regions are background, i.e. regions not chosen as training or test samples. A close look at the two figures shows that Experiment 1 has noticeably fewer misclassified pixels. The classification accuracies are listed in Table 1: comparing the method of the invention with the traditional SVM method, the classification accuracy of all 4 categories improves, and the average classification accuracy improves by 0.96 percentage points.

Table 1. Classification accuracy of Experiment 1 and Experiment 2

	Experiment 1	Experiment 2
Classification accuracy of category 1 (%)	97.29	96.51
Classification accuracy of category 2 (%)	98.91	98.87
Classification accuracy of category 3 (%)	87.13	85.53
Classification accuracy of category 4 (%)	87.41	85.99
Average classification accuracy (%)	92.69	91.73
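The averages in Table 1 can be checked from the per-category values. The Python snippet below recomputes them as unweighted means; the variable names are introduced here for illustration.

```python
exp1 = [97.29, 98.91, 87.13, 87.41]  # per-category accuracy, Experiment 1 (%)
exp2 = [96.51, 98.87, 85.53, 85.99]  # per-category accuracy, Experiment 2 (%)

mean1 = sum(exp1) / len(exp1)  # unweighted mean over the four categories
mean2 = sum(exp2) / len(exp2)
gains = [round(a - b, 2) for a, b in zip(exp1, exp2)]  # per-category improvement
```

The unweighted means are 92.685 and 91.725, in line with the tabulated 92.69 and 91.73 after rounding, and they differ by 0.96. The per-category gains are 0.78, 0.04, 1.60, and 1.42 percentage points, so every category improves.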

We have also counted two time loss of experiment, refer to table 2.Wherein, the time-consuming of experiment 1 is divided into three parts, i.e., The one-to-one plan of the screening training sample time of step one, the classifier parameters optimal time of step 2, step 3 and step 4 Slightly SVM completes many classification tasks and takes；And it is only then that many classification tasks of one-to-one tactful SVM completions take to test 2.Present invention side Method can effectively reduce the classification time, be reduced for the embodiment 4.01 seconds, even if be included in all steps It is overall to take the advantage for also having 2.55 seconds.

Table 2. Run time of Experiment 1 and Experiment 2

In addition, we counted the total number of support vectors in the two experiments. All binary classifiers of Experiment 1 generate 870 support vectors in total, while all binary classifiers of Experiment 2 generate 1102, so the former has 232 fewer support vectors than the latter. Thus, by screening the training samples quickly and automatically, the method of the invention lets the subsequent SVM classifier obtain a reduced set of support vectors during training; this not only raises the classification accuracy but also directly reduces the computation time needed to classify the test samples.

Claims

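1. A supervised classification method for hyperspectral data based on correlation coefficient redundancy, characterized in that the steps of the method are as follows:

Step one: automatically screen the training sample set required for supervised classification using correlation coefficient redundancy: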

1) For an acquired hyperspectral remote sensing image of size Row × Column × B, where Row and Column denote the width and length of the image and B denotes its number of bands, an image analyst selects training samples with an E × 1 line-segment sampler;

2) The manually selected training samples are screened automatically: the number of pixels retained from each selection is set to M, with M < E, and the other E-M pixels of the training sample are deleted;

3) The correlation coefficient redundancy of every possible choice of M pixels is calculated, and the choice with the largest correlation coefficient redundancy is selected; the pixels of that choice are the M pixels retained from the E pixels;

4) A single determined class label Class^* is assigned to the E pixels of this selection; the M retained pixels are put into the training sample set in the form of data pairs, and the remaining E-M pixels are likewise put into the discarded sample set in the form of data pairs;

5) If more training samples need to be selected, return to 1); otherwise proceed to Step two;
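The screening in 2) and 3) can be sketched as an exhaustive search over all M-pixel subsets of one E × 1 segment. This is only a minimal sketch under stated assumptions: the redundancy formula follows claim 3, but ordinary Pearson correlation (`np.corrcoef`) is swapped in for the nonlinear correlation coefficient matrix R, whose construction is not given in this section. The names `ccr`, `screen` and `segment` are illustrative.

```python
from itertools import combinations

import numpy as np


def ccr(pixels):
    """Correlation coefficient redundancy: sum_i (lambda_i / M)^2.

    ASSUMPTION: Pearson correlation stands in for the patent's nonlinear
    correlation coefficient matrix R. `pixels` has shape (M, B).
    """
    m = pixels.shape[0]
    r = np.atleast_2d(np.corrcoef(pixels))  # M x M matrix, rows are pixels
    eigvals = np.linalg.eigvalsh(r)         # R is symmetric
    return float(np.sum((eigvals / m) ** 2))


def screen(segment, m):
    """Return (kept, dropped) pixel indices for one E x 1 segment.

    Every M-subset of the E pixels is scored and the subset with the
    largest redundancy is retained, as in sub-step 3).
    """
    e = segment.shape[0]
    kept = max(combinations(range(e), m),
               key=lambda idx: ccr(segment[list(idx)]))
    dropped = tuple(i for i in range(e) if i not in kept)
    return kept, dropped


# Hypothetical segment: E = 5 pixels with B = 20 bands each.
rng = np.random.default_rng(0)
segment = rng.random((5, 20))
kept, dropped = screen(segment, m=3)
print("retained:", kept, "discarded:", dropped)
```

The retained indices would go to the training sample set and the discarded ones to the discarded sample set, both with the label of the segment. The search visits every combination, so it is only practical for short segments.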

Step two: optimize the parameter of the SVM kernel function:

a. From the training sample set obtained in Step one, select the pixels of one class, i.e., those having the same class label, and define their new class label as -1; all remaining pixels in the training sample set are given the new class label 1. The pixels in the discarded sample set are treated in the same way, which constructs a standard two-class problem. Classify with the SVM algorithm based on the RBF kernel function, and calculate the classification accuracy for that class;

b. In the same way, apply the above processing to every class in the training sample set, obtaining the classification accuracy for all classes;

c. Average the classification accuracies of all classes to obtain the average classification accuracy;

d. Select the σ value corresponding to the maximum average classification accuracy as the parameter of the SVM classifier algorithm;
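Steps a to d amount to a search over candidate σ values, scored by class-averaged accuracy. The sketch below shows only that loop. It assumes the classifier is trained on the retained samples and scored on the discarded samples, and it takes the SVM as a hypothetical callable `fit_predict`; the candidate list `sigmas` is likewise an assumption, since this section does not state the search range. The toy classifier at the end is not an SVM and exists only to make the sketch runnable.

```python
def relabel(labels, target):
    """Step a: the selected class becomes -1, every other class becomes +1."""
    return [-1 if c == target else 1 for c in labels]


def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def select_sigma(train_x, train_c, drop_x, drop_c, sigmas, fit_predict):
    """Return (sigma, mean accuracy) with the highest class-averaged accuracy.

    fit_predict(train_x, train_y, test_x, sigma) is a HYPOTHETICAL callable
    that trains an RBF-kernel SVM and returns +/-1 predictions for test_x.
    """
    classes = sorted(set(train_c))
    best_sigma, best_mean = None, -1.0
    for sigma in sigmas:
        accs = []
        for target in classes:                   # steps a and b
            y_train = relabel(train_c, target)
            y_drop = relabel(drop_c, target)
            pred = fit_predict(train_x, y_train, drop_x, sigma)
            accs.append(accuracy(pred, y_drop))
        mean_acc = sum(accs) / len(accs)         # step c
        if mean_acc > best_mean:                 # step d
            best_sigma, best_mean = sigma, mean_acc
    return best_sigma, best_mean


def toy_fit_predict(train_x, train_y, test_x, sigma):
    """Toy stand-in (NOT an SVM): nearest neighbour, useful only at sigma = 0.1."""
    if sigma != 0.1:
        return [1] * len(test_x)
    return [train_y[min(range(len(train_x)), key=lambda i: abs(train_x[i] - s))]
            for s in test_x]


best, best_acc = select_sigma([0.0, 0.1, 1.0, 1.1], [1, 1, 2, 2],
                              [0.05, 1.05], [1, 2],
                              [0.01, 0.1, 1.0], toy_fit_predict)
print(best, best_acc)
```

With scikit-learn, `fit_predict` could wrap `sklearn.svm.SVC(kernel="rbf", gamma=1.0 / (2 * sigma ** 2))`; that mapping assumes the kernel has the form exp(-||x - y||² / (2σ²)), which this section does not confirm.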

Step three: complete the two-class classification tasks on the hyperspectral remote sensing image using the SVM classifier algorithm:

Select the pixels of two classes h and k from the training sample set as training data, train the SVM classifier with the known parameter σ, and estimate the class of all L test samples P_t, 1 ≤ t ≤ L, which gives the class estimates of all test samples:

f_{h,k}(P_t), \quad 1 \leq t \leq L;

If the training sample set contains only two classes, Step four is not needed, and f_{h,k}(P_t) alone decides the class of every test sample. Otherwise, another pair of different classes must be selected and Step three repeated, until all pairwise class combinations have been computed;

Step four: realize the multi-class classification task based on the one-against-one strategy:

According to the results obtained in Step three, the class label of each test sample is finally decided using the voting principle of the one-against-one strategy.

2. The supervised classification method for hyperspectral data based on correlation coefficient redundancy according to claim 1, characterized in that E ≥ 2.

3. The supervised classification method for hyperspectral data based on correlation coefficient redundancy according to claim 1, characterized in that the correlation coefficient redundancy is defined as:

CCR(P_1, \ldots, P_M) = \sum_{i=1}^{M} \left( \frac{\lambda_i}{M} \right)^2,

where λ_i are the eigenvalues of the nonlinear correlation coefficient matrix R.

4. The supervised classification method for hyperspectral data based on correlation coefficient redundancy according to claim 1, characterized in that the SVM classifier is expressed as:

f(P) = \operatorname{sgn}\left( \sum_{i=1}^{N} y_i \alpha_i K(P_i, P) + b \right);

where sgn(·) is the sign function, α = (α₁, α₂, ..., α_N) are the Lagrange multipliers, P is the vector corresponding to the input test pixel, P_i is the vector corresponding to a pixel in the training sample set, N is the total number of training samples, b is the threshold, and y_i ∈ {-1, 1} is the new class label, i.e., the numerical value actually substituted for the two different classes when the SVM computation is carried out.
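As an illustration of how the decision function of claim 4 is evaluated for one test pixel, the sketch below computes f(P) with NumPy. It assumes the RBF kernel has the form exp(-||a - b||² / (2σ²)) and that a score of exactly zero maps to +1; neither detail is fixed by the claim. The multipliers, threshold and pixel vectors are hypothetical values, not the output of a trained classifier.

```python
import numpy as np


def rbf_kernel(a, b, sigma):
    """ASSUMED kernel form: K(a, b) = exp(-||a - b||^2 / (2 * sigma^2))."""
    return np.exp(-np.sum((a - b) ** 2, axis=-1) / (2.0 * sigma ** 2))


def svm_decide(p, train_p, y, alpha, b, sigma):
    """f(P) = sgn(sum_i y_i * alpha_i * K(P_i, P) + b), returned as -1 or +1."""
    score = np.sum(y * alpha * rbf_kernel(train_p, p, sigma)) + b
    return 1 if score >= 0 else -1  # tie at zero mapped to +1 by assumption


# Hypothetical two-sample model in a 2-band feature space.
train_p = np.array([[0.0, 0.0], [1.0, 1.0]])  # training pixel vectors P_i
y = np.array([-1.0, 1.0])                     # new class labels y_i
alpha = np.array([1.0, 1.0])                  # Lagrange multipliers
b = 0.0                                       # threshold

near_first = svm_decide(np.array([0.1, 0.0]), train_p, y, alpha, b, sigma=0.5)
near_second = svm_decide(np.array([0.9, 1.0]), train_p, y, alpha, b, sigma=0.5)
print(near_first, near_second)
```

Only training pixels with non-zero α_i contribute to the sum, which is why a smaller set of support vectors shortens the classification of test samples.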

5. The supervised classification method for hyperspectral data based on correlation coefficient redundancy according to claim 1, characterized in that the final decision of the one-against-one strategy follows the voting principle, and the final class label of each test sample is obtained according to the following formula:

h^{*} = \underset{h = 1, \ldots, w}{\arg\max} \; F_h(P_t);

where F_h(P_t) is the score function of each class, 1 ≤ t ≤ L, and w is the total number of classes of the multi-class task, i.e., the number of classes contained in the training sample set.

6. The supervised classification method for hyperspectral data based on correlation coefficient redundancy according to claim 5, characterized in that the score function F_h(P_t) of each class is calculated as follows:

F_h(P_t) = \sum_{k=1,\, k \neq h}^{w} f_{h,k}(P_t), \quad 1 \leq t \leq L.
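The voting rule of claims 5 and 6 can be sketched as follows. This is a minimal sketch under stated assumptions: each binary output f_{h,k}(P_t) equals +1 when the classifier favours class h and -1 when it favours class k, f_{k,h} = -f_{h,k}, and classes are indexed from 0 rather than 1. The name `ovo_vote` and the example outputs are illustrative.

```python
import numpy as np


def ovo_vote(pairwise, w):
    """One-against-one voting over w classes.

    pairwise[(h, k)] with h < k holds f_{h,k}(P_t) in {-1, +1} for all L
    test samples. ASSUMPTIONS: +1 is a vote for h, and f_{k,h} = -f_{h,k}.
    Returns the winning class index per sample and the score matrix F.
    """
    n = len(next(iter(pairwise.values())))
    scores = np.zeros((w, n))
    for (h, k), f in pairwise.items():
        f = np.asarray(f, dtype=float)
        scores[h] += f  # contribution of f_{h,k} to F_h
        scores[k] -= f  # contribution of f_{k,h} = -f_{h,k} to F_k
    return scores.argmax(axis=0), scores


# Hypothetical binary outputs for w = 3 classes and L = 2 test samples.
pairwise = {
    (0, 1): [1, -1],
    (0, 2): [1, -1],
    (1, 2): [1, 1],
}
labels, scores = ovo_vote(pairwise, w=3)
print(labels)
print(scores)
```

For the first test sample the scores are F_0 = 2, F_1 = 0 and F_2 = -2, so class 0 wins; the second sample goes to class 1. `argmax` resolves ties in favour of the lowest class index, a detail the claims leave open.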