CN106203523B - Hyperspectral image classification method based on gradient boosting decision tree and semi-supervised algorithm fusion - Google Patents
Hyperspectral image classification method based on gradient boosting decision tree and semi-supervised algorithm fusion Download PDF Info
- Publication number
- CN106203523B CN201610561589.1A CN201610561589A
- Authority
- CN
- China
- Prior art keywords
- sample point
- sample
- sample points
- semi
- class
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
- G06V20/13—Satellite images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
- G06F18/2113—Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
- G06V20/194—Terrestrial scenes using hyperspectral data, i.e. more or other wavelengths than RGB
Abstract
The invention proposes a hyperspectral image classification method based on the fusion of a gradient boosting decision tree with a semi-supervised algorithm, which addresses the low classification accuracy of existing hyperspectral image classification methods that combine active learning with semi-supervised learning. The steps are: (1) input the hyperspectral image data; (2) extract sample-point features; (3) train the parameters of the gradient boosting decision tree classifier; (4) predict the class of the learning-set sample points; (5) assess the confidence of the sample points; (6) screen sample points by sparse representation; (7) update the labeled training set; (8) output the classification result. The invention evaluates the confidence of unlabeled sample points using the classifier predictions and sparse representation, divides them into two sets according to their confidence, and processes the two sets differently, which improves classification accuracy while easing the burden of manual labeling. The method can be used in fields such as geological survey and atmospheric pollution monitoring.
Description
Technical Field
The invention belongs to the technical field of image processing and relates to a hyperspectral image classification method, in particular to a hyperspectral image classification method based on the fusion of a gradient boosting decision tree with a semi-supervised algorithm, which can be used in fields such as geological survey, atmospheric pollution monitoring and military target strike.
Background
With the development of optical remote sensing technology, remote sensing imaging has progressed from panchromatic (black-and-white) photography, color photography and multispectral scanning imaging to today's hyperspectral remote sensing imaging. Hyperspectral remote sensing uses continuous spectral channels with a spectral resolution on the order of 10⁻²λ to image ground objects continuously, obtaining a large amount of ground-object image data with complete spectral information; spatial information, radiation information and spectral information of the ground objects are acquired simultaneously, realizing the characteristic of "image-spectrum integration" and facilitating ground-object identification.
Commonly used hyperspectral image data include the Indian Pines dataset and the Kennedy Space Center (KSC) dataset acquired by AVIRIS, the airborne visible/infrared imaging spectrometer of the NASA Jet Propulsion Laboratory, and the Botswana dataset acquired by the Hyperion spectrometer on NASA's EO-1 satellite, among others.
The hyperspectral image ground-object classification problem mainly consists in classifying ground objects by their spectral features: the spectral shape of each pixel in the hyperspectral image is analyzed and the class of the ground object is judged from these features. Traditional hyperspectral image classification methods mainly include supervised methods, represented by the support vector machine (SVM) and neural networks, and unsupervised methods, represented by fuzzy clustering. Supervised methods need a large number of labeled samples to train a classifier with good performance; the training data set of the hyperspectral remote sensing image classification problem consists of sample points labeled with class labels on the remote sensing image, and this labeling is done entirely by hand. However, having human experts manually label hyperspectral images is a time-consuming, labor-intensive and costly task. Unsupervised methods, lacking prior knowledge, divide the samples into several classes only according to the distribution of the spectral features of the ground objects in the remote sensing image; the classification result only distinguishes different classes, cannot determine the attribute of each class, and cannot guarantee a correct correspondence between the clustered classes and the ground-object classes.
Against this background, hyperspectral image classification methods based on semi-supervised learning and active learning have attracted wide attention from scholars at home and abroad. Semi-supervised learning trains an initial classifier with a small amount of labeled data and then uses a large amount of unlabeled data to further improve its performance and achieve accurate learning, thereby making up for the shortcomings of supervised and unsupervised learning to a certain extent. Common semi-supervised classification methods include self-training, co-training, generative probabilistic models, semi-supervised support vector machines (SVM) and graph-based methods. In these methods, class labels are assigned to unlabeled data and the classifier is retrained with the newly labeled data to obtain the final classification result. However, semi-supervised learning has the disadvantage that, when samples are few and the model is insufficiently trained, the class-label predictions for unlabeled data are often inaccurate, and adding incorrectly labeled samples to the training set degrades the learning performance of the classifier. Active learning aims to select samples that are valuable to the classification model through a query strategy and to filter out redundant sample information, so that the information-rich samples are labeled manually according to the knowledge and experience of domain experts. The main task of active learning is to find an efficient sample query strategy, so that the selectively labeled samples are of high quality and few in number, guaranteeing classification performance while reducing the labeling workload. Current query strategies for active learning include: 1) sampling based on sample uncertainty; 2) sampling based on a query-by-committee, in which a committee of several classifiers decides by voting whether a sample is selected. In active learning, labeling accuracy is guaranteed because experts label the unlabeled samples, but manual labeling tends to be time-consuming and laborious.
Active learning guarantees one hundred percent labeling accuracy by consulting human experts to introduce manually labeled samples, but because manual labeling is time-consuming and laborious, the number of samples that can be labeled manually is limited. Semi-supervised learning relies on the classifier to predict unlabeled samples, and the quality of the labels cannot be guaranteed when many new samples are added. Based on the characteristics of the two methods, scholars at home and abroad have considered combining them and proposed hyperspectral image classification methods based on the combination of active learning and semi-supervised learning, which reduce the burden of manual labeling while guaranteeing the number of newly added labeled samples. For example, in the paper "A New Semi-Supervised Approach for Hyperspectral Image Classification with Different Active Learning Strategies" (WHISPERS, 2012), Inmaculada Dópido, Jun Li et al. disclose a semi-supervised active learning method for hyperspectral image classification, in which the query strategies of active learning are used to screen the unlabeled samples selected during semi-supervised learning and to pick the samples with the richest information content. The specific steps are as follows: compute, with a sparse multinomial logistic regression classifier, the maximum posterior probability of the unlabeled samples in the neighborhood of the labeled samples; add the samples whose assigned class label has a higher probability to a specific set; select samples from this set with several common active-learning query strategies, choosing the samples that contribute most to improving the classifier's performance; add the selected samples to the labeled sample set and retrain the classifier. This method saves time and labor, but because there is no manual labeling step, class-label prediction relies on the classifier alone and the classification accuracy still needs to be improved.
Disclosure of Invention
The invention aims to overcome the defects of the prior art described above and provides a hyperspectral image classification method based on the fusion of a gradient boosting decision tree with a semi-supervised algorithm.
In order to achieve the purpose, the technical scheme adopted by the invention comprises the following steps:
(1) inputting a hyperspectral image containing C classes and N sample points, taking a neighborhood window around each sample point, taking the maximum of each feature dimension over all sample points in the window as the spatial feature of the central sample point, and concatenating the spectral feature and the spatial feature of each sample point to obtain its spatial-spectral feature vector;
(2) selecting a labeled training set, a learning set and a test set from the input hyperspectral image, implemented as follows:
(2a) randomly selecting r sample points from each class of the input hyperspectral image to obtain a labeled training set X = {x_1, x_2, …, x_n} ⊂ R^D with corresponding class label set L = {l_1, l_2, …, l_n}, where n is the total number of labeled training sample points, n = C × r, x_i is the i-th labeled sample point of the labeled training set, l_i ∈ {1, 2, …, C} is the class to which the i-th labeled training sample point belongs, R is the real number field and D is the feature dimension of a sample point;
(2b) randomly selecting a proportion per1 of the sample points other than the n labeled sample points to obtain a learning set Z = {z_1, z_2, …, z_s}, where s is the total number of learning-set sample points, s = (N − n) × per1, and z_q is the q-th sample point in the learning set;
(2c) constructing a test set Y = {y_1, y_2, …, y_m} from the remaining sample points, where m is the total number of test samples, m = N − n − s, and y_j is the j-th test sample point of the test set;
(3) training the parameters of the gradient boosting decision tree (GBDT) classifier with the feature vectors of the sample points of the labeled training set X and the corresponding class label matrix, where every pair of classes of labeled sample points yields one binary classifier model, so that the C classes of labeled sample points finally yield C × (C − 1)/2 binary classifier models;
(4) inputting the sample points of the learning set Z into the obtained binary classifier models to obtain a predicted class label k for each sample point in the learning set Z;
(5) according to the predicted class label k of each sample point z_q of the learning set Z, judging in how many binary classifier models z_q is assigned to class k, i.e. whether the number of wins P of the class label k equals C − 1; if so, the sample point is added to the initially empty set S_semi, otherwise it is added to the initially empty set S_act; after judging all sample points of the learning set Z one by one, the set S_semi = {z_q1} and the set S_act = {z_q2} are obtained, where z_q1 is a sample point of S_semi, z_q2 is a sample point of S_act, s′ is the total number of sample points in S_semi, s″ is the total number of sample points in S_act, and s′ + s″ = s;
(6) screening the sample points of the obtained sets S_semi and S_act using sparse representation, implemented as follows:
(6a) constructing a dictionary A = [x_1, x_2, …, x_n] from all sample points of the labeled training set X, and using the constructed dictionary A to sparsely represent the sample points z_q1 of S_semi and z_q2 of S_act: z_q1 = Aα_1 and z_q2 = Aα_2, where α_1 and α_2 are sparse representation coefficient vectors;
(6b) obtaining the sparse representation coefficient vectors of the sample points z_q1 and z_q2 with the orthogonal matching pursuit algorithm OMP: α̂_1 = argmin_{α_1} ||z_q1 − Aα_1||_2^2 + λ||α_1||_1 and α̂_2 = argmin_{α_2} ||z_q2 − Aα_2||_2^2 + λ||α_2||_1, where ||·||_2 is the l_2 norm, measuring the data reconstruction error, ||·||_1 is the l_1 norm, enforcing sparsity of the vectors α_1 and α_2, and λ is a balance factor between the reconstruction error term and the sparsity term;
(6c) from the sparse representation coefficient vectors α_1 and α_2, reading the class labels l_i ∈ {1, 2, …, C} of the labeled sample points corresponding to the non-zero entries; the sample points z_q1 of S_semi whose predicted class label k equals l_i are screened out and assigned the class label l_i; at the same time, the sample points z_q2 of S_act whose predicted class label k differs from l_i are screened out and handed to an expert for manual labeling;
(7) adding the sample points z_q1 of S_semi that have been assigned the class label l_i and the sample points z_q2 of S_act that have been manually labeled to the labeled training set X, and retraining the classifier parameters to obtain a new classifier model;
(8) iterating steps (3) to (7) until the set number of iterations is reached, and classifying the sample points of the test set Y with the finally obtained classifier model to obtain the classification result of the test set.
Compared with the prior art, the invention has the following advantages:
1. The invention evaluates the confidence of unlabeled sample points using both the classifier predictions and the sparse representation, divides the unlabeled sample points into two sets according to their confidence, and processes each set differently according to its characteristics.
2. The invention updates the labeled training set with both manually labeled sample points and unlabeled sample points predicted by the classifier, and trains the classifier with labeled and unlabeled sample points together, which effectively reduces the number of labeled sample points required, guarantees classification accuracy and at the same time eases the burden of manual labeling.
Drawings
FIG. 1 is a block diagram of an implementation flow of the present invention;
FIG. 2 is a comparison of the simulated classification accuracy of the present invention and the prior art for different numbers of labeled training sample points per class.
Detailed Description
The invention is further illustrated below with reference to the accompanying drawings and examples.
Referring to fig. 1, the method of the present invention includes the following steps:
step 1, inputting hyperspectral image data:
Input a hyperspectral image and remove the background sample points; the number of remaining sample points is N and they comprise C classes.
Step 2, extract the spatial-spectral features of the sample points, implemented as follows:
Step 2a, use the spectral value of each band of a sample point as its spectral feature vector; the original feature dimension of a sample point is d.
Step 2b, take a c × c neighborhood window around each sample point and use the maximum of each feature dimension over all sample points in the window as the spatial feature of the central sample point; the spatial feature dimension is also d.
Step 2c, concatenate the spectral feature and the spatial feature of each sample point to obtain its final feature vector of dimension D = 2d.
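To make steps 2a to 2c concrete, the following is a minimal NumPy sketch of the spatial-spectral feature extraction, assuming the image is given as an H × W × d array; the function name, the edge-replication padding and the loop-based implementation are illustrative choices and not part of the patent.

```python
import numpy as np

def spatial_spectral_features(cube, c=15):
    """cube: hyperspectral image of shape (H, W, d) with d spectral bands.
    Returns features of shape (H, W, 2*d): the spectral vector of each pixel
    concatenated with the per-band maximum over its c x c neighborhood."""
    assert c % 2 == 1, "window size c is assumed to be odd"
    H, W, d = cube.shape
    half = c // 2
    # pad by edge replication so border pixels also have a full window
    padded = np.pad(cube, ((half, half), (half, half), (0, 0)), mode="edge")
    spatial = np.empty_like(cube)
    for i in range(H):
        for j in range(W):
            window = padded[i:i + c, j:j + c, :]         # c x c x d neighborhood
            spatial[i, j, :] = window.max(axis=(0, 1))   # maximum of each band
    return np.concatenate([cube, spatial], axis=2)       # feature dimension D = 2*d
```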
Step 3, select a labeled training set X, a test set Y and a learning set Z from the input hyperspectral image, implemented as follows:
Step 3a, randomly select r sample points from each class of the input hyperspectral image to form a labeled training set X = {x_1, x_2, …, x_n} with corresponding class label set L = {l_1, l_2, …, l_n}, where n is the total number of labeled training sample points, n = C × r, x_i is the i-th labeled sample point of the labeled training set, l_i ∈ {1, 2, …, C} is the class to which the i-th labeled training sample point belongs, and R is the real number field;
Step 3b, randomly select a proportion per1 of the sample points other than the n labeled sample points to form a learning set Z = {z_1, z_2, …, z_s}, where s is the total number of learning-set sample points, s = (N − n) × per1, and z_q is the q-th sample point in the learning set;
Step 3c, form a test set Y = {y_1, y_2, …, y_m} from the remaining sample points, where m is the total number of test samples, m = N − n − s, and y_j is the j-th test sample point of the test set.
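A small sketch of the split in steps 3a to 3c, assuming the non-background sample points are given as a feature matrix and a label vector; the variable names and the use of NumPy's random generator are assumptions made for the example.

```python
import numpy as np

def split_sets(features, labels, r, per1, seed=0):
    """features: (N, D) array, labels: (N,) array of class labels 1..C.
    Returns index arrays for the labeled training set X (r samples per class),
    the learning set Z (a per1 fraction of the rest) and the test set Y."""
    rng = np.random.default_rng(seed)
    train_idx = []
    for c in np.unique(labels):
        idx_c = np.flatnonzero(labels == c)
        train_idx.extend(rng.choice(idx_c, size=r, replace=False))
    train_idx = np.array(train_idx)                      # n = C * r labeled points
    rest = np.setdiff1d(np.arange(len(labels)), train_idx)
    rest = rng.permutation(rest)
    s = int(len(rest) * per1)                            # s = (N - n) * per1
    learn_idx, test_idx = rest[:s], rest[s:]             # m = N - n - s test points
    return train_idx, learn_idx, test_idx
```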
Step 4, train the parameters of the GBDT classifier and predict the class labels of the learning-set sample points, implemented as follows:
Step 4a, input the feature vectors of the sample points of the labeled training set X and the corresponding class label matrix into the GBDT classifier and train the classifier parameters; every pair of classes of labeled sample points yields one binary classifier model.
Step 4b, input the feature vectors of the learning-set sample points into the obtained classifier models to obtain the predicted class label k of each sample point z_q.
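The patent does not prescribe a particular GBDT implementation; as an illustration of steps 4a and 4b, the sketch below uses scikit-learn's GradientBoostingClassifier as a stand-in, trains one binary model per pair of classes and predicts by pairwise voting. All names are illustrative.

```python
from itertools import combinations

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def train_pairwise_gbdt(X, y, n_trees=100, subsample=0.5):
    """Step 4a: one binary GBDT per pair of classes, C*(C-1)/2 models in total."""
    models = {}
    for k, t in combinations(np.unique(y), 2):
        mask = (y == k) | (y == t)
        clf = GradientBoostingClassifier(n_estimators=n_trees, subsample=subsample)
        clf.fit(X[mask], y[mask])
        models[(k, t)] = clf
    return models

def predict_by_voting(models, Z, classes):
    """Step 4b: the predicted class label k of each learning-set sample point is
    the class that wins the most pairwise duels; the vote counts are also returned."""
    votes = np.zeros((len(Z), len(classes)), dtype=int)
    cls_index = {c: i for i, c in enumerate(classes)}
    for (k, t), clf in models.items():
        duel = clf.predict(Z)                    # winner of the (k, t) duel per sample
        for c in (k, t):
            votes[:, cls_index[c]] += (duel == c)
    pred = np.array(classes)[votes.argmax(axis=1)]
    return pred, votes
```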
Step 5, divide the learning-set sample points into two sets according to their confidence, implemented as follows:
Step 5a, sample z_q is classified by each of the obtained binary classifiers; the binary classifier trained on the k-th class and the t-th class of labeled sample points, with k ∈ {1, 2, …, C}, t ∈ {1, 2, …, C}, k ≠ t, yields the prediction result values score(k) and score(t) of z_q for the k-th and the t-th class.
Step 5b, the number of wins P of class k for sample z_q over the binary classifiers is
P = Σ_{t=1, t≠k}^{C} I(score(k) > score(t)),
where the indicator function I(·) equals 1 if its argument is true and 0 otherwise.
Step 5c, if P = C − 1, the confidence that the true class of sample z_q is k is high; the main purpose of semi-supervised learning is to find easily labeled, high-confidence unlabeled sample points, predict their class labels with the classifier model and add them to the labeled training set, so z_q is put into the initially empty set S_semi, yielding the set S_semi = {z_q1}, where z_q1 is a sample point of S_semi and s′ is the total number of sample points in S_semi.
Step 5d, if P ≠ C − 1, the confidence that the true class of sample z_q is k is low; active learning screens out samples that are hard to classify and rich in information for manual labeling, so z_q is put into the set S_act, yielding the set S_act = {z_q2}, where z_q2 is a sample point of S_act and s″ is the total number of sample points in S_act.
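The confidence split of steps 5a to 5d can be written as follows, reusing the pairwise vote counts from the previous sketch; the array layout is an assumption made for the example.

```python
import numpy as np

def split_by_confidence(pred_labels, votes, classes):
    """Step 5: a learning-set sample point goes to S_semi when its predicted class k
    wins all C-1 pairwise duels (P = C-1), otherwise to S_act."""
    C = len(classes)
    cls_index = {c: i for i, c in enumerate(classes)}
    # P: number of wins of the predicted class k for every learning-set sample
    P = votes[np.arange(len(pred_labels)), [cls_index[k] for k in pred_labels]]
    semi_idx = np.flatnonzero(P == C - 1)   # high confidence -> semi-supervised path
    act_idx = np.flatnonzero(P != C - 1)    # low confidence  -> active-learning path
    return semi_idx, act_idx
```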
Step 6, sparsely represent the sample points of the sets S_semi and S_act, implemented as follows:
Step 6a, construct the dictionary A = [x_1, x_2, …, x_n], where x_1, x_2, …, x_n are the sample points of the labeled training set and n is the total number of labeled training sample points; since the feature dimension of a sample point is D, the dictionary has size D × n.
Step 6b, sparsely represent the sample points z_q1 of S_semi and z_q2 of S_act: z_q1 = Aα_1 and z_q2 = Aα_2.
Step 6c, obtain the sparse representation coefficient vectors of z_q1 and z_q2 with the orthogonal matching pursuit (OMP) algorithm: α̂_1 = argmin_{α_1} ||z_q1 − Aα_1||_2^2 + λ||α_1||_1 and α̂_2 = argmin_{α_2} ||z_q2 − Aα_2||_2^2 + λ||α_2||_1, where ||·||_2 is the l_2 norm, measuring the data reconstruction error, ||·||_1 is the l_1 norm, enforcing sparsity of the vectors α_1 and α_2, and λ is a balance factor between the reconstruction error term and the sparsity term. OMP is implemented by the following steps:
Step 6c1, initialize the residual r^(0) = z_q, the index set Λ^(0) as a K-dimensional zero vector, where K is the sparsity level (the number of iterations), and the iteration counter J = 1.
Step 6c2, find the index λ of the dictionary column x_j whose inner product with the residual r^(J−1) is largest: λ = argmax_j |⟨r^(J−1), x_j⟩|.
Step 6c3, update the index set Λ^(J) by setting Λ^(J)(J) = λ; according to the index set, select the corresponding atom columns of the dictionary A to form the set A^(J) = A(:, Λ^(J)(1:J)).
Step 6c4, obtain the J-th order approximation by least squares: α^(J) = argmin_α ||z_q − A^(J)α||_2.
Step 6c5, update the residual r^(J) = z_q − A^(J)α^(J) and set J = J + 1.
Step 6c6, repeat steps 6c2 to 6c5; if J is larger than K the iteration ends, otherwise return to step 6c2.
Here z_q denotes a sample point of S_semi or S_act and α its sparse representation coefficient vector.
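The following is a minimal NumPy sketch of the OMP procedure in steps 6c1 to 6c6; dictionary-atom normalization and other practical details are omitted, and the stopping rule is simply K iterations as in step 6c6.

```python
import numpy as np

def omp(A, z, K):
    """Orthogonal matching pursuit, following steps 6c1-6c6.
    A: dictionary of shape (D, n) whose columns are the labeled sample points,
    z: sample point of dimension D, K: sparsity level (number of iterations).
    Returns the sparse representation coefficient vector alpha of length n."""
    D, n = A.shape
    residual = z.copy()                  # 6c1: r(0) = z
    support = []                         # 6c1: index set
    alpha = np.zeros(n)
    coef = np.zeros(0)
    for _ in range(K):                   # 6c6: stop once J exceeds K
        # 6c2: dictionary column with the largest inner product with the residual
        lam = int(np.argmax(np.abs(A.T @ residual)))
        if lam not in support:
            support.append(lam)          # 6c3: update the index set
        A_J = A[:, support]              # atoms selected so far
        # 6c4: least-squares approximation on the selected atoms
        coef, *_ = np.linalg.lstsq(A_J, z, rcond=None)
        residual = z - A_J @ coef        # 6c5: update the residual
    alpha[support] = coef
    return alpha
```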
Step 7, screen the sample points z_q1 of S_semi and z_q2 of S_act according to the class labels l_i ∈ {1, 2, …, C} of the labeled sample points at the positions of the non-zero entries of the sparse representation coefficient vectors α_1 and α_2.
Step 7a, when the j-th dictionary atom x_j of the dictionary A and the q-th learning-set sample point z_q belong to the same class, the corresponding entry of α is ideally 1, and 0 when they belong to different classes. If the predicted class label k of a sample point z_q1 of S_semi is the same as the class label l_i of the labeled sample points at the non-zero positions of its sparse coefficient vector α_1, the sample point z_q1 and those labeled sample points belong to the same class, and z_q1 is assigned the class label l_i.
Step 7b, if the predicted class label k of a sample point z_q2 of S_act differs from the class label l_i of the labeled sample points at the non-zero positions of its sparse coefficient vector α_2, the class label predicted by the classifier is inconsistent with the class label obtained from the sparse representation; such a sample point z_q2 is hard to classify, so it is screened out and handed to an expert for manual labeling.
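As an illustration of the screening in steps 7a and 7b, the sketch below reads a class label from the sparse coefficient vector and compares it with the predicted label. Taking the class with the largest total absolute coefficient mass is an illustrative choice (the patent reads the labels of the non-zero entries directly), and the omp function from the previous sketch is assumed.

```python
import numpy as np

def sparse_label(alpha, dict_labels):
    """Class label suggested by the sparse representation: here the class whose
    dictionary atoms carry the largest total absolute coefficient mass."""
    classes = np.unique(dict_labels)
    mass = np.array([np.abs(alpha[dict_labels == c]).sum() for c in classes])
    return classes[int(np.argmax(mass))]

def screen_sets(A, dict_labels, S_semi, S_act, pred_semi, pred_act, K=5):
    """Steps 7a-7b: keep S_semi samples whose predicted label agrees with the
    sparse-representation label; flag S_act samples whose labels disagree for
    manual annotation by an expert."""
    semi_keep, act_query = [], []
    for z, k in zip(S_semi, pred_semi):
        if sparse_label(omp(A, z, K), dict_labels) == k:
            semi_keep.append((z, k))       # pseudo-labeled, to be added to X
    for z, k in zip(S_act, pred_act):
        if sparse_label(omp(A, z, K), dict_labels) != k:
            act_query.append(z)            # sent to the expert for labeling
    return semi_keep, act_query
```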
Step 8, add the sample points z_q1 of S_semi that have been assigned class labels and the sample points z_q2 of S_act that have been manually labeled to the labeled training set X, and retrain the classifier parameters with the feature vectors and the corresponding class label matrix of the new labeled training set to obtain a new classifier model.
Step 9, output the classification result.
Using the gradient boosting decision tree classifier, first input the feature vectors and the class label set of the sample points of the new labeled training set X for training; then input the feature vectors of the test samples of the test set Y into the trained gradient boosting decision tree classifier to obtain the class label matrix L′ = {l′_1, l′_2, …, l′_m} of the test set, where l′_j denotes the class label of the j-th test sample.
Step 10, compute the classification accuracy.
Compare the predicted class label matrix of the test set with its ground-truth class label matrix to obtain the classification accuracy.
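A trivial sketch of the accuracy computation in step 10, assuming the predicted and ground-truth labels are given as arrays.

```python
import numpy as np

def overall_accuracy(y_pred, y_true):
    """Fraction of test sample points whose predicted class label
    matches the ground-truth class label."""
    y_pred = np.asarray(y_pred)
    y_true = np.asarray(y_true)
    return float(np.mean(y_pred == y_true))
```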
The technical effects of the present invention will be further described below with reference to simulation experiments.
1. Simulation conditions are as follows:
The simulation experiment was carried out with MATLAB 2014a on a WINDOWS 7 system with an Intel Core(TM) i3-3110M CPU, a main frequency of 2.40 GHz and 4 GB of memory.
2. Simulation content and analysis:
The simulation experiment uses the Indian Pines image acquired in June 1992 by AVIRIS of the NASA Jet Propulsion Laboratory over northwestern Indiana. The image is 145 × 145 pixels with 220 bands; after removing the bands affected by noise and by atmospheric and water absorption, 200 bands remain, covering 16 classes of ground-object information. Since some classes contain very few samples, only the 9 classes of data shown in Table 1 are considered in the simulation experiment, and the whole image is divided into 9 classes.
TABLE 1 The 9 classes of data in the Indian Pines image
Category | Category name | Number of samples
---|---|---
1 | Corn-no till | 1434
2 | Corn-min | 834
3 | Grass/Pasture | 497
4 | Grass/Trees | 747
5 | Hay-windrowed | 489
6 | Soybeans-no till | 968
7 | Soybeans-min | 2468
8 | Soybean-clean | 614
9 | Woods | 1294
The invention and the prior art are used to classify the hyperspectral image Indian Pines. The prior art used for comparison is the semi-supervised active learning method proposed in the paper "A New Semi-Supervised Approach for Hyperspectral Image Classification with Different Active Learning Strategies" (WHISPERS, 2012). That method, a hyperspectral image classification method combining active learning and semi-supervised learning, is run here with the gradient boosting decision tree GBDT as its classifier and is abbreviated SSAc + GBDT.
The number of decision trees of the GBDT classifier is set to be 100, and the downsampling proportion is set to be 50%; the window size c × c is set to 15 × 15, and the selection ratio per1 of the learning set is set to 30%.
A fixed number of sample points is selected from each of the 9 classes of data shown in Table 1 as the labeled training set, a certain proportion of sample points is selected as the learning set, and the remaining sample points are used as the test set; the learning set and the test set consist of unlabeled sample points. Ten classification experiments are carried out on the 9 classes of data with the present invention and with the prior art, and the average of the classification results is taken as the final classification accuracy. FIG. 2 shows the simulated classification accuracy when the number r of labeled training sample points per class is 5, 10 and 15 respectively; the abscissa is the number of labeled training sample points per class and the ordinate is the classification accuracy. It can be seen from FIG. 2 that, for every number of labeled sample points per class, the classification accuracy of the invention is clearly higher than that of the prior art.
In conclusion, the method classifies hyperspectral images by fusing a semi-supervised algorithm with the gradient boosting decision tree, makes full use of the structural information of the unlabeled sample points, reduces the amount of computation while obtaining higher classification accuracy, and therefore has certain advantages over existing methods.
Claims (3)
1. A hyperspectral image classification method based on the fusion of a gradient boosting decision tree with a semi-supervised algorithm, comprising the following steps:
(1) inputting a hyperspectral image containing C classes and N sample points, taking a neighborhood window around each sample point, taking the maximum of each feature dimension over all sample points in the window as the spatial feature of the sample point, and concatenating the spectral feature and the spatial feature of each sample point to obtain its spatial-spectral feature vector;
(2) selecting a labeled training set, a learning set and a test set from the input hyperspectral image, implemented as follows:
(2a) randomly selecting r sample points from each class of the input hyperspectral image to obtain a labeled training set X = {x_1, x_2, …, x_n} ⊂ R^D with corresponding class label set L = {l_1, l_2, …, l_n}, where n is the total number of labeled training sample points, n = C × r, x_i is the i-th labeled sample point of the labeled training set, l_i ∈ {1, 2, …, C} is the class to which the i-th labeled training sample point belongs, R is the real number field and D is the feature dimension of a sample point;
(2b) randomly selecting a proportion per1 of the sample points other than the n labeled sample points to obtain a learning set Z = {z_1, z_2, …, z_s}, where s is the total number of learning-set sample points, s = (N − n) × per1, and z_q is the q-th sample point in the learning set;
(2c) constructing a test set Y = {y_1, y_2, …, y_m} from the remaining sample points, where m is the total number of test samples, m = N − n − s, and y_j is the j-th test sample point of the test set;
(3) training the parameters of the gradient boosting decision tree GBDT classifier with the feature vectors of the sample points of the labeled training set X and the corresponding class label matrix, where every pair of classes of labeled sample points yields one binary classifier model, so that the C classes of labeled sample points finally yield C × (C − 1)/2 binary classifier models;
(4) inputting the sample points of the learning set Z into the obtained binary classifier models to obtain a predicted class label k for each sample point in the learning set Z;
(5) according to the predicted class label k of each sample point z_q of the learning set Z, judging in how many binary classifier models z_q is assigned to class k, i.e. whether the number of wins P of the class label k equals C − 1; if so, the sample point is added to the initially empty set S_semi, otherwise it is added to the initially empty set S_act; after judging all sample points of the learning set Z one by one, the set S_semi = {z_q1} and the set S_act = {z_q2} are obtained, where z_q1 is a sample point of S_semi, z_q2 is a sample point of S_act, s′ is the total number of sample points in S_semi, s″ is the total number of sample points in S_act, and s′ + s″ = s;
(6) screening the sample points of the obtained sets S_semi and S_act using sparse representation, implemented as follows:
(6a) constructing a dictionary A = [x_1, x_2, …, x_n] from all sample points of the labeled training set X, and using the constructed dictionary A to sparsely represent the sample points z_q1 of S_semi and z_q2 of S_act: z_q1 = Aα_1 and z_q2 = Aα_2, where α_1 and α_2 are sparse representation coefficient vectors;
(6b) obtaining the sparse representation coefficient vectors of the sample points z_q1 and z_q2 with the orthogonal matching pursuit algorithm OMP: α̂_1 = argmin_{α_1} ||z_q1 − Aα_1||_2^2 + λ||α_1||_1 and α̂_2 = argmin_{α_2} ||z_q2 − Aα_2||_2^2 + λ||α_2||_1, where ||·||_2 is the l_2 norm, measuring the data reconstruction error, ||·||_1 is the l_1 norm, enforcing sparsity of the vectors α_1 and α_2, and λ is a balance factor between the reconstruction error term and the sparsity term;
(6c) from the sparse representation coefficient vectors α_1 and α_2, reading the class labels l_i ∈ {1, 2, …, C} of the labeled sample points corresponding to the non-zero entries; the sample points z_q1 of S_semi whose predicted class label k equals l_i are screened out and assigned the class label l_i; at the same time, the sample points z_q2 of S_act whose predicted class label k differs from l_i are screened out and handed to an expert for manual labeling;
(7) adding the sample points z_q1 of S_semi that have been assigned the class label l_i and the sample points z_q2 of S_act that have been manually labeled to the labeled training set X, and retraining the classifier parameters to obtain a new classifier model;
(8) iterating steps (3) to (7) until the set number of iterations is reached, and classifying the sample points of the test set Y with the finally obtained classifier model to obtain the classification result of the test set.
2. The hyperspectral image classification method based on the fusion of a gradient boosting decision tree with a semi-supervised algorithm according to claim 1, wherein the number of wins P of the class label k in step (5) is obtained according to the following steps:
(5a) classifying the sample z_q with the classifier model obtained by training on the k-th class and the t-th class of labeled sample points to obtain the prediction result values score(k) and score(t), where k ∈ {1, 2, …, C}, t ∈ {1, 2, …, C} and k ≠ t;
(5b) computing from the obtained prediction result values score(k) and score(t), for each sample point z_q, the number of wins P of class k:
P = Σ_{t=1, t≠k}^{C} I(f), where the indicator function I(f) equals 1 if the condition f = (score(k) > score(t)) holds and 0 otherwise.
3. The hyperspectral image classification method based on the fusion of a gradient boosting decision tree with a semi-supervised algorithm according to claim 1, wherein obtaining the sparse representation coefficient vectors of the sample point z_q1 and the sample point z_q2 with the orthogonal matching pursuit algorithm OMP in step (6b) is implemented by the following steps:
(6b1) initialize the residual r^(0) = z_q, the index set Λ^(0) as a K-dimensional zero vector, where K is the sparsity level (the number of iterations), and the iteration counter J = 1;
(6b2) find the index λ of the dictionary column x_j whose inner product with the residual r^(J−1) is largest: λ = argmax_j |⟨r^(J−1), x_j⟩|;
(6b3) update the index set Λ^(J) by setting Λ^(J)(J) = λ; according to the index set, select the corresponding atom columns of the dictionary A to form the set A^(J) = A(:, Λ^(J)(1:J));
(6b4) obtain the J-th order approximation by least squares: α^(J) = argmin_α ||z_q − A^(J)α||_2;
(6b5) update the residual r^(J) = z_q − A^(J)α^(J) and set J = J + 1;
(6b6) repeat steps (6b2) to (6b5); if J is larger than K the iteration ends, otherwise return to step (6b2).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610561589.1A CN106203523B (en) | 2016-07-17 | 2016-07-17 | Hyperspectral image classification method based on gradient boosting decision tree and semi-supervised algorithm fusion
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610561589.1A CN106203523B (en) | 2016-07-17 | 2016-07-17 | Hyperspectral image classification method based on gradient boosting decision tree and semi-supervised algorithm fusion
Publications (2)
Publication Number | Publication Date |
---|---|
CN106203523A CN106203523A (en) | 2016-12-07 |
CN106203523B true CN106203523B (en) | 2019-03-01 |
Family
ID=57474833
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610561589.1A Active CN106203523B (en) | 2016-07-17 | 2016-07-17 | Hyperspectral image classification method based on gradient boosting decision tree and semi-supervised algorithm fusion
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106203523B (en) |
Families Citing this family (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108229271B (en) * | 2017-01-23 | 2020-10-13 | 北京市商汤科技开发有限公司 | Method and device for interpreting remote sensing image and electronic equipment |
CN107316309B (en) * | 2017-06-29 | 2020-04-03 | 西北工业大学 | Hyperspectral image saliency target detection method based on matrix decomposition |
CN107273938B (en) * | 2017-07-13 | 2020-05-29 | 西安电子科技大学 | Multi-source remote sensing image ground object classification method based on two-channel convolution ladder network |
CN107367753B (en) * | 2017-07-14 | 2019-04-23 | 西南科技大学 | A kind of multicore element recognition methods based on sparse features and fuzzy decision-tree |
CN108449342B (en) * | 2018-03-20 | 2020-11-27 | 北京云站科技有限公司 | Malicious request detection method and device |
CN108509882A (en) * | 2018-03-22 | 2018-09-07 | 北京航空航天大学 | Track mud-rock flow detection method and device |
CN108536938A (en) * | 2018-03-29 | 2018-09-14 | 上海交通大学 | A kind of machine tool life prediction system and prediction technique |
CN108873829B (en) * | 2018-05-28 | 2020-09-15 | 上海新增鼎数据科技有限公司 | Phosphoric acid production parameter control method based on gradient lifting decision tree |
CN108764212B (en) * | 2018-06-14 | 2021-04-20 | 内蒙古小草数字生态产业有限公司 | Remote sensing automatic identification method for surveying grass mowing field |
CN108985365B (en) * | 2018-07-05 | 2021-10-01 | 重庆大学 | Multi-source heterogeneous data fusion method based on deep subspace switching ensemble learning |
CN109242013B (en) * | 2018-08-28 | 2021-06-08 | 北京九狐时代智能科技有限公司 | Data labeling method and device, electronic equipment and storage medium |
TWI692776B (en) * | 2018-10-29 | 2020-05-01 | 財團法人工業技術研究院 | Neural-network-based classification device and classification method |
CN109614507B (en) * | 2018-11-22 | 2020-08-04 | 浙江大学 | Remote sensing image recommendation device based on frequent item mining |
CN110321770B (en) * | 2019-03-25 | 2022-05-31 | 西安长城数字软件有限公司 | Pipeline monitoring method, device, equipment and storage medium |
CN109978056A (en) * | 2019-03-26 | 2019-07-05 | 广东工业大学 | A kind of Metro Passenger classification method based on machine learning |
CN110084318B (en) * | 2019-05-07 | 2020-10-02 | 哈尔滨理工大学 | Image identification method combining convolutional neural network and gradient lifting tree |
TWI707137B (en) * | 2020-01-13 | 2020-10-11 | 憶象有限公司 | Intelligent production line monitoring system and implementation method thereof |
CN111414942B (en) * | 2020-03-06 | 2022-05-03 | 重庆邮电大学 | Remote sensing image classification method based on active learning and convolutional neural network |
CN113837209A (en) * | 2020-06-23 | 2021-12-24 | 乐达创意科技股份有限公司 | Method and system for improved machine learning using data for training |
CN112465733B (en) * | 2020-08-31 | 2022-06-28 | 长沙理工大学 | Remote sensing image fusion method, device, medium and equipment based on semi-supervised learning |
CN112070008B (en) * | 2020-09-09 | 2024-03-19 | 武汉轻工大学 | Hyperspectral image feature recognition method, hyperspectral image feature recognition device, hyperspectral image feature recognition equipment and storage medium |
CN112699926B (en) * | 2020-12-25 | 2023-01-20 | 浙江中控技术股份有限公司 | Method for recognizing saturated grinding abnormity of cement raw material vertical mill based on artificial intelligence technology |
CN112884050B (en) * | 2021-02-26 | 2024-04-12 | 江南大学 | Quality detection method based on unsupervised active learning |
CN113111969B (en) * | 2021-05-03 | 2022-05-06 | 齐齐哈尔大学 | Hyperspectral image classification method based on mixed measurement |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102096825A (en) * | 2011-03-23 | 2011-06-15 | 西安电子科技大学 | Graph-based semi-supervised high-spectral remote sensing image classification method |
CN103886342A (en) * | 2014-03-27 | 2014-06-25 | 西安电子科技大学 | Hyperspectral image classification method based on spectrums and neighbourhood information dictionary learning |
CN104239902A (en) * | 2014-09-12 | 2014-12-24 | 西安电子科技大学 | Hyper-spectral image classification method based on non-local similarity and sparse coding |
CN104281855A (en) * | 2014-09-30 | 2015-01-14 | 西安电子科技大学 | Hyperspectral image classification method based on multi-task low rank |
CN104392251A (en) * | 2014-11-28 | 2015-03-04 | 西安电子科技大学 | Hyperspectral image classification method based on semi-supervised dictionary learning |
CN104408478A (en) * | 2014-11-14 | 2015-03-11 | 西安电子科技大学 | Hyperspectral image classification method based on hierarchical sparse discriminant feature learning |
CN105608433A (en) * | 2015-12-23 | 2016-05-25 | 北京化工大学 | Nuclear coordinated expression-based hyperspectral image classification method |
2016-07-17 | CN | CN201610561589.1A | patent CN106203523B (en) | active Active
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102096825A (en) * | 2011-03-23 | 2011-06-15 | 西安电子科技大学 | Graph-based semi-supervised high-spectral remote sensing image classification method |
CN103886342A (en) * | 2014-03-27 | 2014-06-25 | 西安电子科技大学 | Hyperspectral image classification method based on spectrums and neighbourhood information dictionary learning |
CN104239902A (en) * | 2014-09-12 | 2014-12-24 | 西安电子科技大学 | Hyper-spectral image classification method based on non-local similarity and sparse coding |
CN104281855A (en) * | 2014-09-30 | 2015-01-14 | 西安电子科技大学 | Hyperspectral image classification method based on multi-task low rank |
CN104408478A (en) * | 2014-11-14 | 2015-03-11 | 西安电子科技大学 | Hyperspectral image classification method based on hierarchical sparse discriminant feature learning |
CN104392251A (en) * | 2014-11-28 | 2015-03-04 | 西安电子科技大学 | Hyperspectral image classification method based on semi-supervised dictionary learning |
CN105608433A (en) * | 2015-12-23 | 2016-05-25 | 北京化工大学 | Nuclear coordinated expression-based hyperspectral image classification method |
Non-Patent Citations (2)
Title |
---|
"Class-Level Joint Sparse Representation for Multifeature-Based Hyperspectral Image Classification"; Erlei Zhang et al.; IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing; 2016-02-25; Vol. 9, No. 9; pp. 4160-4175
"Hyperspectral remote sensing image classification based on sparse representation and spectral information" (基于稀疏表示及光谱信息的高光谱遥感图像分类); Song Xiangfa et al.; Journal of Electronics & Information Technology (电子与信息学报); 2012-02-15; Vol. 34, No. 2; pp. 268-271
Also Published As
Publication number | Publication date |
---|---|
CN106203523A (en) | 2016-12-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106203523B (en) | Hyperspectral image classification method based on gradient boosting decision tree and semi-supervised algorithm fusion | |
CN111860612B (en) | Unsupervised hyperspectral image hidden low-rank projection learning feature extraction method | |
CN108537102B (en) | High-resolution SAR image classification method based on sparse features and conditional random field | |
CN109766858A (en) | Three-dimensional convolution neural network hyperspectral image classification method combined with bilateral filtering | |
CN106503727B (en) | A kind of method and device of classification hyperspectral imagery | |
CN107145836B (en) | Hyperspectral image classification method based on stacked boundary identification self-encoder | |
Qin et al. | Cross-domain collaborative learning via cluster canonical correlation analysis and random walker for hyperspectral image classification | |
CN111401426B (en) | Small sample hyperspectral image classification method based on pseudo label learning | |
CN108229551B (en) | Hyperspectral remote sensing image classification method based on compact dictionary sparse representation | |
CN109543723B (en) | Robust image clustering method | |
CN103489005A (en) | High-resolution remote sensing image classifying method based on fusion of multiple classifiers | |
CN111222545B (en) | Image classification method based on linear programming incremental learning | |
CN110414616B (en) | Remote sensing image dictionary learning and classifying method utilizing spatial relationship | |
CN113936214B (en) | Karst wetland vegetation community classification method based on fusion of aerospace remote sensing images | |
CN112613536A (en) | Near infrared spectrum diesel grade identification method based on SMOTE and deep learning | |
CN104408731B (en) | Region graph and statistic similarity coding-based SAR (synthetic aperture radar) image segmentation method | |
CN113723492A (en) | Hyperspectral image semi-supervised classification method and device for improving active deep learning | |
CN109558803B (en) | SAR target identification method based on convolutional neural network and NP criterion | |
CN107273919A (en) | A kind of EO-1 hyperion unsupervised segmentation method that generic dictionary is constructed based on confidence level | |
CN110598753A (en) | Defect identification method based on active learning | |
Treboux et al. | Decision tree ensemble vs. nn deep learning: efficiency comparison for a small image dataset | |
CN114266961A (en) | Method for integrating, learning and classifying marsh vegetation stacks by integrating hyperspectral and multiband fully-polarized SAR images | |
Guo et al. | Dual graph U-Nets for hyperspectral image classification | |
Moliner et al. | Weakly supervised semantic segmentation for remote sensing hyperspectral imaging | |
CN115496950A (en) | Neighborhood information embedded semi-supervised discrimination dictionary pair learning image classification method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |