CN110826597A

CN110826597A - Remote sensing image classification method based on integrated depth Fisher vector

Info

Publication number: CN110826597A
Application number: CN201910960279.0A
Authority: CN
Inventors: 苏卫华; 李博扬; 张世月; 赵欣然; 李秉宣; 黄如强; 蹇锐; 刘洋; 谢鹏发; 李世国; 卫家诚
Original assignee: Tianjin (binhai) Intelligence Military-Civil Integration Innovation Center; National Defense Technology Innovation Institute PLA Academy of Military Science
Current assignee: Tianjin (binhai) Intelligence Military-Civil Integration Innovation Center; National Defense Technology Innovation Institute PLA Academy of Military Science
Priority date: 2019-10-10
Filing date: 2019-10-10
Publication date: 2020-02-21

Abstract

The invention discloses a remote sensing image classification method based on integrated depth Fisher vectors, which addresses the long classification time and low classification accuracy of current remote sensing image classification algorithms. The method comprises the following steps: (1) feature extraction; (2) Fisher feature encoding; (3) feature concatenation and classification. The beneficial effects of the invention are as follows: evaluated quantitatively against existing algorithms, the proposed algorithm achieves excellent classification results, with accuracies of 98.81% and 95.21% on the UCM and RSSCN7 data sets, respectively. In terms of classification accuracy, the proposed algorithm can therefore classify remote sensing images accurately.

Description

Remote sensing image classification method based on integrated depth Fisher vector

Technical Field

The invention belongs to an image classification method, and particularly relates to a remote sensing image classification method based on an integrated depth Fisher vector.

Background

With the continuous development of high-resolution remote sensing image acquisition technology, large numbers of remote sensing satellite images with high spatial resolution have become easier to obtain. Classification and recognition based on high-precision remote sensing images are widely applied in dual-use military and civilian fields such as land resource management, urban planning and reconnaissance of hazardous environments, and have great practical value.

Currently, general solutions to this problem fall into three main categories:

The first category is classification based on hand-crafted features: manually designed features such as HOG and LBP are used to generate semantic descriptions according to their construction principles, and these descriptions are then fed into a pre-trained classifier to obtain the classification result.

The second category is end-to-end classification, which abandons the manual selection of feature descriptors used in the first category, leaves feature learning to the network itself, and iteratively optimizes the network through back-propagation.

The third category is classification based on deep learning features: a deep learning model trained on millions of images has a generalized ability to recognize a wide range of objects, so a pre-trained deep learning model is used as a feature extractor to extract higher-level semantic features, which are then fed into a pre-trained classifier for classification.

However, investigation shows that the prior art has at least the following disadvantages and shortcomings:

The first category has the following disadvantage: high-precision remote sensing terrain classification is made difficult by large intra-class differences, small inter-class differences, and variable scale, geometry and scene. Some terrain images are strongly visually deceptive, and hand-crafted features have a low semantic level, which makes high-accuracy classification hard to achieve.

The second category has the following disadvantage: existing remote sensing terrain data sets are small, and the mainstream benchmark data sets contain only a few thousand images. For such small-sample classification problems, an end-to-end scheme easily overfits, so the network generalizes poorly and the classification accuracy is low.

The third category has the following disadvantage: although schemes based on pre-trained deep learning models improve considerably on the former two, problems such as long running time and insufficient post-processing remain, and the classification accuracy still needs to be improved.

Therefore, to further improve terrain classification of high-precision remote sensing images, it is necessary, on the basis of a deep learning feature extraction scheme, to further mine high-level semantic information, integrate multi-scale image information, combine global and local semantic information, and integrate diversified pre-processing and post-processing methods, thereby further improving the network's ability to recognize the semantics of high-precision remote sensing image features.

Disclosure of Invention

The invention aims to provide a remote sensing image classification method based on integrated depth Fisher vectors, which can solve the problems of long classification time and low classification precision of the current remote sensing image classification algorithm.

The technical scheme of the invention is as follows: a remote sensing image classification method based on integrated depth Fisher vectors comprises the following steps:

(1) feature extraction;

(2) Fisher feature encoding;

(3) feature concatenation and classification.

Step (1) comprises inputting the preprocessed image features into a deep learning model pre-trained on ImageNet, and obtaining highly discriminative global semantic features and local semantic features from the multi-scale image.

The multi-scale image in step (1) comprises a first class of scale, namely 224 × 224, which is the default scale of the pre-trained deep learning model ResNet-50. At this scale, the first non-convolutional layer of the network is extracted as the global description H.

The multi-scale image in step (1) further comprises a second class of scales, namely 128 × 128, 256 × 256 and 512 × 512 in sequence. At these scales, an intermediate layer of the network is selected as the layer that, combined with the subsequent Fisher coding, yields the best localized description. Based on a layer-by-layer coding experiment, the output of the 37th layer of ResNet-50 is adopted as the deep convolutional feature L to be encoded.

In step (1), after feature extraction at the second class of scales, L2 normalization is applied as feature preprocessing to the feature L to obtain the decorrelated multi-scale localized description,

so that the correlation among the local deep convolutional features L is reduced and the features L have the same variance.

Step (2) performs Fisher feature encoding, which further optimizes the local semantic features obtained by the feature extraction in the previous step to generate deep Fisher features and improves the descriptive power for high-precision remote sensing terrain images.

The Fisher coding layer re-encodes the input multi-scale localized description, completes the local description optimization of the remote sensing image, and outputs the depth Fisher vector DFF of the terrain remote sensing image. The Fisher coding layer uses a Gaussian mixture model to construct a word codebook B1, and uses this codebook to encode the input single-scale local description, extracting the Gaussian mean and variance information of the feature space:

wherein T is the number of local feature points on the terrain remote sensing image; f_t is the t-th local feature; the first-order term is the mean difference between the local features and the Gaussian mixture model; the second-order term is the variance difference between the local features and the Gaussian mixture model; {w_n, μ_n, σ_n} denote the mixture weight, mean and diagonal covariance, respectively, of each Gaussian distribution in the word codebook B1; and α_t(n) is the soft-assignment weight, which represents the weight of the t-th local feature with respect to the n-th Gaussian component,

wherein N(f_t; μ_n, σ_n) is the value of f_t under the n-th Gaussian distribution. The encoding result is:

DFF:

In step (2), the same encoding process is applied to the remaining scales of the multi-scale localized description, and finally the DFFs of all scales are concatenated to obtain the multi-scale Fisher vector I.

Step (3) fuses the global semantic features and the depth Fisher vectors by concatenation to obtain a new feature vector, which is input into a linear classifier to complete the classification of the high-precision remote sensing terrain.

Specifically, in step (3) the encoding results are concatenated in cascade, and the final representation of the image, the integrated deep Fisher feature (ADFF), is then obtained through L2 normalization:

ADFF＝[I,H]

Preferably, a linear support vector machine is used to build the classification layer, implemented with LIBSVM. The penalty parameter of LIBSVM is obtained by ten-fold cross-validation, and the classification layer outputs semantic labels to complete the terrain classification.

The beneficial effects of the invention are as follows: the accuracy of the proposed algorithm was evaluated quantitatively. As shown in Table 1 and Table 2, the algorithm achieves excellent classification results compared with the existing algorithms, with accuracies of 98.81% and 95.21% on UCM and RSSCN7, respectively. In conclusion, in terms of classification accuracy, the proposed algorithm can classify remote sensing images accurately.

Drawings

FIG. 1 is a flow chart of an embodiment of a remote sensing image classification method based on an integrated depth Fisher vector provided by the invention;

FIG. 2 is an evaluation data set selected from the UCM data set employed in the present invention;

FIG. 3 is an evaluation data set selected from the RSSCN7 data set employed by the present invention.

Detailed Description

The invention is described in further detail below with reference to the figures and the embodiments.

The remote sensing image classification method based on the integrated depth Fisher vector is used for quickly and accurately generating a high-precision classification result and mainly comprises the following three steps:

(1) feature extraction

The preprocessed image features are input into a deep learning model pre-trained on ImageNet, and highly discriminative global semantic features and local semantic features are obtained from the multi-scale image.

In the embodiment of the invention, deep convolutional features are extracted at two classes of scales. The first class is the 224 × 224 scale, which is the default scale of the pre-trained deep learning model ResNet-50. At this scale, the invention extracts the first non-convolutional layer of the network as the global description H.

H is a vector of dimension 1 × M, where d_k denotes the k-th value of H and has dimension 1 × 1.

The second class of scales is 128 × 128, 256 × 256 and 512 × 512 in sequence. At these scales, the invention selects an intermediate layer of the network as the layer that, combined with the subsequent Fisher coding, yields the best localized description. Based on a layer-by-layer coding experiment, the invention adopts the output of the 37th layer of ResNet-50 as the deep convolutional feature L to be encoded.

L has dimension E × K × N, where D_n denotes the feature values of the n-th scale and has dimension E × K.

The multi-scale deep convolutional features extracted by the pre-trained deep learning model are generally highly coupled and strongly correlated, which poses a challenge for the subsequent codebook clustering. The invention therefore applies L2 normalization as feature preprocessing to the feature L to obtain the decorrelated multi-scale localized description.

The L2-normalized feature has the same structure as L in the previous step: it has dimension E × K × N, where C_n denotes the normalized feature values of the n-th scale and has dimension E × K.

In this way, the correlation among the local deep convolutional features L is reduced and the features L have the same variance.
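The normalization step above can be illustrated with a minimal sketch. It assumes that each local descriptor is scaled to unit Euclidean length independently; the source does not state along which axis of L the normalization is applied, and the function names and toy values below are hypothetical.

```python
import math

def l2_normalize(vec, eps=1e-12):
    """Scale one descriptor to unit Euclidean length (eps avoids division by zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]

def l2_normalize_all(descriptors):
    """Normalize every local descriptor independently (assumed convention)."""
    return [l2_normalize(d) for d in descriptors]

# Toy local descriptors standing in for the deep convolutional feature L.
local_features = [[3.0, 4.0], [0.0, 2.0], [1.0, 1.0]]
normalized = l2_normalize_all(local_features)
```

After this step every descriptor has length one, so descriptors from different scales contribute on a comparable numeric range to the subsequent codebook clustering.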

(2) Fisher feature encoding

Fisher feature encoding further optimizes the local semantic features obtained by the feature extraction in the previous step to generate deep Fisher features, improving the descriptive power for high-precision remote sensing terrain images.

The Fisher coding layer re-encodes the input multi-scale localized description, completing the local description optimization of the remote sensing image and outputting the depth Fisher vector DFF of the terrain remote sensing image.

The Fisher coding layer constructs a word codebook B1 using a Gaussian mixture model (GMM).

The word codebook B1 has dimension T × K. The k-th codeword is d_k, a vector of dimension T × 1.

The GMM codebook describes the assignment and degree of concentration of the new local features after deep convolutional feature extraction. According to experiments, the best results are obtained with a codebook size of 16. The GMM codebook is used to encode the input single-scale local description and to extract the Gaussian mean (first-order) and variance (second-order) information of the feature space:

where T is the number of local feature points on the terrain remote sensing image; f_t is the t-th local feature; the first-order term is the mean difference between the local features and the Gaussian mixture model; and the second-order term is the variance difference between the local features and the Gaussian mixture model. {w_n, μ_n, σ_n} denote the mixture weight, mean and diagonal covariance, respectively, of each Gaussian distribution in the word codebook B1. α_t(n) is the soft-assignment weight, which represents the weight of the t-th local feature with respect to the n-th Gaussian component.

N(f_t; μ_n, σ_n) is the value of f_t under the n-th Gaussian distribution. The encoding result is:

DFF:

Subsequently, the same encoding process is applied to the remaining scales of the multi-scale localized description, and finally the DFFs of all scales are concatenated to obtain the multi-scale Fisher vector I.
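The equation images of the original publication are not reproduced in this text. For orientation only, the widely used Fisher vector formulation that is consistent with the symbols defined above is given below. The names for the mean and variance differences, the symbol C for the number of Gaussians in the codebook (16 in the embodiment), the element-wise reading of the divisions with σ_n taken as per-dimension standard deviations, and the ordering of the concatenation are all assumptions rather than the notation of the filing.

```latex
\begin{aligned}
\alpha_t(n) &= \frac{w_n\,\mathcal{N}(f_t;\mu_n,\sigma_n)}
                    {\sum_{j=1}^{C} w_j\,\mathcal{N}(f_t;\mu_j,\sigma_j)},\\[4pt]
\mathcal{G}_{\mu,n} &= \frac{1}{T\sqrt{w_n}}\sum_{t=1}^{T}
                       \alpha_t(n)\,\frac{f_t-\mu_n}{\sigma_n},\\[4pt]
\mathcal{G}_{\sigma,n} &= \frac{1}{T\sqrt{2w_n}}\sum_{t=1}^{T}
                       \alpha_t(n)\left[\frac{(f_t-\mu_n)^2}{\sigma_n^2}-1\right],\\[4pt]
\mathrm{DFF} &= \left[\mathcal{G}_{\mu,1},\,\mathcal{G}_{\sigma,1},\,\ldots,\,
                \mathcal{G}_{\mu,C},\,\mathcal{G}_{\sigma,C}\right].
\end{aligned}
```

Under this reading, a local description with feature dimension D and a codebook of C Gaussians yields a DFF of length 2·C·D per scale.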

(3) Feature concatenation and classification

The global semantic features and the depth Fisher vectors are fused by concatenation to obtain a new feature vector, which is input into a linear classifier to complete the classification of the high-precision remote sensing terrain. This ends the algorithm.

The encoding results are concatenated in cascade, and the final representation of the image, the integrated deep Fisher feature (ADFF), is then obtained through L2 normalization:

ADFF＝[I,H]
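The fusion step can be sketched as follows. The sketch assumes that the L2 normalization is applied once to the whole concatenated vector; the function name `integrate` and the toy values are hypothetical.

```python
import math

def integrate(multi_scale_fisher, global_feature, eps=1e-12):
    """Concatenate I and H in that order, then L2-normalize the result."""
    adff = list(multi_scale_fisher) + list(global_feature)
    norm = math.sqrt(sum(v * v for v in adff))
    return [v / (norm + eps) for v in adff]

I = [0.5, -0.5, 1.0]   # toy multi-scale Fisher vector
H = [2.0, 0.0]         # toy global description
adff = integrate(I, H)
```

The resulting vector has length len(I) + len(H) and unit Euclidean norm, which keeps the inputs to a linear classifier on a bounded range.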

The embodiment of the invention preferably uses a linear support vector machine (SVM) to construct the classification layer, implemented with LIBSVM. The penalty parameter of LIBSVM is obtained by ten-fold cross-validation, and the classification layer outputs semantic labels to complete the terrain classification. The specific operation of this part is well known to those skilled in the art, so its detailed description is omitted here. In conclusion, the invention adopts an integrated scheme that combines a deep learning method with an unsupervised feature coding method. It combines the high-level semantic advantage of a deep learning model with the strong invariance of unsupervised feature coding (robustness to changes in size and scene), fully captures the highly discriminative semantic information of the remote sensing image, and improves the terrain classification performance for remote sensing images.

Example:

As shown in FIG. 1, an image of arbitrary size is first resized to generate two classes of scales: scale 1, which is 224 × 224, and scale 2, which comprises 128 × 128, 256 × 256 and 512 × 512.
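A minimal sketch of the resizing step, under the assumption of nearest-neighbour interpolation; the source does not state which interpolation is used, and the names `resize_nearest` and `build_pyramid` are hypothetical. Images are represented as plain nested lists so the example has no dependencies.

```python
GLOBAL_SCALE = (224, 224)                           # scale 1
LOCAL_SCALES = [(128, 128), (256, 256), (512, 512)] # scale 2

def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of a 2-D image stored as a list of rows."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[(r * in_h) // out_h][(c * in_w) // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

def build_pyramid(image):
    """Return the scale-1 view and the list of scale-2 views of one image."""
    global_view = resize_nearest(image, *GLOBAL_SCALE)
    local_views = [resize_nearest(image, h, w) for h, w in LOCAL_SCALES]
    return global_view, local_views

# Toy 3 x 4 single-channel image with distinct pixel values 0..11.
toy = [[r * 4 + c for c in range(4)] for r in range(3)]
global_view, local_views = build_pyramid(toy)
```

In a real pipeline each channel of the RGB image would be resized in the same way, typically with a library routine rather than hand-written loops.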

(1) Feature extraction:

For scale 1, the first non-convolutional layer of the above network is extracted as the global description H.

For scale 2, the output of the 37th layer of ResNet-50 is adopted as the deep convolutional feature L to be encoded.

L2 normalization is applied as feature preprocessing to the feature L to obtain the decorrelated multi-scale localized description. This completes step (1).

(2) Fisher feature encoding

The Fisher coding layer performs unsupervised feature encoding on the input multi-scale localized description at each scale, generating the corresponding depth Fisher vector DFF for each scale. The DFFs of all scales are then concatenated to obtain the multi-scale Fisher vector I. This completes step (2).
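The per-scale encoding and the final concatenation can be sketched as follows. This is an illustration of standard Fisher vector encoding with a diagonal-covariance GMM, not a reproduction of the filing's equations, which are missing from this text. The assumptions are: `sigmas` holds per-dimension standard deviations, the output is ordered as mean part then variance part for each Gaussian, the GMM parameters are already fitted, and one toy codebook is reused for every scale (the source does not say whether each scale has its own codebook). All names and values are hypothetical.

```python
import math

def gaussian_diag(f, mu, sigma):
    """Density of a diagonal-covariance Gaussian (sigma = standard deviations)."""
    log_p = 0.0
    for x, m, s in zip(f, mu, sigma):
        log_p += -0.5 * math.log(2.0 * math.pi * s * s) - 0.5 * ((x - m) / s) ** 2
    return math.exp(log_p)

def soft_assign(f, weights, means, sigmas):
    """Soft-assignment weights alpha_t(n) of one local feature to each Gaussian."""
    scores = [w * gaussian_diag(f, m, s) for w, m, s in zip(weights, means, sigmas)]
    total = sum(scores)
    return [sc / total for sc in scores]

def fisher_vector(features, weights, means, sigmas):
    """Encode the local features of one scale into a vector of length 2 * N * D."""
    T, N, D = len(features), len(weights), len(means[0])
    g_mu = [[0.0] * D for _ in range(N)]
    g_sigma = [[0.0] * D for _ in range(N)]
    for f in features:
        alpha = soft_assign(f, weights, means, sigmas)
        for n in range(N):
            for d in range(D):
                z = (f[d] - means[n][d]) / sigmas[n][d]
                g_mu[n][d] += alpha[n] * z                 # first-order (mean) term
                g_sigma[n][d] += alpha[n] * (z * z - 1.0)  # second-order (variance) term
    dff = []
    for n in range(N):
        dff += [g / (T * math.sqrt(weights[n])) for g in g_mu[n]]
        dff += [g / (T * math.sqrt(2.0 * weights[n])) for g in g_sigma[n]]
    return dff

# Toy codebook with 2 Gaussians in 2 dimensions (the embodiment uses 16 Gaussians).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [4.0, 4.0]]
sigmas = [[1.0, 1.0], [1.0, 1.0]]
scales = [
    [[0.1, -0.2], [3.9, 4.2]],  # toy local features at one scale
    [[0.0, 0.0]],               # toy local features at another scale
]
per_scale_dff = [fisher_vector(feats, weights, means, sigmas) for feats in scales]
multi_scale_I = [v for dff in per_scale_dff for v in dff]  # concatenation -> I
```

For high-dimensional features the densities can underflow, so a practical implementation would compute the soft assignments in the log domain.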

(3) Feature concatenation and classification

ADFF＝[I,H]

Finally, the embodiment of the invention preferably uses a linear support vector machine (SVM) to construct the classification layer, implemented with LIBSVM. The penalty parameter is obtained by ten-fold cross-validation, and the classification layer outputs semantic labels to complete the terrain classification. This completes all steps of the algorithm.
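The ten-fold selection of the penalty parameter can be sketched as below. The candidate grid, the function names, and `toy_score` are hypothetical; `toy_score` is only a placeholder for training a linear SVM on the training folds and measuring accuracy on the held-out fold, which the source delegates to LIBSVM. The sample count 2100 corresponds to 21 categories of 100 images.

```python
import math
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def select_penalty(n_samples, candidates, score_fn, k=10):
    """Return the candidate penalty with the best mean validation score."""
    folds = k_fold_indices(n_samples, k)
    best_c, best_mean = None, float("-inf")
    for c in candidates:
        scores = []
        for i, val_idx in enumerate(folds):
            train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
            scores.append(score_fn(c, train_idx, val_idx))
        mean_score = sum(scores) / k
        if mean_score > best_mean:
            best_c, best_mean = c, mean_score
    return best_c

def toy_score(c, train_idx, val_idx):
    # Placeholder only: a real score_fn would train a linear SVM on train_idx
    # with penalty c and return its accuracy on val_idx.
    return -abs(math.log10(c))

folds = k_fold_indices(2100)
best_c = select_penalty(2100, [0.01, 0.1, 1.0, 10.0, 100.0], toy_score)
```

With the LIBSVM command-line tools, the same search is commonly run by calling `svm-train` with a linear kernel (`-t 0`), a candidate cost (`-c`), and cross-validation mode (`-v 10`) for each candidate value.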

The invention uses the following data sets as test evaluations:

UCM data set

The UCM (UC Merced Land-Use) data set is the most popular test data set in the field of high-precision remote sensing image classification. It was created in 2010 by the computer vision laboratory of the University of California, Merced. The data set covers 21 common land-use categories, each consisting of 100 images of the same size, 256 × 256 pixels with three RGB channels, for 2,100 images in total. The images in FIG. 2 are selected from this data set.

RSSCN7 dataset

RSSCN7 is a data set published in 2015 that contains 7 common categories of remote sensing terrain images, with 400 images per category, for 2,800 images in total. The data set is challenging because of its diversity of image scales: the 400 images of each category are sampled at 4 different scales. The images in FIG. 3 are selected from this data set.

As shown in Table 1 and Table 2 below, the remote sensing image classification method based on the integrated depth Fisher vector achieves excellent image classification results.

Table 1: UCM remote sensing image terrain classification performance comparison

Table 2: RSSCN7 remote sensing image terrain classification performance comparison

The accuracy of the proposed algorithm was evaluated quantitatively. As shown in Table 1 and Table 2, the algorithm achieves excellent classification results compared with the existing algorithms, with accuracies of 98.81% and 95.21% on UCM and RSSCN7, respectively. In conclusion, in terms of classification accuracy, the proposed algorithm can classify remote sensing images accurately.

In addition to the above embodiments, the present invention has other embodiments, and any technical solutions formed by equivalent replacement or equivalent transformation fall within the protection scope of the present invention.

Claims

1. A remote sensing image classification method based on integrated depth Fisher vectors, characterized in that it comprises the following steps:

(1) feature extraction;

(2) Fisher feature encoding;

(3) feature concatenation and classification.

2. The remote sensing image classification method based on the integrated depth Fisher vector as claimed in claim 1, wherein: step (1) comprises inputting the preprocessed image features into a deep learning model pre-trained on ImageNet, and obtaining highly discriminative global semantic features and local semantic features from the multi-scale image.

3. The remote sensing image classification method based on the integrated depth Fisher vector as claimed in claim 2, wherein: the multi-scale image in step (1) comprises a first class of scale, namely 224 × 224, which is the default scale of the pre-trained deep learning model ResNet-50, and at this scale the first non-convolutional layer of the network is extracted as the global description H.

4. The remote sensing image classification method based on the integrated depth Fisher vector as claimed in claim 2, wherein: the multi-scale image in step (1) comprises a second class of scales, namely 128 × 128, 256 × 256 and 512 × 512 in sequence; at these scales, an intermediate layer of the network is selected as the layer that, combined with the subsequent Fisher coding, yields the best localized description; and, based on a layer-by-layer coding experiment, the output of the 37th layer of ResNet-50 is adopted as the deep convolutional feature L to be encoded.

5. The remote sensing image classification method based on the integrated depth Fisher vector as claimed in claim 4, wherein: in step (1), after feature extraction at the second class of scales, L2 normalization is applied as feature preprocessing to the feature L to obtain the decorrelated multi-scale localized description.

6. The remote sensing image classification method based on the integrated depth Fisher vector as claimed in claim 1, wherein: step (2) performs Fisher feature encoding, which further optimizes the local semantic features obtained by the feature extraction in the previous step to generate deep Fisher features and improves the descriptive power for high-precision remote sensing terrain images.

7. The remote sensing image classification method based on the integrated depth Fisher vector as claimed in claim 6, wherein: the Fisher coding layer re-encodes the input multi-scale localized description, completes the local description optimization of the remote sensing image, and outputs the depth Fisher vector DFF of the terrain remote sensing image; the Fisher coding layer uses a Gaussian mixture model to construct a word codebook B1,

wherein the first-order term is the mean difference between the local features and the Gaussian mixture model; the second-order term is the variance difference between the local features and the Gaussian mixture model; {w_n, μ_n, σ_n} denote the mixture weight, mean and diagonal covariance, respectively, of each Gaussian distribution in the word codebook B1; and α_t(n) is the soft-assignment weight, which represents the weight of the t-th local feature with respect to the n-th Gaussian component,

DFF:

8. The remote sensing image classification method based on the integrated depth Fisher vector as claimed in claim 7, wherein: in step (2), the same encoding process is applied to the remaining scales of the multi-scale localized description, and finally the DFFs of all scales are concatenated to obtain the multi-scale Fisher vector I.

9. The remote sensing image classification method based on the integrated depth Fisher vector as claimed in claim 8, wherein: step (3) fuses the global semantic features and the depth Fisher vectors by concatenation to obtain a new feature vector, which is input into a linear classifier to complete the classification of the high-precision remote sensing terrain.

10. The remote sensing image classification method based on the integrated depth Fisher vector as claimed in claim 9, wherein: specifically, in step (3) the encoding results are concatenated in cascade, and the final representation of the image, the integrated deep Fisher feature ADFF, is then obtained through L2 normalization,

ADFF＝[I,H]