CN111461067B

CN111461067B - Zero sample remote sensing image scene identification method based on priori knowledge mapping and correction

Info

Publication number: CN111461067B
Application number: CN202010338879.6A
Authority: CN
Inventors: 李彦胜; 孔德宇; 张永军; 季铮; 肖锐
Original assignee: Wuhan University WHU
Current assignee: Wuhan University WHU
Priority date: 2020-04-26
Filing date: 2020-04-26
Publication date: 2022-06-14
Anticipated expiration: 2040-04-26
Also published as: CN111461067A

Abstract

The invention provides a zero sample remote sensing image scene recognition method based on prior knowledge mapping and correction. Based on a visible remote sensing image scene sample with a class label and a priori knowledge representation vector set of a visible class, a depth feature extractor and a mapping model from robust visual features to priori knowledge representation features are obtained through remote sensing scene class learning and cross-mode learning between the visual feature vectors and the priori knowledge representation vectors. Based on the category prior knowledge representation vector of the whole category and the invisible remote sensing image scene sample, the prior knowledge representation vector of the invisible category is progressively corrected through unsupervised collaborative representation learning and unsupervised k-nearest neighbor algorithm respectively, so that the classification precision of the zero-sample remote sensing image scene is effectively improved.

Description

Zero sample remote sensing image scene identification method based on priori knowledge mapping and correction

Technical Field

The invention belongs to the technical field of remote sensing and photogrammetry, relates to a zero sample remote sensing image scene classification method, and particularly relates to a zero sample remote sensing image scene identification method based on priori knowledge mapping and correction.

Background

After the 21 st century, the development of remote sensing technology is more rapid, and the remote sensing technology plays an important role in land resource investigation, ecological environment monitoring, disaster analysis and prediction and the like. With the improvement of the resolution of the remote sensing image, the classification method based on the pixels and the objects is widely influenced by the phenomena of 'same-object different spectrum and same-spectrum foreign matter' of the high-resolution remote sensing image, and the requirement for efficient and stable remote sensing image interpretation cannot be met. Based on this consideration, remote sensing image scene classification is widely concerned by researchers at home and abroad. The remote sensing image scene classification aims at predicting the semantic category of an image block by excavating visual primitives and the spatial relationship among the visual primitives in the remote sensing image scene (image block), and can greatly reduce the confusion degree of pixel-level or object-level ground object interpretation, thereby improving the stability and accuracy of high-resolution remote sensing image interpretation, and having important application in the aspects of content-based remote sensing image retrieval, remote sensing image target detection and the like.

With the continuous opening of remote sensing image scene data sets, a large number of remote sensing image scene classification methods based on artificial features or deep learning are provided by multi-field researchers. However, most of the existing remote sensing image scene classification methods rely on all types of remote sensing image samples to learn classification models. With the coming of the remote sensing big data era, the remote sensing ground object categories show an explosive growth trend, so that it is unrealistic to collect sufficient remote sensing image samples for all the categories. How to introduce the priori knowledge in the field of remote sensing into the scene understanding process of the remote sensing image can identify the remote sensing image scene with the class never appearing in the training stage only by learning partial classes containing the remote sensing image, and the method has important practical significance in the era of remote sensing big data. Therefore, the development of Zero-sample learning (Zero-shot learning) in recent years provides a new idea for remote sensing image scene classification. Zero-sample learning aims to simulate the process of human learning, and samples in an invisible class (unseen) are inferentially identified with the aid of class prior knowledge (e.g. attribute vectors of classes, natural language semantic vectors of classes) through visible class (seen) sample learning. Currently, zero sample learning is mainly focused on the field of computer vision, the research on the classification of remote sensing image scenes is few, and a great deal of research work is needed to promote the development of the zero sample remote sensing image scene classification technology.

Disclosure of Invention

The invention provides a zero-sample remote sensing image scene recognition method based on priori knowledge mapping and correction, which is based on the problems of large modal span between a bottom remote sensing image scene sample and high-level priori knowledge representation, drift of a visible class priori knowledge space and an invisible class priori knowledge space, and offset of an invisible class priori knowledge representation space generated by remote sensing image scene mapping and an invisible class semantic space corrected based on the visible class priori knowledge space. Based on a visible remote sensing image scene sample with a class label and a priori knowledge representation vector set of a visible class, a depth feature extractor and a mapping model from robust visual features to priori knowledge representation features are obtained through remote sensing scene class learning and cross-mode learning between the visual feature vectors and the priori knowledge representation vectors. Based on the category prior knowledge representation vector of the whole category and the invisible remote sensing image scene sample, the prior knowledge representation vector of the invisible category is progressively corrected through unsupervised collaborative representation learning and unsupervised k-nearest neighbor algorithm respectively, so that the classification precision of the zero-sample remote sensing image scene is effectively improved.

The technical scheme adopted by the invention is as follows: a zero sample remote sensing image scene recognition method based on priori knowledge mapping and correction comprises the following steps:

a training stage:

step 1: creating a priori knowledge representation vector corresponding to each category of visible classes based on open natural language corpus or domain expert knowledge

Vector of prior knowledge representation corresponding to each category of invisible classes

Where p and q represent the number of classes, visible and invisible, respectively, d^sRepresenting the dimensionality of the vector for a priori knowledge;

and 2, step: input original remote sensing image scene data set D { (x)_i，y_i)：i＝1，...，M}，

Where D is a visible class data set, x_iRepresenting in visible classesIth remote sensing image scene, y_iRepresenting a category label of the ith image in the visible category, wherein M is the total number of samples of the visible remote sensing data; d^UIn the case of a data set of the invisible class,

representing the kth remote sensing image scene in the invisible class,

a category label of the kth image in the invisible category is represented, and N is the total number of samples of the invisible category data;

extracting image characteristics F of visible class data set and image characteristics F of invisible class data set by utilizing deep convolutional network^U；

And step 3: solving a mapping matrix W from F to S based on a robust cross-modal mapping target function of visual feature self-coding constraint, and thus finishing the learning of depth cross-modal mapping;

and 4, step 4: correction based on unsupervised collaborative representation learning^UTo obtain

And 5: using mapping matrix W in step 3 to convert F^UMapping to

Step 6: solving by using k nearest neighbor algorithm

Semantic vector obtained through mapping

The neighbor vectors in (1) are averaged to obtain

And (3) a testing stage:

and 7: giving an invisible test remote sensing image scene, extracting visual features and mapping to obtain semantic vectors according to the steps 2-5

And step 8: computing

And

cosine similarity between the two images is obtained, and a label of a test remote sensing image scene is obtained.

Furthermore, in the step 2, T is used for representing the convolutional layer hyper-parameter of the deep convolutional network, and V is the mapping hyper-parameter of the last fully-connected layer feature and the classification layer; learning the hyper-parameter T of the convolution layer and the hyper-parameter V of the mapping of the full connection layer by fine tuning the deep convolution network, and extracting the image characteristics of the visible data set by utilizing the hyper-parameter T of the convolution layer

The fine tuning deep network process only uses visible data; wherein f is_i＝Q(x_i(ii) a T), Q (; a) represents a non-linear mapping of a deep convolutional network, the deep convolutional network based on the remote sensing image scene data set optimizes a loss function as in equation one, wherein c_i＝σ(f_iV), σ () denotes the Softmax map,

wherein M is the total number of samples of the visible remote sensing data, and p represents the number of categories of the visible remote sensing data.

Further, the mapping matrix W in step 3 is obtained by the self-encoder, and the objective function is as follows:

where alpha is a self-encoding regularization coefficient,

denotes the F norm, s denotes the sum of F_iAnd (3) simplifying a corresponding priori knowledge semantic vector into a Sylvester equation, and solving W by using a Bartels-Stewart algorithm.

Further, the objective function of the collaborative representation coefficient ρ in unsupervised collaborative representation learning in step 4 is:

where β is the regularization constant, the closed form solution of the above equation:

wherein, I is a discrimination matrix, and the optimal co-expression coefficient is obtained by formula

Performing matrix operation with S to obtain reconstructed invisible semantic vector

Further, in step 5

Calculated as follows:

further, in step 6

Calculated as follows:

wherein the content of the first and second substances,

to represent

The k-th invisible class prior knowledge of the medium represents that the vector is in

The m neighbor prior knowledge searched in (1) represents a vector, k is 1 … q, and o is 1 … m.

Further, the label of the invisible type test remote sensing image scene in step 8 is calculated according to the following formula:

specifically, a set of test remote sensing image scenes is given

Visual features of remote sensing scene images

Further mapping it into semantic vector with matrix W

Calculating out

And

cosine similarity between them, wherein,

is an image of a scene

D (-) is the cosine distance equation.

The invention has the following advantages: the invention aims at the problems of mapping learning and reference correction of prior knowledge in a remote sensing scene zero sample classification task. Based on the category prior knowledge representation vector of the visible category and the remote sensing image scene sample, the depth cross-modal mapping from the visual space of the remote sensing image scene to the category prior knowledge representation space is realized by combining the scene category classification and the multitask learning of self-coding cross-modal mapping. Aiming at the offset problem of a visible prior knowledge representation space and an invisible prior knowledge representation space and the offset problem of an invisible prior knowledge representation space after self-coding cross-modal mapping model mapping and the offset problem of an invisible prior knowledge representation space after collaborative representation, the invention corrects the prior knowledge representation vector of the invisible category through unsupervised collaborative representation learning and unsupervised k-nearest neighbor algorithm respectively based on category prior knowledge representation vectors of all categories and invisible remote sensing image samples, and realizes a stable invisible remote sensing image scene recognition task.

Drawings

FIG. 1: is a general flow diagram of an embodiment of the invention;

FIG. 2: is a sample diagram of a data set according to an embodiment of the present invention.

Detailed Description

In order to facilitate the understanding and implementation of the present invention for those of ordinary skill in the art, the present invention is further described in detail with reference to the accompanying drawings and examples, it is to be understood that the embodiments described herein are merely illustrative and explanatory of the present invention and are not restrictive thereof.

Referring to fig. 1, the method for identifying the zero-sample remote sensing image scene based on prior knowledge mapping and correction provided by the invention comprises the following steps:

Where p and q represent the number of classes, visible and invisible, respectively, d^sIs a semantic vector dimension.

Step 2: input original remote sensing image scene data set D { (x)_i，y_i)：i＝1，...，M}，

Let T denote the convolutional layer hyper-parameter of the deep convolutional network Resnet-50, and V is the mapping hyper-parameter of the last fully-connected layer feature f and the classification layer y. And learning the convolutional layer hyper-parameter T and the full connection layer mapping hyper-parameter V through a fine tuning deep convolutional network. The network optimization loss function based on the remote sensing image scene data set is as the formula I, wherein c_i＝σ(f_iV), σ () denotes the Softmax mapping, f_i＝Q(x_i(ii) a T), Q (; a.) represents a non-linear mapping of the deep convolutional network.

And learning the convolutional layer hyper-parameter T and the full-connection layer mapping hyper-parameter V through a fine tuning deep convolutional network. Extracting image features of visible class data set by using parameter T

Extracting image features of visible class datasets

D is a visible class data set, x_iRepresenting the ith remote-sensing image scene in the visible class, y_iRepresenting a category label of the ith image in the visible category, wherein M is the total number of samples of the visible remote sensing data; d^UIn the case of a data set of the invisible class,

representing the ith remote sensing image scene in the invisible class,

a category label representing the ith image in the invisible category, wherein N is the total number of samples of the invisible category data;

the fine tuning depth network process only uses visible remote sensing image scene samples.

And step 3: and solving a mapping matrix W from F to S. The mapping matrix W is obtained by the self-encoder, and the objective function is as follows:

wherein, alpha is a regularization coefficient of self-coding, and the optimal value is 0.001 through experimental analysis.

Denotes the F norm, s denotes the sum of F_iThe corresponding priori knowledge semantic vector, formula one, can be simplified into Sylvester equation, and the W is solved by using Bartels-Stewart algorithm.

And 4, step 4: correcting S with collaborative representation^UTo obtain

The objective function of the co-expression coefficient ρ is:

where β is the regularization constant. The closed-form solution of the above formula is:

wherein, I is a discrimination matrix. Optimal co-expression coefficient obtained by formula III

And 5: using mapping matrix W in step 3 to convert F^UMapping to

Calculated as follows:

step 6: solving by using k nearest neighbor algorithm

In the process of passingMapped a priori knowledge representative vector

The neighbor vectors in (1) are averaged to obtain

Wherein

Calculated as follows:

to represent

The j-th invisible class prior knowledge represents the vector is in

The m neighbor prior knowledge sought in (1) represents a vector.

And 7: giving an invisible image, extracting visual features and mapping to obtain a priori knowledge expression vector

And 8: computing

And

the cosine similarity between the two images, and the label of the test image is predicted. The label of the invisible type test image can be calculated as follows:

specifically, a set of test remote sensing scene images is given

Visual features of remote sensing scene images

Further mapping it to a priori knowledge representation vector by a matrix W

Computing

And

cosine similarity between them, wherein,

is an image of a scene

D (-) is the cosine distance equation.

In order to verify the effectiveness of the disclosed technology, a plurality of existing disclosed remote sensing image scene data sets are integrated, and a remote sensing image scene data set with more scene types is established. Based on the natural language models Word2vec and Bert, two types of class prior knowledge expression vectors are created for each class of the newly constructed remote sensing scene data set. Based on two different classification prior knowledge representation methods, experimental results show that the algorithm disclosed by the invention can obtain ideal classification precision under the condition of dividing multiple different visible classes and invisible classes.

The described method has been an evaluation test on a new data set obtained by integrating public data sets, which may reflect the effectiveness of the method. Specifically, a public evaluation dataset is shown in fig. 2, which comprises 70 classes of scene categories, each class comprising 800 images. Table 1 shows the results of vector testing using two prior knowledge of Word2vec and Bert under different partitioning modes for the visible class and the invisible class.

Table 1. visible class and invisible class are divided into two kinds of prior knowledge expression vectors of Word2vec and Bert according to different proportions, and the overall accuracy of the method on the test data set is measured

It should be understood that parts of the specification not set forth in detail are well within the prior art.

It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A zero sample remote sensing image scene recognition method based on prior knowledge mapping and correction is characterized by comprising the following steps:

a training stage:

step 2: input original remote sensing image scene data set D { (x)_i，y_i)：i＝1，…，M}，

Where D is a visible class data set, x_iRepresenting the ith remote-sensing image scene in the visible class, y_iRepresenting a category label of the ith image in the visible category, wherein M is the total number of samples of the visible remote sensing data; d^UIn the case of a data set of the invisible class,

representing the kth remote sensing image scene in the invisible class,

a category label representing the kth image in the invisible category, wherein N is the total number of samples of the invisible category data;

And 5: using mapping matrix W in step 3 to convert F^UMapping to

Step 6:solving by using k nearest neighbor algorithm

Semantic vector obtained through mapping

The neighbor vectors in (1) are averaged to obtain

And (3) a testing stage:

And 8: computing

And

2. The method for identifying the scene of the zero-sample remote sensing image mapped and corrected based on the prior knowledge as claimed in claim 1, wherein: in the step 2, T is used for representing the convolutional layer hyper-parameter of the deep convolutional network, and V is the mapping hyper-parameter of the last fully-connected layer feature and the classification layer; learning the hyper-parameter T of the convolution layer and the hyper-parameter V of the mapping of the full connection layer by fine tuning the deep convolution network, and extracting the image characteristics of the visible data set by utilizing the hyper-parameter T of the convolution layer

The fine tuning deep network process only uses visible data; wherein f is_i＝Q(x_i；T)，Q represents the nonlinear mapping of the depth convolution network, the optimization loss function of the depth convolution network based on the remote sensing image scene data set is shown as the formula I, wherein c_i＝σ(f_iV), σ () denotes the Softmax map,

wherein x is_iRepresenting the ith remote-sensing image scene in the visible class, y_iClass label indicating the ith image in the visible class, d^fAnd representing the dimension of the characteristic, wherein M is the total number of samples of the visible remote sensing data, and p represents the number of categories of the visible category.

3. The zero-sample remote sensing image scene recognition method based on priori knowledge mapping and correction according to claim 2, wherein: the mapping matrix W in the step 3 is obtained through a self-encoder, and the objective function is as follows:

where alpha is a self-encoding regularization coefficient,

4. The method for identifying the scene of the zero-sample remote sensing image mapped and corrected based on the prior knowledge as claimed in claim 1, wherein: the objective function of the collaborative representation coefficient ρ in unsupervised collaborative representation learning in step 4 is:

5. The method for identifying the scene of the zero-sample remote sensing image mapped and corrected based on the prior knowledge as claimed in claim 1, wherein: in step 5

Calculated as follows:

6. the method for zero-sample remote sensing image scene recognition based on priori knowledge mapping and correction of claim 3, wherein: in step 6

Calculated as follows:

wherein the content of the first and second substances,

to represent

7. The method for zero-sample remote sensing image scene recognition based on priori knowledge mapping and correction of claim 6, wherein: the label of the invisible test remote sensing image scene in the step 8 is calculated according to the following formula:

specifically, a set of test remote sensing image scenes is given

Visual features of remote sensing scene images

Further mapping it into semantic vector with matrix W

Calculating out

And

cosine similarity between them, wherein,

is an image of a scene

D (-) is the cosine distance equation.