CN114169442A - Remote sensing image small sample scene classification method based on double prototype network
- Publication number: CN114169442A
- Application number: CN202111495585.5A
- Authority: CN (China)
- Prior art keywords: sample, prototype, query, loss, network
- Legal status: Granted
Classifications
- G06F18/2415 — Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods
Abstract
The invention provides a remote sensing image small sample scene classification algorithm based on a double prototype network. First, the supervision information of the support labels is used to perform prototype self-calibration on the prototypes generated from the support features, yielding more accurate prototypes. Then, on the basis of the calibrated prototypes, the prediction results of the query samples are converted into query prototypes, which are used to predict the support samples in reverse. In this process, the information interaction between the support samples and the query samples further calibrates the prototypes; this is called prototype mutual calibration. The method optimizes three losses, of which the self-calibration loss and the mutual calibration loss help the model learn more representative prototypes and make more accurate predictions. Because the network does not need to learn additional parameters, the model is lightweight and easy to use. Experiments on three public remote sensing datasets show that the method achieves better classification performance than other advanced small sample image classification methods.
Description
Technical Field
The invention belongs to the field of remote sensing image recognition, and particularly relates to a remote sensing image small sample scene classification method based on a double prototype network.
Background
Remote sensing image scene classification has wide real-world applications, such as natural disaster detection, land use classification, and urban planning. In recent years, deep learning has become a powerful tool for remote sensing scene classification. However, two basic problems remain. First, although a trained model can predict test samples of scene classes seen during training with high accuracy, it struggles when faced with samples of classes never seen during training. Second, the training process of conventional deep learning methods requires a large amount of labeled data, which is difficult to obtain in a new, unknown environment. Against this background, research on small sample scene classification algorithms for remote sensing images has attracted wide attention over the past two years. The purpose of small sample learning is to learn new concepts from a very small number of samples.
Existing small sample image classification methods can be broadly divided into four categories: model-based, metric-based, optimization-based, and data-amplification-based. Model-based methods aim to quickly update parameters from a small number of samples by designing a model structure that directly establishes a mapping function between input and prediction, but the traditional gradient descent algorithm involves many parameters and cannot be optimized quickly. Metric-based methods mainly learn a mapping from the image to an embedding space and make that space discriminative. Optimization-based methods aim to obtain a better model initialization or gradient descent direction, so that the model still generalizes well when facing a new class with a limited sample size; however, with limited data these methods easily fall into local optima. Data-amplification-based methods propose to generate synthetic data from a small number of labeled samples, but unrealistic generated data easily introduces noise into the network.
Among the above small sample image classification methods, metric-based methods have been studied most widely due to their simple, effective, and easily reproducible characteristics. Among them, the Prototypical Network is one of the most classical metric-based small sample classification methods, described in J. Snell, K. Swersky, and R. Zemel, "Prototypical Networks for Few-shot Learning," in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 4077-4087.
Researchers have made many extensions and improvements to prototype networks. Some supplement the prototypes by adding prior knowledge (such as semantic word vectors) in the training stage, and some add attention-like mechanisms throughout the network to improve its feature extraction capability. However, these methods introduce much extra computation and burden the network; regardless of any accuracy improvement, their usability in practical application scenarios remains questionable. Therefore, for small sample classification of remote sensing images, improving classification accuracy with an algorithm that is as "economical" as possible is an important long-term direction.
Disclosure of Invention
The invention provides a small sample classification method, namely a double prototype network (SPNet), which can effectively improve the quality of the prototypes while reducing the burden on the network. Specifically, with a network structure comprising only a backbone (ResNet-18), another prototype, the "query prototype", is calculated from existing information; combined with the original prototype it forms the "double prototype". Two operations, "prototype self-calibration" and "prototype mutual calibration", are designed so that the prototypes become more representative during training and more useful for the subsequent prototype-based classification prediction.
The technical scheme adopted by the invention is as follows:
a remote sensing image small sample scene classification method based on a double prototype network comprises the following steps:
(1) preparing data: dividing a remote sensing image data set into a base class data set and a new class data set, carrying out data preprocessing, and dividing the preprocessed base class data set images into a support set and a query set;
(2) feature extraction: randomly selecting support set samples and query set samples from the base class data set according to tasks, respectively sending them into a ResNet-18 network for feature extraction to obtain support sample features and query sample features, and mapping the features into an n × 1 × 1 dimensional space by a global average pooling operation, where n represents the number of channels;
(3) calculating an original prototype and a query prototype from the support samples and the query samples respectively; calculating the original loss from the original prototype and the query sample features, the self-calibration loss from the original prototype and the support sample features, and the mutual calibration loss from the original prototype and the query prototype; weighting these to obtain a total loss value, adjusting the ResNet-18 network parameters according to the total loss value, and returning to step (2) until the loss function converges;
(4) fine-tuning the trained ResNet-18 network with the labeled samples in the new class data set, extracting remote sensing image features with the fine-tuned ResNet-18 network, and classifying the remote sensing images by combining a cosine measure with a softmax classifier.
Further, dividing the remote sensing image dataset into a base class dataset and a new class dataset in step (1) comprises the following process:
the remote sensing image dataset D is partitioned into a base class dataset D_base and a new class dataset D_novel, whose categories do not overlap; the images in D_base carry labels and are used for training the network; in D_novel, only k labeled images per category are used for adjusting the ResNet-18 network, and the remaining unlabeled images are used for testing network performance, where k is a set value.
Further, in step (3) the original loss, the self-calibration loss, and the mutual calibration loss are calculated from the support sample features and the query sample features and weighted to obtain the total loss value. The specific process comprises:

Calculating the original loss L_ori: first, the original prototype P_c of each class c is calculated as

$$P_c = \frac{1}{K} \sum_{x_i \in S_c} f_\theta(x_i)$$

where x_i is an RGB image in the support set S, f_θ(·) denotes the ResNet-18 network, S_c denotes the subset formed by all samples of class c in the support set, and K denotes the number of samples in S_c. Then, a cosine measure d(·) between the original prototype P_c and the query sample feature f_θ(q_i) gives the probability that a query sample q_i belongs to class c, and the original loss L_ori is obtained with the cross entropy:

$$p(y = c \mid q_i) = \frac{\exp\big(\beta \, d(f_\theta(q_i), P_c)\big)}{\sum_{c'=1}^{C} \exp\big(\beta \, d(f_\theta(q_i), P_{c'})\big)}, \qquad L_{ori} = -\frac{1}{N_q} \sum_{i=1}^{N_q} \log p(y = c \mid q_i)$$

where c denotes the label of query sample q_i, N_q denotes the number of query samples in each task, β is a scaling factor that makes the cosine measure more prominent in the Softmax function so that the scores of different classes are easier to distinguish, C is the number of target categories in the support set of each task, and c' is any target label in the support set of each task.

Calculating the self-calibration loss L_self: a cosine measure d(·) between the original prototype P_c and the support sample feature f_θ(x_i) gives the probability that a support sample x_i belongs to class c; the loss of predicting the support samples from the metric between the original prototypes and the support sample features is the self-calibration loss:

$$p(y = c \mid x_i) = \frac{\exp\big(\beta \, d(f_\theta(x_i), P_c)\big)}{\sum_{c'=1}^{C} \exp\big(\beta \, d(f_\theta(x_i), P_{c'})\big)}, \qquad L_{self} = -\frac{1}{CK} \sum_{i=1}^{CK} \log p(y = c \mid x_i)$$

Calculating the mutual calibration loss L_inter: query prototypes are generated from the query samples; a cosine measure d(·) between the query prototype P_q^c and the support sample feature f_θ(x_i) gives the probability that support sample x_i belongs to class c, which is calibrated against the original prototype to give the mutual calibration loss:

$$L_{inter} = -\frac{1}{CK} \sum_{i=1}^{CK} \log \frac{\exp\big(\beta \, d(f_\theta(x_i), P_q^c)\big)}{\sum_{c'=1}^{C} \exp\big(\beta \, d(f_\theta(x_i), P_q^{c'})\big)}$$

where the query prototype of class c is

$$P_q^c = \frac{1}{N_Q^c} \sum_{q_i^c \in Q^c} p(y = c \mid q_i^c)$$

with N_Q^c denoting the number of samples in query set Q belonging to class c and q_i^c denoting a sample of class c in the query set.

Calculating the total loss: L = L_ori + λ_1·L_self + λ_2·L_inter, where λ_1 and λ_2 denote the weights of the self-calibration loss and the mutual calibration loss, respectively.
Compared with the prior art, the invention has the following technical effects:
In the remote sensing image small sample scene classification method based on the double prototype network, the two operations of "prototype self-calibration" and "prototype mutual calibration" make the class prototypes more representative during training, which benefits the subsequent prototype-based classification prediction. Meanwhile, since the network contains no additional modules, no attention mechanisms, and no complex learnable parameters, the method is notably lightweight and fast. Experimental results on three remote sensing image benchmark datasets demonstrate that the double prototype network outperforms other methods.
Drawings
FIG. 1 is a schematic diagram of the technical idea of the present invention;
FIG. 2 is an overall framework diagram of the remote sensing image small sample scene classification method based on the double prototype network (taking 5-way 5-shot as an example).
Detailed Description
The present invention will be further described with reference to the drawings and embodiments, which include, but are not limited to, the following examples.
As shown in fig. 1, the innovation of the network lies in greatly changing the information flow and information richness of the whole training process. "Prototype self-calibration" calibrates the "original prototype" with the features of the support samples; because the "original prototype" is itself calculated from the support samples, this is called self-calibration. "Prototype mutual calibration" is a calibration between the "query prototype" and the "original prototype". Both calibrations are ultimately incorporated into the loss function as constraints, so the whole network has no additional modules or mechanisms and no complex learnable parameters, making the method notably lightweight and fast.
The hardware environment of the embodiment is an Intel(R) Core(TM) i3-9100 CPU computer with 8.0 GB of memory; the software environment is Ubuntu 16.04.5 LTS and PyCharm 2019. Three public large-scale remote sensing image datasets are used: NWPU-RESISC45, WHU-RS19, and UC Merced. The NWPU-RESISC45 dataset contains 45 scene classes, each with 700 RGB images of 256 × 256; in this embodiment, 35 classes are taken as the base class dataset D_base, of which 25 are used for training and 10 for verification (validation), and the remaining 10 classes serve as the new class dataset D_novel. The WHU-RS19 dataset has 19 scene categories with 1005 RGB images of 600 × 600 in total; 9 categories are used for training, 5 for verification, and the remaining 5 as new class data. UC Merced has 21 scene categories, each with 100 RGB images of 256 × 256; 10 categories are used for training, 6 for verification, and the remaining 5 as new classes for testing. Under the 5-way 1-shot setting, each task includes 5 categories, each of which picks 1 support image and 15 query images. Under the 5-way 5-shot setting, each task includes 5 categories, each of which picks 5 support images and 15 query images.
As shown in fig. 2, the specific implementation process of the present invention is as follows:
1. preparing data
The remote sensing image dataset D is partitioned into a base class dataset D_base and a new class dataset D_novel, whose categories do not overlap. The images in D_base are used to train the network; in D_novel, only k (1 or 5) labeled images per category are used to fine-tune the network, and the remaining unlabeled images are used to test network performance. During training, images in the base classes are sampled and preprocessed: each iteration simulates the small sample learning setting, i.e., 5 categories are randomly selected, each comprising k (1 or 5, forming 5-way 1-shot and 5-way 5-shot tasks respectively) labeled samples for training (forming the support set S) and 15 unlabeled samples for verification (forming the query set Q). Each sample is then cropped to 224 × 224, randomly horizontally flipped, data-enhanced by brightness, color, and contrast enhancement, and normalized. The preprocessed base class dataset images are thus divided into a support set S and a query set Q.
the normalization treatment comprises the following steps:
normalizing the three RGB channels of each image I according to the following formula:
wherein, IhH channel, I, representing an imageh' denotes the h channel after normalization, MeanhRepresents the mean value of the h channel, StdhRepresents the standard deviation of the h channel.
The four data enhancement operations (random horizontal flipping, brightness enhancement, color enhancement, and contrast enhancement) are all implemented using torchvision.
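As an illustration, the preprocessing steps above can be assembled with torchvision roughly as follows; the jitter strengths and the normalization statistics (ImageNet values) are assumptions, since the embodiment only names the operations and defines Mean_h and Std_h abstractly.

```python
# A minimal preprocessing sketch for one base-class image.
# Jitter strengths and Normalize statistics are illustrative assumptions.
import torchvision.transforms as T

train_transform = T.Compose([
    T.RandomCrop(224),                        # crop each sample to 224 x 224
    T.RandomHorizontalFlip(),                 # random horizontal flipping
    T.ColorJitter(brightness=0.4,             # brightness enhancement
                  saturation=0.4,             # color enhancement
                  contrast=0.4),              # contrast enhancement
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],   # per-channel Mean_h (assumed)
                std=[0.229, 0.224, 0.225]),   # per-channel Std_h (assumed)
])
```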
2. Feature extraction
Support set samples and query set samples are randomly selected from the base class dataset according to tasks and sent into the ResNet-18 network for feature extraction, giving the support sample features f_θ(x_i) and the query sample features f_θ(q_i), where x_i ∈ S, q_i ∈ Q, and f_θ(·) denotes the ResNet-18 network. A global average pooling operation is added at the end of the network to map the features into an n × 1 × 1 dimensional space, with n representing the number of channels. This is added because the double prototype network needs to compute prototypes, which are defined as n × 1 × 1 vectors that can represent class features, as further described below.
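A minimal sketch of such a feature extractor, assuming the standard torchvision ResNet-18 (for which n = 512) with its classification head removed:

```python
# Sketch of f_theta: a ResNet-18 trunk followed by global average pooling.
import torch.nn as nn
import torchvision.models as models

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        resnet = models.resnet18()  # randomly initialized backbone
        # keep the conv trunk and the global average pool,
        # drop the final fully connected classification layer
        self.trunk = nn.Sequential(*list(resnet.children())[:-1])

    def forward(self, x):            # x: (B, 3, 224, 224)
        return self.trunk(x)         # (B, 512, 1, 1), i.e. n x 1 x 1 per sample
```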
3. Calculating the original loss
First, the prototypes are calculated following the Prototypical Network. The original prototype P_c of class c is

$$P_c = \frac{1}{K} \sum_{x_i \in S_c} f_\theta(x_i)$$

where S_c denotes the subset formed by all samples of class c in the support set and K denotes the number of samples in S_c (i.e., 1 or 5). The prototype is the feature mean of the support samples, which simply and effectively extracts the part of the samples that best represents the class features.
A cosine measure d(·) between the original prototype P_c and the query feature f_θ(q_i) gives the probability that query sample q_i belongs to class c, and the cross entropy then gives the original loss L_ori:

$$p(y = c \mid q_i) = \frac{\exp\big(\beta \, d(f_\theta(q_i), P_c)\big)}{\sum_{c'=1}^{C} \exp\big(\beta \, d(f_\theta(q_i), P_{c'})\big)}, \qquad L_{ori} = -\frac{1}{N_q} \sum_{i=1}^{N_q} \log p(y = c \mid q_i)$$

These two formulas use the Softmax classification function and the cross entropy loss function, respectively. Here c denotes the label of query sample q_i, N_q the number of query samples in each task, C the number of target categories in the support set of each task, and c' any target label in the support set of each task. To make the cosine measure more prominent in the Softmax function and the scores of different classes easier to distinguish, a scaling factor β, set to 50 in this example, is used to scale the cosine similarity to the range [0, 50].
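The prototype computation and the original loss can be sketched as follows; the tensor shapes and the class-major ordering of the support batch are assumptions for illustration, while β = 50 comes from the text.

```python
# Sketch: prototypes as support-feature means, scaled cosine + cross entropy.
import torch
import torch.nn.functional as F

def prototypes(support_feat, C, K):
    # support_feat: (C*K, n) flattened features, assumed ordered class-major
    return support_feat.view(C, K, -1).mean(dim=1)             # (C, n): P_c

def cosine_logits(feat, protos, beta=50.0):
    # beta * d(f, P_c): scaled cosine measure; beta sharpens the Softmax
    sim = F.cosine_similarity(feat.unsqueeze(1),                # (N, 1, n)
                              protos.unsqueeze(0), dim=-1)      # -> (N, C)
    return beta * sim

def original_loss(query_feat, protos, query_labels, beta=50.0):
    # cross entropy of query predictions against episode labels 0..C-1
    return F.cross_entropy(cosine_logits(query_feat, protos, beta),
                           query_labels)
```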
4. Prototype self-calibration
In the original prototype network explained above, the prototype is simply obtained from the support features and used directly to predict the query samples. We believe the information flow in this process is insufficient to produce an effective prototype. We therefore propose a prototype self-calibration mechanism to further exploit the information in the support samples. Specifically, the support samples are predicted through the metric between the original prototype P_c and each support sample x_i; since the original prototype is calculated from the sample features in the support set, this prediction is called self-calibration. As before, the classification prediction uses the Softmax function:

$$p(y = c \mid x_i) = \frac{\exp\big(\beta \, d(f_\theta(x_i), P_c)\big)}{\sum_{c'=1}^{C} \exp\big(\beta \, d(f_\theta(x_i), P_{c'})\big)}$$

Further, the model can be guided to calibrate the prototypes with the self-calibration loss:

$$L_{self} = -\frac{1}{CK} \sum_{i=1}^{CK} \log p(y = c \mid x_i)$$

C and K are consistent with the C-way K-shot task, i.e., they denote the number of categories in the support set of each task and the number of samples per category in the support set of each task, respectively.
5. Prototype mutual calibration
We then continue to mine the information in the given samples. Based on the calibrated, accurate prototypes, a new method is proposed to predict the support samples from the information of the query samples. Specifically, we use the predicted probabilities of the query samples for each class as another prototype of that class, the query prototype P_q, and use P_q as a guide to predict the support samples in reverse, in the same way that query samples are predicted in the original prototype network. Predictions based on P_q influence the model's predictions of the queries and thereby implicitly calibrate the prototypes. More precisely, the effect between the two prototypes is mutual, i.e., the two prototypes interact in this process while end-to-end training is preserved. Experimental results show that this interaction is positive and can push the model to better performance. We call this calibration process prototype mutual calibration, which can be expressed as:

$$p(y = c \mid x_i) = \frac{\exp\big(\beta \, d(f_\theta(x_i), P_q^c)\big)}{\sum_{c'=1}^{C} \exp\big(\beta \, d(f_\theta(x_i), P_q^{c'})\big)}$$

where the query prototype P_q^c of class c is calculated as

$$P_q^c = \frac{1}{N_Q^c} \sum_{q_i^c \in Q^c} p(y = c \mid q_i^c)$$

Here N_Q^c denotes the number of samples in query set Q that belong to class c, q_i^c denotes a sample of class c in the query set, and P_q^c denotes the query prototype of class c, formed by averaging the probabilities with which the class-c query samples are predicted to class c; p(y = c | x_i) is then the probability that support sample x_i belongs to class c, obtained from the cosine measure d(·) between the query prototype P_q^c and the support sample feature f_θ(x_i).

Therefore, the cross entropy can be used to calculate the mutual calibration loss:

$$L_{inter} = -\frac{1}{CK} \sum_{i=1}^{CK} \log p(y = c \mid x_i)$$

C and K in the formula are consistent with the C-way K-shot task, i.e., they denote the number of categories in the support set of each task and the number of samples per category in the support set of each task, respectively.
Finally, the overall loss function can be calculated:

$$L = L_{ori} + \lambda_1 L_{self} + \lambda_2 L_{inter}$$

λ_1 and λ_2 are hyperparameters that adjust the weights of the two auxiliary losses. In this example, they were determined through experiments on the NWPU-RESISC45 dataset: λ_1 = 1 and λ_2 = 3 for the 5-way 1-shot scenario, and λ_1 = 3 and λ_2 = 7 for the 5-way 5-shot scenario.
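A sketch of one training episode assembling the three losses; the optimizer choice and learning rate are assumptions, while the loss weights follow the text (λ_1 = 3, λ_2 = 7 for 5-way 5-shot).

```python
encoder = Encoder()
optimizer = torch.optim.SGD(encoder.parameters(), lr=1e-3, momentum=0.9)

def train_episode(support_x, support_y, query_x, query_y,
                  C=5, K=5, lam1=3.0, lam2=7.0):
    s_feat = encoder(support_x).flatten(1)     # (C*K, 512), class-major order
    q_feat = encoder(query_x).flatten(1)       # (Nq, 512)
    protos = prototypes(s_feat, C, K)
    loss = (original_loss(q_feat, protos, query_y)
            + lam1 * self_calibration_loss(s_feat, protos, support_y)
            + lam2 * mutual_calibration_loss(s_feat, q_feat, protos,
                                             support_y, query_y, C))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```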
6. Classification effect verification
The ResNet-18 network trained with the overall loss L described above is tested on the D_novel dataset, classifying with a cosine measure combined with a Softmax classifier. Two tasks, 5-way 1-shot and 5-way 5-shot, are tested, with 600 randomly generated tasks in each case. In each task, the labeled samples in the new class dataset are first used to fine-tune the trained ResNet-18 network; the fine-tuned ResNet-18 network then classifies the remote sensing images, and the average prediction accuracy is calculated as the final result.
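A simplified evaluation sketch over the 600 random tasks; the per-task fine-tuning step described above is omitted for brevity, and sample_task is an assumed helper that draws one episode from D_novel.

```python
def evaluate(encoder, sample_task, n_tasks=600, C=5, K=5):
    encoder.eval()
    accs = []
    for _ in range(n_tasks):
        (sx, sy), (qx, qy) = sample_task()    # one episode from D_novel
        with torch.no_grad():
            protos = prototypes(encoder(sx).flatten(1), C, K)
            pred = cosine_logits(encoder(qx).flatten(1),
                                 protos).argmax(dim=-1)
        accs.append((pred == qy).float().mean().item())
    return sum(accs) / len(accs)              # mean accuracy over all tasks
```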
The classification accuracy is selected to evaluate the effectiveness of the method. Accuracy is the percentage of correctly classified samples among all samples; in general, the larger the value, the better the algorithm. It is calculated as

$$accuracy = \frac{TP + TN}{P + N}$$

where P and N denote the numbers of positive and negative samples respectively, TP denotes the correctly classified positive samples, and TN denotes the correctly classified negative samples.
The classification results of the method are compared with the baseline method on the NWPU-RESISC45 dataset; the comparison is shown in Table 1, and the classification accuracy shows the effectiveness of the method. Compared with the method of the invention, the baseline model contains neither the prototype self-calibration nor the prototype mutual calibration operation. Specifically, the baseline model is a Prototypical Network with ResNet-18 as the backbone.
TABLE 1
Model | 1-shot | 5-shot |
---|---|---|
Baseline | 65.20±0.84% | 80.52±0.55% |
Ours | 67.84±0.87% | 83.94±0.50% |
The classification results of the method are also compared with the baseline method on the WHU-RS19 and UC Merced datasets; the comparisons are shown in Table 2 and Table 3, respectively. The classification accuracy shows the effectiveness of the method on different datasets and its superiority over the Prototypical Network with ResNet-18 as the backbone.
TABLE 2
Model | 1-shot | 5-shot |
---|---|---|
Baseline | 76.36±0.67% | 85.00±0.36% |
Ours | 81.06±0.60% | 88.04±0.28% |
TABLE 3
Model | 1-shot | 5-shot |
---|---|---|
Baseline | 53.85±0.78% | 71.23±0.48% |
Ours | 57.64±0.73% | 73.52±0.51% |
Claims (3)
1. A remote sensing image small sample scene classification method based on a double prototype network is characterized by comprising the following steps:
(1) preparing data: dividing a remote sensing image data set into a base class data set and a new class data set, carrying out data preprocessing, and dividing the preprocessed base class data set images into a support set and a query set;
(2) feature extraction: randomly selecting support set samples and query set samples from the base class data set according to tasks, respectively sending them into a ResNet-18 network for feature extraction to obtain support sample features and query sample features, and mapping the features into an n × 1 × 1 dimensional space by a global average pooling operation, where n represents the number of channels;
(3) calculating an original prototype and a query prototype from the support samples and the query samples respectively; calculating the original loss from the original prototype and the query sample features, the self-calibration loss from the original prototype and the support sample features, and the mutual calibration loss from the original prototype and the query prototype; weighting these to obtain a total loss value, adjusting the ResNet-18 network parameters according to the total loss value, and returning to step (2) until the loss function converges;
(4) fine-tuning the trained ResNet-18 network with the labeled samples in the new class data set, extracting remote sensing image features with the fine-tuned ResNet-18 network, and classifying the remote sensing images by combining a cosine measure with a softmax classifier.
2. The remote sensing image small sample scene classification method based on the double prototype network as claimed in claim 1, wherein dividing the remote sensing image dataset into a base class dataset and a new class dataset in step (1) comprises the following process:
the remote sensing image dataset D is partitioned into a base class dataset D_base and a new class dataset D_novel, whose categories do not overlap; the images in D_base carry labels and are used for training the network; in D_novel, only k labeled images per category are used for adjusting the ResNet-18 network, and the remaining unlabeled images are used for testing network performance, where k is a set value.
3. The remote sensing image small sample scene classification method based on the double prototype network as claimed in claim 1, wherein the original loss, the self-calibration loss, and the mutual calibration loss of the support sample features and the query sample features are calculated in step (3) and weighted to obtain the total loss value, the specific process comprising:

calculating the original loss L_ori: first, the original prototype P_c of each class c is calculated as

$$P_c = \frac{1}{K} \sum_{x_i \in S_c} f_\theta(x_i)$$

where x_i is an RGB image in the support set S, f_θ(·) denotes the ResNet-18 network, S_c denotes the subset formed by all samples of class c in the support set, and K denotes the number of samples in S_c; then a cosine measure d(·) between the original prototype P_c and the query sample feature f_θ(q_i) gives the probability that query sample q_i belongs to class c, and the cross entropy gives the original loss L_ori:

$$p(y = c \mid q_i) = \frac{\exp\big(\beta \, d(f_\theta(q_i), P_c)\big)}{\sum_{c'=1}^{C} \exp\big(\beta \, d(f_\theta(q_i), P_{c'})\big)}, \qquad L_{ori} = -\frac{1}{N_q} \sum_{i=1}^{N_q} \log p(y = c \mid q_i)$$

where c denotes the label of query sample q_i, N_q denotes the number of query samples in each task, β is a scaling factor, C is the number of target categories in the support set of each task, and c' is any target label in the support set of each task;

calculating the self-calibration loss L_self: a cosine measure d(·) between the original prototype P_c and the support sample feature f_θ(x_i) gives the probability that support sample x_i belongs to class c; the loss of predicting the support samples from the metric between the original prototypes and the support sample features is the self-calibration loss:

$$p(y = c \mid x_i) = \frac{\exp\big(\beta \, d(f_\theta(x_i), P_c)\big)}{\sum_{c'=1}^{C} \exp\big(\beta \, d(f_\theta(x_i), P_{c'})\big)}, \qquad L_{self} = -\frac{1}{CK} \sum_{i=1}^{CK} \log p(y = c \mid x_i)$$

calculating the mutual calibration loss L_inter: query prototypes are generated from the query samples; a cosine measure d(·) between the query prototype P_q^c and the support sample feature f_θ(x_i) gives the probability that support sample x_i belongs to class c, which is calibrated against the original prototype to give the mutual calibration loss:

$$L_{inter} = -\frac{1}{CK} \sum_{i=1}^{CK} \log \frac{\exp\big(\beta \, d(f_\theta(x_i), P_q^c)\big)}{\sum_{c'=1}^{C} \exp\big(\beta \, d(f_\theta(x_i), P_q^{c'})\big)}$$

where the query prototype of class c is

$$P_q^c = \frac{1}{N_Q^c} \sum_{q_i^c \in Q^c} p(y = c \mid q_i^c)$$

with N_Q^c denoting the number of samples in query set Q belonging to class c and q_i^c denoting a sample of class c in the query set;

calculating the total loss: L = L_ori + λ_1·L_self + λ_2·L_inter, where λ_1 and λ_2 denote the weights of the self-calibration loss and the mutual calibration loss, respectively.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111495585.5A CN114169442B (en) | 2021-12-08 | 2021-12-08 | Remote sensing image small sample scene classification method based on double prototype network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114169442A (en) | 2022-03-11
CN114169442B CN114169442B (en) | 2022-12-09 |
Family ID: 80484692
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111495585.5A Active CN114169442B (en) | 2021-12-08 | 2021-12-08 | Remote sensing image small sample scene classification method based on double prototype network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114169442B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112818903A (en) * | 2020-12-10 | 2021-05-18 | 北京航空航天大学 | Small sample remote sensing image target detection method based on meta-learning and cooperative attention |
CN112861720A (en) * | 2021-02-08 | 2021-05-28 | 西北工业大学 | Remote sensing image small sample target detection method based on prototype convolutional neural network |
CN113673599A (en) * | 2021-08-20 | 2021-11-19 | 大连海事大学 | Hyperspectral image classification method based on correction prototype learning |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114842236A (en) * | 2022-03-22 | 2022-08-02 | 西北工业大学 | Image classification method and device, computer-readable storage medium and electronic device |
CN114844583A (en) * | 2022-03-30 | 2022-08-02 | 电子科技大学 | End-to-end communication receiving method based on prototype network |
CN114936615A (en) * | 2022-07-25 | 2022-08-23 | 南京大数据集团有限公司 | Small sample log information anomaly detection method based on characterization consistency correction |
CN114936615B (en) * | 2022-07-25 | 2022-10-14 | 南京大数据集团有限公司 | Small sample log information anomaly detection method based on characterization consistency correction |
CN115564960A (en) * | 2022-11-10 | 2023-01-03 | 南京码极客科技有限公司 | Network image label denoising method combining sample selection and label correction |
CN115564960B (en) * | 2022-11-10 | 2023-03-03 | 南京码极客科技有限公司 | Network image label denoising method combining sample selection and label correction |
CN115984621A (en) * | 2023-01-09 | 2023-04-18 | 宁波拾烨智能科技有限公司 | Small sample remote sensing image classification method based on restrictive prototype comparison network |
CN115984621B (en) * | 2023-01-09 | 2023-07-11 | 宁波拾烨智能科技有限公司 | Small sample remote sensing image classification method based on restrictive prototype comparison network |
CN116778268A (en) * | 2023-04-20 | 2023-09-19 | 江苏济远医疗科技有限公司 | Sample selection deviation relieving method suitable for medical image target classification |
CN116168257A (en) * | 2023-04-23 | 2023-05-26 | 安徽大学 | Small sample image classification method, device and storage medium based on sample generation |
Also Published As
Publication number | Publication date |
---|---|
CN114169442B (en) | 2022-12-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||