CN117636183A

CN117636183A - Small sample remote sensing image classification method based on self-supervision pre-training

Info

Publication number: CN117636183A
Application number: CN202311690568.6A
Authority: CN
Inventors: 池凯凯; 丁雷鸣
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2023-12-11
Filing date: 2023-12-11
Publication date: 2024-03-01

Abstract

A small sample remote sensing image classification method based on self-supervision pre-training comprises the following steps: S1, sampling a small sample data set, and dividing the data set into a training set, a verification set and a test set; S2, constructing a scenario training data set; S3, constructing a dual-metric network model based on self-supervision pre-training; S4, training and verifying the self-supervision pre-training dual-metric network model by using a scenario training method; S5, testing the self-supervision pre-training dual-metric network model after training is completed. Through pre-training, the invention achieves higher accuracy in small sample image classification and thereby improves the classification accuracy of small sample remote sensing images.

Description

Small sample remote sensing image classification method based on self-supervision pre-training

Technical Field

The invention belongs to the field of computer vision, and particularly relates to a small sample remote sensing image classification method based on self-supervision pre-training.

Background

Remote sensing is a technique for obtaining earth surface information through carriers such as satellites, unmanned aerial vehicles and airplanes. With the continuous progress of technology, the field of remote sensing applications keeps widening. Remote sensing image classification can be applied to monitoring of land use and land cover, management of forest and water resources, city and traffic planning, environmental protection, agricultural production, weather forecasting and the like. Deep learning methods have achieved good results in remote sensing image classification.

However, these well-performing models require large amounts of labeled data for training. Once the available labeled data is limited, there is a risk of overfitting, resulting in a significant degradation of model performance. In practical applications, labeling data takes a lot of time and the available labeled data is very limited. Furthermore, deep learning models have very limited generalization ability on new classes that have not been seen.

Inspired by the fact that a human can learn quickly from only a small amount of data, the concept of small sample learning was proposed. The goal of small sample learning is to learn a model with generalization ability from a few samples, so that it performs well on new classes that have not been seen. In small sample learning, typically only a few samples can be used to train the model, so conventional machine learning methods have difficulty coping with this situation. The main small sample methods can be summarized in three categories: metric-based learning, data-augmentation-based learning, and meta-learning-based learning. In the meta-learning approach, the model may be trained with a large number of tasks and a small number of samples for each task. To date, there has been little work applying small sample learning to remote sensing image classification.

Disclosure of Invention

In order to overcome the defects of the prior art, the invention provides a method based on self-supervision pre-training, which achieves higher accuracy in small sample image classification through pre-training and thereby improves the classification accuracy of small sample remote sensing images.

The technical scheme adopted for solving the technical problems is as follows:

A small sample remote sensing image classification method based on self-supervision pre-training comprises the following steps:

S1, sampling a small sample data set, and dividing the data set into a training set, a verification set and a test set;

S2, constructing a scenario training data set;

S3, constructing a dual-metric network model based on self-supervision pre-training;

S4, training and verifying the self-supervision pre-training dual-metric network model by using a scenario training method;

S5, testing the self-supervision pre-training dual-metric network model after training is completed.

Further, in the step S1, the categories of the small sample remote sensing images may be divided in a ratio of 3:1:1: three parts of the total categories are used as the training set, one part as the verification set, and the last part as the test set, and the categories of the three data sets are mutually exclusive.
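The class-level 3:1:1 division can be illustrated with a minimal sketch. The function name `split_classes`, the class names and the use of integer division for the part sizes are assumptions made for illustration; the text only fixes the ratio and the requirement that the three class sets do not overlap.

```python
import random

def split_classes(class_names, rng, ratio=(3, 1, 1)):
    """Split class names into disjoint train / verification / test groups."""
    names = sorted(class_names)
    rng.shuffle(names)
    total = sum(ratio)
    n_train = len(names) * ratio[0] // total
    n_val = len(names) * ratio[1] // total
    train = names[:n_train]
    val = names[n_train:n_train + n_val]
    test = names[n_train + n_val:]  # the remaining classes
    return train, val, test

# Hypothetical data set with 10 scene categories.
classes = [f"class_{i}" for i in range(10)]
train, val, test = split_classes(classes, random.Random(0))
print(len(train), len(val), len(test))
```

Because the split is made over categories rather than over images, no category seen during training appears in the verification or test set.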

Further, in the step S2, the scenario construction method randomly selects C categories in the data set, selects K pictures from each category as a support set S, and randomly selects M samples from the remaining samples of the selected categories as a query set Q; one support set together with one query set forms a scenario training set.
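The scenario construction can be sketched as follows. This is only an illustration: `sample_episode` and the toy image identifiers are hypothetical, and M is interpreted here as the number of query samples drawn per selected category, which is one possible reading of the text.

```python
import random

def sample_episode(images_by_class, c_way, k_shot, m_query, rng):
    """Return (support, query) lists of (image_id, class_label) pairs."""
    classes = rng.sample(sorted(images_by_class), c_way)
    support, query = [], []
    for label in classes:
        # Draw K + M distinct images so the query never overlaps the support.
        picked = rng.sample(images_by_class[label], k_shot + m_query)
        support += [(img, label) for img in picked[:k_shot]]
        query += [(img, label) for img in picked[k_shot:]]
    return support, query

# Toy data set: 8 classes with 30 image identifiers each (hypothetical names).
toy = {f"class_{i}": [f"class_{i}_img_{j}" for j in range(30)] for i in range(8)}
support, query = sample_episode(toy, c_way=5, k_shot=1, m_query=15,
                                rng=random.Random(0))
print(len(support), len(query))
```

With C = 5, K = 1 and 15 query images per category, one scenario holds 5 support images and 75 query images.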

Further, in the step S3, the self-supervision pre-training dual-metric network model is divided into two parts, the first part is a pre-training part, and the second part is a dual-metric network fine-tuning part;

the backbone network selected for the first part is SwinTransformer, and the data set used in the first part consists of samples from the training set classes, without scenario training; the pre-training model randomly masks a certain proportion of image blocks, and then predicts the visual tokens corresponding to the masked image blocks;

the second part fine-tunes with scenario training: the data set is divided into scenarios, and each image I in the support set and the query set is divided into N = HW/P^2 image blocks, where H represents the height of the image, W represents the width of the image, and P represents the size of an image block; all image blocks of the support set T_s and the query set T_q are flattened and input into SwinTransformer to obtain the encodings of the support set and of the query set; after all image block encodings are obtained, a clean copy of T_s and T_q is retained, and Gaussian noise is then added to T_s and T_q on this basis; finally, similarity is measured separately for the noisy and the noise-free T_s and T_q, and the weighted average of the two similarity measures is taken as the final prediction result.
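As a small illustration of dividing an image of height H and width W into blocks of size P and flattening them, the sketch below uses a toy 8 x 8 single-channel image with P = 4. The function `split_into_patches` is hypothetical, and it assumes non-overlapping blocks with P dividing both H and W, which yields HW/P^2 blocks.

```python
def split_into_patches(image, patch):
    """Split an H x W image (list of rows) into flattened P x P blocks."""
    height, width = len(image), len(image[0])
    assert height % patch == 0 and width % patch == 0, "P must divide H and W"
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten one P x P block row by row into a single vector.
            block = [image[r][c]
                     for r in range(top, top + patch)
                     for c in range(left, left + patch)]
            patches.append(block)
    return patches

image = [[r * 8 + c for c in range(8)] for r in range(8)]
patches = split_into_patches(image, patch=4)
print(len(patches), len(patches[0]))
```

Here 8 * 8 / 4^2 = 4 blocks are produced, each flattened to 16 values.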

Further, in the step S4, a scenario data set is randomly extracted from the training set and input into the pre-trained model, forward propagation is performed, and the network parameters are updated by back-propagation through a loss function; the model is verified by randomly selecting a scenario data set from the verification set, inputting it into the model, and predicting the query set through the support set.

Further, in the step S5, a scenario data set is randomly selected from the test set and input into the fine-tuned model, and the query set is predicted through the support set.

Preferably, the window size in the SwinTransformer is set to 7, the embedding dimension is 96, the numbers of layers in the four stages are 2, 2, 18 and 2, and the numbers of attention heads in the four stages are 3, 6, 12 and 24, respectively.
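These hyperparameters can be collected in a small configuration object, sketched below. The class `SwinConfig` is hypothetical and is not the API of any library. The four-stage depths (2, 2, 18, 2) are an assumption based on the four stages named in the text, and the per-stage channel widths assume the usual Swin design in which the width doubles at each stage.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SwinConfig:
    window_size: int = 7
    embed_dim: int = 96
    depths: tuple = (2, 2, 18, 2)      # assumed four-stage layout
    num_heads: tuple = (3, 6, 12, 24)

    def stage_dims(self):
        # Assumes the channel width doubles from one stage to the next.
        return tuple(self.embed_dim * 2 ** i for i in range(len(self.depths)))

    def head_dims(self):
        # Channels handled by each attention head in every stage.
        return tuple(d // h for d, h in zip(self.stage_dims(), self.num_heads))

cfg = SwinConfig()
print(cfg.stage_dims())  # (96, 192, 384, 768)
print(cfg.head_dims())   # (32, 32, 32, 32)
```

Under these assumptions every attention head works on 32 channels, since the number of heads doubles together with the channel width.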

Preferably, the noise in the dual-metric network is Gaussian noise drawn from the standard normal distribution N(0, 1), with a mean of 0 and a variance of 1.
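A minimal sketch of adding N(0, 1) noise to an encoding while keeping the clean copy is given below. The helper `add_gaussian_noise` and the all-zero toy encoding are assumptions for illustration; the sample statistics are printed only to show that the noise has approximately zero mean and unit variance.

```python
import random

def add_gaussian_noise(embedding, rng, mean=0.0, std=1.0):
    """Return a noisy copy; the clean embedding is left unchanged."""
    return [x + rng.gauss(mean, std) for x in embedding]

rng = random.Random(0)
clean = [0.0] * 20000          # toy encoding, all zeros
noisy = add_gaussian_noise(clean, rng)

# Since the clean values are zero, the noisy values are the noise itself.
sample_mean = sum(noisy) / len(noisy)
sample_var = sum((x - sample_mean) ** 2 for x in noisy) / len(noisy)
print(round(sample_mean, 2), round(sample_var, 2))
```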

Preferably, the similarity between the support set and the query set is computed using the Euclidean distance.

The beneficial effects of the invention are mainly as follows: through self-supervision pre-training without external labels, the model learns to generalize better, which reduces the risk of overfitting during training. The problem of poor generalization caused by insufficient sample size in conventional small sample image classification is alleviated, so that the prediction effect is improved.

Drawings

FIG. 1 is a flow chart of a modeling method of the present invention;

FIG. 2 is a schematic diagram of a small sample task sampling according to the present invention;

FIG. 3 is a schematic diagram of the self-supervision pre-training of the present invention.

Detailed Description

The invention is further described below with reference to the accompanying drawings.

Referring to FIGS. 1 to 3, a small sample remote sensing image classification method based on self-supervision pre-training comprises the following steps:

S1, in this embodiment, the small sample image classification method based on self-supervision pre-training provided by the invention first pre-trains the model on the training set in a self-supervised manner, adopting the idea of Masked Image Modeling as the pre-training task for the SwinTransformer.

In one example, in self-supervised pre-training each image I is divided into N = HW/P^2 image blocks, where H represents the height of the image, W represents the width of the image, and P represents the size of an image block; two different augmentations are applied to the image I to obtain u and v. The training model contains two networks, a teacher network and a student network, whose parameters are consistent. For the image I passed through the student network, part of the image blocks are randomly masked. During training, the parameters of the teacher network are updated from the parameters of the student network by a momentum update.
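One common way for a teacher network to be updated from a student network is a momentum (exponential moving average) update. The sketch below assumes this reading; the function `momentum_update`, the momentum value 0.99 and the toy parameter vectors are hypothetical and are not taken from the text.

```python
def momentum_update(teacher, student, momentum=0.99):
    """Move each teacher parameter toward the matching student parameter."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher, student)]

teacher = [1.0, -2.0, 0.5]   # toy teacher parameters
student = [0.0, 0.0, 0.0]    # toy student parameters, held fixed here
for _ in range(100):
    teacher = momentum_update(teacher, student)
print([round(t, 4) for t in teacher])
```

With the student held fixed at zero, each teacher parameter decays by a factor of 0.99 per step, so the teacher follows the student slowly rather than copying it.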

S2, as shown in FIG. 2, a meta-training method is adopted, whose main idea is to sample a plurality of meta-tasks, each of which comprises a training set and a test set. The training set and the test set of a task are also called the support set and the query set. In this example, one meta-task randomly selects 5 images, each from a different category, as the support set, and randomly selects 15 images from each of the categories in the support set, 75 images in total, as the query set. The meta-tasks are input into the SwinTransformer model pre-trained in S1, and the model is continuously fine-tuned to better fit the distribution of the data set.

S3, a clean copy of the matrix blocks output by the SwinTransformer model is retained, Gaussian noise is then added to the matrix blocks of the support set and of the query set respectively, and the similarity is calculated with a parallel structure. Similarity is represented by the Euclidean distance: the distance between the clean query set and the clean support set is computed, and the distance between the noisy query set and the noisy support set is computed. Finally, the final distance is calculated as a weighted sum of the two.

Here, Euclidean distance matrices between the support set and the query set are used. Specifically, four tensors are taken as inputs: the clean support set embedding, the clean query set embedding, the noisy support set embedding, and the noisy query set embedding. The input tensors first undergo shape transformation and normalization so that they are suitable for the calculation of the Euclidean distance: each embedding matrix is flattened into a vector, and each vector is normalized by its L2 norm. The Euclidean distance between the two sets of embeddings is then calculated. For the clean support set and query set, the Euclidean distance matrix is obtained by calculating the product of the two matrices. For the noisy support set and query set, the Euclidean distance matrix is likewise obtained by calculating the product of the two matrices. Finally, the two Euclidean distance matrices are returned, each of shape [n, m], where n is the number of samples in the support set and m is the number of samples in the query set. These matrices may be used for further similarity measurement or classification tasks.
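The link between a matrix product and the Euclidean distance can be made explicit: for L2-normalized vectors s and q, ||s - q||^2 = 2 - 2 (s . q), so the inner products alone determine the distances. The sketch below illustrates this identity on toy vectors; `l2_normalize` and `distance_matrix` are hypothetical helper names, and the conversion from inner product to distance is an illustrative assumption about how the product is used.

```python
import math

def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def distance_matrix(support, query):
    """Return an [n, m] matrix of Euclidean distances between unit vectors."""
    s_unit = [l2_normalize(v) for v in support]
    q_unit = [l2_normalize(v) for v in query]
    rows = []
    for s in s_unit:
        row = []
        for q in q_unit:
            dot = sum(a * b for a, b in zip(s, q))
            # For unit vectors, ||s - q||^2 = 2 - 2 * (s . q).
            row.append(math.sqrt(max(0.0, 2.0 - 2.0 * dot)))
        rows.append(row)
    return rows

support = [[1.0, 0.0], [0.0, 2.0]]              # n = 2 flattened embeddings
query = [[3.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # m = 3 flattened embeddings
dist = distance_matrix(support, query)
print(len(dist), len(dist[0]))
```

The result has shape [n, m] = [2, 3], matching the layout described above with support samples along the rows and query samples along the columns.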

In one embodiment, the Euclidean distances calculated between the support set and the query set are D1 for the clean embeddings and D2 for the noisy embeddings. The two are linearly combined with weights as 0.8 x D1 + 0.2 x D2 to obtain a matrix C. The matrix C is then reshaped into [5, 196, 196, 5], and dimensions 2 and 3 are transposed, where 5 is the number of categories in a task and 196 is the product of the number of samples per category and the length of the embedding sequence.
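The weighted combination 0.8 x D1 + 0.2 x D2 can be sketched on small toy matrices. For simplicity the sketch keeps one distance per support/query pair and omits the block-level reshape; `combine`, `predict` and the toy values are hypothetical.

```python
def combine(d_clean, d_noisy, w_clean=0.8, w_noisy=0.2):
    """Element-wise weighted sum of two distance matrices of equal shape."""
    return [[w_clean * a + w_noisy * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(d_clean, d_noisy)]

def predict(combined):
    """For each query column, return the index of the closest support row."""
    n_query = len(combined[0])
    return [min(range(len(combined)), key=lambda i: combined[i][j])
            for j in range(n_query)]

d1 = [[0.2, 1.0], [0.9, 0.3]]   # clean distances, rows = support, cols = query
d2 = [[0.4, 0.8], [0.7, 0.5]]   # noisy distances
c = combine(d1, d2)
print(predict(c))  # [0, 1]
```

The clean branch dominates the decision because of its larger weight, while the noisy branch contributes a smaller correction.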

In one embodiment, temperature scaling is applied to the model's predictions: the predicted values of the model are scaled by dividing them by the logarithmic temperature, and the magnitude of this scaling is controlled by a temperature parameter. The scaling affects how sharp the resulting probability distribution is: the higher the temperature, the smoother the probability distribution.
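The effect of the temperature on the probability distribution can be illustrated with a short sketch. It assumes that the temperature is stored as a logarithm and exponentiated before the division, which is one possible reading of "logarithmic temperature"; the function name and the toy logits are hypothetical.

```python
import math

def softmax_with_temperature(logits, log_temperature):
    # Assumption: the parameter stores log(T), so T = exp(log_temperature) > 0.
    temperature = math.exp(log_temperature)
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                      # subtract the max for stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, log_temperature=0.0)             # T = 1
smooth = softmax_with_temperature(logits, log_temperature=math.log(5.0))  # T = 5
print([round(p, 3) for p in sharp])
print([round(p, 3) for p in smooth])
```

The largest probability is lower at T = 5 than at T = 1, which is the smoothing effect described above.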

In one embodiment, log-sum-exp is used to aggregate the log probabilities of all image blocks to obtain a final prediction for each image. Specifically, the maximum value of the input tensor along the specified dimension dim=1, denoted max_val, is calculated, and max_val is subtracted from the input tensor to obtain the adjusted tensor. The exponential of the adjusted tensor is then taken, the exponentiated values are summed along the specified dimension dim=1, and the logarithm of this sum is taken.
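The steps above follow the usual numerically stable form of log-sum-exp, sketched below. In the standard formulation max_val is added back after the logarithm so that the result equals log(sum(exp(x))); the sketch includes that step, which the description does not state explicitly. The function name and toy values are hypothetical.

```python
import math

def log_sum_exp(rows):
    """Apply log-sum-exp along dimension 1 (across each row)."""
    out = []
    for row in rows:
        max_val = max(row)
        shifted = [x - max_val for x in row]       # largest entry becomes 0
        total = sum(math.exp(x) for x in shifted)  # cannot overflow
        # Adding max_val back recovers log(sum(exp(row))).
        out.append(max_val + math.log(total))
    return out

values = [[0.0, 0.0], [1000.0, 1000.0]]
result = log_sum_exp(values)
print(result)
```

Subtracting the maximum first is what allows the second row to be evaluated at all, since exp(1000) overflows in double precision.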

S4, training and verifying the self-supervision pre-training dual-metric network model by using a scenario training method.

In one embodiment, in each training iteration, a scenario set is randomly extracted from the training set and propagated forward through the model as input. The network parameters are updated by gradient descent, and a model that performs well on the training set is finally learned. During this process, scenario sets are also randomly extracted from the verification set to verify the performance of the model.

In the model verification stage, multiple rounds are performed. In each round, a scenario set is randomly extracted from the verification set as input data, and the labels of the query set are predicted based on the support set. The average accuracy over the rounds is taken as the performance evaluation result of the model on the verification set.
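The averaging of accuracy over several rounds can be sketched as follows. The helper names and the three toy rounds, each given as a pair of predicted and true query labels, are hypothetical.

```python
def episode_accuracy(predicted, actual):
    """Fraction of query labels predicted correctly in one round."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

def mean_accuracy(episodes):
    """Average the per-round accuracies over all rounds."""
    scores = [episode_accuracy(p, a) for p, a in episodes]
    return sum(scores) / len(scores)

episodes = [
    ([0, 1, 2, 2], [0, 1, 2, 3]),   # 3 of 4 correct
    ([1, 1, 0, 0], [1, 0, 0, 0]),   # 3 of 4 correct
    ([2, 2, 2, 2], [2, 2, 2, 2]),   # 4 of 4 correct
]
accuracy = mean_accuracy(episodes)
print(round(accuracy, 4))
```

Averaging over many randomly drawn scenario sets reduces the variance that any single small scenario would introduce into the evaluation.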

In one embodiment, in the model test phase, a deep-neighbor neural network model based on an attention mechanism is used. Specifically, a scenario set is randomly extracted from the test set as input data, and the labels of the query set are then predicted based on the support set. The average accuracy over multiple rounds is taken as the performance evaluation result of the model on the test set.

The embodiments described in this specification are merely illustrative of the manner in which the inventive concept may be implemented. The scope of the present invention should not be construed as being limited to the specific forms set forth in the embodiments; it also covers equivalent technical means that would occur to one skilled in the art based on the inventive concept.

Claims

1. A method for classifying small sample remote sensing images based on self-supervision pre-training, which is characterized by comprising the following steps:

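S1, sampling a small sample data set, and dividing the data set into a training set, a verification set and a test set;

S2, constructing a scenario training data set;

S3, constructing a dual-metric network model based on self-supervision pre-training;

S4, training and verifying the self-supervision pre-training dual-metric network model by using a scenario training method;

S5, testing the self-supervision pre-training dual-metric network model after training is completed.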

2. The method for classifying small sample remote sensing images based on self-supervision pre-training according to claim 1, wherein in the step S1, the categories of the small sample remote sensing images may be divided in a ratio of 3:1:1: three parts of the total categories are used as the training set, one part as the verification set, and the last part as the test set, and the categories of the three data sets are mutually exclusive.

3. The method for classifying small sample remote sensing images based on self-supervision pre-training according to claim 1 or 2, wherein in the step S2, the scenario construction method randomly selects C categories in the data set, selects K pictures from each category as a support set S, and randomly selects M samples from the remaining samples of the selected categories as a query set Q; one support set together with one query set forms a scenario training set.

4. The method for classifying small sample remote sensing images based on self-supervision pre-training according to claim 3, wherein in the step S3, the self-supervision pre-training dual-metric network model is divided into two parts, the first part is a pre-training part, and the second part is a dual-metric network fine-tuning part;

the second part fine-tunes with scenario training: the data set is divided into scenarios, and each image I in the support set and the query set is divided into N = HW/P^2 image blocks, where H represents the height of the image, W represents the width of the image, and P represents the size of an image block; all image blocks of the support set T_s and the query set T_q are flattened and input into SwinTransformer to obtain the encodings of the support set and of the query set; after all image block encodings are obtained, a clean copy of T_s and T_q is retained, and Gaussian noise is then added to T_s and T_q on this basis; finally, similarity is measured separately for the noisy and the noise-free T_s and T_q, and the weighted average of the two similarity measures is taken as the final prediction result.

5. The method for classifying small sample remote sensing images based on self-supervision pre-training according to claim 4, wherein in the step S4, a scenario data set is randomly extracted from the training set and input into the pre-trained model, forward propagation is performed, and the network parameters are updated by back-propagation through a loss function; the model is verified by randomly selecting a scenario data set from the verification set, inputting it into the model, and predicting the query set through the support set.

6. The method for classifying small sample remote sensing images based on self-supervision pre-training according to claim 5, wherein in the step S5, a scenario data set is randomly selected from the test set and input into the fine-tuned model, and the query set is predicted through the support set.

7. The method for classifying small sample remote sensing images based on self-supervision pre-training according to claim 4, wherein the window size in the SwinTransformer is set to 7, the embedding dimension is 96, the numbers of layers in the four stages are 2, 2, 18 and 2, and the numbers of attention heads in the four stages are 3, 6, 12 and 24, respectively.

8. The method for classifying small sample remote sensing images based on self-supervision pre-training according to claim 4, wherein the noise in the dual-metric network is Gaussian noise drawn from the standard normal distribution N(0, 1), with a mean of 0 and a variance of 1.

9. The method for classifying small sample remote sensing images based on self-supervision pre-training according to claim 4, wherein the distance between the support set and the query set is computed as the Euclidean distance.