CN117636183A - Small sample remote sensing image classification method based on self-supervision pre-training - Google Patents
Small sample remote sensing image classification method based on self-supervision pre-training
- Publication number
- CN117636183A
- Authority
- CN
- China
- Prior art keywords
- training
- self
- small sample
- remote sensing
- supervision
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
A small sample remote sensing image classification method based on self-supervised pre-training comprises the following steps: S1, sampling a small sample data set and dividing it into a training set, a validation set and a test set; S2, constructing an episodic training data set; S3, constructing a dual-metric network model based on self-supervised pre-training; S4, training and validating the self-supervised pre-trained dual-metric network model with an episodic training method; S5, testing the self-supervised pre-trained dual-metric network model after training is completed. Through pre-training, the invention achieves higher accuracy in small sample image classification and thereby improves the classification accuracy of small sample remote sensing images.
Description
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a small sample remote sensing image classification method based on self-supervision pre-training.
Background
Remote sensing is a technique for obtaining information about the Earth's surface from platforms such as satellites, unmanned aerial vehicles and airplanes. As the technology advances, its fields of application keep widening. Remote sensing image classification can be applied to monitoring land use and land cover, managing forest and water resources, urban and traffic planning, environmental protection, agricultural production, weather forecasting and more. Deep learning methods have achieved good results in remote sensing image classification.
However, these well-performing models need large amounts of labeled data for training. When the available labeled data is limited, the model is prone to overfitting, which significantly degrades its performance. In practical applications, labeling data takes a lot of time and the available labeled data is very limited. Furthermore, deep learning models generalize poorly to new classes that they have not seen.
Inspired by the fact that humans can learn quickly from only a small amount of data, the concept of small sample learning was proposed. The goal of small sample learning is to learn, from only a few samples, a model that generalizes well, so that it performs well on new, unseen classes. In small sample learning, typically only a few samples are available to train the model, so conventional machine learning methods have difficulty coping with this situation. The main small sample methods fall into three categories: metric-based learning, data-augmentation-based learning and meta-learning. In the meta-learning approach, the model is trained on a large number of tasks, each with a small number of samples. So far, little work has applied small sample learning to remote sensing image classification.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a method based on self-supervised pre-training, which achieves higher accuracy in small sample image classification through pre-training and thereby improves the classification accuracy of small sample remote sensing images.
The technical scheme adopted for solving the technical problems is as follows:
A small sample remote sensing image classification method based on self-supervised pre-training comprises the following steps:
S1, sampling a small sample data set and dividing it into a training set, a validation set and a test set;
S2, constructing an episodic training data set;
S3, constructing a dual-metric network model based on self-supervised pre-training;
S4, training and validating the self-supervised pre-trained dual-metric network model with an episodic training method;
S5, testing the self-supervised pre-trained dual-metric network model after training is completed.
Further, in step S1, the categories of the small sample remote sensing image data set are divided in a 3:1:1 ratio: three parts of the categories are used as the training set, one part as the validation set and the last part as the test set, and the categories of the three sets are mutually exclusive.
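The category-level split can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_classes`, the fixed seed and the integer rounding rule are hypothetical and are not taken from the patent.

```python
import random

def split_classes(class_names, ratio=(3, 1, 1), seed=0):
    """Split category names into disjoint train/validation/test groups.

    The split is done over categories, not over images, so that the three
    sets share no category. Rounding behaviour is an assumption.
    """
    names = sorted(class_names)
    random.Random(seed).shuffle(names)
    total = sum(ratio)
    n_train = len(names) * ratio[0] // total
    n_val = len(names) * ratio[1] // total
    train = names[:n_train]
    val = names[n_train:n_train + n_val]
    test = names[n_train + n_val:]  # remainder goes to the test set
    return train, val, test
```

For a data set with ten categories, this yields six training, two validation and two test categories.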
Further, in step S2, an episode is constructed by randomly selecting C categories from the data set and K images from each selected category as the support set S, then randomly selecting M samples from the remaining samples of the selected categories as the query set Q; one support set together with one query set forms an episodic training set.
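A minimal sketch of this C-way K-shot episode sampling is given below. It assumes that M is counted per category (consistent with the 15-per-category example in the detailed description) and that images are identified by path strings; `sample_episode` and `images_by_class` are hypothetical names.

```python
import random

def sample_episode(images_by_class, c_way, k_shot, m_query, rng=None):
    """Sample one episode: a support set and a disjoint query set.

    images_by_class maps a category name to a list of image identifiers.
    Returns two lists of (image, episode_label) pairs, where the label is
    the index of the category within this episode.
    """
    rng = rng or random.Random()
    classes = rng.sample(sorted(images_by_class), c_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        # Draw K + M distinct images so support and query never overlap.
        picked = rng.sample(images_by_class[cls], k_shot + m_query)
        support += [(img, label) for img in picked[:k_shot]]
        query += [(img, label) for img in picked[k_shot:]]
    return support, query
```

With `c_way=5`, `k_shot=1` and `m_query=15`, one episode contains 5 support images and 75 query images.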
Further, in step S3, the self-supervised pre-trained dual-metric network model is divided into two parts: the first part is the pre-training part, and the second part is the dual-metric network fine-tuning part;
the backbone network selected for the first part is the Swin Transformer; the first part uses the samples of the training-set categories and does not use episodic training; the pre-training model randomly masks a certain proportion of image blocks and then predicts the visual tokens corresponding to the masked image blocks;
the second part organizes the data set into episodes for episodic training; each image I in the support set and the query set is divided into $HW/P^2$ image blocks, where H is the height of the image, W is its width and P is the size of an image block; all image blocks of the support set $T_s$ and the query set $T_q$ are flattened and input into the Swin Transformer to obtain the encodings of the support set and of the query set. After all image block encodings are obtained, a clean copy of $T_s$ and $T_q$ is kept, and Gaussian noise is added to $T_s$ and $T_q$ on this basis; finally, similarity is measured separately on the noisy and on the noise-free $T_s$ and $T_q$, and the weighted average of the two measurements is taken as the final prediction result.
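The division of an image into flattened image blocks can be illustrated with NumPy. The sketch assumes non-overlapping square blocks, so that an image of height H and width W yields HW/P² blocks; the function name and the example sizes are hypothetical and do not describe the patent's actual embedding layer.

```python
import numpy as np

def split_into_patches(image, patch_size):
    """Split an (H, W, C) image into flattened non-overlapping patches.

    Returns an array of shape (H*W / P^2, P*P*C), one row per patch,
    ordered row by row over the patch grid.
    """
    h, w, c = image.shape
    p = patch_size
    assert h % p == 0 and w % p == 0, "H and W must be multiples of P"
    grid = image.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    return grid.reshape((h // p) * (w // p), p * p * c)
```

For example, a 224 × 224 × 3 image with P = 16 gives 196 patches of 768 values each.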
Further, in step S4, an episode is randomly drawn from the training set and input into the pre-trained model; forward propagation is performed and the network parameters are updated by back-propagating the loss. To validate the model, an episode is randomly selected from the validation set and input into the model, and the query set is predicted from the support set.
Further, in step S5, an episode is randomly selected from the test set and input into the fine-tuned model, and the query set is predicted from the support set.
Preferably, the window size in the Swin Transformer is set to 7, the embedding dimension is 96, the numbers of layers in the four stages are 2, 2, 18 and 2, and the numbers of attention heads in the four stages are 3, 6, 12 and 24, respectively.
Preferably, the noise in the dual-metric network is Gaussian noise drawn from the standard Gaussian distribution $N(0, 1)$, with mean 0 and variance 1.
Preferably, the similarity between the support set and the query set is measured by Euclidean distance.
The beneficial effects of the invention are mainly as follows: through self-supervised pre-training, which needs no external labels, the model learns representations that generalize better, which reduces the risk of overfitting during training. This addresses the poor generalization caused by insufficient samples in conventional small sample image classification and thereby improves prediction.
Drawings
FIG. 1 is a flow chart of a modeling method of the present invention;
FIG. 2 is a schematic diagram of a small sample task sampling according to the present invention;
FIG. 3 is a schematic diagram of the self-supervised pre-training of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to fig. 1 to 3, a small sample remote sensing image classification method based on self-supervision pre-training comprises the following steps:
S1, in this embodiment, the method first performs self-supervised pre-training of the model on the training set, using Masked Image Modeling as the pretext task for training the Swin Transformer.
In one example, during self-supervised pre-training each image I is divided into $HW/P^2$ tiles, where H is the height of the image, W is its width and P is the tile size. Two different augmentations are applied to image I to obtain the views u and v. The training model contains two networks, a teacher network and a student network, whose parameters are consistent. For the image fed to the student network, part of the image blocks are randomly masked. During training, the parameters of the teacher network are updated from the parameters of the student network.
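The two mechanisms in this example, random masking of image blocks and updating the teacher from the student, can be sketched as below. The sketch rests on assumptions that the text does not confirm: the teacher update is taken to be an exponential moving average (a common choice in teacher-student self-supervised training), and the mask ratio and momentum values are hypothetical.

```python
import numpy as np

def random_patch_mask(num_patches, mask_ratio, rng):
    """Boolean mask with round(mask_ratio * num_patches) patches set to True."""
    num_masked = int(round(mask_ratio * num_patches))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask

def ema_update(teacher_params, student_params, momentum=0.99):
    """Assumed teacher update: exponential moving average of student weights.

    Both arguments map parameter names to NumPy arrays of equal shape.
    """
    return {
        name: momentum * teacher_params[name]
        + (1.0 - momentum) * student_params[name]
        for name in teacher_params
    }
```

In such a scheme only the student receives gradients; the teacher changes slowly and provides stable prediction targets for the masked blocks.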
S2, as shown in FIG. 2, a meta-training method is adopted: a number of meta-tasks are sampled, and each task contains a training set and a test set, also called the support set and the query set. In this example, a meta-task randomly selects images from 5 different categories as the support set and 15 images from each of those categories, 75 images in total, as the query set. The meta-tasks are input into the Swin Transformer model pre-trained in S1, and the model is fine-tuned continuously so that it better fits the distribution of the data set.
S3, among the matrix blocks output by the Swin Transformer model, a clean copy is kept; Gaussian noise is then added to the matrix blocks of the support set and of the query set, and the similarities are computed in a parallel structure. Similarity is expressed by Euclidean distance: the clean query set is compared with the clean support set, and the noisy query set is compared with the noisy support set. The final distance is obtained as a weighted sum of the two.
The Euclidean distance matrices between the support set and the query set are computed as follows. Four tensors are taken as inputs: the clean support set embedding, the clean query set embedding, the noisy support set embedding and the noisy query set embedding. The input tensors are first reshaped and normalized so that they are suitable for the distance computation: each embedding matrix is flattened into a vector, and each vector is normalized by its L2 norm. The Euclidean distance between the two sets of embeddings is then calculated. For the clean support set and query set, the distance matrix is obtained from the product of the two embedding matrices; because the vectors are L2-normalized, the squared Euclidean distance between a support vector s and a query vector q follows directly from this product as $2 - 2\,s^\top q$. The distance matrix for the noisy support set and query set is obtained in the same way. Finally, the two Euclidean distance matrices are returned; each has shape [n, m], where n is the number of samples in the support set and m is the number of samples in the query set. These matrices can be used for further similarity measurement or classification.
In one embodiment, the Euclidean distances between the support set and the query set computed on the clean and on the noisy embeddings are D1 and D2 respectively. They are combined linearly as 0.8 × D1 + 0.2 × D2 to obtain a matrix C. The matrix C is then reshaped into [5, 196, 196, 5] and its dimensions 2 and 3 are transposed, where 5 is the number of categories in a task and 196 is the product of the number of samples per category and the length of the embedded sequence.
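The dual-metric computation can be sketched in NumPy as follows. This is an illustration under assumptions, not the patented implementation: embeddings are taken to be already flattened to one vector per sample, the distance is the squared Euclidean distance derived from the matrix product of L2-normalized vectors, and all function names are hypothetical. The weights 0.8 and 0.2 and the N(0, 1) noise come from the text.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Normalize each row of x to unit L2 norm."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def pairwise_sq_euclidean(support, query):
    """[n, m] squared Euclidean distances between L2-normalized rows.

    For unit vectors, ||s - q||^2 = 2 - 2 * (s . q), so the matrix product
    of the normalized embeddings is all that is needed.
    """
    s, q = l2_normalize(support), l2_normalize(query)
    return 2.0 - 2.0 * (s @ q.T)

def dual_metric_distance(support, query, rng, w_clean=0.8, w_noisy=0.2):
    """Weighted sum of the clean distance D1 and the noisy distance D2."""
    d1 = pairwise_sq_euclidean(support, query)
    noisy_support = support + rng.standard_normal(support.shape)  # N(0, 1)
    noisy_query = query + rng.standard_normal(query.shape)
    d2 = pairwise_sq_euclidean(noisy_support, noisy_query)
    return w_clean * d1 + w_noisy * d2
```

A smaller entry in the returned [n, m] matrix means that the corresponding support and query samples are more similar.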
In one embodiment, the model's predictions are temperature-scaled: the predicted values are divided by the logarithmic temperature, so that the magnitude of the scaling is controlled by a temperature parameter. The scaling changes how peaked the predicted probability distribution is: the higher the temperature, the smoother the distribution.
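The effect of the temperature can be illustrated with a temperature-scaled softmax. The sketch assumes that the predictions are simply divided by a positive temperature value before the softmax; how the patent parameterizes the "logarithmic temperature" is not specified, so this parameterization is an assumption.

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Softmax over the last axis after dividing the logits by a temperature."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled = scaled - scaled.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum(axis=-1, keepdims=True)
```

Calling the function on the same logits with a larger temperature gives a flatter distribution, which matches the behaviour described above.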
In one embodiment, Log-Sum-Exp is used to aggregate the log probabilities of all patches into a final prediction for each image. Specifically, the maximum of the input tensor along the specified dimension dim=1, denoted max_val, is calculated and subtracted from the input tensor to obtain the adjusted tensor. The exponential of the adjusted tensor is computed and summed along dim=1, and the logarithm of this sum is then taken.
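A NumPy sketch of this aggregation is shown below. It follows the listed steps and, as an assumption about the intended computation, adds max_val back after the logarithm, which is the usual numerically stable form of Log-Sum-Exp; the text itself does not mention that last step.

```python
import numpy as np

def log_sum_exp(x, dim=1):
    """Numerically stable log(sum(exp(x))) along one dimension."""
    max_val = x.max(axis=dim, keepdims=True)   # maximum along dim
    shifted = x - max_val                      # adjusted tensor
    summed = np.exp(shifted).sum(axis=dim)     # sum of exponentials
    # Adding max_val back recovers the exact log-sum-exp of the input.
    return np.log(summed) + np.squeeze(max_val, axis=dim)
```

Subtracting the maximum first keeps the exponentials from overflowing when the inputs are large.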
S4, training and validating the self-supervised pre-trained dual-metric network model with an episodic training method.
In one embodiment, in each training iteration an episode is randomly drawn from the training set and propagated forward through the model as input. The network parameters are updated by gradient descent, so that the model learns to perform well on the training set. During this process, episodes are also randomly drawn from the validation set to check the performance of the model.
In the model validation stage, multiple rounds are performed. In each round an episode is randomly drawn from the validation set as input data, and the labels of the query set are predicted from the support set. The average accuracy over the rounds is taken as the performance of the model on the validation set.
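The evaluation over several episodes can be sketched as follows. The sketch assumes that each episode yields an [n, m] distance matrix between support and query samples and that a query takes the label of its nearest support sample; this nearest-sample rule and the function names are hypothetical simplifications.

```python
import numpy as np

def episode_accuracy(distances, support_labels, query_labels):
    """Accuracy of one episode from an [n, m] support-to-query distance matrix."""
    nearest = np.argmin(distances, axis=0)            # closest support per query
    predicted = np.asarray(support_labels)[nearest]
    return float(np.mean(predicted == np.asarray(query_labels)))

def mean_accuracy(accuracies):
    """Average accuracy over all evaluated episodes."""
    return float(np.mean(accuracies))
```

The same averaging applies to the test stage, with episodes drawn from the test set.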
S5, testing the self-supervised pre-trained dual-metric network model after training is completed.
In one embodiment, in the model test stage, a deep-neighbor neural network model based on an attention mechanism is used. In each round an episode is randomly drawn from the test set as input data, and the labels of the query set are predicted from the support set. The average accuracy over multiple rounds is taken as the performance of the model on the test set.
The embodiments described in this specification merely illustrate ways in which the inventive concept may be implemented. The scope of protection of the invention should not be regarded as limited to the specific forms set forth in the embodiments; it also covers equivalent technical means that a person skilled in the art could conceive on the basis of the inventive concept.
Claims (9)
1. A method for classifying small sample remote sensing images based on self-supervision pre-training, which is characterized by comprising the following steps:
S1, sampling a small sample data set and dividing it into a training set, a validation set and a test set;
S2, constructing an episodic training data set;
S3, constructing a dual-metric network model based on self-supervised pre-training;
S4, training and validating the self-supervised pre-trained dual-metric network model with an episodic training method;
S5, testing the self-supervised pre-trained dual-metric network model after training is completed.
2. The method for classifying small sample remote sensing images based on self-supervision pre-training according to claim 1, wherein in step S1 the categories of the small sample remote sensing image data set are divided in a 3:1:1 ratio: three parts of the categories are used as the training set, one part as the validation set and the last part as the test set, and the categories of the three sets are mutually exclusive.
3. The method for classifying small sample remote sensing images based on self-supervision pre-training according to claim 1 or 2, wherein in step S2 an episode is constructed by randomly selecting C categories from the data set and K images from each selected category as the support set S, then randomly selecting M samples from the remaining samples of the selected categories as the query set Q, and one support set together with one query set forms an episodic training set.
4. The method for classifying small sample remote sensing images based on self-supervision pre-training according to claim 3, wherein in step S3 the self-supervised pre-trained dual-metric network model is divided into two parts: the first part is the pre-training part, and the second part is the dual-metric network fine-tuning part;
the backbone network selected for the first part is the Swin Transformer; the first part uses the samples of the training-set categories and does not use episodic training; the pre-training model randomly masks a certain proportion of image blocks and then predicts the visual tokens corresponding to the masked image blocks;
the second part organizes the data set into episodes for episodic training; each image I in the support set and the query set is divided into $HW/P^2$ image blocks, where H is the height of the image, W is its width and P is the size of an image block; all image blocks of the support set $T_s$ and the query set $T_q$ are flattened and input into the Swin Transformer to obtain the encodings of the support set and of the query set. After all image block encodings are obtained, a clean copy of $T_s$ and $T_q$ is kept, and Gaussian noise is added to $T_s$ and $T_q$ on this basis; finally, similarity is measured separately on the noisy and on the noise-free $T_s$ and $T_q$, and the weighted average of the two measurements is taken as the final prediction result.
5. The method of claim 4, wherein in step S4 an episode is randomly drawn from the training set and input into the pre-trained model, forward propagation is performed and the network parameters are updated by back-propagating the loss; to validate the model, an episode is randomly selected from the validation set and input into the model, and the query set is predicted from the support set.
6. The method for classifying small sample remote sensing images based on self-supervision pre-training according to claim 5, wherein in step S5 an episode is randomly selected from the test set and input into the fine-tuned model, and the query set is predicted from the support set.
7. The method for classifying small sample remote sensing images based on self-supervision pre-training according to claim 4, wherein the window size in the Swin Transformer is set to 7, the embedding dimension is 96, the numbers of layers in the four stages are 2, 2, 18 and 2, and the numbers of attention heads in the four stages are 3, 6, 12 and 24, respectively.
8. The method for classifying small sample remote sensing images based on self-supervision pre-training according to claim 4, wherein the noise in the dual-metric network is Gaussian noise drawn from the standard Gaussian distribution $N(0, 1)$, with mean 0 and variance 1.
9. The method for classifying small sample remote sensing images based on self-supervision pre-training according to claim 4, wherein the distance between the support set and the query set is computed as the Euclidean distance.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311690568.6A CN117636183A (en) | 2023-12-11 | 2023-12-11 | Small sample remote sensing image classification method based on self-supervision pre-training |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311690568.6A CN117636183A (en) | 2023-12-11 | 2023-12-11 | Small sample remote sensing image classification method based on self-supervision pre-training |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117636183A true CN117636183A (en) | 2024-03-01 |
Family
ID=90028729
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311690568.6A Pending CN117636183A (en) | 2023-12-11 | 2023-12-11 | Small sample remote sensing image classification method based on self-supervision pre-training |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117636183A (en) |
-
2023
- 2023-12-11 CN CN202311690568.6A patent/CN117636183A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||