CN115311502A - Remote sensing image small sample scene classification method based on multi-scale double-flow architecture


Info

Publication number
CN115311502A
Authority
CN
China
Prior art keywords
training
images
flow network
attention
double
Prior art date
Legal status
Pending
Application number
CN202211128397.3A
Other languages
Chinese (zh)
Inventor
李阳阳
陈茜
毛鹤亭
焦李成
尚荣华
李玲玲
马文萍
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202211128397.3A priority Critical patent/CN115311502A/en
Publication of CN115311502A publication Critical patent/CN115311502A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N 3/084 - Computing arrangements based on biological models; neural networks; learning methods; backpropagation, e.g. using gradient descent
    • G06V 10/22 - Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V 10/774 - Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks

Abstract

The invention discloses a remote sensing image small sample scene classification method based on a multi-scale double-flow architecture, which mainly addresses two problems of the prior art: the loss of discriminative image information, and susceptibility to the complex backgrounds and severe object-scale variation of remote sensing images. The scheme is as follows: acquire a data set and preprocess it; randomly sample the preprocessed data set to generate support and query sets for training, verification and testing; construct a whole double-flow network consisting of a global flow network, a local flow network and a key area positioning module; define loss functions for the global and local flows, then train and verify the whole double-flow network to obtain the optimal network model; classify the test samples with the optimal network model to obtain the scene classification result. The method reduces the loss of discriminative information in remote sensing images, avoids the influence of complex backgrounds and severe object-scale changes on scene classification, and improves classification accuracy; it can be used for natural disaster detection, urban planning, environment monitoring and vegetation investigation.

Description

Remote sensing image small sample scene classification method based on multi-scale double-flow architecture
Technical Field
The invention belongs to the technical field of remote sensing image recognition, and particularly relates to a remote sensing image small sample scene classification method that can be used for natural disaster detection, urban planning, environment monitoring, vegetation mapping and land cover analysis.
Background
Remote sensing is a detection technology for obtaining information about targets from a distance, and with its rapid development, remote sensing images play an increasingly important role in the military and civil fields. Scene classification assigns each remote sensing image to a scene category according to its content; it is an important means of understanding remote sensing images and has broad application prospects in natural disaster detection, urban planning, environment monitoring, vegetation mapping, land cover analysis and other fields.
Before deep learning attracted wide attention, scene classification models mainly relied on hand-crafted low-level and mid-level features extracted from images, together with encoding operations built on these features. In recent years, deep learning models have shown powerful learning capabilities, driven by the advent of large available data sets, advances in machine learning theory, and the growth of available computing resources. The convolutional neural network is one of the most mainstream deep learning models in image processing, and it is also the most widely used and best-performing network model in the field of scene classification.
However, remote sensing image scene classification based on deep learning faces two basic problems. First, such models depend on a large number of labeled training samples, yet acquiring manually labeled high-resolution remote sensing images is difficult and time-consuming; when the available labeled data are insufficient, a deep learning model risks overfitting, which degrades performance. Second, a deep neural network can classify test samples from scene classes seen during training with high accuracy, but it struggles to classify samples from classes unseen at training time.
In this context, research on small sample scene classification of remote sensing images is receiving wide attention. Small sample learning is a research direction inspired by humans' ability to learn quickly; it enables machine vision systems to learn new tasks rapidly from limited annotated data. Many existing small sample learning models focus on designing different architectures: metric-based methods seek an optimal metric space by designing different structures and metrics, while meta-learning-based methods guide the learning algorithm with a meta-learner so that the model can generalize quickly to new tasks.
Li et al. propose a deep small sample learning method for remote sensing scene classification in the paper "DLA-MatchNet for few-shot remote sensing scene classification". By designing an adaptive discriminative-learning matching network with an attention mechanism and feature-fusion scheme, the method combines channel and spatial attention modules with the feature network, improving feature representation capability. However, because each image is compressed into a compact image-level representation, most of the discriminative information is lost; especially when the number of training samples is small, this loss is difficult to recover and affects the final classification result.
Patent document 202111495585.5 proposes a remote sensing image small sample scene classification method based on a double prototype network. It designs two operations, prototype self-calibration and prototype mutual calibration, which make the prototypes more representative during training and more useful for subsequent prototype-based classification prediction. However, the method does not deeply extract the hierarchical features of remote sensing images and is easily disturbed by complex backgrounds, scale changes and similar information, which lowers classification accuracy.
Disclosure of Invention
The invention aims to provide a remote sensing image small sample scene classification method based on a multi-scale double-flow architecture aiming at the defects in the prior art, so as to reduce the loss of image discrimination information, avoid the influence of severe changes of complex background and object scale on scene classification and improve the classification precision.
In order to achieve the purpose, the technical scheme of the invention comprises the following steps:
(1) Acquiring three different remote sensing image data sets from public websites, and sequentially applying cropping, random horizontal flipping, random brightness enhancement, random color enhancement and random contrast enhancement to the images in the data sets;
(2) Randomly sampling the preprocessed data set to obtain a training support set $S_1$ and training query set $T_1$, a verification support set $S_2$ and verification query set $T_2$, and a test support set $S_3$ and test query set $T_3$;
(3) Constructing an integral double-flow network:
3a) Establishing a global flow network formed by connecting an attention depth embedding module A, a category-related attention module B and a measurement module C;
3b) Selecting an existing prototype network as the local flow network;
3c) Establishing a key area positioning module consisting of vector construction operation and greedy boundary search;
3d) The global flow network and the local flow network are connected through a key area positioning module to obtain an integral double-flow network;
(4) Using the training support set $S_1$ and training query set $T_1$ to train the whole double-flow network by the small sample scene training method, obtaining the trained double-flow network;
(5) Inputting the verification support set $S_2$ and verification query set $T_2$ into the trained double-flow network to fine-tune the network parameters, and saving the network with the highest index as the optimal double-flow network model;
(6) Inputting the test support set $S_3$ and test query set $T_3$ into the optimal double-flow network model to obtain the final classification result.
Compared with the prior art, the invention has one or more of the following technical effects:
1. By constructing a double-flow network, the invention computes the probability of the class a sample belongs to from both the whole image and its most important region, reducing the loss of image discrimination information and the influence of object-scale changes.
2. By designing the category-related attention module, the invention obtains an attention feature map related to the scene category, which increases the weight of discriminative descriptors during measurement and reduces the interference of background information on scene classification.
3. By designing the key area positioning module, the invention obtains the key region of an image, can quickly locate the most informative region in the global image, and connects the global flow and local flow so as to highlight the important objects that benefit scene classification.
The experimental result shows that compared with the existing other methods, the method has better scene classification precision.
Drawings
FIG. 1 is a general flow chart of an implementation of the present invention;
FIG. 2 is a sub-flowchart of the category-related attention module of the present invention computing the output attention feature map M;
FIG. 3 is a sub-flowchart of establishing a key zone location module according to the present invention.
Detailed Description
The following describes in detail specific embodiments and effects of the present invention with reference to the accompanying drawings.
Referring to fig. 1, the implementation steps of the present invention are as follows:
Step 1, acquiring a remote sensing image data set and carrying out data preprocessing on it:
Three different remote sensing image data sets are obtained from public websites, the images in the data sets are cropped to 224 × 224, and the cropped images are then sequentially preprocessed with random horizontal flipping, random brightness enhancement, random color enhancement and random contrast enhancement to obtain the preprocessed remote sensing images.
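As an illustration of this preprocessing pipeline, the following is a minimal sketch using torchvision-style transforms; whether the cutting is a random crop or a resize is not specified in the text, and the enhancement magnitudes shown are assumed placeholders.

```python
from torchvision import transforms

# A sketch of the step-1 preprocessing; RandomCrop and the jitter
# magnitudes (0.4) are assumptions, not values given in the patent.
preprocess = transforms.Compose([
    transforms.RandomCrop(224),               # crop images to 224 x 224
    transforms.RandomHorizontalFlip(p=0.5),   # random horizontal flipping
    transforms.ColorJitter(brightness=0.4,    # random brightness enhancement
                           saturation=0.4,    # random color enhancement
                           contrast=0.4),     # random contrast enhancement
    transforms.ToTensor(),
])
```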
Step 2, randomly sampling the preprocessed data set to obtain support sets and query sets.
Randomly sample the preprocessed data set to obtain the training support set $S_1$ and training query set $T_1$, verification support set $S_2$ and verification query set $T_2$, and test support set $S_3$ and test query set $T_3$, concretely realized as follows (a sampling sketch is given after the steps):
2.1) The three preprocessed remote sensing image data sets are divided respectively: the NWPU-RESISC45 data set is divided into a training set, a verification set and a test set in the ratio 25:10:10; the WHU-RS19 data set in the ratio 9:5:5; and the UC-Merced data set in the ratio 10:6:5;
2.2) C categories are randomly selected from the training set of each data set and K images are randomly sampled from each category; these C × K images form the training support set $S_1$; at the same time, N images are randomly selected from each of the C categories' remaining images to form the training query set $T_1$;
2.3) C categories are randomly selected from the verification set of each data set and K images are randomly sampled from each category; these C × K images form the verification support set $S_2$; then N images are randomly selected from each of the C categories' remaining images to form the verification query set $T_2$;
2.4) C categories are randomly selected from the test set of each data set and K images are randomly sampled from each category; these C × K images form the test support set $S_3$; N images are randomly selected from each of the C categories' remaining images to form the test query set $T_3$.
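The sampling procedure above can be sketched as follows, assuming each split's images are grouped by class in a dictionary; `sample_episode` and its argument names are illustrative, not from the patent.

```python
import random

def sample_episode(images_by_class, C=5, K=1, N=15):
    """Sample one C-way K-shot episode: a support set of C*K images
    and a query set of N images per class from the remaining images."""
    classes = random.sample(list(images_by_class), C)
    support, query = [], []
    for label, cls in enumerate(classes):
        imgs = random.sample(images_by_class[cls], K + N)
        support += [(img, label) for img in imgs[:K]]   # K support images
        query += [(img, label) for img in imgs[K:]]     # N query images per class
    return support, query
```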
Step 3, constructing the whole double-flow network.
3.1) Establish the global flow network:
3.1.1) Build the attention depth embedding module A, formed by cascading a convolution layer, four convolution blocks, an average pooling layer and a 1 × 1 convolution layer with 128 channels. The convolution layer consists of 7 × 7 convolution filters with 64 channels and a 3 × 3 max-pooling operation; each convolution block consists of four 3 × 3 convolution filters, with a skip connection added after every two convolution filters; the numbers of channels of the first to fourth convolution blocks are 64, 128, 256 and 512 respectively (a structural sketch follows);
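Under the structure just described (a ResNet-18-style trunk whose second block output is later tapped for the category-related attention module B, per step 3.1.4), a minimal PyTorch sketch might look as follows; strides, padding and batch normalization are assumptions not stated in the text.

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Two 3x3 convolutions with a skip connection after every pair."""
    def __init__(self, cin, cout, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(cin, cout, 3, stride, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(cout)
        self.conv2 = nn.Conv2d(cout, cout, 3, 1, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(cout)
        self.skip = (nn.Identity() if stride == 1 and cin == cout else
                     nn.Sequential(nn.Conv2d(cin, cout, 1, stride, bias=False),
                                   nn.BatchNorm2d(cout)))

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + self.skip(x))

class EmbeddingA(nn.Module):
    """Attention depth embedding module A: 7x7 conv stem with max pooling,
    four blocks (64/128/256/512 channels), average pooling, 128-channel 1x1 conv."""
    def __init__(self):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, 7, 2, 3, bias=False), nn.BatchNorm2d(64),
            nn.ReLU(), nn.MaxPool2d(3, 2, 1))
        self.block1 = nn.Sequential(BasicBlock(64, 64), BasicBlock(64, 64))
        self.block2 = nn.Sequential(BasicBlock(64, 128, 2), BasicBlock(128, 128))
        self.block3 = nn.Sequential(BasicBlock(128, 256, 2), BasicBlock(256, 256))
        self.block4 = nn.Sequential(BasicBlock(256, 512, 2), BasicBlock(512, 512))
        self.head = nn.Sequential(nn.AvgPool2d(2), nn.Conv2d(512, 128, 1))

    def forward(self, x):
        f2 = self.block2(self.block1(self.stem(x)))  # tap for module B (3.1.4)
        return f2, self.head(self.block4(self.block3(f2)))
```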
3.1.2) Create the category-related attention module B for computing the output attention feature map M:
Referring to fig. 2, the specific calculation of the category-related attention module B is as follows:
the input features are weighted by W g Get the compression characteristic f g (x j ) Pass weight of W k The global attention feature map f is obtained by the fully connected layer and the softmax function k (x j );
For compression characteristic f g (x j ) And a global attention feature map f k (x j ) Weighted summation is carried out, and the result is sequentially weighted as
Figure BDA0003849086020000041
The full connection layer, the ReLU activation function, a weight of
Figure BDA0003849086020000042
The full connection layer and the Sigmoid function to obtain a weight vector d i
Figure BDA0003849086020000043
Where δ denotes the ReLU activation function, σ denotes the Sigmoid activation function,
Figure BDA0003849086020000044
and
Figure BDA0003849086020000045
weights for both fully connected networks for scaling down and expanding the feature map dimensions, respectively, N representing the total number of feature map pixels, f g (x j )=W g ·x j Is a compression characteristic, f k (x j )=softmax(W k ·x j ) Is to compute an attention feature map along pixel point j,
Figure BDA0003849086020000046
representing a matrix multiplication;
vector d of weights i And after the sum of the product and the input feature point is added, nonlinear activation is carried out through a Sigmoid function, and the final output M (x) of the module is obtained:
M(x)=Sigmoid(∑d i f i )
wherein, f i Representing the characteristics of the input ith channel.
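The following is a minimal sketch of module B under the formulas above, treating $W_g$ and $W_k$ as 1 × 1 convolutions and assuming a channel-reduction ratio r for the two fully connected layers; both are assumptions for illustration.

```python
import torch
import torch.nn as nn

class CategoryAttentionB(nn.Module):
    """Category-related attention module B: d = sigmoid(W2 relu(W1 z)) with
    z aggregating f_k(x_j) * f_g(x_j) over pixels; M(x) = sigmoid(sum_i d_i f_i)."""
    def __init__(self, channels=128, r=4):
        super().__init__()
        self.w_g = nn.Conv2d(channels, channels, 1, bias=False)  # f_g = W_g x
        self.w_k = nn.Conv2d(channels, 1, 1, bias=False)         # f_k = softmax(W_k x)
        self.fc1 = nn.Linear(channels, channels // r)            # W_1 reduces dims
        self.fc2 = nn.Linear(channels // r, channels)            # W_2 expands dims

    def forward(self, f):                                # f: (B, C, H, W)
        B, C, H, W = f.shape
        g = self.w_g(f).flatten(2)                       # (B, C, N) compression features
        k = torch.softmax(self.w_k(f).flatten(2), -1)    # (B, 1, N) pixel attention
        z = (g * k).sum(-1)                              # aggregate over N pixels -> (B, C)
        d = torch.sigmoid(self.fc2(torch.relu(self.fc1(z))))   # weight vector d_i
        return torch.sigmoid((d.view(B, C, 1, 1) * f).sum(1, keepdim=True))  # M: (B, 1, H, W)
```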
3.1.3) Build a measurement module C for computing the output similarity $\mathrm{sim}(q,c)$ between a given query image $q$ and class $c$, implemented as follows:
For each descriptor $x_i$, first find its $k$ nearest neighbors $\hat{x}_i^{1},\dots,\hat{x}_i^{k}$ in class $c$, then compute the similarity between $x_i$ and each $\hat{x}_i^{j}$;
The descriptor similarities are weighted and summed using the attention map to obtain the similarity between the given query image $q$ and class $c$:

$$\mathrm{sim}(q,c)=\sum_{i=1}^{m}M(x_i)\sum_{j=1}^{k}\cos\!\left(x_i,\hat{x}_i^{j}\right)$$

where $M(x_i)$ is the response value of the attention feature map at location $x_i$, $x_i$ is the $i$-th descriptor of $q$, $m$ is the total number of descriptors, $\hat{x}_i^{j}$ denotes the $j$-th nearest neighbor of $x_i$ in class $c$, and $\cos(\cdot,\cdot)$ is the cosine similarity between two vectors (other distance functions may also be used);
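A sketch of this measurement, assuming the descriptors are rows of feature matrices and cosine similarity is used:

```python
import torch
import torch.nn.functional as F

def class_similarity(q_desc, c_desc, attn, k=3):
    """sim(q, c): q_desc (m, d) query descriptors, c_desc (n_c, d) support
    descriptors of class c, attn (m,) attention responses M(x_i)."""
    q = F.normalize(q_desc, dim=1)
    s = F.normalize(c_desc, dim=1)
    cos = q @ s.t()                        # (m, n_c) cosine similarities
    topk = cos.topk(k, dim=1).values       # k nearest neighbors of each x_i
    return (attn * topk.sum(dim=1)).sum()  # attention-weighted sum over descriptors
```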
3.1.4) The input end of the category-related attention module B is connected to the output end of the second convolution block of the attention depth embedding module A; the output end of the category-related attention module B and the output end of the last 1 × 1 convolution layer of the attention depth embedding module A are then simultaneously connected to the measurement module C, whose output is the output of the global flow network.
3.2) Select the existing prototype network as the local flow network; it takes the key regions of the query and support images as input and outputs the probability of the class of the query image's key region, realized as follows:
3.2.1) The input support-set image key regions and query-set image key regions are fed into a ResNet-18 network for feature extraction, yielding the support sample features $f_\phi(x_i)$ and query sample features $f_\phi(x_q)$; the support sample features are used to compute the prototype representation $c_k$ of class $k$ in the support set:

$$c_k=\frac{1}{|S_k|}\sum_{(x,y)\in S_k}f_\phi(x)$$

where $S_k$ denotes the set of class-$k$ samples in the support set, $x$ denotes a sample in $S_k$ and $y$ its corresponding class; the prototype representation $c_k$ of class $k$ is thus the mean of all embedded features of that class in the support set;
3.2.2) Compute the distance between the embedded query feature $f_\phi(x_q)$ and each class-$k$ prototype representation $c_k$ to obtain the probability that the query image $x_q$ belongs to class $k$:

$$p\!\left(y=k\mid x_q\right)=\frac{\exp\!\big(-d\big(f_\phi(x_q),\,c_k\big)\big)}{\sum_{k'}\exp\!\big(-d\big(f_\phi(x_q),\,c_{k'}\big)\big)}$$

where $d(\cdot)$ represents a distance function and $c_{k'}$ the prototype representation of class $k'$.
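A minimal sketch of this prototype classification, assuming squared Euclidean distance for $d(\cdot)$ (the text leaves the distance function open):

```python
import torch

def prototype_probs(support_feat, support_lab, query_feat, num_classes):
    """support_feat (n_s, d), support_lab (n_s,), query_feat (n_q, d);
    returns (n_q, C) probabilities p(y = k | x_q)."""
    protos = torch.stack([support_feat[support_lab == c].mean(0)
                          for c in range(num_classes)])   # prototypes c_k
    dist = torch.cdist(query_feat, protos) ** 2           # d(f(x_q), c_k)
    return torch.softmax(-dist, dim=1)                    # softmax over negative distance
```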
3.3) Establish the key area positioning module to obtain the key region coordinates $B=[x_a,x_b,y_a,y_b]$ of the attention feature map M. Referring to fig. 3, the key area positioning module is constructed as follows:
3.3.1) Perform the vector construction operation: along the spatial height and width directions, the attention feature map is aggregated into two one-dimensional structure energy vectors:

$$V_w(j)=\sum_{h=1}^{H}M(h,j),\qquad V_h(i)=\sum_{w=1}^{W}M(i,w)$$

where $V_w$ is the energy vector aggregated along the width direction, $V_h$ is the energy vector aggregated along the height direction, $M$ represents the obtained attention feature map, $M(i,w)$ represents the value of the feature map $M$ at position $(i,w)$, $M(h,j)$ represents its value at position $(h,j)$, $H$ represents the total height, and $W$ represents the total width;
3.3.2) Perform a greedy boundary search on the one-dimensional energy vectors to locate the most important one-dimensional region, obtaining the coordinate points $B=[x_a,x_b,y_a,y_b]$ of the key region bounding box.
Taking the computation of the width boundary $[x_a,x_b]$ of the key region as an example, the greedy boundary search is implemented as follows:
First, initialize the width coordinates $x_1$ and $x_2$ of the feature map, and define the key region as the region of smallest area whose contained energy is not less than the fraction $E_{Tr}$ of the total energy, i.e. $E_{[x_1,x_2]}/E_{[0:W]}>E_{Tr}$, where $E_{Tr}$ is a hyper-parameter representing the energy ratio, $E_{[0:W]}=\sum_{j=0}^{W}V_w(j)$ represents the energy sum of all elements of the width vector $V_w$, and $E_{[x_1,x_2]}=\sum_{j=x_1}^{x_2}V_w(j)$ represents the energy sum of the region from width $x_1$ to $x_2$;
Next, iteratively adjust the boundary of $[x_1,x_2]$ so that its energy ratio $E_{[x_1,x_2]}/E_{[0:W]}$ converges near $E_{Tr}$: when the ratio is higher than $E_{Tr}$, the region $[x_1,x_2]$ is shrunk in the direction of slowest energy decrease until the ratio is no longer higher than $E_{Tr}$; when the ratio is lower than $E_{Tr}$, the region is enlarged in the direction of fastest energy increase until the ratio is no longer lower than $E_{Tr}$;
Then, map the boundary coordinates from the feature map to the input picture to obtain the width boundary coordinates $[x_a,x_b]$ of the key region on the input picture:

$$x_a=I_w\,x_1/W,\qquad x_b=I_w\,x_2/W$$

where $I_w$ represents the width of the input picture and $W$ the width of the feature map;
Finally, the height boundary coordinates $[y_a,y_b]$ of the key region of the input picture are obtained by the same calculation method used for the width boundary coordinates $[x_a,x_b]$.
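The width-boundary search can be sketched as follows; the initialization at the highest-energy column and the expand-only loop are simplifying assumptions (the patent's procedure also shrinks the window when the energy ratio overshoots $E_{Tr}$):

```python
import torch

def width_boundary(M, e_tr=0.7, img_w=224):
    """M: (H, W) attention feature map; returns [x_a, x_b] on the input image."""
    v = M.sum(dim=0)                  # width energy vector V_w
    total = v.sum()
    x1 = x2 = int(v.argmax())         # start at the highest-energy column (assumed)
    while v[x1:x2 + 1].sum() / total < e_tr:
        left = v[x1 - 1] if x1 > 0 else float("-inf")
        right = v[x2 + 1] if x2 < v.numel() - 1 else float("-inf")
        if left >= right:             # grow toward the faster energy increase
            x1 -= 1
        else:
            x2 += 1
    W = v.numel()                     # map feature-map coords to the input picture
    return [img_w * x1 / W, img_w * (x2 + 1) / W]
```

The height boundary $[y_a, y_b]$ follows by applying the same function to the transpose of M.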
3.4) The output end of the category-related attention module B of the global flow network is connected with the local flow network through the key area positioning module, obtaining the whole double-flow network.
Step 4, training the whole double-flow network with the training support set $S_1$ and training query set $T_1$ by the small sample scene training method to obtain the trained double-flow network.
4.1) Set the maximum number of training iterations to 300000 and the initial learning rate to 0.0001, decaying the learning rate every 100000 iterations;
4.2) Add an extra margin to the existing cosine loss function to construct the improved cosine loss function $L_s$:

$$L_s=-\frac{1}{N}\sum_{q=1}^{N}\log\frac{\exp\!\big(\mathrm{sim}(q,\hat{y}_q)/km-M\big)}{\exp\!\big(\mathrm{sim}(q,\hat{y}_q)/km-M\big)+\sum_{c_j\neq\hat{y}_q}\exp\!\big(\mathrm{sim}(q,c_j)/km\big)}$$

where $N$ represents the total number of samples in the query set, $\mathrm{sim}(q,\hat{y}_q)$ is the similarity computed by the global flow network between query image $q$ and its true class $\hat{y}_q$, $\mathrm{sim}(q,c_j)$ is the similarity computed by the global flow network between query image $q$ and class $c_j$, $M$ is the added extra margin hyper-parameter, $k$ is the number of nearest neighbors, and $m$ is the number of descriptors of query image $q$;
4.3) Add the improved cosine loss function $L_s$ and the existing center loss function $L_c$ to form the loss function $L_g$ of the global flow network:

$$L_g=L_s+L_c,\qquad L_c=\frac{1}{2}\sum_{i=1}^{m}\big\|f(s_i)-c_{y_i}\big\|_2^2$$

where $c_{y_i}$ represents the class center of the global features of support sample $s_i$, and $m$ is the size of each scene episode; within each episode, the class centers are computed by averaging the global features of the corresponding support class;
4.4) Compute the negative log-probability loss function $L_l$ from the true labels and predicted probability distributions of the images, and set it as the local flow network loss function:

$$L_l=-\frac{1}{N}\sum_{q=1}^{N}\sum_{k=1}^{C}y_{q,k}\log p\!\left(y=k\mid x_q\right)$$

where $N$ represents the total number of samples in the query set and $C$ represents the total number of categories in the query set.
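A sketch of the three loss terms under the reconstructions above; the normalization of the similarities by k·m and the way the margin is applied to the true-class logit are assumptions consistent with the listed variables, and the descriptor count m_desc=196 is an illustrative placeholder.

```python
import torch
import torch.nn.functional as F

def cosine_margin_loss(sims, labels, k=3, m_desc=196, margin=0.01):
    """Improved cosine loss L_s; sims: (N, C) similarities from module C."""
    logits = sims / (k * m_desc)                # scale sums of k*m cosines to [-1, 1]
    logits = logits - margin * F.one_hot(labels, sims.size(1)).float()
    return F.cross_entropy(logits, labels)      # softmax cross-entropy with margin

def center_loss(global_feats, labels, centers):
    """Center loss L_c; centers: per-class means of support global features."""
    return 0.5 * ((global_feats - centers[labels]) ** 2).sum(1).mean()

def local_loss(probs, labels):
    """Negative log-probability loss L_l for the local flow network."""
    return F.nll_loss(torch.log(probs + 1e-8), labels)
```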
4.5) Input the training support set $S_1$ images and training query set $T_1$ images into the double-flow network in batches, and compute the value of the loss function from the image prediction class probabilities output by the global flow network and the local flow network;
4.6) Back-propagate the loss value using the Adam algorithm and adjust the network parameters;
4.7) Repeat steps 4.5)-4.6) until the preset maximum number of training iterations is reached, obtaining the trained double-flow network.
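Putting the pieces together, an episodic training loop might look as follows; `dual_stream`, its `centers` attribute, `train_images_by_class` and the decay factor 0.5 are hypothetical names and assumptions for illustration, built on the sketches given earlier.

```python
import torch

optimizer = torch.optim.Adam(dual_stream.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100000, gamma=0.5)

for it in range(300000):
    support, query = sample_episode(train_images_by_class, C=5, K=1, N=15)
    sims, global_feats, local_probs, labels = dual_stream(support, query)
    loss = (cosine_margin_loss(sims, labels)
            + center_loss(global_feats, labels, dual_stream.centers)
            + local_loss(local_probs, labels))
    optimizer.zero_grad()
    loss.backward()       # back-propagate the summed global and local losses
    optimizer.step()
    scheduler.step()      # decay the learning rate every 100000 iterations
```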
Step 5, verifying the double-flow network:
Input the verification support set $S_2$ and verification query set $T_2$ into the trained double-flow network for fine-tuning of the network parameters, and save the network with the highest index as the optimal double-flow network model; this operation is repeated 600 times.
Step 6, testing the double-flow network:
Input the test support set $S_3$ and test query set $T_3$ into the optimal double-flow network model, output the probabilities that each test sample belongs to the different categories, and take the category with the highest probability as the final scene classification result, completing the classification task.
The effects of the present invention are further illustrated by the following simulation experiments:
1. simulation experiment conditions are as follows:
1. operating platform configuration
The simulation platform is a desktop computer with an Intel(R) Core(TM) i7-7800X CPU and 32 GB of memory running Ubuntu 18.04; the neural networks are built with Python 3.6 and PyTorch 1.4 and accelerated with an NVIDIA RTX 2080Ti GPU and CUDA 10.0.
2. Simulation data set
The NWPU-RESISC45 dataset contains 45 scene categories, each category having 700 RGB images of 256 × 256;
the WHU-RS19 dataset contains 19 scene categories, 1005 RGB images of 600 × 600 in total;
The UC-Merced dataset contains 21 scene classes, each with 100 RGB images of 256 × 256.
3. Simulation parameter setting
The simulation experiments use the Adam optimizer with an initial learning rate of 0.0001 and 300000 training iterations, decaying the learning rate every 100000 iterations; the number k of nearest neighbors searched in the measurement module is set to 3, the hyper-parameter M in the improved cosine loss function to 0.01, and the hyper-parameter $E_{Tr}$ of the key area positioning module to 70%.
A small sample scenario is usually expressed as a C-way K-shot problem according to the numbers of categories and samples in the support and query sets; this embodiment uses the two most common settings, 5-way 1-shot with N = 15 and 5-way 5-shot with N = 10.
2. Emulated content
Simulation 1: scene classification under the 5-way 1-shot small sample setting on the large-scale public remote sensing data sets NWPU-RESISC45, WHU-RS19 and UC Merced, using the proposed method and the existing MatchingNet, DLA-MatchNet and DN4 methods; the classification accuracy of each method was computed, with the results shown in Table 1.
TABLE 1 Classification precision comparison of the present invention and the existing method in a 5-way 1-shot small sample scene of three datasets
Method                    NWPU-RESISC45      UC Merced          WHU-RS19
Existing MatchingNet      54.46% ± 0.77%     46.16% ± 0.71%     60.60% ± 0.68%
Existing DLA-MatchNet     68.80% ± 0.70%     53.76% ± 0.62%     68.27% ± 1.83%
Existing DN4              66.39% ± 0.86%     57.25% ± 1.01%     82.14% ± 0.80%
Method of the invention   73.84% ± 0.80%     68.12% ± 0.81%     87.34% ± 0.62%
Simulation 2: scene classification under the 5-way 5-shot small sample setting on the large-scale public remote sensing data sets NWPU-RESISC45, WHU-RS19 and UC Merced, using the proposed method and the existing MatchingNet, DLA-MatchNet and DN4 methods; the classification accuracy of each method was computed, with the results shown in Table 2.
TABLE 2 Classification accuracy comparison of the present invention and the prior methods in the 5-way 5-shot small sample setting on three datasets
Method                    NWPU-RESISC45      UC Merced          WHU-RS19
Existing MatchingNet      67.87% ± 0.59%     66.73% ± 0.56%     82.99% ± 0.40%
Existing DLA-MatchNet     81.63% ± 0.46%     63.01% ± 0.51%     79.89% ± 0.33%
Existing DN4              83.24% ± 0.87%     79.74% ± 0.78%     96.02% ± 0.33%
Method of the invention   87.86% ± 0.51%     88.57% ± 0.52%     98.25% ± 0.15%
From the experimental results in Tables 1 and 2, the accuracy of the proposed method is the highest on all three data sets for both the 5-way 1-shot and 5-way 5-shot tasks. This indicates that its classification performance is the best: it reduces the loss of image discrimination information, avoids the influence of complex backgrounds and severe object-scale changes on scene classification, and effectively improves the classification accuracy of remote sensing image scenes in small sample settings, which has important practical value in real scenarios with few labeled samples.

Claims (7)

1. A remote sensing image small sample scene classification method based on a multi-scale double-flow architecture is characterized by comprising the following steps:
(1) Acquiring three different remote sensing image data sets from public websites, and sequentially applying cropping, random horizontal flipping, random brightness enhancement, random color enhancement and random contrast enhancement to the images in the data sets;
(2) Randomly sampling the preprocessed data set to obtain a training support set $S_1$ and training query set $T_1$, a verification support set $S_2$ and verification query set $T_2$, and a test support set $S_3$ and test query set $T_3$;
(3) Constructing an integral double-flow network:
3a) Establishing a global flow network formed by connecting an attention depth embedding module A, a category-related attention module B and a measurement module C;
3b) Selecting an existing prototype network as the local flow network;
3c) Establishing a key area positioning module consisting of vector construction operation and greedy boundary search;
3d) The global flow network and the local flow network are connected through a key area positioning module to obtain an integral double-flow network;
(4) Using the training support set $S_1$ and training query set $T_1$ to train the whole double-flow network by the small sample scene training method, obtaining the trained double-flow network;
(5) Inputting the verification support set $S_2$ and verification query set $T_2$ into the trained double-flow network to fine-tune the network parameters, and saving the network with the highest index as the optimal double-flow network model;
(6) Inputting the test support set $S_3$ and test query set $T_3$ into the optimal double-flow network model to obtain the final classification result.
2. The method of claim 1, wherein the step (2) is implemented as follows:
2a) The three preprocessed remote sensing image data sets are divided respectively: the NWPU-RESISC45 data set is divided into a training set, a verification set and a test set in the ratio 25:10:10; the WHU-RS19 data set in the ratio 9:5:5; and the UC-Merced data set in the ratio 10:6:5;
2b) Randomly selecting C categories from the training set of each data set and randomly sampling K images from each category, the C × K images forming the training support set $S_1$; at the same time, randomly selecting N images from each of the C categories' remaining images to form the training query set $T_1$;
2c) Randomly selecting C categories from the verification set of each data set and randomly sampling K images from each category, the C × K images forming the verification support set $S_2$; then randomly selecting N images from each of the C categories' remaining images to form the verification query set $T_2$;
2d) Randomly selecting C categories from the test set of each data set and randomly sampling K images from each category, the C × K images forming the test support set $S_3$; randomly selecting N images from each of the C categories' remaining images to form the test query set $T_3$.
3. The method of claim 1, wherein the global flow network structure constructed in step (3 a) is as follows:
the attention depth embedding module A is formed by cascading a convolution layer, four convolution blocks, an average pooling layer and a 1 × 1 convolution layer with 128 channels; the convolution layer consists of 7 × 7 convolution filters with 64 channels and a 3 × 3 max-pooling operation; each convolution block consists of four 3 × 3 convolution filters, with a skip connection added after every two convolution filters; the numbers of channels of the first to fourth convolution blocks are 64, 128, 256 and 512 respectively;
the category-related attention module B is used for outputting the attention feature map M;
the measurement module C is used for outputting the similarity $\mathrm{sim}(q,c)$ between a given query image $q$ and class $c$;
the input end of the category-related attention module B is connected with the output end of the second convolution block of the attention depth embedding module A, the output end of the category-related attention module B and the output end of the last 1 × 1 convolution layer of the attention depth embedding module A are simultaneously connected to the measurement module C, and the output of the measurement module C is the output of the global flow network.
4. The method of claim 3, wherein the category-related attention module B computes the output attention feature map M according to the following formula:

$$M(x)=\mathrm{Sigmoid}\Big(\sum_i d_i f_i\Big)$$

where $f_i$ represents the features of the $i$-th input channel and $d_i$ represents the weight vector, calculated as follows:

$$d_i=\sigma\!\left(W_2\,\delta\!\left(W_1\cdot\frac{1}{N}\sum_{j=1}^{N}f_k(x_j)\otimes f_g(x_j)\right)\right)$$

where $\delta$ denotes the ReLU activation function, $\sigma$ denotes the Sigmoid activation function, $W_1$ and $W_2$ are the weights of the two fully connected layers, used respectively to reduce and expand the feature map dimensions, $N$ represents the total number of feature map pixels, $f_g(x_j)=W_g\cdot x_j$ is the compression feature, $f_k(x_j)=\mathrm{softmax}(W_k\cdot x_j)$ computes the attention feature map along pixel point $j$, and $\otimes$ represents matrix multiplication.
5. The method of claim 3, wherein the measurement module C computes the output similarity $\mathrm{sim}(q,c)$ between a given query image $q$ and class $c$ as follows: for each descriptor $x_i$, first find its $k$ nearest neighbors $\hat{x}_i^{1},\dots,\hat{x}_i^{k}$ in class $c$, then compute the similarity between $x_i$ and each $\hat{x}_i^{j}$; the descriptor similarities are weighted and summed using the attention map to obtain the similarity between the given query image $q$ and class $c$:

$$\mathrm{sim}(q,c)=\sum_{i=1}^{m}M(x_i)\sum_{j=1}^{k}\cos\!\left(x_i,\hat{x}_i^{j}\right)$$

where $M(x_i)$ is the response value of the attention feature map at location $x_i$, $x_i$ denotes the $i$-th descriptor of $q$, $m$ the total number of descriptors, $\hat{x}_i^{j}$ the $j$-th nearest neighbor of $x_i$ in class $c$, and $\cos(\cdot,\cdot)$ the cosine similarity between two vectors.
6. The method of claim 1, wherein step (3c) establishes the key area positioning module consisting of the vector construction operation and the greedy boundary search, as follows:
3c1) The vector construction operation aggregates the attention feature map into two one-dimensional structure energy vectors along the spatial height and width directions:

$$V_w(j)=\sum_{h=1}^{H}M(h,j),\qquad V_h(i)=\sum_{w=1}^{W}M(i,w)$$

where $V_h$ is the energy vector aggregated along the height direction, $V_w$ is the energy vector aggregated along the width direction, $M$ represents the obtained attention feature map, $M(i,w)$ represents the value of the feature map $M$ at position $(i,w)$, $M(h,j)$ represents its value at position $(h,j)$, $H$ represents the total height, and $W$ represents the total width;
3c2) The greedy boundary search quickly and accurately locates the most important one-dimensional region in each one-dimensional energy vector, obtaining the coordinate points $B=[x_a,x_b,y_a,y_b]$ of the region bounding box.
7. The method of claim 1, wherein step (4) utilizes the training support set $S_1$ and training query set $T_1$ to train the whole double-flow network by the small sample scene training method, implemented as follows:
4a) Set the maximum number of training iterations to 300000 and the initial learning rate to 0.0001, decaying the learning rate every 100000 iterations;
4b) Set the global flow network loss function as the sum of the center loss function and the improved cosine loss function, and set the local flow network loss function as the negative log-probability loss function;
4c) Input the training support set $S_1$ images and training query set $T_1$ images into the double-flow network in batches, and compute the value of the loss function from the image prediction class probabilities output by the global flow network and the local flow network;
4d) Back-propagate the loss value using the Adam algorithm and adjust the network parameters;
4e) Repeat steps (4c)-(4d) until the preset maximum number of training iterations is reached, obtaining the trained double-flow network.
CN202211128397.3A 2022-09-16 2022-09-16 Remote sensing image small sample scene classification method based on multi-scale double-flow architecture Pending CN115311502A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211128397.3A CN115311502A (en) 2022-09-16 2022-09-16 Remote sensing image small sample scene classification method based on multi-scale double-flow architecture


Publications (1)

Publication Number Publication Date
CN115311502A true CN115311502A (en) 2022-11-08

Family

ID=83866731

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211128397.3A Pending CN115311502A (en) 2022-09-16 2022-09-16 Remote sensing image small sample scene classification method based on multi-scale double-flow architecture

Country Status (1)

Country Link
CN (1) CN115311502A (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115828638A (en) * 2023-01-09 2023-03-21 西安深信科创信息技术有限公司 Automatic driving test scene script generation method and device and electronic equipment
CN115828638B (en) * 2023-01-09 2023-05-23 西安深信科创信息技术有限公司 Automatic driving test scene script generation method and device and electronic equipment
CN116597419A (en) * 2023-05-22 2023-08-15 宁波弗浪科技有限公司 Vehicle height limiting scene identification method based on parameterized mutual neighbors
CN116597419B (en) * 2023-05-22 2024-02-02 宁波弗浪科技有限公司 Vehicle height limiting scene identification method based on parameterized mutual neighbors
CN117468085A (en) * 2023-12-27 2024-01-30 浙江晶盛机电股份有限公司 Crystal bar growth control method and device, crystal growth furnace system and computer equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination