CN115564982A - Same-domain remote sensing image classification method based on adversarial learning - Google Patents

Same-domain remote sensing image classification method based on adversarial learning

Info

Publication number
CN115564982A
Authority
CN
China
Prior art keywords
remote sensing
discriminator
generator
input
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110738534.4A
Other languages
Chinese (zh)
Inventor
王慧
闫科
于克光
杨乐
李烁
李靖
蓝朝桢
于翔舟
李伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information Engineering University of PLA Strategic Support Force filed Critical Information Engineering University of PLA Strategic Support Force
Priority to CN202110738534.4A priority Critical patent/CN115564982A/en
Publication of CN115564982A publication Critical patent/CN115564982A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to a method for classifying same-domain remote sensing images based on adversarial learning, and belongs to the technical field of remote sensing image data processing. The invention classifies remote sensing image data with a classification model formed by a generator and a discriminator, so that the classification model obtains a data distribution as close as possible to the truth-label space, thereby improving the model's overall perception of the input image and its classification accuracy. In addition, the generator adopts an encoder-decoder framework, and the decoder adopts a multilayer convolutional neural network provided with a residual part, an upsampling part and an attention enhancement part; the output of the encoder is used as the input of the decoder, and the decoder is connected with the corresponding layers in the encoder so as to fuse low-level feature position information with high-level feature semantic information, improve the correlation between pixels, reduce the information loss in the upsampling process and further improve the classification accuracy.

Description

Same-domain remote sensing image classification method based on adversarial learning
Technical Field
The invention relates to a same-domain remote sensing image classification method based on adversarial learning, and belongs to the technical field of remote sensing image data processing.
Background
Backbone networks such as ResNet and ResNeSt, although stronger in principle, do not exert their performance advantages when used as encoders for remote sensing image classification: experimental results show that there is a certain gap between the improved methods using these two backbone networks as encoders and those using the VGG network. There are two main reasons for this phenomenon. First, from a statistical point of view, conventional deep learning image classification methods assume that the training set and the test set obey the same distribution, i.e. their feature spaces are the same or similar; in theory the classification accuracy of a model on the training data should therefore be the same as, or only slightly lower than, its accuracy on the test data, but in practice the accuracy on the test data is lower than on the training data and an overfitting phenomenon occurs. Second, although a complex deep learning model can learn most features in the training data, some features that depend on relationships such as spatial structure and color texture cannot be learned effectively. In remote sensing images in particular, large structural differences may exist within the same class of ground objects, while strong similarity may exist between the color tones and textures of different classes of ground objects, which reflects that certain relationships exist between pixels at the semantic level. Traditional deep learning model training is optimized with a cross-entropy loss function; the gradient used by this loss function in back-propagation relates only to the difference between a single pixel in the prediction and the corresponding pixel in the truth label, so the correlation between neighborhood pixels is ignored. As a result, the accuracy of the classification result is low, and discontinuous object edges or large geometric differences between the classification result and the truth label easily occur.
Disclosure of Invention
The invention aims to provide a method for classifying same-domain remote sensing images based on adversarial learning, so as to solve the problem of low classification accuracy of existing remote sensing image classification methods.
To solve the above technical problem, the invention provides a same-domain remote sensing image classification method based on adversarial learning, which comprises the following steps:
acquiring remote sensing image data to be classified, and inputting the remote sensing image data into a trained classification model for classification; the classification model comprises a generator and a discriminator, wherein the generator adopts an encoder-decoder structure and is used for obtaining the pixel-level classification prediction result of the remote sensing image; the discriminator adopts a convolutional neural network and is used for distinguishing the real label from the prediction result generated by the generator by capturing the high-order consistency between the two; the generator and the discriminator are trained in an adversarial manner.
The invention classifies remote sensing image data with a classification model formed by a generator and a discriminator, and uses the strong function-fitting capacity of the generative adversarial network to make the classification model obtain a data distribution as close as possible to the truth-label space, thereby improving the model's overall perception of the input image and its classification accuracy.
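For illustration, a minimal sketch of how such a trained classification model might be applied at inference time is given below; the Generator class, checkpoint path and number of classes are hypothetical placeholders, and only the generator is needed for prediction:

```python
# Hypothetical inference sketch: only the trained generator is needed at test time.
import torch

def classify_image(generator: torch.nn.Module, image: torch.Tensor) -> torch.Tensor:
    """image: (1, C, H, W) remote sensing tile; returns (1, H, W) per-pixel class indices."""
    generator.eval()
    with torch.no_grad():
        probs = generator(image)      # (1, num_classes, H, W), Softmax output of the decoder
        pred = probs.argmax(dim=1)    # per-pixel class prediction
    return pred

# Usage (assuming a Generator class and a trained checkpoint exist):
# generator = Generator(num_classes=6)
# generator.load_state_dict(torch.load("arenas_generator.pth"))
# prediction = classify_image(generator, torch.rand(1, 3, 256, 256))
```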
Further, in order to improve the correlation between pixels and reduce the information loss in the upsampling process, the encoder adopts a feature extraction network for mapping the input remote sensing image data to a high-dimensional feature space; the decoder adopts a multilayer convolutional neural network comprising several convolutional neural networks at different depths, and each of them is provided with a residual part, an upsampling part and an attention enhancement part; the output of the encoder is used as the input of the decoder, and the decoder is connected with the corresponding layers in the encoder so as to fuse low-level feature position information with high-level feature semantic information.
Furthermore, the residual part comprises two convolution modules, its input and output are connected by a cross-layer connection, and its input receives the concatenated features of the previous decoder layer and the corresponding encoder layer; the input of the upsampling part receives the output of the residual part and restores the feature map processed by the residual part to the size of the corresponding high-order feature map.
Furthermore, the attention enhancement part comprises a semantic information enhancement module and a position information enhancement module processed in parallel; the input of both modules is the feature map output by the upsampling part. The semantic information enhancement module operates on the channel dimension of the input feature map and models the relationship of specific semantic information between high-order and low-order feature maps by using the correlation between high-order and low-order channels; the position information enhancement module establishes the position-information correlation between local features of the input feature map and their neighborhoods.
Further, the processing procedure of the semantic information enhancement module is as follows:
obtaining the statistical information of the input feature map in the channel dimension by using global average pooling;

determining the weight of each channel dimension from the obtained channel statistics by using linear transformations and activation functions, where the weight is calculated as

$$\hat{g} = \sigma\big(W_2 \cdot \mathrm{ReLU}(W_1 \cdot g_c)\big)$$

where $g_c$ represents the feature vector obtained through global average pooling, $W_1 \in \mathbb{R}^{C\times C/n}$ and $W_2 \in \mathbb{R}^{C/n\times C}$ respectively represent the weights of 1 × 1 convolution layers, ReLU represents the ReLU function, σ represents the Sigmoid operation, and C represents the total number of categories;
based on the weights in each channel dimension, an enhanced feature map is determined.
Further, the enhanced feature map obtained by the position information enhancement module is:

$$v_{i,j} = h_{i,j} \cdot x_{i,j}$$
$$h_{i,j} = \sigma(q_{i,j})$$
$$q = W_s * x$$

where $v = [v_{1,1}, v_{1,2}, \ldots, v_{W,H}]$ and v represents the enhanced feature map; $x = [x_{1,1}, x_{1,2}, \ldots, x_{W,H}]$, and $x_{i,j}$ represents a slice of the input feature map along the channel dimension, with (i, j) the spatial position coordinates of the feature map, $i \in \{1, 2, \ldots, W\}$, $j \in \{1, 2, \ldots, H\}$; $W_s$ represents the mapping matrix of a 1 × 1 convolution operation, and q is the weight map obtained by mapping x through $W_s$; $h_{i,j}$ is $q_{i,j}$ scaled to [0, 1] by Sigmoid and represents the importance of the position information at position (i, j) in the feature map.
Further, in the training process the generator adopts the following loss function:

$$\mathcal{L}_G(\theta_G) = \mathcal{L}_{mfl}(\theta_G) + \lambda\,\mathcal{L}_{adv}(\theta_G)$$

where $\mathcal{L}_{adv}$ is the adversarial loss used to reduce the performance of the discriminator, $\mathcal{L}_{mfl}$ is the multi-path fused focal loss used to generate a correct classification prediction for each pixel of the input image, $\theta_G$ represents the parameters of the generator G, $\lambda$ is the coefficient of the linear combination of $\mathcal{L}_{adv}$ and $\mathcal{L}_{mfl}$ and acts as the penalty factor of $\mathcal{L}_{adv}$, and $D(\cdot)$ denotes the discriminator's judgment of whether its input X is the prediction $G(X^{(n)})$ from the generator or the truth label $Y^{(n)}$.
Furthermore, the discriminator is formed by connecting 8 convolutional layers in series; the convolution kernel size of each layer is 4 × 4, the stride of the last convolutional layer is 1, and the strides of the first to seventh convolutional layers are all 2; the first convolutional layer uses the ReLU activation function, and the remaining convolutional layers use the LeakyReLU activation function.
Further, the loss function of the discriminator may be defined as follows:

$$\mathcal{L}_D(\theta_D) = \sum_{n=1}^{N}\Big[\mathcal{L}_{bce}\big(D(X^{(n)}, Y^{(n)}), 1\big) + \mathcal{L}_{bce}\big(D(X^{(n)}, G(X^{(n)})), 0\big)\Big]$$
$$\mathcal{L}_{bce}(x, y) = -\big[y\log x + (1 - y)\log(1 - x)\big]$$

where $\theta_D$ represents the parameters of the discriminator D, $\mathcal{L}_{bce}$ represents the binary cross-entropy loss, $D(\cdot)$ denotes the discriminator's judgment of whether its input X is the prediction $G(X^{(n)})$ from the generator or the truth label $Y^{(n)}$, y represents the one-hot code of a certain class in the truth label, and x represents the prediction of a certain class generated by the generator.
Further, the parameter $\theta_D$ of the discriminator and the parameter $\theta_G$ of the generator are updated step by step: first the generator parameters $\theta_G$ are fixed and the discriminator parameters $\theta_D$ are updated so that the discriminator can distinguish the prediction results; then the discriminator parameters $\theta_D$ are fixed and the generator parameters $\theta_G$ are updated so that the generator generates prediction results whose authenticity the discriminator cannot distinguish.
Drawings
FIG. 1 is a schematic diagram of a network architecture of a classification model employed by the present invention;
FIG. 2 is a schematic diagram of a generator in the classification model of the present invention;
FIG. 3 is a block diagram of a VGG (VGG-19) network employed by an encoder in the classification model of the present invention;
FIG. 4 is a diagram illustrating the attention and residual structures used by the decoder in the classification model according to the present invention;
FIG. 5-a is a schematic diagram of a countermeasure network employing a cGAN model framework;
FIG. 5-b is a schematic diagram of a countermeasure network employing a pix2pix model framework;
FIG. 6 is a schematic diagram of the structure of the discriminator used in the present invention;
FIG. 7 is a visualization of the multi-path fused focal loss design employed by the generator of the present invention;
FIG. 8 is a diagram of partial predicted results of different model methods on a Vaihingen data set in an experimental example of the present invention;
FIG. 9-a is the original image No. 31 in the Vaihingen dataset in the experimental example of the present invention;
FIG. 9-b is a schematic diagram of the truth label of image No. 31 in the Vaihingen dataset in the experimental example of the present invention;
FIG. 9-c shows the classification result of image No. 31 in the Vaihingen dataset by the SVL_3 method in the experimental example of the present invention;
FIG. 9-d shows the classification result of image No. 31 in the Vaihingen dataset by the RIT_L7 method in the experimental example of the present invention;
FIG. 9-e shows the classification result of image No. 31 in the Vaihingen dataset by the DLR_8 method in the experimental example of the present invention;
FIG. 9-f shows the classification result of image No. 31 in the Vaihingen dataset by the CASIA method in the experimental example of the present invention;
FIG. 9-g shows the classification result of image No. 31 in the Vaihingen dataset by the AREANs-VGG method in the experimental example of the present invention;
FIG. 9-h shows the classification result of image No. 31 in the Vaihingen dataset by the AREANs-ResNet method in the experimental example of the present invention;
FIG. 9-i shows the classification result of image No. 31 in the Vaihingen dataset by the AREANs-ResNeSt method in the experimental example of the present invention;
FIG. 10-a is the original image No. 6_13 in the Potsdam dataset in the experimental example of the present invention;
FIG. 10-b is a schematic diagram of the truth label of image No. 6_13 in the Potsdam dataset in the experimental example of the present invention;
FIG. 10-c shows the classification result of image No. 6_13 in the Potsdam dataset by the SVL_1 method in the experimental example of the present invention;
FIG. 10-d shows the classification result of image No. 6_13 in the Potsdam dataset by the RIT_L7 method in the experimental example of the present invention;
FIG. 10-e shows the classification result of image No. 6_13 in the Potsdam dataset by the UZ_1 method in the experimental example of the present invention;
FIG. 10-f shows the classification result of image No. 6_13 in the Potsdam dataset by the DST_5 method in the experimental example of the present invention;
FIG. 10-g shows the classification result of image No. 6_13 in the Potsdam dataset by the BKHN_3 method in the experimental example of the present invention;
FIG. 10-h shows the classification result of image No. 6_13 in the Potsdam dataset by the CASIA2 method in the experimental example of the present invention;
FIG. 10-i shows the classification result of image No. 6_13 in the Potsdam dataset by the BUCTY5 method in the experimental example of the present invention;
FIG. 10-j shows the classification result of image No. 6_13 in the Potsdam dataset by the AREANs-VGG method in the experimental example of the present invention;
FIG. 10-k shows the classification result of image No. 6_13 in the Potsdam dataset by the AREANs-ResNet method in the experimental example of the present invention;
FIG. 10-l shows the classification result of image No. 6_13 in the Potsdam dataset by the AREANs-ResNeSt method in the experimental example of the present invention.
Detailed Description
The following further describes embodiments of the present invention with reference to the drawings.
The classification model adopted by the adversarial-learning-based same-domain remote sensing image classification method of the invention comprises a generator and a discriminator; the generator adopts an encoder-decoder structure that fuses a residual attention enhancement mechanism, and the discriminator adopts a convolutional neural network. Training of the classification model is divided into two stages. In the first stage, the generator is trained alone under supervision and optimized with the multi-path fused focal loss function, so that the model acquires a certain classification ability and provides an initial model for the adversarial training of the second stage. In the second stage, the discriminator is added and an adversarial training strategy is introduced; the whole framework is jointly optimized by combining the adversarial loss with the multi-path fused focal loss function, which improves the image classification accuracy of the generator.
1. Establishing a classification model
There are two typical conditional generative adversarial network architectures, as shown in FIG. 5-a (cGAN) and FIG. 5-b (pix2pix). The generator input in cGAN consists of two parts, random noise z and a control condition c, and the generator G completes the mapping from z to G(z) under the influence of the control condition. The discriminator D learns to recognize the fake sample G(z), as shown in equation (1):

$$\mathcal{L}_D = \mathcal{L}_{bce}\big(D(x\mid c), 1\big) + \mathcal{L}_{bce}\big(D(G(z\mid c)\mid c), 0\big) \qquad (1)$$

where "1" indicates that the input of D comes from the true value x and "0" indicates that the input of D comes from the generator output G(z). The final objective function is:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x\sim p_{data}(x)}\big[\log D(x\mid c)\big] + \mathbb{E}_{z\sim p_z(z)}\big[\log\big(1 - D(G(z\mid c))\big)\big] \qquad (2)$$

pix2pix aims at mapping the image space to the truth-label space, so the input at its generator end is only the original image x, while the random noise z is realized by the Dropout layers in the network structure. The working principle of the discriminator is the same as that of the conditional generative adversarial network. To obtain a result as similar as possible to the truth label y, pix2pix also adds an L1 distance constraint to the optimization objective, as shown in equation (3):

$$\mathcal{L}_{L1}(G) = \mathbb{E}_{x, y, z}\big[\lVert y - G(x, z)\rVert_1\big] \qquad (3)$$

The objective function combined with cGAN is:

$$\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x, y}\big[\log D(x, y)\big] + \mathbb{E}_{x, z}\big[\log\big(1 - D(x, G(x, z))\big)\big] \qquad (4)$$

and the final objective function is:

$$G^{*} = \arg\min_G \max_D \mathcal{L}_{cGAN}(G, D) + \lambda \mathcal{L}_{L1}(G) \qquad (5)$$

where λ is the weight of the L1 distance constraint.
Based on the above, the classification model adopted in the present invention is shown in FIG. 1 and comprises a generator network that fuses an attention mechanism with residual modules, and a discriminator network based on image fusion. The generator is mainly used for obtaining the pixel-level classification prediction result of the remote sensing image. The discriminator is used for evaluating whether the prediction result generated by the generator is reliable: it judges the input formed by fusing the original image with the truth label as true (1) and the input formed by fusing the original image with the generator's prediction result as false (0).
The structure of the generator is shown in FIG. 2 and comprises two parts, an encoder and a decoder. The encoder can be built from various backbone networks, such as VGG, ResNet and ResNeSt. The decoder adopts a multilayer convolutional neural network comprising several convolutional neural networks at different depths; each of them is provided with a residual part, an upsampling part and an attention enhancement part, where a Residual block serves as the residual part, an Upsampling block serves as the upsampling part, and an Attention-enhanced block serves as the attention enhancement part. The prediction result is output by the Softmax layer. In the present embodiment, the decoder employs five layers of convolutional neural networks, where Level-X (X = 2, 3, 4, 5) denotes the neural networks at different depths in the decoder, and the last layer produces its output through the Softmax layer. In the legend, A denotes convolution plus activation function, B denotes the downsampling operation, and C denotes the embedded attention and residual structure block.
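To make the decoder layout concrete, one decoder level combining the residual, upsampling and attention enhancement parts could be organized as in the following sketch; the wiring is an assumption based on the description above, and the residual and attention modules are passed in as generic components:

```python
import torch
import torch.nn as nn

class DecoderLevel(nn.Module):
    """One decoder level: concatenate the encoder skip features, apply a residual
    part, an upsampling part, then an attention enhancement part (sketch)."""
    def __init__(self, channels: int, residual_part: nn.Module, attention_part: nn.Module):
        super().__init__()
        # residual_part is expected to map the concatenated features to `channels` channels.
        self.residual_part = residual_part
        # Transposed convolution restores the spatial size of the corresponding feature map.
        self.upsample_part = nn.ConvTranspose2d(channels, channels, kernel_size=2, stride=2)
        self.attention_part = attention_part   # semantic + position enhancement, in parallel

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        x = torch.cat([x, skip], dim=1)   # fuse low-level position and high-level semantic info
        x = self.residual_part(x)
        x = self.upsample_part(x)
        return self.attention_part(x)
```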
In this embodiment, the encoder uses a VGG network, as shown in FIG. 3, which is composed of 5 blocks, each consisting of convolution layers, linear rectification units (Rectified Linear Unit, ReLU) and a max-pooling layer. Since the human visual system abstracts visual information at different levels, the number of channels increases block by block while the size of the feature maps decreases block by block. The features extracted by the different blocks of the VGG represent the expression of the target at different levels; the higher the level, the higher the degree of abstraction, so the output at the end of the network is a high-level abstract expression of the input image. Given an input data set $X = [x_1, x_2, \ldots, x_N]$, $x_i \in \mathbb{R}^{w\times h\times c}$ represents the i-th image in the data set, of size $w \times h$ and containing c channels. $E_n(\cdot)$ represents the operations performed in the n-th block of the encoder E (here, the VGG network), including the conv convolution operations, the activation functions (such as Sigmoid, tanh or ReLU) and the pool operations. After this series of operations, the encoder finally outputs

$$E(x_i) = E_5\Big(E_4\big(E_3\big(E_2\big(E_1(x_i)\big)\big)\big)\Big) \qquad (6)$$
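As an illustration of the block-wise encoder, the features of each VGG-19 block could be collected as follows; the slicing indices marking the end of each block and the torchvision loading syntax are assumptions, not values given in the patent:

```python
import torch
import torch.nn as nn
from torchvision import models

class VGGEncoder(nn.Module):
    """Collects the output of each VGG-19 block (conv + ReLU + max-pool) so the
    decoder can fuse low-level and high-level features (sketch)."""
    def __init__(self):
        super().__init__()
        vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
        features = vgg.features
        # Assumed block boundaries: each block ends at a max-pooling layer.
        bounds = [(0, 5), (5, 10), (10, 19), (19, 28), (28, 37)]
        self.blocks = nn.ModuleList(
            [nn.Sequential(*[features[i] for i in range(a, b)]) for a, b in bounds]
        )

    def forward(self, x: torch.Tensor):
        outputs = []
        for block in self.blocks:
            x = block(x)          # channels grow block by block, spatial size shrinks
            outputs.append(x)
        return outputs            # hierarchical features E_1(x), ..., E_5(x)
```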
hierarchical features containing target visual representation information and implicit local texture information can be learned by the encoder, but classification errors are easily caused due to the lack of global context information for encoding target spatial relationships. The high-order characteristic graph of each channel obtained by the encoder can be regarded as a response to a specific category, and the high-order characteristic graph contains rich semantic information; however, high-order features often lack basic spatial information and cannot accurately describe the edge position of the target, and low-order features have complete spatial information but are limited by the receptive field and relatively lack semantic information.
The Residual block is formed by two 3 × 3 convolutions, as shown in FIG. 4; the input and output of the Residual block are connected by a cross-layer connection, and its input is the concatenation of the features of the previous decoder layer and of the corresponding encoder layer. The residual part not only increases the depth of the network and improves its performance, but also effectively alleviates the model degradation caused by deepening the network and relieves the difficulty a multilayer neural network has in fitting an identity mapping.
The feature map processed by the residual part enters the upsampling part, which restores it to the size of the corresponding high-order feature map. This part is implemented with a transposed convolution operation, whose advantage over interpolation methods such as nearest-neighbor, bilinear or bicubic interpolation is that its parameters can be learned and therefore do not need to be preset manually.
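A possible form of the residual part with its cross-layer connection, together with the learnable transposed-convolution upsampling part, is sketched below; the batch normalization layers and the 1 × 1 projection used to match channel dimensions are assumed details:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual part: two 3x3 convolution modules with a cross-layer (skip)
    connection from input to output (sketch)."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
        )
        # 1x1 projection so the skip connection matches the output channels (assumed detail).
        self.project = nn.Conv2d(in_ch, out_ch, kernel_size=1) if in_ch != out_ch else nn.Identity()
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(self.body(x) + self.project(x))

def upsample_part(channels: int) -> nn.Module:
    """Upsampling part: learnable transposed convolution that doubles the spatial size."""
    return nn.ConvTranspose2d(channels, channels, kernel_size=2, stride=2)
```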
The attention enhancement part comprises a semantic information enhancement module and a position information enhancement module. The feature map processed by the upsampling part enters both modules simultaneously, and the feature maps processed by the two modules are fused to obtain the enhanced result; this parallel design differs from the serial combination used in CCNet.
The semantic information enhancement module operates on the channel dimension of the input feature map and models the relationship of specific semantic information between the high-order and low-order feature maps by using the correlation between high-order and low-order channels. First, Global Average Pooling (GAP) is used to obtain the statistics of the input feature x in the channel dimension:

$$g_c = \frac{1}{w \times h}\sum_{i=1}^{w}\sum_{j=1}^{h} x_c(i, j) \qquad (7)$$

where $x_c$ represents the feature map of the input x on the c-th channel and $g_c$ represents the global statistic acquired from the c-th channel. Subsequently, in order to enhance the correlation of the feature maps on the different channels, a combination of linear transformations and activation functions is introduced:

$$\hat{g} = \sigma\big(W_2 \cdot \mathrm{ReLU}(W_1 \cdot g)\big) \qquad (8)$$

where $g = [g_1, g_2, \ldots, g_C]$ is the feature vector after GAP, $W_1 \in \mathbb{R}^{C\times C/n}$ and $W_2 \in \mathbb{R}^{C/n\times C}$ each represent the weight of a 1 × 1 convolution layer, ReLU represents the ReLU function, and σ represents the Sigmoid operation. It should be noted that fully connected layers are not used for the linear transformations, mainly to reduce the amount of computation and the number of model parameters. $\hat{g}$ gives a different weight in each channel dimension, and the final feature map is:

$$u_c = \hat{g}_c \otimes x_c \qquad (9)$$

where $u = [u_1, u_2, \ldots, u_C]$ denotes the enhanced feature map and $\otimes$ represents the multiplication in the channel dimension.
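A minimal sketch of the semantic information enhancement module under these definitions follows; realizing $W_1$ and $W_2$ as 1 × 1 convolutions matches the text, while the reduction ratio n = 4 is an assumption:

```python
import torch
import torch.nn as nn

class SemanticEnhancement(nn.Module):
    """Channel-wise enhancement: global average pooling -> two 1x1 convolutions
    with ReLU/Sigmoid -> per-channel weights applied to the input (sketch)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)                                  # g: channel statistics
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),      # W_1
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),      # W_2
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = self.fc(self.gap(x))   # (N, C, 1, 1) channel weights g_hat
        return x * weights               # u_c = g_hat_c * x_c
```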
The position information enhancement module is used for establishing the position-information correlation between local features and their neighborhoods. Its structure is relatively simple and can be expressed as follows:

$$q = W_s * x \qquad (10)$$
$$h_{i,j} = \sigma(q_{i,j}) \qquad (11)$$
$$v_{i,j} = h_{i,j} \cdot x_{i,j} \qquad (12)$$

where $x = [x_{1,1}, x_{1,2}, \ldots, x_{W,H}]$ and $x_{i,j}$ represents a slice of the input feature map along the channel dimension, with (i, j) the spatial position coordinates of the feature map, $i \in \{1, 2, \ldots, W\}$, $j \in \{1, 2, \ldots, H\}$; $W_s$ represents the mapping matrix of a 1 × 1 convolution operation, and q denotes the weight map obtained by mapping x through $W_s$; $h_{i,j}$ is $q_{i,j}$ scaled to [0, 1] by Sigmoid and represents the importance of the position information at position (i, j) in the feature map. The final enhanced feature map is therefore given by equation (12), where $v = [v_{1,1}, v_{1,2}, \ldots, v_{W,H}]$ and v represents the enhanced feature map.

Through the parallel processing of the two modules, the feature map is enhanced with position information and semantic information at the same time, and the final result is:

$$y = u + v \qquad (13)$$
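The position information enhancement module and the parallel fusion of the two branches could then be sketched as follows; $W_s$ is realized as a 1 × 1 convolution producing a single-channel weight map, and the SemanticEnhancement module from the previous sketch is reused:

```python
import torch
import torch.nn as nn

class PositionEnhancement(nn.Module):
    """Spatial enhancement: a 1x1 convolution W_s maps the feature map to a
    single-channel weight map q, Sigmoid scales it to h in [0, 1], and
    v_ij = h_ij * x_ij re-weights every spatial position (sketch)."""
    def __init__(self, channels: int):
        super().__init__()
        self.w_s = nn.Conv2d(channels, 1, kernel_size=1)   # mapping matrix W_s
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.sigmoid(self.w_s(x))   # importance of the position information at (i, j)
        return x * h                    # broadcast over the channel dimension

class AttentionEnhancement(nn.Module):
    """Parallel semantic + position enhancement, fused by addition: y = u + v."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.semantic = SemanticEnhancement(channels, reduction)   # from the previous sketch
        self.position = PositionEnhancement(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.semantic(x) + self.position(x)
```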
the function of the discriminator is to distinguish the real label and the prediction result generated by the generator by acquiring high-order consistency between the two labels. The discriminator of the present invention employs a structure similar to a markov discriminator (PatchGAN), as shown in fig. 6. The Markov discriminator judges the truth of a picture block with a specific size in the image without inputting the whole image into the discriminator, and averages the judgment results of all the picture blocks to be used as the final output result of the discriminator. The method aims to reduce the dimensionality of input data, reduce the parameter quantity and improve the operation speed of the network on the premise of ensuring the precision. In this embodiment, the discriminator is formed by connecting 8 convolutional layers in series, the convolutional kernel size of the convolutional layer is 4 × 4, and the step lengths of the other convolutional layers are 2 except for the step length of the last convolutional layer being 1; the first convolutional layer employs a ReLU activation function, and the remaining convolutional layers employ a LeakyReLU as an activation function. Here, the last layer of the discriminator is not a full-link layer in the conventional discriminator, but is a convolutional layer, and in doing so, the final output is a matrix (as shown in fig. 6, where I represents the size of the input image of the discriminator), which has a local receptive field on the input image and is more favorable for the requirement of the semantic segmentation task. On the input of the discriminator, the probability maps of each class are multiplied by corresponding RGB (Red-Green-Blue) or IRRG (Infrared-Red-Green) images to obtain a new feature map as input instead of directly adopting truth labels or predicted values. The profile contains 3 × C channels (where C denotes the number of classes) and it is more advantageous for the discriminator to use the information in the original image to distinguish between predicted results and true values.
2. Training classification models
The invention trains the generator and the discriminator in an adversarial manner. The generator produces a segmentation result and the discriminator distinguishes the candidate sample from the real sample; according to the data distribution, the two compete with each other in a zero-sum game framework, which can be expressed as:

$$\min_{\theta_G}\max_{\theta_D} V(G, D) = \mathbb{E}_{Y}\big[\log D(Y)\big] + \mathbb{E}_{X}\big[\log\big(1 - D(G(X))\big)\big] \qquad (14)$$

where $X = \{X^{(1)}, X^{(2)}, \ldots, X^{(N)}\}$ denotes the set of input images, $Y = \{Y^{(1)}, Y^{(2)}, \ldots, Y^{(N)}\}$ denotes the corresponding set of truth labels, V represents the objective function of the minimax game, and E represents the expected value over a distribution; $D(\cdot)$ represents the discriminator and $\theta_D$ the parameters of the discriminator D; $G(\cdot)$ denotes the generator (here, the classification network described above) and $\theta_G$ the parameters of the generator G.
(1) Training discriminator
In the classification model based on a generative adversarial network, the loss function of the discriminator can be defined in the following form:

$$\mathcal{L}_D(\theta_D) = \sum_{n=1}^{N}\Big[\mathcal{L}_{bce}\big(D(X^{(n)}, Y^{(n)}), 1\big) + \mathcal{L}_{bce}\big(D(X^{(n)}, G(X^{(n)})), 0\big)\Big] \qquad (15)$$
$$\mathcal{L}_{bce}(x, y) = -\big[y\log x + (1 - y)\log(1 - x)\big]$$

where $\theta_D$ represents the parameters of the discriminator D, $\mathcal{L}_{bce}$ represents the binary cross-entropy loss (i.e. the adversarial loss) [40, 42], and $D(\cdot)$ denotes the discriminator's judgment of whether its input X is the prediction $G(X^{(n)})$ from the generator or the truth label $Y^{(n)}$; y represents the one-hot code of a certain class in the truth label, and x represents the prediction of a certain class generated by the generator.
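Under these definitions, the discriminator loss could be computed as follows (a sketch that reuses the fused-input discriminator above; label 1 is assigned to truth-label inputs and 0 to generator predictions):

```python
import torch
import torch.nn.functional as F

def discriminator_loss(discriminator, images, onehot_labels, generator_probs):
    """BCE(D(X, Y), 1) + BCE(D(X, G(X)), 0): the discriminator learns to tell
    truth labels from generator predictions (sketch)."""
    real_score = discriminator(images, onehot_labels)             # should be judged "true" (1)
    fake_score = discriminator(images, generator_probs.detach())  # should be judged "false" (0)
    real_loss = F.binary_cross_entropy(real_score, torch.ones_like(real_score))
    fake_loss = F.binary_cross_entropy(fake_score, torch.zeros_like(fake_score))
    return real_loss + fake_loss
```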
(2) Training generator
The generator is trained with a mixed loss function $\mathcal{L}_G$ so that it produces samples whose authenticity is difficult for the discriminator to distinguish. $\mathcal{L}_G$ comprises two parts, $\mathcal{L}_{adv}$ and $\mathcal{L}_{mfl}$: $\mathcal{L}_{adv}$ is used to reduce the performance of the discriminator, and $\mathcal{L}_{mfl}$, the multi-path fused focal loss, is used to generate a correct classification prediction for each pixel of the input image. It is expressed as follows:

$$\mathcal{L}_G(\theta_G) = \mathcal{L}_{mfl}(\theta_G) + \lambda\,\mathcal{L}_{adv}(\theta_G) \qquad (16)$$

where $\theta_G$ represents the parameters of the generator G, and $\lambda$ is the coefficient of the linear combination of $\mathcal{L}_{adv}$ and $\mathcal{L}_{mfl}$, acting as the penalty factor of $\mathcal{L}_{adv}$.
The multi-path fused focal loss is defined as follows:

$$FL(p, q) = -\frac{1}{N}\sum_{n=1}^{N}\sum_{c=1}^{C} q_c^{(n)}\big(1 - p_c^{(n)}\big)^{\gamma}\log p_c^{(n)} \qquad (17)$$

where $q_c^{(n)}$ represents the one-hot encoded label of class c for the n-th image in each batch, $p_c^{(n)}$ refers to the Softmax layer output, C represents the total number of classes, and N refers to the total number of images participating in training in each batch. γ is called the "focusing parameter", and its role is to make the network focus on samples that are difficult to classify; in the invention γ = 2 is taken.
In order to further improve the network performance, feature maps extracted from different layers of the decoder are combined with the Focal Loss to form a new loss function. As shown in FIG. 7, the outputs of Level-5 to Level-2 and of the Softmax layer in the decoder are extracted separately; because the channel dimensions of the feature maps extracted from different layers differ (the feature channel dimensions extracted from Level-5 to Level-2 are 256, 256, 64, respectively, while the channel dimension output by the Softmax layer equals the total number of categories), the feature maps extracted from Level-5 to Level-2 are each passed through a 1 × 1 convolutional layer to keep them consistent with the Softmax output in the channel dimension, the Focal Loss of each path is computed separately, and the results are summed:

$$FL_{fusion} = \sum_{i=1}^{M} FL(p, q)_i \qquad (18)$$

where $FL(p, q)_i$ represents the Focal Loss calculated for the feature map extracted from the i-th layer, and M denotes the total number of extraction layers, here M = 5.
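A sketch of the multi-path fused Focal Loss is given below; the 1 × 1 projections of the intermediate feature maps are assumed to be applied inside the generator, and resizing each path to the label resolution is an added assumption not spelled out in the text:

```python
import torch
import torch.nn.functional as F

def focal_loss(probs: torch.Tensor, onehot: torch.Tensor, gamma: float = 2.0) -> torch.Tensor:
    """Focal Loss for per-pixel multi-class prediction.
    probs, onehot: (N, C, H, W); gamma is the focusing parameter (gamma = 2 here)."""
    probs = probs.clamp(min=1e-7)
    loss = -onehot * (1.0 - probs) ** gamma * torch.log(probs)
    return loss.sum(dim=1).mean()

def multipath_focal_loss(path_probs: list, onehot: torch.Tensor, gamma: float = 2.0) -> torch.Tensor:
    """Sum of Focal Losses over M prediction paths (Level-5 ... Level-2 after their
    1x1 convolutions, plus the Softmax output)."""
    total = torch.zeros((), device=onehot.device)
    for probs in path_probs:
        if probs.shape[-2:] != onehot.shape[-2:]:
            # Assumed: resize each path to the label resolution before comparison.
            probs = F.interpolate(probs, size=onehot.shape[-2:], mode="bilinear", align_corners=False)
        total = total + focal_loss(probs, onehot, gamma)
    return total
```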
Adversarial training usually updates the discriminator parameters $\theta_D$ and the generator parameters $\theta_G$ step by step. Training of the AREANs proposed by the invention is therefore divided into two steps: first, the generator parameters $\theta_G$ are fixed and the discriminator parameters $\theta_D$ are updated so that the discriminator can distinguish the prediction results; second, the discriminator parameters $\theta_D$ are fixed and the generator parameters $\theta_G$ are updated so that the generator generates predictions whose authenticity is difficult for the discriminator to distinguish. Through this adversarial training, the statistical relationship between the prediction result and the corresponding truth label at the high-level semantic level can be established, and at the same time, in the confrontation with the discriminator, each layer in the generator can play its role.
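One adversarial training iteration combining the two steps might then look like the following sketch, which reuses the discriminator_loss and multipath_focal_loss functions sketched earlier; λ is the adversarial penalty factor:

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(generator, discriminator, opt_g, opt_d,
                              images, onehot_labels, lam: float = 0.05):
    """Step 1: fix the generator, update the discriminator.
       Step 2: fix the discriminator, update the generator (sketch)."""
    # --- Step 1: update discriminator parameters theta_D ---
    with torch.no_grad():
        probs = generator(images)
    opt_d.zero_grad()
    d_loss = discriminator_loss(discriminator, images, onehot_labels, probs)
    d_loss.backward()
    opt_d.step()

    # --- Step 2: update generator parameters theta_G ---
    opt_g.zero_grad()
    probs = generator(images)
    fake_score = discriminator(images, probs)
    adv_loss = F.binary_cross_entropy(fake_score, torch.ones_like(fake_score))  # fool the discriminator
    seg_loss = multipath_focal_loss([probs], onehot_labels)                     # multi-path fused Focal Loss
    g_loss = seg_loss + lam * adv_loss
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```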
Experimental verification
To further validate the classification performance of the classification method of the invention, this experiment was trained on the infrared-red-green (IR-R-G) three-channel images of the Vaihingen and Potsdam datasets.
In the first training stage the discriminator does not participate, and the generator is trained with the proposed multi-path fused Focal Loss. The encoder weights are initialized with public pre-training models (the VGG-19, ResNet-101 and ResNeSt-101 models pre-trained on ImageNet are used in this experiment), and the decoder is initialized with the Kaiming method; the initial learning rate is set to 10⁻⁴, and a total of 10⁵ iterations are performed.

In the second training stage the discriminator is introduced for adversarial training; the generator uses the model weights obtained in the first stage, and the discriminator is initialized with the Kaiming method. For the learning rates, the invention adopts TTUR (Two Time-Scale Update Rule) to set the learning rates of the two networks to different values; this method makes the training of GANs more stable without increasing the time cost. The initial learning rate of the generator is set to 10⁻⁴ and the initial learning rate of the discriminator to 5 × 10⁻⁴; λ is set to 0.05, and a total of 3 × 10⁵ iterations are performed.

No interruption is needed between the two training stages, forming an end-to-end training mode. Adam is adopted to train the proposed AREANs framework with β₁ = 0.5, β₂ = 0.999 and a weight decay of 10⁻⁴; the learning rate decays by half every 10⁵ iterations, and batch_size = 16.
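The optimizer configuration described above could be set up as in the following sketch; the generator and discriminator modules are assumed to exist, and the per-iteration StepLR scheduler is one possible way to realize the halving of the learning rate:

```python
import torch

def build_optimizers(generator: torch.nn.Module, discriminator: torch.nn.Module):
    """TTUR: the generator and the discriminator use different learning rates (sketch)."""
    opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4,
                             betas=(0.5, 0.999), weight_decay=1e-4)
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=5e-4,
                             betas=(0.5, 0.999), weight_decay=1e-4)
    # Halve the learning rate every 1e5 iterations (scheduler stepped once per iteration).
    sched_g = torch.optim.lr_scheduler.StepLR(opt_g, step_size=100_000, gamma=0.5)
    sched_d = torch.optim.lr_scheduler.StepLR(opt_d, step_size=100_000, gamma=0.5)
    return opt_g, opt_d, sched_g, sched_d
```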
The experiment assesses the degree of performance improvement by comparing the classification accuracy of the baseline methods with that of the methods after different improvement strategies are applied to the two datasets. "Baseline1", "Baseline2" and "Baseline3" respectively denote the reference methods with VGG-19, ResNet-101 and ResNeSt-101 as encoders, before the AR structure and adversarial learning are added; "Baseline1+AR", "Baseline2+AR" and "Baseline3+AR" are the improved methods after adding the AR structure; "Baseline1+GAN", "Baseline2+GAN" and "Baseline3+GAN" represent the improved methods after introducing adversarial learning; "Baseline1+AR+GAN", "Baseline2+AR+GAN" and "Baseline3+AR+GAN" represent the improved methods after adding the AR structure and adversarial learning, i.e. the improved methods of the AREANs architecture proposed by the invention. Table 1 compares the performance on the Vaihingen dataset before and after introducing the AR structure and adversarial learning and when using the proposed method with the three backbone-network encoders. The F1 score is used as the evaluation index for each ground-object class, and the overall performance indices of the methods are the mean intersection-over-union (mIoU) and the Overall Accuracy (OA).
TABLE 1
In summary, the AREANs architecture proposed by the invention obtains the best test results on the Vaihingen dataset, and the improved methods with the three different backbone networks show larger performance improvements than the reference methods: in terms of mIoU they improve by 4.57%, 7.22% and 10.13% over the reference methods, respectively; in terms of OA they improve by 1.09%, 2.53% and 3.40%, respectively.
As can be seen from Table 1, apart from the proposed AREANs, the reference methods also show a certain performance improvement over the three baselines after the adversarial learning strategy is adopted. This is mainly because the GAN can learn the high-order consistency between the truth label and the predicted label, so that the prediction result of the model lies on the same manifold as the training data as much as possible, which improves the accuracy of the prediction.
Among the three improved methods based on the AREANs framework, "Baseline3+AR+GAN" obtains the best experimental results on all indices. The ResNeSt-101 network is a variant of the ResNet-101 network; it improves the basic performance of the network by adding Split-Attention modules and a multi-path fusion improvement while keeping the same network depth. The performance of "Baseline3+AR+GAN" with ResNeSt-101 as the encoder improves greatly: mIoU improves by 10.13% and OA by 3.40%.
The AREANs methods with the three backbone networks improve by 4.23%, 9.27% and 9.38% in mIoU and by 1.58%, 2.66% and 2.89% in OA, respectively, which verifies that the classification accuracy can be effectively improved without increasing the number of iterations.
In terms of the classification of small targets (vehicles), the three AREANs-framework methods are all greatly improved, with F1 scores rising by 10.10%, 21.71% and 23.18%, respectively. This shows that the decoder improvement strategy combining the AR structure with adversarial learning makes the model more sensitive to tiny targets, especially on high-resolution remote sensing images with complex texture and structure.
The AREANs proposed by the invention (AREANs-VGG, AREANs-ResNet and AREANs-ResNeSt) are compared with other classical networks on the Vaihingen dataset. FCN, UNet, SegNet, PSPNet and DeepLabv3+ are the five most typical networks in deep learning. TreUNet adopts an adaptive Tree-CNN module, builds a confusion matrix with Tree-CNN blocks, and reduces misclassification in prediction with a tree-pruning algorithm. HSN replaces different convolutional layers with Inception modules, enlarging the multi-scale receptive field of the network for rich texture information and enhancing its prediction ability. REMSNet improves the network's perception of global texture information by constructing a parallel multi-kernel deconvolution module and adding an attention mechanism. SPNet proposes a lightweight network design by introducing strip pooling and mixed pooling modules and combining an attention mechanism with the idea of multi-path fusion. All of the above networks adopt the encoder-decoder architecture; the specific experimental results are shown in Table 3.
TABLE 3
In general, the classification results of the different models are relatively close, mainly because they all use an architecture similar to the encoder-decoder of the FCN. On this basis, the performance of the networks is improved to different degrees by different means, such as establishing relationships between high- and low-order features (UNet, SegNet), constructing multi-scale feature fusion (PSPNet, HSN) and adding attention mechanisms (REMSNet). Thanks to the Attention-Residual block and the adversarial training, the AREANs proposed by the invention obtain excellent results in terms of overall accuracy and F1 score, with AREANs-ResNeSt performing best: its OA and average F1 values improve by 1.73% and 3.78% over SegNet and by 3.11% and 4.91% over UNet, respectively, demonstrating the effectiveness of the proposed method.
The AREANs framework adopted by the invention focuses on extracting useful features including global context information and local position information; the adversarial training method implicitly learns high-order structure information in the training stage and refines the prediction result in the testing stage without extra time cost (as shown in Table 4).
TABLE 4
Partial prediction results of the different model methods on the Vaihingen dataset are shown in FIG. 8, where the first row of FIG. 8 shows six original images, the second row shows the truth label corresponding to each original image, and the remaining rows show the classification results obtained by the different algorithms. It can be seen that ground objects with different colors, sizes and textures cause large intra-class differences and small inter-class differences in the images, which makes the image classification task very difficult. Specifically, cars of different colors cause large intra-class differences, dark cars are very similar to their shadows on the road, and objects with similar texture are also easily misclassified, such as the buildings and impervious surfaces in the third column of FIG. 8 and the low vegetation and trees in the fifth and sixth columns of FIG. 8. In addition, at the edge of the whole image, an object may not be extracted due to the lack of correlation with the surrounding environment, as in the second column of the figure. FCN, UNet and SegNet perform poorly; the main reason for their misclassifications is that these networks cannot effectively reuse features, which may lead to a lack of useful context information. Although DeepLabv3+ and PSPNet employ ASPP modules and pyramid pooling modules, respectively, they are not effective for the pixel-level classification task on high-resolution aerial images. Although SPNet introduces strip pooling and hybrid pooling with an attention mechanism, it does not solve well the misclassification caused by class imbalance. In contrast, the AREANs architecture proposed here outperforms the other comparison methods thanks to the AR block and the adversarial training strategy. In addition, the generator of the AREANs architecture is optimized with the multi-path fused focal loss, which reduces the influence of the class-imbalance problem to a certain extent.
In addition, the classification method of the invention is compared with part of the published convolutional neural network models on the ISPRS benchmark (for details see: https://www2.isprs.org/communities/comm2/wg4/results/). These models mainly include:
SVL _ X: the method is provided by an ISPRS 2D Semantic Labeling control organization party and is used as a reference for comparing the participants. And (3) comprehensively using NVDI, nDSM and SVL characteristics, and adopting an Ada-boost classifier and a CRF model to obtain a final prediction result. In the following experiments, SVL _1 indicates that CRF post-treatment was performed, and SVL _3 indicates that CRF post-treatment was not performed on the prediction results.
RIT _ L7: the model combines a random tree algorithm to extract structural information in the image and the label, and the FCN utilizes the extracted structural information to classify the pixel level. And selecting IR-R-G three-channel images and DSM data as training data.
DLR _8: the model utilizes an ensemble learning method, and introduces edge detection by integrating characteristic information of FCN, segNet and VGG on different scales, so that the accuracy of segmentation results is improved; and selecting IR-R-G three-channel images and DSM data as training data.
UZ _1: the model adopts a decoder-encoder structure, learns the spatial information of input data through a decoder, completes the restoration of characteristic information by utilizing a deconvolution structure in the decoder, and also adds nDSM data as training data.
DST_X: adopts a hybrid FCN structure, takes images and DSM as training data, and applies CRF post-processing to the data; in the experiment, the variant trained with the Vaihingen dataset is called DST_2, and the variant trained with the Potsdam dataset is called DST_5.
BKHN_X: adopts a hybrid structure of FCN and ResNet-101, and besides the input images also uses DSM and nDSM as training data; in the experiment, the variant trained with the Vaihingen dataset is named BKHN_5 and the variant trained with the Potsdam dataset is named BKHN_3.
GSN: adopts an FCN structure with a gate-control mechanism and uses ResNet-101 as the encoder.
CASIA2: the model adopts a single self-cascaded network structure, and the encoder is a ResNet-101 variant; only three-channel IRRG images are used, without the elevation data (DSM and nDSM) of the ISPRS 2D Semantic Labeling Contest or any post-processing method, the same as for the AREANs described herein.
ADL_3: the method combines CNN and hand-crafted features to realize pixel-level classification of dense image patches. A random forest classifier is trained with the hand-crafted features and combined with the CNN to obtain a preliminary prediction of the image; finally, the prediction result is refined with a CRF.
ONE_7: the method fuses the prediction results of SegNets at two scales, and uses IRRG images together with NDVI (Normalized Difference Vegetation Index), DSM and nDSM data as training data.
BUCTY5: uses a tree-shaped CNN structure combined with a pruning algorithm, and trains the network with IRRGB and DSM data.
The quantitative comparison results of the invention (AREANs for short) with the above methods on the ISPRS Vaihingen test set are shown in Table 5, and the quantitative comparison results on the ISPRS Potsdam test set are shown in Table 6.
TABLE 5
TABLE 6
Tables 5 and 6 show the quantitative comparison of the AREANs proposed by the invention with the other methods disclosed in the ISPRS 2D Semantic Labeling Contest. Overall, AREANs achieve excellent performance on both datasets: AREANs-ResNeSt reaches OA values of 91.3% and 91.9% on the Vaihingen and Potsdam datasets, respectively. Moreover, the correct recognition rate of small targets (the "vehicle" class) is greatly improved compared with the other methods on the benchmark, with F1 scores of 90.5% and 97.0% on the Vaihingen and Potsdam datasets, respectively.
Because high-resolution remote sensing images contain a large amount of complex texture and structure information, selecting a deeper and stronger model as the feature extraction network has become one of the ways to improve the overall performance of a model. Following this idea, GSN, CASIA2 and BKHN adopt the pre-trained ResNet-101, which performs excellently on natural image datasets, as the encoder for feature extraction from high-resolution remote sensing images. Such a design is mainly based on the following reasons: fine-tuning from a pre-trained model can improve the generalization ability of the network, whereas a randomly initialized network may pay more attention to the spectral information of the image target, neglect its semantic information and thus generalize worse. Therefore, the classification accuracy of these three methods is better than that of the other methods; in particular, the classification accuracy of CASIA2 and BKHN is only slightly lower than that of the invention.
Among the AREANs architectures proposed by the invention, AREANs-ResNeSt is optimal on all indices on both datasets, AREANs-ResNet is slightly inferior to it, and AREANs-VGG is relatively the weakest but still has certain advantages over the other methods on the benchmark. Thanks to the adversarial learning training strategy, compared with DST_X, ONE_7 and DLR_8, the proposed framework can effectively improve the classification accuracy of the model while testing with the generator only, without increasing the time consumption or the number of parameters. In addition, a pre-trained CNN is adopted in the encoder, which avoids the overfitting phenomenon caused by the strong correlation between remote sensing image patches. Meanwhile, in order to improve the stability of generative adversarial network training, the TTUR strategy is used, which reduces the training difficulty. The proposed method does not use any additional data (such as DSM, nDSM, NDVI or hand-crafted features) for assistance, does not adopt additional classifiers or post-processing steps (such as the CRF in RIT_L7, DST_X and ADL_3), and does not use model ensembling to improve the classification accuracy of the network.
FIGS. 9-a to 9-i and FIGS. 10-a to 10-l show the classification results obtained for one image randomly selected from each of the two datasets. It can be seen that embedding the Attention-Residual block effectively improves the model's perception of position and semantic information, and, combined with the pre-training model, a better classification result is obtained.

Claims (10)

1. A same-domain remote sensing image classification method based on adversarial learning, characterized by comprising the following steps:
acquiring remote sensing image data to be classified, and inputting the remote sensing image data into a trained classification model for classification; the classification model comprises a generator and a discriminator, wherein the generator adopts an encoder-decoder structure and is used for obtaining the pixel-level classification prediction result of the remote sensing image; the discriminator adopts a convolutional neural network and is used for distinguishing the real label from the prediction result generated by the generator by capturing the high-order consistency between the two; the generator and the discriminator are trained in an adversarial manner.
2. The same-domain remote sensing image classification method based on adversarial learning according to claim 1, wherein the encoder adopts a feature extraction network for mapping the input remote sensing image data to a high-dimensional feature space; the decoder adopts a multilayer convolutional neural network comprising several convolutional neural networks at different depths, each of which is provided with a residual part, an upsampling part and an attention enhancement part; the output of the encoder is used as the input of the decoder, and the decoder is connected with the corresponding layers in the encoder so as to fuse low-level feature position information with high-level feature semantic information.
3. The same-domain remote sensing image classification method based on adversarial learning according to claim 2, wherein the residual part comprises two convolution modules, the input and output of the residual part are connected by a cross-layer connection, and the input of the residual part receives the concatenated features of the previous decoder layer and the corresponding encoder layer; the input of the upsampling part receives the output of the residual part and restores the feature map processed by the residual part to the size of the corresponding high-order feature map.
4. The same-domain remote sensing image classification method based on adversarial learning according to claim 2, wherein the attention enhancement part comprises a semantic information enhancement module and a position information enhancement module processed in parallel, the input of both modules being the feature map output by the upsampling part; the semantic information enhancement module operates on the channel dimension of the input feature map and models the relationship of specific semantic information between high-order and low-order feature maps by using the correlation between high-order and low-order channels; the position information enhancement module establishes the position-information correlation between local features of the input feature map and their neighborhoods.
5. The same-domain remote sensing image classification method based on adversarial learning according to claim 4, wherein the processing procedure of the semantic information enhancement module is as follows:
obtaining the statistical information of the input feature map in the channel dimension by using global average pooling;
determining the weight of each channel dimension from the obtained channel statistics by using linear transformations and activation functions, where the weight is calculated as

$$\hat{g} = \sigma\big(W_2 \cdot \mathrm{ReLU}(W_1 \cdot g_c)\big)$$

where $g_c$ represents the feature vector obtained through global average pooling, $W_1 \in \mathbb{R}^{C\times C/n}$ and $W_2 \in \mathbb{R}^{C/n\times C}$ respectively represent the weights of 1 × 1 convolution layers, ReLU represents the ReLU function, σ represents the Sigmoid operation, and C represents the total number of categories;
based on the weights in each channel dimension, an enhanced feature map is determined.
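A minimal sketch of the semantic information enhancement module in claim 5, assuming a squeeze-and-excitation-style layout: global average pooling produces the per-channel statistics g_c, two 1×1 convolutions with ReLU and Sigmoid produce the channel weights w = σ(W_2·ReLU(W_1·g_c)), and the input feature map is rescaled by those weights. The reduction ratio and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SemanticEnhancement(nn.Module):
    """Channel-attention enhancement: weight each channel of the input feature map."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                        # g_c: [N, C, 1, 1]
        self.fc1 = nn.Conv2d(channels, channels // reduction, 1)   # W_1 (1x1 convolution)
        self.fc2 = nn.Conv2d(channels // reduction, channels, 1)   # W_2 (1x1 convolution)

    def forward(self, x):
        g = self.pool(x)                                           # channel-wise statistics
        w = torch.sigmoid(self.fc2(torch.relu(self.fc1(g))))       # channel weights in [0, 1]
        return x * w                                               # enhanced feature map

if __name__ == "__main__":
    module = SemanticEnhancement(channels=32)
    print(module(torch.rand(1, 32, 16, 16)).shape)   # torch.Size([1, 32, 16, 16])
```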
6. The method for classifying remote sensing images in the same domain based on adversarial learning according to claim 4, wherein the enhanced feature map obtained by the position information enhancement module is as follows:
v_{i,j} = h_{i,j} · x_{i,j}
h_{i,j} = σ(q_{i,j})
q = W_s · x
wherein v = [v_{1,1}, v_{1,2}, …, v_{W,H}] denotes the enhanced feature map; x = [x_{1,1}, x_{1,2}, …, x_{W,H}], and x_{i,j} ∈ R^C denotes a slice of the input feature map along the channel dimension, with (i, j) corresponding to the spatial position coordinates of the feature map, i ∈ {1, 2, …, W}, j ∈ {1, 2, …, H}; W_s ∈ R^{1×C} denotes the mapping matrix of a 1×1 convolution operation; q ∈ R^{W×H} denotes the weight map obtained by mapping x through W_s; h_{i,j} is q_{i,j} scaled to [0, 1] by the Sigmoid operation and represents the importance of the position information at position (i, j) in the feature map.
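A minimal sketch of the position information enhancement module in claim 6: a 1×1 convolution W_s maps the C-channel input x to a single-channel weight map q, the Sigmoid function rescales it to h ∈ [0, 1], and v_{i,j} = h_{i,j}·x_{i,j} re-weights every spatial position. Channel counts are illustrative.

```python
import torch
import torch.nn as nn

class PositionEnhancement(nn.Module):
    """Spatial-attention enhancement: weight each (i, j) position of the input feature map."""
    def __init__(self, channels):
        super().__init__()
        self.w_s = nn.Conv2d(channels, 1, kernel_size=1)   # W_s: per-position 1xC mapping

    def forward(self, x):                  # x: [N, C, H, W]
        q = self.w_s(x)                    # q: [N, 1, H, W] position weight map
        h = torch.sigmoid(q)               # importance of each position, in [0, 1]
        return h * x                       # v: enhanced feature map

if __name__ == "__main__":
    module = PositionEnhancement(channels=32)
    print(module(torch.rand(1, 32, 16, 16)).shape)   # torch.Size([1, 32, 16, 16])
```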
7. The method for classifying remote sensing images in the same domain based on adversarial learning according to claim 2, wherein the generator adopts the following loss function during training:
L_G(θ_G) = L_mfl(θ_G) + λ · L_adv(θ_G)
wherein L_adv(θ_G) is the adversarial term used to reduce the performance of the discriminator, L_mfl(θ_G) is the multi-path fusion focal loss used to generate a correct classification prediction for each pixel of the input image, θ_G denotes the parameters of the generator G, L_G(θ_G) is a linear combination of L_adv(θ_G) and L_mfl(θ_G) with weight λ, and D(·) indicates that the discriminator judges whether the input X is the predicted value G(X^(n)) from the generator or the true-value label Y^(n).
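A minimal sketch of the generator objective in claim 7, under the common adversarial-segmentation formulation: the total loss linearly combines a per-pixel classification loss (a plain focal loss stands in here for the patent's multi-path fusion focal loss, whose exact form is not reproduced) with an adversarial term that becomes small when the discriminator takes the prediction G(X) for a true label. lam and gamma are illustrative values, not taken from the patent.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, target, gamma=2.0):
    """Per-pixel focal loss; logits [N, C, H, W], target [N, H, W] class indices."""
    ce = F.cross_entropy(logits, target, reduction="none")
    pt = torch.exp(-ce)                              # probability assigned to the true class
    return ((1.0 - pt) ** gamma * ce).mean()

def generator_loss(discriminator, logits, target, lam=0.1):
    """L_G = L_mfl + lam * L_adv (plain focal loss used as a stand-in for L_mfl)."""
    pred = torch.softmax(logits, dim=1)
    d_out = discriminator(pred)                      # discriminator score for the prediction
    # adversarial term: push the discriminator toward labelling the prediction as real (1)
    adv = F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))
    return focal_loss(logits, target) + lam * adv
```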
8. The method for classifying remote sensing images in the same domain based on adversarial learning according to claim 1 or 2, wherein the discriminator is formed by connecting 8 convolutional layers in series; the kernel size of each convolutional layer is 4×4, the stride of the last convolutional layer is not 1, and the strides of the first to seventh convolutional layers are all 2; the first convolutional layer employs a ReLU activation function, and the remaining convolutional layers employ a LeakyReLU activation function.
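A minimal sketch of a discriminator following the layout stated in claim 8: eight convolutional layers in series with 4×4 kernels, ReLU after the first layer and LeakyReLU after the rest. The claim only constrains the last layer's stride to differ from 1, so stride 2 is used throughout here as an assumption; channel widths and the input channel count are illustrative, not taken from the patent.

```python
import torch
import torch.nn as nn

def build_discriminator(in_ch=6, widths=(64, 64, 128, 128, 256, 256, 512, 1)):
    """Eight 4x4 convolutions in series; the single-channel output acts as the real/fake score."""
    layers, prev = [], in_ch
    for i, w in enumerate(widths):
        layers.append(nn.Conv2d(prev, w, kernel_size=4, stride=2, padding=1))
        # claim 8: ReLU after the first layer, LeakyReLU after the remaining layers
        layers.append(nn.ReLU(inplace=True) if i == 0 else nn.LeakyReLU(0.2, inplace=True))
        prev = w
    return nn.Sequential(*layers)

if __name__ == "__main__":
    d = build_discriminator()
    score = d(torch.rand(1, 6, 256, 256))   # input: a prediction map or one-hot label map
    print(score.shape)                      # torch.Size([1, 1, 1, 1])
```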
9. The method for classifying remote sensing images in the same domain based on adversarial learning according to claim 7, wherein the loss function of the discriminator can be defined as follows:
L_D(θ_D) = Σ_n [ L_bce(D(Y^(n)), 1) + L_bce(D(G(X^(n))), 0) ]
L_bce(x, y) = −[ y · ln(x) + (1 − y) · ln(1 − x) ]
wherein θ_D denotes the parameters of the discriminator D, L_bce(·, ·) denotes the binary cross-entropy loss, D(·) indicates that the discriminator judges whether the input x is the predicted value G(X^(n)) from the generator or the true-value label Y^(n), y denotes a certain class of one-hot codes in the truth label, and x denotes a certain class of prediction results generated by the generator.
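A minimal sketch of the discriminator objective in claim 9, assuming the standard binary cross-entropy form: the discriminator is trained to output 1 for the one-hot ground-truth label map Y and 0 for the generator's softmax prediction G(X). The function name and argument conventions are illustrative.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(discriminator, fake_pred, real_label):
    """fake_pred: softmax output of the generator; real_label: one-hot truth label map."""
    d_real = discriminator(real_label)
    d_fake = discriminator(fake_pred.detach())       # do not backpropagate into the generator
    loss_real = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
    loss_fake = F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    return loss_real + loss_fake
```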
10. The method for classifying remote sensing images in the same domain based on adversarial learning according to claim 9, wherein the discriminator parameters θ_D and the generator parameters θ_G are updated alternately: the generator parameters θ_G are first fixed and the discriminator parameters θ_D are updated so that the discriminator can distinguish the prediction results from the true labels; the discriminator parameters θ_D are then fixed and the generator parameters θ_G are updated so that the generator produces prediction results that the discriminator cannot distinguish as true or false.
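A minimal sketch of the alternating update scheme in claim 10, reusing the generator_loss and discriminator_loss sketches above: within each iteration the generator is held fixed while the discriminator is updated, then only the generator optimizer steps while the generator is updated. Optimizers, learning rates and the data pipeline are illustrative assumptions.

```python
import torch

def train_step(generator, discriminator, opt_g, opt_d, image, label_onehot, label_idx):
    """One adversarial step; opt_g holds only generator parameters, opt_d only discriminator parameters."""
    # 1) fix theta_G, update theta_D so it can tell predictions from true labels
    with torch.no_grad():
        fake = torch.softmax(generator(image), dim=1)
    opt_d.zero_grad()
    d_loss = discriminator_loss(discriminator, fake, label_onehot)
    d_loss.backward()
    opt_d.step()

    # 2) fix theta_D (only opt_g steps), update theta_G so its predictions fool the discriminator
    opt_g.zero_grad()
    logits = generator(image)
    g_loss = generator_loss(discriminator, logits, label_idx)
    g_loss.backward()
    opt_g.step()
    return g_loss.item(), d_loss.item()
```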
CN202110738534.4A 2021-06-30 2021-06-30 Same-domain remote sensing image classification method based on counterstudy Pending CN115564982A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110738534.4A CN115564982A (en) 2021-06-30 2021-06-30 Same-domain remote sensing image classification method based on counterstudy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110738534.4A CN115564982A (en) 2021-06-30 2021-06-30 Same-domain remote sensing image classification method based on counterstudy

Publications (1)

Publication Number Publication Date
CN115564982A true CN115564982A (en) 2023-01-03

Family

ID=84736665

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110738534.4A Pending CN115564982A (en) 2021-06-30 2021-06-30 Same-domain remote sensing image classification method based on counterstudy

Country Status (1)

Country Link
CN (1) CN115564982A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116977466A (en) * 2023-07-21 2023-10-31 北京大学第三医院(北京大学第三临床医学院) Training method for enhancing CT image generation model and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination