CN113343953B - FGR-AM method and system for remote sensing scene recognition - Google Patents


Info

Publication number
CN113343953B
Authority
CN
China
Prior art keywords: module, remote sensing, convolution, image, features
Legal status: Active
Application number
CN202110894846.4A
Other languages: Chinese (zh)
Other versions: CN113343953A
Inventor
夏景明
丁悦
谈玲
Current Assignee
Nanjing Zhiqiang Information Technology Co.,Ltd.
Original Assignee
Nanjing University of Information Science and Technology
Priority date: 2021-08-05
Filing date: 2021-08-05
Publication date: 2021-12-21
Application filed by Nanjing University of Information Science and Technology
Priority to CN202110894846.4A
Publication of CN113343953A (application)
Application granted; publication of CN113343953B (grant)


Classifications

    • G06F18/241: Pattern recognition; analysing; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N3/045: Computing arrangements based on biological models; neural networks; architectures; combinations of networks
    • G06N3/048: Computing arrangements based on biological models; neural networks; architectures; activation functions
    • G06N3/08: Computing arrangements based on biological models; neural networks; learning methods

Abstract

The invention discloses an FGR-AM method for remote sensing scene recognition, which comprises the following steps: performing effective information enhancement processing and ineffective information suppression processing on the image features extracted by the 3rd and 5th bottleneck convolution modules; extracting, from the image features of the 3rd bottleneck convolution module, the contour information and the visually more interesting features contained in the remote sensing image, and extracting, from the image features of the 5th bottleneck convolution module, the detail features contained in the remote sensing image; fusing the channel-attention- and spatial-attention-enhanced features; and mapping the multi-dimensional features to orthogonal k-dimensional features to identify and classify the remote sensing images. By taking both the main features and the detail features of the image into account, and by extracting and fusing the information of interest and the detail information, the method improves the recognition accuracy of the network and enables it to recognize scenes accurately in both complex scenes and highly similar scenes.

Description

FGR-AM method and system for remote sensing scene recognition
Technical Field
The invention relates to the technical field of computer vision, and in particular to an FGR-AM method and an FGR-AM system for remote sensing scene recognition.
Background
Remote sensing scene classification divides an image into blocks and assigns each block an appropriate category (such as residential area, farmland, river, or forest) according to its composition. This is of great significance for image management, retrieval, analysis, and the detection and recognition of typical targets. As resolution increases, images become more diverse, allowing fine-grained classification and recognition. At the same time, high-resolution remote sensing images contain richer detail, their features are more varied, and objects on the ground are usually interleaved. The similarity between images of the same class decreases, and the differences within a class increase significantly. In addition, the rotational and positional relationships between objects in an image must also be considered. These problems make high-precision scene classification challenging.
The rapid development of high-resolution remote sensing imagery brings new opportunities for remote sensing scene classification, but also a greater challenge: rich image detail contains more invalid information. For example, park grassland and golf courses are highly similar, and after deep feature extraction, excessive detail information can mislead the network's judgment. A greedy layer-wise unsupervised pre-training algorithm has been proposed that performs well in both aerial scene classification and high-resolution land-use classification. Most current high-precision scene classification methods adopt deep CNNs (such as VGG16, GoogLeNet, and ResNet50). However, because remote sensing images have few categories and relatively little labeled data, applying deep convolutional features to them directly is difficult. A multi-subset feature fusion method has therefore been proposed that fuses the deep features extracted by several convolutional neural networks and integrates their global and local information, yielding lower-dimensional features with stronger discriminative power.
In recent years, inspired by the human visual mechanism, attention mechanisms have improved the performance of many CNN-based vision tasks. The Convolutional Block Attention Module (CBAM) applies channel attention and spatial attention in turn, and SKNet fuses attention-enhanced multi-scale features to achieve an approximately adaptive selection of the receptive field. For example, the invention with publication number CN112861978A provides an attention-based multi-branch feature fusion method for remote sensing scene image classification, aiming to solve the low accuracy of existing remote sensing scene classification methods. Its process is: step one, acquire a remote sensing image and preprocess it to obtain a preprocessed remote sensing image; step two, build an attention-based multi-branch feature fusion convolutional neural network, AMB-CNN; step three, train AMB-CNN with the preprocessed remote sensing images to obtain a pre-trained attention-based AMB-CNN; and step four, classify the remote sensing images to be recognized with the trained AMB-CNN. The invention with publication number CN113052188A discloses a remote sensing image target detection method that extracts multi-scale feature maps with a ResNet residual network and fuses them through cross-channel information fusion according to the target characteristics, enhancing the semantic information and richness of the features to obtain fused multi-scale feature maps; an attention mechanism is introduced on the fused feature maps to generate probabilistic saliency maps, weakening redundant background information in the remote sensing image and enhancing target saliency; and the position information of each key point of the detection frame after the first regression is introduced to reconstruct a feature map with position information for the final multi-class classification and localization prediction. Of these two examples, the former extracts the features of the third convolution module, processes them with an attention module, and fuses them with the features of the original convolution module, classifying remote sensing scene images more accurately at lower complexity; the latter combines the target features of the remote sensing image and processes them with an attention module, handling the small target sizes, complex background information, and imprecise localization found in remote sensing images.
However, neither method is suited to high-resolution remote sensing images that contain scenes of high similarity, or that contain both highly similar scenes and widely differing scenes at the same time. In fact, most existing methods that extract remote sensing scene feature maps with a neural network either ignore detail information when attending to the main features of the image, or, after extracting detail excessively, lose recognition accuracy in highly similar scenes, so these problems remain difficult to solve.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides an FGR-AM method and system for remote sensing scene recognition that take both the main features and the detail features of the image into account, so that the extracted features contain rich detail features (the rich detail extracted by the 5th channel allows scene categories to be recognized accurately in differing remote sensing scenes) without ignoring the visually more interesting information (the features extracted by the 3rd channel both enhance effective information and effectively filter out ineffective information in the attention module); by extracting and fusing the information of interest and the detail information, the recognition accuracy of the network is improved, and the network can recognize scenes accurately in both complex and similar scenes.
In order to achieve this purpose, the invention adopts the following technical scheme:
In a first aspect, an embodiment of the present invention provides an FGR-AM method for remote sensing scene recognition, where the FGR-AM method includes the following steps:
S1, performing feature extraction on the input original remote sensing image by adopting 5 bottleneck convolution modules connected in sequence;
S2, performing effective information enhancement processing and ineffective information suppression processing on the image features extracted by the 3rd and 5th bottleneck convolution modules;
S3, extracting the contour information and the visually more interesting features contained in the remote sensing image from the image features extracted by the 3rd bottleneck convolution module, and extracting the detail features contained in the remote sensing image from the image features extracted by the 5th bottleneck convolution module;
S4, aggregating the channel-attention- and spatial-attention-enhanced features by adopting a bilinear fine-grained feature fusion module, and fusing the extracted contour information contained in the remote sensing image, the visually more interesting features, and the detail features contained in the remote sensing image to form a bilinear vector with a globally consistent spatial and channel representation;
S5, adopting a principal component analysis module to map the multi-dimensional features generated in step S4 to orthogonal k-dimensional features, and identifying and classifying the remote sensing images.
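As an illustrative aid (not part of the claimed implementation), the data flow of steps S1 to S5 can be sketched in PyTorch as follows. The class and argument names are assumptions; the bottleneck, attention, and fusion components are the ones detailed in the remainder of this disclosure, and the PCA stage of S5 is applied to the fused vector downstream.

```python
import torch
import torch.nn as nn

class FGRAMPipeline(nn.Module):
    """Wires together the components described in steps S1-S5 (names illustrative)."""
    def __init__(self, bottlenecks, channel_attn3, spatial_attn3,
                 channel_attn5, spatial_attn5, fuse):
        super().__init__()
        self.bottlenecks = nn.ModuleList(bottlenecks)  # S1: 5 modules in sequence
        self.channel_attn3 = channel_attn3             # S2: branch after module 3
        self.spatial_attn3 = spatial_attn3             # S3: spatial attention, branch 3
        self.channel_attn5 = channel_attn5             # S2: branch after module 5
        self.spatial_attn5 = spatial_attn5             # S3: spatial attention, branch 5
        self.fuse = fuse                               # S4: bilinear fine-grained fusion

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        taps = {}
        for i, block in enumerate(self.bottlenecks, start=1):
            x = block(x)
            if i in (3, 5):        # tap the outputs of the 3rd and 5th modules
                taps[i] = x
        # Each branch: channel attention produces M_C, spatial attention applies
        # the F1/F2 operations defined below.
        f3 = self.spatial_attn3(taps[3], self.channel_attn3(taps[3]))
        f5 = self.spatial_attn5(taps[5], self.channel_attn5(taps[5]))
        return self.fuse(f3, f5)   # S5 (the PCA head) is applied downstream
```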
Optionally, in step S1, the feature extraction process of a bottleneck convolution module includes the following steps:
S11, inputting the image into a standard convolution layer with a 1 × 1 convolution kernel and a swish activation function to extract features, the channels being expanded to n times the base channel count;
S12, inputting the features extracted in step S11 into a Depthwise convolution layer with a 3 × 3 convolution kernel and a stride of 2 for feature extraction, the channel count remaining unchanged;
S13, inputting the image features extracted in step S12 into a linear convolution with a 1 × 1 convolution kernel, reducing the feature map back to the original channel count.
Optionally, in processing order, the base channel counts of the 5 bottleneck convolution modules are 64, 128, 256, 512, and 512, respectively;
wherein the expansion factor n is 6 for the 1st and 2nd bottleneck convolution modules, 4 for the 3rd and 4th bottleneck convolution modules, and 2 for the 5th bottleneck convolution module.
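A minimal PyTorch sketch of one such bottleneck convolution module follows; it is an illustrative reconstruction of steps S11 to S13, not the patented code. nn.SiLU is PyTorch's implementation of the swish activation, and the omission of normalization layers reflects only that the text does not specify any.

```python
import torch
import torch.nn as nn

class BottleneckModule(nn.Module):
    def __init__(self, in_channels: int, base_channels: int, expansion: int):
        super().__init__()
        hidden = base_channels * expansion
        self.block = nn.Sequential(
            # S11: 1 x 1 standard convolution with swish, expanding the
            # channels to n times the base channel count
            nn.Conv2d(in_channels, hidden, kernel_size=1),
            nn.SiLU(),
            # S12: 3 x 3 depthwise convolution with stride 2, channels unchanged
            nn.Conv2d(hidden, hidden, kernel_size=3, stride=2, padding=1,
                      groups=hidden),
            nn.SiLU(),
            # S13: 1 x 1 linear convolution (no activation), back to base channels
            nn.Conv2d(hidden, base_channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)

# The five modules with the base channel counts and expansion factors above;
# the 64-channel input to module 1 comes from the stem convolution described
# in the worked example later in this disclosure.
base, expand = [64, 128, 256, 512, 512], [6, 6, 4, 4, 2]
modules = [BottleneckModule(cin, cout, n)
           for cin, cout, n in zip([64] + base[:-1], base, expand)]
```

With a 224 × 224 × 64 input, the first module produces a 112 × 112 × 64 output, matching the worked example given later.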
Optionally, in step S2, the process of performing the effective information enhancement processing and the ineffective information suppression processing on the image features extracted by the 3rd and 5th bottleneck convolution modules includes the following steps:
S21, for the 3rd or 5th bottleneck convolution module, performing maximum pooling and average pooling separately on the features F ∈ R^(c×h×w) extracted by the corresponding bottleneck convolution module, the pooled feature dimension being 1 × 1 × c, where c represents the number of channels, h represents the height of the input feature map, and w represents the width of the input feature map;
S22, inputting the two feature descriptors of dimension 1 × 1 × c obtained by the maximum pooling and the average pooling into a shared MLP, wherein the first layer and the second layer of the MLP have c/16 and c units, respectively;
S23, performing weight addition on the two obtained feature vectors, and calculating the weight matrix of channel attention by using a sigmoid function to obtain M_C ∈ R^(c×1×1).
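This channel attention can be sketched as follows (a minimal PyTorch illustration, assuming a CBAM-style shared MLP with reduction ratio 16; the ReLU between the two MLP layers is an assumption, since steps S21 to S23 do not name a hidden activation):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # S22: shared MLP, first layer c/16 units, second layer c units
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),  # assumption: hidden activation unspecified in the text
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        # S21: global maximum and average pooling -> two 1 x 1 x c descriptors
        max_desc = torch.amax(f, dim=(2, 3))
        avg_desc = torch.mean(f, dim=(2, 3))
        # S23: add the two vectors, apply sigmoid -> M_C in R^(c x 1 x 1)
        m_c = torch.sigmoid(self.mlp(max_desc) + self.mlp(avg_desc))
        return m_c.unsqueeze(-1).unsqueeze(-1)
```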
Optionally, in step S3, the process of simultaneously extracting the contour information and the visually more interesting features contained in the remote sensing image from the image features extracted by the 3rd bottleneck convolution module, and extracting the detail features contained in the remote sensing image from the image features extracted by the 5th bottleneck convolution module, includes the following steps:
S31, performing the F1 operation on the extracted weight matrix M_C and the features F to obtain a new weight matrix F′;
S32, performing a 7 × 7 convolution on the weight matrix F′, and calculating the weight matrix of spatial attention by using a sigmoid function to obtain M_S ∈ R^(1×h×w); wherein the F1 operation is F′ = M_C ⊗ F, and ⊗ represents element-level multiplication;
S33, performing the F2 operation on M_S and F′ to obtain a new weight matrix F″; wherein the F2 operation is F″ = M_S ⊗ F′.
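Continuing the same illustrative PyTorch sketch, the spatial attention of steps S31 to S33 can be written as below. Collapsing the c input channels to a single-channel map inside the 7 × 7 convolution is an assumption, as the text fixes only the kernel size; for the branch from the 3rd bottleneck convolution module the channel count would be 256, and for the 5th it would be 512.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # 7 x 7 convolution producing a one-channel spatial map (assumption:
        # the text specifies the kernel size but not the channel mapping)
        self.conv = nn.Conv2d(channels, 1, kernel_size=7, padding=3)

    def forward(self, f: torch.Tensor, m_c: torch.Tensor) -> torch.Tensor:
        f_prime = m_c * f                        # S31: F' = M_C (x) F, broadcast multiply
        m_s = torch.sigmoid(self.conv(f_prime))  # S32: M_S in R^(1 x h x w)
        return m_s * f_prime                     # S33: F'' = M_S (x) F'
```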
Optionally, in step S5, the process of mapping the multi-dimensional features to orthogonal k-dimensional features by using the principal component analysis module, and identifying and classifying the remote sensing image, includes the following steps:
S51, in the network training stage, training with a fully connected layer;
S52, in the image retrieval and recognition stage, replacing the fully connected layer with the principal component analysis module and mapping the multi-dimensional features to the orthogonal k-dimensional features.
In a second aspect, an embodiment of the present invention provides an FGR-AM system for remote sensing scene recognition, where the FGR-AM system includes:
the FGR-AM remote sensing scene network comprises 5 bottleneck convolution modules, a first channel attention module, a first spatial attention module, a second channel attention module, a second spatial attention module, a bilinear feature fusion module and a principal component analysis module;
and the FGR-AM remote sensing scene network training module is used for replacing the principal component analysis module with a fully connected layer to train the FGR-AM remote sensing scene network.
The 5 bottleneck convolution modules are sequentially connected and used for carrying out feature extraction on the input original remote sensing image;
the input end of the first channel attention module is connected with the output end of the 3 rd bottleneck convolution module, and the output end of the first channel attention module is connected to the bilinear feature fusion module through the first spatial attention module; the input end of the second channel attention module is connected with the output end of the 5 th bottleneck convolution module, and the output end of the second channel attention module is connected to the bilinear feature fusion module through the second spatial attention module;
the first channel attention module and the second channel attention module are respectively used for performing the effective information enhancement processing and the ineffective information suppression processing on the image features extracted by the 3rd and 5th bottleneck convolution modules; the first spatial attention module is used for simultaneously extracting, from the image features extracted by the 3rd bottleneck convolution module, the contour information contained in the remote sensing image and the visually more interesting features; the second spatial attention module is used for extracting, from the image features extracted by the 5th bottleneck convolution module, the detail features contained in the remote sensing image;
the bilinear feature fusion module is used for aggregating the channel-attention- and spatial-attention-enhanced features, and fusing the extracted contour information contained in the remote sensing image, the visually more interesting features, and the detail features contained in the remote sensing image to form a bilinear vector with a globally consistent spatial and channel representation;
and the principal component analysis module is used for mapping the multidimensional characteristics generated by the bilinear characteristic fusion module to orthogonal k-dimensional characteristics and identifying and classifying the remote sensing image.
Optionally, the bottleneck convolution module includes a standard convolution layer, a Depthwise convolution layer, and a linear convolution layer connected in sequence;
the standard convolution layer has a 1 × 1 convolution kernel and a swish activation function, and expands the channels to n times the base channel count; the Depthwise convolution layer has a 3 × 3 convolution kernel and a stride of 2, maintaining the channel count at n times the base; the linear convolution layer has a 1 × 1 convolution kernel and reduces the channel count back to the base;
in image processing order, the base channel counts of the 5 bottleneck convolution modules are 64, 128, 256, 512, and 512, respectively; wherein the expansion factor n is 6 for the 1st and 2nd bottleneck convolution modules, 4 for the 3rd and 4th bottleneck convolution modules, and 2 for the 5th bottleneck convolution module.
Optionally, the process of performing the effective information enhancement processing and the ineffective information suppression processing on the image features extracted by the 3rd and 5th bottleneck convolution modules by the first channel attention module and the second channel attention module includes the following steps:
for the 3rd or 5th bottleneck convolution module, performing maximum pooling and average pooling separately on the features F ∈ R^(c×h×w) extracted by the corresponding bottleneck convolution module, the pooled feature dimension being 1 × 1 × c, where c represents the number of channels, h represents the height of the input feature map, and w represents the width of the input feature map;
inputting the two feature descriptors of dimension 1 × 1 × c obtained by the maximum pooling and the average pooling into a shared MLP, wherein the first layer and the second layer of the MLP have c/16 and c units, respectively;
performing weight addition on the two obtained feature vectors, and calculating the weight matrix of channel attention by using a sigmoid function to obtain M_C ∈ R^(c×1×1).
Optionally, the process of extracting features by the first spatial attention module or the second spatial attention module includes the following steps:
S31, performing the F1 operation on the weight matrix M_C extracted by the first channel attention module or the second channel attention module and the features F to obtain a new weight matrix F′;
S32, performing a 7 × 7 convolution on the weight matrix F′, and calculating the weight matrix of spatial attention by using a sigmoid function to obtain M_S ∈ R^(1×h×w); wherein the F1 operation is F′ = M_C ⊗ F, and ⊗ represents element-level multiplication;
S33, performing the F2 operation on M_S and the weight matrix F′ to obtain a new weight matrix F″; wherein the F2 operation is F″ = M_S ⊗ F′.
The invention has the beneficial effects that:
the invention gives consideration to the main characteristics and the detail characteristics of the image, so that the extracted characteristics not only contain rich detail characteristics (the rich detail characteristics extracted by the 5 th channel can accurately identify scene categories in the remote sensing scene with difference), but also do not ignore more interesting information in vision (the characteristics extracted by the 3 rd channel not only enhance effective information but also effectively filter ineffective information in an attention module); the interesting information and the detail information are extracted and fused, so that the identification precision of the network is improved, and the network can accurately identify scenes in complex scenes and similar scenes.
The invention adopts 5 bottleneck convolution modules, extracts image characteristics, solves the problem of information loss caused by using an activation function in a network, and reduces parameters in the network. On the basis, a channel attention module is adopted for the No. 3 and No. 5 bottleneck convolution modules, so that effective information in a channel is enhanced, and ineffective information is restrained. Also, mid-level features represent generally better effects on scaling, rotation, and illumination changes in the image than low-level features.
The method adopts the space attention module to extract the characteristics of a more interesting part in the image vision, so as to improve the identification precision of the network on similar scenes. The invention adopts a CBP linear fusion module, outputs the outer product at the same space position, calculates bilinear characteristics, captures the pairwise correlation between characteristic channels, and provides stronger representation than a linear model through linear combination. The invention adopts PCA to map n-dimensional features from CBP to orthogonal k-dimensional features, so that the network has better generalization capability.
Drawings
FIG. 1 is a flow chart of an FGR-AM method for remote sensing scene recognition according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a channel attention module and a spatial attention module according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of an FGR-AM remote sensing scene network according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a comparison result between the FGR-AM method of the embodiment of the present invention and several currently used remote sensing scene recognition methods.
Fig. 5 is a schematic diagram of the recognition accuracy of the FGR-AM method of the embodiment of the present invention and several commonly used remote sensing scene recognition methods on the NWPU-RESISC45 data set.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings.
It should be noted that terms such as "upper", "lower", "left", "right", "front", and "back" used in the present invention are for clarity of description only and are not intended to limit the implementable scope of the present invention; a change in their relative relationships, without substantial alteration of the technical content, shall also be regarded as within the scope of the invention.
Example one
FIG. 1 is a flow chart of an FGR-AM method for remote sensing scene recognition according to an embodiment of the present invention. The embodiment is applicable to the case of performing identification detection on the remote sensing scene image through a device such as a server, and the method can be executed by an FGR-AM system for remote sensing scene identification, and the system can be implemented in a software and/or hardware manner, and can be integrated in an electronic device, for example, an integrated server device.
Referring to fig. 1, the FGR-AM method includes the steps of:
S1, performing feature extraction on the input original remote sensing image by adopting 5 bottleneck convolution modules connected in sequence.
S2, performing effective information enhancement processing and ineffective information suppression processing on the image features extracted by the 3rd and 5th bottleneck convolution modules.
S3, extracting the contour information and the visually more interesting features contained in the remote sensing image from the image features extracted by the 3rd bottleneck convolution module, and extracting the detail features contained in the remote sensing image from the image features extracted by the 5th bottleneck convolution module, respectively.
S4, aggregating the channel-attention- and spatial-attention-enhanced features by adopting a bilinear fine-grained feature fusion module, and fusing the extracted contour information contained in the remote sensing image, the visually more interesting features, and the detail features contained in the remote sensing image to form a bilinear vector with a globally consistent spatial and channel representation.
S5, adopting a principal component analysis module to map the multi-dimensional features generated in step S4 to orthogonal k-dimensional features, and identifying and classifying the remote sensing images.
In this embodiment, an FGR-AM remote sensing scene network is first constructed; fig. 3 is a schematic structural diagram of the remote sensing scene network according to the embodiment of the present invention. The FGR-AM remote sensing scene network comprises 5 bottleneck convolution modules, a channel attention module, a spatial attention module, a bilinear fine-grained fusion module, and a principal component analysis module.
The 5 bottleneck convolution modules extract the features of the remote sensing image while avoiding the information loss caused by activation functions in the network and reducing the number of network parameters. The channel attention module performs information enhancement on the image features extracted by the 3rd and 5th bottleneck convolution modules, strengthening the effective information in each channel and suppressing the ineffective information; moreover, mid-level features are generally more robust than low-level features to scaling, rotation, and illumination changes in the image. The spatial attention module extracts the interesting features in the remote sensing image, the visually more interesting parts, improving the network's recognition accuracy on similar scenes. The bilinear fine-grained fusion module adopts the CBP fusion module to compute the outer product at each spatial position and obtain bilinear features; the outer product captures the pairwise correlations between feature channels, so this combination provides a stronger representation than a linear model. The principal component analysis module replaces the FC layer with PCA when retrieving and recognizing images, mapping the multi-dimensional features to orthogonal k-dimensional features and improving the network's generalization ability.
The specific steps of the method in this embodiment are described in detail through an example.
Step one, inputting an original remote sensing image into the FGR-AM remote sensing scene network and extracting image features with the 5 bottleneck convolution modules, comprising the following substeps:
Step 1-1, a remote sensing image from the NWPU-RESISC45 data set is input into the network with its size set to 224 × 224 pixels; a standard convolution with a 3 × 3 kernel and 64 channels is applied, and the output feature map size is 224 × 224 × 64.
Step 1-2, in the first layer of bottleneck convolution module 1, the convolution kernel is 1 × 1, the activation function is swish, and the channel count is expanded to 6 times the base channel count; the output feature map size is 224 × 224 × (6 × 64).
Step 1-3, the second layer of bottleneck convolution module 1 adopts Depthwise convolution with a 3 × 3 kernel, a swish activation function, and a stride of 2; the output feature map size is 112 × 112 × (6 × 64).
Step 1-4, the third layer of bottleneck convolution module 1 reduces the feature map dimension with a 1 × 1 standard convolution; the output feature map size is 112 × 112 × 64.
Step 1-5, the other 4 bottleneck convolution modules are similar to module 1: the 2nd module expands its channels to 6 times the base count, the 3rd and 4th modules to 4 times, and the 5th module to 2 times; the base channel counts of the 5 modules are 64, 128, 256, 512, and 512, respectively.
In step one, multi-layer feature extraction is performed on the image; because no activation function is adopted in the third layer of each bottleneck convolution module, the information loss caused by activation functions in the network is avoided, and the number of network parameters is reduced.
Step two, inputting the image features extracted by the 3rd and 5th bottleneck convolution modules into the channel attention module to enhance the features extracted by the bottleneck convolution modules of different layers, as shown in fig. 2, comprising the following substeps:
step 2-1, the feature size extracted by the 3 rd bottle neck convolution module is 56 × 56 × 256, the feature size extracted by the 5 th convolution module is 14 × 14 × 512, and the two extracted feature sizes are input into the channel attention module.
Step 2-2, taking the feature size extracted by the 3 rd bottleneck convolution module as an example, performing maximum pooling and average pooling on the features 56 × 56 × 256 extracted by the 3 rd bottleneck convolution module respectively to obtain two feature vectors of 1 × 1 × 256.
Step 2-3, two feature sizes of 1 × 1 × 256 are input into the shared MLP, the first and second layers of the MLP being 16 and 256, respectively.
And 2-4, performing weight addition on the two obtained characteristic sizes of 1 multiplied by 256, and calculating a weight matrix of channel attention by using a sigmoid function to obtain a weight matrix of 1 multiplied by 256.
And 2-5, repeating the steps on the characteristics extracted by the 5 th bottleneck convolution module.
In step two, a channel attention module is adopted for the 3rd and 5th bottleneck convolution modules, so that the effective information in each channel is enhanced and the ineffective information is suppressed. Moreover, mid-level features are generally more robust than low-level features to scaling, rotation, and illumination changes in the image. Specifically, compared with the features extracted by the 3rd bottleneck convolution module, the feature information extracted by the 1st and 2nd bottleneck convolution modules contains too much invalid information; if it were input into the attention module, the invalid information would be enhanced as well, which does not help improve recognition accuracy. At module 3, the features extracted by the network retain both the contour information in the image and the more interesting information in the image. The 5th convolution module performs deep feature extraction, so the detail features are finally fully extracted. The features extracted by these two convolution modules are input separately, and after output from the attention module, the feature information of the 3rd channel is strengthened in the more interesting parts. For example, the park grassland and golf course scenes in remote sensing imagery are highly similar; through the contours and the visually more interesting features extracted by the 3rd channel, that is, through the differences in buildings and other parts of the two scenes, the two scenes are correctly recognized and distinguished. After the detail features extracted by the 5th channel are enhanced by the attention module, scenes with large differences are recognized with high accuracy, the scene category being judged from the detail information. Fusing the features extracted by the two modules gives the network better recognition accuracy when recognizing remote sensing scenes, whether the scenes are highly similar or widely different. In addition, besides enhancing features, the attention module can suppress invalid features through the channel attention component of the attention mechanism, highlighting key feature positions and yielding a globally consistent spatial/channel representation.
Step three, extracting the visually interesting features in the remote sensing image by adopting the spatial attention module, as shown in fig. 2, comprising the following substeps:
Step 3-1, taking the features extracted by the 3rd bottleneck convolution module as an example: after the channel attention module, the obtained 1 × 1 × 256 weight matrix M_C and the 56 × 56 × 256 features F are subjected to the F1 operation to obtain a new weight matrix F′, where the F1 operation is F′ = M_C ⊗ F and ⊗ represents element-level multiplication.
Step 3-2, a 7 × 7 convolution is performed on the weight matrix F′, and the weight matrix of spatial attention is calculated by using a sigmoid function to obtain M_S ∈ R^(1×h×w).
Step 3-3, the F2 operation is performed on M_S and F′ to obtain a new weight matrix F″, where the F2 operation is F″ = M_S ⊗ F′.
In step three, the spatial attention module is adopted to extract the features of the more interesting parts of the image, so as to improve the network's recognition accuracy on similar scenes.
Step four, fusing the features extracted from different layers by adopting the bilinear fine-grained feature fusion module: the final weight matrices obtained by the two channels after the attention modules are input into the CBP module, and the CBP module aggregates the channel-attention- and spatial-attention-enhanced features to form a bilinear vector with a globally consistent spatial and channel representation.
In step four, the CBP fusion module computes the outer product at each spatial position to obtain the bilinear features; the outer product captures the pairwise correlations between feature channels, so this combination provides a stronger representation than a linear model.
Step five, in the image retrieval and recognition stage, the principal component analysis module is adopted to map the multi-dimensional features to orthogonal k-dimensional features, improving the network's generalization ability, comprising the following substeps:
Step 5-1, an FC layer is adopted when training the FGR-AM network; the dimension of the last FC layer equals the number of classes in the image data set, which is 45.
Step 5-2, when retrieving and recognizing images, the FC layer is replaced with PCA.
In step five, the FC layer is replaced with PCA when retrieving and recognizing images because the FC layer is strongly influenced by the original training data set and cannot generalize to a new target data set. PCA maps the multi-dimensional features to orthogonal k-dimensional features, improving the network's generalization ability. That is, in this embodiment, the FC layer is used in the training stage, where it captures high-dimensional information; to keep the dimensionality-reduced output consistent, principal component analysis is adopted in the retrieval and recognition stage, effectively improving the network's generalization ability. In this embodiment the training and recognition modules together form the principal component analysis scheme, rather than using principal component analysis merely as a standalone dimension-reduction step inside a remote sensing network as in the prior art; for example, the parcel crop recognition method combining remote sensing image time series and texture features disclosed in the invention with publication number CN112395914A is a typical application of standalone PCA dimension reduction.
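A minimal sketch of the PCA replacement at retrieval and recognition time, assuming scikit-learn; the component count k = 128 is an illustrative assumption, the disclosure fixing only that the k output dimensions are orthogonal.

```python
import numpy as np
from sklearn.decomposition import PCA

def fit_pca_head(train_features: np.ndarray, k: int = 128) -> PCA:
    """Fit PCA on fused bilinear vectors of shape (num_images, n)."""
    pca = PCA(n_components=k)   # principal components are mutually orthogonal
    pca.fit(train_features)
    return pca

def embed(pca: PCA, features: np.ndarray) -> np.ndarray:
    # Maps the n-dimensional fused features to orthogonal k-dimensional features
    return pca.transform(features)
```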
Fig. 4 and fig. 5 show the comparison results of the FGR-AM method of this embodiment against several commonly used remote sensing scene recognition methods. It can be seen that the FGR-AM scene network of this embodiment achieves higher recognition accuracy.
Example two
The embodiment of the invention provides an FGR-AM system for remote sensing scene recognition, which comprises an FGR-AM remote sensing scene network and an FGR-AM remote sensing scene network training module.
The FGR-AM remote sensing scene network comprises 5 bottleneck convolution modules, a first channel attention module, a first spatial attention module, a second channel attention module, a second spatial attention module, a bilinear feature fusion module, and a principal component analysis module. The FGR-AM remote sensing scene network training module is used for replacing the principal component analysis module with a fully connected layer to train the FGR-AM remote sensing scene network.
The 5 bottleneck convolution modules are connected in sequence and perform feature extraction on the input original remote sensing image; the input of the first channel attention module is connected to the output of the 3rd bottleneck convolution module, and its output is connected to the bilinear feature fusion module through the first spatial attention module; the input of the second channel attention module is connected to the output of the 5th bottleneck convolution module, and its output is connected to the bilinear feature fusion module through the second spatial attention module; the first and second channel attention modules respectively perform the effective information enhancement processing and the ineffective information suppression processing on the image features extracted by the 3rd and 5th bottleneck convolution modules; the first spatial attention module simultaneously extracts, from the image features of the 3rd bottleneck convolution module, the contour information contained in the remote sensing image and the visually more interesting features; the second spatial attention module extracts, from the image features of the 5th bottleneck convolution module, the detail features contained in the remote sensing image; the bilinear feature fusion module aggregates the channel-attention- and spatial-attention-enhanced features and fuses the extracted contour information, the visually more interesting features, and the detail features to form a bilinear vector with a globally consistent spatial and channel representation; and the principal component analysis module maps the multi-dimensional features generated by the bilinear feature fusion module to orthogonal k-dimensional features and identifies and classifies the remote sensing image.
Optionally, the bottleneck convolution module includes a standard convolution layer, a Depthwise convolution layer, and a linear convolution layer connected in sequence; the standard convolution layer has a 1 × 1 convolution kernel and a swish activation function, and expands the channels to n times the base channel count; the Depthwise convolution layer has a 3 × 3 convolution kernel and a stride of 2, maintaining the channel count at n times the base; the linear convolution layer has a 1 × 1 convolution kernel and reduces the channel count back to the base. In image processing order, the base channel counts of the 5 bottleneck convolution modules are 64, 128, 256, 512, and 512, respectively; wherein the expansion factor n is 6 for the 1st and 2nd bottleneck convolution modules, 4 for the 3rd and 4th bottleneck convolution modules, and 2 for the 5th bottleneck convolution module.
Optionally, the process of performing the effective information enhancement processing and the ineffective information suppression processing on the image features extracted by the 3rd and 5th bottleneck convolution modules by the first channel attention module and the second channel attention module includes the following steps:
for the 3rd or 5th bottleneck convolution module, performing maximum pooling and average pooling separately on the features F ∈ R^(c×h×w) extracted by the corresponding bottleneck convolution module, the pooled feature dimension being 1 × 1 × c, where c represents the number of channels, h represents the height of the input feature map, and w represents the width of the input feature map; inputting the two feature descriptors of dimension 1 × 1 × c obtained by the maximum pooling and the average pooling into a shared MLP, wherein the first layer and the second layer of the MLP have c/16 and c units, respectively; performing weight addition on the two obtained feature vectors, and calculating the weight matrix of channel attention by using a sigmoid function to obtain M_C ∈ R^(c×1×1).
Optionally, the process of extracting features by the first spatial attention module or the second spatial attention module comprises the following steps:
S31, performing the F1 operation on the weight matrix M_C extracted by the first channel attention module or the second channel attention module and the features F to obtain a new weight matrix F′;
S32, performing a 7 × 7 convolution on the weight matrix F′, and calculating the weight matrix of spatial attention by using a sigmoid function to obtain M_S ∈ R^(1×h×w); wherein the F1 operation is F′ = M_C ⊗ F, and ⊗ represents element-level multiplication;
S33, performing the F2 operation on M_S and the weight matrix F′ to obtain a new weight matrix F″; wherein the F2 operation is F″ = M_S ⊗ F′.
Through the FGR-AM system of the second embodiment of the invention, remote sensing scene images are recognized and detected. The FGR-AM system provided by the embodiment of the invention can execute the FGR-AM method for remote sensing scene recognition provided by any embodiment of the invention, and has the corresponding functional modules and the beneficial effects of the executed method.
The above are only preferred embodiments of the present invention, and the protection scope of the present invention is not limited to the above embodiments; all technical solutions under the inventive concept belong to the protection scope of the present invention. It should be noted that modifications and refinements that those skilled in the art may make without departing from the principle of the present invention shall also be regarded as within the protection scope of the present invention.

Claims (9)

1. An FGR-AM method for remote sensing scene recognition, characterized in that the FGR-AM method comprises the following steps:
S1, performing feature extraction on the input original remote sensing image by adopting 5 bottleneck convolution modules connected in sequence;
S2, connecting a channel attention module and a spatial attention module, respectively, after the 3rd bottleneck convolution module and the 5th bottleneck convolution module to form two channels;
the channel attention module is used for enhancing the effective information in the corresponding channel and suppressing the ineffective information;
the spatial attention module is used for simultaneously extracting the contour information and the visually more interesting features contained in the remote sensing image from the image features extracted by the 3rd bottleneck convolution module, and extracting the detail features contained in the remote sensing image from the image features extracted by the 5th bottleneck convolution module;
S3, inputting the final weight matrices obtained by the two channels through the spatial attention modules into a bilinear fine-grained feature fusion module, namely fusing the extracted contour information contained in the remote sensing image, the visually more interesting features, and the detail features contained in the remote sensing image to form a bilinear vector with a globally consistent spatial and channel representation;
S4, adopting a principal component analysis module to map the multi-dimensional features generated in step S3 to orthogonal k-dimensional features, and identifying and classifying the remote sensing images.
2. The FGR-AM method for remote sensing scene recognition of claim 1, wherein in step S1, the feature extraction process of the bottleneck convolution module comprises the following steps:
S11, inputting the image into a standard convolution layer with a 1 × 1 convolution kernel and a swish activation function to extract features, the channels being expanded to n times the base channel count;
S12, inputting the features extracted in step S11 into a Depthwise convolution layer with a 3 × 3 convolution kernel and a stride of 2 for feature extraction, the channel count remaining unchanged;
S13, inputting the image features extracted in step S12 into a linear convolution with a 1 × 1 convolution kernel, reducing the feature map back to the original channel count.
3. The FGR-AM method for remote sensing scene recognition of claim 2, wherein, in processing order, the base channel counts of the 5 bottleneck convolution modules are 64, 128, 256, 512, and 512;
wherein the expansion factor n is 6 for the 1st and 2nd bottleneck convolution modules, 4 for the 3rd and 4th bottleneck convolution modules, and 2 for the 5th bottleneck convolution module.
4. The FGR-AM method for remote sensing scene recognition according to claim 1, wherein in step S2, the process of the channel attention module enhancing the effective information and suppressing the ineffective information in the corresponding channel comprises the following steps:
S21, for the 3rd or 5th bottleneck convolution module, performing maximum pooling and average pooling separately on the features F extracted by the corresponding bottleneck convolution module, the pooled feature dimension being 1 × 1 × c, where F ∈ R^(c×h×w), c represents the number of channels, h represents the height of the input feature map, and w represents the width of the input feature map;
S22, inputting the two feature descriptors of dimension 1 × 1 × c obtained by the maximum pooling and the average pooling into a shared MLP, wherein the first layer and the second layer of the MLP have c/16 and c units, respectively;
S23, performing weight addition on the two obtained feature vectors, and calculating the weight matrix of channel attention by using a sigmoid function to obtain M_C ∈ R^(c×1×1).
5. The FGR-AM method for remote sensing scene recognition according to claim 4, wherein in step S2, the process of the spatial attention module simultaneously extracting the contour information and the visually more interesting features from the image features extracted by the 3rd bottleneck convolution module, and extracting the detail features contained in the remote sensing image from the image features extracted by the 5th bottleneck convolution module, comprises the following steps:
S31, performing the F1 operation on the extracted weight matrix M_C and the features F to obtain a new weight matrix F′;
S32, performing a 7 × 7 convolution on the weight matrix F′, and calculating the weight matrix of spatial attention by using a sigmoid function to obtain M_S ∈ R^(1×h×w); wherein the F1 operation is F′ = M_C ⊗ F, and ⊗ represents element-level multiplication;
S33, performing the F2 operation on M_S and the weight matrix F′ to obtain a new weight matrix F″; wherein the F2 operation is F″ = M_S ⊗ F′.
6. An FGR-AM system for remote sensing scene recognition based on the method of claim 1, wherein the FGR-AM system comprises:
the FGR-AM remote sensing scene network comprises 5 bottleneck convolution modules, a first channel attention module, a first spatial attention module, a second channel attention module, a second spatial attention module, a bilinear feature fusion module and a principal component analysis module;
the FGR-AM remote sensing scene network training module is used for replacing the principal component analysis module with a fully connected layer to train the FGR-AM remote sensing scene network;
the 5 bottleneck convolution modules are sequentially connected and used for carrying out feature extraction on the input original remote sensing image;
the input end of the first channel attention module is connected with the output end of the 3 rd bottleneck convolution module, and the output end of the first channel attention module is connected to the bilinear feature fusion module through the first spatial attention module; the input end of the second channel attention module is connected with the output end of the 5 th bottleneck convolution module, and the output end of the second channel attention module is connected to the bilinear feature fusion module through the second spatial attention module;
the first channel attention module and the second channel attention module are respectively used for performing the effective information enhancement processing and the ineffective information suppression processing on the image features extracted by the 3rd and 5th bottleneck convolution modules; the first spatial attention module is used for simultaneously extracting the contour information contained in the remote sensing image and the visually more interesting features from the image features extracted by the 3rd bottleneck convolution module; the second spatial attention module is used for extracting the detail features contained in the remote sensing image from the image features extracted by the 5th bottleneck convolution module;
the bilinear feature fusion module is used for aggregating the channel-attention- and spatial-attention-enhanced features, and fusing the extracted contour information contained in the remote sensing image, the visually more interesting features, and the detail features contained in the remote sensing image to form a bilinear vector with a globally consistent spatial and channel representation;
and the principal component analysis module is used for mapping the multidimensional characteristics generated by the bilinear characteristic fusion module to orthogonal k-dimensional characteristics and identifying and classifying the remote sensing image.
7. The FGR-AM system for remote sensing scene recognition of claim 6, wherein the bottleneck convolution module comprises a standard convolution layer, a Depthwise convolution layer and a linear convolution layer connected in sequence;
the convolution kernel of the standard convolution layer is 1 × 1, the activation function is swish, and the number of channels is expanded to n times the number of base channels; the convolution kernel of the Depthwise convolution layer is 3 × 3 with a stride of 2, the number of channels remaining n times the number of base channels; the convolution kernel of the linear convolution layer is 1 × 1, and the number of channels is reduced back to the number of base channels;
the numbers of base channels of the 5 bottleneck convolution modules are 64, 128, 256, 512 and 512, respectively, in image processing order; the expansion factor n is 6 for the 1st and 2nd bottleneck convolution modules, 4 for the 3rd and 4th bottleneck convolution modules, and 2 for the 5th bottleneck convolution module.
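As a reading aid, here is a minimal PyTorch sketch of one such bottleneck module under the parameters of this claim. The batch normalisation layers, the padding of 1, the activation after the depthwise layer, and the 3-channel RGB input are assumptions the claim does not fix.

```python
import torch.nn as nn

class BottleneckConv(nn.Module):
    """1x1 expansion (swish) -> 3x3 depthwise, stride 2 -> 1x1 linear projection."""

    def __init__(self, in_ch: int, base_ch: int, n: int):
        super().__init__()
        hidden = base_ch * n  # channel expansion to n times the base channels
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.SiLU(),  # SiLU is the swish activation
            nn.Conv2d(hidden, hidden, 3, stride=2, padding=1,
                      groups=hidden, bias=False),  # depthwise convolution
            nn.BatchNorm2d(hidden),
            nn.SiLU(),
            nn.Conv2d(hidden, base_ch, 1, bias=False),  # linear: no activation
            nn.BatchNorm2d(base_ch),
        )

    def forward(self, x):
        return self.block(x)

# Base channels 64/128/256/512/512 and expansion factors 6/6/4/4/2 per the claim.
cfg = [(3, 64, 6), (64, 128, 6), (128, 256, 4), (256, 512, 4), (512, 512, 2)]
backbone = nn.Sequential(*[BottleneckConv(i, o, n) for i, o, n in cfg])
```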
8. The FGR-AM system for remote sensing scene recognition of claim 6, wherein the effective information enhancement and ineffective information suppression performed by the first and second channel attention modules on the image features extracted by the 3rd and 5th bottleneck convolution modules comprise the following steps:
for the 3rd or 5th bottleneck convolution module, performing maximum pooling and average pooling, respectively, on the features F extracted by the corresponding bottleneck convolution module, the feature dimension after pooling being 1 × 1 × c, wherein F ∈ R^(c×h×w), c represents the number of channels, h represents the height of the input feature map, and w represents the width of the input feature map;
inputting the two pooled feature vectors of dimension 1 × 1 × c obtained by the maximum pooling and the average pooling into a shared MLP, wherein the first and second layers of the MLP have c/16 and c neurons, respectively;
adding the two resulting feature vectors and calculating the channel attention weight matrix with a sigmoid function, obtaining M_C ∈ R^(c×1×1).
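A minimal PyTorch sketch of this channel attention computation might look as follows; the ReLU between the two shared-MLP layers is an assumption (the claim only fixes the layer widths c/16 and c).

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Global max/avg pooling -> shared two-layer MLP -> sum -> sigmoid."""

    def __init__(self, c: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(c, c // reduction),  # first layer: c/16 units
            nn.ReLU(),                     # assumption: activation between layers
            nn.Linear(c // reduction, c),  # second layer: c units
        )

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = f.shape
        max_vec = torch.amax(f, dim=(2, 3))  # max pooling to 1 x 1 x c
        avg_vec = f.mean(dim=(2, 3))         # average pooling to 1 x 1 x c
        m_c = torch.sigmoid(self.mlp(max_vec) + self.mlp(avg_vec))
        return m_c.view(b, c, 1, 1)          # M_C in R^(c x 1 x 1)
```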
9. The FGR-AM system for remote sensing scene recognition of claim 8, wherein the feature extraction process of the first or second spatial attention module comprises the following steps:
S31, subjecting the weight matrix M_C extracted by the first or second channel attention module to the F1 operation to obtain a new weight matrix F′;
S32, convolving the weight matrix F′ with a 7 × 7 convolution and calculating the spatial attention weight matrix with a sigmoid function, obtaining M_S ∈ R^(1×h×w); wherein F1 is calculated as
F′ = M_C ⊗ F,
wherein ⊗ represents element-level multiplication;
S33, performing the F2 operation on M_S and the weight matrix F′ to obtain a new weight matrix F″; wherein F2 is calculated as
F″ = M_S ⊗ F′.
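The steps S31–S33, together with the 7 × 7 convolution, could be sketched in PyTorch as below. Mapping the c channels of F′ to the single-channel map M_S inside that convolution, and the padding of 3, are assumptions made to match the stated shape M_S ∈ R^(1×h×w).

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """F' = M_C (x) F, then a 7x7 convolution + sigmoid gives M_S,
    and F'' = M_S (x) F', with (x) denoting element-level multiplication."""

    def __init__(self, c: int):
        super().__init__()
        # Assumption: the 7x7 convolution reduces c channels to 1; padding 3
        # preserves the h x w grid so that M_S lies in R^(1 x h x w).
        self.conv = nn.Conv2d(c, 1, kernel_size=7, padding=3)

    def forward(self, f: torch.Tensor, m_c: torch.Tensor) -> torch.Tensor:
        f1 = m_c * f                        # S31: F' = M_C (x) F (broadcast)
        m_s = torch.sigmoid(self.conv(f1))  # S32: M_S in R^(1 x h x w)
        return m_s * f1                     # S33: F'' = M_S (x) F'
```

Chained with the channel attention of claim 8, one branch would compute, e.g., `out = SpatialAttention(c)(f, ChannelAttention(c)(f))` before the bilinear fusion.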
CN202110894846.4A 2021-08-05 2021-08-05 FGR-AM method and system for remote sensing scene recognition Active CN113343953B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110894846.4A CN113343953B (en) 2021-08-05 2021-08-05 FGR-AM method and system for remote sensing scene recognition

Publications (2)

Publication Number Publication Date
CN113343953A CN113343953A (en) 2021-09-03
CN113343953B (en) 2021-12-21

Family

ID=77480806

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110894846.4A Active CN113343953B (en) 2021-08-05 2021-08-05 FGR-AM method and system for remote sensing scene recognition

Country Status (1)

Country Link
CN (1) CN113343953B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114995782B (en) * 2022-08-03 2022-10-25 上海登临科技有限公司 Data processing method, device, equipment and readable storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107688784A (en) * 2017-08-23 2018-02-13 福建六壬网安股份有限公司 Character recognition method and storage medium based on fusion of deep and shallow features
CN111462126A (en) * 2020-04-08 2020-07-28 武汉大学 Semantic image segmentation method and system based on edge enhancement

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104240256B (en) * 2014-09-25 2017-03-15 西安电子科技大学 A kind of image significance detection method based on the sparse modeling of stratification
WO2019222759A1 (en) * 2018-05-18 2019-11-21 Synaptics Incorporated Recurrent multimodal attention system based on expert gated networks
CN109871798B (en) * 2019-02-01 2021-06-29 浙江大学 Remote sensing image building extraction method based on convolutional neural network
CN110473267A (en) * 2019-07-12 2019-11-19 北京邮电大学 Social networks image based on attention feature extraction network describes generation method
CN111639594B (en) * 2020-05-29 2023-09-22 苏州遐迩信息技术有限公司 Training method and device for image description model
CN111680667B (en) * 2020-07-13 2022-06-24 北京理工大学重庆创新中心 Remote sensing image ground object classification method based on deep neural network
CN112203098B (en) * 2020-09-22 2021-06-01 广东启迪图卫科技股份有限公司 Mobile terminal image compression method based on edge feature fusion and super-resolution
CN112070070B (en) * 2020-11-10 2021-02-09 南京信息工程大学 LW-CNN method and system for urban remote sensing scene recognition
CN112464787B (en) * 2020-11-25 2022-07-08 北京航空航天大学 Remote sensing image ship target fine-grained classification method based on spatial fusion attention
CN112784779A (en) * 2021-01-28 2021-05-11 武汉大学 Remote sensing image scene classification method based on feature pyramid multilevel feature fusion
CN112861978B (en) * 2021-02-20 2022-09-02 齐齐哈尔大学 Multi-branch feature fusion remote sensing scene image classification method based on attention mechanism
CN113033630A (en) * 2021-03-09 2021-06-25 太原科技大学 Infrared and visible light image deep learning fusion method based on double non-local attention models
CN113052188A (en) * 2021-03-26 2021-06-29 大连理工大学人工智能大连研究院 Method, system, equipment and storage medium for detecting remote sensing image target
CN113192040B (en) * 2021-05-10 2023-09-22 浙江理工大学 Fabric flaw detection method based on YOLO v4 improved algorithm

Also Published As

Publication number Publication date
CN113343953A (en) 2021-09-03

Similar Documents

Publication Publication Date Title
CN110728263B (en) Pedestrian re-recognition method based on strong discrimination feature learning of distance selection
Peng et al. Detecting heads using feature refine net and cascaded multi-scale architecture
CN112101150B (en) Multi-feature fusion pedestrian re-identification method based on orientation constraint
Gao et al. Change detection from synthetic aperture radar images based on channel weighting-based deep cascade network
Tian et al. A dual neural network for object detection in UAV images
CN103336957B Network homologous video detection method based on spatio-temporal features
CN110503076B (en) Video classification method, device, equipment and medium based on artificial intelligence
CN102385592B (en) Image concept detection method and device
CN111612008A (en) Image segmentation method based on convolution network
CN111027576B (en) Cooperative significance detection method based on cooperative significance generation type countermeasure network
CN111738344A (en) Rapid target detection method based on multi-scale fusion
CN112580480B (en) Hyperspectral remote sensing image classification method and device
CN109710804B (en) Teaching video image knowledge point dimension reduction analysis method
CN116863539A (en) Fall figure target detection method based on optimized YOLOv8s network structure
CN111967464A (en) Weak supervision target positioning method based on deep learning
CN113743484A (en) Image classification method and system based on space and channel attention mechanism
CN111639697B (en) Hyperspectral image classification method based on non-repeated sampling and prototype network
CN114663707A (en) Improved few-sample target detection method based on fast RCNN
CN117037004A (en) Unmanned aerial vehicle image detection method based on multi-scale feature fusion and context enhancement
CN113343953B (en) FGR-AM method and system for remote sensing scene recognition
Chen et al. Part alignment network for vehicle re-identification
Lou et al. Multi-scale context attention network for image retrieval
Gao et al. Adaptive random down-sampling data augmentation and area attention pooling for low resolution face recognition
CN111582057A (en) Face verification method based on local receptive field
Ling et al. A facial expression recognition system for smart learning based on YOLO and vision transformer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230221

Address after: Room 401, Floor 4, Building 1, No. 69, Olympic Street, Jianye District, Nanjing, Jiangsu, 210000

Patentee after: Nanjing Zhiqiang Information Technology Co.,Ltd.

Address before: 210000 No. 219 Ningliu Road, Pukou District, Nanjing City, Jiangsu Province

Patentee before: Nanjing University of Information Science and Technology
