CN114863283A - SAR image target identification method combining transfer learning and attention mechanism - Google Patents


Info

Publication number: CN114863283A
Application number: CN202210579119.3A
Authority: CN (China)
Prior art keywords: attention mechanism, weight, SAR image, channel, input
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 肖永生, 毛聪, 黄丽贞, 贺丰收, 胡义海, 饶烜
Current assignee: Nanchang Hangkong University
Original assignee: Nanchang Hangkong University
Application filed by Nanchang Hangkong University
Priority date / filing date: 2022-05-25
Publication date: 2022-08-05


Classifications

    • G06V20/10: Scenes; Scene-specific elements; Terrestrial scenes
    • G06N3/045: Neural networks; Architecture; Combinations of networks
    • G06N3/048: Neural networks; Architecture; Activation functions
    • G06N3/08: Neural networks; Learning methods
    • G06V10/25: Image preprocessing; Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/40: Extraction of image or video features
    • G06V10/764: Recognition using pattern recognition or machine learning; Classification, e.g. of video objects
    • G06V10/82: Recognition using pattern recognition or machine learning; using neural networks


Abstract

The invention discloses a SAR image target recognition method combining transfer learning with an attention mechanism, comprising the following steps: S1, extracting a convolutional feature map from the input SAR image with a 7 × 7 convolution kernel; S2, feeding the convolutional feature map into an attention mechanism for focusing to obtain an initial focused feature map; S3, feeding the initial focused feature map into a fine-tuned residual network to obtain a weighted feature map, the residual network being built with a deep transfer learning method that migrates a weight model pre-trained by the residual network on a known data set into the SAR image recognition task; S4, feeding the resulting weighted feature map into the attention mechanism again for focusing; S5, feeding the focused, weighted target recognition feature map into subsequent convolutional layers for classification to obtain the target recognition result.

Description

SAR image target identification method combining transfer learning and attention mechanism
Technical Field
The invention relates to a SAR image target recognition method that combines transfer learning and an attention mechanism.
Background
SAR (synthetic aperture radar) is a high-resolution imaging radar that is unaffected by weather and illumination, has a degree of surface-penetration capability that enables detection of concealed targets, and supports continuous, all-day, all-weather earth observation. These properties have led to an ever-wider range of SAR applications in both civilian and military fields.
In recent years SAR technology has advanced rapidly, with steadily improving imaging quality and resolution, but automatic target recognition (ATR) from SAR images has developed comparatively slowly. The difficulty of SAR ATR lies mainly in two aspects: (1) noise interference, which degrades model performance under heavy noise, sometimes to the point where targets cannot be correctly recognized; and (2) the large volume of cluttered, poorly focused information, which prevents fast and accurate identification of the relevant content.
Disclosure of Invention
The invention aims to solve the above technical problems of the prior art by providing a SAR image target recognition method that combines transfer learning and an attention mechanism.
To this end, the invention provides the following technical scheme: a SAR image target recognition method combining transfer learning and an attention mechanism, comprising the following steps:
S1, extracting a convolutional feature map from the input SAR image with a 7 × 7 convolution kernel;
S2, feeding the convolutional feature map into an attention mechanism for focusing to obtain an initial focused feature map, wherein the attention mechanism filters points of interest out of the large volume of digital image information, selectively screening out and focusing on the important information;
S3, feeding the initial focused feature map into a fine-tuned residual network to obtain a weighted feature map, wherein the residual network uses a deep convolutional network to perform the end-to-end target recognition task and is combined with a deep transfer learning method: a weight model pre-trained by the residual network on a known data set is migrated into the SAR image recognition task, shortening the training time of the SAR image recognition model;
S4, feeding the resulting weighted feature map into the attention mechanism again for focusing;
S5, feeding the focused, weighted target recognition feature map into subsequent convolutional layers for classification to obtain the target recognition result.
Preferably, the deep transfer learning method in S3 is fine-tuning of a deep network: a weight model pre-trained by a residual network on the ImageNet data set is migrated into the SAR image recognition task.
Preferably, the attention mechanism in S2 is a CBAM hybrid attention module, which operates as a combination of a channel attention mechanism and a spatial attention mechanism.
Preferably, the channel attention mechanism is as follows: the channel sub-module uses the max-pooled and average-pooled outputs of a shared network, and the channel attention is computed as

A_c(F) = σ(W_1(W_0(F_avg^c)) + W_1(W_0(F_max^c)))    (1)

where A_c(F) is the channel attention function; F is the input to the channel attention mechanism, computed in matrix form; σ is the sigmoid function; W_0 is the hidden-layer weight of the built-in multilayer perceptron; W_1 is its output-layer weight; F_avg^c is the input F after average pooling within the channel attention mechanism; and F_max^c is the input F after max pooling within the channel attention mechanism.
Preferably, W_0 and W_1 are shared by the two inputs, and W_0 is followed by the ReLU activation function; W_0 ∈ R^(C/r×C), W_1 ∈ R^(C×C/r). Here W_0 ∈ R^(C/r×C) means that W_0 holds the weights between the C neurons of the input layer and the C/r neurons of the hidden layer, where C is the number of channels of the SAR image data, r is a hyper-parameter (the reduction ratio) of the multilayer perceptron, and R denotes the real-valued space of the picture data; W_1 ∈ R^(C×C/r) means that W_1 holds the weights between the C/r neurons of the hidden layer and the C neurons of the output layer.
Preferably, the spatial attention mechanism is as follows: the spatial sub-module takes two similar outputs pooled along the channel axis and forwards them to a convolutional layer; the spatial attention is computed as

A_s(F) = σ(f^(7×7)([F_avg^s; F_max^s]))    (2)

where A_s(F) is the spatial attention function; F is the input to the spatial attention mechanism; σ is the sigmoid function; f^(7×7) is a convolution with a 7 × 7 kernel; F_avg^s is the image data after average pooling along the channel dimension within the spatial attention mechanism; and F_max^s is the image data after max pooling along the channel dimension within the spatial attention mechanism.
Preferably, [F_avg^s; F_max^s] ∈ R^(2×H×W) denotes the feature map obtained by concatenating the average-pooled data and the max-pooled data along the channel dimension; F_avg^s ∈ R^(1×H×W) means that the feature map dimensions become 1 × H × W after the input feature map undergoes the average pooling of the spatial attention mechanism; F_max^s ∈ R^(1×H×W) means that the feature map dimensions become 1 × H × W after the input feature map undergoes the max pooling of the spatial attention mechanism.
Preferably, the target recognition result obtained in S5 is expressed as a target recognition vector M, obtained by multiplying the weight A learned by the attention mechanism with the input H:

M = A H    (3)

where H is the input to the attention layer and A is the encoded attention output. A is computed as

A = A_c ⊗ A_s    (4)

where ⊗ denotes element-wise multiplication, A_c is the one-dimensional channel attention, and A_s is the two-dimensional spatial attention.
The invention has the beneficial effects that:
compared with the traditional SAR image target identification method, the SAR ATR method based on the deep migration learning has stronger expression capability, can quickly extract image characteristics by utilizing a fine-tuned residual error network, realizes end-to-end learning, combines an attention mechanism, focuses information containing target characteristics, improves the capability of characteristic screening, further improves the target identification capability of the SAR image, and effectively improves the identification performance.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention.
FIG. 1 is a schematic diagram of the deep transfer model of the present invention;
FIG. 2 is a structural diagram of the CBAM of the present invention;
FIG. 3 is a schematic diagram of the channel attention model of the present invention;
FIG. 4 is a schematic diagram of the spatial attention model of the present invention;
FIG. 5 is a schematic diagram of the SAR image recognition algorithm structure combining the attention mechanism and the transfer model of the present invention;
FIG. 6 is a plot of SAR image recognition rate under different levels of noise.
Detailed Description
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention.
Referring to FIGS. 1-6, in a preferred embodiment of the present invention, a SAR image target recognition method combining transfer learning and an attention mechanism comprises the following steps:
S1, extracting a convolutional feature map from the input SAR image with a 7 × 7 convolution kernel;
S2, feeding the convolutional feature map into an attention mechanism for focusing to obtain an initial focused feature map, wherein the attention mechanism filters points of interest out of the large volume of digital image information, selectively screening out and focusing on the important information to further improve performance;
S3, feeding the initial focused feature map into a fine-tuned residual network to obtain a weighted feature map, wherein the residual network uses a deep convolutional network to perform the end-to-end target recognition task and is combined with a deep transfer learning method: a weight model pre-trained by the residual network on a known data set is migrated into the SAR image recognition task, shortening training time and improving recognition efficiency;
S4, feeding the resulting weighted feature map into the attention mechanism again for focusing;
S5, feeding the focused, weighted target recognition feature map into subsequent convolutional layers for classification to obtain the target recognition result.
Specifically, the transfer model is a model trained on the ImageNet data set; the attention mechanism comprises channel attention and spatial attention, and training proceeds with the attention mechanism attached while the original deep convolutional network structure is left unchanged. The functional block diagram of the SAR ATR algorithm combining deep transfer learning and the attention mechanism is shown in FIG. 5.
As a preferred embodiment of the present invention, it may also have the following additional technical features:
in this embodiment, the deep migration learning method in S3 is fine tuning of a deep network, and a weight model trained on an ImageNet data set by a residual error network is migrated to an SAR image recognition work.
In this embodiment, the attention mechanism in S2 is a CBAM hybrid attention module, which operates as a combination of a channel attention mechanism and a spatial attention mechanism.
In this embodiment, the channel attention mechanism is as follows: the channel sub-module uses the max-pooled and average-pooled outputs of a shared network, and the channel attention is computed as

A_c(F) = σ(W_1(W_0(F_avg^c)) + W_1(W_0(F_max^c)))    (1)

where A_c(F) is the channel attention function; F is the input to the channel attention mechanism, computed in matrix form; σ is the sigmoid function; W_0 is the hidden-layer weight of the built-in multilayer perceptron; W_1 is its output-layer weight; F_avg^c is the input F after average pooling within the channel attention mechanism; and F_max^c is the input F after max pooling within the channel attention mechanism.
In this embodiment, W_0 and W_1 are shared by the two inputs, and W_0 is followed by the ReLU activation function; W_0 ∈ R^(C/r×C), W_1 ∈ R^(C×C/r). Here W_0 ∈ R^(C/r×C) means that W_0 holds the weights between the C neurons of the input layer and the C/r neurons of the hidden layer, where C is the number of channels of the SAR image data, r is a hyper-parameter (the reduction ratio) of the multilayer perceptron, and R denotes the real-valued space of the picture data; W_1 ∈ R^(C×C/r) means that W_1 holds the weights between the C/r neurons of the hidden layer and the C neurons of the output layer.
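As an illustrative numerical example (the values C = 256 and r = 16 are assumptions for illustration, not values from the patent), the shared multilayer perceptron first compresses and then restores the channel descriptor:

```latex
% Assumed example: C = 256 channels, reduction ratio r = 16
W_0 \in \mathbb{R}^{(C/r) \times C} = \mathbb{R}^{16 \times 256}, \qquad
W_1 \in \mathbb{R}^{C \times (C/r)} = \mathbb{R}^{256 \times 16}
% A pooled channel descriptor of length 256 is therefore mapped
%   256 -> 16 (W_0), ReLU, 16 -> 256 (W_1),
% and the sigmoid turns the result into 256 per-channel weights.
```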
In this embodiment, the spatial attention mechanism is as follows: the spatial sub-module takes two similar outputs pooled along the channel axis and forwards them to a convolutional layer; the spatial attention is computed as

A_s(F) = σ(f^(7×7)([F_avg^s; F_max^s]))    (2)

where A_s(F) is the spatial attention function; F is the input to the spatial attention mechanism; σ is the sigmoid function; f^(7×7) is a convolution with a 7 × 7 kernel; F_avg^s is the image data after average pooling along the channel dimension within the spatial attention mechanism; and F_max^s is the image data after max pooling along the channel dimension within the spatial attention mechanism.
In this embodiment, [F_avg^s; F_max^s] ∈ R^(2×H×W) denotes the feature map obtained by concatenating the average-pooled data and the max-pooled data along the channel dimension; F_avg^s ∈ R^(1×H×W) means that the feature map dimensions become 1 × H × W after the input feature map undergoes the average pooling of the spatial attention mechanism; F_max^s ∈ R^(1×H×W) means that the feature map dimensions become 1 × H × W after the input feature map undergoes the max pooling of the spatial attention mechanism.
In this embodiment, the target recognition result obtained in S5 is expressed as a target recognition vector M, obtained by multiplying the weight A learned by the attention mechanism with the input H:

M = A H    (3)

where H is the input to the attention layer and A is the encoded attention output. A is computed as

A = A_c ⊗ A_s    (4)

where ⊗ denotes element-wise multiplication, A_c is the one-dimensional channel attention, and A_s is the two-dimensional spatial attention.
DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
FIG. 1 shows the deep transfer model. Fine-tuning a deep network is the most common deep transfer method: a network that has already been trained by others is adjusted to fit one's own task. Its advantages are that the network need not be trained from scratch for the new task, saving time, and that because pre-training is usually performed on a large data set, the effective training data is expanded, making the model more robust and effectively improving its generalization ability.
The concrete process is as follows: the model structure and some of the parameters of a SAR ATR deep learning model trained on target class A are transferred directly into the SAR ATR deep learning model for target class B, and a small amount of training data from the class-B SAR data set is used for fine-tuning. This amounts to performing an output-layer swap on the pre-trained model: the original output layer is converted into a new output layer with randomly initialized parameters, which is then trained on the smaller data set on top of the pre-trained model.
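A minimal PyTorch sketch of this output-layer swap and fine-tuning scheme (the ResNet-18 backbone, the class count, and the choice of which layers to freeze are illustrative assumptions, not specified by the patent):

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a residual network pre-trained on ImageNet (the "known data set").
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Output-layer swap: replace the original classifier with a new, randomly
# initialized head sized for the SAR target classes (10 is an assumed count).
num_sar_classes = 10
model.fc = nn.Linear(model.fc.in_features, num_sar_classes)

# Freeze the early layers; fine-tune only the last stage and the new head.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith(("layer4", "fc"))

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One illustrative fine-tuning step on a dummy batch (3-channel 224 x 224
# input assumed so the pre-trained ImageNet stem can be reused unchanged).
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_sar_classes, (8,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```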
FIG. 2 shows the CBAM structure. After a SAR image is input, CBAM considers two questions: "what" the target is and "where" it is. Concretely, the channel sub-module uses the max-pooled and average-pooled outputs of a shared network, while the spatial sub-module takes two similar outputs pooled along the channel axis and forwards them to a convolutional layer.
To address the fact that the Softmax single fully connected layer used in conventional CNNs cannot handle non-linear classification well, we adopt the Convolutional Block Attention Module (CBAM) as a hybrid attention mechanism. Compared with earlier attention methods, CBAM contains both a channel attention mechanism and a spatial attention mechanism and can flexibly capture the relation between global and local information. The purpose of the attention mechanism is to let the model find the target regions that deserve the most attention, put more weight on them, highlight the salient useful features, and suppress or ignore the irrelevant ones.
FIG. 3 shows the channel attention mechanism, formulated as follows:

A_c(F) = σ(W_1(W_0(F_avg^c)) + W_1(W_0(F_max^c)))    (1)

where A_c(F) is the channel attention function; F is the input to the channel attention mechanism, computed in matrix form; σ is the sigmoid function; W_0 is the hidden-layer weight of the built-in multilayer perceptron; W_1 is its output-layer weight; F_avg^c is the input F after average pooling within the channel attention mechanism; and F_max^c is the input F after max pooling within the channel attention mechanism.
For a given SAR image feature map input F ∈ R^(H×W×C), where F denotes the input SAR image data and R^(H×W×C) denotes its three dimensions (image height H, image width W, and channel count C), global average pooling (GAP) and global max pooling (GMP) are applied simultaneously, yielding descriptors with different spatial semantics. Each descriptor is passed through a shared multilayer perceptron (MLP), the two resulting feature vectors are fused by addition, and finally a sigmoid activation yields the channel attention vector A_c ∈ R^(C×1×1); its dimensions after channel attention are height 1, width 1, and channel count C.
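A minimal PyTorch sketch of the channel attention sub-module in equation (1) (a sketch of the CBAM formulation as described above; the default reduction ratio r = 16 is an assumed value):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention: A_c(F) = sigmoid(MLP(GAP(F)) + MLP(GMP(F)))."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Shared MLP: W_0 (C -> C/r), ReLU, W_1 (C/r -> C), implemented
        # with 1x1 convolutions so that no flattening is needed.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),  # W_0
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),  # W_1
        )

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        avg = self.mlp(nn.functional.adaptive_avg_pool2d(f, 1))  # GAP branch
        mx = self.mlp(nn.functional.adaptive_max_pool2d(f, 1))   # GMP branch
        return torch.sigmoid(avg + mx)  # A_c in R^(C x 1 x 1)
```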
FIG. 4 shows the spatial attention mechanism: the spatial sub-module takes two similar outputs pooled along the channel axis and forwards them to a convolutional layer. The formula is:

A_s(F) = σ(f^(7×7)([F_avg^s; F_max^s]))    (2)

where A_s(F) is the spatial attention function; F is the input to the spatial attention mechanism; σ is the sigmoid function; f^(7×7) is a convolution with a 7 × 7 kernel; F_avg^s is the image data after average pooling along the channel dimension within the spatial attention mechanism; and F_max^s is the image data after max pooling along the channel dimension within the spatial attention mechanism.
For a given input: f is belonged to R H×W×C Simultaneously carrying out Global Average Pooling (GAP) and Global Maximum Pooling (GMP) operations along channel dimensions to respectively obtain two different channel feature description operators, splicing the two channel feature description operators to obtain a two-dimensional feature map, carrying out 7-by-7 convolution operation on the two channel feature description operators and a sigmoid activation function to finally obtain a spatial attention vector A s ∈R 1×H×W 。A s ∈R 1×H×W The three dimensional data after the image is processed by the channel attention mechanism are respectively H image height, W image width and C image channel number of 1.
Combining FIG. 3 and FIG. 4: a feature map extracted from the SAR image by a convolutional layer is fed into the channel attention mechanism of FIG. 3, which corrects the original feature map and yields an intermediate feature map; the intermediate feature map is then fed into the spatial attention module for a second correction, finally producing the attention-focused feature map. As FIG. 2 shows, the CBAM hybrid attention mechanism is therefore a serial scheme in which the feature map passes first through channel attention and then through spatial attention.
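The serial order just described can be sketched by composing the two sub-modules (reusing the ChannelAttention and SpatialAttention sketches above):

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Serial hybrid attention: channel attention first, then spatial attention."""

    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        self.channel_att = ChannelAttention(channels, reduction)
        self.spatial_att = SpatialAttention(kernel_size)

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        f = f * self.channel_att(f)  # correct with A_c (broadcast over H and W)
        f = f * self.spatial_att(f)  # correct with A_s (broadcast over channels)
        return f
```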
FIG. 5 shows the network framework combining deep transfer learning and the attention mechanism for SAR images. The target recognition vector M represents the final target recognition result and is obtained by multiplying the weight A learned by the attention mechanism with the input H:

M = A H    (3)

where H is the input to the attention layer and A is the encoded attention output. A is computed as

A = A_c ⊗ A_s    (4)

where ⊗ denotes element-wise multiplication, A_c is the one-dimensional channel attention, and A_s is the two-dimensional spatial attention.
Finally, the attention mechanism and the transfer learning method are effectively combined, realizing the overall SAR ATR algorithm.
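As a minimal end-to-end sketch of steps S1 to S5 (7 × 7 convolutional stem, first CBAM focusing pass, fine-tuned residual body, second CBAM pass, classification head), reusing the CBAM sketch above; exactly where the attention modules attach to the backbone is an assumption for illustration, since FIG. 5 is not reproduced here:

```python
import torch
import torch.nn as nn
from torchvision import models

class SarAtrNet(nn.Module):
    """S1-S5 pipeline: 7x7 stem -> CBAM -> fine-tuned ResNet body -> CBAM -> classifier."""

    def __init__(self, num_classes: int = 10):  # 10 classes is an assumed count
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
        # S1: 7x7 convolution stem (reusing the pre-trained ResNet stem).
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        self.cbam1 = CBAM(64)           # S2: first focusing pass
        # S3: fine-tuned residual stages producing the weighted feature map.
        self.body = nn.Sequential(backbone.layer1, backbone.layer2,
                                  backbone.layer3, backbone.layer4)
        self.cbam2 = CBAM(512)          # S4: second focusing pass
        # S5: subsequent layers performing the classification.
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(512, num_classes))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.cbam1(self.stem(x))    # S1 + S2
        x = self.cbam2(self.body(x))    # S3 + S4
        return self.head(x)             # S5

# Usage: logits = SarAtrNet()(torch.randn(1, 3, 224, 224))
```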
FIG. 6 shows SAR image recognition accuracy under 0% to 15% random noise; on the MSTAR data set with 15% random noise added, the recognition rate of the present method is much higher than that achieved by the other methods.
Compared with traditional SAR image target recognition methods, this SAR ATR method based on deep transfer learning has stronger representational capability: the fine-tuned residual network extracts image features quickly and realizes end-to-end learning, while the attention mechanism focuses on the information that carries the target features and improves feature screening. Together these further improve the target recognition capability on SAR images and effectively improve recognition performance.
The above additional technical features can be freely combined and used in superposition by those skilled in the art without conflict.
The above description is only a preferred embodiment of the present invention; technical solutions that achieve the objects of the present invention by substantially the same means all fall within the protection scope of the present invention.

Claims (8)

1. A SAR image target recognition method combining transfer learning and an attention mechanism, characterized in that the method comprises the following steps:
S1, extracting a convolutional feature map from the input SAR image with a 7 × 7 convolution kernel;
S2, feeding the convolutional feature map into an attention mechanism for focusing to obtain an initial focused feature map, wherein the attention mechanism filters points of interest out of the large volume of digital image information, selectively screening out and focusing on the important information;
S3, feeding the initial focused feature map into a fine-tuned residual network to obtain a weighted feature map, wherein the residual network uses a deep convolutional network to perform the end-to-end target recognition task and is combined with a deep transfer learning method: a weight model pre-trained by the residual network on a known data set is migrated into the SAR image recognition task, shortening the training time of the SAR image recognition model;
S4, feeding the resulting weighted feature map into the attention mechanism again for focusing;
S5, feeding the focused, weighted target recognition feature map into subsequent convolutional layers for classification to obtain the target recognition result.
2. The SAR image target recognition method combining transfer learning and an attention mechanism as claimed in claim 1, wherein: the deep transfer learning method in S3 is fine-tuning of a deep network, in which a weight model pre-trained by a residual network on the ImageNet data set is migrated into the SAR image recognition task.
3. The SAR image target recognition method combining transfer learning and an attention mechanism as claimed in claim 1, wherein: the attention mechanism in S2 is a CBAM hybrid attention module, which operates as a combination of a channel attention mechanism and a spatial attention mechanism.
4. The SAR image target recognition method combining transfer learning and an attention mechanism as claimed in claim 3, wherein the channel attention mechanism is as follows: the channel sub-module uses the max-pooled and average-pooled outputs of a shared network, and the channel attention is computed as

A_c(F) = σ(W_1(W_0(F_avg^c)) + W_1(W_0(F_max^c)))    (1)

wherein A_c(F) is the channel attention function; F is the input to the channel attention mechanism, computed in matrix form; σ is the sigmoid function; W_0 is the hidden-layer weight of the built-in multilayer perceptron; W_1 is its output-layer weight; F_avg^c is the input F after average pooling within the channel attention mechanism; and F_max^c is the input F after max pooling within the channel attention mechanism.
5. The SAR image target recognition method combining transfer learning and an attention mechanism as claimed in claim 4, wherein: W_0 and W_1 are shared by the two inputs, and W_0 is followed by the ReLU activation function; W_0 ∈ R^(C/r×C), W_1 ∈ R^(C×C/r), wherein W_0 ∈ R^(C/r×C) means that W_0 holds the weights between the C neurons of the input layer and the C/r neurons of the hidden layer, C is the number of channels of the SAR image data, r is a hyper-parameter (the reduction ratio) of the multilayer perceptron, and R denotes the real-valued space of the picture data; W_1 ∈ R^(C×C/r) means that W_1 holds the weights between the C/r neurons of the hidden layer and the C neurons of the output layer.
6. The SAR image target recognition method combining transfer learning and an attention mechanism as claimed in claim 3, wherein the spatial attention mechanism is as follows: the spatial sub-module takes two similar outputs pooled along the channel axis and forwards them to a convolutional layer; the spatial attention is computed as

A_s(F) = σ(f^(7×7)([F_avg^s; F_max^s]))    (2)

wherein A_s(F) is the spatial attention function; F is the input to the spatial attention mechanism; σ is the sigmoid function; f^(7×7) is a convolution with a 7 × 7 kernel; F_avg^s is the image data after average pooling along the channel dimension within the spatial attention mechanism; and F_max^s is the image data after max pooling along the channel dimension within the spatial attention mechanism.
7. The SAR image target recognition method combining transfer learning and an attention mechanism as claimed in claim 6, wherein: [F_avg^s; F_max^s] ∈ R^(2×H×W) denotes the feature map obtained by concatenating the average-pooled data and the max-pooled data along the channel dimension; F_avg^s ∈ R^(1×H×W) means that the feature map dimensions become 1 × H × W after the input feature map undergoes the average pooling of the spatial attention mechanism; F_max^s ∈ R^(1×H×W) means that the feature map dimensions become 1 × H × W after the input feature map undergoes the max pooling of the spatial attention mechanism.
8. The SAR image target recognition method combining transfer learning and an attention mechanism as claimed in claim 1, wherein: the target recognition result obtained in S5 is expressed as a target recognition vector M, obtained by multiplying the weight A learned by the attention mechanism with the input H:

M = A H    (3)

wherein H is the input to the attention layer and A is the encoded attention output; A is computed as

A = A_c ⊗ A_s    (4)

wherein ⊗ denotes element-wise multiplication, A_c is the one-dimensional channel attention, and A_s is the two-dimensional spatial attention.
CN202210579119.3A (priority and filing date 2022-05-25): SAR image target identification method combining transfer learning and attention mechanism. Status: Pending. Published as CN114863283A.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202210579119.3A | 2022-05-25 | 2022-05-25 | SAR image target identification method combining transfer learning and attention mechanism

Publications (1)

Publication Number | Publication Date
CN114863283A | 2022-08-05

Family

ID=82639805

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202210579119.3A | SAR image target identification method combining transfer learning and attention mechanism | 2022-05-25 | 2022-05-25

Country Status (1)

Country | Link
CN | CN114863283A (en)

Cited By (3)

* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
CN116228797A * | 2023-05-09 | 2023-06-06 | 中国石油大学(华东) | Shale scanning electron microscope image segmentation method based on attention and U-Net
CN116228797B * | 2023-05-09 | 2023-08-15 | 中国石油大学(华东) | Shale scanning electron microscope image segmentation method based on attention and U-Net
CN116559949A * | 2023-05-19 | 2023-08-08 | 北京宸宇金源科技有限公司 | Carbonate reservoir prediction method, system and equipment based on deep learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination