CN118097439A - AMFFNet and IACGAN dual-network-based cross-task migration SAR target recognition method - Google Patents

AMFFNet and IACGAN dual-network-based cross-task migration SAR target recognition method

Info

Publication number
CN118097439A
CN118097439A (Application No. CN202410516581.8A)
Authority
CN
China
Prior art keywords
image
amffnet
sar
iacgan
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410516581.8A
Other languages
Chinese (zh)
Inventor
杜晓林
万训杨
刘珅砚
唐梦皎
李洪高
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yantai University
Original Assignee
Yantai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yantai University filed Critical Yantai University
Priority to CN202410516581.8A priority Critical patent/CN118097439A/en
Publication of CN118097439A publication Critical patent/CN118097439A/en
Pending legal-status Critical Current


Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a cross-task migration SAR target recognition method based on the AMFFNet and IACGAN dual networks, and relates to the technical field of radar automatic target recognition. The invention provides the YOCT data enhancement method, which is adapted to the application scenario of SAR ATR. The proposed YOCT method is first used to enhance the training samples; IACGAN is then trained by supervised learning, after which part of the parameters of the IACGAN discriminator are transferred to AMFFNet; finally, the training samples are used to fine-tune AMFFNet, so that AMFFNet has stronger feature extraction and classification capabilities and achieves better recognition results. The proposed network can still obtain good recognition performance with only a small number of training samples, and, thanks to the attention mechanism, it is robust to noisy SAR images.

Description

AMFFNet and IACGAN dual-network-based cross-task migration SAR target recognition method
Technical Field
The invention relates to the technical field of radar automatic target recognition, and in particular to a cross-task migration SAR target recognition method based on a designed attention multi-scale feature fusion network (AMFFNet) and an improved auxiliary classifier generative adversarial network (IACGAN).
Background
Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR) technology has important value in military and homeland-security applications such as friend-or-foe identification, battlefield surveillance, and disaster relief. Compared with remote sensing technologies such as visible light and infrared, SAR provides all-weather imaging over various geographic terrains and is one of the main means of extracting target information.
The application of Convolutional Neural Networks (CNN) to SAR target recognition has led to extensive research on CNN-based SAR target recognition methods, which have achieved good recognition results. However, the recognition performance of a CNN depends largely on the training samples, and if training samples are scarce the model is prone to overfitting. To alleviate overfitting when samples are insufficient, a recognition model has been proposed in the prior art that uses a CNN for feature extraction and an SVM for classification, and it performs well on the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset. Other work uses Morphological Component Analysis (MCA) as a preprocessing step for the CNN input layer to improve recognition accuracy.
Data enhancement has proven to be an effective way to address the shortage of labeled samples in SAR images. Some literature applies translation, noise addition, pose synthesis and similar operations to enhance the data; other work performs data enhancement by cropping, flipping and proportional batch processing, so that the model converges faster and better. The document Han J, Fang P, et al. You only cut once: Boosting data augmentation with a single cut[C]//International Conference on Machine Learning. PMLR, 2022: 8196-8212 proposes the You Only Cut Once (YOCO) data enhancement method. YOCO splits an image into two parts, applies a separate data enhancement operation to each part, and then recombines the two parts to obtain the enhanced image. YOCO is highly extensible, and other data enhancement methods can be combined with it. It can therefore be used to increase the diversity of each sample and encourage the neural network to recognize objects from partial information.
Transfer learning is also an effective way to address the lack of labeled samples and has been widely studied. A convolutional Auto-Encoder (AE) can be trained on large-scene images and then used for SAR target recognition. Alternatively, a VGG model trained on the ImageNet dataset can be applied to SAR ATR and then fine-tuned with SAR data for optimization. Another approach extracts image features with an adversarial encoding network and feeds the encoded features into a classifier, obtaining good recognition accuracy on the MSTAR dataset. However, although an unsupervised generative network can extract the data-distribution features of SAR images, the class-related feature representations it learns are relatively limited.
At present, however, no effective method maintains good recognition performance with only a small number of training samples. A data enhancement method with strong extensibility is therefore needed, as well as an SAR image processing method that effectively suppresses noise, in order to improve the overall recognition rate under small-sample conditions.
Disclosure of Invention
In order to solve the problems, the invention aims to provide a cross-task migration SAR target recognition method based on AMFFNet and IACGAN double networks.
The above aim of the invention is achieved by the following technical scheme:
A cross-task migration SAR target recognition method based on AMFFNet and IACGAN dual networks comprises the following steps:
① Constructing a data set by using a YOCT method, and carrying out data enhancement on a training sample so as to expand the sample and increase sample diversity;
in the YOCT method for constructing the dataset, a single image is first divided into four sub-images by two cuts; the four sub-images are then enhanced to different degrees and finally merged into a more diversified enhanced image;
② Using the training samples to supervise training the improved auxiliary classification generation countermeasure network IACGAN and obtaining the weight parameters of the IACGAN discriminator;
③ Initializing partial parameters of the designed attention multiscale feature fusion network AMFFNet using weight parameters of IACGAN discriminators and fine tuning AMFFNet using training samples;
④ SAR target recognition is performed using the fine-tuned AMFFNet.
Preferably, in step ① the YOCT method used to construct the dataset is a YOCT method based on random flipping and color dithering, comprising a vertical cut and a horizontal cut, specifically:
First, the SAR image is cut vertically into left and right parts of equal size; in order to keep the target at the central position, the left and right parts are each randomly flipped vertically, and color dithering of different degrees is applied to each sub-image; after processing, the two sub-images are merged into a complete image;
Then, the previously merged image is cut again in the horizontal direction into upper and lower sub-images of equal size; similarly, in order to keep the target at the central position, the two sub-images are each randomly flipped horizontally, and color dithering of different degrees is applied to each sub-image;
finally, the two sub-images are merged again into a complete image, completing the data enhancement process.
Preferably, in step ① the YOCT method used to construct the dataset is a mixup-based YOCT method, specifically:
First, the SAR image is evenly divided into four sub-images of equal size; then one SAR image is randomly selected from the training batch, and each of the four cut sub-images is mixed with the randomly selected SAR image through mixup; the mixed samples obtained by the proposed method are calculated as:
$\tilde{x}_i = \lambda_i x_i + (1-\lambda_i)\, x_r,\quad i=1,2,3,4$
where $x_1, x_2, x_3, x_4$ are the four cut sub-images, $x_r$ is the randomly selected SAR image, $\lambda_i$ is the interpolation parameter controlling the mixing ratio of $x_i$ and $x_r$ and is a random number, and $\bar{\lambda}$ refers to the average of the four obtained $\lambda_i$; the generated new sample and its label are expressed as:
$\tilde{x} = \mathrm{Concat}(\tilde{x}_1, \tilde{x}_2, \tilde{x}_3, \tilde{x}_4),\qquad \tilde{y} = \bar{\lambda}\, y + (1-\bar{\lambda})\, y_r$
where $\mathrm{Concat}(\cdot)$ is the operation for image stitching in the height and width dimensions, $\tilde{x}_1,\ldots,\tilde{x}_4$ are the results of mixing the four sub-images with the randomly selected SAR image through mixup, $y$ is the label of the original SAR image, and $y_r$ is the label of $x_r$; the final loss value is given by the following formula:
$L = \bar{\lambda}\,\mathrm{CE}\!\left(f(\tilde{x}),\, y\right) + (1-\bar{\lambda})\,\mathrm{CE}\!\left(f(\tilde{x}),\, y_r\right)$
where $\mathrm{CE}(\cdot)$ is the cross-entropy loss function and $f(\tilde{x})$ is the final output of the network.
Preferably, step ② includes the steps of:
(2a) Inputting the combination of the latent vector and the label into the generator, which outputs a pseudo SAR image carrying the input class label; while the generator is trained, the discriminator keeps its network parameters fixed;
The objective function consists of two parts: the true/false discrimination loss $L_S$ and the classification loss $L_C$;
$L_S$ is expressed as:
$L_S = E\!\left[\log P(S=\mathrm{real}\mid X_{\mathrm{real}})\right] + E\!\left[\log P(S=\mathrm{fake}\mid X_{\mathrm{fake}})\right]$
where $P(S\mid X)$ represents the probability distribution of the discriminator over the source (real or fake) given the input image $X$, $X_{\mathrm{real}}$ refers to a given real image, $X_{\mathrm{fake}}$ refers to the image generated by the generator, and $E[\cdot]$ is the expected value, taken over the real image data for the first term and over the generated image data for the second term;
$L_C$ is expressed as:
$L_C = E\!\left[\log P(C=c\mid X_{\mathrm{real}})\right] + E\!\left[\log P(C=c\mid X_{\mathrm{fake}})\right]$
where $P(C\mid X)$ represents the probability distribution of the discriminator over the class labels given the image $X$, and $c$ is the class of the input image; the loss of the generator is expressed as:
$L_G = L_C - L_S$
The training goal of the generator is to maximize $L_C - L_S$, that is, to make the generated data more realistic while maximizing the probability that it is correctly classified;
(2b) The input of the discriminator is the output of the generator or training sample, the output of the discriminator is the authenticity and class of the input image, and the generator keeps its network parameters fixed while training the discriminator;
The loss of the discriminator is expressed as:
$L_D = L_C + L_S$
The training goal of the discriminator is to maximize $L_C + L_S$, i.e., to enhance as much as possible the discriminator's ability to classify real and fake data and to distinguish between them;
(2c) The generator and the discriminator are optimized by alternating training; in order to maintain the adversarial balance, the generator and the discriminator are updated at a four-to-one ratio, and their respective loss functions are optimized using the back-propagation algorithm;
(2d) The above process is repeated until Nash equilibrium is reached, and the weight parameters of the IACGAN discriminator are saved.
Preferably, step ③ includes the steps of:
(3a) Migrating partial weight parameters of the pre-trained IACGAN discriminator into AMFFNet, and then training the entire network using the enhanced training samples;
(3b) While fine-tuning AMFFNet, in each iteration all parameters of the network are updated through the back-propagation algorithm using the loss function; this process minimizes the loss value and optimizes the network. The cross-entropy loss function compares the predicted probability of each class with the corresponding real label, and is calculated as:
$L = -\frac{1}{N}\sum_{i=1}^{N} y_i \log\left(p_i\right)$
where $y_i$ represents the true label of the $i$-th sample, $p_i$ represents the predicted probability that the $i$-th sample belongs to the different classes, $N$ represents the number of samples, and $p$ denotes the probability distribution predicted by the model over each possible class given the input.
Preferably, the AMFFNet network module used when fine-tuning AMFFNet with the training sample set in step ③ is a multi-scale depth-separable convolution module, specifically:
First, in the depthwise convolution part, the proposed multi-scale depthwise separable convolution (MDSC) module uses convolution kernels of sizes 3x3, 5x5 and 7x7; secondly, in the pointwise convolution part, two 1x1 convolutions are used to combine the fused features, where the first 1x1 convolution doubles the number of channels and the second 1x1 convolution compresses the number of channels back to the input size to obtain the final feature; finally, the module is connected in a residual manner;
The MDSC module is expressed as:
$F_{out} = F + PW_2\!\left(PW_1\!\left(\mathrm{Concat}\left(DW_3(F),\, DW_5(F),\, DW_7(F)\right)\right)\right)$
where $F$ represents the input feature, $PW$ is a point convolution with a kernel size of 1, $\mathrm{Concat}$ is the feature concatenation operation, $DW_3$ is a depthwise convolution with a convolution kernel size of 3, and the remaining terms are defined analogously.
Preferably, the AMFFNet module used when fine-tuning AMFFNet with the training sample set in step ③ is a multi-branch fusion module, specifically:
Before each multi-branch fusion (MBF) module, the input is first downsampled, with the downsampling rate controlled by the stride; after downsampling, the input passes through two visual attention multi-scale enhancement (VAME) modules; in a VAME module, the input first passes through the visual attention (VA) module, then through batch normalization and a GELU activation function, and is then fed into the MDSC module; the output is obtained by establishing a residual connection between the droppath output and the original input; a branch is taken at the output of the first VAME module, processed by the visual attention module and droppath, and connected residually with the output of the first VAME module; the output features of the two branches are concatenated to obtain a richer fusion feature, which is finally compressed by a 1×1 convolution to obtain the final output of the MBF module;
The MBF module is expressed as:
$\mathrm{VAME}(X) = X + \mathrm{DropPath}\!\left(\mathrm{MDSC}\!\left(\mathrm{GELU}\!\left(\mathrm{BN}\!\left(\mathrm{VA}(X)\right)\right)\right)\right)$
$Y_1 = \mathrm{VAME}_1(X),\quad Y_2 = \mathrm{VAME}_2(Y_1),\quad B = Y_1 + \mathrm{DropPath}\!\left(\mathrm{VA}(Y_1)\right)$
$\mathrm{MBF}(X) = \mathrm{Conv1x1}\!\left(\mathrm{Concat}\left(Y_2,\, B\right)\right)$
where DropPath refers to setting the whole path to 0 with probability p during training, MDSC denotes the multi-scale depth-separable convolution module, VA is the visual attention module, VAME is visual attention multi-scale enhancement, GELU is the activation function, BN is batch normalization, and Conv1x1 denotes a convolution with a kernel size of 1.
Preferably, the specific steps of performing SAR target recognition with the fine-tuned AMFFNet in step ④ are:
First, the data at a 15° depression angle in the MSTAR dataset, 2425 SAR images in total, are used as the test set to test the SAR target recognition performance of the fine-tuned AMFFNet; secondly, the test set is input into the fine-tuned AMFFNet network, and the output is the recognition result of each test image; finally, all the obtained test results are summarized to obtain the overall recognition rate of the method in this test scenario.
Compared with the prior art, the invention has the following advantages:
The present invention proposes an improved data enhancement method called You Only Cut Twice (YOCT) and applies it to the field of SAR target recognition. The method has strong extensibility and can effectively improve the enhanced diversity of each sample. The invention proposes a Multi-scale Depthwise Separable Convolution (MDSC) module and applies it to the designed AMFFNet; the module extracts multi-scale features and, thanks to its depthwise separable convolution structure, extracts rich target features with fewer parameters and higher efficiency. Based on the Visual Attention (VA), Dual Attention (DA) and MDSC modules, AMFFNet is designed and constructed for SAR target recognition; the network has good feature expression and discrimination capability and therefore performs well in image classification tasks. The invention uses IACGAN as a pre-training task, so that the network can extract rich features under supervision, and the extracted features benefit the recognition task, helping AMFFNet obtain better recognition performance. Experiments on the MSTAR dataset validate the effectiveness of the invention for SAR target recognition.
The cross-task migration SAR target recognition method based on the AMFFNet and IACGAN dual networks can still obtain good recognition performance with only a small number of training samples; the data enhancement method designed by the invention has strong extensibility and can effectively enhance samples and improve sample diversity; and the proposed network uses an attention mechanism, giving it good robustness to noisy SAR images.
Drawings
FIG. 1 is a general frame diagram of the present invention;
FIG. 2 is a block diagram of the designed YOCT method; wherein (a) is a flow chart of YOCT based on random flipping and color dithering and (b) is a flow chart of YOCT based on mixup;
FIG. 3 is a block diagram of the modules of AMFFNet in accordance with the present invention; wherein (a) is a multi-scale depth separable convolution module MDSC, (b) is a visual attention module VA, (c) is a dual attention module DA, and (d) is a multi-branch fusion module MBF;
FIG. 4 is an identification accuracy curve for five data enhancement strategies;
FIG. 5 is an enlarged view of a portion of FIG. 4;
FIG. 6 is a graph of recognition accuracy for all methods in an SOC scenario;
FIG. 7 is a confusion matrix for the proposed method when the training sample ratio is 100%;
FIG. 8 is a confusion matrix for the proposed method when the training sample ratio is 70%;
FIG. 9 is a confusion matrix for the proposed method with a training sample ratio of 50%;
FIG. 10 is a confusion matrix for the proposed method with a training sample ratio of 30%;
FIG. 11 is a confusion matrix for the proposed method with a training sample ratio of 10%;
FIG. 12 shows SAR images at different SNRs, wherein (a)-(e) are the SAR images at SNRs of 70 dB, 40 dB, 30 dB, 20 dB and 10 dB, respectively;
FIG. 13 is a graph comparing recognition accuracy curves of all methods under SOC scenarios under different SNR conditions;
Fig. 14 is an enlarged view of a portion of fig. 13.
Detailed Description
The invention aims to provide a cross-task migration SAR target recognition method based on the AMFFNet and IACGAN dual networks, which is realized by the following technical scheme:
A cross-task migration SAR target recognition method based on AMFFNet and IACGAN dual networks comprises the following steps:
Step 1, carrying out data enhancement on training samples by using a YOCT method so as to expand the samples and increase sample diversity;
Step 2, performing supervised training on IACGAN by using a training sample, and obtaining weight parameters of a IACGAN discriminator;
(1) The training samples are the data at a 17° depression angle in the MSTAR dataset, 2747 SAR images in total. Before the experiment, the central 96×96 region of the SAR image of each target was cropped.
(2) After training IACGAN, the weight parameters of the IACGAN discriminator are saved. As shown in fig. 1, the IACGAN discriminator includes three parts: the AMFFBlock module adapted from AMFFNet, a classification head composed of a fully connected layer and a softmax function, and a true/false discrimination head composed of a fully connected layer and a sigmoid function.
Step 3, initializing partial parameters of AMFFNet by using weight parameters of IACGAN discriminator, and fine-tuning AMFFNet by using training samples;
(1) In the obtained weight parameters of IACGAN discriminator, only the weight of AMFFBlock module is used, and the weights of the classification head and the true and false discrimination head are discarded;
(2) Initializing the weights of the corresponding portions in AMFFNet using the weight parameters of AMFFBlock module of IACGAN discriminator;
(3) The weight-migrated AMFFNet network is trained using the training samples described above; a smaller learning rate is used during training, and a cosine annealing strategy gradually reduces the learning rate to achieve the effect of fine-tuning AMFFNet. The weight-migrated AMFFNet inherits part of the classification and generalization capability of the IACGAN discriminator, which is further enhanced after AMFFNet is fine-tuned, giving the network stronger recognition capability.
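A minimal PyTorch sketch of this fine-tuning setup is given below, using the values stated in the experimental settings of this document (Adam optimizer, initial learning rate 0.0001, cosine annealing down to 0.00001, batch size 32, 200 epochs). Names such as `amffnet` and `train_loader` are placeholders, not part of the patented implementation.

```python
import torch

# amffnet: the weight-migrated network; train_loader: the enhanced training samples (placeholders)
optimizer = torch.optim.Adam(amffnet.parameters(), lr=1e-4)           # small learning rate for fine-tuning
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200, eta_min=1e-5)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(200):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(amffnet(images), labels)
        loss.backward()                                               # back-propagation updates all parameters
        optimizer.step()                                              # migrated weights are not frozen
    scheduler.step()                                                  # gradually reduce the learning rate
```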
Step 4, SAR target recognition is performed using the fine-tuned AMFFNet.
First, the data at a 15° depression angle in the MSTAR dataset are used as the test set to evaluate the SAR target recognition performance of the fine-tuned AMFFNet; secondly, the test set is input into the fine-tuned AMFFNet network, and the output is the recognition result of each test image; finally, all test results are summarized to obtain the overall recognition rate of the method in this test scenario.
A cross-task migration SAR target recognition method based on AMFFNet and IACGAN dual networks comprises the following steps:
Step 1, data enhancement is performed on the training samples using the YOCT method to expand the samples and increase sample diversity. As shown in fig. 2, two YOCT strategies are included: YOCT based on random flipping and color dithering, and YOCT based on mixup. In the YOCT framework, a single image is cut twice to obtain four parts, and data enhancement of different degrees is then applied to the cut parts to improve the diversity of each enhanced sample;
specific:
1.1 YOCT based on random flipping and color dithering
As shown in fig. 2 (a), the method includes two steps of vertical cutting and horizontal cutting.
First, the SAR image is cut vertically into left and right parts of equal size; in order to keep the target at the central position, the left and right parts are each randomly flipped vertically, and color dithering of different degrees is applied to each sub-image; after processing, the two sub-images are merged into a complete image.
Then, the previously merged image is cut again in the horizontal direction into two sub-images of equal size; in order to keep the target at the central position, the two sub-images are each randomly flipped horizontally and color dithering of different degrees is applied to each sub-image; after processing, the two sub-images are merged into a complete image, completing the data enhancement process.
After enhancement by this method, the SAR image contains four regions with different degrees of enhancement; compared with the YOCO method, the proposed method improves the diversity of each enhanced sample more uniformly and reduces the risk of model overfitting.
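To make the flow above concrete, the following is a minimal sketch of the cut-flip-jitter-merge procedure for a single (C, H, W) tensor in [0, 1]. The simple intensity jitter stands in for the color dithering step, and all function names and jitter strengths are illustrative assumptions rather than the patented implementation.

```python
import torch

def jitter(sub: torch.Tensor) -> torch.Tensor:
    """Simple intensity jitter standing in for the color dithering step (illustrative)."""
    gain = 0.8 + 0.4 * torch.rand(1)                 # random gain in [0.8, 1.2]
    offset = 0.1 * (torch.rand(1) - 0.5)             # small random offset
    return (sub * gain + offset).clamp(0.0, 1.0)

def yoct_flip_jitter(img: torch.Tensor) -> torch.Tensor:
    """YOCT with random flipping and jittering for a (C, H, W) image tensor."""
    c, h, w = img.shape
    # 1) vertical cut into left/right halves; randomly flip each vertically, then jitter
    halves = []
    for p in (img[:, :, : w // 2], img[:, :, w // 2 :]):
        if torch.rand(1) < 0.5:
            p = torch.flip(p, dims=[1])              # vertical flip keeps the target centred
        halves.append(jitter(p))
    img = torch.cat(halves, dim=2)                    # merge back into a full image
    # 2) horizontal cut into top/bottom halves; randomly flip each horizontally, then jitter
    halves = []
    for p in (img[:, : h // 2, :], img[:, h // 2 :, :]):
        if torch.rand(1) < 0.5:
            p = torch.flip(p, dims=[2])              # horizontal flip
        halves.append(jitter(p))
    return torch.cat(halves, dim=1)                   # the four quadrants now carry different enhancements
```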
1.2 Mixup-based YOCT
As shown in fig. 2 (b), the SAR image is first evenly divided into four sub-images of equal size; one SAR image is then randomly selected from the training batch, and each of the four cut sub-images is mixed with the randomly selected SAR image through mixup. The mixed samples obtained by the proposed method are calculated as:
$\tilde{x}_i = \lambda_i x_i + (1-\lambda_i)\, x_r,\quad i=1,2,3,4$
where $x_1, x_2, x_3, x_4$ are the four cut sub-images, $x_r$ is the randomly selected SAR image, $\lambda_i$ is the interpolation parameter controlling the mixing ratio of $x_i$ and $x_r$ and is a random number, and $\bar{\lambda}$ refers to the average of the four obtained $\lambda_i$; the generated new sample and its label are expressed as:
$\tilde{x} = \mathrm{Concat}(\tilde{x}_1, \tilde{x}_2, \tilde{x}_3, \tilde{x}_4),\qquad \tilde{y} = \bar{\lambda}\, y + (1-\bar{\lambda})\, y_r$
where $\mathrm{Concat}(\cdot)$ is the operation for image stitching in the height and width dimensions, $\tilde{x}_1,\ldots,\tilde{x}_4$ are the results of mixing the four sub-images with the randomly selected SAR image through mixup, $y$ is the label of the original SAR image, and $y_r$ is the label of $x_r$.
The final loss value is given by the following formula:
$L = \bar{\lambda}\,\mathrm{CE}\!\left(f(\tilde{x}),\, y\right) + (1-\bar{\lambda})\,\mathrm{CE}\!\left(f(\tilde{x}),\, y_r\right)$
where $\mathrm{CE}(\cdot)$ is the cross-entropy loss function and $f(\tilde{x})$ is the final output of the network. This method mixes the four sub-images of the SAR image with $x_r$ to different degrees and finally stitches them together, simulating an occlusion effect. Different areas of the image are occluded to different degrees, the stitched image forms richer features, the diversity of the data is enhanced, and the overfitting problem of the model is effectively alleviated.
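A hedged batch-level sketch of this mixup-based YOCT is given below. The Beta sampling of the mixing coefficients and the even image dimensions are assumptions; the patent only states that each $\lambda_i$ is a random number.

```python
import torch
import torch.nn.functional as F

def yoct_mixup(x: torch.Tensor, y: torch.Tensor, alpha: float = 1.0):
    """x: (B, C, H, W) batch, y: (B,) integer labels; assumes even H and W."""
    b, c, h, w = x.shape
    perm = torch.randperm(b)
    x_r, y_r = x[perm], y[perm]                        # randomly selected images within the batch
    # cut every image into four equal quadrants
    quads = [x[:, :, :h//2, :w//2], x[:, :, :h//2, w//2:],
             x[:, :, h//2:, :w//2], x[:, :, h//2:, w//2:]]
    quads_r = [x_r[:, :, :h//2, :w//2], x_r[:, :, :h//2, w//2:],
               x_r[:, :, h//2:, :w//2], x_r[:, :, h//2:, w//2:]]
    lams = [torch.distributions.Beta(alpha, alpha).sample().item() for _ in range(4)]
    mixed = [l * q + (1 - l) * qr for l, q, qr in zip(lams, quads, quads_r)]
    top = torch.cat(mixed[:2], dim=3)                  # stitch along width
    bottom = torch.cat(mixed[2:], dim=3)
    x_tilde = torch.cat([top, bottom], dim=2)          # stitch along height
    lam_bar = sum(lams) / 4.0                          # average of the four lambdas

    def mixed_loss(logits):
        # weighted cross entropy against the two label sets, as in the formula above
        return lam_bar * F.cross_entropy(logits, y) + (1 - lam_bar) * F.cross_entropy(logits, y_r)

    return x_tilde, mixed_loss
```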
Step 2, performing supervised training on IACGAN by using a training sample, and obtaining weight parameters of the IACGAN discriminator, wherein the specific process is as follows:
Unlike a conventional unsupervised GAN, IACGAN adds label information to both the generator and the discriminator. By mixing random noise with label information, the label information is introduced into the generator so that it can generate realistic images with a specific class label. Introducing label information into the discriminator enables it not only to distinguish real input images from fake ones, but also to identify their corresponding categories. During the training of IACGAN, the generator and the discriminator are optimized by alternating training: the discriminator keeps its network parameters fixed while the generator is trained, and vice versa (while the discriminator is trained, the generator keeps its network parameters fixed).
To maintain the adversarial balance, the generator and the discriminator are updated at a 4:1 ratio. Their respective loss functions are optimized with the back-propagation algorithm: the gradients of the loss function are propagated from the output layer to the input layer, the gradient of each weight is calculated layer by layer, and the network parameters are modified accordingly. This process is repeated until the model reaches a good balance, that is, the generator can produce fairly realistic images and the discriminator can effectively judge whether an image is real or fake and classify it, so that Nash equilibrium is reached between the two. Finally, all parameters of the discriminator are saved.
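A minimal sketch of this alternating schedule is shown below. Reading the 4:1 ratio as four generator updates per discriminator update is an interpretation, and `g_step` / `d_step` are hypothetical helpers that would compute the generator and discriminator losses defined later in this section and apply one optimizer step each.

```python
# train_loader yields (real_imgs, labels); g_step and d_step are hypothetical helpers
for it, (real_imgs, labels) in enumerate(train_loader):
    g_step(real_imgs, labels)            # update the generator every iteration
    if it % 4 == 0:                      # update the discriminator once per four generator updates
        d_step(real_imgs, labels)
```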
The original ACGAN includes a generator and a discriminator, where the discriminator includes a true/false discriminator and an auxiliary classifier. During training the auxiliary classifier learns together with the true/false discriminator, so that the discriminator can extract class-related feature representations. These feature representations are used in the classification task after migration and provide a richer, more meaningful feature expression that helps to improve classification performance.
As shown in fig. 1, IACGAN modifies the feature extraction portion of the discriminator based on the improved AMFFNet structure. In IACGAN, the main module of the discriminator is derived from the AMFFBlock module of AMFFNet; the true/false discriminator is a fully connected layer and a sigmoid function added after this module, and the auxiliary classifier is a fully connected layer and a softmax function added after this module. The generator adopts a stacked structure of convolution, batch normalization, GELU activation functions and upsampling. A final 3x3 convolution reduces the feature dimension of the image, and a Tanh activation function normalizes the generated output image.
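The following is a hedged sketch of this layout: a generator built from upsample/conv/BN/GELU blocks with a final 3x3 convolution and Tanh, and a discriminator that adds a sigmoid real/fake head and a class head on top of an AMFFBlock-style feature extractor. Channel sizes, the number of blocks and the 6x6 starting resolution (chosen to reach the 96x96 crop) are assumptions; `feature_extractor` stands in for the AMFFBlock module.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Stacked upsample / conv / BN / GELU blocks, then a 3x3 conv and Tanh."""
    def __init__(self, latent_dim: int = 100, num_classes: int = 10, base: int = 64):
        super().__init__()
        self.base = base
        self.fc = nn.Linear(latent_dim + num_classes, base * 4 * 6 * 6)   # one-hot label concatenated to the noise
        def up(cin, cout):
            return nn.Sequential(nn.Upsample(scale_factor=2),
                                 nn.Conv2d(cin, cout, 3, padding=1),
                                 nn.BatchNorm2d(cout), nn.GELU())
        self.blocks = nn.Sequential(up(base * 4, base * 2), up(base * 2, base),
                                    up(base, base), up(base, base))       # 6x6 -> 96x96
        self.out = nn.Sequential(nn.Conv2d(base, 1, 3, padding=1), nn.Tanh())

    def forward(self, z, onehot_label):
        x = self.fc(torch.cat([z, onehot_label], dim=1)).view(-1, self.base * 4, 6, 6)
        return self.out(self.blocks(x))

class Discriminator(nn.Module):
    """Feature extractor shared with AMFFNet, plus real/fake and class heads."""
    def __init__(self, feature_extractor: nn.Module, feat_dim: int = 256, num_classes: int = 10):
        super().__init__()
        self.features = feature_extractor              # AMFFBlock-style module later migrated to AMFFNet
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.real_fake_head = nn.Linear(feat_dim, 1)
        self.class_head = nn.Linear(feat_dim, num_classes)   # softmax is applied inside the loss

    def forward(self, x):
        f = self.pool(self.features(x)).flatten(1)
        return torch.sigmoid(self.real_fake_head(f)), self.class_head(f)
```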
The IACGAN technique is embodied in the adversarial process between the discriminator and the generator, specifically:
Firstly, inputting a combination of a potential vector and a label into a generator, and outputting a pseudo SAR image with an input category label;
Secondly, the input of the discriminator is the output of the generator or training samples;
finally, the output of the discriminator is the authenticity and class of the input image. For the discriminator, it is desirable that its classification is correct and the authenticity of the image is correctly judged.
It is also desirable for the generator to classify correctly, while making the discriminator unable to correctly distinguish the fake data; the adversarial game between them enhances the capability of each side until Nash equilibrium is reached. Thus, the objective function consists of two parts: the true/false discrimination loss $L_S$ and the classification loss $L_C$.
$L_S$ is expressed as:
$L_S = E\!\left[\log P(S=\mathrm{real}\mid X_{\mathrm{real}})\right] + E\!\left[\log P(S=\mathrm{fake}\mid X_{\mathrm{fake}})\right]$
where $P(S\mid X)$ denotes the probability distribution of the discriminator over the source (real or fake) given the input image $X$, $X_{\mathrm{real}}$ refers to a given real image, $X_{\mathrm{fake}}$ refers to the image generated by the generator, and $E[\cdot]$ denotes the expected value, taken over the real image data for the first term and over the generated image data for the second term.
$L_C$ is expressed as:
$L_C = E\!\left[\log P(C=c\mid X_{\mathrm{real}})\right] + E\!\left[\log P(C=c\mid X_{\mathrm{fake}})\right]$
where $P(C\mid X)$ denotes the probability distribution of the discriminator over the class labels given the image $X$, and $c$ is the class of the input image.
The losses of the generator and the discriminator are expressed as:
$L_G = L_C - L_S, \qquad L_D = L_C + L_S$
where $L_G$ is the loss of the generator and $L_D$ is the loss of the discriminator.
The training goal of the generator is to maximize $L_C - L_S$, that is, to make the generated data more realistic while maximizing the probability that it is correctly classified. The training goal of the discriminator is to maximize $L_C + L_S$, i.e., to enhance as much as possible the discriminator's ability to classify real and fake data and to distinguish between them.
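A hedged PyTorch sketch of these objectives is shown below, implemented in the conventional way by minimizing the corresponding negative log-likelihoods with binary cross-entropy and cross-entropy terms (the generator term uses the usual non-saturating form). It assumes a discriminator, like the sketch above, that returns a real/fake probability and class logits.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(disc, real_imgs, real_labels, fake_imgs, fake_labels):
    """Minimizing this corresponds to maximizing L_S + L_C."""
    src_real, cls_real = disc(real_imgs)
    src_fake, cls_fake = disc(fake_imgs.detach())
    l_s = F.binary_cross_entropy(src_real, torch.ones_like(src_real)) + \
          F.binary_cross_entropy(src_fake, torch.zeros_like(src_fake))
    l_c = F.cross_entropy(cls_real, real_labels) + F.cross_entropy(cls_fake, fake_labels)
    return l_s + l_c

def generator_loss(disc, fake_imgs, fake_labels):
    """Minimizing this pushes the generator toward realistic, correctly classified images (L_C - L_S)."""
    src_fake, cls_fake = disc(fake_imgs)
    adv = F.binary_cross_entropy(src_fake, torch.ones_like(src_fake))   # fool the true/false head
    cls = F.cross_entropy(cls_fake, fake_labels)                        # be classified as the conditioned class
    return adv + cls
```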
Step 3, initializing partial parameters of AMFFNet by using weight parameters of IACGAN discriminator, and fine-tuning AMFFNet by using training samples; the specific process is as follows:
During the training of AMFFNet, part of the AMFFBlock weight parameters of the pre-trained IACGAN discriminator are migrated into the corresponding modules of AMFFNet, and the whole network is then trained with the enhanced training samples. Because the generative adversarial task and the classification task differ in goals and methods, the transferred weights are not frozen during network training after the cross-task transfer. In each iteration, all parameters of the network are updated by the back-propagation algorithm using the loss function; this process minimizes the loss value and optimizes the network. Since the discriminator has already learned class-related information during the pre-training of IACGAN, AMFFNet already has some capability on the classification task after the parameters of part of the IACGAN discriminator are migrated. A small learning rate therefore helps the network model converge to a good solution, and a cosine annealing strategy is adopted to gradually reduce the learning rate.
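A hedged sketch of the cross-task weight migration is given below: only the entries of the saved discriminator state dict that also exist in AMFFNet with matching shapes (i.e., the shared AMFFBlock feature extractor) are copied, the real/fake and classification heads are discarded, and nothing is frozen. The file name and the assumption that a plain state dict was saved are illustrative.

```python
import torch

# amffnet: the designed network; "iacgan_discriminator.pth" is a hypothetical saved state dict
disc_state = torch.load("iacgan_discriminator.pth", map_location="cpu")
amff_state = amffnet.state_dict()

migrated = {k: v for k, v in disc_state.items()
            if k in amff_state and v.shape == amff_state[k].shape}   # keep only matching AMFFBlock weights
amff_state.update(migrated)
amffnet.load_state_dict(amff_state)        # transferred weights stay trainable (not frozen)
```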
3.1 AMFFNet
Traditionally, CNN structures stack convolutional and pooling layers alternately over multiple layers and generate the final output through fully connected layers; between layers an activation function is typically added to introduce nonlinearity, and through the back-propagation algorithm the CNN automatically learns feature representations and model parameters from the training data. Unlike conventional CNN architectures, however, the feature extraction module of the proposed AMFFNet employs depthwise separable convolutions, which use fewer parameters and are more efficient, and improves them in the MDSC module. In addition, convolution replaces the pooling operation to preserve more information and reduce information loss. Two attention modules, VA and DA, are introduced to enhance the ability of the network to capture important features from images. The network as a whole adopts a multi-level feature fusion structure that combines features of different levels, giving it richer representation capability. Finally, the output is classified using a cross-entropy loss function.
3.2 Multi-scale depth separable convolution module
Depth separable convolution is a modified algorithm proposed based on standard convolution. It reduces the complexity and computational cost of the model by decomposing the convolution operation, while maintaining or improving the performance of the model, compared to standard convolution. Specifically, the depth separable convolution consists of two main parts: (1) The depth convolution performs a channel-level convolution operation on each channel in the input, and each input channel has a corresponding convolution kernel to perform the convolution operation, thereby preserving the original number of input channels. (2) The point convolution uses a convolution kernel of 1x1 to combine the output channels of the previous step, and can be viewed as modeling a linear combination between channels. The point convolution can be used to adjust the number of channels without changing the spatial structure of the channels.
The invention provides a Multi-scale Depthwise Separable Convolution (MDSC) module that improves the original depthwise separable convolution. First, in the depthwise convolution part, the ordinary depthwise separable convolution uses only one kernel size, which limits the spatial information extracted; by using convolution kernels of sizes 3x3, 5x5 and 7x7, the proposed MDSC module can extract spatial information at multiple scales and fuse it to obtain a richer feature representation. Second, in the pointwise convolution part, two 1x1 convolutions are used to combine the fused features, with the first 1x1 convolution doubling the number of channels and the second 1x1 convolution compressing the number of channels back to the input size to obtain the final feature. Finally, the module is connected through a residual path. Referring to fig. 3 (a), the MDSC module can be expressed as:
$F_{out} = F + PW_2\!\left(PW_1\!\left(\mathrm{Concat}\left(DW_3(F),\, DW_5(F),\, DW_7(F)\right)\right)\right)$
where $F$ represents the input feature, $PW$ is a point convolution with a kernel size of 1, $\mathrm{Concat}$ is the feature concatenation operation, $DW_3$ is a depthwise convolution with a convolution kernel size of 3, and the remaining terms are defined analogously.
In this module, depth convolutions with convolution kernels of different sizes are used to extract multi-scale features, providing more spatial information to the model. These features are then processed by two point convolutions, linearly transforming the features of each channel, and interactively fusing between channels. Thereby further enhancing the expressive power of the model and facilitating extraction of a richer representation of the features. Finally, by introducing a residual structure, the expression capability and learning capability of the network are enhanced.
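A hedged sketch of such an MDSC block follows the formula above: three depthwise convolutions (3x3, 5x5, 7x7) are concatenated, combined by two pointwise convolutions (the first expanding to twice the input channels, the second compressing back), and added to the input through a residual connection. The exact channel bookkeeping is an assumption.

```python
import torch
import torch.nn as nn

class MDSC(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.dw3 = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)   # depthwise 3x3
        self.dw5 = nn.Conv2d(channels, channels, 5, padding=2, groups=channels)   # depthwise 5x5
        self.dw7 = nn.Conv2d(channels, channels, 7, padding=3, groups=channels)   # depthwise 7x7
        self.pw1 = nn.Conv2d(3 * channels, 2 * channels, 1)    # first pointwise conv: expand
        self.pw2 = nn.Conv2d(2 * channels, channels, 1)        # second pointwise conv: compress back

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        multi = torch.cat([self.dw3(f), self.dw5(f), self.dw7(f)], dim=1)
        return f + self.pw2(self.pw1(multi))                   # residual connection
```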
3.3 Visual attention Module
As shown in fig. 3 (b), in the Visual Attention (VA) module two 1x1 convolutions are used to wrap the Large Kernel Attention (LKA), and a GELU activation function is added only after the first 1x1 convolution. By combining the characteristics of convolution and self-attention, LKA can extract specific global features and achieves not only spatial adaptability but also channel adaptability. Introducing the visual attention module enhances the ability of the network to obtain global information: it captures attention not only across the spatial dimension but also effectively across the channel dimension, improves the capture of important features, and reduces the influence of noise.
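The sketch below shows one possible reading of this block: two 1x1 convolutions wrapping an LKA unit, with GELU only after the first. The internal decomposition of LKA (a 5x5 depthwise convolution, a 7x7 dilated depthwise convolution with dilation 3, and a 1x1 convolution used as a multiplicative attention map) follows the original Large Kernel Attention design, and the residual connection around the block is likewise an assumption.

```python
import torch
import torch.nn as nn

class LKA(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, 5, padding=2, groups=channels)
        self.dw_dilated = nn.Conv2d(channels, channels, 7, padding=9, dilation=3, groups=channels)
        self.pw = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        attn = self.pw(self.dw_dilated(self.dw(x)))
        return x * attn                              # spatial- and channel-adaptive attention map

class VA(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.proj1 = nn.Conv2d(channels, channels, 1)
        self.act = nn.GELU()                         # GELU only after the first 1x1 convolution
        self.lka = LKA(channels)
        self.proj2 = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        return x + self.proj2(self.lka(self.act(self.proj1(x))))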
3.4 Dual attention module
A Dual Attention (DA) module is shown in fig. 3 (c) and contains two types of Attention blocks: (1) The spatial attention module selectively aggregates features at each location by weighting and summing the features at each location. (2) The channel attention module emphasizes interdependent channel features by integrating the relevant features on all channels. The outputs of the two attention modules are finally combined to enhance the representational capacity of the feature.
In the invention, DA is applied to the SAR target recognition task with some necessary modifications to adapt it to the classification task. First, the ReLU activation functions are replaced with GELU. After the dual-attention fusion, the output channels are compressed by half with a final 1x1 convolution to obtain more classification-oriented features. The modified dual attention module is used to process the multi-level fusion features, so that the model can better understand the relationship between fusion features at different levels. The module selectively emphasizes or suppresses the contributions of different features and enhances the feature discrimination capability of the model; this improvement further suppresses noise interference and improves the performance and stability of the model.
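The sketch below is a simplified, hedged version of such a dual attention block: a position (spatial) attention branch and a channel attention branch in the style of DANet, whose outputs are summed and then compressed to half the channels by a 1x1 convolution followed by GELU, as described above. Reduction ratios and other internal details are assumptions.

```python
import torch
import torch.nn as nn

class PositionAttention(nn.Module):
    def __init__(self, c: int):
        super().__init__()
        self.q, self.k = nn.Conv2d(c, c // 8, 1), nn.Conv2d(c, c // 8, 1)   # assumes c >= 8
        self.v = nn.Conv2d(c, c, 1)
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)            # (B, HW, C/8)
        k = self.k(x).flatten(2)                            # (B, C/8, HW)
        attn = torch.softmax(q @ k, dim=-1)                 # (B, HW, HW) spatial affinities
        v = self.v(x).flatten(2)                            # (B, C, HW)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x

class ChannelAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        f = x.flatten(2)                                    # (B, C, HW)
        attn = torch.softmax(f @ f.transpose(1, 2), dim=-1) # (B, C, C) channel affinities
        out = (attn @ f).view(b, c, h, w)
        return self.gamma * out + x

class DA(nn.Module):
    def __init__(self, c: int):
        super().__init__()
        self.pam, self.cam = PositionAttention(c), ChannelAttention()
        self.fuse = nn.Sequential(nn.Conv2d(c, c // 2, 1), nn.GELU())   # compress channels by half

    def forward(self, x):
        return self.fuse(self.pam(x) + self.cam(x))
```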
3.5 Multi-branch fusion Module
Before each Multi-Branch Fusion (MBF) module, the input is first downsampled, with the downsampling rate controlled by the stride. After downsampling, it passes through two Visual Attention Multi-scale Enhanced (VAME) modules. In a VAME module, the input first passes through the VA module, then through batch normalization and a GELU activation function, and is then fed to the MDSC module; the output is obtained by establishing a residual connection between the droppath output and the original input. The output of the first VAME module has a branch that is processed by the visual attention module and droppath and connected residually to the output of the first VAME module. The output features of the two branches are concatenated to give a richer fusion feature, which is finally compressed by a 1×1 convolution to obtain the final output of the MBF module. As shown in fig. 3 (d), the MBF module can be expressed as:
$\mathrm{VAME}(X) = X + \mathrm{DropPath}\!\left(\mathrm{MDSC}\!\left(\mathrm{GELU}\!\left(\mathrm{BN}\!\left(\mathrm{VA}(X)\right)\right)\right)\right)$
$Y_1 = \mathrm{VAME}_1(X),\quad Y_2 = \mathrm{VAME}_2(Y_1),\quad B = Y_1 + \mathrm{DropPath}\!\left(\mathrm{VA}(Y_1)\right)$
$\mathrm{MBF}(X) = \mathrm{Conv1x1}\!\left(\mathrm{Concat}\left(Y_2,\, B\right)\right)$
where DropPath refers to setting the whole path to 0 with probability p during training, MDSC denotes the multi-scale depth-separable convolution module, VA is the visual attention module, VAME is visual attention multi-scale enhancement, GELU is the activation function, BN is batch normalization, and Conv1x1 denotes a convolution with a kernel size of 1.
One MBF module is formed by stacking two VAME layers and fusing their output features. This structure allows the network to retain more useful information during forward propagation, while the residual connections and droppath effectively enhance the fitting and generalization ability of the model, alleviate the vanishing-gradient problem, and reduce the risk of overfitting.
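The sketch below assembles a VAME layer and an MBF block matching the expressions above; it reuses the MDSC and VA sketches given earlier, omits the stride-controlled downsampling that precedes each MBF module, and uses torchvision's stochastic-depth operator for droppath. The channel handling of the final 1x1 compression is an assumption.

```python
import torch
import torch.nn as nn
from torchvision.ops import stochastic_depth

class VAME(nn.Module):
    def __init__(self, channels: int, drop_prob: float = 0.1):
        super().__init__()
        self.va, self.bn, self.act = VA(channels), nn.BatchNorm2d(channels), nn.GELU()
        self.mdsc, self.p = MDSC(channels), drop_prob

    def forward(self, x):
        y = self.mdsc(self.act(self.bn(self.va(x))))
        return x + stochastic_depth(y, self.p, mode="row", training=self.training)   # residual + droppath

class MBF(nn.Module):
    def __init__(self, channels: int, drop_prob: float = 0.1):
        super().__init__()
        self.vame1, self.vame2 = VAME(channels, drop_prob), VAME(channels, drop_prob)
        self.branch_va, self.p = VA(channels), drop_prob
        self.compress = nn.Conv2d(2 * channels, channels, 1)   # fuse the two concatenated branches

    def forward(self, x):
        y1 = self.vame1(x)
        y2 = self.vame2(y1)
        b = y1 + stochastic_depth(self.branch_va(y1), self.p, mode="row", training=self.training)
        return self.compress(torch.cat([y2, b], dim=1))
```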
3.6 Loss function
In training and optimizing the network model, the cross-entropy loss function is used to compare the predicted probability of each class with the corresponding real label. It is calculated as:
$L = -\frac{1}{N}\sum_{i=1}^{N} y_i \log\left(p_i\right)$
where $y_i$ represents the true label of the $i$-th sample, $p_i$ represents the predicted probability that the $i$-th sample belongs to the different classes, $N$ represents the number of samples, and $p$ denotes the probability distribution predicted by the model over each possible class given the input.
The classification performance of the model can be effectively measured by the loss function, and the model parameters can be optimized by minimizing the loss value.
Step 4, SAR target recognition is performed using the fine-tuned AMFFNet; the specific process is as follows:
First, the test samples are acquired: the data at a 15° depression angle in the MSTAR dataset, 2425 SAR images in total, are used as the test set to test the SAR target recognition performance of the fine-tuned AMFFNet; secondly, each sample of the test set is fed one by one into the fine-tuned AMFFNet network to obtain the recognition result of each test image; finally, all test results are summarized to obtain the overall recognition rate of the method in this test scenario.
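A minimal sketch of this test procedure: run the fine-tuned AMFFNet over the 15° MSTAR test split and report the overall recognition rate. `amffnet` and `test_loader` are placeholders.

```python
import torch

amffnet.eval()
correct = total = 0
with torch.no_grad():
    for images, labels in test_loader:
        preds = amffnet(images).argmax(dim=1)       # recognition result of each test image
        correct += (preds == labels).sum().item()
        total += labels.numel()
print(f"overall recognition rate: {100.0 * correct / total:.2f}%")
```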
The effect of the invention is further illustrated by the following simulation comparative tests:
1. simulation conditions
1.1 Data set Source
The simulation experiments use the MSTAR dataset for network training and recognition testing. All SAR images in the MSTAR dataset were acquired by a 10 GHz X-band SAR sensor of Sandia National Laboratories with a resolution of 0.3 m × 0.3 m [Keydel E R, Lee S W, Moore J T. MSTAR extended operating conditions: A tutorial[J]. Algorithms for Synthetic Aperture Radar Imagery III, 1996, 2757: 228-242]. The MSTAR dataset contains ten ground-target categories, and the data mainly consist of SAR slice images of stationary vehicles captured from various viewing angles. Before the experiment, the central 96×96 region of the SAR image of each target was cropped. Table 1 lists the number of samples per target type in the Standard Operating Conditions (SOC) scenario. The training set is the data at a 17° depression angle and the test set is the data at a 15° depression angle.
1.2 Experimental setup
All experiments use the Adam optimizer [Kingma D P, Ba J. Adam: A method for stochastic optimization[J]. arXiv preprint arXiv:1412.6980, 2014.] with an initial learning rate of 0.0001 and a cosine annealing strategy. The cosine annealing termination learning rate is set to 0.00001, the batch size of training samples is 32, and the total number of training epochs is 200. In the IACGAN pre-training stage, the learning rate is set to 0.0002 and the model is trained for 1000 epochs. The experiments were performed on a Windows 10 server equipped with two RTX A4000 graphics cards and 64 GB memory. The proposed method is implemented with the open-source library PyTorch [Paszke A, Gross S, Massa F, et al. Pytorch: An imperative style, high-performance deep learning library[J]. Advances in neural information processing systems, 2019, 32.]. The model parameters of the proposed network are listed in Table 2.
TABLE 1 data information of objects in SOC scenarios
Table 2 network parameters of the model
2 Simulation content
2.1 YOCT comparative experiment
In this experiment, the recognition performance under different data enhancement methods is compared, with recognition and classification based on the AMFFNet network. Five strategies are compared: the combination of the two YOCT methods, YOCT based on random flipping and color dithering, YOCT based on mixup, random flipping and color dithering, and mixup. For each method, five independent experiments were performed, and the final recognition accuracy was determined as the average of their best accuracies.
As can be seen from Table 3, combining the YOCT method based on random flipping and color dithering with the mixup-based YOCT method achieves the highest recognition accuracy, up to 99.30%. Using only the YOCT method based on random flipping and color dithering or only the mixup-based YOCT method yields 99.16% and 99.06% recognition accuracy, respectively. The YOCT method based on random flipping and color dithering improves recognition accuracy by about 0.15% compared with using only random flipping and color dithering, and the mixup-based YOCT method shows a more significant improvement over plain mixup, increasing recognition accuracy by about 0.26%. This is because, compared with conventional data enhancement methods, embedding a data enhancement method in the YOCT framework forms four differently enhanced regions in a single image; this further amplifies the effect of the data enhancement method, increases the diversity of individual samples, and effectively alleviates the shortage of labeled samples. It can be observed that incorporating different data enhancement methods into the YOCT framework has a positive impact on recognition accuracy.
TABLE 3 identification accuracy on MSTAR datasets using different data enhancement strategies
For a more direct comparison of the experimental results, the recognition accuracy curves of AMFFNet under the five data enhancement strategies are plotted in figs. 4-5. As can be seen from figs. 4-5, under each data enhancement strategy the proposed AMFFNet reaches a recognition accuracy of over 90% before 25 epochs, showing that the network fits quickly and achieves good recognition within few training cycles. In addition, the accuracy curves of the five methods show a similar trend and are relatively stable after 50 epochs, except that the curves of the YOCT method based on random flipping and color dithering and of the method using only random flipping and color dithering show some larger fluctuations after 50 epochs; the three YOCT-based enhancement strategies achieve higher recognition accuracy than the other methods. This also shows that data enhancement methods based on the YOCT framework can effectively improve the enhancement effect and the recognition accuracy.
2.2 Experimental results at different training sample ratios
In this experiment, the cross-task migration method based on the AMFFNet and IACGAN dual networks is compared with advanced target recognition methods from other documents, including the RepVGG method [Ding X, Zhang X, Ma N, et al. Repvgg: Making vgg-style convnets great again[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021: 13733-13742.], the ResNet18 method and ResNet method [He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 770-778.], the MobileNetV2 method [Sandler M, Howard A, Zhu M, et al. Mobilenetv2: Inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 4510-4520.], the DenseNet method [Huang G, Liu Z, Van Der Maaten L, et al. Densely connected convolutional networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 4700-4708.], the MobileViT method [Mehta S, Rastegari M. Mobilevit: light-weight, general-purpose, and mobile-friendly vision transformer[J]. arXiv preprint arXiv:2110.02178, 2021.], the SFAS method [Wang C, Liu X, Huang Y, et al. Semi-Supervised SAR ATR Framework with Transductive Auxiliary Segmentation[J]. Remote Sensing, 2022, 14(18): 4547.] and the ASIR-Net method [Yu F, Zhou F, Zhou F. A Lightweight Fully Convolutional Neural Network for SAR Automatic Target Recognition. Remote Sens. 2021, 13, 3029[J]. 2021.]. The ten-class vehicle classification dataset of the SOC scenario is used in this experiment. To make the comparison fairer and more objective, YOCT data enhancement is applied to the training set. For each method and each training-sample ratio, five independent experiments were performed, with training data randomly selected at the given ratio for each experiment. The final recognition accuracy of each method at a specific sample ratio was determined by averaging the best accuracies obtained in the five experiments.
As shown in Table 4, the proposed method achieves the highest recognition accuracy, up to 99.81%, when 100% of the samples are used, while the AMFFNet method alone also achieves 99.30% recognition accuracy. Clearly, when training with all samples, the proposed method improves recognition accuracy by about 0.5%; this is because the weights of the pre-trained IACGAN discriminator provide AMFFNet with better generalization ability and prior knowledge, improving the performance of the model. Among the other methods, the DenseNet method also achieves a high recognition accuracy of 99.65%, whereas the MobileNetV2 method reaches only 99.11%. In addition, when the sample ratio is only 10%, the proposed method still achieves a recognition accuracy of 90.04% and the AMFFNet method reaches 89.87%, both superior to the other methods. The recognition accuracies of the DenseNet method and the two ResNet methods were 88.61%, 87.32% and 86.21%, respectively, whereas the MobileNetV method, the RepVGG method and the ASIR-Net method reached only 71.84%, 71.24% and 58.23%, respectively. This shows that the proposed method not only achieves excellent recognition performance with the full sample set but also performs well under small-sample conditions.
Table 4 identification accuracy of all methods in SOC scenarios
Fig. 6 shows the recognition accuracy curves of all methods at various sample ratios. The proposed method achieves the best results at every sample ratio. As the sample ratio increases from 30% to 100%, the recognition accuracy of all methods gradually increases, and the improvement from 10% to 30% is the most pronounced. Compared with the other recognition methods, the proposed method and AMFFNet achieve higher recognition accuracy when the sample ratio is only 10%. With insufficient labeled samples, the ASIR-Net accuracy curve remains low; moreover, although MobileViT and RepVGG show satisfactory recognition performance at the other sample ratios, their accuracy drops significantly when only 10% of the samples are available. The advantage of the proposed method in handling a small number of labeled samples can also be verified from fig. 6.
In order to explain in more detail the recognition effect of the proposed method under different training-sample ratios, confusion matrices are given for the different ratios, where the abscissa is the prediction result of the model and the ordinate is the true class label of the 10 target classes in the MSTAR dataset. Figures 7-11 show the confusion matrices at sample ratios from 100% to 10%. Analysis of fig. 7 shows that with 100% of the samples the proposed method makes only a small number of recognition errors on BTR60 and ZSU234 (BTR60 misrecognized as BRDM2 or D7, ZSU234 misrecognized as 2S1), while the recognition accuracy of the remaining targets reaches 100%. Furthermore, when the sample ratio is 10%, as shown in fig. 11, the proposed method makes larger numbers of recognition errors only on the first five targets in the figure. Among these, BMP2 is the most easily confused target, often mistaken for BTR70 and T72, followed by 2S1, which is most likely to be mistaken for BTR60; at this sample ratio the last five targets show only a small number of recognition errors.
2.3 Adding noise of different powers
The Signal-to-Noise Ratio (SNR) is defined as the ratio of the SAR image power $P_S$ to the interference noise power $P_N$: $\mathrm{SNR} = 10\lg(P_S/P_N)$. Gaussian white noise of different intensities is added to the SAR image by controlling the SNR. In this experiment, noise was added to the SAR images at SNRs of 70, 40, 30, 20 and 10 dB, as shown in (a)-(e) of fig. 12. Five experiments were performed for each SNR condition and each method, and the average of the highest recognition accuracies was taken as the final result.
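A hedged sketch of adding white Gaussian noise at a target SNR, following $\mathrm{SNR} = 10\lg(P_S/P_N)$: the noise power is derived from the image power and the desired SNR in dB.

```python
import torch

def add_noise_at_snr(img: torch.Tensor, snr_db: float) -> torch.Tensor:
    signal_power = img.pow(2).mean()
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))   # P_N = P_S / 10^(SNR/10)
    noise = torch.randn_like(img) * noise_power.sqrt()        # white Gaussian noise
    return img + noise
```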
As shown in fig. 13-14, variations in recognition accuracy under different SNR conditions are demonstrated. Obviously, as the SNR decreases, the recognition accuracy of each method also decreases accordingly. The recognition accuracy improvement of each method is most pronounced when the SNR increases from 10dB to 20 dB. This is mainly because when the SNR is 10dB, noise interference is strong, and a portion of the target is submerged by noise, resulting in a significant reduction in recognition accuracy. After that, as noise is reduced, the recognition accuracy gradually increases. The proposed method shows a strong anti-interference capability, followed by MobileViT. At an SNR of 10dB, the AMFFNet method has identification accuracy only slightly lower than the first two methods. The proposed method and AMFFNet method of the present invention still achieve good recognition accuracy when dealing with low SNR data, due to the two attention modules introduced in the network, namely VA and DA. The two attention modules can play a role in noise suppression in network training, so that the robustness of the model to noise is improved. However, although DenseNet has a better recognition capability under other noise intensities, its recognition accuracy drops sharply at an SNR of 10 dB. The above experimental results also confirm the effectiveness of the proposed method in processing noise data.
2.4 Recognition manifestation in EOC scenarios
The presence of significant pitch-angle differences between the SAR image training samples and the test samples may lead to recognition failures. The invention therefore evaluates the recognition performance of the proposed method under Extended Operating Conditions (EOC) to verify its robustness to these effects.
Target data information of Table 5 EOC
The dataset of the EOC scenario is shown in Table 5. The experimental results at a larger pitch angle (EOC-D) are evaluated first. The training set contains the samples of the four targets in Table 1 at a 17° pitch angle, while the samples of the four targets at a 30° pitch angle are used as the test set. The confusion matrix in the EOC-D scenario is shown in Table 6. Analysis of Table 6 shows that the proposed method achieves good results on every target; errors occur only when identifying 2S1 and ZSU234, where 2S1 is erroneously identified as BRDM2 and ZSU234 is erroneously identified as 2S1. The overall recognition accuracy of the method reaches 99.56%. Since this is a recognition scenario with a large pitch angle, this also demonstrates the effectiveness of the proposed method in such scenarios.
Confusion matrix of Table 6 EOC-D
In the other EOC test scenarios, the suitability of the proposed method for target configuration variants (EOC-C) and version variants (EOC-V) is evaluated. The training set for this experiment includes the four targets listed in Table 1. The test sets in Table 5 are EOC-C and EOC-V, containing two variants of BMP2 and ten variants of T72. The confusion matrices for EOC-C and EOC-V are shown in Tables 7 and 8, respectively. In the EOC-C scenario, the proposed method achieves 99.52% recognition accuracy for the variants of T72, with 100% accuracy for the A32 and A63 variants; errors occur only when identifying the S7 and A64 variants, which are misrecognized as BMP2 and BRDM2. In the EOC-V scenario, the proposed method achieves 98.12% recognition accuracy: variants 9566 and C21 of BMP2 are easily misrecognized as T72 and some are misrecognized as BTR70, reducing recognition accuracy; among the variants of T72, variant 812 is easily misrecognized as BMP2 and BTR70, while variants A04 and A07 are easily misrecognized as BMP2. Overall, the transfer learning performed by the network improves its generalization ability, which allows the proposed method to achieve good recognition results for the variants of BMP2 and T72 in both the EOC-C and EOC-V scenarios.
Confusion matrix of Table 7 EOC-C
Confusion matrix of Table 8 EOC-V

Claims (8)

1. A cross-task migration SAR target recognition method based on AMFFNet and IACGAN dual networks, characterized by comprising the following steps:
① Constructing a data set by using a YOCT method, and carrying out data enhancement on a training sample so as to expand the sample and increase sample diversity;
in the YOCT method for constructing the dataset, a single image is first divided into four sub-images by two cuts; the four sub-images are then enhanced to different degrees and finally merged into a more diversified enhanced image;
② Training the improved auxiliary classification generative adversarial network IACGAN with the training samples in a supervised manner, and obtaining the weight parameters of the IACGAN discriminator;
③ Initializing partial parameters of the designed attention multi-scale feature fusion network AMFFNet using the weight parameters of the IACGAN discriminator, and fine-tuning AMFFNet using the training samples;
④ Performing SAR target recognition using the fine-tuned AMFFNet.
2. The method for identifying a cross-task migration SAR target based on AMFFNet and IACGAN dual networks according to claim 1, wherein: the YOCT method used in step ① to construct the data set is a YOCT method based on random flipping and color dithering, comprising a vertical step and a horizontal step, specifically:
Firstly, the SAR image is cut vertically into left and right parts of equal size; in order to keep the target at the central position, the left and right parts are each randomly flipped in the vertical direction, color dithering of different degrees is applied to each sub-image, and after processing the two sub-images are combined into a complete image;
Then, the previously combined image is cut again in the horizontal direction into upper and lower sub-images of equal size; similarly, in order to keep the target at the central position, the two sub-images are each randomly flipped in the horizontal direction and color dithering of different degrees is applied to each sub-image;
finally, the two sub-images are combined into a complete image again, completing the data enhancement process.
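A minimal sketch of the flip-and-dither YOCT variant of claim 2, written with PyTorch/torchvision. The use of ColorJitter for "color dithering", the jitter strength range and the flip probability p are assumptions made for illustration; the claim itself does not fix concrete parameters.

```python
import torch
from torchvision.transforms import ColorJitter, functional as TF

def yoct_flip_dither(img: torch.Tensor, p: float = 0.5) -> torch.Tensor:
    """img: float tensor of shape (C, H, W). Vertical cut -> flips + dithering -> merge,
    then horizontal cut -> flips + dithering -> merge, as described in claim 2."""
    _, h, w = img.shape

    def dither(x):
        # 'Color dithering' of a random degree (ColorJitter used as an assumed stand-in).
        strength = float(torch.empty(1).uniform_(0.0, 0.4))
        return ColorJitter(brightness=strength, contrast=strength)(x)

    # Step 1: vertical cut into left/right halves, random vertical flip, per-part dithering.
    left, right = img[:, :, : w // 2], img[:, :, w // 2:]
    left = dither(TF.vflip(left) if torch.rand(1) < p else left)
    right = dither(TF.vflip(right) if torch.rand(1) < p else right)
    merged = torch.cat([left, right], dim=2)

    # Step 2: horizontal cut into top/bottom halves, random horizontal flip, per-part dithering.
    top, bottom = merged[:, : h // 2, :], merged[:, h // 2:, :]
    top = dither(TF.hflip(top) if torch.rand(1) < p else top)
    bottom = dither(TF.hflip(bottom) if torch.rand(1) < p else bottom)
    return torch.cat([top, bottom], dim=1)
```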
3. The method for identifying a cross-task migration SAR target based on AMFFNet and IACGAN dual networks according to claim 1, wherein: the YOCT method used in step ① to construct the data set is a mixup-based YOCT method, specifically:
Firstly, the SAR image is uniformly segmented into four sub-images of the same size; then one SAR image is randomly selected from the training batch, and each of the four segmented sub-images is mixed with the randomly selected SAR image through mixup. The mixed samples obtained by the proposed method are calculated as:
x̃_i = λ_i · x_i + (1 − λ_i) · x_r, i = 1, 2, 3, 4
wherein x_1, x_2, x_3, x_4 are the four cut sub-images, x_r is the randomly selected SAR image, λ_i is an interpolation parameter, a random number controlling the proportion of x_i and x_r in the interpolation; λ̄ denotes the average of the four obtained λ_i; the newly generated sample and its label are expressed as:
x_mix = Concat(x̃_1, x̃_2, x̃_3, x̃_4),  y_mix = λ̄ · y + (1 − λ̄) · y_r
wherein Concat is the operation for image stitching in the height and width dimensions, x̃_1, x̃_2, x̃_3, x̃_4 are respectively the results of mixing the four sub-images with the randomly selected SAR image through mixup, and y and y_r are the labels of the original and the randomly selected image; the final loss value is given by:
L = λ̄ · CE(f(x_mix), y) + (1 − λ̄) · CE(f(x_mix), y_r)
wherein CE is the cross entropy loss function and f(x_mix) is the final output of the network.
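A minimal sketch of the mixup-based YOCT variant of claim 3, following the formulas above. Uniform sampling of the interpolation parameters λ_i, the helper names yoct_mixup and yoct_mixup_loss, and the per-sample (rather than batched) interface are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def yoct_mixup(img, label, batch_imgs, batch_labels):
    """img: (C, H, W); batch_imgs: (B, C, H, W). Returns the mixed image, both labels
    and the average interpolation parameter lambda_bar."""
    _, h, w = img.shape
    # Randomly pick one reference image x_r from the training batch.
    r = torch.randint(0, batch_imgs.size(0), (1,)).item()
    x_r, y_r = batch_imgs[r], batch_labels[r]

    # Split both images into four equal sub-images and mix each pair with its own lambda_i.
    quads = [img[:, :h // 2, :w // 2], img[:, :h // 2, w // 2:],
             img[:, h // 2:, :w // 2], img[:, h // 2:, w // 2:]]
    r_quads = [x_r[:, :h // 2, :w // 2], x_r[:, :h // 2, w // 2:],
               x_r[:, h // 2:, :w // 2], x_r[:, h // 2:, w // 2:]]
    lams = torch.rand(4)  # random interpolation parameters (assumed uniform on [0, 1])
    mixed = [l * q + (1 - l) * rq for l, q, rq in zip(lams, quads, r_quads)]

    # Stitch the four mixed sub-images back together along width, then height.
    top = torch.cat(mixed[:2], dim=2)
    bottom = torch.cat(mixed[2:], dim=2)
    x_mix = torch.cat([top, bottom], dim=1)
    return x_mix, label, y_r, lams.mean()

def yoct_mixup_loss(logits, y, y_r, lam_bar):
    # Final loss: lambda_bar * CE(f(x), y) + (1 - lambda_bar) * CE(f(x), y_r).
    return lam_bar * F.cross_entropy(logits, y) + (1 - lam_bar) * F.cross_entropy(logits, y_r)
```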
4. The method for identifying a cross-task migration SAR target based on AMFFNet and IACGAN dual networks according to claim 1, wherein: step ② includes the steps of:
(2a) The combination of a latent vector and a class label is input into the generator, which outputs a pseudo SAR image carrying the input class label; while the generator is trained, the discriminator keeps its network parameters fixed; the objective function consists of two parts, the real/fake discrimination loss L_S and the classification loss L_C;
L_S is expressed as:
L_S = E[log P(S = real | X_real)] + E[log P(S = fake | X_fake)]
wherein P(S | X) represents the probability distribution of the discriminator over the source (real or fake) given an input image X, X_real refers to a given real image, X_fake refers to an image generated by the generator, and E refers to the expected value;
L_C is expressed as:
L_C = E[log P(C = c | X_real)] + E[log P(C = c | X_fake)]
wherein P(C | X) represents the probability distribution of the discriminator over the class label given an image X, and c is the class of the input image; the loss of the generator is expressed as:
L_G = L_C − L_S
The training goal of the generator is to maximize L_C − L_S, that is, to make the generated data more realistic while maximizing the probability that it is correctly classified;
(2b) The input of the discriminator is either the output of the generator or a training sample, and the output of the discriminator is the authenticity and the class of the input image; the generator keeps its network parameters fixed while the discriminator is trained;
The loss of the discriminator is expressed as:
L_D = L_S + L_C
The training goal of the discriminator is to maximize L_S + L_C, that is, to strengthen as much as possible the discriminator's ability to classify and to distinguish real data from fake data;
(2c) The generator and the discriminator are optimized by alternate training; in order to maintain the adversarial balance, the generator and the discriminator are updated at a ratio of four to one, and a back propagation algorithm is used to optimize their respective loss functions;
(2d) The above process is repeated until Nash equilibrium is reached, and the weight parameters of the IACGAN discriminator are saved.
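A minimal sketch of the supervised IACGAN training loop of claim 4 in PyTorch. The toy generator and discriminator architectures, the latent size, the Adam hyperparameters, and the reading of the four-to-one ratio as four generator steps per discriminator step are all assumptions; the claim specifies only the objectives and the alternating schedule.

```python
import torch
import torch.nn as nn

LATENT, N_CLASSES, IMG = 100, 4, 64  # illustrative sizes, not taken from the patent

class ToyGenerator(nn.Module):
    """Toy conditional generator: (latent vector, class label) -> fake SAR chip."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(N_CLASSES, LATENT)
        self.net = nn.Sequential(nn.Linear(LATENT, IMG * IMG), nn.Tanh())

    def forward(self, z, y):
        return self.net(z * self.embed(y)).view(-1, 1, IMG, IMG)

class ToyDiscriminator(nn.Module):
    """Toy ACGAN-style discriminator with a real/fake head and a class head."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Flatten(), nn.Linear(IMG * IMG, 128), nn.LeakyReLU(0.2))
        self.adv_head = nn.Linear(128, 1)
        self.cls_head = nn.Linear(128, N_CLASSES)

    def forward(self, x):
        f = self.features(x)
        return self.adv_head(f), self.cls_head(f)

def train_iacgan(loader, epochs=1, g_steps_per_d=4, device="cpu"):
    gen, disc = ToyGenerator().to(device), ToyDiscriminator().to(device)
    opt_g = torch.optim.Adam(gen.parameters(), lr=2e-4, betas=(0.5, 0.999))
    opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4, betas=(0.5, 0.999))
    bce, ce = nn.BCEWithLogitsLoss(), nn.CrossEntropyLoss()

    for _ in range(epochs):
        for step, (real, y_real) in enumerate(loader):
            real, y_real = real.to(device), y_real.to(device)
            b = real.size(0)
            z = torch.randn(b, LATENT, device=device)
            y_fake = torch.randint(0, N_CLASSES, (b,), device=device)
            ones = torch.ones(b, 1, device=device)
            zeros = torch.zeros(b, 1, device=device)

            # Generator step (discriminator parameters are not updated here):
            # be judged real AND classified as y_fake, i.e. the "maximize L_C - L_S" goal.
            fake = gen(z, y_fake)
            adv, cls = disc(fake)
            loss_g = bce(adv, ones) + ce(cls, y_fake)
            opt_g.zero_grad()
            loss_g.backward()
            opt_g.step()

            # Discriminator step once every g_steps_per_d generator steps
            # (one reading of the claimed four-to-one update ratio).
            if step % g_steps_per_d == 0:
                adv_r, cls_r = disc(real)
                adv_f, cls_f = disc(fake.detach())
                loss_d = (bce(adv_r, ones) + bce(adv_f, zeros)
                          + ce(cls_r, y_real) + ce(cls_f, y_fake))
                opt_d.zero_grad()
                loss_d.backward()
                opt_d.step()

    return disc.state_dict()  # weights later transferred to AMFFNet
```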
5. The method for identifying a cross-task migration SAR target based on AMFFNet and IACGAN dual networks according to claim 1, wherein: step ③ includes the steps of:
(3a) Migrating partial weight parameters of the pre-trained IACGAN discriminator into AMFFNet, and then training the entire network using the enhanced training samples;
(3b) In each iteration, AMFFNet updates all network parameters through the back propagation algorithm according to a loss function; in this process the loss value is minimized and the network is optimized; the cross entropy loss function compares the predicted probability of each category with the corresponding true label, and is calculated as:
L = −(1/N) Σ_{i=1}^{N} Σ_{c} y_{i,c} log(p_{i,c})
wherein y_{i,c} represents the true label of the i-th sample for class c, p_{i,c} represents the predicted probability that the i-th sample belongs to class c, N represents the number of samples, and p represents the probability distribution predicted by the model over each possible class given the input.
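A minimal sketch of step ③ in PyTorch: partial weight transfer from the pre-trained discriminator into AMFFNet followed by cross-entropy fine-tuning. Matching parameters by identical name and shape is an assumed convention (the claim only says "partial weight parameters"), and the optimizer and learning rate are illustrative.

```python
import torch
import torch.nn as nn

def transfer_discriminator_weights(amffnet: nn.Module, d_state: dict) -> int:
    """Copy every discriminator tensor whose name and shape match a parameter of
    AMFFNet; everything else keeps its initialization. Returns the number of
    tensors transferred (name matching is an assumption, not the patent's rule)."""
    own = amffnet.state_dict()
    matched = {k: v for k, v in d_state.items() if k in own and own[k].shape == v.shape}
    own.update(matched)
    amffnet.load_state_dict(own)
    return len(matched)

def finetune(amffnet, loader, epochs=10, lr=1e-4, device="cpu"):
    """Fine-tune the whole network on the enhanced training samples with cross entropy."""
    amffnet.to(device).train()
    opt = torch.optim.Adam(amffnet.parameters(), lr=lr)
    ce = nn.CrossEntropyLoss()  # compares predicted class probabilities with the true labels
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            loss = ce(amffnet(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return amffnet
```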
6. The method for identifying a cross-task migration SAR target based on AMFFNet and IACGAN dual networks according to claim 1, wherein: the AMFFNet fine-tuned with the training sample set in step ③ comprises a multi-scale depth separable convolution module, specifically:
First, in the depthwise convolution part, the proposed multi-scale depth separable convolution MDSC module extracts multi-scale features by using convolution kernels of sizes 3×3, 5×5 and 7×7; secondly, in the pointwise convolution part, two 1×1 convolutions are used to combine the fused features, where the first 1×1 convolution doubles the number of channels and the second 1×1 convolution compresses the number of channels back to the input size to obtain the final feature; finally, the module is connected in a residual manner;
The MDSC module is expressed as:
Y = X + PW_2(PW_1(Concat(DW_3(X), DW_5(X), DW_7(X))))
wherein X represents the input feature; PW_1 and PW_2 are point convolutions with a convolution kernel size of 1, Concat is the feature concatenation operation, and DW_3, DW_5 and DW_7 are depth convolutions with convolution kernel sizes of 3, 5 and 7 respectively.
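A minimal PyTorch sketch of the MDSC module as reconstructed above. The interpretation of "doubles the number of channels" as twice the input channel count (rather than twice the concatenated count) is an assumption, and padding is chosen so that all branches keep the spatial size.

```python
import torch
import torch.nn as nn

class MDSC(nn.Module):
    """Multi-scale depth separable convolution: parallel 3x3 / 5x5 / 7x7 depthwise
    convolutions, two pointwise (1x1) convolutions, and a residual connection."""

    def __init__(self, channels: int):
        super().__init__()
        self.dw3 = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.dw5 = nn.Conv2d(channels, channels, 5, padding=2, groups=channels)
        self.dw7 = nn.Conv2d(channels, channels, 7, padding=3, groups=channels)
        self.pw1 = nn.Conv2d(3 * channels, 2 * channels, 1)  # "doubles the number of channels" (assumed: 2x input)
        self.pw2 = nn.Conv2d(2 * channels, channels, 1)      # compresses back to the input size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        multi = torch.cat([self.dw3(x), self.dw5(x), self.dw7(x)], dim=1)
        return x + self.pw2(self.pw1(multi))

if __name__ == "__main__":
    out = MDSC(32)(torch.randn(1, 32, 64, 64))
    assert out.shape == (1, 32, 64, 64)  # quick shape check
```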
7. The method for identifying a cross-task migration SAR target based on AMFFNet and IACGAN dual networks according to claim 1, wherein: the AMFFNet fine-tuned with the training sample set in step ③ comprises a multi-branch fusion module, specifically:
Before each multi-branch fusion MBF module, the input is first downsampled, with the downsampling rate controlled by the stride; after downsampling, the input passes through two visual attention multi-scale enhancement VAME modules; inside each VAME module, the input first passes through a visual attention VA module, then through batch normalization and the GELU activation function, and then into an MDSC module; the output passes through droppath and establishes a residual connection with the VAME input to give the VAME output; a branch is placed at the output of the first VAME module: after processing by a visual attention module and droppath, it establishes a residual connection with the output of the first VAME module; the output features of the two branches are then concatenated to obtain richer fusion features, and finally the fusion features are compressed by a 1×1 convolution to obtain the final output of the MBF module;
The MBF module is expressed as:
VAME(X) = X + DropPath(MDSC(GELU(BN(VA(X)))))
X_1 = VAME(X_down),  X_2 = VAME(X_1)
X_branch = X_1 + DropPath(VA(X_1))
Y = Conv1x1(Concat(X_2, X_branch))
wherein DropPath refers to setting the whole path to 0 with probability p during training, X_down is the downsampled input, MDSC denotes the multi-scale depth separable convolution module, VA is the visual attention module, VAME is the visual attention multi-scale enhancement module, GELU is the activation function, BN is batch normalization, and Conv1x1 denotes a convolution with a convolution kernel size of 1.
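A minimal PyTorch sketch of the VAME and MBF blocks as described in claim 7. It reuses the MDSC class from the sketch under claim 6; the DropPath implementation, the sigmoid-gated 1×1 convolution standing in for the (unspecified here) visual attention VA module, and the strided 3×3 convolution used for downsampling are assumptions made for illustration.

```python
import torch
import torch.nn as nn
# Assumes the MDSC class from the previous sketch is in scope.

class DropPath(nn.Module):
    """Stochastic depth: zero the whole residual path with probability p during training."""
    def __init__(self, p: float = 0.1):
        super().__init__()
        self.p = p

    def forward(self, x):
        if not self.training or self.p == 0.0:
            return x
        keep = torch.rand(x.size(0), 1, 1, 1, device=x.device) >= self.p
        return x * keep / (1.0 - self.p)

class VA(nn.Module):
    """Stand-in visual attention: a sigmoid-gated 1x1 convolution used as simple attention."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        return x * torch.sigmoid(self.gate(x))

class VAME(nn.Module):
    """VA -> BN -> GELU -> MDSC -> DropPath, with a residual connection to the block input."""
    def __init__(self, channels, drop=0.1):
        super().__init__()
        self.body = nn.Sequential(VA(channels), nn.BatchNorm2d(channels), nn.GELU(), MDSC(channels))
        self.drop = DropPath(drop)

    def forward(self, x):
        return x + self.drop(self.body(x))

class MBF(nn.Module):
    """Downsample, two VAME blocks, a VA side branch off the first VAME, concat, 1x1 compression."""
    def __init__(self, in_ch, out_ch, stride=2, drop=0.1):
        super().__init__()
        self.down = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1)
        self.vame1, self.vame2 = VAME(out_ch, drop), VAME(out_ch, drop)
        self.branch_va, self.branch_drop = VA(out_ch), DropPath(drop)
        self.compress = nn.Conv2d(2 * out_ch, out_ch, 1)

    def forward(self, x):
        x = self.down(x)
        x1 = self.vame1(x)
        x2 = self.vame2(x1)
        branch = x1 + self.branch_drop(self.branch_va(x1))
        return self.compress(torch.cat([x2, branch], dim=1))
```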
8. The method for identifying a cross-task migration SAR target based on AMFFNet and IACGAN dual networks according to claim 1, wherein: the specific steps of performing SAR target recognition with the fine-tuned AMFFNet in step ④ are:
Firstly, the data in the MSTAR data set acquired at a 15° pitch angle, 2425 SAR images in total, are adopted as the test set to test the SAR target recognition performance of the fine-tuned AMFFNet; secondly, the test set is input into the fine-tuned AMFFNet network, and the output is the recognition result of each test image; finally, all obtained test results are summarized to obtain the overall recognition rate of the method in the test scenario.
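A minimal PyTorch sketch of the test procedure of claim 8: run the fine-tuned network over the 15° MSTAR test set and report the overall recognition rate. The loader name in the usage comment is hypothetical.

```python
import torch

@torch.no_grad()
def evaluate(amffnet, test_loader, device="cpu"):
    """Feed the 15-degree MSTAR test chips through the fine-tuned network and
    report the overall recognition rate (accuracy over all test images)."""
    amffnet.to(device).eval()
    correct = total = 0
    for x, y in test_loader:
        pred = amffnet(x.to(device)).argmax(dim=1)
        correct += (pred == y.to(device)).sum().item()
        total += y.numel()
    return correct / total

# Usage (hypothetical loader built from the 2425-image, 15-degree MSTAR test set):
# acc = evaluate(finetuned_amffnet, mstar_15deg_test_loader)
# print(f"Overall recognition rate: {acc:.4f}")
```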
CN202410516581.8A 2024-04-28 2024-04-28 AMFFNet and IACGAN dual-network-based cross-task migration SAR target recognition method Pending CN118097439A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410516581.8A CN118097439A (en) 2024-04-28 2024-04-28 AMFFNet and IACGAN dual-network-based cross-task migration SAR target recognition method

Publications (1)

Publication Number Publication Date
CN118097439A true CN118097439A (en) 2024-05-28

Family

ID=91155267

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410516581.8A Pending CN118097439A (en) 2024-04-28 2024-04-28 AMFFNet and IACGAN dual-network-based cross-task migration SAR target recognition method

Country Status (1)

Country Link
CN (1) CN118097439A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210049338A1 (en) * 2019-08-16 2021-02-18 The Research Foundation For The State University Of New York System, method, and computer-accessible medium for processing brain images and extracting neuronal structures
CN112529806A (en) * 2020-12-15 2021-03-19 哈尔滨工程大学 SAR image data enhancement method based on generation of countermeasure network information maximization
US20220383071A1 (en) * 2021-05-19 2022-12-01 Hon Hai Precision Industry Co., Ltd. Method, apparatus, and non-transitory computer readable medium for optimizing generative adversarial network
CN115616570A (en) * 2022-09-29 2023-01-17 河北工业大学 SAR target recognition method based on semi-supervised generation countermeasure network
CN117056792A (en) * 2023-07-10 2023-11-14 沈阳化工大学 Small sample industrial process fault diagnosis method based on improved ACGAN model
CN117726537A (en) * 2023-12-14 2024-03-19 数据空间研究院 SAR image denoising network method and system for self-adaptive multi-scale feature fusion AMFFD-Net

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WEI LIU et al.: "AMFF-net: adaptive multi-modal feature fusion network for image classification", Multimedia Tools and Applications, 21 July 2023 (2023-07-21), pages 17069-17091 *
CANG Mingjie et al.: "SAR target recognition method based on ICNN and IGAN" (in Chinese), Radar Science and Technology, 30 June 2020 (2020-06-30), pages 287-294 *

Similar Documents

Publication Publication Date Title
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
Qin et al. DeepFish: Accurate underwater live fish recognition with a deep architecture
Byeon et al. Scene labeling with lstm recurrent neural networks
Donahue et al. Decaf: A deep convolutional activation feature for generic visual recognition
Bui et al. Using grayscale images for object recognition with convolutional-recursive neural network
CN110109060A (en) A kind of radar emitter signal method for separating and system based on deep learning network
Chen et al. Research on recognition of fly species based on improved RetinaNet and CBAM
Wang et al. Deep learning-based visual detection of marine organisms: A survey
CN112580480B (en) Hyperspectral remote sensing image classification method and device
Nguyen et al. Satellite image classification using convolutional learning
CN114255474A (en) Pedestrian re-identification method based on multi-scale and multi-granularity
Nanni et al. General purpose (GenP) bioimage ensemble of handcrafted and learned features with data augmentation
CN110188827A (en) A kind of scene recognition method based on convolutional neural networks and recurrence autocoder model
Ding et al. Noise-resistant network: a deep-learning method for face recognition under noise
CN112733942A (en) Variable-scale target detection method based on multi-stage feature adaptive fusion
Kekeç et al. Contextually constrained deep networks for scene labeling
CN112149526A (en) Lane line detection method and system based on long-distance information fusion
Tsai et al. A single‐stage face detection and face recognition deep neural network based on feature pyramid and triplet loss
CN110728238A (en) Personnel re-detection method of fusion type neural network
CN114155165A (en) Image defogging method based on semi-supervision
Wang et al. Face recognition using AMVP and WSRC under variable illumination and pose
Singh et al. Wavelet based histogram of oriented gradients feature descriptors for classification of partially occluded objects
CN112365451A (en) Method, device and equipment for determining image quality grade and computer readable medium
CN115100509B (en) Image identification method and system based on multi-branch block-level attention enhancement network
CN110969128A (en) Method for detecting infrared ship under sea surface background based on multi-feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination