CN112926457B - SAR image recognition method based on fusion frequency domain and space domain network model - Google Patents


Info

Publication number
CN112926457B
CN112926457B
Authority
CN
China
Prior art keywords
image
frequency domain
domain
frequency
space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110220080.1A
Other languages
Chinese (zh)
Other versions
CN112926457A (en)
Inventor
李雪松
李晓冬
杜记川
罗子娟
吴蔚
杨东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 28 Research Institute
Original Assignee
CETC 28 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 28 Research Institute filed Critical CETC 28 Research Institute
Priority to CN202110220080.1A
Publication of CN112926457A
Application granted
Publication of CN112926457B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G06V 20/13 Satellite images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/48 Extraction of image or video features by mapping characteristic values of the pattern into a parameter space, e.g. Hough transformation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention belongs to the technical field of image recognition and discloses a SAR image recognition method based on a fused frequency-domain and spatial-domain network model, comprising the following steps: converting the original spatial domain image into a frequency-domain image; performing channel selection on the frequency-domain image to obtain an effective frequency-domain signal; inputting the effective frequency-domain signal into a frequency-domain backbone network to extract frequency-domain features; inputting the original spatial domain image into a spatial-domain backbone network to extract spatial-domain features; fusing the spatial-domain and frequency-domain features through the network model; and inputting the fused features into a classifier to recognize and classify targets in the SAR image. The invention designs an end-to-end network model fusing the frequency domain and the spatial domain: it not only considers the pixel characteristics of the SAR image's spatial domain, but also extracts frequency-domain features tailored to the imaging characteristics that distinguish SAR from visible light; fusing the spatial-domain and frequency-domain features further improves the effectiveness and robustness of the SAR image recognition model.

Description

SAR image recognition method based on fusion frequency domain and space domain network model
Technical Field
The invention belongs to the technical field of computer image recognition, and particularly relates to an SAR image recognition method based on a fusion frequency domain and space domain network model.
Background
Synthetic Aperture Radar (SAR) image recognition distinguishes specific targets using their characteristic information, enabling interpretation and analysis of SAR images.
SAR image target recognition is widely applied in the military field, resource exploration, environmental monitoring and other areas. Compared with visible-light and infrared images, SAR images offer all-weather operation, strong penetrability and richer image information. However, SAR images are formed from microwave reflections of the target and typically contain a large amount of noise interference and geometric deformation. Target recognition in SAR images is therefore very challenging.
SAR image recognition has attracted extensive research at home and abroad. The traditional SAR image recognition framework comprises: (1) an image preprocessing module: speckle noise usually exists in SAR images, and this noise interference degrades recognition performance; the preprocessing module suppresses the noise; (2) a feature extraction module: the extraction and selection of features plays a critical role in recognition performance, and SAR target features mainly comprise geometric, scattering and transform features; (3) a classification and recognition module: the extracted features are mapped to a feature space through a classifier to realize target classification and recognition. With the deepening study and wide application of deep learning in computer vision, many deep neural network models have been migrated to SAR image recognition and achieve better results than traditional methods. This is mainly because traditional methods rely on manually designed feature extractors, require expert knowledge and complex parameter tuning, and each method targets only a specific application and fixed scene, so the generalization and robustness of such models are poor. Deep learning performs feature extraction in a data-driven manner by constructing a deep neural network; from a large number of samples it learns deep feature representations closely related to the task, expresses the dataset more efficiently and accurately, and the extracted abstract features generalize better, with stronger model robustness in an end-to-end manner.
Although some deep learning methods designed for visible-light image recognition also achieve good performance in SAR image recognition, there are differences between SAR and visible-light images. On one hand, the complex-valued data of each SAR pixel can be transformed in the frequency domain to extract the corresponding amplitude and phase information: the amplitude information, strongly correlated with the gray-scale information of a visible-light image, is the backscattering intensity of the ground target to the radar wave; the phase information encodes the round-trip propagation distance between the sensor and the ground target. On the other hand, visible-light recognition models generally only model the spatial-domain pixels and the higher-order relations between them, without considering characteristics of the SAR image and of the target to be recognized, such as the non-uniformity of strong background scattering clutter. Considering only spatial-domain feature extraction and model construction is therefore not suitable for SAR image recognition.
In the process of implementing the invention, the inventors found at least the following problems in the prior art. Due to the imaging characteristics of SAR, not only can a spatial-domain amplitude image be obtained, but backscattering characteristics in the frequency domain are also present. Most existing deep learning methods only build network models of the spatial-domain characteristics of a SAR target and do not mine the frequency-domain characteristics of the SAR image, causing a loss of key information. Some deep learning methods do mine the frequency-domain characteristics of the SAR image, but do not verify the validity of the frequency-domain signal. On one hand, because background information exists in the spatial domain image, the frequency-domain signals converted from the background and from the foreground are distributed across different frequency-domain channels; that is, the signals of some frequency-domain channels exist as noise, which is detrimental to SAR image recognition. On the other hand, speckle noise in the SAR image is also distributed across different frequency-domain channels and can interfere with recognition performance. Therefore, how to separate out the effective frequency-domain signal is crucial.
Disclosure of Invention
Purpose of the invention: the invention aims to solve the above technical problems of the prior art and provides a SAR image recognition method based on a fused frequency-domain and spatial-domain network model, in which the spatial-domain pixel information and the frequency-domain characteristics of the SAR image are extracted in an end-to-end manner and fused to obtain the essential characteristics of the target, further improving the accuracy and robustness of SAR image recognition and enhancing the interpretability of the model.
In order to solve the technical problem, the invention discloses an SAR image recognition method based on a fusion frequency domain and space domain network model, which comprises the following steps:
step 1, acquiring an SAR image to be identified, and performing image data enhancement on the acquired SAR image, wherein the enhanced image is used as a spatial domain image;
Step 2, resizing the spatial domain image of step 1 to N×N and dividing the resized image into n×n blocks, obtaining (N/n)×(N/n) blocks; transforming the spatial-domain image signal of each block into a structural form expressed by frequency components through a frequency-domain conversion method, each block having n×n different frequency components; taking the frequency components at the same position across the different blocks as one channel of the constructed frequency-domain image, all n² channels forming a new frequency-domain image F_1 through dimension transformation;
Step 3, inputting the spatial domain image of step 1 into a spatial-domain backbone network to extract a spatial-domain feature vector F_space;
Step 4, different frequency channels influence model performance differently: some play a key role in recognition, while others do not help and only increase training and inference time. Therefore, channel selection is performed on the frequency-domain image F_1 obtained in step 2 to obtain the effective frequency-domain signal F_2;
Step 5, inputting the effective frequency-domain signal F_2 obtained in step 4 into a frequency-domain backbone network to extract a frequency-domain feature vector F_frequency;
Step 6, performing feature fusion on the spatial-domain feature vector F_space obtained in step 3 and the frequency-domain feature vector F_frequency obtained in step 5 to obtain the fused target feature vector F_fusion;
Step 7, inputting the fused feature vector F_fusion obtained in step 6 into a subsequent network for feature dimension reduction and class-probability prediction, realizing recognition and classification of the target.
In one implementation, the image data enhancement of the SAR image in step 1 includes: in the training stage, data enhancement comprises image standardization, image scale transformation, geometric transformation (translation, flipping and the like) and random cropping; in the testing stage, only image scale transformation and image standardization are applied, and the enhanced image is used as the spatial domain image.
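As a sketch of the enhancement chain just described (not the patent's exact pipeline; the crop size, flip probability, and nearest-neighbour rescaling below are illustrative assumptions), the training- and testing-stage modes might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def standardize(img):
    # image standardization: zero mean, unit variance
    return (img - img.mean()) / (img.std() + 1e-8)

def rescale(img, size):
    # image scale transformation (nearest-neighbour; bilinear would work equally well)
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[np.ix_(rows, cols)]

def random_flip(img):
    # geometric transformation: horizontal flip with probability 0.5
    return img[:, ::-1] if rng.random() < 0.5 else img

def random_crop(img, size):
    # random cropping to the network input size
    h, w = img.shape
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

sar = rng.random((500, 480))  # a stand-in SAR amplitude image
# training stage: standardization + scale transform + geometric transform + random crop
train_img = random_crop(random_flip(standardize(rescale(sar, 464))), 448)
# testing stage: only scale transform + standardization
test_img = standardize(rescale(sar, 448))
```

The testing stage deliberately omits the random operations so that inference is deterministic.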
in one implementation, the step 2 includes:
Step 2.1, resizing the spatial domain image to N×N, i.e. both the length and width of the resized image are N, and dividing it into n×n blocks; resizing to N×N ensures that the frequency-domain signals obtained from all images in the dataset are consistent, and dividing into n×n blocks yields (N/n)×(N/n) blocks used to separate frequency-domain signals of different frequencies;
Step 2.2, converting the spatial-domain image signal of each block into a structural form expressed by frequency components using a two-dimensional discrete cosine transform (DCT), computed as:
Y = C_n · X · (C_n)^T
where X is the n×n spatial-domain image signal of a block, Y is the output frequency-domain signal of that block, and C_n is the n×n transform coefficient matrix:
C_n(j, k) = √(α_j / n) · cos((2k + 1)jπ / (2n))
where j, k ∈ {0, 1, 2, …, n−1} denote the positions of a pixel within the block along the vertical and horizontal axes; α_j = 1 when j = 0, and α_j = 2 when j > 0;
Converting each block's spatial-domain image signal into frequency components via the two-dimensional DCT gives better frequency-domain energy compaction, so unimportant frequency-domain regions can be filtered out;
Step 2.3, for the frequency-domain signals Y obtained from the different blocks, extracting and concatenating the frequency components at the same position to serve as one channel of the constructed frequency-domain image; since the frequency-domain signal Y has n² positions in total, the constructed frequency-domain image is a two-dimensional feature vector of size n² × (N/n)², where the same position refers to the i-th position in the frequency-domain signal Y of each block, i ∈ {1, 2, …, n²}; the two-dimensional frequency-domain image is expanded through a dimension transformation into a new three-dimensional frequency-domain image F_1 of size n² × (N/n) × (N/n), where n² is the number of channels. This restores the spatial layout of the frequency-domain signal, benefiting subsequent frequency-domain feature extraction and the fusion of frequency-domain and spatial-domain features.
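Steps 2.1–2.3 can be sketched in numpy as follows (a sketch of the described procedure, not the patent's own code; the function names are illustrative, and the block transform uses the coefficient matrix defined above):

```python
import numpy as np

def dct_matrix(n):
    # C[j, k] = sqrt(alpha_j / n) * cos((2k + 1) j pi / (2n)); alpha_0 = 1, alpha_j = 2 otherwise
    j = np.arange(n)[:, None]
    k = np.arange(n)[None, :]
    alpha = np.where(j == 0, 1.0, 2.0)
    return np.sqrt(alpha / n) * np.cos((2 * k + 1) * j * np.pi / (2 * n))

def to_frequency_image(img, n=8):
    # img: (N, N) spatial-domain image with N divisible by n
    N = img.shape[0]
    C = dct_matrix(n)
    # split into (N/n) x (N/n) blocks of size n x n
    blocks = img.reshape(N // n, n, N // n, n).transpose(0, 2, 1, 3)
    # per-block 2-D DCT: Y = C · X · C^T (broadcast over all blocks)
    Y = C @ blocks @ C.T
    # gather each of the n*n frequency positions across all blocks into one channel
    return Y.transpose(2, 3, 0, 1).reshape(n * n, N // n, N // n)

img = np.random.default_rng(0).random((448, 448))
F1 = to_frequency_image(img)
print(F1.shape)  # (64, 56, 56)
```

Channel 0 of F_1 is the DC component of every block, i.e. (up to the scale factor n) the mean of each 8×8 block, which is why the channels retain the blocks' spatial layout.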
In one implementation, in step 3, the spatial domain image is input into a ResNet50-based spatial-domain backbone network to extract the spatial-domain feature vector F_space. This step models the spatial-domain characteristics of the SAR image by mining pixel values and the relations between pixels.
In one implementation, step 4 takes the frequency-domain image as input and obtains the effective frequency-domain signal with an attention-based channel selection method, comprising the following steps:
Step 4.1, inputting the frequency-domain image F_1 into the attention network to obtain the attention feature vector Mask:
Mask = Sigmoid(BN(Conv(ReLU(Conv(F_1)))))
where Conv denotes a 1×1 convolution, BN denotes batch normalization, and Sigmoid and ReLU denote activation functions;
Step 4.2, fusing the attention feature vector Mask with the frequency-domain image F_1 and selecting effective frequencies with a convolutional network model to obtain the effective frequency-domain signal F_2:
F_2 = Conv(Mask ⊙ F_1)
where ⊙ denotes element-wise multiplication.
Different frequency channels influence model performance differently: some play a key role in recognition, while others do not help and only increase training and inference time. Step 4.1 models the importance of the different frequency-channel features with an attention mechanism, capturing channel-level dependencies; step 4.2 further models channel-feature importance with the convolutional neural network and filters out low-importance frequency channels, yielding the effective frequency-domain features.
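A minimal numpy sketch of steps 4.1–4.2 (the weights are random, batch normalization is omitted, and the 16 hidden channels are an illustrative assumption, so this only shows the shape of the computation, not a trained selector):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1x1(x, w):
    # a 1x1 convolution is a per-pixel linear map over channels; x: (C_in, H, W), w: (C_out, C_in)
    return np.einsum('oc,chw->ohw', w, x)

def select_channels(F1, w1, w2, w3):
    # step 4.1: Mask = Sigmoid(Conv(ReLU(Conv(F1)))) -- batch normalization omitted in this sketch
    mask = sigmoid(conv1x1(np.maximum(conv1x1(F1, w1), 0.0), w2))
    # step 4.2: F2 = Conv(Mask * F1) -- gate each frequency channel, then mix with a 1x1 conv
    return conv1x1(mask * F1, w3)

F1 = rng.standard_normal((64, 56, 56))
w1 = rng.standard_normal((16, 64)) * 0.1   # illustrative weights; 16 hidden channels assumed
w2 = rng.standard_normal((64, 16)) * 0.1
w3 = rng.standard_normal((64, 64)) * 0.1
F2 = select_channels(F1, w1, w2, w3)
print(F2.shape)  # (64, 56, 56)
```

In training, the mask weights are learned end-to-end, so channels whose mask values stay near zero are effectively suppressed as noise.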
In one implementation, in step 5, the frequency-domain backbone network is a modified ResNet50 in which the first convolutional layer and the pooling layer of the residual network are removed, so that the network model accepts the frequency-domain signal as input. This step realizes deep learning on frequency-domain signals by adjusting the ResNet50 network, automatically mining the frequency distribution and scattering characteristics to extract the frequency-domain features of the SAR image. The spatial-domain and frequency-domain backbone networks keep the same model structure, which facilitates the subsequent fusion of spatial-domain and frequency-domain features in the same dimensional space.
In one implementation, in step 6, the spatial-domain feature vector F_space and the frequency-domain feature vector F_frequency are fused by concatenation (Concat) along the channel dimension to obtain the feature vector F_fusion:
F_fusion = Concat(F_space, F_frequency, dim=1)
the complementation of the frequency domain characteristic and the spatial domain characteristic is realized through the characteristic fusion of the channel dimensions, and the discriminability and the robustness of the characteristics are enhanced.
In one implementation, in step 7, the subsequent network is a classifier composed of a fully connected layer and a Softmax activation function. The fused feature vector F_fusion is input into the fully connected layer for feature dimension reduction, yielding the feature vector of the recognition target, whose dimension equals the number of target categories; this feature vector is input into the Softmax activation function to predict the probability of each category, and the category with the highest probability is taken as the predicted category, realizing target-category prediction. In the training stage of the subsequent network, the predicted category and the labeled category information are input, and the subsequent network model is trained in a supervised manner with a cross-entropy loss function.
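The classifier of step 7 can be sketched as follows (the weights are random and the 10-class count is an illustrative assumption; in practice both are determined by training and the dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes = 10  # illustrative number of target categories

def softmax(z):
    z = z - z.max()   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

W = rng.standard_normal((num_classes, 4096)) * 0.01  # fully connected layer: 4096 -> num_classes
b = np.zeros(num_classes)

F_fusion = rng.standard_normal(4096)     # fused feature vector from step 6
probs = softmax(W @ F_fusion + b)        # class-probability prediction
pred = int(np.argmax(probs))             # category with the highest probability

label = 3                                # ground-truth category (illustrative)
loss = -np.log(probs[label] + 1e-12)     # cross-entropy loss for supervised training
```

During training the cross-entropy loss is backpropagated through the whole model, which is what makes the pipeline end-to-end.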
Beneficial effects: the invention discloses a SAR image recognition method based on a fused frequency-domain and spatial-domain network model. An end-to-end network model is designed to fuse spatial-domain pixel information with frequency-domain scattering characteristics, making features from different domains complementary and mining deeper key feature information. Meanwhile, the attention-based frequency-domain channel selection method selects the effective frequency-domain signal, reducing the interference of noisy frequency-domain signals and further improving the effect and performance of SAR image recognition.
Drawings
The foregoing and other advantages of the invention will become more apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
FIG. 1 is a flow chart of the implementation steps of the present invention;
FIG. 2 is a network diagram of an embodiment of the present invention;
FIG. 3 is a schematic diagram of a channel selection method based on attention mechanism according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a spatial domain backbone network and a frequency domain backbone network according to an embodiment of the present invention.
Detailed Description
The invention is further elucidated with reference to the drawings and the embodiments.
FIG. 1 is a flow chart of the implementation steps of the present invention, which includes the following steps:
step 1, acquiring an SAR image to be identified, and performing image data enhancement on the acquired SAR image, wherein the enhanced image is used as a spatial domain image;
Step 2, resizing the spatial domain image of step 1 to 448×448 and dividing it into 8×8 blocks, obtaining 56×56 blocks; transforming the spatial-domain image signal of each block into a structural form expressed by frequency components through a frequency-domain conversion method, each block having 64 different frequency components; taking the frequency components at the same position across the different blocks as one channel of the constructed frequency-domain image, all 64 channels forming a new frequency-domain image F_1 through dimension transformation;
Step 3, inputting the spatial domain image of step 1 into a spatial-domain backbone network to extract a spatial-domain feature vector F_space;
Step 4, performing channel selection on the frequency-domain image F_1 obtained in step 2 to obtain the effective frequency-domain signal F_2;
Step 5, inputting the effective frequency-domain signal F_2 obtained in step 4 into a frequency-domain backbone network to extract a frequency-domain feature vector F_frequency;
Step 6, performing feature fusion on the spatial-domain feature vector F_space obtained in step 3 and the frequency-domain feature vector F_frequency obtained in step 5 to obtain the fused target feature vector F_fusion;
Step 7, inputting the fused feature vector F_fusion obtained in step 6 into a subsequent network for feature dimension reduction and class-probability prediction, realizing recognition and classification of the target.
In this embodiment, the image data enhancement of the SAR image in step 1 includes: in the training stage, image standardization, image scale transformation and random cropping are adopted; in the testing stage, only image scale transformation and image standardization are applied, and the enhanced image is used as the spatial domain image.
fig. 2 is a schematic network diagram according to an embodiment of the present invention, where step 2 includes:
Step 2.1, resizing the spatial domain image to 448×448, i.e. both the length and width of the resized image are 448 pixels, and dividing it into 8×8 blocks, obtaining 56×56 blocks;
Step 2.2, converting the spatial-domain image signal of each block into a structural form expressed by frequency components using a two-dimensional discrete cosine transform, computed as:
Y = C_n · X · (C_n)^T
where X is the 8×8 spatial-domain image signal of a block, Y is the output frequency-domain signal of that block, and C_n is the 8×8 transform coefficient matrix:
C_n(j, k) = √(α_j / n) · cos((2k + 1)jπ / (2n))
where n = 8 and j, k ∈ {0, 1, 2, …, 7} denote the positions of a pixel within the block along the vertical and horizontal axes; α_j = 1 when j = 0, and α_j = 2 when j > 0;
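The coefficient matrix described above is the orthonormal DCT-II basis; a short numpy check (with n = 8 per block, as in this embodiment) confirms that C·Cᵀ = I, so each block can be recovered exactly from its frequency-domain signal:

```python
import numpy as np

def dct_matrix(n):
    # C[j, k] = sqrt(alpha_j / n) * cos((2k + 1) j pi / (2n)); alpha_0 = 1, alpha_j = 2 otherwise
    j = np.arange(n)[:, None]
    k = np.arange(n)[None, :]
    alpha = np.where(j == 0, 1.0, 2.0)
    return np.sqrt(alpha / n) * np.cos((2 * k + 1) * j * np.pi / (2 * n))

C = dct_matrix(8)
X = np.random.rand(8, 8)   # one 8x8 spatial-domain block
Y = C @ X @ C.T            # forward transform: Y = C_n · X · (C_n)^T
X_back = C.T @ Y @ C       # orthonormality (C C^T = I) makes the transform invertible
```

Invertibility matters here: the frequency-domain image F_1 carries exactly the same information as the spatial-domain image, only reorganized by frequency.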
Step 2.3, extracting frequency components at corresponding positions for the frequency domain signals Y obtained from the blocks with different sizes to connect, and using the frequency components as a channel of the constructed frequency domain image; since there are 64 positions in the frequency domain signal Y in total, the constructed frequency domain image is a two-dimensional feature vector with a size of 64 × 3136, and the corresponding position refers to the ith position in the frequency domain signal Y obtained by each size block, i belongs to {1,2, …, 64 }; the two-dimensional frequency domain image is expanded through dimensionality to form a new three-dimensional frequency domain image F 1 The image size is 64 × 56, where 64 is the number of channels in the frequency domain image.
Fig. 4 is a schematic structural diagram of the spatial-domain backbone network and the frequency-domain backbone network according to an embodiment of the present invention. In step 3, the spatial domain image is input into a ResNet50-based spatial-domain backbone network to extract the spatial-domain feature vector F_space; the dimension of this feature vector is 2048.
Fig. 3 is a schematic diagram of a channel selection method based on the attention mechanism according to an embodiment of the present invention, where in step 4, a frequency domain image is input, and an effective frequency domain signal is obtained by using the channel selection method based on the attention mechanism. The method comprises the following steps:
Step 4.1, inputting the frequency-domain image F_1 into the attention network to obtain the attention feature vector Mask:
Mask = Sigmoid(BN(Conv(ReLU(Conv(F_1)))))
where Conv denotes a 1×1 convolution, BN denotes batch normalization, and Sigmoid and ReLU denote activation functions;
Step 4.2, fusing the attention feature vector Mask with the frequency-domain image F_1 and selecting effective frequencies with the network model to obtain the effective frequency-domain signal F_2:
F_2 = Conv(Mask ⊙ F_1)
where ⊙ denotes element-wise multiplication.
Fig. 4 is a schematic structural diagram of the spatial-domain backbone network and the frequency-domain backbone network according to an embodiment of the present invention. In step 5, the frequency-domain backbone network is a modified ResNet50 in which the first convolutional layer and the pooling layer of the residual network are removed, so that the network model accepts the frequency-domain signal as input. The frequency-domain image is input into this modified ResNet50 frequency-domain backbone network to extract the frequency-domain feature vector F_frequency; the dimension of this feature vector is 2048.
In this embodiment, in step 6, the spatial-domain feature vector F_space and the frequency-domain feature vector F_frequency are fused by concatenation (Concat) along the channel dimension to obtain the feature vector F_fusion; the dimension of the fused feature vector is 4096:
F_fusion = Concat(F_space, F_frequency, dim=1)
In this embodiment, in step 7, the subsequent network is a classifier composed of a fully connected layer and a Softmax activation function. The fused feature vector F_fusion is input into the fully connected layer for feature dimension reduction, yielding the feature vector of the recognition target; the input dimension of the fully connected layer is 4096, and the output dimension equals the number of target categories. The feature vector of the recognition target is input into the Softmax activation function to predict the probability of each category, and the category with the highest probability is taken as the predicted category, realizing target-category prediction. In the training stage of the subsequent network, the predicted category and the labeled category information are input, and the subsequent network model is trained in a supervised manner with a cross-entropy loss function.
The present invention provides an SAR image recognition method based on a fused frequency domain and spatial domain network model. There are many specific methods and approaches for implementing this technical solution, and the above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art may make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications should also be regarded as falling within the protection scope of the present invention. All components not specified in this embodiment can be implemented by the prior art.

Claims (8)

1. A SAR image recognition method based on a fusion frequency domain and space domain network model is characterized by comprising the following steps:
step 1, acquiring an SAR image to be identified, and performing image data enhancement on the acquired SAR image, wherein the enhanced image is used as a spatial domain image;
step 2, converting the size of the spatial domain image in step 1 into N*N, and dividing the converted spatial domain image into blocks of size n*n, obtaining (N/n)*(N/n) blocks; transforming the spatial-domain image signal of each block into a structural form expressed in frequency components by a frequency domain conversion method, each block yielding n*n different frequency components; taking the frequency components at corresponding positions across the blocks as one channel of the constructed frequency domain image, all n*n channels forming, through dimension transformation, a new frequency domain image F_1;
step 3, inputting the spatial domain image in step 1 into a spatial domain backbone network to extract a space domain feature vector F_space;
step 4, performing channel selection on the frequency domain image F_1 obtained in step 2 to obtain an effective frequency domain signal F_2;
step 5, inputting the effective frequency domain signal F_2 obtained in step 4 into a frequency domain backbone network to extract a frequency domain feature vector F_frequency;
step 6, performing feature fusion on the space domain feature vector F_space obtained in step 3 and the frequency domain feature vector F_frequency obtained in step 5 to obtain a fused target feature vector F_fusion;
step 7, inputting the fused feature vector F_fusion obtained in step 6 into a subsequent network for feature dimension reduction and class probability prediction, thereby realizing target recognition and classification.
2. The method for identifying the SAR image based on the fusion frequency domain and space domain network model as claimed in claim 1, wherein the image data enhancement of the SAR image in step 1 comprises: in the training stage, the data enhancement modes include image standardization, image scale transformation, geometric transformation, and random cropping; in the testing stage, image scale transformation and image standardization are adopted, and the enhanced image is used as the spatial domain image.
3. The method for SAR image recognition based on the fusion frequency domain and space domain network model according to claim 1, wherein step 2 comprises:
step 2.1, converting the size of the spatial domain image into N*N, namely the length and the width of the converted spatial domain image are both N, and dividing the spatial domain image into (N/n)*(N/n) blocks of size n*n;
step 2.2, converting the spatial-domain image signal of each block into a structural form expressed in frequency components by two-dimensional discrete cosine transform, whose calculation formula is as follows:
Y = C_N · X · (C_N)^T
wherein X is the spatial-domain image signal of each block, Y is the output frequency domain signal of each block, and C_N is the transform coefficient matrix, expressed as follows:
C_N(j, k) = sqrt(α_j / N) · cos((2k + 1)jπ / (2N))
wherein j, k ∈ {0, 1, 2, …, N-1}, and j and k respectively denote the horizontal-axis and vertical-axis positions of a pixel in the spatial domain image signal; when j = 0, α_j = 1; when j > 0, α_j = 2;
step 2.3, extracting the frequency components at corresponding positions from the frequency domain signals Y obtained from the different blocks and connecting them as one channel of the constructed frequency domain image; since the frequency domain signal Y has n² positions in total, the constructed frequency domain image is a two-dimensional feature vector of size n² * (N/n)², where the corresponding position refers to the i-th position in the frequency domain signal Y obtained from each block, i ∈ {1, 2, …, n²}; the frequency domain image is expanded through dimension transformation into a new three-dimensional frequency domain image F_1 of size n² * (N/n) * (N/n), where n² is the number of channels of the frequency domain image.
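As a numerical check of step 2.2, the coefficient matrix can be built directly; with the α_j convention of the claim the matrix is orthonormal, so the per-block transform is exactly invertible. A NumPy sketch (the 8×8 block size is an illustrative choice, not a value from the patent):

```python
import numpy as np

def dct_matrix(N):
    """Transform coefficient matrix C_N of the claim:
    C_N[j, k] = sqrt(alpha_j / N) * cos((2k + 1) j pi / (2N)),
    with alpha_j = 1 for j = 0 and alpha_j = 2 for j > 0 (DCT-II)."""
    j = np.arange(N)[:, None]
    k = np.arange(N)[None, :]
    alpha = np.where(j == 0, 1.0, 2.0)
    return np.sqrt(alpha / N) * np.cos((2 * k + 1) * j * np.pi / (2 * N))

C = dct_matrix(8)                                     # illustrative 8x8 block
X = np.random.default_rng(2).standard_normal((8, 8))  # spatial-domain block
Y = C @ X @ C.T                                       # forward: Y = C_N X C_N^T
X_back = C.T @ Y @ C                                  # inverse: C_N is orthonormal
```

Each entry of Y is one frequency component of the block; gathering the same entry across all blocks yields one channel of F_1, as described in step 2.3.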
4. The SAR image recognition method based on the fusion frequency domain and space domain network model as claimed in claim 1, wherein in step 3, the spatial domain image is input into a ResNet50-based spatial domain backbone network to obtain the space domain feature vector F_space.
5. The method for SAR image recognition based on the fusion frequency domain and space domain network model according to claim 1, wherein step 4 comprises:
step 4.1, inputting the frequency domain image F_1 into an attention network to obtain an attention feature vector Mask, whose expression formula is as follows:
Mask = Sigmoid(BN(Conv(ReLU(Conv(F_1)))))
wherein Conv represents 1 × 1 convolution operation, BN represents batch normalization, and Sigmoid and ReLU represent activation functions;
step 4.2, fusing the attention feature vector Mask with the frequency domain image F_1, and selecting effective frequencies with the convolutional network model to obtain the effective frequency signal F_2, whose expression formula is as follows:
F_2 = Mask ⊙ F_1
wherein ⊙ represents element-by-element multiplication.
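A minimal NumPy sketch of the channel attention of steps 4.1-4.2 (batch normalization is omitted and the two 1×1 convolutions are written as channel-mixing matrix products; the channel counts 16 and 8 and the random weights are illustrative assumptions, not values from the patent):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(F1, w1, w2):
    """Mask = Sigmoid(Conv(ReLU(Conv(F1)))), with BN omitted; a 1x1
    convolution on (C, H, W) data is a matrix product over channels.
    Returns F2 = Mask * F1, the element-wise gating of frequencies."""
    C, H, W = F1.shape
    x = F1.reshape(C, H * W)                   # flatten spatial positions
    h = np.maximum(w1 @ x, 0.0)                # first 1x1 Conv + ReLU
    mask = sigmoid((w2 @ h).reshape(C, H, W))  # second 1x1 Conv + Sigmoid
    return mask * F1

rng = np.random.default_rng(3)
F1 = rng.standard_normal((16, 4, 4))     # toy frequency-domain image
w1 = rng.standard_normal((8, 16)) * 0.1  # illustrative conv weights
w2 = rng.standard_normal((16, 8)) * 0.1
F2 = channel_attention(F1, w1, w2)
```

Because the mask lies in (0, 1), the gating can only attenuate each frequency component, never amplify it; channels whose mask value approaches zero are effectively deselected.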
6. The method as claimed in claim 1, wherein in step 5, the frequency domain backbone network is a modified ResNet50 frequency domain backbone network, the modification being removal of the first convolutional layer and the pooling layer of the ResNet50 residual network, so that the input of the network model is adapted to the input of the frequency domain signal.
7. The SAR image recognition method based on the fusion frequency domain and space domain network model as claimed in claim 1, wherein in step 6, Concat is used along the channel dimension to fuse the space domain feature vector F_space and the frequency domain feature vector F_frequency into the fused feature vector F_fusion, whose expression formula is as follows:
F_fusion = Concat(F_space, F_frequency, dim=1)
wherein dim denotes the dimension along which concatenation is performed, and dim=1 denotes the channel dimension.
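The fusion of claim 7 is a plain channel-dimension concatenation. A NumPy sketch with a leading batch axis, so that axis 1 plays the role of dim=1 (the 2048-d per-branch size follows the embodiment; the arange values are toy data):

```python
import numpy as np

# A 2048-d spatial feature and a 2048-d frequency feature, each with a
# batch axis so that axis 1 is the channel dimension (dim=1).
f_space = np.arange(2048, dtype=float)[None, :]
f_frequency = -np.arange(2048, dtype=float)[None, :]

# Concat(F_space, F_frequency, dim=1): the fused vector is 4096-d.
f_fusion = np.concatenate([f_space, f_frequency], axis=1)
```

The fused vector simply places the frequency-branch channels after the spatial-branch channels; the subsequent fully-connected layer learns how to weight the two halves.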
8. The SAR image recognition method based on the fusion frequency domain and space domain network model as claimed in claim 1, wherein in step 7, the subsequent network is a classifier composed of a fully-connected layer network and a Softmax activation function; the fused feature vector F_fusion is input into the fully-connected layer network for feature dimension reduction to obtain the feature vector of the recognition target; the feature vector of the recognition target is input into the Softmax activation function to predict the probability corresponding to each category, and the category with the maximum probability value is taken as the predicted category; in the training stage of the subsequent network model, the predicted category and the labeled category information are input, and the subsequent network model is trained in a supervised manner with a cross entropy loss function.
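The supervised training signal of claim 8 can be sketched as follows (NumPy; the three-class logits are toy values, and in the real model this loss would be backpropagated through both backbones and the classifier head):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(logits, label):
    """Cross entropy between the Softmax prediction and the labeled
    class: the negative log-probability assigned to the true class."""
    p = softmax(logits)
    return float(-np.log(p[label]))

logits = np.array([2.0, 0.5, -1.0])      # toy class scores
loss_correct = cross_entropy(logits, 0)  # true class has the top score
loss_wrong = cross_entropy(logits, 2)    # true class scored lowest
```

The loss is small when the predicted probability of the labeled class is high and large otherwise, which is exactly the supervision the claim describes.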
CN202110220080.1A 2021-02-26 2021-02-26 SAR image recognition method based on fusion frequency domain and space domain network model Active CN112926457B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110220080.1A CN112926457B (en) 2021-02-26 2021-02-26 SAR image recognition method based on fusion frequency domain and space domain network model

Publications (2)

Publication Number Publication Date
CN112926457A CN112926457A (en) 2021-06-08
CN112926457B true CN112926457B (en) 2022-09-06

Family

ID=76172421

Country Status (1)

Country Link
CN (1) CN112926457B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113537151B (en) * 2021-08-12 2023-10-17 北京达佳互联信息技术有限公司 Training method and device for image processing model, image processing method and device
CN113643261B (en) * 2021-08-13 2023-04-18 江南大学 Lung disease diagnosis method based on frequency attention network
CN114049551B (en) * 2021-10-22 2022-08-05 南京航空航天大学 ResNet 18-based SAR raw data target identification method
CN113903075A (en) * 2021-12-10 2022-01-07 中科视语(北京)科技有限公司 Category estimation method, category estimation device, electronic equipment and storage medium
CN114240935B (en) * 2022-02-24 2022-05-20 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Space-frequency domain feature fusion medical image feature identification method and device
CN116433770B (en) * 2023-04-27 2024-01-30 东莞理工学院 Positioning method, positioning device and storage medium
CN117435764B (en) * 2023-12-21 2024-03-15 中国海洋大学 Ocean remote sensing image-text retrieval method and system based on frequency domain and space domain double perception

Citations (4)

Publication number Priority date Publication date Assignee Title
WO2017190574A1 (en) * 2016-05-04 2017-11-09 北京大学深圳研究生院 Fast pedestrian detection method based on aggregation channel features
CN111311563A (en) * 2020-02-10 2020-06-19 北京工业大学 Image tampering detection method based on multi-domain feature fusion
WO2020181685A1 (en) * 2019-03-12 2020-09-17 南京邮电大学 Vehicle-mounted video target detection method based on deep learning
CN111915542A (en) * 2020-08-03 2020-11-10 汪礼君 Image content description method and system based on deep learning


Similar Documents

Publication Publication Date Title
CN112926457B (en) SAR image recognition method based on fusion frequency domain and space domain network model
CN110472627B (en) End-to-end SAR image recognition method, device and storage medium
Cui et al. Image data augmentation for SAR sensor via generative adversarial nets
Venugopal Automatic semantic segmentation with DeepLab dilated learning network for change detection in remote sensing images
CN113033520B (en) Tree nematode disease wood identification method and system based on deep learning
CN111027497B (en) Weak and small target rapid detection method based on high-resolution optical remote sensing image
CN113240040B (en) Polarized SAR image classification method based on channel attention depth network
CN114037891A (en) High-resolution remote sensing image building extraction method and device based on U-shaped attention control network
CN115853173A (en) Building curtain wall for construction and installation
Venugopal Sample selection based change detection with dilated network learning in remote sensing images
Zhang et al. Learning an SAR image despeckling model via weighted sparse representation
Mathias et al. Deep Neural Network Driven Automated Underwater Object Detection.
Venu Object Detection in Motion Estimation and Tracking analysis for IoT devices
Jiang et al. Semantic segmentation network combined with edge detection for building extraction in remote sensing images
CN116386042A (en) Point cloud semantic segmentation model based on three-dimensional pooling spatial attention mechanism
CN115861669A (en) Infrared dim target detection method based on clustering idea
CN113902975B (en) Scene perception data enhancement method for SAR ship detection
Amjadipour et al. Building detection using very high resolution SAR images with multi-direction based on weighted-morphological indexes
CN114663916A (en) Thermal infrared human body target identification method based on depth abstract features
CN115223033A (en) Synthetic aperture sonar image target classification method and system
CN114863235A (en) Fusion method of heterogeneous remote sensing images
Yu et al. A lightweight ship detection method in optical remote sensing image under cloud interference
Wang et al. Sonar Objective Detection Based on Dilated Separable Densely Connected CNNs and Quantum‐Behaved PSO Algorithm
Rimavičius et al. Automatic benthic imagery recognition using a hierarchical two-stage approach
Amjadipour et al. Estimation of free parameters of morphological profiles for building extraction using SAR images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB02 Change of applicant information

Address after: 210000 No.1, Lingshan South Road, Qixia District, Nanjing City, Jiangsu Province

Applicant after: THE 28TH RESEARCH INSTITUTE OF CHINA ELECTRONICS TECHNOLOGY Group Corp.

Address before: 210007 1 East Street, alfalfa garden, Qinhuai District, Nanjing, Jiangsu.

Applicant before: THE 28TH RESEARCH INSTITUTE OF CHINA ELECTRONICS TECHNOLOGY Group Corp.