CN116681623A - SAR image target detection method based on multistage Laplacian pyramid denoising

Info

Publication number: CN116681623A
Application number: CN202310774557.XA
Authority: CN
Inventors: 傅雄军; 赵聪霞; 董健; 谢民; 冯程; 常昊; 曹申; 吴文浩
Original assignee: Beijing Institute of Technology BIT
Current assignee: Beijing Institute of Technology BIT
Priority date: 2023-06-28
Filing date: 2023-06-28
Publication date: 2023-09-01

Abstract

The invention relates to a SAR image target detection method based on multistage Laplacian pyramid denoising. The method comprises the following steps: decomposing an original image into sub-bands of different frequencies through a Laplacian pyramid transform, suppressing noise in each high-frequency sub-band with a trainable threshold module, and obtaining a denoised image through Laplacian pyramid reconstruction; stacking the denoised image and the first sub-band of the Laplacian pyramid transform with the original image, so that the spatial-domain and frequency-domain information of the image are fused into a three-channel image, and inputting the three-channel image into the subsequent target detection network; and introducing an attention mechanism into the feature fusion module of the base network, which assigns different weights to each pixel of the feature map and highlights effective features. The method is adaptive and does not increase the complexity or training difficulty of the network when the input image scale changes; generating the ground-truth map containing scale information does not involve training a neural network, so the complexity is low and the operation is simple; the generated ground-truth map indicates the location and scale of each target.

Description

SAR image target detection method based on multistage Laplacian pyramid denoising

Technical Field

The invention belongs to the technical field of deep learning, image processing and radar target detection, and particularly relates to a SAR image target detection method based on multistage Laplacian pyramid denoising.

Background

SAR target detection is an important subject of radar high-resolution image interpretation, and the image target detection has important application value in the fields of disaster prevention and relief, emergency rescue, urban management, ocean monitoring and the like. In recent years, with the vigorous development of artificial intelligence, image processing technology based on deep learning is becoming a mainstream method for intelligent recognition and detection of image targets. Researchers put forward a series of SAR image target detection algorithms based on convolutional neural networks, and the effectiveness of the algorithms is verified.

Speckle noise is an important factor affecting the accuracy of intelligent recognition and detection of SAR image targets: it reduces image contrast, causes false detections and missed detections, and degrades image interpretation. Traditional image denoising methods fall into two groups, spatial-domain denoising and frequency-domain denoising. Intelligent image denoising methods are mainly based on deep learning: the feature extraction capability of a neural network is used to extract the effective features of an image and thereby separate out the noise. Both kinds of denoising have achieved excellent results, but some problems remain:

Existing denoising methods serve only as a preprocessing step; end-to-end image processing is not realized, and the denoising is not tightly coupled with the subsequent detection process, which harms the integrity of the scheme and, in turn, the overall processing result. Traditional frequency-domain denoising methods usually set the noise suppression threshold empirically, so they are easily influenced by human experience and cannot adapt to each image, each network and each image processing task. Constructing an end-to-end target detection network that contains an autonomous denoising structure is therefore of great significance for improving synthetic aperture radar target detection performance under speckle noise contamination.

Disclosure of Invention

The invention aims to solve the problem that the denoising and detection processes are not coupled when SAR image target detection is carried out under the influence of speckle noise, and provides a SAR image target detection method based on multistage Laplacian pyramid denoising, which relies on a multistage Laplacian pyramid denoising end-to-end SAR image target detection model: LPDNet.

In order to achieve the above purpose, the present invention adopts the following technical scheme.

The SAR image target detection method relies on a multistage Laplacian pyramid denoising end-to-end SAR image target detection model: LPDNet comprising denoising network, feature fusion module and target detection network;

the denoising network is a multi-stage Laplacian pyramid denoising network based on threshold self-adaption; the threshold self-adaptive multi-level Laplacian pyramid denoising network comprises a Laplacian pyramid decomposition module, a threshold self-adaptive determination module and a Laplacian pyramid reconstruction module;

based on the prior that most SAR image speckle noise is concentrated in the high-frequency sub-bands, the high-frequency subband image of an image is extracted through Laplacian pyramid decomposition and fed into a branch network consisting of a convolutional layer and a fully connected layer for supervised learning; the threshold is learned through adaptive training, the high-frequency subband image is hard-thresholded according to this threshold, and it is then combined with the low-frequency subband image and passed through Laplacian pyramid reconstruction to obtain a denoised image. The Laplacian denoising threshold is thus learned under supervision from the training set labels, which effectively avoids the degradation in recognition performance caused by setting the threshold empirically.

The feature fusion module fuses the original image, the first layer of high-frequency subband image decomposed by the Laplacian pyramid and the denoised image through stacking to obtain a fused image;

the target detection network is a target detection network CBAM Yolox-tiny based on an attention mechanism; wherein CBAM is Convolutional Block Attention Module;

the Yolox-tiny network comprises a backbone network, a feature fusion part and a detection head;

the backbone network comprises a plurality of convolution layers and a pooling layer;

the SAR image target detection method based on multistage Laplacian pyramid denoising relies on a multistage Laplacian pyramid denoising end-to-end SAR image target detection model comprising a denoising network, a feature fusion module and a target detection network: LPDNet; the target detection method comprises the following steps:

S1: selecting a target detection data set, and dividing the target detection data set into a training set, a verification set and a test set;

in S1, the target detection data set comprises sample images and corresponding labels;

S2: constructing the multistage Laplacian pyramid denoising end-to-end SAR image target detection model, comprising the following steps:

S21: constructing a threshold self-adaptive multi-level Laplacian pyramid denoising network, which comprises constructing a Laplacian pyramid decomposition module, a threshold self-adaptive determination module and a Laplacian pyramid reconstruction module;

the input of the Laplacian pyramid decomposition module is a sample image, and the output is a high-frequency subband image and a low-frequency subband image which are obtained through Laplacian pyramid decomposition; the input of the threshold self-adaptive determining module is a high-frequency subband image obtained by the Laplacian pyramid decomposing module, and the output is a noise-reduced high-frequency subband image; the input of the Laplacian pyramid reconstruction module is a high-frequency subband image and a low-frequency subband image after noise reduction, and the output is a denoising image;

the low-frequency subband image obtained by decomposing the Laplacian pyramid can be used as input of a Laplacian pyramid decomposition module at the next stage, k-level high-frequency subband images and low-frequency subband images are obtained through k-level cascading, and then the Laplacian pyramid decomposition, threshold self-adaptive determination and Laplacian pyramid reconstruction process are repeated to form a threshold self-adaptive multi-level Laplacian pyramid denoising network;

where 0 ≤ k ≤ 4; k = 0 corresponds to a single decomposition, and the k-level high-frequency subband image is obtained after k+1 decompositions;
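As a worked illustration of the k-level cascade, the sketch below lists the side lengths of the subbands produced by k+1 successive decompositions. It assumes that each decomposition halves the image side (the usual factor-2 pyramid downsampling, which the text does not state explicitly) and uses the 416×416 input size from Example 1; the function name `subband_sizes` is hypothetical.

```python
def subband_sizes(side, k):
    """Side lengths of the k+1 high-frequency subbands and of the final low-frequency subband.

    Assumption: every decomposition halves the side length (factor-2 downsampling).
    """
    if not 0 <= k <= 4:
        raise ValueError("the text restricts k to the range 0..4")
    high = []
    for _ in range(k + 1):          # k-level subbands need k+1 decompositions
        if side % 2:
            raise ValueError("side must stay even so that it halves cleanly")
        high.append(side)           # a high-frequency subband keeps its input size
        side //= 2                  # the low-frequency subband feeds the next level
    return high, side


high, low = subband_sizes(416, 2)
print(high, low)  # [416, 208, 104] 52
```

Under this assumption, the largest allowed value k = 4 gives five high-frequency subbands and a 13×13 low-frequency subband for a 416×416 input.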

S22: constructing a feature fusion module;

S23: constructing a target detection network based on an attention mechanism;

S3: training the multistage Laplacian pyramid denoising end-to-end SAR image target detection model to obtain a trained LPDNet, comprising:

S31: initializing network parameters, namely initializing the convolution kernels of the convolution layers and the weights of each layer according to set initialization values, and setting a reasonable learning rate;

S32: training the LPDNet with the sample images and corresponding labels in the training set divided in S1, performing supervised learning of the network parameters through a loss function, and outputting the trained LPDNet;

S4: the target detection stage, specifically:

performing detection with the test set divided in S1: a sample image from the test set is input into the trained LPDNet, and the target detection result is output.

In S1, the target detection data set is obtained by downloading a corresponding existing data set according to the target detection task;

dividing the target detection data set into a training set, a verification set and a test set specifically means dividing the target detection data set according to a certain proportion;

the certain proportion is as follows: the training set, validation set and test set account for x%, y% and z% of the data set respectively, with x% + y% + z% = 100%.
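The proportional split can be sketched as follows. This is a minimal illustration, not the patent's procedure: the function name `split_dataset`, the fixed random seed and the choice to give the test set the remainder are assumptions, and the percentages 70/10 are taken from the 7:2:1 ratio used later in Example 1.

```python
import random


def split_dataset(samples, train_pct, val_pct, seed=0):
    """Shuffle samples and split them into train/validation/test lists.

    train_pct and val_pct are integer percentages; the test set receives
    whatever remains, so the three parts always cover the whole data set.
    """
    if train_pct < 0 or val_pct < 0 or train_pct + val_pct > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    items = list(samples)
    random.Random(seed).shuffle(items)      # reproducible shuffle (seed is an assumption)
    n_train = len(items) * train_pct // 100
    n_val = len(items) * val_pct // 100
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


# 1160 image indices, split 70% / 10% / 20%
train, val, test = split_dataset(range(1160), train_pct=70, val_pct=10)
print(len(train), len(val), len(test))  # 812 116 232
```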

In S22, the feature fusion module fuses the original image, the first-layer high-frequency subband image 1 obtained by Laplacian pyramid decomposition, and the denoised image through stacking to obtain a fused feature map;

fusing by stacking effectively compensates for the loss of image detail information caused by the denoising network.
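A minimal sketch of fusion by stacking, assuming single-channel images stored as nested lists and a channel-first layout. The channel order (original, high-frequency subband 1, denoised) and the name `fuse_by_stacking` are assumptions made for illustration; the text only states that the three images are stacked.

```python
def fuse_by_stacking(original, high_freq_1, denoised):
    """Stack three single-channel H x W images into one 3 x H x W image."""
    channels = [original, high_freq_1, denoised]
    height, width = len(original), len(original[0])
    for ch in channels:
        if len(ch) != height or any(len(row) != width for row in ch):
            raise ValueError("all inputs must share the same height and width")
    # Copy the rows so the fused image does not alias the inputs.
    return [[list(row) for row in ch] for ch in channels]


orig = [[1.0, 2.0], [3.0, 4.0]]
hf1 = [[0.5, -0.5], [0.0, 0.25]]   # first-level high-frequency subband (same size as the input)
den = [[1.0, 2.0], [3.0, 3.5]]
fused = fuse_by_stacking(orig, hf1, den)
print(len(fused), len(fused[0]), len(fused[0][0]))  # 3 2 2
```

Stacking works without resizing because the first-level high-frequency subband has the same size as the image it was computed from.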

In S23, a CBAM module is introduced into the feature fusion part of the Yolox-tiny network; the CBAM module comprises a plurality of global pooling layers, a maximum pooling layer and a convolution layer; the fused feature map obtained by the feature fusion module is input into the attention mechanism-based target detection network, that is, the Laplacian pyramid denoising network and the attention mechanism-based target detection network are connected through the feature fusion module to construct the LPDNet.

Advantageous effects

The invention relates to a target detection method based on Laplacian pyramid denoising threshold adaptation, which has the following beneficial effects:

1. The method combines threshold self-adaptive multi-level Laplacian pyramid denoising with the target detection network, so that the Laplacian denoising threshold is learned adaptively for each image; this overcomes the defect of traditional frequency-domain denoising, in which an empirically set denoising threshold degrades both the denoising effect and the target detection effect;

2. The method stacks the denoised image and the first-layer high-frequency subband image of the Laplacian pyramid with the original image, which compensates for the detail information lost during denoising, enhances the high-frequency information of the image, and improves the target detection effect;

3. An attention module is added to the target detection network, which increases its representational capability, focuses on important features and suppresses unnecessary ones, and further improves the target detection effect.

Drawings

FIG. 1 is a diagram of the overall network structure of the multistage Laplacian pyramid denoising end-to-end SAR image target detection method, namely the method flow chart;

FIG. 2 is a structure diagram of the threshold self-adaptive multistage Laplacian pyramid denoising network in Embodiment 1 of the multistage Laplacian pyramid denoising end-to-end SAR image target detection method of the present invention;

FIG. 3 is a block diagram of the attention mechanism module of the present invention;

FIG. 4 is a block diagram of the attention mechanism-based target detection network CBAM Yolox-tiny.

Detailed Description

The SAR image detection method based on Laplacian pyramid denoising threshold adaptation is described in detail below with reference to the accompanying drawings and specific embodiments.

Example 1

This example illustrates the implementation of the method of the present invention for target detection on an SSDD dataset.

S22: constructing a feature fusion module;

S23: constructing a target detection network based on an attention mechanism;

S4: the target detection stage, specifically:

Experimental data and configurations were as follows:

(1) The SSDD standard data set is used; it comprises 1160 SAR ship images with different resolutions and polarization modes, containing 2358 ships of different scales and materials, and is widely used in the field of SAR image target detection;

(2) The images in the data set are about 500×500 in size; for convenience of the experiments, the image size is unified to 416×416;

(3) The data set is divided into a training set, a test set and a verification set in the ratio 7:2:1;

(4) The computing platform is a Linux server equipped with a K80 GPU, and the model is implemented with the PyTorch framework.

Fig. 1 shows an overall network structure diagram of a multi-level laplacian pyramid denoising end-to-end SAR image target detection method, namely a flow diagram of the method. Wherein, the CBAM Yolox-tiny target detection network is a target detection network which fuses a CBAM module and Yolox-tiny, namely, the target detection network is shown in figure 4.

FIG. 2 illustrates a block diagram of a multi-level Laplacian pyramid denoising network based on threshold adaptation for this example;

FIG. 3 illustrates a block diagram of an attention mechanism module;

FIG. 4 shows a block diagram of the attention mechanism-based target detection network CBAM Yolox-tiny in this example.

The following is the specific implementation procedure. LPDNet has two phases, the network construction and training phase and the target detection phase:

the network construction and training stage mainly comprises the following steps:

step A.1: the training sample set is constructed specifically as follows:

unifying the sizes of the images in the SSDD data set to 416×416, and randomly dividing the images into a training set, a test set and a verification set according to the ratio of 7:2:1;

step A.2: the SAR image target detection network based on Laplacian pyramid denoising threshold adaptation is constructed, and the SAR image target detection network specifically comprises the following steps:

the construction is carried out according to the threshold self-adaptive multi-level Laplacian pyramid denoising network proposed in the step S2, and the specific steps are as follows in combination with FIG. 1:

the input of the Laplacian pyramid decomposition module is an original image, and the output is the high-frequency subband image and the low-frequency subband image obtained through Laplacian pyramid decomposition; the input of the threshold self-adaptive determining module is the high-frequency subband image obtained by the Laplacian pyramid decomposition module, and the output is the denoised high-frequency subband image; the input of the Laplacian pyramid reconstruction module is the denoised high-frequency subband image and the low-frequency subband image, and the output is the denoised image;

the low-frequency subband image obtained by Laplacian pyramid decomposition can serve as the input of the Laplacian pyramid decomposition module at the next level; nesting three Laplacian pyramid decompositions yields the first-, second- and third-level high-frequency subband images and the low-frequency subband image; the threshold self-adaptive determination process is repeated for each level of high-frequency subband image, and each denoised high-frequency subband image is input into the Laplacian pyramid reconstruction module together with the low-frequency subband image (at the top level) or the reconstruction output of the level above (at the other levels) to obtain the denoised image.

Step by step: the images in the training set are input into the Laplacian pyramid decomposition module, which outputs high-frequency subband image 1 and low-frequency subband image 1; high-frequency subband image 1 is input into threshold self-adaptive determining module 1, which outputs denoised high-frequency subband image 1; Laplacian pyramid decomposition of low-frequency subband image 1 gives high-frequency subband image 2 and low-frequency subband image 2; high-frequency subband image 2 is input into threshold self-adaptive determining module 2 to obtain denoised high-frequency subband image 2; Laplacian pyramid decomposition of low-frequency subband image 2 gives high-frequency subband image 3 and low-frequency subband image 3; high-frequency subband image 3 is input into threshold self-adaptive determining module 3 to obtain denoised high-frequency subband image 3; denoised high-frequency subband image 3 and low-frequency subband image 3 are input into the Laplacian pyramid reconstruction module to obtain reconstructed image 3 of the third layer; reconstructed image 3 and denoised high-frequency subband image 2 are input into the Laplacian pyramid reconstruction module to obtain reconstructed image 2 of the second layer; reconstructed image 2 and denoised high-frequency subband image 1 are input into the Laplacian pyramid reconstruction module to obtain reconstructed image 1 of the first layer, namely the denoised image;

The threshold self-adaptive determining module consists of a convolution layer and a full connection layer, and is particularly shown in fig. 2;

Step A.2 is specifically implemented as follows. The Laplacian pyramid transform is built on Gaussian pyramid decomposition. Multi-resolution feature extraction is performed on the image through Gaussian filtering and downsampling, giving reduced copies of the original image at several resolutions, which form the Gaussian pyramid. The high-resolution images are at the bottom of the pyramid and the low-resolution images are at the top. Each Laplacian pyramid image is the detail component of the Gaussian pyramid image at the same level and is obtained as the difference of two adjacent Gaussian pyramid levels: the higher-level image is upsampled to the size of the lower-level image and filtered, and the result is subtracted from the lower-level image. The resulting image is the Laplacian pyramid image of the corresponding level, which can be expressed as:

LP_l = G_l - G*_{l+1}, for 0 ≤ l < N
LP_N = G_N

where LP_l is the l-th layer image of the decomposition result, G_l is the l-th layer Gaussian pyramid image, G*_{l+1} is the result of upsampling and filtering the (l+1)-th layer Gaussian pyramid image, and N is the highest level of the Laplacian decomposition.

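The Laplacian pyramid reconstruction process is the reverse of the decomposition process: starting from the top layer, each image is upsampled and added to the Laplacian pyramid image of the layer below, iterating layer by layer down to the bottom image. The process can be expressed as:

G_N = LP_N
G_l = LP_l + G*_{l+1}, for 0 ≤ l < N

where G_l is the reconstructed l-th layer image, LP_l is the l-th layer Laplacian pyramid image, and G*_{l+1} is the result of upsampling and filtering the reconstructed (l+1)-th layer image.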
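The decomposition and reconstruction described above can be sketched in a few lines. This is an illustration under simplifying assumptions, not the patent's implementation: 2×2 mean pooling stands in for Gaussian filtering plus downsampling, nearest-neighbour repetition stands in for upsampling plus filtering, and the image side must be divisible by 2 for every level. All function names are hypothetical.

```python
def downsample(img):
    """2x2 mean pooling: a stand-in for Gaussian filtering plus 2x downsampling."""
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(len(img[0]) // 2)]
            for y in range(len(img) // 2)]


def upsample(img):
    """Nearest-neighbour 2x upsampling: a stand-in for upsampling plus filtering."""
    return [[img[y // 2][x // 2] for x in range(2 * len(img[0]))]
            for y in range(2 * len(img))]


def subtract(a, b):
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def decompose(image, levels):
    """Return (high-frequency subbands, low-frequency subband).

    Each high-frequency subband is the current image minus the upsampled
    version of its downsampled copy.
    """
    highs, current = [], image
    for _ in range(levels):
        smaller = downsample(current)
        highs.append(subtract(current, upsample(smaller)))
        current = smaller           # the low-frequency subband feeds the next level
    return highs, current


def reconstruct(highs, low):
    """Start from the top level: upsample, add the subband below, repeat."""
    current = low
    for high in reversed(highs):
        current = add(high, upsample(current))
    return current


image = [[float((x * y) % 7) for x in range(8)] for y in range(8)]
highs, low = decompose(image, levels=3)
restored = reconstruct(highs, low)
```

Because reconstruction adds back exactly what decomposition subtracted, the untouched pyramid restores the input; denoising comes only from modifying the high-frequency subbands between the two steps.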

in Step A.2, the threshold self-adaptive determining module operates as follows: a convolution operation is performed on the input image to obtain the output of the convolution layer;

the threshold self-adaptive determining module comprises a convolution layer, a fully connected layer, an activation function, a hard threshold processing part and parameter settings;

in this network, the convolution kernel of the convolution layer has the same size as the input high-frequency subband image, so it has a global receptive field over the high-frequency subband image; the layer can be expressed as:

O_i = MaxPool(f_i * I_H + b_i)

where O_i is the i-th feature channel output by the convolution layer, MaxPool() is the max pooling function, f_i is the convolution kernel applied to the input feature map for the i-th feature channel, I_H is the high-frequency subband image input to the convolution layer, and b_i is the bias of the convolution layer. The fully connected layer takes the convolution layer output O_i as input and outputs a tensor D, D ∈ R^{1×1×1}:

D = ReLU(WO + b)

where ReLU() is the ReLU activation function, and W and b are the weight and bias of the fully connected layer. This operation yields the denoising threshold for one high-frequency subband. Noise in the high-frequency subband is suppressed by a hard threshold function:

I_H(x, y) = I(x, y), if |I(x, y)| ≥ threshold
I_H(x, y) = 0, if |I(x, y)| < threshold

where I_H(x, y) is the high-frequency subband coefficient after thresholding, I(x, y) is the high-frequency subband coefficient after Laplacian pyramid decomposition, and threshold is the threshold determined by the convolutional neural network.
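The threshold prediction and hard thresholding can be sketched as below. Several points are assumptions for illustration: the convolution is taken without padding, so a kernel as large as the subband reduces each channel to a single value and the max pooling step leaves it unchanged; coefficients whose magnitude equals the threshold are kept; and the weights are hand-picked rather than learned. The function names are hypothetical.

```python
def full_image_conv(image, kernel, bias):
    """Convolution with a kernel as large as the image: one weighted sum plus bias."""
    return sum(k * p for krow, prow in zip(kernel, image)
               for k, p in zip(krow, prow)) + bias


def predict_threshold(high, kernels, biases, fc_weights, fc_bias):
    """One feature per channel, then a fully connected layer followed by ReLU."""
    # Each channel is already 1x1 here, so max pooling would not change it.
    features = [full_image_conv(high, k, b) for k, b in zip(kernels, biases)]
    d = sum(w * o for w, o in zip(fc_weights, features)) + fc_bias
    return max(0.0, d)              # ReLU keeps the threshold non-negative


def hard_threshold(high, threshold):
    """Keep coefficients with magnitude >= threshold and zero the rest."""
    return [[v if abs(v) >= threshold else 0.0 for v in row] for row in high]


high = [[0.25, -1.5], [0.125, 1.0]]             # toy high-frequency subband
kernels = [[[1.0, 0.0], [0.0, 0.0]],            # two hand-picked kernels
           [[0.0, 0.0], [0.0, 1.0]]]
biases = [0.0, 0.0]
threshold = predict_threshold(high, kernels, biases, fc_weights=[1.0, 0.5], fc_bias=-0.25)
denoised = hard_threshold(high, threshold)
print(threshold, denoised)  # 0.5 [[0.0, -1.5], [0.0, 1.0]]
```

In the network described here the kernels and fully connected weights would be trained end to end with the detector; the fixed values above only make the arithmetic easy to follow.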

Step A.3: the feature fusion is specifically as follows:

the image obtained by multi-level Laplacian pyramid denoising, the original image, and the first-layer high-frequency subband image of the Laplacian pyramid transform are fused; this compensates for the image information lost through denoising and also enhances the contour information of the image; the fused feature map is sent to the subsequent detection network;

step A.4: constructing an attention mechanism-based target detection network:

the CBAM module, as shown in fig. 3, includes a channel attention mechanism and a spatial attention mechanism, derives attention map (attention map) sequentially along two dimensions of the channel and the space, multiplies the attention map by the image, and adaptively refines the features, thereby increasing the characterization capability, i.e. focusing on important features and suppressing unnecessary features.

FIG. 4 shows the CBAM module introduced into the feature fusion part of the Yolox-tiny network structure; the boxes in the figure are the attention mechanism modules. CBAM is placed on four channels and assigns different weights to the features; after the channels are concatenated, effective information is passed to the next feature extraction layer and invalid information is suppressed, which further improves detection accuracy.

The feature fusion module is connected with a threshold self-adaptive multi-level Laplacian pyramid denoising network and an attention mechanism-based target detection network to form an end-to-end detection network, namely an LPDNet;

the final result of target detection is output through the detection head in fig. 4, and includes the position information and the category information of the detection target;

Step A.5: training the LPDNet, namely training the thresholds of the multi-level Laplacian pyramid network and the target positions for target detection with the training set constructed in Step A.1, specifically:

Step A.5.1: initializing network parameters, namely initializing the convolution kernels of the convolution layers and the weights of each layer from a Gaussian distribution with mean 0 and a set variance;

Step A.5.2: StepLR is used as the learning rate adjustment mechanism, i.e. the learning rate decreases by a factor of 0.05 after every epoch; Adam (Adaptive Moment Estimation) is used as the optimizer;

the model is first trained for 50 epochs with the backbone network frozen, with the batch size (the number of samples per training batch) set to 8 and the initial learning rate set to 0.001; it is then trained for 100 epochs with the backbone unfrozen, with the batch size set to 4 and the initial learning rate set to 0.0001;

the loss function combines a classification loss and a regression loss: the binary cross-entropy loss is used for the classification loss, and the IoU loss is used for the regression loss;

thus far, from step a.1 to step a.5.2, the network construction and training phase of this embodiment is completed;
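The two-phase schedule of Step A.5.2 can be sketched in plain Python. The sketch reads "decreases by a factor of 0.05 per epoch" as a multiplicative decay of 0.95 per epoch, which is only one possible interpretation, and it assumes that the 100 unfrozen epochs follow the 50 frozen ones and restart from their own initial learning rate. The names `step_lr`, `PHASES` and `schedule` are hypothetical.

```python
def step_lr(initial_lr, epoch, step_size=1, gamma=0.95):
    """StepLR-style rule: multiply the rate by gamma once every step_size epochs."""
    return initial_lr * gamma ** (epoch // step_size)


# Values taken from the text; gamma above is an assumed reading of "0.05".
PHASES = [
    {"name": "frozen backbone", "epochs": 50, "batch_size": 8, "initial_lr": 1e-3},
    {"name": "unfrozen", "epochs": 100, "batch_size": 4, "initial_lr": 1e-4},
]


def schedule(phases, gamma=0.95):
    """Yield (phase name, epoch within phase, batch size, learning rate)."""
    for phase in phases:
        for epoch in range(phase["epochs"]):
            lr = step_lr(phase["initial_lr"], epoch, gamma=gamma)
            yield phase["name"], epoch, phase["batch_size"], lr


plan = list(schedule(PHASES))
print(len(plan), plan[0], plan[50])
```

In a PyTorch training loop the same rule would normally come from the framework's StepLR scheduler wrapped around an Adam optimizer; the pure-Python version only makes the per-epoch values explicit.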

In the detection stage, the model trained on the training set is used to detect the corresponding test samples and obtain the detection result maps; the detection performance is evaluated with precision, recall, average precision (AP) and F1 score (F1-score) as evaluation indexes.

These evaluation indexes are calculated from four components: true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). Here, TP and TN denote the number of correctly detected ships and the number of correctly identified background regions, respectively; FP is the number of false alarms and FN is the number of undetected ships. To determine whether a detection is correct, the intersection over union (IoU) is introduced. IoU is the overlap ratio of the detection box and the ground-truth box:

IoU = S_∩ / S_∪

where S_∩ is the overlapping area of the predicted box and the ground-truth box, and S_∪ is the area of their union. If IoU is greater than the set threshold (0.5 here), the detection box is judged to be correct.
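A short sketch of the IoU test for axis-aligned boxes follows. The box format `(x_min, y_min, x_max, y_max)` and the function name `iou` are assumptions; the 0.5 threshold is the one stated in the text.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


truth = (0.0, 0.0, 10.0, 10.0)
close = (2.0, 0.0, 12.0, 10.0)    # overlap 80, union 120
far = (5.0, 0.0, 15.0, 10.0)      # overlap 50, union 150

print(iou(truth, close) > 0.5)  # True
print(iou(truth, far) > 0.5)    # False
```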

Precision, recall, average precision and F1 score are defined as follows:

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
AP = ∫_0^1 P(R) dR
F1 = 2 × Precision × Recall / (Precision + Recall)

Precision is the proportion of predicted boxes that are correct, and recall is the proportion of all targets that are correctly located and identified. AP and F1 score measure the balance between precision and recall. AP is the average of the precision P over recall R in the interval from 0 to 1, and F1 score is the harmonic mean of precision and recall; the higher the two values, the better the detection performance.
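These metrics can be computed as in the sketch below. The AP routine is one simple convention (a rectangle sum of precision over recall increments, without interpolation) chosen for illustration; evaluation toolkits often use interpolated variants, and the text does not say which one was used. Function names are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 score from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def average_precision(is_true_positive, num_targets):
    """Area under the precision-recall curve.

    is_true_positive lists the detections in descending score order,
    True for a correct detection (IoU above the threshold) and False otherwise.
    """
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for hit in is_true_positive:
        if hit:
            tp += 1
        else:
            fp += 1
        recall = tp / num_targets
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap


p, r, f1 = precision_recall_f1(tp=3, fp=1, fn=3)
ap = average_precision([True, False, True], num_targets=2)
print(p, r)  # 0.75 0.5
```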

Table 1 experimental results

The specific implementation of the SAR image target detection method based on the multistage Laplacian pyramid denoising network in SAR ship detection is completed.

The foregoing is a preferred embodiment of the present invention, and the present invention should not be limited to the embodiment and the disclosure of the drawings. All equivalents and modifications that come within the spirit of the disclosure are desired to be protected.

Claims

1. The SAR image target detection method based on multistage Laplacian pyramid denoising is characterized by relying on a multistage Laplacian pyramid denoising end-to-end SAR image target detection model comprising a denoising network, a feature fusion module and a target detection network: LPDNet; the target detection method comprises the following steps:

S22: constructing a feature fusion module;

S23: constructing a target detection network based on an attention mechanism;

S4: the target detection stage, specifically:

2. The SAR image target detection method based on multistage Laplacian pyramid denoising according to claim 1, wherein S1 the target detection data set is obtained by downloading a corresponding existing data set according to a target detection task;

3. The method for detecting the SAR image target based on the multi-level Laplacian pyramid denoising according to claim 2, wherein the feature fusion module in S22 fuses the original image, the Laplacian pyramid decomposed first-layer high-frequency subband image 1 and the denoised image through stacking to obtain a fusion feature map;

4. The method for detecting the target of the SAR image based on the multi-level Laplacian pyramid denoising according to claim 3, wherein S23 introduces a CBAM module into a feature fusion part of a Yolox-tiny network; the CBAM module comprises a plurality of global pooling layers, a maximum pooling layer and a convolution layer; and inputting the fusion feature map obtained by the feature fusion module into an attention mechanism-based target detection network, namely connecting the Laplacian pyramid denoising network and the attention mechanism-based target detection network through a feature fusion part, and constructing the LPDNet.