CN115909086A - SAR target detection and identification method based on multistage enhanced network - Google Patents

SAR target detection and identification method based on multistage enhanced network

Info

Publication number
CN115909086A
Authority
CN
China
Prior art keywords
module
network
convolution
layer
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211449058.5A
Other languages
Chinese (zh)
Inventor
白雪茹
鲜要胜
杨敏佳
孟昭晗
周峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202211449058.5A priority Critical patent/CN115909086A/en
Publication of CN115909086A publication Critical patent/CN115909086A/en
Pending legal-status Critical Current

Abstract

The invention discloses an SAR target detection and identification method based on a multi-stage enhancement network, which mainly solves the prior-art problems of poor robustness in complex environments, high false-alarm and missed-detection rates, and low detection and identification accuracy. The implementation scheme is as follows: label and divide measured SAR data to obtain a training set and a test set; construct a multi-stage enhancement network formed by cascading a data-level enhancement module, a feature-level enhancement module, a region proposal module and a decision-level enhancement module; train the multi-stage enhancement network on the training set with stochastic gradient descent; and input the test-set images into the trained multi-stage enhancement network to obtain SAR target detection and identification results. The invention significantly improves SAR target detection and identification performance in complex environments and can be used for battlefield reconnaissance and situational awareness.

Description

SAR target detection and identification method based on multistage enhanced network
Technical Field
The invention belongs to the technical field of radar remote sensing, and further relates to an SAR target detection and identification method which can be used for battlefield reconnaissance and situational awareness.
Background
Synthetic aperture radar (SAR) is an active microwave imaging sensor that obtains two-dimensional high-resolution images by transmitting signals with a large time-bandwidth product and performing aperture synthesis. Compared with optical and infrared sensors, SAR offers unique advantages such as all-weather and all-time operation, long operating range and strong penetration; it has become an important means of earth observation and is widely applied in both military and civilian fields. With the continuous improvement of SAR systems and imaging quality, SAR image interpretation techniques have gradually attracted the attention of researchers in related fields. As both the difficulty and the key step of interpretation, accurate detection and identification of important targets is of great significance and research value.
The traditional SAR target detection and identification method mainly adopts a three-stage processing flow comprising target detection, target discrimination and target recognition. Target detection is mainly based on the constant false alarm rate (CFAR) algorithm which, on the premise that the background clutter follows a certain probability distribution model, slides a window over the SAR image and compares the cell under test against an adaptive threshold. However, because non-uniform strong-clutter backgrounds are difficult to model effectively, the algorithm adapts poorly to complex scenes and its detection accuracy is low. Target discrimination and recognition mainly rely on hand-crafted features and classifiers designed from the statistical and physical characteristics of the images; this requires strong domain knowledge and expert experience, the resulting algorithms lack accuracy and flexibility, and the ideal effect is difficult to achieve in practical applications. In addition, the inefficient coupling among the stages of the traditional three-stage flow greatly reduces computational efficiency, so a new architecture is urgently needed.
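For illustration only (not part of the disclosed invention), the sliding-window CFAR detection described above can be sketched as a minimal cell-averaging CFAR; the window sizes, the desired false-alarm rate and the assumption of exponentially distributed clutter intensity are choices of this example:

```python
import numpy as np

def ca_cfar(img, guard=4, train=8, pfa=1e-4):
    """Minimal cell-averaging CFAR sketch: slide a window over the intensity image,
    estimate the clutter level from the training ring around each cell under test,
    and compare against an adaptive threshold. Assumes exponentially distributed
    clutter intensity; window sizes and Pfa are illustrative."""
    n = (2 * (guard + train) + 1) ** 2 - (2 * guard + 1) ** 2  # number of training cells
    alpha = n * (pfa ** (-1.0 / n) - 1.0)                      # threshold factor for the desired Pfa
    half = guard + train
    det = np.zeros(img.shape, dtype=bool)
    for i in range(half, img.shape[0] - half):
        for j in range(half, img.shape[1] - half):
            window = img[i - half:i + half + 1, j - half:j + half + 1]
            inner = img[i - guard:i + guard + 1, j - guard:j + guard + 1]
            clutter = (window.sum() - inner.sum()) / n         # mean of the training ring
            det[i, j] = img[i, j] > alpha * clutter
    return det
```

A sketch like this makes the drawback named above concrete: the threshold factor is derived from a single assumed clutter distribution, so the detector degrades wherever real clutter departs from that model.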
In recent years, with the development of deep learning, target detection and identification methods built on deep neural networks have made major breakthroughs in computer vision. Owing to the structure of deep networks, these algorithms can predict target position and class simultaneously without multi-stage processing, markedly improving detection and identification performance and efficiency. Mainstream detection and identification algorithms can be divided into single-stage and two-stage types. The former directly decode the features extracted by the network to detect and identify targets and therefore infer faster; representative algorithms include YOLO, SSD and RetinaNet. The latter add a candidate-region extraction stage: candidate regions that may contain key targets are first extracted from the image by the deep network, then their positions are further refined and the recognition results obtained; representative algorithms include R-CNN, Faster R-CNN and Cascade R-CNN. Compared with single-stage algorithms, two-stage algorithms achieve higher detection and identification accuracy.
Although deep-learning-based methods provide a feasible approach to SAR target detection and identification, SAR image scenes are more complex than optical images, targets of different classes are more similar to one another, and target edges are blurred by speckle noise, so the problems of poor robustness in complex environments and difficulty in distinguishing similar classes remain.
The patent document with application number 201710461303.7 discloses an integrated SAR image target detection and identification method which extracts SAR image features through a convolutional neural network, generates candidate regions that may contain targets based on these features, and finally predicts the class and position of each region of interest with a fully connected network to detect and identify SAR targets. Because the method is not optimized for the characteristics of SAR images, it is not robust in complex environments, the predicted bounding boxes are inaccurate, the false-alarm and missed-detection rates are high, and fine-grained target features are difficult to mine effectively, so its detection and identification accuracy is low.
Disclosure of Invention
The invention aims to provide an SAR target detection and identification method based on a multi-stage enhancement network to overcome the above defects of the prior art, so as to improve the robustness of detection and identification in complex environments, reduce the false-alarm and missed-detection rates, enhance feature separability, and significantly improve SAR target detection and identification accuracy.
The technical idea of the invention is to improve SAR target detection and identification performance in complex environments by designing a multi-stage enhancement network. The implementation steps comprise:
(1) Acquiring SAR images with multiple types of targets, marking the target position and the target type in each SAR image, and randomly dividing the marked SAR images to obtain a training set and a test set;
(2) Constructing a multi-stage enhancement network:
(2a) Establishing a data-level enhancement module which sequentially performs multi-scale transformation, random flipping, random rotation, power transformation and random noise operations;
(2b) Establishing a feature-level enhancement module formed by cascading a backbone network A, a feature-preferred pyramid network F, a recursive backbone network Q and a recursive feature-preferred pyramid network E;
(2c) Using an existing region proposal network as the region proposal module G, and selecting cross-entropy loss and CIoU loss as the classification and regression losses;
(2d) Cascading three sub-deciders $d_1, d_2, d_3$ to form a decision-level enhancement module D, and using cross-entropy loss and CIoU loss as the classification and regression losses;
(2e) Sequentially cascading the data-level enhancement module, feature-level enhancement module, region proposal module and decision-level enhancement module to form the multi-stage enhancement network;
(3) Training the multi-stage enhancement network:
(3a) Randomly sampling a group of SAR images from the training set, inputting them into the multi-stage enhancement network, calculating the loss, and updating the network parameters by stochastic gradient descent based on the loss;
(3b) Repeating process (3a) until the network converges to obtain the trained multi-stage enhancement network;
(4) Inputting the SAR images of the test set into the trained multi-stage enhancement network to obtain detection and identification results.
Compared with the prior art, the invention has the following advantages:
First, by designing a data-level enhancement module, the invention simulates target scale and orientation changes as well as clutter and noise interference, thereby improving the robustness of the algorithm in complex environments and reducing the false-alarm and missed-detection rates.
Second, by designing a feature-level enhancement module, the invention fully mines the fine-grained target features in the SAR image and enhances the separability of similar classes.
Third, the invention designs a decision-level enhancement module that refines the prediction results multiple times, gradually reducing the deviation between the predicted target position and the ground truth and effectively suppressing the influence of blurred SAR target edges on detection accuracy.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a diagram of a multi-level enhanced network model constructed in the present invention;
FIG. 3 is a diagram of simulation results of the present invention.
Detailed Description
The examples and effects of the present invention will be described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, the SAR target detection and identification method based on the multi-stage enhancement network in this example sequentially comprises data labeling and division, construction of the multi-stage enhancement network, training of the multi-stage enhancement network, and acquisition of the SAR target detection and identification result. It is implemented as follows:
step one, marking and dividing data.
Acquiring SAR images with various targets, labeling the target position and target class in each SAR image, and randomly dividing the labeled SAR images at a ratio of 7:3 to obtain a training set and a test set.
In the embodiment of the invention, the SAR images come from the spaceborne radar of the Gaofen-3 satellite; the image sizes include 600×600, 1024×1024 and 2048×2048; seven classes of aircraft targets are included; the training set contains 1400 images and the test set 600 images.
And step two, constructing the multi-stage enhancement network.
Referring to fig. 2, the multi-stage enhancement network constructed in this step comprises a data-level enhancement module, a feature-level enhancement module, a region proposal module and a decision-level enhancement module cascaded in sequence. The construction steps are as follows:
2.1) Establishing a data-level enhancement module which sequentially performs multi-scale transformation, random flipping, random rotation, power transformation and random noise operations. In the embodiment of the invention, all operations are performed during training, while only multi-scale transformation and random flipping are performed during testing; the scales of the multi-scale transformation include 1024×1024, 1088×1088 and 1152×1152, the flipping directions include horizontal, vertical and diagonal, the rotation angles include 90°, 180° and 270°, and the coefficient of the power transformation is a random value in [0.8, 1.2].
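A minimal sketch of such a data-level enhancement pipeline is given below for illustration; the Gaussian noise model, the value range of the image tensor, and the omission of bounding-box transformation are assumptions of this sketch, not statements of the disclosure:

```python
import random
import torch
import torchvision.transforms.functional as TF

def data_level_enhance(img, training=True):
    """Sketch of the data-level enhancement module for one single-channel SAR
    tensor of shape (1, H, W) with values in [0, 1]."""
    img = TF.resize(img, [random.choice([1024, 1088, 1152])] * 2, antialias=True)  # multi-scale
    d = random.choice(["h", "v", "d"])           # flip: horizontal, vertical, or diagonal (both)
    if d in ("h", "d"):
        img = TF.hflip(img)
    if d in ("v", "d"):
        img = TF.vflip(img)
    if training:                                 # rotation, power transform, noise: training only
        img = torch.rot90(img, k=random.choice([1, 2, 3]), dims=(-2, -1))
        img = img.clamp(min=0.0) ** random.uniform(0.8, 1.2)        # power transformation
        img = (img + 0.01 * torch.randn_like(img)).clamp(0.0, 1.0)  # assumed Gaussian noise
    return img
```

In a full detector the same geometric transforms would of course be applied to the labeled bounding boxes; that bookkeeping is omitted here for brevity.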
2.2) Establishing a feature-level enhancement module formed by cascading a backbone network A, a feature-preferred pyramid network F, a recursive backbone network Q and a recursive feature-preferred pyramid network E:
2.2.1) Building a backbone network A comprising 5 cascaded convolution modules $a_1, a_2, a_3, a_4, a_5$, wherein:
the first convolution module $a_1$ is formed by cascading a 7×7 standard convolution layer, a batch normalization layer, a ReLU activation layer and a max-pooling downsampling layer;
the second convolution module $a_2$ is formed by cascading 3 residual blocks;
the third convolution module $a_3$ is formed by cascading 4 residual blocks;
the fourth convolution module $a_4$ is formed by cascading 6 residual blocks;
the fifth convolution module $a_5$ is formed by cascading 3 residual blocks;
in an embodiment of the invention, the backbone network a is used to extract a multi-scale feature map for input SAR images with width and height W and H, respectively
Figure BDA0003950748850000041
The multi-scale characteristic map output by the trunk network is ^ greater than or equal to>
Figure BDA0003950748850000042
Wherein Y is i For the ith convolution module a i The output characteristic map of (1).
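The module sizes above (7×7 stem; 3, 4, 6 and 3 residual blocks) match a ResNet-50-style backbone, so an illustrative sketch can reuse torchvision's implementation; adapting the stem to single-channel SAR input is an assumption of this example:

```python
import torch
import torchvision

class BackboneA(torch.nn.Module):
    """Sketch of backbone A: stem a1 (7x7 conv + BN + ReLU + max-pool) and
    four residual stages a2..a5 with 3/4/6/3 blocks, as in ResNet-50."""
    def __init__(self):
        super().__init__()
        r = torchvision.models.resnet50(weights=None)
        # assumption: replace the stem convolution to accept single-channel SAR input
        r.conv1 = torch.nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.a1 = torch.nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool)
        self.a2, self.a3, self.a4, self.a5 = r.layer1, r.layer2, r.layer3, r.layer4

    def forward(self, x):
        y1 = self.a1(x)
        y2 = self.a2(y1)
        y3 = self.a3(y2)
        y4 = self.a4(y3)
        y5 = self.a5(y4)
        return [y2, y3, y4, y5]  # the multi-scale maps Y2..Y5 consumed by the pyramid
```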
2.2.2) Building a feature-preferred pyramid network F comprising 4 parallel branches $f_1, f_2, f_3, f_4$, each branch $f_i$ being formed by cascading a feature-preference module $f_i^s$ and a feature-fusion module $f_i^u$. Each feature-preference module $f_i^s$ is formed by cascading two parallel sub-branches $f_i^{s1}$ and $f_i^{s2}$ with a 1×1 standard convolution layer: the first sub-branch $f_i^{s1}$ comprises, in order, a global average pooling layer, a 1-dimensional convolution layer and a Sigmoid activation layer; the second sub-branch $f_i^{s2}$ is an identity branch. Each feature-fusion module $f_i^u$ consists of one 3×3 standard convolution layer.
In the embodiment of the invention, the feature-preferred pyramid network F performs feature preference and feature fusion on the multi-scale feature maps $Y_i$ ($i = 2,3,4,5$) output by the backbone. The feature preference is expressed as:

$$S_i = f_i^s(Y_{i+1}) = \mathrm{Conv1{\times}1}\big(\sigma(\mathrm{Conv1d}(\mathrm{GAP}(Y_{i+1}))) \odot Y_{i+1}\big), \quad i = 1,2,3,4$$

where $S_i$ is the output feature map of the i-th feature-preference module $f_i^s$, $\odot$ denotes channel-wise multiplication, GAP(·) denotes global average pooling, Conv1d(·) denotes 1-dimensional convolution, σ(·) denotes the Sigmoid function, and Conv1×1(·) denotes 1×1 standard convolution. After feature fusion, $S_i$ becomes:

$$U_i = f_i^u(S_i) = \mathrm{Conv3{\times}3}\big(S_i + \mathrm{Up}(U_{i+1})\big), \quad i = 1,2,3,4$$

where $U_i$ is the output feature map of the i-th feature-fusion module $f_i^u$, Conv3×3(·) denotes 3×3 standard convolution, Up(·) is the bilinear-interpolation upsampling function, and the $\mathrm{Up}(U_{i+1})$ term is omitted for the top branch ($i = 4$).
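The feature-preference module $f_i^s$ is, in effect, an efficient channel-attention block followed by a projection. A minimal sketch under the equations above (the channel counts and the 1-D kernel size are assumptions of this example):

```python
import torch
import torch.nn as nn
import torch.nn.functional as nnf

class FeaturePreference(nn.Module):
    """Sketch of f_i^s: GAP -> 1-D conv -> Sigmoid yields channel weights that
    re-weight the identity branch; a 1x1 conv then projects the result."""
    def __init__(self, in_channels, out_channels=256, k=3):
        super().__init__()
        self.conv1d = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.proj = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, y):
        w = nnf.adaptive_avg_pool2d(y, 1)                   # GAP: (B, C, 1, 1)
        w = self.conv1d(w.squeeze(-1).transpose(1, 2))      # 1-D conv over channels: (B, 1, C)
        w = torch.sigmoid(w).transpose(1, 2).unsqueeze(-1)  # back to (B, C, 1, 1)
        return self.proj(w * y)                             # S_i = Conv1x1(sigma(...) . Y_{i+1})

def fuse_top_down(s_maps, convs):
    """Sketch of f_i^u: U_i = Conv3x3(S_i + Up(U_{i+1})); the top level has no upsampled term."""
    u = convs[-1](s_maps[-1])
    outs = [u]
    for s, conv in zip(reversed(s_maps[:-1]), list(reversed(convs[:-1]))):
        u = conv(s + nnf.interpolate(u, size=s.shape[-2:], mode="bilinear", align_corners=False))
        outs.insert(0, u)
    return outs                                             # [U_1, U_2, U_3, U_4]
```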
2.2.3) Building a recursive backbone network Q comprising 5 cascaded convolution modules $q_1, q_2, q_3, q_4, q_5$, wherein:
the 1st convolution module $q_1$ is formed by cascading a 7×7 standard convolution layer, a batch normalization layer, a ReLU activation layer and a max-pooling downsampling layer;
the 2nd convolution module $q_2$ is formed by connecting 3 cascaded residual blocks in parallel with a 1×1 standard convolution layer;
the 3rd convolution module $q_3$ is formed by connecting 4 cascaded residual blocks in parallel with a 1×1 standard convolution layer;
the 4th convolution module $q_4$ is formed by connecting 6 cascaded residual blocks in parallel with a 1×1 standard convolution layer;
the 5th convolution module $q_5$ is formed by connecting 3 cascaded residual blocks in parallel with a 1×1 standard convolution layer;
in an embodiment of the invention, the recursive backbone network Q is used to extract a multi-scale feature map for input SAR images with width and height W and H, respectively
Figure BDA0003950748850000054
The multi-scale characteristic graph output by the recursion backbone network is as follows:
Figure BDA0003950748850000061
wherein, Z i For the ith convolution module q i Output characteristic diagram of (3), U i-1 Pyramid network ith-1 feature fusion module optimized for features
Figure BDA0003950748850000063
Conv1 × 1 (·) represents a 1 × 1 standard convolution.
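A minimal sketch of one recursive-backbone stage follows; the fusion of the two parallel branches by summation (as in the formula above) and the channel/spatial alignment of $U_{i-1}$ with the stage output are assumptions of this example:

```python
import torch.nn as nn

class RecursiveStage(nn.Module):
    """Sketch of q_i (i >= 2): cascaded residual blocks in parallel with a 1x1
    convolution applied to the pyramid feature U_{i-1}; the two branch outputs
    are assumed to have matching channels and spatial size and are summed."""
    def __init__(self, res_blocks, u_channels, out_channels):
        super().__init__()
        self.res_blocks = res_blocks  # e.g. one residual stage reused from BackboneA
        self.lateral = nn.Conv2d(u_channels, out_channels, kernel_size=1)

    def forward(self, z_prev, u_prev):
        # Z_i = q_i(Z_{i-1}) + Conv1x1(U_{i-1})
        return self.res_blocks(z_prev) + self.lateral(u_prev)
```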
2.2.4) Building a recursive feature-preferred pyramid network E with the same structure and parameters as the feature-preferred pyramid network F.
In the embodiment of the invention, the recursive feature-preferred pyramid E performs feature preference and feature fusion on the multi-scale feature maps $Z_i$ ($i = 2,3,4,5$) output by the recursive backbone Q. The feature preference is expressed as:

$$W_i = f_i^s(Z_{i+1}) = \mathrm{Conv1{\times}1}\big(\sigma(\mathrm{Conv1d}(\mathrm{GAP}(Z_{i+1}))) \odot Z_{i+1}\big), \quad i = 1,2,3,4$$

where $W_i$ is the output feature map of the i-th feature-preference module $f_i^s$ and the remaining notation is as above. After feature fusion, $W_i$ becomes:

$$P_i = f_i^u(W_i) = \mathrm{Conv3{\times}3}\big(W_i + \mathrm{Up}(P_{i+1})\big), \quad i = 1,2,3,4$$

where $P_i$ is the output feature map of the i-th feature-fusion module $f_i^u$, and the $\mathrm{Up}(P_{i+1})$ term is again omitted for the top branch ($i = 4$).
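Taken together, the four sub-networks make two passes over the image, with the first pyramid's outputs fed back into the second backbone pass. The sketch below only illustrates this data flow; all interfaces are hypothetical:

```python
def feature_level_enhance(x, A, F_net, Q, E):
    """Illustrative data flow of the feature-level enhancement module
    (hypothetical interfaces): X -> A -> F -> Q -> E."""
    ys = A(x)        # backbone A: multi-scale maps Y2..Y5
    us = F_net(ys)   # feature-preferred pyramid F: U1..U4
    zs = Q(x, us)    # recursive backbone Q: each stage q_i also receives U_{i-1}
    ps = E(zs)       # recursive feature-preferred pyramid E: final maps P1..P4
    return ps        # fed to the region proposal module G
```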
2.3) Establishing a region proposal network which sequentially comprises a candidate-region generation module, a classification-regression module, a post-processing module and a positive/negative sample assignment module, wherein:
the candidate-region generation module generates, at each point of the input feature map $P_i$ ($i = 1,2,3,4$), three rectangular candidate regions of area $2^{2i+2}$ with different aspect ratios;
the classification-regression module comprises two parallel 3×3 standard convolution layers $g_1$ and $g_2$: the first convolution layer $g_1$ adjusts the center position, width and height of each candidate region, and the second convolution layer $g_2$ predicts the target confidence of each candidate region;
the post-processing module filters redundant candidate regions and outputs the N candidate regions with the highest target confidence, $\{(p_j, b_j) \mid j = 1, 2, \dots, N\}$, where $p_j$ is the target confidence of the j-th candidate region, $b_j = (x_j, y_j, w_j, h_j)$ is its bounding box, $(x_j, y_j)$ are the center coordinates of the bounding box and $(w_j, h_j)$ are its width and height;
the positive/negative sample assignment module assigns candidate regions as positive or negative samples: a candidate region whose intersection-over-union with a ground-truth bounding box exceeds 0.7 is assigned as a positive sample, and one whose intersection-over-union is below 0.3 as a negative sample.
In the embodiment of the invention, the set of bounding boxes output by the region proposal module G is expressed as:

$$R_G = \{b_j = (x_j, y_j, w_j, h_j) \mid j = 1, 2, \dots, N\}.$$
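A sketch of the candidate-region (anchor) generation described above; the stride argument and the specific 1:2 / 1:1 / 2:1 aspect ratios are assumptions of this example, since the published text is ambiguous on the ratio values:

```python
import torch

def make_anchors(i, feat_h, feat_w, stride):
    """Sketch of candidate-region generation on pyramid level P_i: three
    rectangles of area 2^(2i+2) at every feature-map point."""
    area = 2 ** (2 * i + 2)
    shapes = []
    for ratio in (0.5, 1.0, 2.0):          # assumed width / height ratios
        w = (area * ratio) ** 0.5
        h = (area / ratio) ** 0.5
        shapes.append((w, h))
    boxes = []
    for gy in range(feat_h):
        for gx in range(feat_w):
            cx, cy = (gx + 0.5) * stride, (gy + 0.5) * stride  # anchor center in image coords
            boxes += [(cx, cy, w, h) for (w, h) in shapes]
    return torch.tensor(boxes)             # (feat_h * feat_w * 3, 4) as (cx, cy, w, h)
```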
2.4) Building a decision-level enhancement module D formed by cascading three sub-deciders $d_1, d_2, d_3$. Each sub-decider has the same structure, sequentially comprising a region-of-interest extraction module, a classification-regression module and a positive/negative sample assignment module, wherein:
the region-of-interest extraction module is formed by cascading an adaptive average pooling layer and a flattening layer; the adaptive average pooling layer pools the candidate-region features into 7×7 features of interest, and the flattening layer flattens them;
the classification-regression module is formed by connecting two linear layers in parallel: the first linear layer adjusts the bounding-box position and the second linear layer predicts the bounding-box class scores;
the positive/negative sample assignment module assigns bounding boxes as positive or negative samples: a bounding box whose intersection-over-union with a ground-truth box exceeds the threshold is assigned as a positive sample, otherwise as a negative sample; the thresholds of the three sub-deciders are set to 0.5, 0.6 and 0.7 respectively.
The set of bounding boxes output by the decision-level enhancement module D is expressed as:

$$R_D = \big\{\big(b_j^3, c_j\big) \mid j = 1, 2, \dots, N\big\}$$

where N is the number of bounding boxes, $b_j^3 = (x_j^3, y_j^3, w_j^3, h_j^3)$ is the position of the j-th bounding box output by the third sub-decider, with $(x_j^3, y_j^3)$ its center coordinates and $(w_j^3, h_j^3)$ its width and height, and $c_j \in \mathbb{R}^{N_c}$ is the mean of the class scores of the j-th bounding box over the sub-deciders:

$$c_j = \frac{1}{3}\big(c_j^1 + c_j^2 + c_j^3\big)$$

where $N_c$ is the total number of classes and $c_j^1, c_j^2, c_j^3$ are the class scores of the j-th bounding box output by sub-deciders $d_1, d_2, d_3$ respectively.
And step three, training the multi-stage enhancement network.
3.1) Randomly sampling a group of SAR images from the training set, inputting them into the multi-stage enhancement network to calculate the loss and, based on the loss, updating the network parameters by stochastic gradient descent:
3.1.1) Computing the loss of the multi-stage enhancement network:

$$\mathcal{L} = \mathcal{L}_G + \mathcal{L}_D$$

where $\mathcal{L}_G$ and $\mathcal{L}_D$ are the losses of the region proposal module and of the decision-level enhancement module, expressed as:

$$\mathcal{L}_G = \frac{1}{N_G}\sum_{m=1}^{N_G}\Big[L_{cls}(p_m,\hat{p}_m) + \mathbb{1}(\hat{p}_m)\,L_{reg}(b_m,\hat{b}_m)\Big]$$

$$\mathcal{L}_D = \sum_{l=1}^{3}\frac{\lambda_l}{N_D}\sum_{j=1}^{N_D}\Big[L_{cls}(c_j^l,\hat{c}_j^l) + \mathbb{1}(\hat{c}_j^l)\,L_{reg}(b_j^l,\hat{b}_j^l)\Big]$$

where $N_G$ is the number of candidate regions randomly sampled by the region proposal module; $p_m$ and $\hat{p}_m$ are the target confidence of the m-th sampled candidate region and the corresponding ground-truth label; $b_m$ and $\hat{b}_m$ are the bounding box of the m-th sampled candidate region and the corresponding ground-truth box; $L_{cls}$ and $L_{reg}$ are the cross-entropy classification loss and the CIoU regression loss; $N_D$ is the number of bounding boxes randomly sampled by each sub-decider; $c_j^l$ and $\hat{c}_j^l$ are the class scores of the j-th bounding box sampled by the l-th sub-decider and the corresponding ground-truth label; $b_j^l$ and $\hat{b}_j^l$ are the j-th bounding box sampled by the l-th sub-decider and the corresponding ground-truth box; $\lambda_l$ is the loss weight of the l-th sub-decider; and $\mathbb{1}(\cdot)$ is the indicator function that activates the regression term only for positive samples:

$$\mathbb{1}(\hat{y}) = \begin{cases} 1, & \hat{y} \text{ is a positive sample} \\ 0, & \text{otherwise.} \end{cases}$$
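The CIoU regression loss named above follows the published Complete-IoU definition (1 − IoU plus a normalized center-distance term and an aspect-ratio consistency term); a self-contained sketch for boxes given as (cx, cy, w, h):

```python
import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    """Sketch of the CIoU regression loss for (cx, cy, w, h) boxes."""
    px, py, pw, ph = pred.unbind(-1)
    tx, ty, tw, th = target.unbind(-1)
    # corner coordinates
    p1x, p1y, p2x, p2y = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    t1x, t1y, t2x, t2y = tx - tw / 2, ty - th / 2, tx + tw / 2, ty + th / 2
    # intersection-over-union
    iw = (torch.min(p2x, t2x) - torch.max(p1x, t1x)).clamp(min=0)
    ih = (torch.min(p2y, t2y) - torch.max(p1y, t1y)).clamp(min=0)
    inter = iw * ih
    union = pw * ph + tw * th - inter + eps
    iou = inter / union
    # squared center distance over squared diagonal of the enclosing box
    cw = torch.max(p2x, t2x) - torch.min(p1x, t1x)
    ch = torch.max(p2y, t2y) - torch.min(p1y, t1y)
    rho2 = (px - tx) ** 2 + (py - ty) ** 2
    c2 = cw ** 2 + ch ** 2 + eps
    # aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (torch.atan(tw / (th + eps)) - torch.atan(pw / (ph + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return (1 - iou + rho2 / c2 + alpha * v).mean()
```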
3.1.2) Solving the gradient of the multi-stage enhancement network loss $\mathcal{L}$ from 3.1.1) with respect to the multi-stage enhancement network parameters θ, expressed as:

$$g = \nabla_\theta \mathcal{L} = \nabla_\theta\big(\mathcal{L}_G + \mathcal{L}_D\big)$$

where $\mathcal{L}_G$ and $\mathcal{L}_D$ are the losses of the region proposal module and of the decision-level enhancement module respectively.

3.1.3) Updating the multi-stage enhancement network parameters according to the gradient solved in 3.1.2), expressed as:

$$\theta' = \theta - lr \cdot g$$

where θ' is the updated network parameter and θ is the network parameter before the update; lr is the learning rate, set according to the input image batch size, and in the embodiment of the invention lr = 0.005.
3.2) Repeating step 3.1) until the network converges to obtain the trained multi-stage enhancement network.
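The procedure of 3.1)-3.2) reduces to a standard stochastic-gradient-descent loop; in the sketch below, the model interface returning the two loss terms and the momentum/weight-decay values are assumptions, not part of the disclosure:

```python
import torch

def train(model, loader, epochs, lr=0.005, momentum=0.9, weight_decay=1e-4):
    """Minimal SGD training loop for the multi-stage enhancement network.
    `model(images, targets)` is assumed to return the L_G and L_D terms."""
    opt = torch.optim.SGD(model.parameters(), lr=lr,
                          momentum=momentum, weight_decay=weight_decay)
    for _ in range(epochs):
        for images, targets in loader:
            losses = model(images, targets)             # {'loss_G': ..., 'loss_D': ...}
            loss = losses["loss_G"] + losses["loss_D"]  # L = L_G + L_D
            opt.zero_grad()
            loss.backward()                             # g = dL/dtheta
            opt.step()                                  # theta' = theta - lr * g
    return model
```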
And step four, acquiring the SAR target detection and identification result.
Inputting the SAR images of the test set into the trained multi-stage enhancement network to obtain the detection and identification result.
The effect of the invention can be further illustrated by the following simulation experiments:
1. Simulation experiment conditions:
The software platform of the simulation experiment is the Ubuntu 18.04 operating system with PyTorch 1.8.0; the hardware configuration is a Core i9-10980XE CPU and an NVIDIA GeForce RTX 3090 GPU.
The simulation experiment uses measured Gaofen-3 SAR data: the scene type is airports, the image resolution is 1 m × 1 m, the number of SAR images is 2000 with sizes of 600×600, 1024×1024 and 2048×2048, the number of target classes is 7 with 6556 targets in total, and the training and test sets contain 1400 and 600 images respectively.
2. Simulation content and result analysis:
Under the above simulation conditions, the invention and the existing "integrated SAR image target detection and identification method" were each trained on the training set; test-set images were then randomly selected and input into the trained networks, and the detection and identification results were visualized on them, as shown in fig. 3. Fig. 3 (a) shows the detection result of the prior art and fig. 3 (b) the detection result of the invention. Green rectangles mark correctly detected and identified targets; red rectangles mark detection or identification errors.
Comparing fig. 3 (a) and fig. 3 (b) shows that the prior art produces noticeably more false alarms and missed detections than the invention.
Comparing the detection and identification indices of the invention and of the prior art over all test-set images, including the average accuracy, average recall, average F1 score and class-average accuracy of the seven target classes, gives the results shown in Table 1:
TABLE 1

Evaluation index          Prior art    The invention
Average accuracy          81.4%        96.5%
Average recall            78.8%        97.1%
Average F1 score          0.80         0.97
Class-average accuracy    83.1%        97.3%
As can be seen from Table 1, the average accuracy, average recall, average F1 score and class-average accuracy of the invention are all higher than those of the prior art, showing that its detection and identification performance is significantly better.

Claims (9)

1. An SAR target detection and identification method based on a multi-stage enhancement network, characterized by comprising the following steps:
(1) Acquiring SAR images with multiple types of targets, labeling the target position and the target type in each SAR image, and randomly dividing the labeled SAR images to obtain a training set and a test set;
(2) Constructing a multi-stage enhancement network:
(2a) Establishing a data-level enhancement module which sequentially performs multi-scale transformation, random flipping, random rotation, power transformation and random noise operations;
(2b) Establishing a feature-level enhancement module formed by cascading a backbone network A, a feature-preferred pyramid network F, a recursive backbone network Q and a recursive feature-preferred pyramid network E;
(2c) Using an existing region proposal network as the region proposal module G, and selecting cross-entropy loss and CIoU loss as the classification and regression losses;
(2d) Cascading three sub-deciders $d_1, d_2, d_3$ to form a decision-level enhancement module D, and using cross-entropy loss and CIoU loss as the classification and regression losses;
(2e) Sequentially cascading the data-level enhancement module, feature-level enhancement module, region proposal module and decision-level enhancement module to form the multi-stage enhancement network;
(3) Training the multi-stage enhancement network:
(3a) Randomly sampling a group of SAR images from the training set, inputting them into the multi-stage enhancement network, calculating the loss, and updating the network parameters by stochastic gradient descent based on the loss;
(3b) Repeating process (3a) until the network converges to obtain the trained multi-stage enhancement network;
(4) Inputting the SAR images of the test set into the trained multi-stage enhancement network to obtain detection and identification results.
2. The method of claim 1, wherein the backbone network A in step (2b) comprises 5 cascaded convolution modules $a_1, a_2, a_3, a_4, a_5$:
the first convolution module $a_1$ is formed by cascading a 7×7 standard convolution layer, a batch normalization layer, a ReLU activation layer and a max-pooling downsampling layer;
the second convolution module $a_2$ is formed by cascading 3 residual blocks;
the third convolution module $a_3$ is formed by cascading 4 residual blocks;
the fourth convolution module $a_4$ is formed by cascading 6 residual blocks;
the fifth convolution module $a_5$ is formed by cascading 3 residual blocks;
the output characteristic diagram of the whole backbone network is shown as
Figure FDA0003950748840000021
Wherein Y is i For the ith convolution module a i In the output characteristic diagram of (a) is shown,
Figure FDA0003950748840000022
is an input SAR image with width and height W and H, respectively.
3. The method of claim 1, wherein the feature-preferred pyramid network F in step (2b) comprises 4 parallel branches $f_1, f_2, f_3, f_4$, each branch $f_i$ being formed by cascading a feature-preference module $f_i^s$ and a feature-fusion module $f_i^u$;
each feature-preference module $f_i^s$ is formed by cascading two parallel sub-branches $f_i^{s1}$ and $f_i^{s2}$ with a 1×1 standard convolution layer, the first sub-branch $f_i^{s1}$ comprising in order a global average pooling layer, a 1-dimensional convolution layer and a Sigmoid activation layer, and the second sub-branch $f_i^{s2}$ being an identity branch;
each feature-fusion module $f_i^u$ consists of one 3×3 standard convolution layer;
the output feature maps of the whole feature-preferred pyramid network F are expressed as:

$$U_i = f_i^u(S_i) = \mathrm{Conv3{\times}3}\big(S_i + \mathrm{Up}(U_{i+1})\big), \quad i = 1,2,3,4$$

where $U_i$ is the output feature map of the i-th feature-fusion module $f_i^u$, Conv3×3(·) denotes 3×3 standard convolution, Up(·) is the bilinear-interpolation upsampling function (the upsampled term being omitted for i = 4), and $S_i$ is the output feature map of the i-th feature-preference module $f_i^s$:

$$S_i = f_i^s(Y_{i+1}) = \mathrm{Conv1{\times}1}\big(\sigma(\mathrm{Conv1d}(\mathrm{GAP}(Y_{i+1}))) \odot Y_{i+1}\big), \quad i = 1,2,3,4$$

where $Y_{i+1}$ is the output feature map of the (i+1)-th convolution module $a_{i+1}$ of the backbone network, $\odot$ denotes channel-wise multiplication, GAP(·) denotes global average pooling, Conv1d(·) denotes 1-dimensional convolution, σ(·) denotes the Sigmoid function, and Conv1×1(·) denotes 1×1 standard convolution.
4. The method of claim 1, wherein the recursive backbone network Q in step (2b) comprises 5 cascaded convolution modules $q_1, q_2, q_3, q_4, q_5$:
the 1st convolution module $q_1$ is formed by cascading a 7×7 standard convolution layer, a batch normalization layer, a ReLU activation layer and a max-pooling downsampling layer;
the 2nd convolution module $q_2$ is formed by connecting 3 cascaded residual blocks in parallel with a 1×1 standard convolution layer;
the 3rd convolution module $q_3$ is formed by connecting 4 cascaded residual blocks in parallel with a 1×1 standard convolution layer;
the 4th convolution module $q_4$ is formed by connecting 6 cascaded residual blocks in parallel with a 1×1 standard convolution layer;
the 5th convolution module $q_5$ is formed by connecting 3 cascaded residual blocks in parallel with a 1×1 standard convolution layer;
the output feature maps of the whole recursive backbone network Q being expressed as:

$$Z_i = q_i(Z_{i-1}) + \mathrm{Conv1{\times}1}(U_{i-1}), \quad i = 2, \dots, 5, \qquad Z_1 = q_1(X)$$

where $Z_i$ is the output feature map of the i-th convolution module $q_i$, $X \in \mathbb{R}^{W \times H}$ is the input SAR image of width W and height H, $U_{i-1}$ is the output of the (i-1)-th feature-fusion module $f_{i-1}^u$ of the feature-preferred pyramid network, and Conv1×1(·) denotes 1×1 standard convolution.
5. The method of claim 1, wherein the recursive feature-preferred pyramid network E in step (2b) has the same structure and parameters as the feature-preferred pyramid network F, its output features being expressed as:

$$P_i = f_i^u(W_i) = \mathrm{Conv3{\times}3}\big(W_i + \mathrm{Up}(P_{i+1})\big), \quad i = 1,2,3,4$$

where $P_i$ is the output feature map of the i-th feature-fusion module $f_i^u$, Conv3×3(·) denotes 3×3 standard convolution, Up(·) is the bilinear-interpolation upsampling function, and $W_i$ is the output feature map of the i-th feature-preference module $f_i^s$:

$$W_i = f_i^s(Z_{i+1}) = \mathrm{Conv1{\times}1}\big(\sigma(\mathrm{Conv1d}(\mathrm{GAP}(Z_{i+1}))) \odot Z_{i+1}\big), \quad i = 1,2,3,4$$

where $Z_{i+1}$ is the output feature map of the (i+1)-th convolution module $q_{i+1}$ of the recursive backbone network, $\odot$ denotes channel-wise multiplication, GAP(·) denotes global average pooling, Conv1d(·) denotes 1-dimensional convolution, σ(·) denotes the Sigmoid function, and Conv1×1(·) denotes 1×1 standard convolution.
6. The method of claim 1, wherein the region proposal network in step (2c) sequentially comprises a candidate-region generation module, a classification-regression module, a post-processing module and a positive/negative sample assignment module;
the candidate-region generation module generates, at each point of the input feature map $P_i$, three rectangular candidate regions of area $2^{2i+2}$ with different aspect ratios;
the classification-regression module comprises two parallel 3×3 standard convolution layers $g_1$ and $g_2$, the first convolution layer $g_1$ adjusting the center position, width and height of each candidate region and the second convolution layer $g_2$ predicting the target confidence of each candidate region;
the post-processing module filters redundant candidate regions and outputs the N candidate regions with the highest target confidence, $\{(p_j, b_j) \mid j = 1, 2, \dots, N\}$, where $p_j$ is the target confidence of the j-th candidate region, $b_j = (x_j, y_j, w_j, h_j)$ is its bounding box, $(x_j, y_j)$ are the center coordinates of the bounding box and $(w_j, h_j)$ are its width and height;
the positive/negative sample assignment module assigns candidate regions as positive or negative samples, a candidate region whose intersection-over-union with a ground-truth bounding box exceeds 0.7 being assigned as a positive sample and one whose intersection-over-union is below 0.3 as a negative sample;
the bounding boxes output by the whole region proposal network being expressed as: $R_G = \{b_j = (x_j, y_j, w_j, h_j) \mid j = 1, 2, \dots, N\}$.
7. The method of claim 1, wherein the three cascaded sub-deciders $d_1, d_2, d_3$ in step (2d) each sequentially comprise a region-of-interest extraction module, a classification-regression module and a positive/negative sample assignment module;
the region-of-interest extraction module is formed by cascading an adaptive average pooling layer and a flattening layer, the adaptive average pooling layer pooling the candidate-region features into 7×7 features of interest and the flattening layer flattening them;
the classification-regression module is formed by connecting two linear layers in parallel, the first linear layer adjusting the bounding-box position and the second linear layer predicting the bounding-box class scores;
the positive/negative sample assignment module assigns bounding boxes as positive or negative samples, a bounding box whose intersection-over-union with a ground-truth box exceeds the threshold being assigned as a positive sample and otherwise as a negative sample, the thresholds of the three sub-deciders being set to 0.5, 0.6 and 0.7 respectively;
the bounding boxes output by the decision-level enhancement module being expressed as:

$$R_D = \big\{\big(b_j^3, c_j\big) \mid j = 1, 2, \dots, N\big\}$$

where N is the number of bounding boxes, $b_j^3 = (x_j^3, y_j^3, w_j^3, h_j^3)$ is the position of the j-th bounding box output by the third sub-decider, with $(x_j^3, y_j^3)$ its center coordinates and $(w_j^3, h_j^3)$ its width and height, and $c_j \in \mathbb{R}^{N_c}$ is the mean of the class scores of the j-th bounding box over the sub-deciders:

$$c_j = \frac{1}{3}\big(c_j^1 + c_j^2 + c_j^3\big)$$

where $N_c$ is the total number of classes and $c_j^1, c_j^2, c_j^3$ are the class scores of the j-th bounding box output by sub-deciders $d_1, d_2, d_3$ respectively.
8. The method of claim 1, wherein the loss in step (3a) comprises the loss $\mathcal{L}_G$ of the region proposal module and the loss $\mathcal{L}_D$ of the decision-level enhancement module, expressed respectively as:

$$\mathcal{L}_G = \frac{1}{N_G}\sum_{m=1}^{N_G}\Big[L_{cls}(p_m,\hat{p}_m) + \mathbb{1}(\hat{p}_m)\,L_{reg}(b_m,\hat{b}_m)\Big]$$

$$\mathcal{L}_D = \sum_{l=1}^{3}\frac{\lambda_l}{N_D}\sum_{j=1}^{N_D}\Big[L_{cls}(c_j^l,\hat{c}_j^l) + \mathbb{1}(\hat{c}_j^l)\,L_{reg}(b_j^l,\hat{b}_j^l)\Big]$$

where $N_G$ is the number of candidate regions randomly sampled by the region proposal module; $p_m$ and $\hat{p}_m$ are the target confidence of the m-th sampled candidate region and the corresponding ground-truth label; $b_m$ and $\hat{b}_m$ are the bounding box of the m-th sampled candidate region and the corresponding ground-truth box; $L_{cls}$ and $L_{reg}$ are the cross-entropy classification loss and the CIoU regression loss; $N_D$ is the number of bounding boxes randomly sampled by each sub-decider; $c_j^l$ and $\hat{c}_j^l$ are the class scores of the j-th bounding box sampled by the l-th sub-decider and the corresponding ground-truth label; $b_j^l$ and $\hat{b}_j^l$ are the j-th bounding box sampled by the l-th sub-decider and the corresponding ground-truth box; $\lambda_l$ is the loss weight of the l-th sub-decider; and $\mathbb{1}(\cdot)$ is the indicator function:

$$\mathbb{1}(\hat{y}) = \begin{cases} 1, & \hat{y} \text{ is a positive sample} \\ 0, & \text{otherwise} \end{cases}$$

the loss of the whole multi-stage enhancement network being expressed as:

$$\mathcal{L} = \mathcal{L}_G + \mathcal{L}_D.$$
9. The method of claim 1, wherein the network parameters are updated in step (3b) by stochastic gradient descent, implemented as follows:
(3b1) Solving the gradient of the multi-stage enhancement network parameters, expressed as:

$$g = \nabla_\theta \mathcal{L} = \nabla_\theta\big(\mathcal{L}_G + \mathcal{L}_D\big)$$

where $\mathcal{L}$ is the multi-stage enhancement network loss, $\mathcal{L}_G$ and $\mathcal{L}_D$ are the losses of the region proposal module and of the decision-level enhancement module respectively, and θ denotes the learnable parameters of the multi-stage enhancement network;
(3b2) Updating the parameters of the multi-stage enhancement network according to the solved gradient g, expressed as:

$$\theta' = \theta - lr \cdot g$$

where θ' is the updated network parameter and θ is the network parameter before the update; lr is the learning rate, set according to the input image batch size.
CN202211449058.5A 2022-11-18 2022-11-18 SAR target detection and identification method based on multistage enhanced network Pending CN115909086A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211449058.5A CN115909086A (en) 2022-11-18 2022-11-18 SAR target detection and identification method based on multistage enhanced network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211449058.5A CN115909086A (en) 2022-11-18 2022-11-18 SAR target detection and identification method based on multistage enhanced network

Publications (1)

Publication Number Publication Date
CN115909086A true CN115909086A (en) 2023-04-04

Family

ID=86473982

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211449058.5A Pending CN115909086A (en) 2022-11-18 2022-11-18 SAR target detection and identification method based on multistage enhanced network

Country Status (1)

Country Link
CN (1) CN115909086A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117574161A (en) * 2024-01-17 2024-02-20 航天宏图信息技术股份有限公司 Surface parameter estimation method, device and equipment based on generation of countermeasure network
CN117574161B (en) * 2024-01-17 2024-04-16 航天宏图信息技术股份有限公司 Surface parameter estimation method, device and equipment based on generation of countermeasure network

Similar Documents

Publication Publication Date Title
CN110135267B (en) Large-scene SAR image fine target detection method
US11783569B2 (en) Method for classifying hyperspectral images on basis of adaptive multi-scale feature extraction model
CN111738124B (en) Remote sensing image cloud detection method based on Gabor transformation and attention
CN110033002B (en) License plate detection method based on multitask cascade convolution neural network
CN106355151B (en) A kind of three-dimensional S AR images steganalysis method based on depth confidence network
CN107563433B (en) Infrared small target detection method based on convolutional neural network
CN112446388A (en) Multi-category vegetable seedling identification method and system based on lightweight two-stage detection model
CN109934166A (en) Unmanned plane image change detection method based on semantic segmentation and twin neural network
CN108898065B (en) Deep network ship target detection method with candidate area rapid screening and scale self-adaption
CN108447057B (en) SAR image change detection method based on significance and depth convolution network
CN111079739B (en) Multi-scale attention feature detection method
CN112132042A (en) SAR image target detection method based on anti-domain adaptation
CN110189304A (en) Remote sensing image target on-line quick detection method based on artificial intelligence
CN106096506A (en) Based on the SAR target identification method differentiating doubledictionary between subclass class
CN110659601B (en) Depth full convolution network remote sensing image dense vehicle detection method based on central point
Cepni et al. Vehicle detection using different deep learning algorithms from image sequence
CN110969121A (en) High-resolution radar target recognition algorithm based on deep learning
CN113536963A (en) SAR image airplane target detection method based on lightweight YOLO network
CN109034213B (en) Hyperspectral image classification method and system based on correlation entropy principle
CN111985325A (en) Aerial small target rapid identification method in extra-high voltage environment evaluation
CN113298032A (en) Unmanned aerial vehicle visual angle image vehicle target detection method based on deep learning
CN115761534A (en) Method for detecting and tracking small target of infrared unmanned aerial vehicle under air background
CN116469020A (en) Unmanned aerial vehicle image target detection method based on multiscale and Gaussian Wasserstein distance
CN109558803B (en) SAR target identification method based on convolutional neural network and NP criterion
CN115909086A (en) SAR target detection and identification method based on multistage enhanced network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination