CN115909086A - SAR target detection and identification method based on multistage enhanced network - Google Patents

SAR target detection and identification method based on multistage enhanced network

Info

Publication number
CN115909086A
Authority
CN
China
Prior art keywords
module
network
convolution
layer
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211449058.5A
Other languages
Chinese (zh)
Inventor
白雪茹
鲜要胜
杨敏佳
孟昭晗
周峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202211449058.5A priority Critical patent/CN115909086A/en
Publication of CN115909086A publication Critical patent/CN115909086A/en
Pending legal-status Critical Current

Abstract

The invention discloses an SAR target detection and identification method based on a multi-stage enhancement network, which mainly solves the prior-art problems of poor robustness in complex environments, high false-alarm and missed-detection rates, and low detection and identification accuracy. The implementation scheme is as follows: label and divide measured SAR data to obtain a training set and a test set; construct a multi-stage enhancement network formed by cascading a data-level enhancement module, a feature-level enhancement module, a region proposal module and a decision-level enhancement module; train the multi-stage enhancement network on the training set with stochastic gradient descent; and input the test-set images into the trained multi-stage enhancement network to obtain SAR target detection and identification results. The invention significantly improves SAR target detection and identification performance in complex environments and can be used for battlefield reconnaissance and situational awareness.

Description

SAR target detection and identification method based on multistage enhanced network
Technical Field
The invention belongs to the technical field of radar remote sensing, and further relates to an SAR target detection and identification method which can be used for battlefield reconnaissance and situational awareness.
Background
Synthetic aperture radar (SAR) is an active microwave imaging sensor that obtains two-dimensional high-resolution images by transmitting signals with a large time-bandwidth product and performing aperture synthesis. Compared with optical and infrared sensors, SAR offers unique advantages such as all-weather and all-time operation, long operating range and strong penetration; it has become an important means of earth observation and is widely applied in both military and civilian fields. With the continuous improvement of SAR systems and imaging quality, SAR image interpretation techniques have gradually attracted the attention of researchers in related fields. As both the difficulty and the key step of interpretation, accurate detection and identification of important targets is of great significance and research value.
The traditional SAR target detection and identification method mainly adopts a three-stage processing flow comprising target detection, target discrimination and target recognition. Target detection is mainly based on the constant false alarm rate (CFAR) algorithm which, on the premise that the background clutter follows a certain probability distribution model, slides a window over the SAR image and compares the cell under test against an adaptive threshold. However, because non-uniform strong-clutter backgrounds are difficult to model effectively, the algorithm adapts poorly to complex scenes and its detection accuracy is low. Target discrimination and recognition mainly rely on hand-crafted features and classifiers designed from the statistical and physical characteristics of the images; this requires strong domain knowledge and expert experience, the resulting algorithms lack accuracy and flexibility, and the ideal effect is difficult to achieve in practical applications. In addition, the inefficient coupling among the stages of the traditional three-stage flow greatly reduces computational efficiency, so a new architecture is urgently needed.
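For illustration only (not part of the disclosed invention), the sliding-window CFAR detection described above can be sketched as a minimal cell-averaging CFAR; the window sizes, the desired false-alarm rate and the assumption of exponentially distributed clutter intensity are choices of this example:

```python
import numpy as np

def ca_cfar(img, guard=4, train=8, pfa=1e-4):
    """Minimal cell-averaging CFAR sketch: slide a window over the intensity image,
    estimate the clutter level from the training ring around each cell under test,
    and compare against an adaptive threshold. Assumes exponentially distributed
    clutter intensity; window sizes and Pfa are illustrative."""
    n = (2 * (guard + train) + 1) ** 2 - (2 * guard + 1) ** 2  # number of training cells
    alpha = n * (pfa ** (-1.0 / n) - 1.0)                      # threshold factor for the desired Pfa
    half = guard + train
    det = np.zeros(img.shape, dtype=bool)
    for i in range(half, img.shape[0] - half):
        for j in range(half, img.shape[1] - half):
            window = img[i - half:i + half + 1, j - half:j + half + 1]
            inner = img[i - guard:i + guard + 1, j - guard:j + guard + 1]
            clutter = (window.sum() - inner.sum()) / n         # mean of the training ring
            det[i, j] = img[i, j] > alpha * clutter
    return det
```

A sketch like this makes the drawback named above concrete: the threshold factor is derived from a single assumed clutter distribution, so the detector degrades wherever real clutter departs from that model.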
In recent years, with the development of deep learning, target detection and identification methods built on deep neural networks have made major breakthroughs in computer vision. Owing to the structure of deep networks, these algorithms can predict target position and class simultaneously without multi-stage processing, markedly improving detection and identification performance and efficiency. Mainstream detection and identification algorithms can be divided into single-stage and two-stage types. The former directly decode the features extracted by the network to detect and identify targets and therefore infer faster; representative algorithms include YOLO, SSD and RetinaNet. The latter add a candidate-region extraction stage: candidate regions that may contain key targets are first extracted from the image by the deep network, then their positions are further refined and the recognition results obtained; representative algorithms include R-CNN, Faster R-CNN and Cascade R-CNN. Compared with single-stage algorithms, two-stage algorithms achieve higher detection and identification accuracy.
Although deep-learning-based methods provide a feasible approach to SAR target detection and identification, SAR image scenes are more complex than optical images, targets of different classes are more similar to one another, and target edges are blurred by speckle noise, so the problems of poor robustness in complex environments and difficulty in distinguishing similar classes remain.
The patent document with application number 201710461303.7 discloses an integrated SAR image target detection and identification method which extracts SAR image features through a convolutional neural network, generates candidate regions that may contain targets based on these features, and finally predicts the class and position of each region of interest with a fully connected network to detect and identify SAR targets. Because the method is not optimized for the characteristics of SAR images, it is not robust in complex environments, the predicted bounding boxes are inaccurate, the false-alarm and missed-detection rates are high, and fine-grained target features are difficult to mine effectively, so its detection and identification accuracy is low.
Disclosure of Invention
The invention aims to provide an SAR target detection and identification method based on a multi-stage enhancement network to overcome the above defects of the prior art, so as to improve the robustness of detection and identification in complex environments, reduce the false-alarm and missed-detection rates, enhance feature separability, and significantly improve SAR target detection and identification accuracy.
The technical idea of the invention is to improve SAR target detection and identification performance in complex environments by designing a multi-stage enhancement network. The implementation steps comprise:
(1) Acquiring SAR images with multiple types of targets, marking the target position and the target type in each SAR image, and randomly dividing the marked SAR images to obtain a training set and a test set;
(2) Constructing a multi-stage enhancement network:
(2a) Establishing a data-level enhancement module which sequentially performs multi-scale transformation, random flipping, random rotation, power transformation and random noise operations;
(2b) Establishing a feature-level enhancement module formed by cascading a backbone network A, a feature-preferred pyramid network F, a recursive backbone network Q and a recursive feature-preferred pyramid network E;
(2c) Using an existing region proposal network as the region proposal module G, and selecting cross-entropy loss and CIoU loss as the classification and regression losses;
(2d) Cascading three sub-deciders $d_1, d_2, d_3$ to form a decision-level enhancement module D, and using cross-entropy loss and CIoU loss as the classification and regression losses;
(2e) Sequentially cascading the data-level enhancement module, feature-level enhancement module, region proposal module and decision-level enhancement module to form the multi-stage enhancement network;
(3) Training the multi-stage enhancement network:
(3a) Randomly sampling a group of SAR images from the training set, inputting them into the multi-stage enhancement network, calculating the loss, and updating the network parameters by stochastic gradient descent based on the loss;
(3b) Repeating process (3a) until the network converges to obtain the trained multi-stage enhancement network;
(4) Inputting the SAR images of the test set into the trained multi-stage enhancement network to obtain detection and identification results.
Compared with the prior art, the invention has the following advantages:
First, by designing a data-level enhancement module, the invention simulates target scale and orientation changes as well as clutter and noise interference, thereby improving the robustness of the algorithm in complex environments and reducing the false-alarm and missed-detection rates.
Second, by designing a feature-level enhancement module, the invention fully mines the fine-grained target features in the SAR image and enhances the separability of similar classes.
Third, the invention designs a decision-level enhancement module that refines the prediction results multiple times, gradually reducing the deviation between the predicted target position and the ground truth and effectively suppressing the influence of blurred SAR target edges on detection accuracy.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a diagram of a multi-level enhanced network model constructed in the present invention;
FIG. 3 is a diagram of simulation results of the present invention.
Detailed Description
The examples and effects of the present invention will be described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, the SAR target detection and identification method based on the multi-stage enhancement network in this example sequentially comprises data labeling and division, construction of the multi-stage enhancement network, training of the multi-stage enhancement network, and acquisition of the SAR target detection and identification result. It is implemented as follows:
step one, marking and dividing data.
Acquiring SAR images with various targets, labeling the target position and target class in each SAR image, and randomly dividing the labeled SAR images at a ratio of 7:3 to obtain a training set and a test set.
In the embodiment of the invention, the SAR images come from the spaceborne radar of the Gaofen-3 satellite; the image sizes include 600×600, 1024×1024 and 2048×2048; seven classes of aircraft targets are included; the training set contains 1400 images and the test set 600 images.
And step two, constructing the multi-stage enhancement network.
Referring to fig. 2, the multi-stage enhancement network constructed in this step comprises a data-level enhancement module, a feature-level enhancement module, a region proposal module and a decision-level enhancement module cascaded in sequence. The construction steps are as follows:
2.1) Establishing a data-level enhancement module which sequentially performs multi-scale transformation, random flipping, random rotation, power transformation and random noise operations. In the embodiment of the invention, all operations are performed during training, while only multi-scale transformation and random flipping are performed during testing; the scales of the multi-scale transformation include 1024×1024, 1088×1088 and 1152×1152, the flipping directions include horizontal, vertical and diagonal, the rotation angles include 90°, 180° and 270°, and the coefficient of the power transformation is a random value in [0.8, 1.2].
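A minimal sketch of such a data-level enhancement pipeline is given below for illustration; the Gaussian noise model, the value range of the image tensor, and the omission of bounding-box transformation are assumptions of this sketch, not statements of the disclosure:

```python
import random
import torch
import torchvision.transforms.functional as TF

def data_level_enhance(img, training=True):
    """Sketch of the data-level enhancement module for one single-channel SAR
    tensor of shape (1, H, W) with values in [0, 1]."""
    img = TF.resize(img, [random.choice([1024, 1088, 1152])] * 2, antialias=True)  # multi-scale
    d = random.choice(["h", "v", "d"])           # flip: horizontal, vertical, or diagonal (both)
    if d in ("h", "d"):
        img = TF.hflip(img)
    if d in ("v", "d"):
        img = TF.vflip(img)
    if training:                                 # rotation, power transform, noise: training only
        img = torch.rot90(img, k=random.choice([1, 2, 3]), dims=(-2, -1))
        img = img.clamp(min=0.0) ** random.uniform(0.8, 1.2)        # power transformation
        img = (img + 0.01 * torch.randn_like(img)).clamp(0.0, 1.0)  # assumed Gaussian noise
    return img
```

In a full detector the same geometric transforms would of course be applied to the labeled bounding boxes; that bookkeeping is omitted here for brevity.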
2.2) Establishing a feature-level enhancement module formed by cascading a backbone network A, a feature-preferred pyramid network F, a recursive backbone network Q and a recursive feature-preferred pyramid network E:
2.2.1) Building a backbone network A comprising 5 cascaded convolution modules $a_1, a_2, a_3, a_4, a_5$, wherein:
the first convolution module $a_1$ is formed by cascading a 7×7 standard convolution layer, a batch normalization layer, a ReLU activation layer and a max-pooling downsampling layer;
the second convolution module $a_2$ is formed by cascading 3 residual blocks;
the third convolution module $a_3$ is formed by cascading 4 residual blocks;
the fourth convolution module $a_4$ is formed by cascading 6 residual blocks;
the fifth convolution module $a_5$ is formed by cascading 3 residual blocks;
in an embodiment of the invention, the backbone network a is used to extract a multi-scale feature map for input SAR images with width and height W and H, respectively
Figure BDA0003950748850000041
The multi-scale characteristic map output by the trunk network is ^ greater than or equal to>
Figure BDA0003950748850000042
Wherein Y is i For the ith convolution module a i The output characteristic map of (1).
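The module sizes above (7×7 stem; 3, 4, 6 and 3 residual blocks) match a ResNet-50-style backbone, so an illustrative sketch can reuse torchvision's implementation; adapting the stem to single-channel SAR input is an assumption of this example:

```python
import torch
import torchvision

class BackboneA(torch.nn.Module):
    """Sketch of backbone A: stem a1 (7x7 conv + BN + ReLU + max-pool) and
    four residual stages a2..a5 with 3/4/6/3 blocks, as in ResNet-50."""
    def __init__(self):
        super().__init__()
        r = torchvision.models.resnet50(weights=None)
        # assumption: replace the stem convolution to accept single-channel SAR input
        r.conv1 = torch.nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.a1 = torch.nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool)
        self.a2, self.a3, self.a4, self.a5 = r.layer1, r.layer2, r.layer3, r.layer4

    def forward(self, x):
        y1 = self.a1(x)
        y2 = self.a2(y1)
        y3 = self.a3(y2)
        y4 = self.a4(y3)
        y5 = self.a5(y4)
        return [y2, y3, y4, y5]  # the multi-scale maps Y2..Y5 consumed by the pyramid
```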
2.2.2) Building a feature-preferred pyramid network F comprising 4 parallel branches $f_1, f_2, f_3, f_4$, each branch $f_i$ being formed by cascading a feature-preference module $f_i^s$ and a feature-fusion module $f_i^u$. Each feature-preference module $f_i^s$ is formed by cascading two parallel sub-branches $f_i^{s1}$ and $f_i^{s2}$ with a 1×1 standard convolution layer: the first sub-branch $f_i^{s1}$ comprises, in order, a global average pooling layer, a 1-dimensional convolution layer and a Sigmoid activation layer; the second sub-branch $f_i^{s2}$ is an identity branch. Each feature-fusion module $f_i^u$ consists of one 3×3 standard convolution layer.
In the embodiment of the invention, the feature-preferred pyramid network F performs feature preference and feature fusion on the multi-scale feature maps $Y_i$ ($i = 2,3,4,5$) output by the backbone. The feature preference is expressed as:

$$S_i = f_i^s(Y_{i+1}) = \mathrm{Conv1{\times}1}\big(\sigma(\mathrm{Conv1d}(\mathrm{GAP}(Y_{i+1}))) \odot Y_{i+1}\big), \quad i = 1,2,3,4$$

where $S_i$ is the output feature map of the i-th feature-preference module $f_i^s$, $\odot$ denotes channel-wise multiplication, GAP(·) denotes global average pooling, Conv1d(·) denotes 1-dimensional convolution, σ(·) denotes the Sigmoid function, and Conv1×1(·) denotes 1×1 standard convolution. After feature fusion, $S_i$ becomes:

$$U_i = f_i^u(S_i) = \mathrm{Conv3{\times}3}\big(S_i + \mathrm{Up}(U_{i+1})\big), \quad i = 1,2,3,4$$

where $U_i$ is the output feature map of the i-th feature-fusion module $f_i^u$, Conv3×3(·) denotes 3×3 standard convolution, Up(·) is the bilinear-interpolation upsampling function, and the $\mathrm{Up}(U_{i+1})$ term is omitted for the top branch ($i = 4$).
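The feature-preference module $f_i^s$ is, in effect, an efficient channel-attention block followed by a projection. A minimal sketch under the equations above (the channel counts and the 1-D kernel size are assumptions of this example):

```python
import torch
import torch.nn as nn
import torch.nn.functional as nnf

class FeaturePreference(nn.Module):
    """Sketch of f_i^s: GAP -> 1-D conv -> Sigmoid yields channel weights that
    re-weight the identity branch; a 1x1 conv then projects the result."""
    def __init__(self, in_channels, out_channels=256, k=3):
        super().__init__()
        self.conv1d = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.proj = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, y):
        w = nnf.adaptive_avg_pool2d(y, 1)                   # GAP: (B, C, 1, 1)
        w = self.conv1d(w.squeeze(-1).transpose(1, 2))      # 1-D conv over channels: (B, 1, C)
        w = torch.sigmoid(w).transpose(1, 2).unsqueeze(-1)  # back to (B, C, 1, 1)
        return self.proj(w * y)                             # S_i = Conv1x1(sigma(...) . Y_{i+1})

def fuse_top_down(s_maps, convs):
    """Sketch of f_i^u: U_i = Conv3x3(S_i + Up(U_{i+1})); the top level has no upsampled term."""
    u = convs[-1](s_maps[-1])
    outs = [u]
    for s, conv in zip(reversed(s_maps[:-1]), list(reversed(convs[:-1]))):
        u = conv(s + nnf.interpolate(u, size=s.shape[-2:], mode="bilinear", align_corners=False))
        outs.insert(0, u)
    return outs                                             # [U_1, U_2, U_3, U_4]
```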
2.2.3) Building a recursive backbone network Q comprising 5 cascaded convolution modules $q_1, q_2, q_3, q_4, q_5$, wherein:
the 1st convolution module $q_1$ is formed by cascading a 7×7 standard convolution layer, a batch normalization layer, a ReLU activation layer and a max-pooling downsampling layer;
the 2nd convolution module $q_2$ is formed by connecting 3 cascaded residual blocks in parallel with a 1×1 standard convolution layer;
the 3rd convolution module $q_3$ is formed by connecting 4 cascaded residual blocks in parallel with a 1×1 standard convolution layer;
the 4th convolution module $q_4$ is formed by connecting 6 cascaded residual blocks in parallel with a 1×1 standard convolution layer;
the 5th convolution module $q_5$ is formed by connecting 3 cascaded residual blocks in parallel with a 1×1 standard convolution layer;
in an embodiment of the invention, the recursive backbone network Q is used to extract a multi-scale feature map for input SAR images with width and height W and H, respectively
Figure BDA0003950748850000054
The multi-scale characteristic graph output by the recursion backbone network is as follows:
Figure BDA0003950748850000061
wherein, Z i For the ith convolution module q i Output characteristic diagram of (3), U i-1 Pyramid network ith-1 feature fusion module optimized for features
Figure BDA0003950748850000063
Conv1 × 1 (·) represents a 1 × 1 standard convolution.
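A minimal sketch of one recursive-backbone stage follows; the fusion of the two parallel branches by summation (as in the formula above) and the channel/spatial alignment of $U_{i-1}$ with the stage output are assumptions of this example:

```python
import torch.nn as nn

class RecursiveStage(nn.Module):
    """Sketch of q_i (i >= 2): cascaded residual blocks in parallel with a 1x1
    convolution applied to the pyramid feature U_{i-1}; the two branch outputs
    are assumed to have matching channels and spatial size and are summed."""
    def __init__(self, res_blocks, u_channels, out_channels):
        super().__init__()
        self.res_blocks = res_blocks  # e.g. one residual stage reused from BackboneA
        self.lateral = nn.Conv2d(u_channels, out_channels, kernel_size=1)

    def forward(self, z_prev, u_prev):
        # Z_i = q_i(Z_{i-1}) + Conv1x1(U_{i-1})
        return self.res_blocks(z_prev) + self.lateral(u_prev)
```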
2.2.4) Building a recursive feature-preferred pyramid network E with the same structure and parameters as the feature-preferred pyramid network F.
In the embodiment of the invention, the recursive feature-preferred pyramid E performs feature preference and feature fusion on the multi-scale feature maps $Z_i$ ($i = 2,3,4,5$) output by the recursive backbone Q. The feature preference is expressed as:

$$W_i = f_i^s(Z_{i+1}) = \mathrm{Conv1{\times}1}\big(\sigma(\mathrm{Conv1d}(\mathrm{GAP}(Z_{i+1}))) \odot Z_{i+1}\big), \quad i = 1,2,3,4$$

where $W_i$ is the output feature map of the i-th feature-preference module $f_i^s$ and the remaining notation is as above. After feature fusion, $W_i$ becomes:

$$P_i = f_i^u(W_i) = \mathrm{Conv3{\times}3}\big(W_i + \mathrm{Up}(P_{i+1})\big), \quad i = 1,2,3,4$$

where $P_i$ is the output feature map of the i-th feature-fusion module $f_i^u$, and the $\mathrm{Up}(P_{i+1})$ term is again omitted for the top branch ($i = 4$).
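Taken together, the four sub-networks make two passes over the image, with the first pyramid's outputs fed back into the second backbone pass. The sketch below only illustrates this data flow; all interfaces are hypothetical:

```python
def feature_level_enhance(x, A, F_net, Q, E):
    """Illustrative data flow of the feature-level enhancement module
    (hypothetical interfaces): X -> A -> F -> Q -> E."""
    ys = A(x)        # backbone A: multi-scale maps Y2..Y5
    us = F_net(ys)   # feature-preferred pyramid F: U1..U4
    zs = Q(x, us)    # recursive backbone Q: each stage q_i also receives U_{i-1}
    ps = E(zs)       # recursive feature-preferred pyramid E: final maps P1..P4
    return ps        # fed to the region proposal module G
```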
2.3) Establishing a region proposal network which sequentially comprises a candidate-region generation module, a classification-regression module, a post-processing module and a positive/negative sample assignment module, wherein:
the candidate-region generation module generates, at each point of the input feature map $P_i$ ($i = 1,2,3,4$), three rectangular candidate regions of area $2^{2i+2}$ with different aspect ratios;
the classification-regression module comprises two parallel 3×3 standard convolution layers $g_1$ and $g_2$: the first convolution layer $g_1$ adjusts the center position, width and height of each candidate region, and the second convolution layer $g_2$ predicts the target confidence of each candidate region;
the post-processing module filters redundant candidate regions and outputs the N candidate regions with the highest target confidence, $\{(p_j, b_j) \mid j = 1, 2, \dots, N\}$, where $p_j$ is the target confidence of the j-th candidate region, $b_j = (x_j, y_j, w_j, h_j)$ is its bounding box, $(x_j, y_j)$ are the center coordinates of the bounding box and $(w_j, h_j)$ are its width and height;
the positive/negative sample assignment module assigns candidate regions as positive or negative samples: a candidate region whose intersection-over-union with a ground-truth bounding box exceeds 0.7 is assigned as a positive sample, and one whose intersection-over-union is below 0.3 as a negative sample.
In the embodiment of the invention, the set of bounding boxes output by the region proposal module G is expressed as:

$$R_G = \{b_j = (x_j, y_j, w_j, h_j) \mid j = 1, 2, \dots, N\}.$$
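A sketch of the candidate-region (anchor) generation described above; the stride argument and the specific 1:2 / 1:1 / 2:1 aspect ratios are assumptions of this example, since the published text is ambiguous on the ratio values:

```python
import torch

def make_anchors(i, feat_h, feat_w, stride):
    """Sketch of candidate-region generation on pyramid level P_i: three
    rectangles of area 2^(2i+2) at every feature-map point."""
    area = 2 ** (2 * i + 2)
    shapes = []
    for ratio in (0.5, 1.0, 2.0):          # assumed width / height ratios
        w = (area * ratio) ** 0.5
        h = (area / ratio) ** 0.5
        shapes.append((w, h))
    boxes = []
    for gy in range(feat_h):
        for gx in range(feat_w):
            cx, cy = (gx + 0.5) * stride, (gy + 0.5) * stride  # anchor center in image coords
            boxes += [(cx, cy, w, h) for (w, h) in shapes]
    return torch.tensor(boxes)             # (feat_h * feat_w * 3, 4) as (cx, cy, w, h)
```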
2.4) Building a decision-level enhancement module D formed by cascading three sub-deciders $d_1, d_2, d_3$. Each sub-decider has the same structure, sequentially comprising a region-of-interest extraction module, a classification-regression module and a positive/negative sample assignment module, wherein:
the region-of-interest extraction module is formed by cascading an adaptive average pooling layer and a flattening layer; the adaptive average pooling layer pools the candidate-region features into 7×7 features of interest, and the flattening layer flattens them;
the classification-regression module is formed by connecting two linear layers in parallel: the first linear layer adjusts the bounding-box position and the second linear layer predicts the bounding-box class scores;
the positive/negative sample assignment module assigns bounding boxes as positive or negative samples: a bounding box whose intersection-over-union with a ground-truth box exceeds the threshold is assigned as a positive sample, otherwise as a negative sample; the thresholds of the three sub-deciders are set to 0.5, 0.6 and 0.7 respectively.
The set of bounding boxes output by the decision-level enhancement module D is expressed as:

$$R_D = \big\{\big(b_j^3, c_j\big) \mid j = 1, 2, \dots, N\big\}$$

where N is the number of bounding boxes, $b_j^3 = (x_j^3, y_j^3, w_j^3, h_j^3)$ is the position of the j-th bounding box output by the third sub-decider, with $(x_j^3, y_j^3)$ its center coordinates and $(w_j^3, h_j^3)$ its width and height, and $c_j \in \mathbb{R}^{N_c}$ is the mean of the class scores of the j-th bounding box over the sub-deciders:

$$c_j = \frac{1}{3}\big(c_j^1 + c_j^2 + c_j^3\big)$$

where $N_c$ is the total number of classes and $c_j^1, c_j^2, c_j^3$ are the class scores of the j-th bounding box output by sub-deciders $d_1, d_2, d_3$ respectively.
And step three, training the multi-stage enhancement network.
3.1) Randomly sampling a group of SAR images from the training set, inputting them into the multi-stage enhancement network to calculate the loss and, based on the loss, updating the network parameters by stochastic gradient descent:
3.1.1) Computing the loss of the multi-stage enhancement network:

$$\mathcal{L} = \mathcal{L}_G + \mathcal{L}_D$$

where $\mathcal{L}_G$ and $\mathcal{L}_D$ are the losses of the region proposal module and of the decision-level enhancement module, expressed as:

$$\mathcal{L}_G = \frac{1}{N_G}\sum_{m=1}^{N_G}\Big[L_{cls}(p_m,\hat{p}_m) + \mathbb{1}(\hat{p}_m)\,L_{reg}(b_m,\hat{b}_m)\Big]$$

$$\mathcal{L}_D = \sum_{l=1}^{3}\frac{\lambda_l}{N_D}\sum_{j=1}^{N_D}\Big[L_{cls}(c_j^l,\hat{c}_j^l) + \mathbb{1}(\hat{c}_j^l)\,L_{reg}(b_j^l,\hat{b}_j^l)\Big]$$

where $N_G$ is the number of candidate regions randomly sampled by the region proposal module; $p_m$ and $\hat{p}_m$ are the target confidence of the m-th sampled candidate region and the corresponding ground-truth label; $b_m$ and $\hat{b}_m$ are the bounding box of the m-th sampled candidate region and the corresponding ground-truth box; $L_{cls}$ and $L_{reg}$ are the cross-entropy classification loss and the CIoU regression loss; $N_D$ is the number of bounding boxes randomly sampled by each sub-decider; $c_j^l$ and $\hat{c}_j^l$ are the class scores of the j-th bounding box sampled by the l-th sub-decider and the corresponding ground-truth label; $b_j^l$ and $\hat{b}_j^l$ are the j-th bounding box sampled by the l-th sub-decider and the corresponding ground-truth box; $\lambda_l$ is the loss weight of the l-th sub-decider; and $\mathbb{1}(\cdot)$ is the indicator function that activates the regression term only for positive samples:

$$\mathbb{1}(\hat{y}) = \begin{cases} 1, & \hat{y} \text{ is a positive sample} \\ 0, & \text{otherwise.} \end{cases}$$
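The CIoU regression loss named above follows the published Complete-IoU definition (1 − IoU plus a normalized center-distance term and an aspect-ratio consistency term); a self-contained sketch for boxes given as (cx, cy, w, h):

```python
import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    """Sketch of the CIoU regression loss for (cx, cy, w, h) boxes."""
    px, py, pw, ph = pred.unbind(-1)
    tx, ty, tw, th = target.unbind(-1)
    # corner coordinates
    p1x, p1y, p2x, p2y = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    t1x, t1y, t2x, t2y = tx - tw / 2, ty - th / 2, tx + tw / 2, ty + th / 2
    # intersection-over-union
    iw = (torch.min(p2x, t2x) - torch.max(p1x, t1x)).clamp(min=0)
    ih = (torch.min(p2y, t2y) - torch.max(p1y, t1y)).clamp(min=0)
    inter = iw * ih
    union = pw * ph + tw * th - inter + eps
    iou = inter / union
    # squared center distance over squared diagonal of the enclosing box
    cw = torch.max(p2x, t2x) - torch.min(p1x, t1x)
    ch = torch.max(p2y, t2y) - torch.min(p1y, t1y)
    rho2 = (px - tx) ** 2 + (py - ty) ** 2
    c2 = cw ** 2 + ch ** 2 + eps
    # aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (torch.atan(tw / (th + eps)) - torch.atan(pw / (ph + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return (1 - iou + rho2 / c2 + alpha * v).mean()
```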
3.1.2) Solving the gradient of the multi-stage enhancement network loss $\mathcal{L}$ from 3.1.1) with respect to the multi-stage enhancement network parameters θ, expressed as:

$$g = \nabla_\theta \mathcal{L} = \nabla_\theta\big(\mathcal{L}_G + \mathcal{L}_D\big)$$

where $\mathcal{L}_G$ and $\mathcal{L}_D$ are the losses of the region proposal module and of the decision-level enhancement module respectively.

3.1.3) Updating the multi-stage enhancement network parameters according to the gradient solved in 3.1.2), expressed as:

$$\theta' = \theta - lr \cdot g$$

where θ' is the updated network parameter and θ is the network parameter before the update; lr is the learning rate, set according to the input image batch size, and in the embodiment of the invention lr = 0.005.
3.2) Repeating step 3.1) until the network converges to obtain the trained multi-stage enhancement network.
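The procedure of 3.1)-3.2) reduces to a standard stochastic-gradient-descent loop; in the sketch below, the model interface returning the two loss terms and the momentum/weight-decay values are assumptions, not part of the disclosure:

```python
import torch

def train(model, loader, epochs, lr=0.005, momentum=0.9, weight_decay=1e-4):
    """Minimal SGD training loop for the multi-stage enhancement network.
    `model(images, targets)` is assumed to return the L_G and L_D terms."""
    opt = torch.optim.SGD(model.parameters(), lr=lr,
                          momentum=momentum, weight_decay=weight_decay)
    for _ in range(epochs):
        for images, targets in loader:
            losses = model(images, targets)             # {'loss_G': ..., 'loss_D': ...}
            loss = losses["loss_G"] + losses["loss_D"]  # L = L_G + L_D
            opt.zero_grad()
            loss.backward()                             # g = dL/dtheta
            opt.step()                                  # theta' = theta - lr * g
    return model
```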
And step four, acquiring the SAR target detection and identification result.
Inputting the SAR images of the test set into the trained multi-stage enhancement network to obtain the detection and identification result.
The effect of the invention can be further illustrated by the following simulation experiments:
1. Simulation experiment conditions:
The software platform of the simulation experiment is the Ubuntu 18.04 operating system with PyTorch 1.8.0; the hardware configuration is a Core i9-10980XE CPU and an NVIDIA GeForce RTX 3090 GPU.
The simulation experiment uses measured Gaofen-3 SAR data: the scene type is airports, the image resolution is 1 m × 1 m, the number of SAR images is 2000 with sizes of 600×600, 1024×1024 and 2048×2048, the number of target classes is 7 with 6556 targets in total, and the training and test sets contain 1400 and 600 images respectively.
2. Simulation content and result analysis:
Under the above simulation conditions, the invention and the existing "integrated SAR image target detection and identification method" were each trained on the training set; test-set images were then randomly selected and input into the trained networks, and the detection and identification results were visualized on them, as shown in fig. 3. Fig. 3 (a) shows the detection result of the prior art and fig. 3 (b) the detection result of the invention. Green rectangles mark correctly detected and identified targets; red rectangles mark detection or identification errors.
Comparing fig. 3 (a) and fig. 3 (b) shows that the prior art produces noticeably more false alarms and missed detections than the invention.
Comparing the detection and identification indices of the invention and of the prior art over all test-set images, including the average accuracy, average recall, average F1 score and class-average accuracy of the seven target classes, gives the results shown in Table 1:
TABLE 1

Evaluation index          Prior art    The invention
Average accuracy          81.4%        96.5%
Average recall            78.8%        97.1%
Average F1 score          0.80         0.97
Class-average accuracy    83.1%        97.3%
As can be seen from Table 1, the average accuracy, average recall, average F1 score and class-average accuracy of the invention are all higher than those of the prior art, showing that its detection and identification performance is significantly better.

Claims (9)

1. An SAR target detection and identification method based on a multi-stage enhancement network, characterized by comprising the following steps:
(1) Acquiring SAR images with multiple types of targets, labeling the target position and the target type in each SAR image, and randomly dividing the labeled SAR images to obtain a training set and a test set;
(2) Constructing a multi-stage enhancement network:
(2a) Establishing a data-level enhancement module which sequentially performs multi-scale transformation, random flipping, random rotation, power transformation and random noise operations;
(2b) Establishing a feature-level enhancement module formed by cascading a backbone network A, a feature-preferred pyramid network F, a recursive backbone network Q and a recursive feature-preferred pyramid network E;
(2c) Using an existing region proposal network as the region proposal module G, and selecting cross-entropy loss and CIoU loss as the classification and regression losses;
(2d) Cascading three sub-deciders $d_1, d_2, d_3$ to form a decision-level enhancement module D, and using cross-entropy loss and CIoU loss as the classification and regression losses;
(2e) Sequentially cascading the data-level enhancement module, feature-level enhancement module, region proposal module and decision-level enhancement module to form the multi-stage enhancement network;
(3) Training the multi-stage enhancement network:
(3a) Randomly sampling a group of SAR images from the training set, inputting them into the multi-stage enhancement network, calculating the loss, and updating the network parameters by stochastic gradient descent based on the loss;
(3b) Repeating process (3a) until the network converges to obtain the trained multi-stage enhancement network;
(4) Inputting the SAR images of the test set into the trained multi-stage enhancement network to obtain detection and identification results.
2. The method of claim 1, wherein the backbone network A in step (2b) comprises 5 cascaded convolution modules $a_1, a_2, a_3, a_4, a_5$:
the first convolution module $a_1$ is formed by cascading a 7×7 standard convolution layer, a batch normalization layer, a ReLU activation layer and a max-pooling downsampling layer;
the second convolution module $a_2$ is formed by cascading 3 residual blocks;
the third convolution module $a_3$ is formed by cascading 4 residual blocks;
the fourth convolution module $a_4$ is formed by cascading 6 residual blocks;
the fifth convolution module $a_5$ is formed by cascading 3 residual blocks;
the output characteristic diagram of the whole backbone network is shown as
Figure FDA0003950748840000021
Wherein Y is i For the ith convolution module a i In the output characteristic diagram of (a) is shown,
Figure FDA0003950748840000022
is an input SAR image with width and height W and H, respectively.
3. The method of claim 1, wherein the feature-preferred pyramid network F in step (2b) comprises 4 parallel branches $f_1, f_2, f_3, f_4$, each branch $f_i$ being formed by cascading a feature-preference module $f_i^s$ and a feature-fusion module $f_i^u$;
each feature-preference module $f_i^s$ is formed by cascading two parallel sub-branches $f_i^{s1}$ and $f_i^{s2}$ with a 1×1 standard convolution layer, the first sub-branch $f_i^{s1}$ comprising in order a global average pooling layer, a 1-dimensional convolution layer and a Sigmoid activation layer, and the second sub-branch $f_i^{s2}$ being an identity branch;
each feature-fusion module $f_i^u$ consists of one 3×3 standard convolution layer;
the output feature maps of the whole feature-preferred pyramid network F are expressed as:

$$U_i = f_i^u(S_i) = \mathrm{Conv3{\times}3}\big(S_i + \mathrm{Up}(U_{i+1})\big), \quad i = 1,2,3,4$$

where $U_i$ is the output feature map of the i-th feature-fusion module $f_i^u$, Conv3×3(·) denotes 3×3 standard convolution, Up(·) is the bilinear-interpolation upsampling function (the upsampled term being omitted for i = 4), and $S_i$ is the output feature map of the i-th feature-preference module $f_i^s$:

$$S_i = f_i^s(Y_{i+1}) = \mathrm{Conv1{\times}1}\big(\sigma(\mathrm{Conv1d}(\mathrm{GAP}(Y_{i+1}))) \odot Y_{i+1}\big), \quad i = 1,2,3,4$$

where $Y_{i+1}$ is the output feature map of the (i+1)-th convolution module $a_{i+1}$ of the backbone network, $\odot$ denotes channel-wise multiplication, GAP(·) denotes global average pooling, Conv1d(·) denotes 1-dimensional convolution, σ(·) denotes the Sigmoid function, and Conv1×1(·) denotes 1×1 standard convolution.
4. The method of claim 1, wherein the recursive backbone network Q in step (2b) comprises 5 cascaded convolution modules $q_1, q_2, q_3, q_4, q_5$:
the 1st convolution module $q_1$ is formed by cascading a 7×7 standard convolution layer, a batch normalization layer, a ReLU activation layer and a max-pooling downsampling layer;
the 2nd convolution module $q_2$ is formed by connecting 3 cascaded residual blocks in parallel with a 1×1 standard convolution layer;
the 3rd convolution module $q_3$ is formed by connecting 4 cascaded residual blocks in parallel with a 1×1 standard convolution layer;
the 4th convolution module $q_4$ is formed by connecting 6 cascaded residual blocks in parallel with a 1×1 standard convolution layer;
the 5th convolution module $q_5$ is formed by connecting 3 cascaded residual blocks in parallel with a 1×1 standard convolution layer;
the output feature maps of the whole recursive backbone network Q being expressed as:

$$Z_i = q_i(Z_{i-1}) + \mathrm{Conv1{\times}1}(U_{i-1}), \quad i = 2, \dots, 5, \qquad Z_1 = q_1(X)$$

where $Z_i$ is the output feature map of the i-th convolution module $q_i$, $X \in \mathbb{R}^{W \times H}$ is the input SAR image of width W and height H, $U_{i-1}$ is the output of the (i-1)-th feature-fusion module $f_{i-1}^u$ of the feature-preferred pyramid network, and Conv1×1(·) denotes 1×1 standard convolution.
5. The method of claim 1, wherein the recursive feature-preferred pyramid network E in step (2b) has the same structure and parameters as the feature-preferred pyramid network F, its output features being expressed as:

$$P_i = f_i^u(W_i) = \mathrm{Conv3{\times}3}\big(W_i + \mathrm{Up}(P_{i+1})\big), \quad i = 1,2,3,4$$

where $P_i$ is the output feature map of the i-th feature-fusion module $f_i^u$, Conv3×3(·) denotes 3×3 standard convolution, Up(·) is the bilinear-interpolation upsampling function, and $W_i$ is the output feature map of the i-th feature-preference module $f_i^s$:

$$W_i = f_i^s(Z_{i+1}) = \mathrm{Conv1{\times}1}\big(\sigma(\mathrm{Conv1d}(\mathrm{GAP}(Z_{i+1}))) \odot Z_{i+1}\big), \quad i = 1,2,3,4$$

where $Z_{i+1}$ is the output feature map of the (i+1)-th convolution module $q_{i+1}$ of the recursive backbone network, $\odot$ denotes channel-wise multiplication, GAP(·) denotes global average pooling, Conv1d(·) denotes 1-dimensional convolution, σ(·) denotes the Sigmoid function, and Conv1×1(·) denotes 1×1 standard convolution.
6. The method of claim 1, wherein the region proposal network in step (2c) sequentially comprises a candidate-region generation module, a classification-regression module, a post-processing module and a positive/negative sample assignment module;
the candidate-region generation module generates, at each point of the input feature map $P_i$, three rectangular candidate regions of area $2^{2i+2}$ with different aspect ratios;
the classification-regression module comprises two parallel 3×3 standard convolution layers $g_1$ and $g_2$, the first convolution layer $g_1$ adjusting the center position, width and height of each candidate region and the second convolution layer $g_2$ predicting the target confidence of each candidate region;
the post-processing module filters redundant candidate regions and outputs the N candidate regions with the highest target confidence, $\{(p_j, b_j) \mid j = 1, 2, \dots, N\}$, where $p_j$ is the target confidence of the j-th candidate region, $b_j = (x_j, y_j, w_j, h_j)$ is its bounding box, $(x_j, y_j)$ are the center coordinates of the bounding box and $(w_j, h_j)$ are its width and height;
the positive/negative sample assignment module assigns candidate regions as positive or negative samples, a candidate region whose intersection-over-union with a ground-truth bounding box exceeds 0.7 being assigned as a positive sample and one whose intersection-over-union is below 0.3 as a negative sample;
the bounding boxes output by the whole region proposal network being expressed as: $R_G = \{b_j = (x_j, y_j, w_j, h_j) \mid j = 1, 2, \dots, N\}$.
7. The method of claim 1, wherein the three cascaded sub-deciders $d_1, d_2, d_3$ in step (2d) each sequentially comprise a region-of-interest extraction module, a classification-regression module and a positive/negative sample assignment module;
the region-of-interest extraction module is formed by cascading an adaptive average pooling layer and a flattening layer, the adaptive average pooling layer pooling the candidate-region features into 7×7 features of interest and the flattening layer flattening them;
the classification-regression module is formed by connecting two linear layers in parallel, the first linear layer adjusting the bounding-box position and the second linear layer predicting the bounding-box class scores;
the positive/negative sample assignment module assigns bounding boxes as positive or negative samples, a bounding box whose intersection-over-union with a ground-truth box exceeds the threshold being assigned as a positive sample and otherwise as a negative sample, the thresholds of the three sub-deciders being set to 0.5, 0.6 and 0.7 respectively;
the bounding boxes output by the decision-level enhancement module being expressed as:

$$R_D = \big\{\big(b_j^3, c_j\big) \mid j = 1, 2, \dots, N\big\}$$

where N is the number of bounding boxes, $b_j^3 = (x_j^3, y_j^3, w_j^3, h_j^3)$ is the position of the j-th bounding box output by the third sub-decider, with $(x_j^3, y_j^3)$ its center coordinates and $(w_j^3, h_j^3)$ its width and height, and $c_j \in \mathbb{R}^{N_c}$ is the mean of the class scores of the j-th bounding box over the sub-deciders:

$$c_j = \frac{1}{3}\big(c_j^1 + c_j^2 + c_j^3\big)$$

where $N_c$ is the total number of classes and $c_j^1, c_j^2, c_j^3$ are the class scores of the j-th bounding box output by sub-deciders $d_1, d_2, d_3$ respectively.
8. The method of claim 1, wherein the loss in step (3a) comprises the loss $\mathcal{L}_G$ of the region proposal module and the loss $\mathcal{L}_D$ of the decision-level enhancement module, expressed respectively as:

$$\mathcal{L}_G = \frac{1}{N_G}\sum_{m=1}^{N_G}\Big[L_{cls}(p_m,\hat{p}_m) + \mathbb{1}(\hat{p}_m)\,L_{reg}(b_m,\hat{b}_m)\Big]$$

$$\mathcal{L}_D = \sum_{l=1}^{3}\frac{\lambda_l}{N_D}\sum_{j=1}^{N_D}\Big[L_{cls}(c_j^l,\hat{c}_j^l) + \mathbb{1}(\hat{c}_j^l)\,L_{reg}(b_j^l,\hat{b}_j^l)\Big]$$

where $N_G$ is the number of candidate regions randomly sampled by the region proposal module; $p_m$ and $\hat{p}_m$ are the target confidence of the m-th sampled candidate region and the corresponding ground-truth label; $b_m$ and $\hat{b}_m$ are the bounding box of the m-th sampled candidate region and the corresponding ground-truth box; $L_{cls}$ and $L_{reg}$ are the cross-entropy classification loss and the CIoU regression loss; $N_D$ is the number of bounding boxes randomly sampled by each sub-decider; $c_j^l$ and $\hat{c}_j^l$ are the class scores of the j-th bounding box sampled by the l-th sub-decider and the corresponding ground-truth label; $b_j^l$ and $\hat{b}_j^l$ are the j-th bounding box sampled by the l-th sub-decider and the corresponding ground-truth box; $\lambda_l$ is the loss weight of the l-th sub-decider; and $\mathbb{1}(\cdot)$ is the indicator function:

$$\mathbb{1}(\hat{y}) = \begin{cases} 1, & \hat{y} \text{ is a positive sample} \\ 0, & \text{otherwise} \end{cases}$$

the loss of the whole multi-stage enhancement network being expressed as:

$$\mathcal{L} = \mathcal{L}_G + \mathcal{L}_D.$$
9. The method of claim 1, wherein the network parameters are updated in step (3b) by stochastic gradient descent, implemented as follows:
(3b1) Solving the gradient of the multi-stage enhancement network parameters, expressed as:

$$g = \nabla_\theta \mathcal{L} = \nabla_\theta\big(\mathcal{L}_G + \mathcal{L}_D\big)$$

where $\mathcal{L}$ is the multi-stage enhancement network loss, $\mathcal{L}_G$ and $\mathcal{L}_D$ are the losses of the region proposal module and of the decision-level enhancement module respectively, and θ denotes the learnable parameters of the multi-stage enhancement network;
(3b2) Updating the parameters of the multi-stage enhancement network according to the solved gradient g, expressed as:

$$\theta' = \theta - lr \cdot g$$

where θ' is the updated network parameter and θ is the network parameter before the update; lr is the learning rate, set according to the input image batch size.
CN202211449058.5A 2022-11-18 2022-11-18 SAR target detection and identification method based on multistage enhanced network Pending CN115909086A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211449058.5A CN115909086A (en) 2022-11-18 2022-11-18 SAR target detection and identification method based on multistage enhanced network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211449058.5A CN115909086A (en) 2022-11-18 2022-11-18 SAR target detection and identification method based on multistage enhanced network

Publications (1)

Publication Number Publication Date
CN115909086A true CN115909086A (en) 2023-04-04

Family

ID=86473982

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211449058.5A Pending CN115909086A (en) 2022-11-18 2022-11-18 SAR target detection and identification method based on multistage enhanced network

Country Status (1)

Country Link
CN (1) CN115909086A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117574161A (en) * 2024-01-17 2024-02-20 航天宏图信息技术股份有限公司 Surface parameter estimation method, device and equipment based on generation of countermeasure network
CN117574161B (en) * 2024-01-17 2024-04-16 航天宏图信息技术股份有限公司 Surface parameter estimation method, device and equipment based on generation of countermeasure network

Similar Documents

Publication Publication Date Title
CN110135267B (en) Large-scene SAR image fine target detection method
US11783569B2 (en) Method for classifying hyperspectral images on basis of adaptive multi-scale feature extraction model
CN111738124B (en) Remote sensing image cloud detection method based on Gabor transformation and attention
CN110033002B (en) License plate detection method based on multitask cascade convolution neural network
CN106355151B (en) A kind of three-dimensional S AR images steganalysis method based on depth confidence network
CN107563433B (en) Infrared small target detection method based on convolutional neural network
CN112446388A (en) Multi-category vegetable seedling identification method and system based on lightweight two-stage detection model
CN109934166A (en) Unmanned plane image change detection method based on semantic segmentation and twin neural network
CN108898065B (en) Deep network ship target detection method with candidate area rapid screening and scale self-adaption
CN108447057B (en) SAR image change detection method based on significance and depth convolution network
CN111079739B (en) Multi-scale attention feature detection method
CN112132042A (en) SAR image target detection method based on anti-domain adaptation
CN110189304A (en) Remote sensing image target on-line quick detection method based on artificial intelligence
CN106096506A (en) Based on the SAR target identification method differentiating doubledictionary between subclass class
CN110659601B (en) Depth full convolution network remote sensing image dense vehicle detection method based on central point
Cepni et al. Vehicle detection using different deep learning algorithms from image sequence
CN110969121A (en) High-resolution radar target recognition algorithm based on deep learning
CN113536963A (en) SAR image airplane target detection method based on lightweight YOLO network
CN109034213B (en) Hyperspectral image classification method and system based on correlation entropy principle
CN111985325A (en) Aerial small target rapid identification method in extra-high voltage environment evaluation
CN113298032A (en) Unmanned aerial vehicle visual angle image vehicle target detection method based on deep learning
CN115761534A (en) Method for detecting and tracking small target of infrared unmanned aerial vehicle under air background
CN116469020A (en) Unmanned aerial vehicle image target detection method based on multiscale and Gaussian Wasserstein distance
CN109558803B (en) SAR target identification method based on convolutional neural network and NP criterion
CN115909086A (en) SAR target detection and identification method based on multistage enhanced network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination