CN113657181B - SAR image rotation target detection method based on smooth label coding and feature enhancement - Google Patents

SAR image rotation target detection method based on smooth label coding and feature enhancement

Info

Publication number
CN113657181B
Authority: CN (China)
Prior art keywords: feature map, convolution, network, feature, SAR image
Prior art date: 2021-07-23
Legal status: Active (granted)
Application number: CN202110841106.4A
Other languages: Chinese (zh)
Other versions: CN113657181A
Inventor
蒋雯
赵子豪
耿杰
Current Assignee: Northwestern Polytechnical University
Original Assignee: Northwestern Polytechnical University
Application filed by Northwestern Polytechnical University
Priority: CN202110841106.4A, priority date 2021-07-23
Published as CN113657181A (2021-11-16); granted as CN113657181B (2024-01-23)


Classifications

    • G06F18/22 Pattern recognition; Analysing; Matching criteria, e.g. proximity measures
    • G06F18/253 Pattern recognition; Analysing; Fusion techniques of extracted features
    • G06N3/045 Neural networks; Architecture; Combinations of networks
    • G06N3/08 Neural networks; Learning methods

Abstract

The invention discloses a SAR image rotation target detection method based on smooth label coding and feature enhancement, which comprises the following steps: inputting a ship SAR image dataset and converting it into the PASCAL VOC format; converting the arbitrarily oriented quadrilateral labels into rectangular labels plus rotation angles, and converting the rotation angles into binary smooth label codes; constructing a convolutional neural network ResNet-50 and training it with the prepared ship SAR image dataset; fusing the feature maps output by ResNet-50 with a global average pooling module to obtain fused feature maps; applying position attention enhancement to the fused feature maps to obtain attention feature maps; enhancing the position information of the attention feature maps through skip connections with the original feature maps; and performing anchor-free regression prediction on the enhanced feature maps with a fully convolutional network to obtain the final detection result. The invention combines global feature fusion with a position attention mechanism, adds an angle classification branch to an anchor-free detection framework, and proposes a new, more reasonable centrality computation. The method not only solves the problem that ship targets in SAR images are difficult to locate accurately, but also improves the prediction accuracy of the ship target rotation angle.

Description

SAR image rotation target detection method based on smooth label coding and feature enhancement
Technical Field
The invention belongs to the field of intelligent interpretation of remote sensing images, and particularly relates to a SAR image rotation target detection method based on smooth label coding and feature enhancement.
Background
Synthetic aperture radar (SAR) is an active microwave remote sensor that images a wide variety of targets at high resolution, operates day and night and in all weather, and offers strong penetration, strong anti-jamming capability, and long operating range; it has been widely applied in military reconnaissance, marine applications, agricultural and forestry monitoring, and other fields. With the development of synthetic aperture radar imaging technology, SAR image target detection has found broad military and civil use, and ship target detection in SAR images is one of its hot topics.
With the development of artificial intelligence, deep learning models are increasingly used for target detection tasks. Target detection algorithms based on convolutional neural networks fall into two broad categories: two-stage detectors such as Faster R-CNN and Mask R-CNN, which split detection into candidate region extraction followed by category and position prediction for the candidates, and single-stage detectors such as YOLO (e.g., YOLOv2) and RetinaNet, which directly predict the category and position of targets by regression.
However, existing convolutional-neural-network-based detection algorithms are mainly designed for optical image data. Unlike targets in conventional detection tasks, ship targets in SAR images appear in arbitrary orientations, and annotating them with arbitrarily oriented bounding boxes reduces the background clutter enclosed by axis-aligned rectangular boxes and thus enables more accurate detection. The publicly released SAR ship dataset SSDD+ likewise adopts arbitrary quadrilateral annotations. Rotation angles can be represented via the OpenCV definition, the long-side definition, or an ordered quadrilateral, but these representations produce large loss values caused by the periodicity of angles (PoA), the exchangeability of edges (EoE), and disordered point regression. SAR ship detection also faces problems rooted in the imagery itself, mainly: (1) ship targets in low-resolution SAR images are small, and their positions in the deep feature maps of a convolutional neural network are coarse, so ships are hard to locate and may even be missed; (2) ship SAR images have complex backgrounds and heavy noise, which makes targets harder to distinguish from the background and degrades detection.
Disclosure of Invention
Aiming at these technical problems, the invention provides a SAR image rotation target detection method based on smooth label coding and feature enhancement. The smooth label coding resolves the rotation-angle regression problems caused by angle periodicity and edge exchangeability; a new combined feature enhancement scheme resolves the imbalance between feature layers, strengthens feature robustness, improves the detection of small ship targets, and raises ship detection precision. The detection framework is built on the anchor-free detection structure FCOS, and the invention improves the centrality computation to make it more reasonable.
The technical solution adopted by the invention is a SAR image rotation target detection method based on smooth label coding and feature enhancement, characterized by comprising the following steps:
step one, inputting a ship SAR image data set, and converting the data set into a PASCAL VOC format:
step 101, adopting the publicly released SSDD+ ship SAR image dataset, comprising 1160 images in total, in which each instance is annotated with an arbitrary quadrilateral;
step 102, dividing the SSDD+ dataset into training, test and validation sets in a 7:2:1 ratio and inputting them into the SAR image ship target detection model;
step two, converting quadrilateral labels in any rotation direction into rectangular labels and rotation angles, and converting the rotation angles into binary smooth label codes:
step 201, reading the bounding boxes contained in an image, computing the minimum enclosing rectangle of each arbitrary quadrilateral bounding box, and returning the rectangle as the point set (x, y, w, h) together with its rotation angle θ;
step 202, converting the rotation angle θ into a binary smooth label code, where AR is the angle range and ω is the angle interval:

bcl = Bin(θ/ω, log₂(AR/ω))

where the Bin() function converts θ/ω into a binary code of length log₂(AR/ω);
step 203, converting the binary code label into a binary smooth label; the binary codes of adjacent angles can differ greatly in form, so the distance between adjacent angles is partly lost and the code may change sharply from one angle to the next; the smooth label form therefore adds a discount to the binary code:

y_bcl = bcl·(1 - α) + α/log₂(AR/ω)

where α is a small hyperparameter, typically 0.1, and y_bcl denotes the resulting binary smooth label code;
step three, constructing a convolutional neural network ResNet-50 and training the network with a ship SAR image dataset:
step 301, constructing a ResNet-50 network as the backbone, which contains 5 convolution modules, namely Conv1, Conv2_x, Conv3_x, Conv4_x and Conv5_x;
step 302, extracting features of the SAR image with the ResNet-50 network and outputting the feature maps of the last 3 convolution modules, denoted C₃, C₄ and C₅ respectively;
step four, fusing the feature maps output by ResNet-50 with a global average pooling module to obtain fused feature maps:
step 401, applying 256 convolution kernels of size 1×1 to the feature maps C₃, C₄, C₅ to obtain the feature maps C′₃, C′₄, C′₅;
step 402, denoting feature map C′₅ as S′₅: S′₅ is reduced to 1×1×256 by global average pooling and then passed through two fully connected layers, where the first fully connected layer reduces the number of channels to a fraction of that of S′₅ and the second restores the channel count of S′₅; the 1×1×256 feature obtained after the two fully connected layers is matrix-multiplied with C′₄ to obtain S′₄; similarly, C′₄, after global average pooling and two fully connected layers, is matrix-multiplied with C′₃ to obtain S′₃;
step 403, applying a 3×3 convolution to S′₃, S′₄ and S′₅ to eliminate the aliasing effect introduced by the global average pooling and the two fully connected layers, obtaining the fused feature maps S₃, S₄, S₅;
Step five, carrying out position attention enhancement on the fused feature map to obtain an enhanced feature map:
step 501, convolving the fused feature map S₃ with 1×1 convolution kernels whose output channel number is a fraction of the original to obtain the query feature matrix S₃_query; convolving S₃ with 1×1 convolution kernels of the same reduced channel number to obtain the key feature matrix S₃_key; convolving S₃ with 1×1 convolution kernels keeping the channel number unchanged to obtain the value feature matrix S₃_value; applying the same three convolutions to S₄ yields S₄_query, S₄_key and S₄_value, and to S₅ yields S₅_query, S₅_key and S₅_value:

S_i_query = W_query ⊗ S_i, S_i_key = W_key ⊗ S_i, S_i_value = W_value ⊗ S_i

where S_i denotes the i-th feature map, W denotes the weight matrix of the corresponding convolution kernel, and ⊗ denotes convolution;
step 502, calculating the similarity P′_i between the query feature matrix S_i_query and the key feature matrix S_i_key; the similarity is computed by matrix multiplication followed by a softmax function:

P′_i = softmax(S_i_query × (S_i_key)ᵀ)

where P′_i denotes the similarity matrix obtained from the i-th (i = 3, 4, 5) feature map;
step 503, matrix-multiplying the similarity matrix P′_i with the value feature matrix S_i_value to obtain the attention feature enhancement matrix:

P″_i = P′_i × S_i_value
step six, enhancing the position information of the attention feature maps through skip connections with the original feature maps:
step 601, adding the attention feature maps P″₃, P″₄, P″₅ to the original feature maps C′₃, C′₄, C′₅ respectively, then applying a 1×1 convolution keeping the channel number of the original feature map to obtain the enhanced feature maps P₃, P₄, P₅;
step 602, downsampling the enhanced feature map P₅ with a 3×3 convolution keeping the channel number unchanged to obtain the feature map P₆, and downsampling the obtained feature map P₆ with a 3×3 convolution keeping the channel number unchanged to obtain the feature map P₇;
step seven, carrying out anchor-free regression prediction on the enhanced feature maps with a fully convolutional network to obtain the final detection result:
step 701, anchor-free regression prediction is performed pixel by pixel, so every point on the feature map makes a prediction; given a position point (x, y) and the ground-truth bounding box (x0, y0, x1, y1), the offsets to be predicted are:

l* = x - x0, t* = y - y0
r* = x1 - x, b* = y1 - y
step 702, calculating the centrality between the position point and the ground-truth bounding box: the centrality equals 1 when the position point coincides with the center of the bounding box, equals 0 if and only if the position point lies at one of the four vertices, and lies in (0, 1) at all other positions; the centrality is multiplied by the corresponding classification score to compute the final score used for ranking the detected bounding boxes;
step 703, inputting the enhanced feature maps P₃, P₄, P₅, P₆, P₇ into the regression sub-network, the centrality regression sub-network, the classification sub-network and the angle classification sub-network respectively;
step 704, training the whole network with a loss function Loss that combines four terms, where N_pos denotes the number of positive samples, μ, λ and η denote the balance factors between the classification and regression terms, and L_cls, L_reg, L_center_reg and L_bcl denote the classification loss, the bounding box regression loss, the centrality regression loss and the binary smooth label coding loss respectively;
step 705, passing the enhanced feature maps P₃, P₄, P₅, P₆, P₇ through the regression sub-network, the classification sub-network, the centrality regression sub-network and the angle classification sub-network respectively, then applying a non-maximum suppression algorithm to obtain the final prediction boxes and outputting the ship SAR image target detection result.
Compared with the prior art, the invention has the following advantages:
Firstly, the invention provides a SAR image rotation target detection method based on binary smooth label coding, which converts the rotation angle into a binary code and predicts it by classification. Because adjacent binary codes can differ greatly in form, the distance between adjacent angles is partly lost and the code can change sharply between neighboring angles; adding the discounted smooth label form to the binary code therefore reduces these large differences between adjacent codes, avoids overfitting, and improves the generalization ability of angle prediction.
Secondly, the invention fuses high-level semantic information into the low-level features through the global average pooling module, and then enhances position information through the position attention mechanism, so that both the semantic and the position information across feature layers are strengthened; this reduces the influence of SAR image ship target noise on detection and enhances the detection capability.
Thirdly, the detection framework is built on the anchor-free detection structure FCOS. The traditional centrality computation assigns zero centrality to every point on the boundary of the box, whereas the improved computation distinguishes position points on different parts of the boundary from the center of the bounding box and reaches zero only at the four vertices, which makes the computation more reasonable.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a schematic diagram of a global average pooling feature fusion model structure according to the present invention.
FIG. 3 is a schematic diagram of a position attention model according to the present invention.
Detailed Description
The method of the present invention will be described in further detail with reference to the accompanying drawings.
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The invention will be described in detail below with reference to the drawings in connection with embodiments.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments in accordance with the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be capable of being practiced otherwise than as specifically illustrated and described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Spatially relative terms, such as "above … …," "above … …," "upper surface at … …," "above," and the like, may be used herein for ease of description to describe one device or feature's spatial location relative to another device or feature as illustrated in the figures. It will be understood that the spatially relative terms are intended to encompass different orientations in use or operation in addition to the orientation depicted in the figures. For example, if the device in the figures is turned over, elements described as "above" or "over" other devices or structures would then be oriented "below" or "beneath" the other devices or structures. Thus, the exemplary term "above … …" may include both orientations of "above … …" and "below … …". The device may also be positioned in other different ways (rotated 90 degrees or at other orientations) and the spatially relative descriptors used herein interpreted accordingly.
As shown in FIG. 1, the rationality and effectiveness of the invention are illustrated by taking the publicly released SSDD+ ship SAR image dataset as an example, with the following specific steps:
step one, inputting a ship SAR image data set, and converting the data set into a PASCAL VOC format:
step 101, adopting the publicly released SSDD+ ship SAR image dataset, comprising 1160 images in total, in which each instance is annotated with an arbitrary quadrilateral;
step 102, dividing the SSDD+ dataset into training, test and validation sets in a 7:2:1 ratio and inputting them into the SAR image ship target detection model;
step two, converting quadrilateral labels in any rotation direction into rectangular labels and rotation angles, and converting the rotation angles into binary smooth label codes:
step 201, reading the bounding boxes contained in an image, computing the minimum enclosing rectangle of each arbitrary quadrilateral bounding box, and returning the rectangle as the point set (x, y, w, h) together with its rotation angle θ;
step 202, converting a rotation angle such as θ = 30° into a binary-coded smooth label, where the angle range is AR = 180 and the angle interval is ω = 180/128:

bcl = Bin(θ/ω, log₂(AR/ω)) = [0, 0, 1, 0, 1, 0, 1]

i.e. the Bin() function converts θ/ω ≈ 21.33, taken as the bin index 21, into the binary code [0, 0, 1, 0, 1, 0, 1] of length log₂(AR/ω) = 7;
step 203, converting the binary coded label into a binary smooth label code; the binary codes of adjacent angles can differ greatly in form, so the distance between adjacent angles is partly lost and the code may change sharply from one angle to the next; the smooth label form therefore adds a discount to the binary code:

y_bcl = bcl·(1 - α) + α/log₂(AR/ω)

With α = 0.1, the example binary code from step 202 is converted into the binary smooth label code y_bcl ≈ [0.014, 0.014, 0.914, 0.014, 0.914, 0.014, 0.914].
Thirdly, constructing a convolutional neural network ResNet-50, and training the network by using the manufactured ship SAR image data set:
step 301, constructing a ResNet-50 network as the backbone, which contains 5 convolution modules, namely Conv1, Conv2_x, Conv3_x, Conv4_x and Conv5_x; the ship SAR image is scaled to 800×1024, normalized, and input into Conv1;
step 302, extracting features of the SAR image with the ResNet-50 network and outputting the feature maps of the last 3 convolution modules, C₃, C₄ and C₅, where C₃ consists of 512 feature maps of size 100×128, C₄ of 1024 feature maps of size 50×64, and C₅ of 2048 feature maps of size 25×32;
fusing the features of the feature map output by the ResNet-50 by using a global averaging pooling module to obtain a fused feature map:
step 401, applying 256 convolution kernels of size 1×1 to the feature maps C₃, C₄, C₅ to obtain the feature maps C′₃, C′₄, C′₅, where C′₃ consists of 256 feature maps of size 100×128, C′₄ of 256 feature maps of size 50×64, and C′₅ of 256 feature maps of size 25×32;
step 402, denoting feature map C′₅ as S′₅: S′₅ is reduced by global average pooling to 256 feature maps of size 1×1 and then passed through two fully connected layers, where the first fully connected layer reduces the number of channels to a fraction of that of S′₅ and the second restores the channel count of S′₅; the 256 feature maps of size 1×1 obtained after the two fully connected layers are matrix-multiplied with C′₄ to obtain S′₄; similarly, C′₄, after global average pooling and two fully connected layers, is matrix-multiplied with C′₃ to obtain S′₃, where S′₃ consists of 256 feature maps of size 100×128, S′₄ of 256 feature maps of size 50×64, and S′₅ of 256 feature maps of size 25×32;
step 403, checking S 'by using convolution of 3×3' 3 、S′ 4 、S′ 5 Proceeding withConvolution operation is carried out, so that aliasing effects caused by global average pooling and twice full connection layers are eliminated, and a fused characteristic diagram S is obtained 3 ,S 4 ,S 5
Step five, carrying out position attention enhancement on the fused feature map to obtain an enhanced feature map:
step 501, convolving the fused feature map S₃ with 1×1 convolution kernels whose output channel number is a fraction of the original to obtain the query feature matrix S₃_query; convolving S₃ with 1×1 convolution kernels of the same reduced channel number to obtain the key feature matrix S₃_key; convolving S₃ with 1×1 convolution kernels keeping the channel number unchanged to obtain the value feature matrix S₃_value; applying the same three convolutions to S₄ yields S₄_query, S₄_key and S₄_value, and to S₅ yields S₅_query, S₅_key and S₅_value:

S_i_query = W_query ⊗ S_i, S_i_key = W_key ⊗ S_i, S_i_value = W_value ⊗ S_i

where S_i denotes the i-th feature map, W denotes the weight matrix of the corresponding convolution kernel, and ⊗ denotes convolution;
step 502, calculating the similarity P′_i between the query feature matrix S_i_query and the key feature matrix S_i_key; the similarity is computed by matrix multiplication followed by a softmax function:

P′_i = softmax(S_i_query × (S_i_key)ᵀ)

where P′_i denotes the similarity matrix obtained from the i-th (i = 3, 4, 5) feature map;
step 503, matrix-multiplying the similarity matrix P′_i with the value feature matrix S_i_value to obtain the attention feature enhancement matrix:

P″_i = P′_i × S_i_value
step six, enhancing the position information of the attention feature maps through skip connections with the original feature maps:
step 601, adding the attention feature maps P″₃, P″₄, P″₅ to the original feature maps C′₃, C′₄, C′₅ respectively, then applying a 1×1 convolution keeping the channel number of the original feature map to obtain the enhanced feature maps P₃, P₄, P₅;
step 602, downsampling the enhanced feature map P₅ with a 3×3 convolution keeping the channel number unchanged to obtain the feature map P₆, and downsampling the obtained feature map P₆ with a 3×3 convolution keeping the channel number unchanged to obtain the feature map P₇, where P₃ consists of 256 feature maps of size 100×128, P₄ of 256 feature maps of size 50×64, P₅ of 256 feature maps of size 25×32, P₆ of 256 feature maps of size 13×16, and P₇ of 256 feature maps of size 7×8;
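A sketch of the skip connection of step 601 and the extra levels of step 602; stride-2 3×3 convolutions are assumed for the downsampling, which reproduces the reported sizes 25×32 → 13×16 → 7×8:

    import torch.nn as nn

    class SkipEnhance(nn.Module):
        def __init__(self, channels=256):
            super().__init__()
            # step 601: 1x1 convolutions applied after the skip-connection additions
            self.fuse = nn.ModuleList(
                nn.Conv2d(channels, channels, kernel_size=1) for _ in range(3))
            # step 602: two stride-2 3x3 convolutions produce P6 and P7
            self.down6 = nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1)
            self.down7 = nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1)

        def forward(self, attn_maps, lateral_maps):
            # attn_maps = (P''3, P''4, P''5), lateral_maps = (C'3, C'4, C'5)
            p3, p4, p5 = (f(a + c) for f, a, c in zip(self.fuse, attn_maps, lateral_maps))
            p6 = self.down6(p5)   # 25x32 -> 13x16
            p7 = self.down7(p6)   # 13x16 -> 7x8
            return p3, p4, p5, p6, p7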
step seven, carrying out anchor-free regression prediction on the enhanced feature maps with a fully convolutional network to obtain the final detection result:
step 701, anchor-free regression prediction is performed pixel by pixel, so every point on the feature map makes a prediction; given a position point (x, y) and the ground-truth bounding box (x0, y0, x1, y1), the offsets to be predicted are:

l* = x - x0, t* = y - y0
r* = x1 - x, b* = y1 - y
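In code form, the regression targets of step 701 are a direct transcription of the four offset formulas above:

    def regression_targets(x, y, box):
        """Offsets from a location (x, y) to the edges of a ground-truth box."""
        x0, y0, x1, y1 = box
        return (x - x0,   # l*: distance to the left edge
                y - y0,   # t*: distance to the top edge
                x1 - x,   # r*: distance to the right edge
                y1 - y)   # b*: distance to the bottom edge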
step 702, calculating the centrality between the position point and the ground-truth bounding box: the centrality equals 1 when the position point coincides with the center of the bounding box, equals 0 if and only if the position point lies at one of the four vertices, and lies in (0, 1) at all other positions; the centrality is multiplied by the corresponding classification score to compute the final score used for ranking the detected bounding boxes;
Step 703, enhancing the feature map P 3 、P 4 、P 5 、P 6 、P 7 Respectively inputting the data into a regression sub-network, a centrality regression sub-network, a classification sub-network and an angle classification sub-network;
step 704, training the whole network with a loss function Loss that combines four terms, where N_pos denotes the number of positive samples, μ, λ and η denote the balance factors between the classification and regression terms, and L_cls, L_reg, L_center_reg and L_bcl denote the classification loss, the bounding box regression loss, the centrality regression loss and the binary smooth label coding loss respectively;
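One plausible reading of the Loss combination of step 704, with the three balance factors weighting the regression, centrality and angle terms (the exact grouping and normalization are assumptions):

    def total_loss(l_cls, l_reg, l_center_reg, l_bcl, n_pos, mu=1.0, lam=1.0, eta=1.0):
        # Loss = (L_cls + mu*L_reg + lam*L_center_reg + eta*L_bcl) / N_pos  (assumed form)
        return (l_cls + mu * l_reg + lam * l_center_reg + eta * l_bcl) / max(n_pos, 1)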
step 705, passing the enhanced feature maps P₃, P₄, P₅, P₆, P₇ through the regression sub-network, the classification sub-network, the centrality regression sub-network and the angle classification sub-network respectively, then applying a non-maximum suppression algorithm to obtain the final prediction boxes and outputting the ship SAR image target detection result.
The foregoing is merely one embodiment of the present invention, and the invention is not limited thereto; any simple modification, variation or equivalent structural change made to the foregoing embodiment according to the technical substance of the invention still falls within the protection scope of the technical solution of the invention.

Claims (1)

1. A SAR image rotation target detection method based on smooth label coding and feature enhancement, characterized by comprising the following steps:
step one, inputting a ship SAR image data set, and converting the data set into a PASCAL VOC format:
step 101, adopting the publicly released SSDD+ ship SAR image dataset, comprising 1160 images in total, in which each instance is annotated with an arbitrary quadrilateral;
step 102, dividing the SSDD+ dataset into training, test and validation sets in a 7:2:1 ratio and inputting them into the SAR image ship target detection model;
step two, converting quadrilateral labels in any rotation direction into rectangular labels and rotation angles, and converting the rotation angles into binary smooth label codes:
step 201, reading the bounding boxes contained in an image, computing the minimum enclosing rectangle of each arbitrary quadrilateral bounding box, and returning the rectangle as the point set (x, y, w, h) together with its rotation angle θ;
step 202, converting the rotation angle θ into a binary smooth label code, where AR is the angle range and ω is the angle interval:

bcl = Bin(θ/ω, log₂(AR/ω))

where the Bin() function converts θ/ω into a binary code of length log₂(AR/ω);
step 203, converting the binary code label into a binary smooth label; the binary codes of adjacent angles can differ greatly in form, so the distance between adjacent angles is partly lost and the code may change sharply from one angle to the next; the smooth label form therefore adds a discount to the binary code:

y_bcl = bcl·(1 - α) + α/log₂(AR/ω)

where α is a small hyperparameter, typically 0.1, and y_bcl denotes the resulting binary smooth label code;
step three, constructing a convolutional neural network ResNet-50 and training the network with a ship SAR image dataset:
step 301, constructing a ResNet-50 network as the backbone, which contains 5 convolution modules, namely Conv1, Conv2_x, Conv3_x, Conv4_x and Conv5_x;
step 302, extracting features of the SAR image with the ResNet-50 network and outputting the feature maps of the last 3 convolution modules, denoted C₃, C₄ and C₅ respectively;
step four, fusing the feature maps output by ResNet-50 with a global average pooling module to obtain fused feature maps:
step 401, applying 256 convolution kernels of size 1×1 to the feature maps C₃, C₄, C₅ to obtain the feature maps C′₃, C′₄, C′₅;
step 402, denoting feature map C′₅ as S′₅: S′₅ is reduced to 1×1×256 by global average pooling and then passed through two fully connected layers, where the first fully connected layer reduces the number of channels to a fraction of that of S′₅ and the second restores the channel count of S′₅; the 1×1×256 feature obtained after the two fully connected layers is matrix-multiplied with C′₄ to obtain S′₄; similarly, C′₄, after global average pooling and two fully connected layers, is matrix-multiplied with C′₃ to obtain S′₃;
step 403, applying a 3×3 convolution to S′₃, S′₄ and S′₅ to eliminate the aliasing effect introduced by the global average pooling and the two fully connected layers, obtaining the fused feature maps S₃, S₄, S₅;
Step five, carrying out position attention enhancement on the fused feature map to obtain an enhanced feature map:
step 501, convolving the fused feature map S₃ with 1×1 convolution kernels whose output channel number is a fraction of the original to obtain the query feature matrix S₃_query; convolving S₃ with 1×1 convolution kernels of the same reduced channel number to obtain the key feature matrix S₃_key; convolving S₃ with 1×1 convolution kernels keeping the channel number unchanged to obtain the value feature matrix S₃_value; applying the same three convolutions to S₄ yields S₄_query, S₄_key and S₄_value, and to S₅ yields S₅_query, S₅_key and S₅_value:

S_i_query = W_query ⊗ S_i, S_i_key = W_key ⊗ S_i, S_i_value = W_value ⊗ S_i

where S_i denotes the i-th feature map, W denotes the weight matrix of the corresponding convolution kernel, and ⊗ denotes convolution;
step 502, calculating the similarity P′_i between the query feature matrix S_i_query and the key feature matrix S_i_key; the similarity is computed by matrix multiplication followed by a softmax function:

P′_i = softmax(S_i_query × (S_i_key)ᵀ)

where P′_i denotes the similarity matrix obtained from the i-th (i = 3, 4, 5) feature map;
step 503, matrix-multiplying the similarity matrix P′_i with the value feature matrix S_i_value to obtain the attention feature enhancement matrix:

P″_i = P′_i × S_i_value
step six, enhancing the position information of the attention feature maps through skip connections with the original feature maps:
step 601, adding the attention feature maps P″₃, P″₄, P″₅ to the original feature maps C′₃, C′₄, C′₅ respectively, then applying a 1×1 convolution keeping the channel number of the original feature map to obtain the enhanced feature maps P₃, P₄, P₅;
step 602, downsampling the enhanced feature map P₅ with a 3×3 convolution keeping the channel number unchanged to obtain the feature map P₆, and downsampling the obtained feature map P₆ with a 3×3 convolution keeping the channel number unchanged to obtain the feature map P₇;
Step seven, carrying out anchor-frame-free regression prediction on the enhanced feature map by utilizing a full convolution network to obtain a final detection result:
step 701, anchor-free regression prediction is performed pixel by pixel, so every point on the feature map makes a prediction; given a position point (x, y) and the ground-truth bounding box (x0, y0, x1, y1), the offsets to be predicted are:

l* = x - x0, t* = y - y0
r* = x1 - x, b* = y1 - y
step 702, calculating the centrality between the position point and the ground-truth bounding box: the centrality equals 1 when the position point coincides with the center of the bounding box, equals 0 if and only if the position point lies at one of the four vertices, and lies in (0, 1) at all other positions; the centrality is multiplied by the corresponding classification score to compute the final score used for ranking the detected bounding boxes;
step 703, inputting the enhanced feature maps P₃, P₄, P₅, P₆, P₇ into the regression sub-network, the centrality regression sub-network, the classification sub-network and the angle classification sub-network respectively;
step 704, training the whole network with a loss function Loss that combines four terms, where N_pos denotes the number of positive samples, μ, λ and η denote the balance factors between the classification and regression terms, and L_cls, L_reg, L_center_reg and L_bcl denote the classification loss, the bounding box regression loss, the centrality regression loss and the binary smooth label coding loss respectively;
step 705, passing the enhanced feature maps P₃, P₄, P₅, P₆, P₇ through the regression sub-network, the classification sub-network, the centrality regression sub-network and the angle classification sub-network respectively, then applying a non-maximum suppression algorithm to obtain the final prediction boxes and outputting the ship SAR image target detection result.
CN202110841106.4A 2021-07-23 2021-07-23 SAR image rotation target detection method based on smooth label coding and feature enhancement Active CN113657181B (en)

Priority Applications (1)

Application Number: CN202110841106.4A; Priority Date / Filing Date: 2021-07-23; Title: SAR image rotation target detection method based on smooth label coding and feature enhancement


Publications (2)

CN113657181A (en), published 2021-11-16
CN113657181B (en), granted 2024-01-23

Family

ID=78490092

Family Applications (1)

CN202110841106.4A (granted as CN113657181B): SAR image rotation target detection method based on smooth label coding and feature enhancement

Country Status (1)

CN (1): CN113657181B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114119582B (en) * 2021-12-01 2024-04-26 安徽大学 Synthetic aperture radar image target detection method
CN114612743A (en) * 2022-03-10 2022-06-10 北京百度网讯科技有限公司 Deep learning model training method, target object identification method and device
CN115272856B (en) * 2022-07-28 2023-04-04 北京卫星信息工程研究所 Ship target fine-grained identification method and equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109145993A (en) * 2018-08-27 2019-01-04 大连理工大学 SAR image classification method based on multiple features Yu non-negative autocoder
CN110414531A (en) * 2019-03-19 2019-11-05 中船(浙江)海洋科技有限公司 SAR image Local Feature Extraction based on gradient ratio
CN111563414A (en) * 2020-04-08 2020-08-21 西北工业大学 SAR image ship target detection method based on non-local feature enhancement
CN112329542A (en) * 2020-10-10 2021-02-05 中国人民解放军战略支援部队航天工程大学 SAR image ship target detection method based on feature refined network model
CN112487900A (en) * 2020-11-20 2021-03-12 中国人民解放军战略支援部队航天工程大学 SAR image ship target detection method based on feature fusion
CN112597815A (en) * 2020-12-07 2021-04-02 西北工业大学 Synthetic aperture radar image ship detection method based on Group-G0 model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110472483B (en) * 2019-07-02 2022-11-15 五邑大学 SAR image-oriented small sample semantic feature enhancement method and device


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Xue Yang et al., "On the arbitrary-oriented object detection: Classification based approaches revisited," arXiv (full text) *
杨龙 (Yang Long) et al., "A SAR image ship target detection algorithm based on deep multi-scale feature fusion CNN," 光学学报 (Acta Optica Sinica), vol. 40, no. 2 (full text) *

Also Published As

CN113657181A (en), published 2021-11-16

Similar Documents

Publication Publication Date Title
CN113657181B (en) SAR image rotation target detection method based on smooth label coding and feature enhancement
Milioto et al. Lidar panoptic segmentation for autonomous driving
Gupta et al. Monitoring and surveillance of urban road traffic using low altitude drone images: a deep learning approach
Rani LittleYOLO-SPP: A delicate real-time vehicle detection algorithm
CN111563414B (en) SAR image ship target detection method based on non-local feature enhancement
Chen et al. MSARN: A deep neural network based on an adaptive recalibration mechanism for multiscale and arbitrary-oriented SAR ship detection
CN112818903A (en) Small sample remote sensing image target detection method based on meta-learning and cooperative attention
Takeki et al. Combining deep features for object detection at various scales: finding small birds in landscape images
Raghavan et al. Optimized building extraction from high-resolution satellite imagery using deep learning
CN108960190B (en) SAR video target detection method based on FCN image sequence model
Suhr et al. End-to-end trainable one-stage parking slot detection integrating global and local information
Xu et al. Fast ship detection combining visual saliency and a cascade CNN in SAR images
CN112287859A (en) Object recognition method, device and system, computer readable storage medium
CN115546681A (en) Asynchronous feature tracking method and system based on events and frames
Al Said et al. Retracted: An unmanned aerial vehicles navigation system on the basis of pattern recognition applications—Review of implementation options and prospects for development
Yildirim et al. Ship detection in optical remote sensing images using YOLOv4 and Tiny YOLOv4
Liu et al. SRM-FPN: a small target detection method based on FPN optimized feature
Zhang et al. CR-YOLOv8: Multiscale object detection in traffic sign images
Dai et al. CODNet: A center and orientation detection network for power line following navigation
CN113269147A (en) Three-dimensional detection method and system based on space and shape, and storage and processing device
Cimarelli et al. Faster visual-based localization with mobile-posenet
CN111680680A (en) Object code positioning method and device, electronic equipment and storage medium
CN116486238A (en) Target fine granularity identification method combining point set representation and graph classification
CN115345932A (en) Laser SLAM loop detection method based on semantic information
Chen et al. Real-time maritime obstacle detection based on YOLOv5 for autonomous berthing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant