CN113657181B - SAR image rotation target detection method based on smooth label coding and feature enhancement - Google Patents

SAR image rotation target detection method based on smooth label coding and feature enhancement

Info

Publication number
CN113657181B
Authority: CN (China)
Prior art keywords: feature map, convolution, network, feature, SAR image
Prior art date: 2021-07-23
Legal status: Active (granted)
Application number: CN202110841106.4A
Other languages: Chinese (zh)
Other versions: CN113657181A
Inventor
蒋雯
赵子豪
耿杰
Current Assignee: Northwestern Polytechnical University
Original Assignee: Northwestern Polytechnical University
Application filed by Northwestern Polytechnical University
Priority: CN202110841106.4A, priority date 2021-07-23
Published as CN113657181A (2021-11-16); granted as CN113657181B (2024-01-23)


Classifications

    • G06F18/22 Pattern recognition; Analysing; Matching criteria, e.g. proximity measures
    • G06F18/253 Pattern recognition; Analysing; Fusion techniques of extracted features
    • G06N3/045 Neural networks; Architecture; Combinations of networks
    • G06N3/08 Neural networks; Learning methods

Abstract

The invention discloses a SAR image rotation target detection method based on smooth label coding and feature enhancement, which comprises the following steps: inputting a ship SAR image dataset and converting it into the PASCAL VOC format; converting the arbitrarily oriented quadrilateral labels into rectangular labels plus rotation angles, and converting the rotation angles into binary smooth label codes; constructing a convolutional neural network ResNet-50 and training it with the prepared ship SAR image dataset; fusing the feature maps output by ResNet-50 with a global average pooling module to obtain fused feature maps; applying position attention enhancement to the fused feature maps to obtain attention feature maps; enhancing the position information of the attention feature maps through skip connections with the original feature maps; and performing anchor-free regression prediction on the enhanced feature maps with a fully convolutional network to obtain the final detection result. The invention combines global feature fusion with a position attention mechanism, adds an angle classification branch to an anchor-free detection framework, and proposes a new, more reasonable centrality computation. The method not only solves the problem that ship targets in SAR images are difficult to locate accurately, but also improves the prediction accuracy of the ship target rotation angle.

Description

SAR image rotation target detection method based on smooth label coding and feature enhancement
Technical Field
The invention belongs to the field of intelligent interpretation of remote sensing images, and particularly relates to a SAR image rotation target detection method based on smooth label coding and feature enhancement.
Background
Synthetic aperture radar (SAR) is an active microwave remote sensor that images a wide variety of targets at high resolution, operates day and night and in all weather, and offers strong penetration, strong anti-jamming capability, and long operating range; it has been widely applied in military reconnaissance, marine applications, agricultural and forestry monitoring, and other fields. With the development of synthetic aperture radar imaging technology, SAR image target detection has found broad military and civil use, and ship target detection in SAR images is one of its hot topics.
With the development of artificial intelligence, deep learning models are increasingly used for target detection tasks. Target detection algorithms based on convolutional neural networks fall into two broad categories: two-stage detectors such as Faster R-CNN and Mask R-CNN, which split detection into candidate region extraction followed by category and position prediction for the candidates, and single-stage detectors such as YOLO (e.g., YOLOv2) and RetinaNet, which directly predict the category and position of targets by regression.
However, existing convolutional-neural-network-based detection algorithms are mainly designed for optical image data. Unlike targets in conventional detection tasks, ship targets in SAR images appear in arbitrary orientations, and annotating them with arbitrarily oriented bounding boxes reduces the background clutter enclosed by axis-aligned rectangular boxes and thus enables more accurate detection. The publicly released SAR ship dataset SSDD+ likewise adopts arbitrary quadrilateral annotations. Rotation angles can be represented via the OpenCV definition, the long-side definition, or an ordered quadrilateral, but these representations produce large loss values caused by the periodicity of angles (PoA), the exchangeability of edges (EoE), and disordered point regression. SAR ship detection also faces problems rooted in the imagery itself, mainly: (1) ship targets in low-resolution SAR images are small, and their positions in the deep feature maps of a convolutional neural network are coarse, so ships are hard to locate and may even be missed; (2) ship SAR images have complex backgrounds and heavy noise, which makes targets harder to distinguish from the background and degrades detection.
Disclosure of Invention
Aiming at these technical problems, the invention provides a SAR image rotation target detection method based on smooth label coding and feature enhancement. The smooth label coding resolves the rotation-angle regression problems caused by angle periodicity and edge exchangeability; a new combined feature enhancement scheme resolves the imbalance between feature layers, strengthens feature robustness, improves the detection of small ship targets, and raises ship detection precision. The detection framework is built on the anchor-free detection structure FCOS, and the invention improves the centrality computation to make it more reasonable.
The technical solution adopted by the invention is a SAR image rotation target detection method based on smooth label coding and feature enhancement, characterized by comprising the following steps:
step one, inputting a ship SAR image data set, and converting the data set into a PASCAL VOC format:
step 101, adopting the publicly released SSDD+ ship SAR image dataset, comprising 1160 images in total, in which each instance is annotated with an arbitrary quadrilateral;
step 102, dividing the SSDD+ dataset into training, test and validation sets in a 7:2:1 ratio and inputting them into the SAR image ship target detection model;
step two, converting quadrilateral labels in any rotation direction into rectangular labels and rotation angles, and converting the rotation angles into binary smooth label codes:
step 201, reading the bounding boxes contained in an image, computing the minimum enclosing rectangle of each arbitrary quadrilateral bounding box, and returning the rectangle as the point set (x, y, w, h) together with its rotation angle θ;
step 202, converting the rotation angle θ into a binary smooth label code, where AR is the angle range and ω is the angle interval:

bcl = Bin(θ/ω, log₂(AR/ω))

where the Bin() function converts θ/ω into a binary code of length log₂(AR/ω);
step 203, converting the binary code label into a binary smooth label; the binary codes of adjacent angles can differ greatly in form, so the distance between adjacent angles is partly lost and the code may change sharply from one angle to the next; the smooth label form therefore adds a discount to the binary code:

y_bcl = bcl·(1 - α) + α/log₂(AR/ω)

where α is a small hyperparameter, typically 0.1, and y_bcl denotes the resulting binary smooth label code;
step three, constructing a convolutional neural network ResNet-50 and training the network with a ship SAR image dataset:
step 301, constructing a ResNet-50 network as the backbone, which contains 5 convolution modules, namely Conv1, Conv2_x, Conv3_x, Conv4_x and Conv5_x;
step 302, extracting features of the SAR image with the ResNet-50 network and outputting the feature maps of the last 3 convolution modules, denoted C₃, C₄ and C₅ respectively;
step four, fusing the feature maps output by ResNet-50 with a global average pooling module to obtain fused feature maps:
step 401, applying 256 convolution kernels of size 1×1 to the feature maps C₃, C₄, C₅ to obtain the feature maps C′₃, C′₄, C′₅;
step 402, denoting feature map C′₅ as S′₅: S′₅ is reduced to 1×1×256 by global average pooling and then passed through two fully connected layers, where the first fully connected layer reduces the number of channels to a fraction of that of S′₅ and the second restores the channel count of S′₅; the 1×1×256 feature obtained after the two fully connected layers is matrix-multiplied with C′₄ to obtain S′₄; similarly, C′₄, after global average pooling and two fully connected layers, is matrix-multiplied with C′₃ to obtain S′₃;
step 403, applying a 3×3 convolution to S′₃, S′₄ and S′₅ to eliminate the aliasing effect introduced by the global average pooling and the two fully connected layers, obtaining the fused feature maps S₃, S₄, S₅;
Step five, carrying out position attention enhancement on the fused feature map to obtain an enhanced feature map:
step 501, convolving the fused feature map S₃ with 1×1 convolution kernels whose output channel number is a fraction of the original to obtain the query feature matrix S₃_query; convolving S₃ with 1×1 convolution kernels of the same reduced channel number to obtain the key feature matrix S₃_key; convolving S₃ with 1×1 convolution kernels keeping the channel number unchanged to obtain the value feature matrix S₃_value; applying the same three convolutions to S₄ yields S₄_query, S₄_key and S₄_value, and to S₅ yields S₅_query, S₅_key and S₅_value:

S_i_query = W_query ⊗ S_i, S_i_key = W_key ⊗ S_i, S_i_value = W_value ⊗ S_i

where S_i denotes the i-th feature map, W denotes the weight matrix of the corresponding convolution kernel, and ⊗ denotes convolution;
step 502, calculating the similarity P′_i between the query feature matrix S_i_query and the key feature matrix S_i_key; the similarity is computed by matrix multiplication followed by a softmax function:

P′_i = softmax(S_i_query × (S_i_key)ᵀ)

where P′_i denotes the similarity matrix obtained from the i-th (i = 3, 4, 5) feature map;
step 503, matrix-multiplying the similarity matrix P′_i with the value feature matrix S_i_value to obtain the attention feature enhancement matrix:

P″_i = P′_i × S_i_value
step six, enhancing the position information of the attention feature maps through skip connections with the original feature maps:
step 601, adding the attention feature maps P″₃, P″₄, P″₅ to the original feature maps C′₃, C′₄, C′₅ respectively, then applying a 1×1 convolution keeping the channel number of the original feature map to obtain the enhanced feature maps P₃, P₄, P₅;
step 602, downsampling the enhanced feature map P₅ with a 3×3 convolution keeping the channel number unchanged to obtain the feature map P₆, and downsampling the obtained feature map P₆ with a 3×3 convolution keeping the channel number unchanged to obtain the feature map P₇;
step seven, carrying out anchor-free regression prediction on the enhanced feature maps with a fully convolutional network to obtain the final detection result:
step 701, anchor-free regression prediction is performed pixel by pixel, so every point on the feature map makes a prediction; given a position point (x, y) and the ground-truth bounding box (x0, y0, x1, y1), the offsets to be predicted are:

l* = x - x0, t* = y - y0
r* = x1 - x, b* = y1 - y
step 702, calculating the centrality between the position point and the ground-truth bounding box: the centrality equals 1 when the position point coincides with the center of the bounding box, equals 0 if and only if the position point lies at one of the four vertices, and lies in (0, 1) at all other positions; the centrality is multiplied by the corresponding classification score to compute the final score used for ranking the detected bounding boxes;
step 703, inputting the enhanced feature maps P₃, P₄, P₅, P₆, P₇ into the regression sub-network, the centrality regression sub-network, the classification sub-network and the angle classification sub-network respectively;
step 704, training the whole network with a loss function Loss that combines four terms, where N_pos denotes the number of positive samples, μ, λ and η denote the balance factors between the classification and regression terms, and L_cls, L_reg, L_center_reg and L_bcl denote the classification loss, the bounding box regression loss, the centrality regression loss and the binary smooth label coding loss respectively;
step 705, passing the enhanced feature maps P₃, P₄, P₅, P₆, P₇ through the regression sub-network, the classification sub-network, the centrality regression sub-network and the angle classification sub-network respectively, then applying a non-maximum suppression algorithm to obtain the final prediction boxes and outputting the ship SAR image target detection result.
Compared with the prior art, the invention has the following advantages:
Firstly, the invention provides a SAR image rotation target detection method based on binary smooth label coding, which converts the rotation angle into a binary code and predicts it by classification. Because adjacent binary codes can differ greatly in form, the distance between adjacent angles is partly lost and the code can change sharply between neighboring angles; adding the discounted smooth label form to the binary code therefore reduces these large differences between adjacent codes, avoids overfitting, and improves the generalization ability of angle prediction.
Secondly, the invention fuses high-level semantic information into the low-level features through the global average pooling module, and then enhances position information through the position attention mechanism, so that both the semantic and the position information across feature layers are strengthened; this reduces the influence of SAR image ship target noise on detection and enhances the detection capability.
Thirdly, the detection framework is built on the anchor-free detection structure FCOS. The traditional centrality computation assigns zero centrality to every point on the boundary of the box, whereas the improved computation distinguishes position points on different parts of the boundary from the center of the bounding box and reaches zero only at the four vertices, which makes the computation more reasonable.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a schematic diagram of a global average pooling feature fusion model structure according to the present invention.
FIG. 3 is a schematic diagram of a position attention model according to the present invention.
Detailed Description
The method of the present invention will be described in further detail with reference to the accompanying drawings.
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The invention will be described in detail below with reference to the drawings in connection with embodiments.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments in accordance with the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be capable of being practiced otherwise than as specifically illustrated and described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Spatially relative terms, such as "above … …," "above … …," "upper surface at … …," "above," and the like, may be used herein for ease of description to describe one device or feature's spatial location relative to another device or feature as illustrated in the figures. It will be understood that the spatially relative terms are intended to encompass different orientations in use or operation in addition to the orientation depicted in the figures. For example, if the device in the figures is turned over, elements described as "above" or "over" other devices or structures would then be oriented "below" or "beneath" the other devices or structures. Thus, the exemplary term "above … …" may include both orientations of "above … …" and "below … …". The device may also be positioned in other different ways (rotated 90 degrees or at other orientations) and the spatially relative descriptors used herein interpreted accordingly.
As shown in FIG. 1, the rationality and effectiveness of the invention are illustrated by taking the publicly released SSDD+ ship SAR image dataset as an example, with the following specific steps:
step one, inputting a ship SAR image data set, and converting the data set into a PASCAL VOC format:
step 101, adopting the publicly released SSDD+ ship SAR image dataset, comprising 1160 images in total, in which each instance is annotated with an arbitrary quadrilateral;
step 102, dividing the SSDD+ dataset into training, test and validation sets in a 7:2:1 ratio and inputting them into the SAR image ship target detection model;
step two, converting quadrilateral labels in any rotation direction into rectangular labels and rotation angles, and converting the rotation angles into binary smooth label codes:
step 201, reading the bounding boxes contained in an image, computing the minimum enclosing rectangle of each arbitrary quadrilateral bounding box, and returning the rectangle as the point set (x, y, w, h) together with its rotation angle θ;
step 202, converting a rotation angle such as θ = 30° into a binary-coded smooth label, where the angle range is AR = 180 and the angle interval is ω = 180/128:

bcl = Bin(θ/ω, log₂(AR/ω)) = [0, 0, 1, 0, 1, 0, 1]

i.e. the Bin() function converts θ/ω ≈ 21.33, taken as the bin index 21, into the binary code [0, 0, 1, 0, 1, 0, 1] of length log₂(AR/ω) = 7;
step 203, converting the binary coded label into a binary smooth label code; the binary codes of adjacent angles can differ greatly in form, so the distance between adjacent angles is partly lost and the code may change sharply from one angle to the next; the smooth label form therefore adds a discount to the binary code:

y_bcl = bcl·(1 - α) + α/log₂(AR/ω)

With α = 0.1, the example binary code from step 202 is converted into the binary smooth label code y_bcl ≈ [0.014, 0.014, 0.914, 0.014, 0.914, 0.014, 0.914].
Thirdly, constructing a convolutional neural network ResNet-50, and training the network by using the manufactured ship SAR image data set:
step 301, constructing a ResNet-50 network as the backbone, which contains 5 convolution modules, namely Conv1, Conv2_x, Conv3_x, Conv4_x and Conv5_x; the ship SAR image is scaled to 800×1024, normalized, and input into Conv1;
step 302, extracting features of the SAR image with the ResNet-50 network and outputting the feature maps of the last 3 convolution modules, C₃, C₄ and C₅, where C₃ consists of 512 feature maps of size 100×128, C₄ of 1024 feature maps of size 50×64, and C₅ of 2048 feature maps of size 25×32;
fusing the features of the feature map output by the ResNet-50 by using a global averaging pooling module to obtain a fused feature map:
step 401, applying 256 convolution kernels of size 1×1 to the feature maps C₃, C₄, C₅ to obtain the feature maps C′₃, C′₄, C′₅, where C′₃ consists of 256 feature maps of size 100×128, C′₄ of 256 feature maps of size 50×64, and C′₅ of 256 feature maps of size 25×32;
step 402, denoting feature map C′₅ as S′₅: S′₅ is reduced by global average pooling to 256 feature maps of size 1×1 and then passed through two fully connected layers, where the first fully connected layer reduces the number of channels to a fraction of that of S′₅ and the second restores the channel count of S′₅; the 256 feature maps of size 1×1 obtained after the two fully connected layers are matrix-multiplied with C′₄ to obtain S′₄; similarly, C′₄, after global average pooling and two fully connected layers, is matrix-multiplied with C′₃ to obtain S′₃, where S′₃ consists of 256 feature maps of size 100×128, S′₄ of 256 feature maps of size 50×64, and S′₅ of 256 feature maps of size 25×32;
step 403, checking S 'by using convolution of 3×3' 3 、S′ 4 、S′ 5 Proceeding withConvolution operation is carried out, so that aliasing effects caused by global average pooling and twice full connection layers are eliminated, and a fused characteristic diagram S is obtained 3 ,S 4 ,S 5
Step five, carrying out position attention enhancement on the fused feature map to obtain an enhanced feature map:
step 501, convolving the fused feature map S₃ with 1×1 convolution kernels whose output channel number is a fraction of the original to obtain the query feature matrix S₃_query; convolving S₃ with 1×1 convolution kernels of the same reduced channel number to obtain the key feature matrix S₃_key; convolving S₃ with 1×1 convolution kernels keeping the channel number unchanged to obtain the value feature matrix S₃_value; applying the same three convolutions to S₄ yields S₄_query, S₄_key and S₄_value, and to S₅ yields S₅_query, S₅_key and S₅_value:

S_i_query = W_query ⊗ S_i, S_i_key = W_key ⊗ S_i, S_i_value = W_value ⊗ S_i

where S_i denotes the i-th feature map, W denotes the weight matrix of the corresponding convolution kernel, and ⊗ denotes convolution;
step 502, calculating the similarity P′_i between the query feature matrix S_i_query and the key feature matrix S_i_key; the similarity is computed by matrix multiplication followed by a softmax function:

P′_i = softmax(S_i_query × (S_i_key)ᵀ)

where P′_i denotes the similarity matrix obtained from the i-th (i = 3, 4, 5) feature map;
step 503, matrix-multiplying the similarity matrix P′_i with the value feature matrix S_i_value to obtain the attention feature enhancement matrix:

P″_i = P′_i × S_i_value
step six, enhancing the position information of the attention feature maps through skip connections with the original feature maps:
step 601, adding the attention feature maps P″₃, P″₄, P″₅ to the original feature maps C′₃, C′₄, C′₅ respectively, then applying a 1×1 convolution keeping the channel number of the original feature map to obtain the enhanced feature maps P₃, P₄, P₅;
step 602, downsampling the enhanced feature map P₅ with a 3×3 convolution keeping the channel number unchanged to obtain the feature map P₆, and downsampling the obtained feature map P₆ with a 3×3 convolution keeping the channel number unchanged to obtain the feature map P₇, where P₃ consists of 256 feature maps of size 100×128, P₄ of 256 feature maps of size 50×64, P₅ of 256 feature maps of size 25×32, P₆ of 256 feature maps of size 13×16, and P₇ of 256 feature maps of size 7×8;
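A sketch of the skip connection of step 601 and the extra levels of step 602; stride-2 3×3 convolutions are assumed for the downsampling, which reproduces the reported sizes 25×32 → 13×16 → 7×8:

    import torch.nn as nn

    class SkipEnhance(nn.Module):
        def __init__(self, channels=256):
            super().__init__()
            # step 601: 1x1 convolutions applied after the skip-connection additions
            self.fuse = nn.ModuleList(
                nn.Conv2d(channels, channels, kernel_size=1) for _ in range(3))
            # step 602: two stride-2 3x3 convolutions produce P6 and P7
            self.down6 = nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1)
            self.down7 = nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1)

        def forward(self, attn_maps, lateral_maps):
            # attn_maps = (P''3, P''4, P''5), lateral_maps = (C'3, C'4, C'5)
            p3, p4, p5 = (f(a + c) for f, a, c in zip(self.fuse, attn_maps, lateral_maps))
            p6 = self.down6(p5)   # 25x32 -> 13x16
            p7 = self.down7(p6)   # 13x16 -> 7x8
            return p3, p4, p5, p6, p7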
step seven, carrying out anchor-free regression prediction on the enhanced feature maps with a fully convolutional network to obtain the final detection result:
step 701, anchor-free regression prediction is performed pixel by pixel, so every point on the feature map makes a prediction; given a position point (x, y) and the ground-truth bounding box (x0, y0, x1, y1), the offsets to be predicted are:

l* = x - x0, t* = y - y0
r* = x1 - x, b* = y1 - y
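In code form, the regression targets of step 701 are a direct transcription of the four offset formulas above:

    def regression_targets(x, y, box):
        """Offsets from a location (x, y) to the edges of a ground-truth box."""
        x0, y0, x1, y1 = box
        return (x - x0,   # l*: distance to the left edge
                y - y0,   # t*: distance to the top edge
                x1 - x,   # r*: distance to the right edge
                y1 - y)   # b*: distance to the bottom edge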
step 702, calculating the centrality between the position point and the ground-truth bounding box: the centrality equals 1 when the position point coincides with the center of the bounding box, equals 0 if and only if the position point lies at one of the four vertices, and lies in (0, 1) at all other positions; the centrality is multiplied by the corresponding classification score to compute the final score used for ranking the detected bounding boxes;
Step 703, enhancing the feature map P 3 、P 4 、P 5 、P 6 、P 7 Respectively inputting the data into a regression sub-network, a centrality regression sub-network, a classification sub-network and an angle classification sub-network;
step 704, training the whole network with a loss function Loss that combines four terms, where N_pos denotes the number of positive samples, μ, λ and η denote the balance factors between the classification and regression terms, and L_cls, L_reg, L_center_reg and L_bcl denote the classification loss, the bounding box regression loss, the centrality regression loss and the binary smooth label coding loss respectively;
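One plausible reading of the Loss combination of step 704, with the three balance factors weighting the regression, centrality and angle terms (the exact grouping and normalization are assumptions):

    def total_loss(l_cls, l_reg, l_center_reg, l_bcl, n_pos, mu=1.0, lam=1.0, eta=1.0):
        # Loss = (L_cls + mu*L_reg + lam*L_center_reg + eta*L_bcl) / N_pos  (assumed form)
        return (l_cls + mu * l_reg + lam * l_center_reg + eta * l_bcl) / max(n_pos, 1)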
step 705, passing the enhanced feature maps P₃, P₄, P₅, P₆, P₇ through the regression sub-network, the classification sub-network, the centrality regression sub-network and the angle classification sub-network respectively, then applying a non-maximum suppression algorithm to obtain the final prediction boxes and outputting the ship SAR image target detection result.
The foregoing is merely one embodiment of the present invention, and the invention is not limited thereto; any simple modification, variation or equivalent structural change made to the foregoing embodiment according to the technical substance of the invention still falls within the protection scope of the technical solution of the invention.

Claims (1)

1. A SAR image rotation target detection method based on smooth label coding and feature enhancement, characterized by comprising the following steps:
step one, inputting a ship SAR image data set, and converting the data set into a PASCAL VOC format:
step 101, adopting the publicly released SSDD+ ship SAR image dataset, comprising 1160 images in total, in which each instance is annotated with an arbitrary quadrilateral;
step 102, dividing the SSDD+ dataset into training, test and validation sets in a 7:2:1 ratio and inputting them into the SAR image ship target detection model;
step two, converting quadrilateral labels in any rotation direction into rectangular labels and rotation angles, and converting the rotation angles into binary smooth label codes:
step 201, reading the bounding boxes contained in an image, computing the minimum enclosing rectangle of each arbitrary quadrilateral bounding box, and returning the rectangle as the point set (x, y, w, h) together with its rotation angle θ;
step 202, converting the rotation angle θ into a binary smooth label code, where AR is the angle range and ω is the angle interval:

bcl = Bin(θ/ω, log₂(AR/ω))

where the Bin() function converts θ/ω into a binary code of length log₂(AR/ω);
step 203, converting the binary code label into a binary smooth label; the binary codes of adjacent angles can differ greatly in form, so the distance between adjacent angles is partly lost and the code may change sharply from one angle to the next; the smooth label form therefore adds a discount to the binary code:

y_bcl = bcl·(1 - α) + α/log₂(AR/ω)

where α is a small hyperparameter, typically 0.1, and y_bcl denotes the resulting binary smooth label code;
step three, constructing a convolutional neural network ResNet-50 and training the network with a ship SAR image dataset:
step 301, constructing a ResNet-50 network as the backbone, which contains 5 convolution modules, namely Conv1, Conv2_x, Conv3_x, Conv4_x and Conv5_x;
step 302, extracting features of the SAR image with the ResNet-50 network and outputting the feature maps of the last 3 convolution modules, denoted C₃, C₄ and C₅ respectively;
step four, fusing the feature maps output by ResNet-50 with a global average pooling module to obtain fused feature maps:
step 401, applying 256 convolution kernels of size 1×1 to the feature maps C₃, C₄, C₅ to obtain the feature maps C′₃, C′₄, C′₅;
step 402, denoting feature map C′₅ as S′₅: S′₅ is reduced to 1×1×256 by global average pooling and then passed through two fully connected layers, where the first fully connected layer reduces the number of channels to a fraction of that of S′₅ and the second restores the channel count of S′₅; the 1×1×256 feature obtained after the two fully connected layers is matrix-multiplied with C′₄ to obtain S′₄; similarly, C′₄, after global average pooling and two fully connected layers, is matrix-multiplied with C′₃ to obtain S′₃;
step 403, applying a 3×3 convolution to S′₃, S′₄ and S′₅ to eliminate the aliasing effect introduced by the global average pooling and the two fully connected layers, obtaining the fused feature maps S₃, S₄, S₅;
Step five, carrying out position attention enhancement on the fused feature map to obtain an enhanced feature map:
step 501, convolving the fused feature map S₃ with 1×1 convolution kernels whose output channel number is a fraction of the original to obtain the query feature matrix S₃_query; convolving S₃ with 1×1 convolution kernels of the same reduced channel number to obtain the key feature matrix S₃_key; convolving S₃ with 1×1 convolution kernels keeping the channel number unchanged to obtain the value feature matrix S₃_value; applying the same three convolutions to S₄ yields S₄_query, S₄_key and S₄_value, and to S₅ yields S₅_query, S₅_key and S₅_value:

S_i_query = W_query ⊗ S_i, S_i_key = W_key ⊗ S_i, S_i_value = W_value ⊗ S_i

where S_i denotes the i-th feature map, W denotes the weight matrix of the corresponding convolution kernel, and ⊗ denotes convolution;
step 502, calculating the similarity P′_i between the query feature matrix S_i_query and the key feature matrix S_i_key; the similarity is computed by matrix multiplication followed by a softmax function:

P′_i = softmax(S_i_query × (S_i_key)ᵀ)

where P′_i denotes the similarity matrix obtained from the i-th (i = 3, 4, 5) feature map;
step 503, matrix-multiplying the similarity matrix P′_i with the value feature matrix S_i_value to obtain the attention feature enhancement matrix:

P″_i = P′_i × S_i_value
step six, enhancing the position information of the attention feature maps through skip connections with the original feature maps:
step 601, adding the attention feature maps P″₃, P″₄, P″₅ to the original feature maps C′₃, C′₄, C′₅ respectively, then applying a 1×1 convolution keeping the channel number of the original feature map to obtain the enhanced feature maps P₃, P₄, P₅;
step 602, downsampling the enhanced feature map P₅ with a 3×3 convolution keeping the channel number unchanged to obtain the feature map P₆, and downsampling the obtained feature map P₆ with a 3×3 convolution keeping the channel number unchanged to obtain the feature map P₇;
Step seven, carrying out anchor-frame-free regression prediction on the enhanced feature map by utilizing a full convolution network to obtain a final detection result:
step 701, anchor-free regression prediction is performed pixel by pixel, so every point on the feature map makes a prediction; given a position point (x, y) and the ground-truth bounding box (x0, y0, x1, y1), the offsets to be predicted are:

l* = x - x0, t* = y - y0
r* = x1 - x, b* = y1 - y
step 702, calculating the centrality between the position point and the ground-truth bounding box: the centrality equals 1 when the position point coincides with the center of the bounding box, equals 0 if and only if the position point lies at one of the four vertices, and lies in (0, 1) at all other positions; the centrality is multiplied by the corresponding classification score to compute the final score used for ranking the detected bounding boxes;
step 703, inputting the enhanced feature maps P₃, P₄, P₅, P₆, P₇ into the regression sub-network, the centrality regression sub-network, the classification sub-network and the angle classification sub-network respectively;
step 704, training the whole network with a loss function Loss that combines four terms, where N_pos denotes the number of positive samples, μ, λ and η denote the balance factors between the classification and regression terms, and L_cls, L_reg, L_center_reg and L_bcl denote the classification loss, the bounding box regression loss, the centrality regression loss and the binary smooth label coding loss respectively;
step 705, passing the enhanced feature maps P₃, P₄, P₅, P₆, P₇ through the regression sub-network, the classification sub-network, the centrality regression sub-network and the angle classification sub-network respectively, then applying a non-maximum suppression algorithm to obtain the final prediction boxes and outputting the ship SAR image target detection result.
CN202110841106.4A 2021-07-23 2021-07-23 SAR image rotation target detection method based on smooth label coding and feature enhancement Active CN113657181B (en)

Priority Applications (1)

Application Number: CN202110841106.4A; Priority Date / Filing Date: 2021-07-23; Title: SAR image rotation target detection method based on smooth label coding and feature enhancement


Publications (2)

CN113657181A (en), published 2021-11-16
CN113657181B (en), granted 2024-01-23

Family

ID=78490092

Family Applications (1)

CN202110841106.4A (granted as CN113657181B): SAR image rotation target detection method based on smooth label coding and feature enhancement

Country Status (1)

CN (1): CN113657181B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114119582B (en) * 2021-12-01 2024-04-26 安徽大学 Synthetic aperture radar image target detection method
CN114612743A (en) * 2022-03-10 2022-06-10 北京百度网讯科技有限公司 Deep learning model training method, target object identification method and device
CN115272856B (en) * 2022-07-28 2023-04-04 北京卫星信息工程研究所 Ship target fine-grained identification method and equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109145993A (en) * 2018-08-27 2019-01-04 大连理工大学 SAR image classification method based on multiple features Yu non-negative autocoder
CN110414531A (en) * 2019-03-19 2019-11-05 中船(浙江)海洋科技有限公司 SAR image Local Feature Extraction based on gradient ratio
CN111563414A (en) * 2020-04-08 2020-08-21 西北工业大学 SAR image ship target detection method based on non-local feature enhancement
CN112329542A (en) * 2020-10-10 2021-02-05 中国人民解放军战略支援部队航天工程大学 SAR image ship target detection method based on feature refined network model
CN112487900A (en) * 2020-11-20 2021-03-12 中国人民解放军战略支援部队航天工程大学 SAR image ship target detection method based on feature fusion
CN112597815A (en) * 2020-12-07 2021-04-02 西北工业大学 Synthetic aperture radar image ship detection method based on Group-G0 model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110472483B (en) * 2019-07-02 2022-11-15 五邑大学 SAR image-oriented small sample semantic feature enhancement method and device


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Xue Yang et al., "On the arbitrary-oriented object detection: Classification based approaches revisited," arXiv (full text) *
杨龙 (Yang Long) et al., "A SAR image ship target detection algorithm based on deep multi-scale feature fusion CNN," 光学学报 (Acta Optica Sinica), vol. 40, no. 2 (full text) *

Also Published As

CN113657181A (en), published 2021-11-16

Similar Documents

Publication Publication Date Title
CN113657181B (en) SAR image rotation target detection method based on smooth label coding and feature enhancement
Milioto et al. Lidar panoptic segmentation for autonomous driving
Gupta et al. Monitoring and surveillance of urban road traffic using low altitude drone images: a deep learning approach
Rani LittleYOLO-SPP: A delicate real-time vehicle detection algorithm
CN111563414B (en) SAR image ship target detection method based on non-local feature enhancement
Chen et al. MSARN: A deep neural network based on an adaptive recalibration mechanism for multiscale and arbitrary-oriented SAR ship detection
CN112818903A (en) Small sample remote sensing image target detection method based on meta-learning and cooperative attention
Takeki et al. Combining deep features for object detection at various scales: finding small birds in landscape images
Raghavan et al. Optimized building extraction from high-resolution satellite imagery using deep learning
CN108960190B (en) SAR video target detection method based on FCN image sequence model
Suhr et al. End-to-end trainable one-stage parking slot detection integrating global and local information
Xu et al. Fast ship detection combining visual saliency and a cascade CNN in SAR images
CN112287859A (en) Object recognition method, device and system, computer readable storage medium
CN115546681A (en) Asynchronous feature tracking method and system based on events and frames
Al Said et al. Retracted: An unmanned aerial vehicles navigation system on the basis of pattern recognition applications—Review of implementation options and prospects for development
Yildirim et al. Ship detection in optical remote sensing images using YOLOv4 and Tiny YOLOv4
Liu et al. SRM-FPN: a small target detection method based on FPN optimized feature
Zhang et al. CR-YOLOv8: Multiscale object detection in traffic sign images
Dai et al. CODNet: A center and orientation detection network for power line following navigation
CN113269147A (en) Three-dimensional detection method and system based on space and shape, and storage and processing device
Cimarelli et al. Faster visual-based localization with mobile-posenet
CN111680680A (en) Object code positioning method and device, electronic equipment and storage medium
CN116486238A (en) Target fine granularity identification method combining point set representation and graph classification
CN115345932A (en) Laser SLAM loop detection method based on semantic information
Chen et al. Real-time maritime obstacle detection based on YOLOv5 for autonomous berthing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant