CN116953702A - Rotary target detection method and device based on deduction paradigm - Google Patents

Rotary target detection method and device based on deduction paradigm

Info

Publication number
CN116953702A
Authority
CN
China
Prior art keywords
feature
detection
scale
network
small
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310864157.8A
Other languages
Chinese (zh)
Inventor
梁毅
王雅丽
李军辉
邢孟道
戴志玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202310864157.8A priority Critical patent/CN116953702A/en
Publication of CN116953702A publication Critical patent/CN116953702A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S13/00Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
    • G01S13/88Radar or analogous systems specially adapted for specific applications
    • G01S13/89Radar or analogous systems specially adapted for specific applications for mapping or imaging
    • G01S13/90Radar or analogous systems specially adapted for specific applications for mapping or imaging using synthetic aperture techniques, e.g. synthetic aperture radar [SAR] techniques
    • G01S13/9021SAR image post-processing techniques
    • G01S13/9027Pattern recognition for feature extraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/52Scale-space analysis, e.g. wavelet analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/766Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/776Validation; Performance evaluation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Electromagnetism (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a rotary target detection method based on a deductive paradigm, which comprises the following steps: constructing a target detection network framework comprising a trunk feature extraction module, a feature fusion module with a small-scale enhancement branch, and a detection and identification module; performing multi-scale feature extraction on an SAR image with the trunk feature extraction module to obtain a multi-scale feature map that includes small-scale features; fusing the multi-scale feature map with the feature fusion module to obtain multi-level feature maps; and inputting the multi-level feature maps into the detection and identification module, where regression and classification are performed with a rotated detection box based on the deductive paradigm to obtain the target detection result. The method avoids the information redundancy caused by horizontal detection, estimates the orientation of targets, and resolves the angle-prediction boundary problem introduced by rotated annotations, thereby effectively improving ship detection performance in complex scenes and making ship detection in SAR images more stable and accurate.

Description

Rotary target detection method and device based on deduction paradigm
Technical Field
The invention belongs to the technical field of target detection, and particularly relates to a rotary target detection method and device based on a deductive paradigm.
Background
Synthetic aperture radar (SAR), as an active microwave sensor, provides high-resolution images through all-day, all-weather, long-range, and wide-area observation, giving it strong local and global monitoring capability and important value in both military reconnaissance and civilian applications. Ship target detection is a basic function of maritime vessel management systems and underpins subsequent ship identification and tracking, so research on ship detection in large, complex maritime scenes is of great significance. The main problem in SAR image target detection is to extract the target regions of interest from the SAR image while suppressing false alarms caused by environmental and man-made clutter. Existing mainstream SAR ship detection methods can be divided into model-driven traditional detection algorithms and data-driven deep-learning detection algorithms.
Traditional SAR target detection algorithms are mainly represented by constant false alarm rate (CFAR) detection based on the statistical distribution of background clutter and by salient-target detection based on visual attention models. CFAR-based methods struggle to select an appropriate clutter background model, do not generalize well to multi-scale ship detection under complex background clutter at sea, and produce large numbers of false alarms and missed detections. Deep-learning detection algorithms are divided into two-stage detectors represented by the R-CNN series and one-stage detectors represented by the YOLO series. Deep-learning ship detection methods can further be divided into horizontal detection and rotation detection according to the design of the detection box.
Remote-sensing SAR images differ greatly from natural images in image scale and data volume, and they feature highly complex backgrounds, large scale variation, and many small targets, which poses great challenges for applying deep-learning target detection algorithms to SAR imagery. General deep-learning detection frameworks retain a certain universality on remote-sensing images, so their design ideas can be borrowed, but they cannot be transplanted directly. How to apply deep-learning detection algorithms to ship target detection while balancing detection efficiency and detection accuracy therefore remains a major challenge.
In the prior art, deep-learning detection algorithms developed for optical imagery, such as Faster R-CNN, EfficientDet, FCOS, YOLO, and ReDet, are applied directly to maritime remote-sensing SAR images.
However, many ship targets in maritime remote-sensing SAR images are densely distributed small targets, ship rotation angles are arbitrary, and anchor-based detection suffers from a regression boundary problem. Existing deep-learning detection algorithms therefore encounter difficult feature extraction, missed detections of small targets, and low accuracy under strong clutter backgrounds and large differences in target scale. In addition, most existing deep-learning ship detection algorithms perform horizontal detection, which introduces redundant information and cannot estimate the orientation of targets, while some rotation detectors ignore the angle-prediction boundary problem introduced by rotated annotations, making high-aspect-ratio targets hard to detect and high-precision detection hard to achieve.
Disclosure of Invention
To address densely distributed small targets, large differences in target scale, the redundant information and missing orientation estimation of horizontal detection, and the angle-prediction boundary problem introduced by rotated annotations, the invention provides a rotary target detection method and device based on a deductive paradigm. The technical problems to be solved by the invention are realized by the following technical scheme:
in a first aspect, the present invention provides a rotary target detection method based on a deductive paradigm, comprising:
constructing a target detection network frame comprising a trunk feature extraction module, a feature fusion module with a small-scale enhanced branch and a detection and identification module;
carrying out multi-scale feature extraction on the SAR image by utilizing the trunk feature extraction module to obtain a multi-scale feature map comprising small-scale features;
performing feature fusion on the multi-scale feature map by using the feature fusion module with the small-scale enhancement branch to obtain a multi-level feature map;
and inputting the multi-level feature map into the detection and identification module, and performing regression and classification on it with a rotated detection box based on the deductive paradigm to obtain a target detection result.
In a second aspect, the present invention provides a rotary target detection device based on a deductive paradigm, comprising:
the network construction module is used for constructing a target detection network frame comprising a trunk feature extraction module, a feature fusion module with a small-scale enhanced branch and a detection and identification module;
the feature extraction module is used for extracting multi-scale features of the SAR image by utilizing the trunk feature extraction module to obtain a multi-scale feature map comprising small-scale features;
the feature fusion module is used for carrying out feature fusion on the multi-scale feature map by utilizing the feature fusion module with the small-scale enhancement branch to obtain a multi-level feature map;
and the detection and identification module is used for receiving the multi-level feature map and performing regression and classification on it with a rotated detection box based on the deductive paradigm to obtain a target detection result.
The invention has the beneficial effects that:
1. according to the deduction-model-based rotating target detection method provided by the invention, on one hand, aiming at the characteristic that the SAR image data set has more small targets, the small-scale enhancement branch is added in the network frame, so that the small-scale ship features can keep better feature response at the bottom end of the pyramid, and the detection capability of the small-scale ship is enhanced; on the other hand, the rotary detection box based on deduction model design is used for solving the problem of regression boundary of angle prediction caused by the sensitivity of the ship target with a large length-width ratio to angle change and the rotary labeling, and the angle parameter can be dynamically adjusted according to the length-width ratio, so that the high-precision detection of the target can be realized. Compared with the existing detection method, the method can not only reduce the information redundancy problem caused by horizontal detection and realize the direction estimation of the target, but also solve the boundary problem of angle prediction caused by rotation annotation, effectively improve the SAR ship detection performance of complex scene, and enable the detection of the SAR image ship target to be more stable and accurate;
2. according to the invention, the pure convolutional neural network model Convnext with excellent performance in the field of computer vision is adopted as a backbone network, so that the feature extraction capability of the backbone network is effectively improved.
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Drawings
FIG. 1 is a schematic diagram of a method for detecting a rotating object based on a deductive paradigm according to an embodiment of the present invention;
FIG. 2 is a diagram of an object detection network framework provided by an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a feature fusion module with a small-scale enhanced branch according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a rotational regression loss design provided by an embodiment of the present invention;
fig. 5 is a diagram showing the positional relationship among the ground real frame GT, the prediction frame pre_box, and the anchor frame anchor according to an embodiment of the present invention;
fig. 6 is a graph of the results of a simulation experiment using L1 Loss and KLD Loss;
fig. 7 is a block diagram of a rotary target detecting apparatus based on a deductive model according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to specific examples, but embodiments of the present invention are not limited thereto.
Example 1
Referring to fig. 1, fig. 1 is a schematic diagram of a method for detecting a rotating object based on a deductive model according to an embodiment of the invention, the method includes:
step 1: constructing a target detection network frame comprising a trunk feature extraction module, a feature fusion module with a small-scale enhanced branch and a detection and identification module;
step 2: carrying out multi-scale feature extraction on the SAR image by utilizing a trunk feature extraction module to obtain a multi-scale feature map comprising small-scale features;
step 3: carrying out feature fusion on the multi-scale feature map by using a feature fusion module with a small-scale enhanced branch to obtain a multi-level feature map;
step 4: inputting the multi-level feature map into the detection and identification module, and performing regression and classification on it with a rotated detection box based on the deductive paradigm to obtain a target detection result.
To address densely distributed small targets, large differences in target scale, the redundant information and missing orientation estimation of horizontal detection, and the angle-prediction boundary problem introduced by rotated annotations, the invention provides a high-precision rotary target detection method based on a deductive paradigm. Because small ships account for most targets in the SAR dataset, the method adds a small-scale enhancement branch so that small-scale ship features keep a strong feature response at the bottom of the pyramid, enhancing the detection of small ships. Meanwhile, to handle the angle-prediction boundary problem caused by rotated annotations and the sensitivity of high-aspect-ratio ship targets to angle changes, a rotated detection box designed under the deductive paradigm is used, and the angle parameter is dynamically weighted according to the aspect ratio, achieving high-precision detection.
The deductive-paradigm rotary target detection method provided by the invention is described in detail below with reference to the constructed target detection network framework. Specifically, this embodiment adopts a RetinaNet-based detection framework, whose architecture is shown in Fig. 2.
First, to address the insufficient feature extraction capability of the backbone, this embodiment adopts ConvNeXt, a pure convolutional neural network with excellent performance in computer vision, as the backbone network in place of the original ResNet50. The SAR image is input into the ConvNeXt backbone for feature extraction, yielding a multi-scale feature map {C_2, C_3, C_4, C_5} that includes small-scale features.
The ConvNeXt network adopted in this embodiment is a pure convolutional network whose detection performance can exceed that of the Swin Transformer without any extra special modules, under the same number of parameters, computation, and memory footprint. Its advantages are mainly reflected in the following aspects:
1. training technical aspects
The optimizer is changed from Adam to AdamW, and the network is trained with the training strategy of the Vision Transformer.
2. Downsampling layer
For a convolutional neural network, the downsampling layer reduces the computation of the whole network, enlarges the receptive field, and helps prevent overfitting, so that subsequent convolutional layers see more context. The ConvNeXt feature extraction network uses a separate downsampling layer, implemented by a convolution with kernel size 2 and stride 2 placed after layer normalization, which halves the spatial size of the feature map.
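As an illustrative sketch (not the patented implementation), the separate downsampling stage described above can be written in PyTorch as a layer normalization followed by a stride-2, kernel-2 convolution; the channel counts below are placeholder assumptions.

```python
import torch
import torch.nn as nn

class LayerNorm2d(nn.LayerNorm):
    """LayerNorm applied over the channel dimension of an NCHW tensor."""
    def forward(self, x):
        x = x.permute(0, 2, 3, 1)          # NCHW -> NHWC
        x = super().forward(x)
        return x.permute(0, 3, 1, 2)       # NHWC -> NCHW

def downsample_layer(in_ch, out_ch):
    # ConvNeXt-style downsampling: normalization first, then a 2x2 convolution with stride 2
    return nn.Sequential(LayerNorm2d(in_ch),
                         nn.Conv2d(in_ch, out_ch, kernel_size=2, stride=2))

x = torch.randn(1, 96, 64, 64)
y = downsample_layer(96, 192)(x)           # -> (1, 192, 32, 32): half resolution, more channels
```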
3. ResNeXt-inspired design
Depthwise convolution from the ResNeXt design philosophy is adopted to balance accuracy and computational cost by reducing computation. Depthwise convolution is similar to the self-attention mechanism in Transformer models in that it operates per channel, i.e., spatial information interacts within each channel dimension.
4. Inverted bottleneck design
The residual block used in a standard ResNet follows a (large dimension - small dimension - large dimension) bottleneck to reduce computation. The inverted bottleneck reverses this, using a (small dimension - large dimension - small dimension) form, which avoids the information loss caused by compressing dimensions when features are mapped between feature spaces of different dimensionality.
5. Activation function
The network uses the Gaussian error linear unit (GELU) activation function. Unlike traditional activation functions such as ReLU, GELU is smooth and differentiable around zero, which suits zero-mean data and improves overall network performance.
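A minimal PyTorch sketch of a ConvNeXt-style block combining the points above (7×7 depthwise convolution, inverted bottleneck, GELU); the channel width and the 4× expansion ratio are illustrative assumptions rather than values taken from the patent.

```python
import torch
import torch.nn as nn

class ConvNeXtBlock(nn.Module):
    """Depthwise conv (per-channel spatial mixing) + inverted-bottleneck MLP with GELU."""
    def __init__(self, dim, expansion=4):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)  # depthwise
        self.norm = nn.LayerNorm(dim)
        self.pwconv1 = nn.Linear(dim, expansion * dim)   # small dim -> large dim
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(expansion * dim, dim)   # large dim -> small dim

    def forward(self, x):
        shortcut = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)                        # NCHW -> NHWC for LayerNorm / Linear
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.permute(0, 3, 1, 2)
        return shortcut + x                              # residual connection

feat = ConvNeXtBlock(96)(torch.randn(1, 96, 56, 56))    # shape preserved: (1, 96, 56, 56)
```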
By adopting ConvNeXt, a pure convolutional neural network with excellent performance in computer vision, as the backbone network, the invention effectively improves the feature extraction capability of the backbone.
Then, the feature maps {C_2, C_3, C_4, C_5} obtained by the backbone network are sent into the feature fusion network, where feature maps of different resolutions are fused to obtain the multi-level features {P_2, P_3, P_4}.
Because small ships account for most targets in the SSDD dataset, and spatial resolution is repeatedly reduced by pooling toward the top of the pyramid, the spatial features of small ships are gradually diluted, their feature response weakens, and small targets are easily missed. This embodiment therefore builds on the classical FPN, adds a small-scale enhancement branch, and proposes an improved FPN feature fusion network with a small-scale enhancement branch as the feature fusion module. Besides additionally processing the backbone features, the network brings the C2 level of the backbone into the prediction framework, adding a level at the bottom of the original FPN that further enhances the detection of small ships.
In this embodiment, the structure of the improved FPN feature fusion network is shown in Fig. 3; it comprises an intermediate-level feature fusion network and an output-level feature enhancement network, and the intermediate-level feature fusion network comprises a bottom-level small-scale enhancement feature layer and several intermediate feature layers.
Specifically, the feature maps {C_2, C_3, C_4, C_5} from the backbone feature extraction network are fused in the intermediate-level feature fusion network, which merges high-level semantic information into the shallow features; {N_2, N_3, N_4} in Fig. 3 denote the fused features of each level, and {P_2, P_3, P_4} denote the bottom-up feature enhancement path, in which the feature levels are downsampled layer by layer by a factor of 2 in spatial resolution.
Optionally, as one implementation, the intermediate-level feature fusion network comprises, from bottom to top, a first intermediate feature layer for small-scale feature enhancement, a second intermediate feature layer, a third intermediate feature layer, and a fourth intermediate feature layer; wherein,

the fourth intermediate feature layer uses a 1×1 convolution to adjust the number of feature channels of the largest-scale feature map C_5, yielding the feature map N_5, computed as:

N_5 = Conv_{1×1}(C_5).

The first, second, and third intermediate feature layers first use a 1×1 convolution to adjust the number of channels of the input features, and then add each element of the current feature level to the upsampled feature map of the level above, giving the fused feature map:

N_i = upsample(N_{i+1}) + Conv_{1×1}(C_i), i = 2, 3, 4.

Further, the output-level feature enhancement network of the improved FPN comprises three 3×3 convolution layers, which convolve the fused feature maps to obtain the multi-level feature maps {P_2, P_3, P_4}:

P_i = Conv_{3×3}(N_i), i = 2, 3, 4.

To sum up, step 3 can be summarized as:

inputting the multi-scale feature maps {C_2, C_3, C_4, C_5} into the intermediate-level feature fusion network to fuse high-level semantic information into the shallow features, obtaining the fused feature maps {N_2, N_3, N_4};

inputting the fused feature maps {N_2, N_3, N_4} into the output-level feature enhancement network for enhancement, obtaining the multi-level feature maps {P_2, P_3, P_4}.
In this embodiment, the small-scale enhancement branch added to the network framework allows small-scale ship features to keep a strong feature response at the bottom of the pyramid, enhancing the detection of small ships.
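An illustrative PyTorch sketch of the fusion rules above (N_5 = Conv_{1×1}(C_5), N_i = upsample(N_{i+1}) + Conv_{1×1}(C_i), P_i = Conv_{3×3}(N_i)); the channel widths, output width, and input sizes are assumptions for the example, not values fixed by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallScaleEnhancedFPN(nn.Module):
    """Fusion over C2..C5, where C2 is the added small-scale enhancement level."""
    def __init__(self, in_channels=(96, 192, 384, 768), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        self.output = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                    for _ in range(3))            # produces P2, P3, P4

    def forward(self, c2, c3, c4, c5):
        n5 = self.lateral[3](c5)                                   # N5 = Conv1x1(C5)
        n = [None, None, None, n5]                                 # slots for N2..N5
        for i in range(2, -1, -1):                                 # i = 2, 1, 0 -> N4, N3, N2
            up = F.interpolate(n[i + 1], scale_factor=2, mode="nearest")
            n[i] = up + self.lateral[i]((c2, c3, c4)[i])           # Ni = upsample(Ni+1) + Conv1x1(Ci)
        return [conv(ni) for conv, ni in zip(self.output, n[:3])]  # P2, P3, P4

c2, c3, c4, c5 = (torch.randn(1, ch, s, s) for ch, s in [(96, 128), (192, 64), (384, 32), (768, 16)])
p2, p3, p4 = SmallScaleEnhancedFPN()(c2, c3, c4, c5)
```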
Further, after the multi-level feature maps {P_2, P_3, P_4} are obtained, they are input into the detection and identification module for regression and classification, yielding the final ship detection and recognition result.
Specifically, designing and constructing the detection and identification module mainly comprises: designing the detection boxes to generate a rotated detection box based on the deductive paradigm; constructing the regression and classification branches; and designing the deductive-paradigm rotation regression loss and the classification loss to obtain the overall loss function for network training. After training, the trained network can be used for target detection.
The detection box generation, the regression and classification branches, and the loss function design together form the head module of the detection network. The design principle of the rotated detection box based on the deductive paradigm is described below, with emphasis on the loss function.
1. Design the detection boxes to generate a rotated detection box based on the deductive paradigm.
In general, rotated detection boxes can be expressed under a generalized paradigm or a deductive paradigm. The generalized design extends horizontal detection to rotation detection, i.e., the center coordinates, width, and height (x, y, w, h) of the detection box are extended to center coordinates, width, height, and angle (x, y, w, h, θ). Rotation detection under the deductive paradigm goes from the general rotated case to the special horizontal case: the rotation regression loss is designed separately such that the horizontal case can be deduced from it. The two regression loss designs are shown in Fig. 4, where (a) is the rotation regression loss design under the generalized paradigm and (b) is the design under the deductive paradigm.
In this embodiment, the detection boxes include the ground-truth box GT, the prediction box pre_box, and the anchor box anchor; their positional relationship is shown in Fig. 5. Each is represented by a five-dimensional vector of center coordinates, width, height, and angle: the ground-truth box GT by (x_t, y_t, w_t, h_t, θ_t), the anchor box by (x_a, y_a, w_a, h_a, θ_a), and the prediction box by (x_p, y_p, w_p, h_p, θ_p).
In the encoding stage, the network computes the offsets between the ground-truth box GT and the anchor box; in the decoding stage (the test stage), the deltas output by the model for the prediction box pre_box relative to the anchor box are decoded into pre_box. The correspondence between the parameters in the encoding and decoding stages is as follows:
Encoding: t_x = (x_t − x_a)/w_a, t_y = (y_t − y_a)/h_a, t_w = log(w_t/w_a), t_h = log(h_t/h_a), t_θ = θ_t − θ_a.

Decoding: x_p = δ_x·w_a + x_a, y_p = δ_y·h_a + y_a, w_p = w_a·exp(δ_w), h_p = h_a·exp(δ_h), θ_p = δ_θ + θ_a, where (δ_x, δ_y, δ_w, δ_h, δ_θ) are the model outputs.
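A sketch of the encode/decode step under this parameterization; the offsets are assumed to follow the standard RetinaNet-style form extended with an angle term, which is an assumption consistent with the description rather than code from the patent.

```python
import math

def encode(gt, anchor):
    """(x, y, w, h, theta) ground-truth box + anchor -> regression targets."""
    xt, yt, wt, ht, tt = gt
    xa, ya, wa, ha, ta = anchor
    return ((xt - xa) / wa, (yt - ya) / ha,
            math.log(wt / wa), math.log(ht / ha), tt - ta)

def decode(delta, anchor):
    """Network output deltas + anchor -> predicted rotated box (used at test time)."""
    dx, dy, dw, dh, dt = delta
    xa, ya, wa, ha, ta = anchor
    return (dx * wa + xa, dy * ha + ya,
            wa * math.exp(dw), ha * math.exp(dh), dt + ta)

anchor = (50.0, 50.0, 32.0, 16.0, 0.0)
gt = (54.0, 48.0, 40.0, 12.0, 0.3)
# encode followed by decode recovers the ground-truth box
assert all(abs(a - b) < 1e-6 for a, b in zip(decode(encode(gt, anchor), anchor), gt))
```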
2. Design the deductive-paradigm rotation regression loss based on these detection boxes.
First, a description will be given of a rotational regression loss based on a generalized paradigm.
For a common generic detection model (horizontal-box detection), the network predicts the box position and size by regressing four offsets to match the ground-truth offsets. Extending this to rotation under the generalized paradigm, the overall regression loss is:
in the method, in the process of the invention,
wherein x, y, w and h respectively represent the abscissa, the ordinate, the width and the height of the central point of the frame, t, a and p respectively represent the ground real frame, the anchor frame and the prediction frame,and->Five parameters represent the position and angle information of the prediction bounding box, < >>And->And the position angle information of the ground truth box is represented.
The rotation regression loss under the deductive paradigm is then derived on this basis.
The Kullback-Leibler divergence (KLD) and its gradient can dynamically adjust the parameter gradients according to the characteristics of the object; in particular, the importance (gradient weight) of the angle parameter is adjusted according to the aspect ratio. This mechanism is critical for high-precision detection, because for high-aspect-ratio targets a slight angle error causes a severe drop in accuracy. The rotated bounding box is therefore converted into a two-dimensional Gaussian distribution, and the KL divergence between the Gaussian distributions is computed as the regression loss.
Specifically, a bounding box b = (x, y, h, w, θ) is converted into a two-dimensional Gaussian distribution (μ, Σ), with mean μ and covariance matrix Σ computed as:

μ = (x, y)^T,

Σ^{1/2} = RΛR^T, with Λ = diag(w/2, h/2),

where R is the rotation matrix determined by θ and Λ is the diagonal matrix of eigenvalues; (x, y, h, w, θ) denote the center coordinates, width, height, and angle of the detection box.
Then the KLD between the two-dimensional Gaussians of the prediction bounding box X_p(x_p, y_p, h_p, w_p, θ_p) ~ N_p(μ_p, Σ_p) and the ground-truth bounding box X_t(x_t, y_t, h_t, w_t, θ_t) ~ N_t(μ_t, Σ_t) is computed:

D_kl(N_p ∥ N_t) = ½(μ_p − μ_t)^T Σ_t^{-1}(μ_p − μ_t) + ½tr(Σ_t^{-1}Σ_p) + ½ln(|Σ_t|/|Σ_p|) − 1.

The distance D_kl is normalized into the final deductive-paradigm rotation regression loss L_reg:

L_reg = 1 − 1/(τ + f(D_kl)),

where f(·) is a nonlinear function and τ is the second hyper-parameter used to modulate the overall loss.
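A numerical sketch of the Gaussian conversion and the KLD-based regression loss described above; the formulas follow the reconstruction given here, and the choices τ = 1 and f(D) = ln(1 + D) are illustrative assumptions, not values fixed by the patent.

```python
import torch

def box_to_gaussian(box):
    """(x, y, w, h, theta) -> mean (2,) and covariance (2, 2) of a 2-D Gaussian."""
    x, y, w, h, theta = box.unbind(-1)
    mu = torch.stack([x, y], dim=-1)
    cos, sin = torch.cos(theta), torch.sin(theta)
    R = torch.stack([torch.stack([cos, -sin], -1), torch.stack([sin, cos], -1)], -2)
    S = torch.diag_embed(torch.stack([w / 2, h / 2], dim=-1))   # Lambda = diag(w/2, h/2)
    sqrt_sigma = R @ S @ R.transpose(-1, -2)                    # Sigma^(1/2) = R Lambda R^T
    return mu, sqrt_sigma @ sqrt_sigma

def kld_loss(pred_box, gt_box, tau=1.0):
    mu_p, sig_p = box_to_gaussian(pred_box)
    mu_t, sig_t = box_to_gaussian(gt_box)
    d = (mu_p - mu_t).unsqueeze(-1)
    inv_t = torch.inverse(sig_t)
    dkl = (0.5 * (d.transpose(-1, -2) @ inv_t @ d).squeeze(-1).squeeze(-1)
           + 0.5 * torch.diagonal(inv_t @ sig_p, dim1=-2, dim2=-1).sum(-1)
           + 0.5 * torch.log(torch.det(sig_t) / torch.det(sig_p)) - 1.0)
    return 1.0 - 1.0 / (tau + torch.log1p(dkl))                 # L_reg = 1 - 1/(tau + f(D_kl))

# small angle error on a high-aspect-ratio box already yields a noticeable loss
loss = kld_loss(torch.tensor([50., 50., 40., 10., 0.35]),
                torch.tensor([50., 50., 40., 10., 0.30]))
```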
In addition, the focal loss is adopted as the classification loss; all positive and negative samples participate in the loss computation, controlled by weighting factors:

L_cls = −α(1 − c_{x,y})^γ log(c_{x,y}),

where α is the balance factor controlling the weight of positive samples in the overall loss, γ is the modulation factor, and (1 − c_{x,y})^γ, the modulation term for hard samples, increases the weight of the loss contributed by hard samples.
The overall loss function can then be expressed as:

L = (1/N_pos) Σ_{x,y} L_cls(c_{x,y}, c*_{x,y}) + (λ/N_pos) Σ_{(x,y)∈pos} L_reg(t_{x,y}, t*_{x,y}),

where c_{x,y} is the classification score, c*_{x,y} is the true class label of the object, t_{x,y} = (x_c, y_c, w, h, θ) is the box position predicted by the regression branch, t*_{x,y} is the ground-truth box position, N_pos is the number of positive samples, λ is the first hyper-parameter, (x, y) are sample-point coordinates, and (x, y) ∈ pos indicates that the sample point is a positive sample.
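An illustrative sketch of how the focal classification loss and the regression loss could be assembled into the overall loss above; the function names and the values α = 0.25, γ = 2, λ = 1 are assumptions for the example, not parameters taken from the patent.

```python
import torch

def focal_loss(scores, labels, alpha=0.25, gamma=2.0):
    """Binary focal loss over all (positive and negative) sample points.
    scores: predicted probabilities in (0, 1); labels: 1 for ship, 0 for background."""
    p_t = torch.where(labels > 0, scores, 1.0 - scores)
    a_t = torch.where(labels > 0,
                      torch.full_like(scores, alpha),
                      torch.full_like(scores, 1.0 - alpha))
    return -(a_t * (1.0 - p_t) ** gamma * torch.log(p_t.clamp(min=1e-6)))

def total_loss(cls_scores, cls_labels, reg_losses, pos_mask, lam=1.0):
    """L = (1/N_pos) * sum L_cls + (lambda/N_pos) * sum over positives of L_reg."""
    n_pos = pos_mask.sum().clamp(min=1)
    return (focal_loss(cls_scores, cls_labels).sum() / n_pos
            + lam * reg_losses[pos_mask].sum() / n_pos)
```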
The network is trained with the constructed overall loss function; the training procedure follows common practice and is not detailed in this embodiment. After training, the multi-level feature maps output by the feature fusion module are input into the detection and identification module, and regression and classification are performed on them with the rotated detection box based on the deductive paradigm to obtain the target detection result.
In the deductive-paradigm rotary target detection method provided by the invention, on the one hand, because SAR image datasets contain many small targets, a small-scale enhancement branch is added to the network framework so that small-scale ship features keep a strong feature response at the bottom of the pyramid, enhancing the detection of small ships; on the other hand, a rotated detection box designed under the deductive paradigm addresses the regression boundary problem of angle prediction caused by rotated annotations and by the sensitivity of high-aspect-ratio ship targets to angle changes, with the angle parameter dynamically weighted according to the aspect ratio, enabling high-precision detection. Compared with existing detection methods, the method reduces the information redundancy of horizontal detection, estimates target orientation, resolves the angle-prediction boundary problem introduced by rotated annotations, effectively improves SAR ship detection performance in complex scenes, and makes ship detection in SAR images more stable and accurate.
Example two
On the basis of Embodiment 1, this embodiment provides a rotary target detection device based on a deductive paradigm. Referring to fig. 7, fig. 7 is a block diagram of the deductive-paradigm rotary target detection device provided by an embodiment of the present invention; the device includes:
the network construction module is used for constructing a target detection network frame comprising a trunk feature extraction module, a feature fusion module with a small-scale enhanced branch and a detection and identification module;
the feature extraction module is used for extracting multi-scale features of the SAR image by utilizing the trunk feature extraction module to obtain a multi-scale feature map comprising small-scale features;
the feature fusion module is used for carrying out feature fusion on the multi-scale feature map by utilizing the feature fusion module with the small-scale enhanced branch, so as to obtain a multi-level feature map;
and the detection and identification module, which is used for receiving the multi-level feature map and performing regression and classification on it with a rotated detection box based on the deductive paradigm to obtain a target detection result.
The device provided in this embodiment can implement the method provided in the foregoing embodiment; for the detailed process, refer to Embodiment 1.
Therefore, the device likewise reduces the information redundancy caused by horizontal detection, estimates the orientation of targets, resolves the angle-prediction boundary problem introduced by rotated annotations, effectively improves SAR ship detection performance in complex scenes, and makes ship detection in SAR images more stable and accurate.
Example III
The effectiveness of the invention is verified through simulation experiments, and several existing methods are compared with the proposed method to further illustrate its beneficial effects.
1. Experimental conditions
The experiments use the publicly released Chinese SSDD+ dataset, which contains ship SAR images with different resolutions, sizes, sea conditions, and sensor types. The SSDD dataset is randomly split into training and test sets at a ratio of 8:2. The algorithm is implemented on a PyTorch object detection framework; network parameters are updated iteratively with the AdamW optimizer, the maximum number of training iterations is 800, the initial learning rate is lr = 0.0001, and the experiments run on a computer with an NVIDIA T40c GPU.
2. Measurement index
To quantitatively evaluate the detection performance of the network, the simulations use average precision (AP) and frames per second (FPS) as evaluation criteria.
The AP evaluates the overall performance of a detection method and is computed as:

AP = ∫_0^1 P(R) dR,

where P is the precision and R is the recall; AP50 is the AP score at an IoU threshold of 0.5. Precision and recall are defined as:

P = T_P/(T_P + F_P), R = T_P/(T_P + F_N),

where T_P is the number of correctly detected ships, F_P is the number of background regions detected as ships, and F_N is the number of real ships missed (detected as background). A correctly detected ship is defined as one whose predicted bounding box has an IoU with the annotated ground truth greater than 0.5.
The FPS evaluates detection speed, i.e., the number of images that can be processed per second (equivalently, the time required to process one image); the shorter the time, the faster the detection.
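For concreteness, the precision/recall computation above amounts to the following toy sketch (the full AP additionally integrates precision over recall across confidence thresholds):

```python
def precision_recall(tp, fp, fn):
    """tp: correctly detected ships, fp: background detected as ships, fn: missed ships."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Example: 90 correct detections, 10 false alarms, 5 missed ships
p, r = precision_recall(90, 10, 5)   # p = 0.9, r ~= 0.947
```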
3. Experimental content and results analysis
3.1 To verify the effectiveness of the deductive-paradigm rotary target detection method, four classical target detection frameworks, R3Det, ReDet, S-RetinaNet, and S-FCOS, are selected for comparison experiments; the results are shown in Table 1.
Table 1 Performance comparison between the proposed method and existing methods

Index    R3Det    ReDet    S-RetinaNet    S-FCOS    Method of the invention
AP50     80.75    85.42    83.01          88.83     90.52
It can be seen from Table 1 that, compared with the classical algorithms, the detection method provided by the invention achieves a higher AP50 value and thus has a clear detection performance advantage.
3.2 ablation experiments
To verify the effectiveness of the ConvNeXt backbone network, the improved FPN feature fusion network, and the KLD regression loss function, the following comparison experiments were performed, where "√" indicates that the corresponding module is used. The results are shown in Table 2 below.
Table 2 comparison of module performance
Experiment 1 is the baseline. Replacing the original L1 loss with the KLD regression loss raises the AP50 by 1.27%; using ConvNeXt as the backbone raises the AP50 by 1.16% over the baseline; and the improved FPN further raises the detection accuracy, which benefits from the small-scale enhancement branch and the improved detection of small targets.
In summary, using the ConvNeXt backbone, the improved FPN, and the KLD regression loss together raises the network performance from 86.83% to 90.52%, an AP50 gain of 3.69%, which further illustrates the effectiveness of the method.
3.3 backbone network contrast experiments
The invention uses the ConvNeXt network instead of ResNet50 as the backbone feature extraction network; the detection performance comparison is shown in Table 3 below.
Table 3 backbone feature extraction network performance assessment results
As can be seen from Table 3, with the same FPN feature fusion network and the same loss function, the ConvNeXt backbone performs better than ResNet50, improving the AP50 by 1.16% while also achieving a faster detection speed.
3.4 effect of KLD regression loss on high accuracy detection
Three different images are selected, and the network is trained with the L1 loss and with the KLD regression loss, respectively; the detection results are shown in Fig. 6, where (a) shows the annotated positions in the three images, (b) shows the detection results after training with the L1 loss, and (c) shows the detection results after training with the KLD regression loss.
Comparing the detection results in Fig. 6, the detection effect with the KLD loss function is clearly better than with the L1 loss, indicating that the KLD loss is well suited to high-aspect-ratio and small targets, and further proving that it effectively improves detection precision and achieves accurate target detection.
The foregoing is a further detailed description of the invention in connection with the preferred embodiments, and it is not intended that the invention be limited to the specific embodiments described. It will be apparent to those skilled in the art that several simple deductions or substitutions may be made without departing from the spirit of the invention, and these should be considered to be within the scope of the invention.

Claims (10)

1. A rotary target detection method based on a deductive paradigm, comprising:
constructing a target detection network frame comprising a trunk feature extraction module, a feature fusion module with a small-scale enhanced branch and a detection and identification module;
carrying out multi-scale feature extraction on the SAR image by utilizing the trunk feature extraction module to obtain a multi-scale feature map comprising small-scale features;
performing feature fusion on the multi-scale feature map by using the feature fusion module with the small-scale enhancement branch to obtain a multi-level feature map;
and inputting the multi-level feature map into a detection and identification module, and performing regression and classification on it with a rotated detection box based on the deductive paradigm to obtain a target detection result.
2. The deduction paradigm-based rotation target detection method according to claim 1, wherein the trunk feature extraction module uses a pure convolutional neural network model Convnext as a trunk feature extraction network, and performing multi-scale feature extraction on the SAR image by using the trunk feature extraction module, to obtain a multi-scale feature map including small-scale features, includes:
and inputting the SAR image into a trunk feature extraction network Convnext for feature extraction to obtain a multi-scale feature map comprising small-scale features.
3. The deductive-paradigm-based rotation target detection method according to claim 2, wherein the feature fusion module with a small-scale enhancement branch comprises an improved FPN feature fusion network with a small-scale enhancement branch, the improved FPN feature fusion network comprising an intermediate-level feature fusion network and an output-level feature enhancement network;
then, feature fusion is carried out on the multi-scale feature map by using a feature fusion network with a small-scale enhanced branch, so as to obtain a multi-level feature map, which comprises the following steps:
inputting the multi-scale feature map into the mid-level feature fusion network to fuse high-level semantic information into shallow features to obtain a fusion feature map;
and inputting the fusion feature map into the output-stage feature enhancement network for enhancement processing to obtain a multi-level feature map.
4. The deductive-paradigm-based rotation target detection method according to claim 3, wherein the intermediate-level feature fusion network comprises a bottom-level small-scale enhancement feature layer and a plurality of intermediate feature layers; wherein the feature layers are downsampled layer by layer by a factor of 2 in spatial resolution.
5. The deductive-paradigm-based rotation target detection method according to claim 3, wherein the intermediate-level feature fusion network comprises, from bottom to top, a first intermediate feature layer for small-scale feature enhancement, a second intermediate feature layer, a third intermediate feature layer, and a fourth intermediate feature layer; wherein,
the fourth middle characteristic layer adjusts the characteristic channel number of the characteristic graph with the maximum scale by using a convolution layer;
the first middle feature layer, the second middle feature layer and the third middle feature layer firstly utilize a convolution layer to adjust the channel number of input features, and then each element in the current feature level is added with the feature map element after the up-sampling of the previous level to obtain a fusion feature map.
6. The deduction-paradigm-based rotation target detection method according to claim 3, wherein the output-stage feature enhancement network comprises three convolution layers, which are respectively used for carrying out convolution processing on the fusion feature map to obtain a multi-level feature map.
7. The deductive-paradigm-based rotation target detection method according to claim 1, wherein constructing the detection and identification module comprises:
designing a detection frame to generate a rotary detection box based on a deductive paradigm;
constructing regression and classification recognition branches;
the rotation regression loss and the classification loss based on the deductive paradigm are designed to obtain an overall loss function for network training.
8. The method for detecting a rotating target based on a deductive paradigm according to claim 7, wherein the detection boxes include a ground-truth box GT, a prediction box pre_box, and an anchor box anchor; the ground-truth box GT, the prediction box pre_box, and the anchor box anchor are each represented by a five-dimensional vector of center coordinates, width, height, and angle; wherein,
the ground-truth box GT is represented by (x_t, y_t, w_t, h_t, θ_t); the prediction box pre_box is represented by (x_p, y_p, w_p, h_p, θ_p); and the anchor box anchor is represented by (x_a, y_a, w_a, h_a, θ_a).
9. The deductive-paradigm-based rotation target detection method according to claim 7, wherein the overall loss function is expressed as:
L = (1/N_pos) Σ_{x,y} L_cls(c_{x,y}, c*_{x,y}) + (λ/N_pos) Σ_{(x,y)∈pos} L_reg(t_{x,y}, t*_{x,y}),
wherein c_{x,y} denotes the classification score, c*_{x,y} denotes the true class label of the object, t_{x,y} = (x_c, y_c, w, h, θ) denotes the box position predicted by the regression branch, t*_{x,y} denotes the ground-truth box position, N_pos denotes the number of positive samples, λ denotes a first hyper-parameter, (x, y) denotes sample-point coordinates, and (x, y) ∈ pos indicates that the sample point is a positive sample;
L_cls denotes the classification loss, expressed as:
L_cls = −α(1 − c_{x,y})^γ log(c_{x,y}),
where α is a balance factor for controlling the weight of positive samples in the overall loss, γ is a modulation factor, and (1 − c_{x,y})^γ, the modulation term for hard samples, increases the weight of the loss contributed by hard samples;
L_reg denotes the deductive-paradigm-based rotation regression loss, expressed as:
L_reg = 1 − 1/(τ + f(D_kl)),
where f(·) denotes a nonlinear function, τ denotes a second hyper-parameter for modulating the overall loss, D_kl denotes the KLD distance between the two-dimensional Gaussian distributions of the prediction bounding box X_p(x_p, y_p, h_p, w_p, θ_p) ~ N_p(μ_p, Σ_p) and the ground-truth bounding box X_t(x_t, y_t, h_t, w_t, θ_t) ~ N_t(μ_t, Σ_t), μ and Σ denote the mean and covariance of the Gaussian representation of a bounding box, and the subscripts t and p denote the ground-truth bounding box and the prediction bounding box, respectively.
10. A rotary target detection device based on a deductive paradigm, comprising:
the network construction module is used for constructing a target detection network frame comprising a trunk feature extraction module, a feature fusion module with a small-scale enhanced branch and a detection and identification module;
the feature extraction module is used for extracting multi-scale features of the SAR image by utilizing the trunk feature extraction module to obtain a multi-scale feature map comprising small-scale features;
the feature fusion module is used for carrying out feature fusion on the multi-scale feature map by utilizing the feature fusion module with the small-scale enhancement branch to obtain a multi-level feature map;
and the detection and identification module, which is used for receiving the multi-level feature map and performing regression and classification on it with a rotated detection box based on the deductive paradigm to obtain a target detection result.
CN202310864157.8A 2023-07-13 2023-07-13 Rotary target detection method and device based on deduction paradigm Pending CN116953702A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310864157.8A CN116953702A (en) 2023-07-13 2023-07-13 Rotary target detection method and device based on deduction paradigm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310864157.8A CN116953702A (en) 2023-07-13 2023-07-13 Rotary target detection method and device based on deduction paradigm

Publications (1)

Publication Number Publication Date
CN116953702A true CN116953702A (en) 2023-10-27

Family

ID=88445604

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310864157.8A Pending CN116953702A (en) 2023-07-13 2023-07-13 Rotary target detection method and device based on deduction paradigm

Country Status (1)

Country Link
CN (1) CN116953702A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117611966A (en) * 2023-10-31 2024-02-27 仲恺农业工程学院 Banana identification and pose estimation method based on Yolov7 rotating frame

Similar Documents

Publication Publication Date Title
CN110135267B (en) Large-scene SAR image fine target detection method
Mahaur et al. Small-object detection based on YOLOv5 in autonomous driving systems
Tian et al. A dual neural network for object detection in UAV images
CN111079739B (en) Multi-scale attention feature detection method
Wang et al. YOLOv3-MT: A YOLOv3 using multi-target tracking for vehicle visual detection
CN116188999B (en) Small target detection method based on visible light and infrared image data fusion
Wang et al. Ship detection based on fused features and rebuilt YOLOv3 networks in optical remote-sensing images
CN113762003B (en) Target object detection method, device, equipment and storage medium
Xu et al. Fast ship detection combining visual saliency and a cascade CNN in SAR images
Chen et al. Ship target detection algorithm based on improved YOLOv3 for maritime image
CN116953702A (en) Rotary target detection method and device based on deduction paradigm
Fan et al. A novel sonar target detection and classification algorithm
Kim et al. Rotational multipyramid network with bounding‐box transformation for object detection
CN112149526A (en) Lane line detection method and system based on long-distance information fusion
Yildirim et al. Ship detection in optical remote sensing images using YOLOv4 and Tiny YOLOv4
Zhao et al. Multitask learning for sar ship detection with gaussian-mask joint segmentation
Yang et al. Foreground enhancement network for object detection in sonar images
CN116310837B (en) SAR ship target rotation detection method and system
CN116758340A (en) Small target detection method based on super-resolution feature pyramid and attention mechanism
Liu et al. Find small objects in UAV images by feature mining and attention
CN116958780A (en) Cross-scale target detection method and system
Xu et al. Oil tank detection with improved EfficientDet model
Zhao et al. Deep learning-based laser and infrared composite imaging for armor target identification and segmentation in complex battlefield environments
Park et al. Automatic radial un-distortion using conditional generative adversarial network
CN112560907A (en) Limited pixel infrared unmanned aerial vehicle target detection method based on mixed domain attention

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination