CN114022751B - SAR target detection method, device and equipment based on feature refinement deformable network - Google Patents


Info

Publication number
CN114022751B
CN114022751B (application CN202111298372.3A)
Authority
CN
China
Prior art keywords
feature
sar
level
target
deformable
Prior art date
Legal status
Active
Application number
CN202111298372.3A
Other languages
Chinese (zh)
Other versions
CN114022751A (en)
Inventor
赵琰
陈卓
赵凌君
张思乾
雷琳
唐涛
熊博莅
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202111298372.3A priority Critical patent/CN114022751B/en
Publication of CN114022751A publication Critical patent/CN114022751A/en
Application granted granted Critical
Publication of CN114022751B publication Critical patent/CN114022751B/en

Classifications

    • G06F18/214 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/253 — Pattern recognition; fusion techniques of extracted features
    • G06N3/04 — Neural networks; architecture, e.g. interconnection topology
    • G06N3/08 — Neural networks; learning methods


Abstract

The application relates to a SAR target detection method, device and equipment based on a feature refinement deformable network. The method comprises the following steps: according to the imaging characteristics of aircraft targets in SAR images, a Feature Fusion Attention Module (FFAM) and a deformable bypass connection module (DLCM) are designed. The FFAM fully fuses the texture features of the target in low-dimensional feature maps with the abstract semantic information in high-dimensional feature maps, effectively screening key target features and suppressing background interference. When the refined feature pyramid is constructed, the DLCM further extracts the discretized scattering features of the target by stacking several deformable convolutions, improving the algorithm's ability to represent aircraft targets in SAR images.

Description

SAR target detection method, device and equipment based on feature refinement deformable network
Technical Field
The present invention relates to the technical field of SAR image detection, and in particular, to a method, an apparatus, and a device for detecting a SAR target based on a feature refinement deformable network.
Background
Synthetic aperture radar (Synthetic Aperture Radar, SAR) has all-day, all-weather Earth observation capability and is widely used in military and civilian fields, and interpretation algorithms for specific SAR image tasks have been proposed successively over the past decades. Continuous improvement in SAR image quality and quantity has promoted the further development of high-resolution SAR image target interpretation algorithms. As one of the tasks of high-resolution SAR image interpretation, detection and identification of aircraft targets in SAR images aims to accurately locate and classify aircraft targets in large-scene SAR images, and has important application value in civilian fields (such as airport scheduling) and military fields (such as intelligence reconnaissance).
However, due to the highly discretized distribution of the aircraft target's backscattering points, the diversity of aircraft poses, and complex surrounding background interference, accurate detection of aircraft targets in SAR images remains a serious challenge.
Disclosure of Invention
In view of the foregoing, it is necessary to provide a method, an apparatus, and a device for detecting a SAR target based on a feature refinement deformable network, which can efficiently detect a target in a SAR image.
A method for SAR target detection based on a feature refinement deformable network, the method comprising:
acquiring an SAR image training set, wherein the SAR image training set comprises a plurality of SAR training images, and each SAR training image comprises more than one target;
inputting each SAR training image into a feature refinement deformable network and training the SAR training image to obtain a trained feature refinement deformable network;
the feature refinement deformable network comprises a bottom-up basic feature extraction structure, a top-down refinement pyramid structure and a cascade detection head structure, wherein the refinement pyramid structure comprises a feature fusion attention unit for utilizing texture information of a target in a low-dimensional feature map and a deformable bypass connection unit for discretizing distribution of SAR image scattering points;
acquiring an SAR target image to be detected;
inputting the SAR target image into the trained feature refinement deformable network, detecting targets in the SAR target image, and predicting the positions of the targets.
In one embodiment, the basic feature extraction structure adopts a VGG-16 neural network to extract basic features in the SAR training image;
wherein the output of three intermediate feature extraction layers of the VGG-16 neural network is selected as the input of the refined pyramid structure.
In one embodiment, the three intermediate feature extraction layers are the Conv4_3 layer, the Conv5_3 layer and the Conv7 layer, which respectively correspond to low-level basic features, middle-level basic features and high-level basic features with different output sizes.
In one embodiment, the feature fusion attention unit comprises three groups;
the deformable bypass connection units comprise three groups connected with each group of feature fusion attention units respectively.
In one embodiment, inputting the low-level base features, the medium-level base features, and the high-level base features into the refined pyramid structure includes:
inputting the high-level basic features and the middle-level basic features into a third group of feature fusion attention units for feature fusion to obtain a fusion feature map related to the high-level basic feature map, and inputting the fusion features into a corresponding group of deformable bypass connection units to construct a high-level refined feature map;
inputting the middle-layer basic feature map, the low-layer basic feature map and the high-layer refined feature map into a second group of feature fusion attention units for feature fusion to obtain a fusion feature map related to the middle-layer basic feature map, and inputting the fusion feature into a corresponding group of deformable bypass connection units to construct a middle-layer refined feature map;
and inputting the low-level basic feature image and the middle-level refined feature image into a first group of feature fusion attention units for feature fusion to obtain a fusion feature image related to the low-level basic feature image, and inputting the fusion feature into a corresponding group of deformable bypass connection units to construct the low-level refined feature image.
In one embodiment, before being input into the deformable bypass connection unit, the fused feature map is further input into a separation attention unit for further information extraction.
In one embodiment, the cascade detection head structure includes an anchor point refinement unit and a target detection unit;
inputting the low-level basic features, the middle-level basic features and the high-level basic features into the anchor point refinement unit to obtain a refined anchor point frame related to a target in the SAR training image;
inputting the high-layer refined feature map, the middle-layer refined feature map and the low-layer refined feature map into the target detection unit to predict the refined anchor point frame so as to output a target detection result of the SAR training image.
A SAR target detection apparatus based on a feature refinement deformable network, the apparatus comprising:
the training data set acquisition module is used for acquiring an SAR image training set, wherein the SAR image training set comprises a plurality of SAR training images, and each SAR training image comprises more than one target;
the feature refinement deformable network training module is used for inputting each SAR training image into a feature refinement deformable network and training the SAR training image to obtain a trained feature refinement deformable network;
the feature refinement deformable network comprises a bottom-up basic feature extraction structure, a top-down refinement pyramid structure and a cascade detection head structure, wherein the refinement pyramid structure comprises a feature fusion attention unit for utilizing texture information of a target in a low-dimensional feature map and a deformable bypass connection unit for discretizing distribution of SAR image scattering points;
the SAR target image acquisition module is used for acquiring an SAR target image to be detected;
the target detection module is used for inputting the SAR target image into the trained feature refinement deformable network, detecting targets in the SAR target image and predicting the positions of the targets.
A computer device comprising a memory storing a computer program and a processor which when executing the computer program performs the steps of:
acquiring an SAR image training set, wherein the SAR image training set comprises a plurality of SAR training images, and each SAR training image comprises more than one target;
inputting each SAR training image into a feature refinement deformable network and training the SAR training image to obtain a trained feature refinement deformable network;
the feature refinement deformable network comprises a bottom-up basic feature extraction structure, a top-down refinement pyramid structure and a cascade detection head structure, wherein the refinement pyramid structure comprises a feature fusion attention unit for utilizing texture information of a target in a low-dimensional feature map and a deformable bypass connection unit for discretizing distribution of SAR image scattering points;
acquiring an SAR target image to be detected;
inputting the SAR target image into the trained feature refinement deformable network, detecting targets in the SAR target image, and predicting the positions of the targets.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring an SAR image training set, wherein the SAR image training set comprises a plurality of SAR training images, and each SAR training image comprises more than one target;
inputting each SAR training image into a feature refinement deformable network and training the SAR training image to obtain a trained feature refinement deformable network;
the feature refinement deformable network comprises a bottom-up basic feature extraction structure, a top-down refinement pyramid structure and a cascade detection head structure, wherein the refinement pyramid structure comprises a feature fusion attention unit for utilizing texture information of a target in a low-dimensional feature map and a deformable bypass connection unit for discretizing distribution of SAR image scattering points;
acquiring an SAR target image to be detected;
inputting the SAR target image into the trained feature refinement deformable network, detecting targets in the SAR target image, and predicting the positions of the targets.
According to the SAR target detection method, device and equipment based on the feature refinement deformable network, a Feature Fusion Attention Module (FFAM) and a deformable bypass connection module (DLCM) are designed according to the imaging characteristics of aircraft targets in SAR images. The FFAM fully fuses the texture features of the aircraft target in low-dimensional feature maps with the abstract semantic information in high-dimensional feature maps, effectively screening key target features and suppressing background interference. The DLCM further extracts the discrete scattering features of the aircraft target by stacking several deformable convolutions, improving the algorithm's ability to characterize aircraft targets in SAR images.
Drawings
FIG. 1 is a flow diagram of a method for SAR target detection based on a feature refinement deformable network in one embodiment;
FIG. 2 is a schematic diagram of a refined pyramid structure in one embodiment;
FIG. 3 is a schematic diagram of a feature fusion attention unit in one embodiment;
FIG. 4 is a schematic diagram of the structure of a separation attention unit in one embodiment;
FIG. 5 is a schematic diagram of a deformable convolutional network in one embodiment;
FIG. 6 is a schematic view of SAR image slices in the self-built dataset used in the experiments;
FIG. 7 is a schematic diagram of the detection results and corresponding feature activation regions of different CNN-based networks in the experiments;
FIG. 8 is a schematic diagram of the P-R curves of the detection results of different CNN-based networks in the experiments;
FIG. 9 is a block diagram of a SAR target detection device based on a feature refinement deformable network in one embodiment;
FIG. 10 is an internal structure diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
With its all-day, all-weather Earth observation capability, synthetic aperture radar (Synthetic Aperture Radar, SAR) is widely used in both military and civilian fields, and interpretation algorithms for specific SAR image tasks have been proposed successively over the past decades. By virtue of their strong feature extraction capability, convolutional neural network (Convolutional Neural Network, CNN) algorithms based on deep features have achieved great breakthroughs in computer vision tasks (such as object classification, detection and segmentation) compared with traditional algorithms based on gradient, texture and similar features (such as HoG, LBP and SIFT).
However, in the prior art, most proposed CNN-based algorithms focus on the high-level semantic features of the target in the SAR image, while the target detail information in the lower, i.e., shallow, layers remains underexploited. In addition, these algorithms extract target features with convolution kernels of fixed size, which mismatches the discretized, irregularly distributed scattering information of targets such as aircraft in SAR images, so target positions cannot be well identified when detecting SAR images with a deep neural network.
In view of the above problems, as shown in fig. 1, there is provided a method for detecting SAR target based on a feature refinement deformable network, comprising the steps of:
step S100, an SAR image training set is obtained, wherein the SAR image training set comprises a plurality of SAR training images, and each SAR training image comprises more than one target;
step S110, inputting each SAR training image into a feature refinement deformable network and training the SAR training image to obtain a trained feature refinement deformable network;
the feature refinement deformable network comprises a bottom-up basic feature extraction structure, a top-down refinement pyramid structure and a cascade detection head structure, wherein the refinement pyramid structure comprises a feature fusion attention unit for utilizing texture information of a target in a low-dimensional feature map and a deformable bypass connection unit for discretizing distribution of SAR image scattering points;
Step S120, acquiring an SAR target image to be detected;
step S130, inputting the SAR target image into a trained feature refinement deformable network, detecting targets in the SAR target image, and predicting the positions of the targets.
In the method, steps S100 to S110 are a process of training the feature refinement deformable network, and steps S120 to S130 are a process of performing actual detection using the trained feature refinement deformable network.
In this embodiment, the feature refinement deformable network may be applied to detect a variety of targets, such as aircraft.
In step S100, each SAR training image used for training the network contains multiple aircraft targets, so that the trained network can identify each aircraft in an SAR image together with its position.
Step S110 further describes the construction of the feature refinement deformable network, which comprises a basic feature extraction structure, a refined pyramid structure and a cascade detection head structure. To address the prior-art problem that the neural network extracts only the high-level features of the image target, a feature fusion attention unit (Feature Fusion Attention Module, FFAM) is designed in the refined pyramid structure; it fully fuses the texture details in the low-level features with the high-dimensional semantic information in the high-level features of the target to highlight key target features. In addition, a deformable bypass connection module (Deformable Lateral Connection Module, DLCM) is designed in the refined pyramid structure to handle the highly discretized distribution of backscattering points of aircraft targets in SAR images.
In this embodiment, the basic feature extraction structure extracts basic features in the SAR training image by using a VGG-16 neural network, wherein outputs of three intermediate feature extraction layers of the VGG-16 neural network are selected as inputs of the refined pyramid structure.
Specifically, the basic feature extraction structure is a bottom-up forward propagation network, as shown in FIG. 2. It adopts a VGG-16 neural network with the final fully connected layers removed as the feature extractor to extract the basic features of targets in the SAR image, and selects three intermediate layer features of the VGG-16 network, namely the Conv4_3, Conv5_3 and Conv7 layers, to characterize hierarchical semantic features of the target; their spatial dimensions are 80×80, 40×40 and 20×20, respectively.
The Conv4_3 layer, the Conv5_3 layer and the Conv7 layer respectively correspond to low-level basic features, middle-level basic features and high-level basic features with different output sizes, and the low-level basic features, the middle-level basic features and the high-level basic features are all used as inputs of a refined pyramid structure.
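As a rough sketch of this layer-selection bookkeeping, the following hypothetical snippet computes the tap sizes; the strides and channel counts are inferred from the stated 80×80/40×40/20×20 outputs and a standard SSD-style VGG-16 backbone, not quoted from the patent:

```python
# Hypothetical bookkeeping for the three VGG-16 tap points used as
# pyramid inputs. Strides of 8/16/32 are assumed from the reported
# 80x80, 40x40 and 20x20 outputs; channel counts follow the usual
# SSD-modified VGG-16 and are likewise assumptions.
VGG16_TAPS = {
    "Conv4_3": {"stride": 8,  "channels": 512},   # low-level base features
    "Conv5_3": {"stride": 16, "channels": 512},   # mid-level base features
    "Conv7":   {"stride": 32, "channels": 1024},  # high-level base features
}

def feature_map_sizes(input_size: int) -> dict:
    """Spatial size of each tap for a square input of `input_size` pixels."""
    return {name: input_size // tap["stride"] for name, tap in VGG16_TAPS.items()}

sizes = feature_map_sizes(640)
# -> {'Conv4_3': 80, 'Conv5_3': 40, 'Conv7': 20}
```

With the 640×640 slices used in the experiments, the computed sizes match the spatial dimensions stated above.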
While the outline of an aircraft in an optical image is relatively complete and clear, the features of an aircraft in a SAR image tend to be more discrete and localized, and its texture details, i.e., low-level feature information, are richer. To make full use of the texture information of the target in shallow feature maps and improve the algorithm's ability to discriminate aircraft targets, three groups of feature fusion attention units, each connected to a deformable bypass connection unit, are designed in the refined pyramid structure. The connection relationship is shown in FIG. 2, where the three boxes labeled F represent feature fusion attention units and the three boxes labeled D represent deformable bypass connection units.
Specifically, inputting the low-level base features, the medium-level base features, and the high-level base features into the refined pyramid structure includes:
inputting the high-level basic features (the Conv7 layer output) and the middle-level basic features (the Conv5_3 layer output) into the third group of feature fusion attention units for feature fusion to obtain a fused feature map related to the high-level basic feature map, and inputting the fused features into the corresponding group of deformable bypass connection units to construct the high-level refined feature map, i.e., P3 in FIG. 2;
inputting the middle-level basic feature map, the low-level basic feature map (the Conv4_3 layer output) and the high-level refined feature map into the second group of feature fusion attention units for feature fusion to obtain a fused feature map related to the middle-level basic feature map, and inputting the fused features into the corresponding group of deformable bypass connection units to construct the middle-level refined feature map, i.e., P2 in FIG. 2;
and inputting the low-level basic feature map and the middle-level refined feature map into the first group of feature fusion attention units for feature fusion to obtain a fused feature map related to the low-level basic feature map, and inputting the fused features into the corresponding group of deformable bypass connection units to construct the low-level refined feature map, i.e., P1 in FIG. 2.
As can be seen from FIG. 2, the different hierarchical features are propagated from top to bottom through the feature fusion attention units and deformable bypass connection units. Specifically, P3 is constructed from Conv7 and Conv5_3 via the third FFAM and DLCM; P2 is constructed from the outputs of Conv5_3, Conv4_3 and P3 via the second FFAM and DLCM; finally, P1 is constructed from P2 and Conv4_3 via the first FFAM and DLCM.
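The top-down wiring just described can be sketched schematically; `ffam` and `dlcm` are hypothetical stand-ins that only record the data flow between units, not real network layers:

```python
# Schematic wiring of the top-down refined pyramid (data flow only, no
# real tensors). ffam() and dlcm() stand in for the feature fusion
# attention unit and deformable bypass connection unit; they merely
# record which inputs each refined map is built from.
def ffam(*inputs):
    return {"op": "FFAM", "inputs": list(inputs)}

def dlcm(fused):
    return {"op": "DLCM", "inputs": [fused]}

def build_refined_pyramid():
    p3 = dlcm(ffam("Conv7", "Conv5_3"))        # high-level refined map P3
    p2 = dlcm(ffam("Conv5_3", "Conv4_3", p3))  # mid-level refined map P2
    p1 = dlcm(ffam("Conv4_3", p2))             # low-level refined map P1
    return p1, p2, p3

p1, p2, p3 = build_refined_pyramid()
```

Each refined map thus depends on its own basic feature level plus the refined map one level above, mirroring FIG. 2.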
Specifically, in the feature fusion attention unit, in order to make full use of the detail information of the aircraft target in the low-dimensional feature map, improve the ability of the refined feature map to characterize the multi-dimensional features of the aircraft target, and suppress complex background interference, as shown in FIG. 3, to construct the feature map I_i, the feature layer F_{i-1} is downsampled using two convolutions of kernel size 3×3, and the feature layer F_{i+1} is upsampled using a deconvolution operation with kernel size 4×4, yielding adjusted versions of F_{i-1} and F_{i+1} with the same spatial dimensions as F_i. Further, ordinary convolutions with kernel size 3×3 are applied to the adjusted F_{i-1} and F_{i+1} respectively, which further improves the characterization capability of the feature map at each specific scale. These three feature maps containing different semantic information are concatenated along the channel dimension (the concatenation operation in FIG. 2), and a fused feature map capable of characterizing the multi-dimensional semantic information of the aircraft target is obtained through a ReLU activation function.
Taking the second group of feature fusion attention units as an example, this unit takes the middle-level basic feature map, the low-level basic feature map and the high-level refined feature map as inputs, where the middle-level basic feature map is F_i, the low-level basic feature map is F_{i-1}, and the high-level refined feature map is F_{i+1}. The low-level basic feature map is downsampled using two convolutions with 3×3 kernels, and the high-level refined feature map is upsampled using a deconvolution operation with a 4×4 kernel, so that both feature maps have the same spatial size as the middle-level basic feature map; the adjusted high-level refined feature map and low-level basic feature map are then further transformed with ordinary 3×3 convolutions, and the three feature maps containing different semantic information are concatenated along the channel dimension to obtain the fused feature map.
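A shape-only sketch of one FFAM follows, under the assumption that the two 3×3 convolutions on F_{i-1} include one stride-2 step (halving its spatial size) and that the 4×4 deconvolution doubles the size of F_{i+1}; the text gives kernel sizes but not strides, and the channel counts are illustrative:

```python
# Shape bookkeeping for one FFAM. Stride-2 downsampling and 2x
# deconvolution upsampling are assumptions; only kernel sizes are
# stated in the text. Each shape is (channels, height, width).
def ffam_shapes(f_lo, f_mid, f_hi):
    c_lo, h_lo, w_lo = f_lo
    c_mid, h_mid, w_mid = f_mid
    c_hi, h_hi, w_hi = f_hi
    down = (c_lo, h_lo // 2, w_lo // 2)   # F_{i-1} after stride-2 conv
    up = (c_hi, h_hi * 2, w_hi * 2)       # F_{i+1} after 4x4 deconv
    assert down[1:] == f_mid[1:] == up[1:], "spatial sizes must match"
    # channel-wise concatenation of the three aligned maps
    return (down[0] + c_mid + up[0], h_mid, w_mid)

# Example with the assumed VGG-16 tap shapes for a 640x640 input:
fused = ffam_shapes((512, 80, 80), (512, 40, 40), (1024, 20, 20))
# -> (2048, 40, 40)
```

The fused map keeps the middle level's spatial size while stacking all three sources along the channel dimension.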
Although the fused features can represent richer semantic information of the aircraft target, more background interference is inevitably introduced, reducing the algorithm's ability to discriminate the target. In view of the variability of the aircraft features contained in the different feature maps, a separation attention unit (SAM) as shown in FIG. 4 is also introduced in this embodiment: the three-layer multi-scale feature maps in FIG. 2 are activated by the ReLU function, concatenated along the channel dimension, and then input into this module to further extract the key information of the aircraft targets in the feature maps.
In the separation attention unit, the concatenated feature C_i first passes through a convolution layer (Conv) with c×r channels and is split into groups along the channel dimension, giving r groups of feature maps with c channels each, namely K_1, K_2, K_3, …, K_r, where r is the number of feature map groups.
The r groups of feature maps are added element by element and mapped by an average pooling operation (AvgPool) and a fully connected layer (FC) to obtain a channel-weighted feature vector W_i. Through the nonlinear mapping of a Softmax activation function (δ), r groups of channel attention weights are obtained, namely a_1, a_2, a_3, …, a_r. Finally, the input feature maps K_1, K_2, K_3, …, K_r are weighted by the r groups of channel attention weights and added element by element to obtain the final weighted feature map I_i, i.e., the fused feature map that is ultimately input into the deformable bypass connection unit.
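The weighting mechanics of the separation attention unit can be illustrated numerically. In this minimal sketch the pooling and FC mapping are replaced by fixed example scores, since only the softmax recombination of the r groups is being shown:

```python
import math

# Minimal numeric sketch of the separation-attention weighting: the r
# group maps K_1..K_r are recombined with softmax weights a_1..a_r.
# The AvgPool/FC scoring step is replaced by hand-supplied scores.
def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def separation_attention(groups, scores):
    """groups: r feature maps as equal-length lists; scores: r raw scores."""
    weights = softmax(scores)                     # a_1 .. a_r, sum to 1
    length = len(groups[0])
    # element-wise weighted sum of the r groups -> final map I_i
    return [sum(w * g[j] for w, g in zip(weights, groups))
            for j in range(length)]

k1, k2 = [1.0, 2.0, 3.0], [3.0, 2.0, 1.0]
out = separation_attention([k1, k2], scores=[0.0, 0.0])  # equal weights
# -> [2.0, 2.0, 2.0]
```

With unequal scores, the group judged more informative dominates the weighted map, which is the intended screening effect.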
In this embodiment, in order to obtain the final different layer refinement feature map, a deformable bypass connection unit is used to accurately extract the discretized features of the aircraft target in the SAR image, and the interior of the unit is formed by a deformable convolution network (Deformable Convolution Network, DCN).
Specifically, the input feature map X, i.e., the fused feature map, undergoes an ordinary 3×3 convolution with two output channels to obtain a two-dimensional offset S(i, j) for the convolution kernel sampling positions. The sampling positions of the deformable convolution kernel are then adjusted by S(i, j), and the adjusted convolution is used to extract the key features of the target in feature map X, giving an output feature map O with stronger characterization capability for the aircraft target. By stacking multiple DCNs, whose structure is shown in FIG. 5, the DLCM can characterize the discretized features of the aircraft target more flexibly and accurately.
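The sampling step of a deformable convolution can be illustrated with a toy single-output-pixel example; here the offsets are supplied by hand, whereas in the DLCM they come from the ordinary 3×3 convolution over X:

```python
# Toy illustration of deformable-convolution sampling: each 3x3 kernel
# tap (di, dj) is shifted by a 2-D offset before the input is sampled,
# with bilinear interpolation for fractional positions.
def bilinear(img, y, x):
    h, w = len(img), len(img[0])
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    fy, fx = y - y0, x - x0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bot * fy

def deform_sample(img, ci, cj, weights, offsets):
    """One 3x3 deformable-conv output at centre (ci, cj)."""
    taps = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)]
    out = 0.0
    for (di, dj), w, (oy, ox) in zip(taps, weights, offsets):
        out += w * bilinear(img, ci + di + oy, cj + dj + ox)
    return out

img = [[0.0, 0.0, 0.0],
       [0.0, 4.0, 8.0],
       [0.0, 0.0, 0.0]]
w = [0.0] * 9
w[4] = 1.0                   # only the centre tap is active
off = [(0.0, 0.0)] * 9
off[4] = (0.0, 0.5)          # shift the centre tap half a pixel right
out = deform_sample(img, 1, 1, w, off)
# -> 6.0, the average of img[1][1] = 4 and img[1][2] = 8
```

Learned offsets let the kernel follow the scattered bright points of an aircraft rather than a rigid grid.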
Because aircraft targets are small, directly predicting the final position of the aircraft in a large-scene SAR image is very difficult and easily yields inaccurate results. Therefore, this embodiment also introduces a cascade detection strategy composed of an anchor refinement unit (Anchor Refinement Module, ARM) and an object detection unit (Object Detection Module, ODM), which progressively performs refined predictive regression on the position of the aircraft target.
In this embodiment, the low-level basic feature, the middle-level basic feature and the high-level basic feature are input into the anchor point refinement unit to obtain a refined anchor point frame related to a target in the SAR training image, and then the high-level refined feature map, the middle-level refined feature map and the low-level refined feature map are input into the target detection unit to predict the refined anchor point frame so as to output a target detection result of the SAR training image.
Specifically, the feature maps output by the Conv4_3, Conv5_3 and Conv7 layers are first input into the ARM unit, which filters out easy negative anchor boxes and performs a preliminary regression of their positions to obtain refined anchor boxes (Refined Anchors).
Then the P1, P2 and P3 feature maps are respectively input into the ODM module, which further predicts from the refined anchor boxes the final position information of the aircraft target, giving the final SAR image aircraft target detection result.
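The two-stage regression can be sketched with the usual SSD/Faster R-CNN box parameterisation; the patent does not spell out the exact encoding, so this is an assumption:

```python
import math

# Sketch of cascade regression: the ARM applies a first set of
# (dx, dy, dw, dh) deltas to a default anchor, and the ODM applies a
# second set to the refined anchor. The delta encoding below is the
# conventional one, assumed rather than quoted from the patent.
def apply_deltas(box, deltas):
    """box: (cx, cy, w, h); deltas: (dx, dy, dw, dh)."""
    cx, cy, w, h = box
    dx, dy, dw, dh = deltas
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))

anchor = (100.0, 100.0, 32.0, 32.0)
refined = apply_deltas(anchor, (0.25, 0.0, 0.0, 0.0))        # ARM stage
final = apply_deltas(refined, (0.0, 0.0, math.log(2), 0.0))  # ODM stage
# refined -> (108.0, 100.0, 32.0, 32.0); the ODM then doubles the width
```

Two small corrections in sequence are easier to regress accurately than one large one, which is the point of the cascade.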
When training the feature refinement deformable network, the total loss function is:
L_total = L_ARM + L_ODM (1)
where L_ARM is the loss of the anchor refinement unit and L_ODM is the loss of the object detection unit.
Next, experiments and analyses were also performed on the above method:
A. experimental data set
The experiments use a self-built dataset to explore the detection performance of the method on SAR image aircraft targets. The raw data contain 174 large-scene SAR images acquired by the GF-3 and TerraSAR-X satellites. Through annotation by interpretation experts and random cropping of the original large scenes, 2317 slices of size 640×640 were obtained, containing 6781 aircraft whose wingspans range from about 25 meters to 75 meters. The slice dataset was divided into training, validation and test sets at a ratio of 5:2:3. The original training set was expanded with contrast transformation, brightness transformation, mirror flipping, size scaling and random cropping to improve the sample diversity of aircraft targets in the training data. As shown in FIG. 6, the first two and last two pictures show slice images from the GF-3 and TerraSAR-X satellites, respectively, with rectangular boxes marking the positions of the aircraft.
B. Hyper-parameter setting
To detect aircraft targets of different sizes and proportions, the default anchor box sizes of the P1, P2 and P3 feature maps were set to 32, 64 and 128, respectively, with aspect ratios of 1, 0.707 and 1.414. Thus, each point of a refined feature map corresponds to 3 default anchor boxes. During training, the minibatch size was set to 12, and all methods were trained for 200 epochs. The initial learning rate of the network was 1e-3, decayed by a factor of 0.1 at epochs 75 and 150. For stable training, a warm-up strategy was used in the first 5 epochs. In addition, the experiments trained the network parameters with stochastic gradient descent (SGD), with a weight decay of 5e-4 and a momentum of 0.9.
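Under these settings each refined feature-map cell carries three default anchors. A minimal sketch of the anchor layout follows; the feature-map stride of 8 for P1 (640-pixel input) and the constant-area parameterization of the aspect ratios are assumptions, not stated in the text.

```python
def default_anchors(feature_size, stride, scale, ratios=(1.0, 0.707, 1.414)):
    """Generate the 3 default anchor boxes (cx, cy, w, h) per cell of a
    refined feature map: one base scale per level (32/64/128 for
    P1/P2/P3) and aspect ratios 1, 0.707 and 1.414."""
    anchors = []
    for y in range(feature_size):
        for x in range(feature_size):
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            for r in ratios:
                # aspect ratio r = w / h at constant area scale**2
                w = scale * (r ** 0.5)
                h = scale / (r ** 0.5)
                anchors.append((cx, cy, w, h))
    return anchors

# An 80x80 P1 map (assumed stride 8) yields 80 * 80 * 3 anchors.
anchors = default_anchors(80, 8, 32)
print(len(anchors))  # 19200
```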
C. Evaluation metrics
The experiments evaluate the different methods with precision (P), recall (R) and F1 score (F1), defined by equations 2, 3 and 4, respectively. True positives (TP), false negatives (FN) and false positives (FP) denote the numbers of real targets detected by the algorithm, real targets missed, and false alarms detected.
Precision (P) = TP/(TP+FP) (2)
Recall (R) = TP/(TP+FN) (3)
F1 = 2PR/(P+R) (4)
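Equations 2-4 translate directly into code; the TP/FP/FN counts in the example are hypothetical, chosen only for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 per equations 2-4 from true
    positives (detected real targets), false positives (false alarms)
    and false negatives (missed targets)."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * p * r / (p + r)
    return p, r, f1

p, r, f1 = precision_recall_f1(tp=900, fp=100, fn=100)
print(p, r, round(f1, 2))  # 0.9 0.9 0.9
```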
Furthermore, the time and space complexity of the model are evaluated in terms of frame rate (FPS) and parameter count (Parameters), defined by equations 5 and 6, respectively, where t is the time the algorithm takes to detect one image. In equation 6, C_out and C_in denote the numbers of output and input channels, and k_w and k_h denote the convolution kernel width and height.
FPS = 1/t (5)
Parameters = C_out · (k_w · k_h · C_in + 1) (6)
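Equation 6 counts the weights and bias of a single convolution layer; a direct transcription (the 256-channel 3x3 layer in the example is hypothetical):

```python
def conv_parameters(c_out, c_in, k_w, k_h):
    """Parameter count of one convolution layer per equation 6:
    C_out * (k_w * k_h * C_in + 1), the +1 being the bias term."""
    return c_out * (k_w * k_h * c_in + 1)

# A 3x3 convolution mapping 256 input channels to 256 output channels:
print(conv_parameters(256, 256, 3, 3))  # 590080
```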
D. Results and analysis
1) Functional exploration of internal modules
The impact of the different modules in the FRDN on algorithm performance is shown in table 1. The FRDN without any of the proposed modules serves as the comparison baseline, and the FFAM is split into its Feature Fusion (FF) and Separation Attention Module (SAM) parts so that their individual contributions to detection performance can be discussed. In addition, given the similarity in network structure and data-processing flow between RefineDet and the present method, the experiments also compare against RefineDet.
Table 1 Influence of FRDN internal modules on algorithm performance
In terms of detection accuracy, when no module is employed (FF, SAM, DLCM), the baseline algorithm (Baseline) achieves F1 and AP values for aircraft target detection that are 0.8% and 0.5% higher than RefineDet, and the number of false alarms also drops somewhat (from 420 to 382). After Feature Fusion (FF) is used to introduce shallow detail information of the target into the high-dimensional feature maps, the detection performance of Baseline (w/FF) on aircraft targets is lower than that of Baseline. This shows that merging only the detail information of aircraft targets from the low-dimensional feature maps into the high-dimensional ones also introduces background interference to some extent, weakening the algorithm's accurate characterization of the target in the high-dimensional feature maps. When the SAM module is introduced, Baseline (w/FF+SAM) detects aircraft targets much better: its F1 and AP values improve by 2.2% and 0.5% over Baseline, and the number of false alarms drops sharply (from 382 to 274). The SAM accurately extracts key features of aircraft targets, effectively suppresses background interference, and improves the algorithm's discrimination of aircraft targets. After the DLCM is introduced to strengthen the extraction of discretized target features, the proposed algorithm (FRDN) improves the F1 score for aircraft detection by 3.5% over Baseline and the detection accuracy by 8.4%. This fully demonstrates the effectiveness, for aircraft target detection, of the FFAM composed of FF and SAM and of the DLCM designed in the algorithm.
In terms of detection speed, the FRDN contains more modules than RefineDet and Baseline and thus introduces more parameters, which lowers the running speed somewhat: the frame rate drops from 63 to 46 FPS.
To fully explore the effect of the different modules on the feature maps, fig. 7 shows the detection results and visualized feature maps of RefineDet, and of Baseline with the different modules introduced, on aircraft targets in the same area. The activation regions of the aircraft edge and center on the feature maps are marked by red arrows A and B, respectively. Compared with RefineDet's detection result and corresponding feature map, the Baseline algorithm does not detect the aircraft tail as a false alarm, but its recognition confidence for the aircraft is lower. After the FF module is introduced, the tail is still detected as another aircraft, because key target information is not extracted from the fused feature maps, and the main body of the aircraft target is only weakly activated on the feature map. After the SAM is introduced, the algorithm detects the aircraft target with higher confidence, and the main body of the target is more prominent on the feature map. This verifies the SAM's accurate feature extraction and discrimination for aircraft targets. When the DLCM is introduced, the proposed algorithm (FRDN) detects the aircraft target with higher accuracy than the baseline algorithm (the confidence rises from 0.59 to 0.97), and the activation region of the target stands out more clearly from the background on the feature map, further verifying the effectiveness of the method's internal modules for aircraft target feature extraction.
2) Comparison with CNN-based methods
Table 2 lists the detection results of FRDN and other CNN-based general-purpose and SAR-specific algorithms on aircraft targets in SAR images. The PADN and DAPN algorithms are detectors designed for specific targets in SAR images.
Table 2 Performance comparison of deep-learning-based target detection algorithms on the self-built SAR aircraft data set
In terms of detection accuracy, the F1 and AP scores of the method for aircraft detection are 2.0% and 0.7% higher than those of the second-best algorithms (Cascade R-CNN and RefineDet, respectively). Its margin over DAPN and PADN is even larger on all metrics. This shows that an algorithm designed specifically for the characteristics of aircraft targets in SAR images has stronger feature extraction and discrimination for them than other general-purpose and specific target detectors. In terms of detection speed, the method is faster than other general two-stage detectors (e.g. FPN and Cascade R-CNN) and SAR-specific algorithms (e.g. DAPN and PADN), but slower than single-stage algorithms (e.g. SSD, RefineDet), because the various modules it introduces increase the model's parameter count.
In addition, the precision-recall (P-R) curves of the proposed algorithm and the other CNN-based algorithms on the SAR image aircraft detection results are shown in fig. 8. As recall increases, the curve of the present method remains more stable than those of the other algorithms.
In the SAR target detection method based on the feature refinement deformable network above, a Feature Fusion Attention Module (FFAM) and a deformable bypass connection module (DLCM) are designed according to the imaging characteristics of aircraft targets in SAR images. The FFAM fully fuses the texture features of the aircraft target in the shallow feature maps with the abstract semantic information in the high-dimensional feature maps, effectively screening key target features and suppressing background interference. When the refined feature pyramid is constructed, the DLCM further extracts the discretized scattering features of the aircraft target by stacking several deformable convolutions, improving the algorithm's representation of SAR image aircraft targets. The experiments explored the role of the proposed modules and compared the algorithm with other CNN-based algorithms (e.g. FPN, Cascade R-CNN, DAPN, PADN, RefineDet, RPDet) on the self-built data set. The experimental results fully verify the efficiency of the algorithm for SAR image aircraft target detection.
It should be understood that, although the steps in the flowchart of fig. 1 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in fig. 1 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different times, and these sub-steps or stages need not be performed in sequence; they may be performed in turn or alternately with at least part of the other steps, or of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 9, there is provided a SAR target detection apparatus based on a feature refinement deformable network, comprising: a training data set acquisition module 200, a feature refinement deformable network training module 210, a SAR target image acquisition module 220, and a target detection module 230, wherein:
the training data set acquisition module 200 is configured to acquire a SAR image training set, where the SAR image training set includes a plurality of SAR training images, and each SAR training image includes more than one target;
a feature refinement deformable network training module 210, configured to input each SAR training image into a feature refinement deformable network and train the same, to obtain a trained feature refinement deformable network;
the feature refinement deformable network comprises a bottom-up basic feature extraction structure, a top-down refinement pyramid structure and a cascade detection head structure, wherein the refinement pyramid structure comprises a feature fusion attention unit for utilizing texture information of a target in a low-dimensional feature map and a deformable bypass connection unit for the discretized distribution of SAR image scattering points;
the SAR target image acquisition module 220 is configured to acquire a SAR target image to be detected;
The target detection module 230 is configured to input the SAR target image into the trained feature refinement deformable network, detect targets in the SAR target image, and predict positions of the targets.
For the specific limitations of the SAR target detection apparatus based on the feature refinement deformable network, reference may be made to the limitations of the SAR target detection method above, which are not repeated here. The various modules of the above apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or stored in software form in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a terminal, and an internal structure diagram thereof may be as shown in fig. 10. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a SAR target detection method based on a feature refinement deformable network. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those skilled in the art that the structure shown in fig. 10 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:
acquiring an SAR image training set, wherein the SAR image training set comprises a plurality of SAR training images, and each SAR training image comprises more than one target;
inputting each SAR training image into a feature refinement deformable network and training the SAR training image to obtain a trained feature refinement deformable network;
the feature refinement deformable network comprises a bottom-up basic feature extraction structure, a top-down refinement pyramid structure and a cascade detection head structure, wherein the refinement pyramid structure comprises a feature fusion attention unit for utilizing texture information of a target in a low-dimensional feature map and a deformable bypass connection unit for the discretized distribution of SAR image scattering points;
Acquiring an SAR target image to be detected;
inputting the SAR target image into the trained feature refinement deformable network, detecting targets in the SAR target image, and predicting the positions of the targets.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring an SAR image training set, wherein the SAR image training set comprises a plurality of SAR training images, and each SAR training image comprises more than one target;
inputting each SAR training image into a feature refinement deformable network and training the SAR training image to obtain a trained feature refinement deformable network;
the feature refinement deformable network comprises a bottom-up basic feature extraction structure, a top-down refinement pyramid structure and a cascade detection head structure, wherein the refinement pyramid structure comprises a feature fusion attention unit for utilizing texture information of a target in a low-dimensional feature map and a deformable bypass connection unit for the discretized distribution of SAR image scattering points;
acquiring an SAR target image to be detected;
Inputting the SAR target image into the trained feature refinement deformable network, detecting targets in the SAR target image, and predicting the positions of the targets.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. The nonvolatile memory can include Read Only Memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double Data Rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), memory bus direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), among others.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above examples merely represent a few embodiments of the present application, which are described in more detail and are not to be construed as limiting the scope of the invention. It should be noted that it would be apparent to those skilled in the art that various modifications and improvements could be made without departing from the spirit of the present application, which would be within the scope of the present application. Accordingly, the scope of protection of the present application is to be determined by the claims appended hereto.

Claims (5)

1. The SAR target detection method based on the feature refinement deformable network is characterized by comprising the following steps of:
acquiring an SAR image training set, wherein the SAR image training set comprises a plurality of SAR training images, and each SAR training image comprises more than one target;
inputting each SAR training image into a feature refinement deformable network and training the SAR training image to obtain a trained feature refinement deformable network;
The feature refinement deformable network comprises a bottom-up basic feature extraction structure, a top-down refinement pyramid structure and a cascade detection head structure, wherein the refinement pyramid structure comprises a feature fusion attention unit for utilizing texture information of a target in a low-dimensional feature map and a deformable bypass connection unit for the discretized distribution of SAR image scattering points, the basic feature extraction structure adopts a VGG-16 neural network to extract basic features in an SAR training image, and selects the output of three middle feature extraction layers of the VGG-16 neural network as the input of the refinement pyramid structure, and the three middle feature extraction layers are Conv4_3 layers, Conv5_3 layers and Conv7 layers which respectively correspond to low-layer basic features, middle-layer basic features and high-layer basic features with different output sizes;
inputting the low-level, medium-level and high-level base features into the refined pyramid structure comprises: inputting the high-level basic features and the middle-level basic features into a third group of feature fusion attention units for feature fusion to obtain fusion features related to the high-level basic features, inputting the fusion features into a corresponding group of deformable bypass connection units to construct a high-level refined feature map, inputting the middle-level basic features, low-level basic features and the high-level refined feature map into a second group of feature fusion attention units for feature fusion to obtain fusion features related to the middle-level basic features, inputting the fusion features into a corresponding group of deformable bypass connection units to construct a middle-level refined feature map, inputting the low-level basic features and the middle-level refined feature map into a first group of feature fusion attention units for feature fusion to obtain fusion features related to the low-level basic features, and inputting the fusion features into a corresponding group of deformable bypass connection units to construct a low-level refined feature map;
The cascade detection head structure comprises an anchor point refining unit and a target detection unit, wherein the low-level basic characteristics, the middle-level basic characteristics and the high-level basic characteristics are input into the anchor point refining unit to obtain a refined anchor point frame related to a target in an SAR training image, and the high-level refined characteristic diagram, the middle-level refined characteristic diagram and the low-level refined characteristic diagram are input into the target detection unit to predict the refined anchor point frame so as to output a target detection result of the SAR training image;
acquiring an SAR target image to be detected;
inputting the SAR target image into the trained feature refinement deformable network, detecting targets in the SAR target image, and predicting the positions of the targets.
2. The SAR target detection method of claim 1, wherein,
the feature fusion attention unit comprises three groups;
the deformable bypass connection units comprise three groups connected with each group of feature fusion attention units respectively.
3. The SAR target detection method of claim 2, wherein each of the fused features is further input to a corresponding split attention unit for further information extraction, respectively, before being input to a corresponding deformable bypass connection unit.
4. A SAR target detection apparatus based on a feature refinement deformable network, the apparatus comprising:
the SAR image training system comprises a training data set acquisition module, a target acquisition module and a target acquisition module, wherein the training data set acquisition module is used for acquiring an SAR image training set, the SAR image training set comprises a plurality of SAR training images, and each SAR training image comprises more than one target;
the feature refinement deformable network training module is used for inputting each SAR training image into a feature refinement deformable network and training the SAR training image to obtain a trained feature refinement deformable network;
the feature refinement deformable network comprises a bottom-up basic feature extraction structure, a top-down refinement pyramid structure and a cascade detection head structure, wherein the refinement pyramid structure comprises a feature fusion attention unit for utilizing texture information of a target in a low-dimensional feature map and a deformable bypass connection unit for the discretized distribution of SAR image scattering points, the basic feature extraction structure adopts a VGG-16 neural network to extract basic features in an SAR training image, and selects the output of three middle feature extraction layers of the VGG-16 neural network as the input of the refinement pyramid structure, and the three middle feature extraction layers are Conv4_3 layers, Conv5_3 layers and Conv7 layers which respectively correspond to low-layer basic features, middle-layer basic features and high-layer basic features with different output sizes;
Inputting the low-level, medium-level and high-level base features into the refined pyramid structure comprises: inputting the high-level basic features and the middle-level basic features into a third group of feature fusion attention units for feature fusion to obtain fusion features related to the high-level basic features, inputting the fusion features into a corresponding group of deformable bypass connection units to construct a high-level refined feature map, inputting the middle-level basic features, low-level basic features and the high-level refined feature map into a second group of feature fusion attention units for feature fusion to obtain fusion features related to the middle-level basic features, inputting the fusion features into a corresponding group of deformable bypass connection units to construct a middle-level refined feature map, inputting the low-level basic features and the middle-level refined feature map into a first group of feature fusion attention units for feature fusion to obtain fusion features related to the low-level basic features, and inputting the fusion features into a corresponding group of deformable bypass connection units to construct a low-level refined feature map;
the cascade detection head structure comprises an anchor point refining unit and a target detection unit, wherein the low-level basic characteristics, the middle-level basic characteristics and the high-level basic characteristics are input into the anchor point refining unit to obtain a refined anchor point frame related to a target in an SAR training image, and the high-level refined characteristic diagram, the middle-level refined characteristic diagram and the low-level refined characteristic diagram are input into the target detection unit to predict the refined anchor point frame so as to output a target detection result of the SAR training image;
The SAR target image acquisition module is used for acquiring an SAR target image to be detected;
the target detection module is used for inputting the SAR target image into the trained feature refinement deformable network, detecting targets in the SAR target image and predicting the positions of the targets.
5. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 3 when the computer program is executed.
CN202111298372.3A 2021-11-04 2021-11-04 SAR target detection method, device and equipment based on feature refinement deformable network Active CN114022751B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111298372.3A CN114022751B (en) 2021-11-04 2021-11-04 SAR target detection method, device and equipment based on feature refinement deformable network


Publications (2)

Publication Number Publication Date
CN114022751A CN114022751A (en) 2022-02-08
CN114022751B true CN114022751B (en) 2024-03-05

Family

ID=80060609

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111298372.3A Active CN114022751B (en) 2021-11-04 2021-11-04 SAR target detection method, device and equipment based on feature refinement deformable network

Country Status (1)

Country Link
CN (1) CN114022751B (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860233A (en) * 2020-07-06 2020-10-30 中国科学院空天信息创新研究院 SAR image complex building extraction method and system based on attention network selection
CN112084901A (en) * 2020-08-26 2020-12-15 长沙理工大学 GCAM-based high-resolution SAR image airport runway area automatic detection method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Xueru Bai, Xuening Zhou, Feng Zhang, Li Wang, Ruihang Xue, Feng Zhou. Robust Pol-ISAR Target Recognition Based on ST-MC-DCNN. IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 12, pp. 9912-9927. *
Zhao Yan, Zhao Lingjun, Kuang Gangyao. Fast detection of aircraft targets in SAR images based on an attention-mechanism feature fusion network. Acta Electronica Sinica, vol. 49, no. 9, pp. 1665-1674. *

Also Published As

Publication number Publication date
CN114022751A (en) 2022-02-08

Similar Documents

Publication Publication Date Title
Sun et al. RSOD: Real-time small object detection algorithm in UAV-based traffic monitoring
Gao et al. Multiscale residual network with mixed depthwise convolution for hyperspectral image classification
Yang et al. Scene classification of remote sensing image based on deep network and multi-scale features fusion
CN113420607A (en) Multi-scale target detection and identification method for unmanned aerial vehicle
CN112215207A (en) Remote sensing image airplane target detection method combining multi-scale and attention mechanism
CN113468996B (en) Camouflage object detection method based on edge refinement
Yuan Face detection and recognition based on visual attention mechanism guidance model in unrestricted posture
Liu et al. Cross-part learning for fine-grained image classification
CN109034213B (en) Hyperspectral image classification method and system based on correlation entropy principle
CN116630798A (en) SAR image aircraft target detection method based on improved YOLOv5
Li et al. A survey on deep-learning-based real-time SAR ship detection
Li et al. Enhanced bird detection from low-resolution aerial image using deep neural networks
Yang et al. Air-to-ground multimodal object detection algorithm based on feature association learning
Yue et al. A novel few-shot learning method for synthetic aperture radar image recognition
CN114022752B (en) SAR target detection method based on attention feature refinement and alignment
Qu et al. Improved YOLOv5-based for small traffic sign detection under complex weather
CN114022751B (en) SAR target detection method, device and equipment based on feature refinement deformable network
Dey et al. A robust FLIR target detection employing an auto-convergent pulse coupled neural network
CN116310795A (en) SAR aircraft detection method, system, device and storage medium
CN116258960A (en) SAR target recognition method and device based on structured electromagnetic scattering characteristics
Zhu et al. Real-time traffic sign detection based on YOLOv2
Liu et al. Target detection of hyperspectral image based on faster R-CNN with data set adjustment and parameter turning
Lai et al. Multiscale high-level feature fusion for histopathological image classification
Li et al. Classification of optical remote sensing images based on Convolutional Neural Network
Han et al. Application of Multi-Feature Fusion Based on Deep Learning in Pedestrian Re-Recognition Method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant