CN113688830A - Deep learning target detection method based on central point regression - Google Patents


Info

Publication number
CN113688830A
Authority
CN
China
Prior art keywords
detection
feature
training
target detection
remote sensing
Prior art date
Legal status
Granted
Application number
CN202110930245.4A
Other languages
Chinese (zh)
Other versions
CN113688830B (en)
Inventor
李婕
周顺
王恩果
李毅
巩朋成
张正文
朱鑫潮
Current Assignee
Hubei University of Technology
Original Assignee
Hubei University of Technology
Priority date
Filing date
Publication date
Application filed by Hubei University of Technology filed Critical Hubei University of Technology
Priority to CN202110930245.4A priority Critical patent/CN113688830B/en
Publication of CN113688830A publication Critical patent/CN113688830A/en
Application granted granted Critical
Publication of CN113688830B publication Critical patent/CN113688830B/en
Current legal status: Active

Classifications

    • G — Physics
    • G06 — Computing; calculating or counting
    • G06F — Electric digital data processing
    • G06F 18/00 — Pattern recognition
    • G06F 18/20 — Analysing
    • G06F 18/21 — Design or setup of recognition systems or techniques; extraction of features in feature space; blind source separation
    • G06F 18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/25 — Fusion techniques
    • G06F 18/253 — Fusion techniques of extracted features
    • G06N — Computing arrangements based on specific computational models
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/04 — Architecture, e.g. interconnection topology
    • G06N 3/045 — Combinations of networks


Abstract

The invention provides a deep learning target detection method based on central point regression, comprising the following specific steps: a horizontal connection module is introduced into the original CenterNet network structure to correlate the features of different layers, fusing deep features with shallow features to improve small-target detection performance; a channel attention module is introduced into the horizontal connection module to adaptively recalibrate the feature responses of different channels, improving the network's feature extraction capability; finally, comparative experiments are performed on the public UCAS-AOD and RSOD remote sensing datasets. The method achieves higher detection accuracy for aircraft targets in remote sensing images while retaining the speed advantage of a single-stage detection model, and therefore has practical value.

Description

Deep learning target detection method based on central point regression
Technical Field
The invention relates to the technical field of target detection, in particular to a deep learning target detection method based on central point regression.
Background
Remote sensing images are captured by satellites and are characterized by spatial, temporal and spectral resolution, among other properties. Target detection in remote sensing imagery has important significance and application value in both civil and military domains; in particular, aircraft detection in remote sensing images can provide valuable information for more efficient civil aviation management and military operations. Unlike conventional aircraft images, aircraft detection in remote sensing imagery faces difficulties such as multiple scales, complex backgrounds and large image memory footprints.
With the rapid development of deep learning, target detection methods based on Convolutional Neural Networks (CNNs) have become the dominant approach to processing and recognizing remote sensing images. Mainstream deep learning target detection algorithms currently fall into two categories: anchor-based and anchor-free methods. Most anchor-based methods are in turn either single-stage or two-stage. Single-stage methods such as SSD and YOLO complete object classification and regression in one stage and have the advantage of very high detection speed. Two-stage methods such as R-CNN, Fast R-CNN and Faster R-CNN often achieve more accurate detection results by introducing a region proposal network. In recent years much work has improved the accuracy and efficiency of anchor-based methods, and they have gradually matured. However, the detection performance of anchor-based methods depends on the balance of positive and negative samples and on the anchor hyperparameters, such as anchor size, aspect ratio and number; there is currently no effective way to tune these hyperparameters automatically, so they must be calibrated manually one by one, which limits the adoption of anchor-based methods. In addition, anchor-based detection introduces the Non-Maximum Suppression (NMS) algorithm to eliminate duplicate target boxes, which increases algorithmic complexity and computation and makes the final detection relatively slow and less suitable for real-time use.
To increase detector flexibility, anchor-free methods have been developed and have attracted much attention. An anchor-free method does not rely on preset anchors; it performs feature extraction and detection directly on keypoints or dense regions of the input image and can adapt to a variety of objects through regression. CornerNet, for example, uses a single convolutional neural network to predict top-left and bottom-right corner heatmaps for all instances of each object class, together with an embedding vector for each detected corner, and finally matches pairs of corners belonging to the same object with the associative embedding method proposed by Newell et al. ExtremeNet detects four extreme points and a center point per object by predicting four extreme-point heatmaps and a center heatmap for each object class, and groups the extreme points into objects geometrically to obtain the final result. CenterNet improves on ideas borrowed from CornerNet: it detects objects by directly predicting the target center point and obtains other object attributes, such as size, 3D position, orientation and even pose, by regression; moreover, neither training nor testing requires NMS or an RPN, making CenterNet simpler, faster and more accurate than bounding-box-based detectors and truly end-to-end. Liu et al. first attempted to solve remote sensing target detection with the anchor-free CenterNet approach and evaluated the performance of each CenterNet backbone on the NWPU VHR-10 remote sensing dataset. Zhang et al. proposed a feature-enhanced center-point network that introduces horizontal connections between different layers and improves the detection accuracy of small targets in remote sensing images by combining deep and shallow features, although its detection speed still needs improvement.
The literature on single-branch anchor-free target detection proposes a single-branch anchor-free detection model that integrates multiple prediction branches into one branch using an hourglass backbone extraction network; it achieves semantic feature characterization capability close to that of models combining ResNet and FPN structures, saves memory, and maintains small-target detection accuracy, but suffers from low feature extraction efficiency. These methods offer useful ideas for applying classic anchor-free detectors directly to remote sensing target detection, but the trade-off between detection speed and accuracy caused by the complex backgrounds, small targets and non-uniform shapes of remote sensing images still needs improvement.
Disclosure of Invention
The invention aims to provide a deep learning target detection method based on central point regression that solves the problems of high false detection rates and difficult small-target detection that arise when the original CenterNet algorithm is applied to remote sensing images.
The technical scheme of the invention is as follows:
a deep learning target detection method based on central point regression comprises the following specific steps:
a horizontal connection module is introduced into the original CenterNet network structure to correlate the features of different layers, fusing deep features with shallow features to improve small-target detection performance;
a channel attention module is introduced into the horizontal connection module to adaptively recalibrate the feature responses of different channels, improving the network's feature extraction capability;
finally, comparative experiments are performed on the public UCAS-AOD and RSOD remote sensing datasets.
The horizontal connection module is a Feature Fusion module with two variants, C-CenterNet and T-CenterNet: C-CenterNet applies a standard 1 × 1 convolution so that the feature layers have matching sizes before fusion, while T-CenterNet replaces the standard convolution of C-CenterNet with dilated (atrous) convolution for testing.
The channel attention module is the Squeeze-and-Excitation attention module SE-Net. It first applies a Squeeze operation to the H × W feature maps with C input channels to obtain a 1 × 1 feature map with C channels, then applies an Excitation operation to the result to obtain per-channel weight values, and finally multiplies the original feature maps by the weights of the corresponding channels in a Scale operation to obtain new feature maps, thereby strengthening channels carrying useful information and suppressing channels carrying useless information.
Comparative experiments are carried out on the public UCAS-AOD and RSOD remote sensing datasets. Pictures are randomly selected from the dataset samples as the training set, keeping a 9:1 ratio between the training and test sets. Iterative training uses the Adam optimizer with input images uniformly scaled to 512 × 512 resolution. The initial learning rate is 1e-3 with batch_size 4; after 50 epochs the learning rate is reduced by a factor of 10 and another 50 epochs are trained with batch_size 4. In addition, to accelerate convergence, the ResNet-50-based backbone is initialized with pre-trained weights from the ImageNet classification task.
To verify the detection performance of the deep learning target detection method based on central point regression, different detection networks are trained and compared on the same experimental platform and training dataset; the comparison includes single-stage and two-stage target detection algorithms.
Compared with the prior art, the beneficial effects of the invention are: the method solves the problems of high false detection rates and difficult small-target detection that arise when the original CenterNet algorithm is applied to remote sensing images, achieves higher detection accuracy for aircraft targets in remote sensing images while retaining the speed advantage of a single-stage detection model, and therefore has practical value.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention.
Figure 2 is a schematic diagram of the improved CenterNet network architecture of the present invention.
Fig. 3 is a schematic diagram of a horizontal connection module network structure of the present invention.
FIG. 4 is a schematic view of the Squeeze-and-Excitation attention module of the present invention.
Fig. 5 is a schematic diagram of a network of horizontally connected modules after the introduction of a channel attention module according to the present invention.
FIG. 6 is a comparison of detection results before and after the attention mechanism of the present invention is introduced.
FIG. 7 is a graph of total Loss variation for training on the UCAS-AOD dataset.
Fig. 8 is a PR-curve comparison before and after improvement on the UCAS-AOD dataset.
Fig. 9 is a graph of total Loss variation trained on RSOD data sets.
Fig. 10 is a comparison of PR curves before and after modification on the RSOD dataset.
FIG. 11 is a visual comparison before and after improvement.
Fig. 12 is a comparison with the other algorithms.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
CenterNet is a new anchor-free, end-to-end target detection algorithm. It represents the center of an object's bounding box with a single point and then regresses directly on the image features around that center position to obtain the other attributes, such as the object's size, orientation and pose, turning the target detection problem into a standard keypoint estimation problem. The specific procedure is as follows: an image I ∈ R^(W×H×3) of width W and height H is input, and a convolutional neural network generates a heatmap Ŷ ∈ [0, 1]^(W/R × H/R × C), where R is the output stride (R = 4 by default) and C is the number of keypoint classes. Ŷ_xyc = 1 represents a detected keypoint and Ŷ_xyc = 0 represents background. In the training stage, the ground-truth keypoints are splatted onto the heatmap Y through the Gaussian kernel of equation (1):

Y_xyc = exp( −((x − p̃_x)² + (y − p̃_y)²) / (2σ_p²) )    (1)

where p̃ = ⌊p/R⌋ is the ground-truth center point in low-resolution coordinates, (x, y) indexes the predicted heatmap, and σ_p is an object-size-adaptive standard deviation.
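As a concrete illustration of equation (1), the ground-truth splatting can be sketched in plain Python. This is a toy single-channel version; the list-based heatmap and the function name are illustrative conveniences, not part of the patent:

```python
import math

def splat_gaussian(heatmap, center, sigma):
    """Render one ground-truth keypoint onto a single-class heatmap
    (a 2-D list indexed [y][x]) with the Gaussian of equation (1).
    Overlapping Gaussians of the same class keep the element-wise max."""
    px, py = center  # low-resolution keypoint p~ = floor(p / R)
    for y in range(len(heatmap)):
        for x in range(len(heatmap[0])):
            g = math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * sigma ** 2))
            heatmap[y][x] = max(heatmap[y][x], g)  # overlap rule: take max
    return heatmap
```

The value at the keypoint itself is exactly 1, decaying toward 0 with distance, which matches the role of Y_xyc as a soft positive label.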
When two Gaussians of the same class overlap, the element-wise maximum is taken. Zhou et al. predict the heatmap Ŷ using Hourglass, ResNet, DLA, etc. as the backbone network. The feature map produced by the backbone is then fed into a detection module consisting of three branches: the Heatmap, Width-Height and Offset prediction branches. Each branch consists of a 3 × 3 and a 1 × 1 convolutional layer. The number of final Heatmap output channels equals the number of classes in the dataset (for example, the VOC dataset contains 20 classes, so the Heatmap branch outputs 20 channels), while the Width-Height and Offset branches each output 2 channels, representing the width and height of the target at the predicted center point and the offsets of the center point's horizontal and vertical coordinates. Finally, the relevant information is extracted from the output heatmap to obtain the detection result for the input image.
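The decoding step described above — reading peaks from the Heatmap branch and box geometry from the Width-Height and Offset branches — can be sketched as follows. This is a hedged single-class toy: the 3 × 3 local-maximum test stands in for the max-pooling peak extraction CenterNet uses instead of box NMS, and the names and score threshold are illustrative assumptions:

```python
def decode_centers(heatmap, wh, offset, score_thresh=0.3, stride=4):
    """Turn the three branch outputs into boxes (single class for brevity).
    heatmap[y][x]: peak score; wh[y][x] = (w, h); offset[y][x] = (dx, dy).
    A cell is kept if it passes the score threshold and is a maximum over
    its 3x3 neighbourhood."""
    H, W = len(heatmap), len(heatmap[0])
    boxes = []
    for y in range(H):
        for x in range(W):
            s = heatmap[y][x]
            if s < score_thresh:
                continue
            neigh = [heatmap[j][i]
                     for j in range(max(0, y - 1), min(H, y + 2))
                     for i in range(max(0, x - 1), min(W, x + 2))]
            if s < max(neigh):  # not the local peak
                continue
            dx, dy = offset[y][x]
            w, h = wh[y][x]
            # offset refines the center, stride maps back to input coords
            cx, cy = (x + dx) * stride, (y + dy) * stride
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, s))
    return boxes
```

Because each target contributes one peak, no box-level NMS is needed afterwards, which is the speed advantage discussed below.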
CenterNet has many advantages for aircraft target detection in remote sensing images. First, it does not require manually set thresholds to separate foreground from background, reducing the requirements on positive and negative samples of the dataset. Second, because each target is detected through a single center point, the computation-heavy and time-consuming non-maximum suppression (NMS) is unnecessary, which improves detection speed. It also avoids the difficulty that anchor mechanisms cause for small-target and dense-target detection. However, because of shooting height, angle and similar factors, many objects in a remote sensing picture are small targets occupying only a handful of pixels, which poses a great challenge: applied directly to remote sensing image detection, the CenterNet algorithm struggles to perform well. The CenterNet network structure must therefore be improved so that it better adapts to remote sensing image datasets and achieves higher detection precision.
To make the CenterNet algorithm better suited to aircraft target detection in remote sensing images, and to address the low detection accuracy and high false detection rate for small targets, the invention proposes a multi-scale channel attention detection method based on the CenterNet algorithm. The network structure is shown in fig. 2; compared with the original CenterNet structure it introduces the extension structure in the figure and is improved mainly in the following two aspects:
(1) To improve small-target detection performance, the invention introduces a horizontal connection module to correlate the features of different layers. This fuses deep features with shallow features, effectively combining the strong semantic information of deep features with the strong position and texture information of shallow features, which measurably improves small-target detection.
(2) To improve detection precision, a channel attention module is added inside the horizontal connection module to adaptively recalibrate the feature responses of different channels, improving the network's feature extraction capability.
The center-point network adopts an encoder-decoder structure and learns high-level semantic information through successive convolution operations. However, targets in remote sensing images tend to be small and dense, and small-target features can be washed out by a series of convolutions, causing missed and false detections. To improve the feature representation of small objects, the invention introduces a horizontal connection module that fuses the features of a given feature layer with those of a higher-level feature map. As shown in fig. 2, in the network structure of the CenterNet algorithm the invention merges the Conv1 and Conv7 layers, the Conv2 and Conv6 layers, and the Conv3 and Conv5 layers respectively. Because these feature layers have different spatial sizes, a Feature Fusion module processes them before fusion. For this module the invention designs the two structures shown in fig. 3, named C-CenterNet and T-CenterNet: C-CenterNet applies a standard 1 × 1 convolution so that the feature layers have matching sizes before fusion, while T-CenterNet tests replacing the standard convolution of C-CenterNet with dilated convolution. Since feature values in different layers have different scales, batch normalization and ReLU activation are applied after the convolution.
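A minimal sketch of the fusion idea follows, assuming nearest-neighbour upsampling brings the deep map to the shallow map's spatial size and a bare ReLU stands in for the batch-normalization-plus-ReLU step; the learned 1 × 1 convolution is omitted, and all names are illustrative, not from the patent:

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D feature map (list of rows)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in (0, 1)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))                   # duplicate each row
    return out

def fuse(shallow, deep):
    """Bring the deep (semantic) map to the shallow (positional) map's
    size, add element-wise, then apply a ReLU as a stand-in for the
    BN + ReLU that follows the fusion convolution."""
    up = upsample2x(deep)
    return [[max(0.0, s + d) for s, d in zip(srow, drow)]
            for srow, drow in zip(shallow, up)]
```

The additive combination is what lets strong deep semantics reinforce shallow position and texture cues for small objects.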
An attention mechanism can focus on local information in an image, locating information of interest and suppressing useless information. To make the model attend more to channels carrying valid information, the invention introduces a Squeeze-and-Excitation attention module (SE-Net) into the horizontal fusion module. As shown in fig. 4, the module first applies a Squeeze operation to the H × W feature map with C input channels to obtain a 1 × 1 feature map with C channels, corresponding to the Global Pooling operation in fig. 4. It then applies an Excitation operation to the result, corresponding to the two fully connected layers and the Sigmoid operation in fig. 4, to obtain the weight of each channel. Finally, a Scale operation multiplies the original feature map by the weight of the corresponding channel to obtain a new feature map, completing the strengthening of informative channels and the suppression of uninformative ones.
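The Squeeze, Excitation and Scale steps can be sketched in plain Python. The two weight matrices stand in for the module's learned fully connected layers and are supplied by the caller; everything here is an illustrative toy, not the patent's implementation:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def se_block(channels, w1, w2):
    """channels: list of C 2-D feature maps; w1: C -> C/r matrix,
    w2: C/r -> C matrix (the two 'fully connected' layers)."""
    # Squeeze: global average pool each channel down to one scalar
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in channels]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives per-channel weights
    hidden = [max(0.0, sum(w * s for w, s in zip(wrow, squeezed)))
              for wrow in w1]
    weights = [sigmoid(sum(w * h for w, h in zip(wrow, hidden)))
               for wrow in w2]
    # Scale: reweight every value of each original channel
    return [[[v * wt for v in row] for row in ch]
            for ch, wt in zip(channels, weights)]
```

Channels whose excitation weight approaches 1 are passed through; channels whose weight approaches 0 are suppressed, which is exactly the update/suppression behaviour described above.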
The "Feature Fusion" structure after introducing the channel attention mechanism is shown in fig. 5. The arrow marks in fig. 6 compare results before and after attention is introduced: without the attention mechanism the network produces some false detections, and after the attention mechanism is introduced the false detection rate drops, improving detection accuracy.
The loss function of the algorithm of the invention consists of three parts: the center-point prediction loss L_k, the offset loss L_off, and the width-height loss L_size.
When predicting center points on the heatmap, CenterNet generates many candidate center points, while the center point of each target is unique, so negative samples vastly outnumber positive ones. The authors therefore use pixel-level logistic regression with a modified focal loss to counter this imbalance of positive and negative samples, shown in equation (2):

L_k = −(1/N) Σ_xyc { (1 − Ŷ_xyc)^α · log(Ŷ_xyc),                  if Y_xyc = 1
                     (1 − Y_xyc)^β · (Ŷ_xyc)^α · log(1 − Ŷ_xyc),  otherwise }    (2)

where α and β are focal-loss hyperparameters (set to 2 and 4 respectively in the experiments), N is the number of keypoints in the image and serves mainly to normalize all the positive focal-loss terms, Ŷ_xyc is the predicted value and Y_xyc is the ground-truth label. When Y_xyc = 1, the prediction Ŷ_xyc of an easily distinguished sample is close to 1, so (1 − Ŷ_xyc)^α is close to 0 and the final L_k contribution is small; conversely, for a hard-to-distinguish sample the prediction Ŷ_xyc is close to 0, (1 − Ŷ_xyc)^α is large, and the final L_k is large. When Y_xyc ≠ 1, the prediction Ŷ_xyc should theoretically be 0; if it is large, (Ŷ_xyc)^α increases to impose a penalty, and if it is close to 0, (Ŷ_xyc)^α is small, reducing that term's share of the loss. The factor (1 − Y_xyc)^β, active when Y_xyc ≠ 1, weakens the loss contribution of the negative samples surrounding each center point.
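Equation (2) can be written out directly for one heatmap channel. This is a plain-Python toy sketch; the clamping epsilon and function name are illustrative assumptions:

```python
import math

def center_focal_loss(pred, gt, alpha=2.0, beta=4.0):
    """Pixel-wise modified focal loss of equation (2) on one channel.
    pred, gt: 2-D lists; gt holds the Gaussian-splatted ground truth,
    with gt == 1 exactly at object centres."""
    loss, n = 0.0, 0
    eps = 1e-12  # keep the logs finite
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            p = min(max(p, eps), 1.0 - eps)
            if g == 1.0:  # positive location
                n += 1
                loss += (1 - p) ** alpha * math.log(p)
            else:         # negative, down-weighted near centres by (1-g)^beta
                loss += (1 - g) ** beta * p ** alpha * math.log(1 - p)
    return -loss / max(n, 1)
```

An easy sample (prediction near its label) contributes a small loss; a hard one contributes a large loss, which is the focusing behaviour discussed above.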
Because the feature map produced by the backbone extraction network has one quarter the resolution of the input image, each pixel of the output feature map corresponds to a 4 × 4 region of the original image, which introduces a sizeable quantization error. The authors therefore introduce a center-point offset Ô ∈ R^(W/R × H/R × 2) and train it with an L1 loss, shown in equation (3):

L_off = (1/N) Σ_p | Ô_p̃ − (p/R − p̃) |    (3)

where N is the number of keypoints in the image, p is the center point of the target box, R is the downsampling factor (4 in the invention), p̃ = ⌊p/R⌋, and Ô_p̃ is the predicted offset at p̃.
After CenterNet predicts all centers, it regresses the object size s_k = (x₂ − x₁, y₂ − y₁) for each object k. To reduce the computational burden, a single size prediction Ŝ ∈ R^(W/R × H/R × 2) is shared across target classes, and the width-height loss L_size is trained with an L1 loss, shown in equation (4):

L_size = (1/N) Σ_{k=1..N} | Ŝ_p̃k − s_k |    (4)
The total loss L_det is the weighted sum of the branch losses and satisfies equation (5):

L_det = L_k + λ_off · L_off + λ_size · L_size    (5)

where the weights λ_off, λ_size and λ_m are set to 1, 0.1 and 0.3 respectively.
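Equations (3), (4) and (5) can be sketched together in plain Python. The tuple-based inputs and default weights mirror the values stated above; the helper names are illustrative assumptions:

```python
def l1(a, b):
    return abs(a - b)

def offset_loss(pred_off, centers, R=4):
    """Equation (3): L1 between the predicted offset at each keypoint and
    the fractional part lost by downsampling, p/R - floor(p/R)."""
    total = 0.0
    for (px, py), (ox, oy) in zip(centers, pred_off):
        tx, ty = px / R - px // R, py / R - py // R
        total += l1(ox, tx) + l1(oy, ty)
    return total / len(centers)

def size_loss(pred_wh, gt_wh):
    """Equation (4): L1 between predicted and true (width, height)."""
    total = sum(l1(pw, gw) + l1(ph, gh)
                for (pw, ph), (gw, gh) in zip(pred_wh, gt_wh))
    return total / len(gt_wh)

def total_loss(l_k, l_off, l_size, lam_off=1.0, lam_size=0.1):
    """Equation (5): weighted sum of the three branch losses."""
    return l_k + lam_off * l_off + lam_size * l_size
```

For example, a center at input-image coordinates (18, 9) with R = 4 maps to low-resolution cell (4, 2) with target offset (0.5, 0.25), so predicting exactly that offset drives L_off to zero.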
Results and analysis of the experiments
Data set and experimental environment
To verify the feasibility of the algorithm structure, the experiments select the aircraft-category pictures of the UCAS-AOD remote sensing image data and the RSOD dataset for network training and testing. The UCAS-AOD dataset, produced by the University of Chinese Academy of Sciences, contains 1,000 aircraft remote sensing pictures and 7,482 aircraft samples; its data are concentrated and target orientations are uniformly distributed. The RSOD dataset, produced by Wuhan University, contains 446 aircraft remote sensing images and 4,993 aircraft bounding-box samples; the images vary in brightness and contrast and contain interference such as occlusion, shadow and distortion. During the experiments, pictures are randomly selected from the dataset samples as the training set, keeping a 9:1 ratio between the training and test sets. The experimental environment configuration is shown in table 1 below.
Table 1 experimental environment configuration
Evaluation index
The invention adopts the standard evaluation indexes of current target detection, including Precision, Recall, the number of pictures the model processes per second (FPS) and mean Average Precision (mAP). FPS reflects the model's processing speed, and mAP measures detection precision across multiple recall levels. The related formulas are:

Precision = TP / (TP + FP)    (6)

Recall = TP / (TP + FN)    (7)

AP = ∫₀¹ P(R) dR,  mAP = mean of AP over all classes    (8)
where TP is the number of positive samples correctly classified as positive, FN is the number of positive samples wrongly classified as negative, and FP is the number of negative samples wrongly classified as positive; TP + FP is the total number of samples predicted as positive, and TP + FN is the total number of actual positive samples.
Whether a prediction counts as TP or FP is determined by the Intersection over Union (IoU): when IoU exceeds a set threshold the result is marked TP, otherwise FP. Different confidence thresholds yield different numbers of detection boxes: a high threshold yields fewer boxes, while a low threshold yields more.
IoU is obtained from equation (9):

IoU = area(B_p ∩ B_gt) / area(B_p ∪ B_gt)    (9)

where B_p is the predicted box and B_gt is the ground-truth box.
FPS is the number of pictures the model can detect per second and represents the detection speed; it is computed from equation (10):

FPS = N / T    (10)

where N is the number of test samples and T is the time required to test all of them.
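Equations (6), (7), (9) and (10) can be sketched as small helper functions; the names and the (x1, y1, x2, y2) box convention are illustrative assumptions:

```python
def iou(a, b):
    """Equation (9): boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def precision_recall(tp, fp, fn):
    """Equations (6) and (7) from the raw counts."""
    return tp / (tp + fp), tp / (tp + fn)

def fps(n_images, total_seconds):
    """Equation (10): images processed per second."""
    return n_images / total_seconds
```

For instance, two unit-overlap 2 × 2 boxes give IoU = 1/7, well below a typical 0.5 threshold, so such a detection would be counted as FP.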
Details of training
In the experiments the downsampling rate R is 4; iterative training uses the Adam optimizer, and input images are uniformly scaled to 512 × 512 resolution. The initial learning rate is 1e-3 with batch_size 4; after 50 epochs the learning rate is reduced by a factor of 10 and another 50 epochs are trained with batch_size 4. In addition, to accelerate convergence, the ResNet-50-based backbone is initialized with pre-trained weights from the ImageNet classification task.
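The stated schedule — 1e-3 for the first 50 epochs, then reduced by a factor of 10 for the remaining 50 — can be sketched as a step function; the function name and arguments are illustrative, not from the patent:

```python
def learning_rate(epoch, base_lr=1e-3, drop_epoch=50, drop_factor=10):
    """Step learning-rate schedule: base_lr until drop_epoch,
    then base_lr / drop_factor for the remaining epochs."""
    return base_lr if epoch < drop_epoch else base_lr / drop_factor
```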
Analysis of Experimental results
UCAS-AOD remote sensing image data set experimental result
After 100 training epochs on the UCAS-AOD remote sensing image dataset the loss essentially reaches a steady state. The loss curves of the original algorithm and the algorithm of the invention during training are compared in fig. 7; the loss value is the weighted sum of the center-point prediction loss, the offset loss and the width-height loss. As the figure shows, loss stabilizes both before and after the improvement, but the four improved model designs converge faster than the original, and their final convergence values are better. Furthermore, SC-CenterNet and ST-CenterNet, which add the attention mechanism, improve on C-CenterNet and T-CenterNet in both convergence rate and convergence value. Among all variants, ST-CenterNet performs best, converging fastest and to the lowest value.
Based on equations (6) and (7) above, the PR curves of the model outputs before and after the improvement can be calculated; fig. 8 compares the original algorithm with the best-performing ST-CenterNet, where (a) is the PR curve of the original algorithm and (b) is that of the algorithm of the invention.
To verify the detection performance of the algorithm, different detection networks were also trained and compared on the same experimental platform and training dataset, including single-stage and two-stage target detection algorithms; the test results are shown in table 2 below. The improved ST-CenterNet network structure of the invention raises the mean average precision (mAP) by 6.22% and 7.23% over the currently popular Faster R-CNN and SSD respectively. Compared with Faster R-CNN, both detection precision and detection speed improve substantially. Under the same conditions the detection speed is slightly lower than that of the original network, but real-time detection is still achieved, demonstrating the feasibility of the algorithm.
TABLE 2 UCAS-AOD remote sensing aircraft data set test results
[Table 2 appears as an image in the original publication]
RSOD data set experimental results
To further test the algorithm of the invention, the RSOD remote sensing data set is additionally selected for training and testing, with the training and test sets likewise split randomly at a ratio of 9:1. Compared with the UCAS-AOD data set, its picture samples are more complex and contain interference factors such as occlusion, shadow, and distortion. After 100 epochs, the total loss before and after the improvement reaches a steady state; the total loss curves before and after improvement are shown in FIG. 9. As the figure shows, the convergence speed and final converged value of all four improved model designs are superior to those of the original algorithm. Comparing the loss curves of C-CenterNet against SC-CenterNet and of T-CenterNet against ST-CenterNet shows that once the attention mechanism is introduced into the horizontal feature fusion module, the convergence speed of the model improves greatly and the final converged value is better than without it.
The PR curves of the model detection outputs before and after the improvement are calculated with equations 6 and 7 during testing and compared against ST-CenterNet, the best-performing variant, as shown in FIG. 10, where (a) in FIG. 10 is the PR curve of the original algorithm and (b) is the PR curve of the algorithm of the invention.
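The PR curves in FIGS. 8 and 10 are obtained by sweeping a confidence threshold over the ranked detections. A minimal NumPy sketch of that computation is given below; it uses the standard precision and recall definitions presumably behind equations 6 and 7 (which are not reproduced in this passage), plus the common all-point-interpolated AP as the area under the curve.

```python
import numpy as np

def pr_curve(scores, is_tp, n_gt):
    """Precision/recall pairs swept over the confidence threshold.
    scores: detection confidences; is_tp: 1 if the detection matched a
    ground-truth box (IoU above threshold), else 0; n_gt: number of
    ground-truth objects."""
    order = np.argsort(-np.asarray(scores))          # highest score first
    tp = np.cumsum(np.asarray(is_tp)[order])
    fp = np.cumsum(1 - np.asarray(is_tp)[order])
    precision = tp / np.maximum(tp + fp, 1)          # TP / (TP + FP)
    recall = tp / max(n_gt, 1)                       # TP / (TP + FN)
    return precision, recall

def average_precision(precision, recall):
    """AP as the area under the PR curve (all-point interpolation)."""
    p = np.concatenate(([1.0], precision))
    r = np.concatenate(([0.0], recall))
    # enforce a monotonically decreasing precision envelope
    p = np.maximum.accumulate(p[::-1])[::-1]
    return float(np.sum((r[1:] - r[:-1]) * p[1:]))
```

For example, three detections with scores [0.9, 0.8, 0.7], match flags [1, 1, 0], and two ground-truth objects give precision [1, 1, 2/3] and recall [0.5, 1, 1].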
A comparative experiment was also conducted between the two-stage algorithm Faster R-CNN, the single-stage algorithm SSD, and the improved algorithm of the invention under the same experimental conditions; the comparison results are shown in Table 3.
TABLE 3 RSOD remote sensing airplane data set test results
[Table 3 appears as an image in the original publication]
As Table 3 shows, ST-CenterNet, the variant of the invention with the best detection effect, improves precision by 26.42% and 43.41% over Faster R-CNN and SSD respectively, and improves detection speed by 72 frames/s and 35 frames/s respectively. Compared with the original CenterNet algorithm, detection precision improves by 18.25%, the false detection and missed detection rates of the network are reduced, and the detection rate of small targets improves to a certain extent. In addition, the FPS is 0.06% lower than that of the original algorithm, which is within an acceptable range and still superior to the Faster R-CNN and SSD algorithms, demonstrating that the network structure is robust for aircraft target detection in remote sensing images.
3.2.3 visual comparative analysis
To observe the network detection effect before and after the improvement more intuitively, the ST-CenterNet network structure with the highest post-improvement precision is selected, several groups of representative pictures are chosen from the experimental samples for testing, and the visualizations of the algorithm before and after improvement are compared; the detection results are shown in FIG. 11.
As the comparison in FIG. 11 shows, when the targets are small and dispersed, both versions detect them well, but the original algorithm suffers from false detections, and the target confidences of the improved algorithm are generally higher than before. When the targets are small and dense, the yellow marks show that the original algorithm misses a large number of detections, while the miss rate of the improved algorithm is reduced; isolated false detections remain, as indicated by the blue marks in the figure. When targets lie near buildings with complex backgrounds, the original algorithm produces many false detections, whereas the false detection rate of the improved algorithm drops, leaving only isolated cases for further optimization. The comparison shows that the improved algorithm improves small-target detection to a certain extent and reduces both the false detection and missed detection rates.
The invention also visually compares the improved ST-CenterNet with the classic two-stage detection algorithm Faster R-CNN and the single-stage detection algorithm SSD; the comparison results are shown in FIG. 12.
As FIG. 12 shows, Faster R-CNN yields higher target confidences than the algorithm of the invention but still misses some small targets; measured experimentally on single pictures, the detection speed of the algorithm of the invention is more than 8 times that of Faster R-CNN. Compared with the SSD algorithm, SSD detects well when targets are clearly distinguished from the background, but on small targets it detects only two valid targets and has a high miss rate, whereas the algorithm of the invention still detects small targets effectively. The comparison demonstrates the good robustness and detection effect of the algorithm.
Ablation experiment
To further verify the effectiveness of each improved module in the algorithm, the RSOD remote sensing aircraft data set, whose samples are more complex, and the ST-CenterNet model, which achieved the highest precision in the experiments, are selected for an ablation experiment. The results are shown in Table 4 below.
TABLE 4 ablation experiments on RSOD data sets
[Table 4 appears as an image in the original publication]
As Table 4 shows, adding the horizontal fusion module alone improves both accuracy and recall over the original algorithm, raising precision by 16.63%. After the attention mechanism is introduced into the horizontal fusion module to optimize the per-channel weights, accuracy improves further: the average precision rises by 1.62% over the attention-free variant and by 18.25% over the original algorithm, demonstrating the effectiveness of each module of the algorithm.
The invention applies the anchor-free CenterNet algorithm, which balances precision and speed, to aircraft target detection in remote sensing images and proposes a multi-scale channel attention detection method to address the high false detection rate, the difficulty of small-target detection, and related problems of the original CenterNet on remote sensing images. First, feature extraction is performed using an encoder-decoder structure. Second, to address the low detection precision for small targets, a horizontal connection module is introduced to fuse deep and shallow features and thereby improve small-target detection performance. Third, to improve detection accuracy and reduce the false detection rate, a channel attention module is introduced into the horizontal connection module to optimize the response between channels, focus on information of interest, and suppress unneeded information. Finally, comparison experiments against other mainstream algorithms on the public UCAS-AOD and RSOD remote sensing data sets show that the AP on the UCAS-AOD data set reaches 96.78%, improvements of 16%, 6.22%, and 7.23% over the original CenterNet, Faster R-CNN, and SSD300 respectively, while a competitive detection speed is maintained. The experimental results show that the method achieves high detection precision for aircraft targets in remote sensing images, retains the speed advantage of a single-stage detection model, and has practical value.
Furthermore, it should be understood that although the present description refers to embodiments, not every embodiment contains only a single independent technical solution; this manner of description is merely for clarity. Those skilled in the art should treat the description as a whole, and the technical solutions in the embodiments may be combined as appropriate to form other embodiments understandable to those skilled in the art.

Claims (5)

1. A deep learning target detection method based on center point regression, characterized by comprising the following specific steps:
introducing a horizontal connection module into the original CenterNet network structure to correlate the features among different layers, and fusing deep features with shallow features to improve small target detection performance;
introducing a channel attention module into the horizontal connection module to adaptively calibrate the feature responses among different channels, thereby improving the feature extraction capability of the network;
and finally, performing comparison experiments on the public UCAS-AOD and RSOD remote sensing data sets.
2. The method of claim 1, characterized in that the horizontal connection module is a feature fusion module comprising C-CenterNet and T-CenterNet, wherein C-CenterNet aligns the feature layers before fusion through a standard 1×1 convolution, and T-CenterNet replaces the standard convolution in C-CenterNet with a dilated convolution for testing; since the feature values in different layers have different scales, batch normalization and ReLU activation are applied after the convolution.
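The lateral branch described in this claim, a 1×1 convolution (a per-pixel channel mix) followed by batch normalization and ReLU, can be sketched in NumPy as follows. This is an illustrative sketch under assumptions, not the patent's implementation: the single-map (non-batched) layout and the kernel shape are chosen for brevity.

```python
import numpy as np

def conv1x1_bn_relu(x, w, gamma=1.0, beta=0.0, eps=1e-5):
    """1x1 convolution, then batch normalization, then ReLU.
    x: (C_in, H, W) feature map; w: (C_out, C_in) kernel.
    A 1x1 convolution mixes channels at each spatial location
    without changing the spatial size."""
    c_out = w.shape[0]
    y = np.tensordot(w, x, axes=([1], [0]))            # (C_out, H, W)
    # per-channel normalization over the spatial positions
    mean = y.reshape(c_out, -1).mean(axis=1)[:, None, None]
    var = y.reshape(c_out, -1).var(axis=1)[:, None, None]
    y = gamma * (y - mean) / np.sqrt(var + eps) + beta
    return np.maximum(y, 0.0)                          # ReLU
```

The normalization step is what lets features from layers with very different value scales be added without one branch dominating the fusion.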
3. The method according to claim 2, characterized in that the channel attention module is the squeeze-and-excitation attention module SE-Net; the module first performs a squeeze operation on an H×W feature map with C input channels to obtain a 1×1 feature map with C channels, then performs an excitation operation on the resulting feature map to obtain the inter-channel weight values, and finally multiplies the original feature map by the weight of the corresponding channel through a scale operation to obtain a new feature map, thereby strengthening channels containing valid information and suppressing channels containing useless information.
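The squeeze, excitation, and scale operations of this claim can be sketched in NumPy as follows. The two fully connected layers `w1`/`w2` and the reduction ratio of 4 are illustrative assumptions (SE-Net typically uses a bottleneck of two FC layers; the patent does not specify the ratio here).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(x, w1, w2):
    """Squeeze-and-Excitation over a (C, H, W) feature map.
    Squeeze: global average pool -> (C,) descriptor.
    Excitation: FC reduction + ReLU, FC expansion + sigmoid -> per-channel
    weights in (0, 1).
    Scale: multiply each channel of x by its learned weight."""
    c = x.shape[0]
    squeeze = x.reshape(c, -1).mean(axis=1)      # (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)       # (C/r,) after reduction
    weights = sigmoid(w2 @ hidden)               # (C,), each in (0, 1)
    return x * weights[:, None, None]
```

Because each weight lies in (0, 1), the block can only attenuate channels relative to their input, which is how low-information channels are suppressed.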
4. The deep learning target detection method based on center point regression, characterized in that comparison experiments are performed on the public UCAS-AOD and RSOD remote sensing data sets; pictures are randomly selected from the data set samples as the training set, keeping the ratio of the training set to the test set at 9:1; the down-sampling rate R in the experiments is 4; an Adam optimizer is used for iterative training; input images are uniformly scaled to a resolution of 512×512; the initial learning rate during training is 1e-3 with batch_size 4; after 50 epochs of training, the learning rate is reduced by a factor of 10 and, with batch_size kept at 4, another 50 epochs are trained; in addition, to speed up convergence, the ResNet-50-based backbone is initialized during training with pre-trained weights obtained from an image classification task.
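The training hyperparameters stated in this claim can be collected into a small configuration sketch; the dictionary layout and function name below are illustrative, not part of the patent, but the values are the ones the claim specifies.

```python
# Hyperparameters as stated in claim 4 (layout is illustrative).
TRAIN_CONFIG = {
    "input_size": (512, 512),   # images uniformly scaled to 512x512
    "downsample_R": 4,          # output stride of the detection heads
    "batch_size": 4,
    "optimizer": "Adam",
    "total_epochs": 100,        # 50 epochs + 50 epochs after LR drop
}

def learning_rate(epoch):
    """Two-stage schedule from claim 4: 1e-3 for the first 50 epochs,
    then reduced by a factor of 10 for the remaining 50."""
    return 1e-3 if epoch < 50 else 1e-4
```

With a 512×512 input and R = 4, the predicted heatmaps have spatial size 128×128, which is the resolution at which center points are regressed.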
5. The method for detecting the deep learning target based on center point regression according to claim 4, characterized in that, to verify the detection performance of the method, different detection networks are trained and compared on the same experimental platform and training data set, the comparison covering single-stage and two-stage target detection algorithms, to obtain the test results.
CN202110930245.4A 2021-08-13 2021-08-13 Deep learning target detection method based on center point regression Active CN113688830B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110930245.4A CN113688830B (en) 2021-08-13 2021-08-13 Deep learning target detection method based on center point regression

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110930245.4A CN113688830B (en) 2021-08-13 2021-08-13 Deep learning target detection method based on center point regression

Publications (2)

Publication Number Publication Date
CN113688830A true CN113688830A (en) 2021-11-23
CN113688830B CN113688830B (en) 2024-04-26

Family

ID=78579852

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110930245.4A Active CN113688830B (en) 2021-08-13 2021-08-13 Deep learning target detection method based on center point regression

Country Status (1)

Country Link
CN (1) CN113688830B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113989498A (en) * 2021-12-27 2022-01-28 北京文安智能技术股份有限公司 Training method of target detection model for multi-class garbage scene recognition
CN113989825A (en) * 2021-11-25 2022-01-28 航天信息股份有限公司 Bill image detection method and device and storage medium
CN114638878A (en) * 2022-03-18 2022-06-17 北京安德医智科技有限公司 Two-dimensional echocardiogram pipe diameter detection method and device based on deep learning

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020181685A1 (en) * 2019-03-12 2020-09-17 南京邮电大学 Vehicle-mounted video target detection method based on deep learning
AU2020103905A4 (en) * 2020-12-04 2021-02-11 Chongqing Normal University Unsupervised cross-domain self-adaptive medical image segmentation method based on deep adversarial learning
CN112395958A (en) * 2020-10-29 2021-02-23 中国地质大学(武汉) Remote sensing image small target detection method based on four-scale depth and shallow layer feature fusion
CN112580664A (en) * 2020-12-15 2021-03-30 哈尔滨理工大学 Small target detection method based on SSD (solid State disk) network
CN112686304A (en) * 2020-12-29 2021-04-20 山东大学 Target detection method and device based on attention mechanism and multi-scale feature fusion and storage medium
CN112966747A (en) * 2021-03-04 2021-06-15 北京联合大学 Improved vehicle detection method based on anchor-frame-free detection network
WO2021115159A1 (en) * 2019-12-09 2021-06-17 中兴通讯股份有限公司 Character recognition network model training method, character recognition method, apparatuses, terminal, and computer storage medium therefor
CN113191334A (en) * 2021-05-31 2021-07-30 广西师范大学 Plant canopy dense leaf counting method based on improved CenterNet

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020181685A1 (en) * 2019-03-12 2020-09-17 南京邮电大学 Vehicle-mounted video target detection method based on deep learning
WO2021115159A1 (en) * 2019-12-09 2021-06-17 中兴通讯股份有限公司 Character recognition network model training method, character recognition method, apparatuses, terminal, and computer storage medium therefor
CN112395958A (en) * 2020-10-29 2021-02-23 中国地质大学(武汉) Remote sensing image small target detection method based on four-scale depth and shallow layer feature fusion
AU2020103905A4 (en) * 2020-12-04 2021-02-11 Chongqing Normal University Unsupervised cross-domain self-adaptive medical image segmentation method based on deep adversarial learning
CN112580664A (en) * 2020-12-15 2021-03-30 哈尔滨理工大学 Small target detection method based on SSD (solid State disk) network
CN112686304A (en) * 2020-12-29 2021-04-20 山东大学 Target detection method and device based on attention mechanism and multi-scale feature fusion and storage medium
CN112966747A (en) * 2021-03-04 2021-06-15 北京联合大学 Improved vehicle detection method based on anchor-frame-free detection network
CN113191334A (en) * 2021-05-31 2021-07-30 广西师范大学 Plant canopy dense leaf counting method based on improved CenterNet

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张筱晗; 姚力波; 吕亚飞; 韩鹏; 李健伟: "Multi-directional ship target detection in remote sensing images based on center points", Acta Photonica Sinica, no. 04 *
邱博; 刘翔; 石蕴玉; 尚岩峰: "A lightweight multi-target real-time detection model", Journal of Beijing University of Aeronautics and Astronautics, no. 09 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113989825A (en) * 2021-11-25 2022-01-28 航天信息股份有限公司 Bill image detection method and device and storage medium
CN113989498A (en) * 2021-12-27 2022-01-28 北京文安智能技术股份有限公司 Training method of target detection model for multi-class garbage scene recognition
CN114638878A (en) * 2022-03-18 2022-06-17 北京安德医智科技有限公司 Two-dimensional echocardiogram pipe diameter detection method and device based on deep learning
CN114638878B (en) * 2022-03-18 2022-11-11 北京安德医智科技有限公司 Two-dimensional echocardiogram pipe diameter detection method and device based on deep learning

Also Published As

Publication number Publication date
CN113688830B (en) 2024-04-26

Similar Documents

Publication Publication Date Title
CN112819804B (en) Insulator defect detection method based on improved YOLOv convolutional neural network
CN110991311B (en) Target detection method based on dense connection deep network
CN108921051B (en) Pedestrian attribute identification network and technology based on cyclic neural network attention model
CN113688830A (en) Deep learning target detection method based on central point regression
CN107529650B (en) Closed loop detection method and device and computer equipment
CN107633226B (en) Human body motion tracking feature processing method
CN111079739B (en) Multi-scale attention feature detection method
CN111914924B (en) Rapid ship target detection method, storage medium and computing equipment
Li et al. Coda: Counting objects via scale-aware adversarial density adaption
CN113420819B (en) Lightweight underwater target detection method based on CenterNet
CN105139429B (en) A kind of fire detection method based on flame notable figure and spatial pyramid histogram
CN113326735B (en) YOLOv 5-based multi-mode small target detection method
CN115187786A (en) Rotation-based CenterNet2 target detection method
CN111582091A (en) Pedestrian identification method based on multi-branch convolutional neural network
CN114821356B (en) Optical remote sensing target detection method for accurate positioning
CN111192240B (en) Remote sensing image target detection method based on random access memory
CN113920159A (en) Infrared aerial small target tracking method based on full convolution twin network
CN113609904B (en) Single-target tracking algorithm based on dynamic global information modeling and twin network
CN115761888A (en) Tower crane operator abnormal behavior detection method based on NL-C3D model
CN116310386A (en) Shallow adaptive enhanced context-based method for detecting small central Net target
CN115223056A (en) Multi-scale feature enhancement-based optical remote sensing image ship target detection method
CN115331162A (en) Cross-scale infrared pedestrian detection method, system, medium, equipment and terminal
CN112862766B (en) Insulator detection method and system based on image data expansion technology
Wang et al. Research on vehicle detection based on faster R-CNN for UAV images
CN112989952A (en) Crowd density estimation method and device based on mask guidance

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant