CN113688830A - Deep learning target detection method based on central point regression - Google Patents


Info

Publication number
CN113688830A
Authority
CN
China
Prior art keywords
detection
feature
training
target detection
remote sensing
Prior art date
Legal status
Granted
Application number
CN202110930245.4A
Other languages
Chinese (zh)
Other versions
CN113688830B (en)
Inventor
李婕
周顺
王恩果
李毅
巩朋成
张正文
朱鑫潮
Current Assignee
Hubei University of Technology
Original Assignee
Hubei University of Technology
Priority date
Filing date
Publication date
Application filed by Hubei University of Technology filed Critical Hubei University of Technology
Priority to CN202110930245.4A priority Critical patent/CN113688830B/en
Publication of CN113688830A publication Critical patent/CN113688830A/en
Application granted granted Critical
Publication of CN113688830B publication Critical patent/CN113688830B/en
Current legal status: Active

Classifications

    • G — Physics
    • G06 — Computing; calculating or counting
    • G06F — Electric digital data processing
    • G06F 18/00 — Pattern recognition
    • G06F 18/20 — Analysing
    • G06F 18/21 — Design or setup of recognition systems or techniques; extraction of features in feature space; blind source separation
    • G06F 18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/25 — Fusion techniques
    • G06F 18/253 — Fusion techniques of extracted features
    • G06N — Computing arrangements based on specific computational models
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/04 — Architecture, e.g. interconnection topology
    • G06N 3/045 — Combinations of networks


Abstract

The invention provides a deep learning target detection method based on central point regression, comprising the following specific steps: a horizontal connection module is introduced into the original CenterNet network structure to correlate the features of different layers, fusing deep features with shallow features to improve small-target detection performance; a channel attention module is introduced into the horizontal connection module to adaptively recalibrate the feature responses of different channels, improving the network's feature extraction capability; finally, comparative experiments are performed on the public UCAS-AOD and RSOD remote sensing datasets. The method achieves higher detection accuracy for aircraft targets in remote sensing images while retaining the speed advantage of a single-stage detection model, and therefore has practical value.

Description

Deep learning target detection method based on central point regression
Technical Field
The invention relates to the technical field of target detection, in particular to a deep learning target detection method based on central point regression.
Background
Remote sensing images are captured by satellites and are characterized by spatial, temporal and spectral resolution, among other properties. Target detection in remote sensing imagery has important significance and application value in both civil and military domains; in particular, aircraft detection in remote sensing images can provide valuable information for more efficient civil aviation management and military operations. Unlike conventional aircraft images, aircraft detection in remote sensing imagery faces difficulties such as multiple scales, complex backgrounds and large image memory footprints.
With the rapid development of deep learning, target detection methods based on Convolutional Neural Networks (CNNs) have become the dominant approach to processing and recognizing remote sensing images. Mainstream deep learning target detection algorithms currently fall into two categories: anchor-based and anchor-free methods. Most anchor-based methods are in turn either single-stage or two-stage. Single-stage methods such as SSD and YOLO complete object classification and regression in one stage and have the advantage of very high detection speed. Two-stage methods such as R-CNN, Fast R-CNN and Faster R-CNN often achieve more accurate detection results by introducing a region proposal network. In recent years much work has improved the accuracy and efficiency of anchor-based methods, and they have gradually matured. However, the detection performance of anchor-based methods depends on the balance of positive and negative samples and on the anchor hyperparameters, such as anchor size, aspect ratio and number; there is currently no effective way to tune these hyperparameters automatically, so they must be calibrated manually one by one, which limits the adoption of anchor-based methods. In addition, anchor-based detection introduces the Non-Maximum Suppression (NMS) algorithm to eliminate duplicate target boxes, which increases algorithmic complexity and computation and makes the final detection relatively slow and less suitable for real-time use.
To increase detector flexibility, anchor-free methods have been developed and have attracted much attention. An anchor-free method does not rely on preset anchors; it performs feature extraction and detection directly on keypoints or dense regions of the input image and can adapt to a variety of objects through regression. CornerNet, for example, uses a single convolutional neural network to predict top-left and bottom-right corner heatmaps for all instances of each object class, together with an embedding vector for each detected corner, and finally matches pairs of corners belonging to the same object with the associative embedding method proposed by Newell et al. ExtremeNet detects four extreme points and a center point per object by predicting four extreme-point heatmaps and a center heatmap for each object class, and groups the extreme points into objects geometrically to obtain the final result. CenterNet improves on ideas borrowed from CornerNet: it detects objects by directly predicting the target center point and obtains other object attributes, such as size, 3D position, orientation and even pose, by regression; moreover, neither training nor testing requires NMS or an RPN, making CenterNet simpler, faster and more accurate than bounding-box-based detectors and truly end-to-end. Liu et al. first attempted to solve remote sensing target detection with the anchor-free CenterNet approach and evaluated the performance of each CenterNet backbone on the NWPU VHR-10 remote sensing dataset. Zhang et al. proposed a feature-enhanced center-point network that introduces horizontal connections between different layers and improves the detection accuracy of small targets in remote sensing images by combining deep and shallow features, although its detection speed still needs improvement.
The literature on single-branch anchor-free target detection proposes a single-branch anchor-free detection model that integrates multiple prediction branches into one branch using an hourglass backbone extraction network; it achieves semantic feature characterization capability close to that of models combining ResNet and FPN structures, saves memory, and maintains small-target detection accuracy, but suffers from low feature extraction efficiency. These methods offer useful ideas for applying classic anchor-free detectors directly to remote sensing target detection, but the trade-off between detection speed and accuracy caused by the complex backgrounds, small targets and non-uniform shapes of remote sensing images still needs improvement.
Disclosure of Invention
The invention aims to provide a deep learning target detection method based on central point regression that solves the problems of high false detection rates and difficult small-target detection that arise when the original CenterNet algorithm is applied to remote sensing images.
The technical scheme of the invention is as follows:
a deep learning target detection method based on central point regression comprises the following specific steps:
a horizontal connection module is introduced into the original CenterNet network structure to correlate the features of different layers, fusing deep features with shallow features to improve small-target detection performance;
a channel attention module is introduced into the horizontal connection module to adaptively recalibrate the feature responses of different channels, improving the network's feature extraction capability;
finally, comparative experiments are performed on the public UCAS-AOD and RSOD remote sensing datasets.
The horizontal connection module is a Feature Fusion module with two variants, C-CenterNet and T-CenterNet: C-CenterNet applies a standard 1 × 1 convolution so that the feature layers have matching sizes before fusion, while T-CenterNet replaces the standard convolution of C-CenterNet with dilated (atrous) convolution for testing.
The channel attention module is the Squeeze-and-Excitation attention module SE-Net. It first applies a Squeeze operation to the H × W feature maps with C input channels to obtain a 1 × 1 feature map with C channels, then applies an Excitation operation to the result to obtain per-channel weight values, and finally multiplies the original feature maps by the weights of the corresponding channels in a Scale operation to obtain new feature maps, thereby strengthening channels carrying useful information and suppressing channels carrying useless information.
Comparative experiments are carried out on the public UCAS-AOD and RSOD remote sensing datasets. Pictures are randomly selected from the dataset samples as the training set, keeping a 9:1 ratio between the training and test sets. Iterative training uses the Adam optimizer with input images uniformly scaled to 512 × 512 resolution. The initial learning rate is 1e-3 with batch_size 4; after 50 epochs the learning rate is reduced by a factor of 10 and another 50 epochs are trained with batch_size 4. In addition, to accelerate convergence, the ResNet-50-based backbone is initialized with pre-trained weights from the ImageNet classification task.
To verify the detection performance of the deep learning target detection method based on central point regression, different detection networks are trained and compared on the same experimental platform and training dataset; the comparison includes single-stage and two-stage target detection algorithms.
Compared with the prior art, the beneficial effects of the invention are: the method solves the problems of high false detection rates and difficult small-target detection that arise when the original CenterNet algorithm is applied to remote sensing images, achieves higher detection accuracy for aircraft targets in remote sensing images while retaining the speed advantage of a single-stage detection model, and therefore has practical value.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention.
Figure 2 is a schematic diagram of the improved CenterNet network architecture of the present invention.
Fig. 3 is a schematic diagram of a horizontal connection module network structure of the present invention.
FIG. 4 is a schematic view of the Squeeze-and-Excitation attention module of the present invention.
Fig. 5 is a schematic diagram of a network of horizontally connected modules after the introduction of a channel attention module according to the present invention.
FIG. 6 is a comparison of detection results before and after the attention mechanism of the present invention is introduced.
FIG. 7 is a graph of total Loss variation for training on the UCAS-AOD dataset.
Fig. 8 is a PR-curve comparison before and after improvement on the UCAS-AOD dataset.
Fig. 9 is a graph of total Loss variation trained on RSOD data sets.
Fig. 10 is a comparison of PR curves before and after modification on the RSOD dataset.
FIG. 11 is a visual comparison before and after improvement.
Fig. 12 is a comparison with the other algorithms.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
CenterNet is a new anchor-free, end-to-end target detection algorithm. It represents the center of an object's bounding box with a single point and then regresses directly on the image features around that center position to obtain the other attributes, such as the object's size, orientation and pose, turning the target detection problem into a standard keypoint estimation problem. The specific procedure is as follows: an image I ∈ R^(W×H×3) of width W and height H is input, and a convolutional neural network generates a heatmap Ŷ ∈ [0, 1]^(W/R × H/R × C), where R is the output stride (R = 4 by default) and C is the number of keypoint classes. Ŷ_xyc = 1 represents a detected keypoint and Ŷ_xyc = 0 represents background. In the training stage, the ground-truth keypoints are splatted onto the heatmap Y through the Gaussian kernel of equation (1):

Y_xyc = exp( −((x − p̃_x)² + (y − p̃_y)²) / (2σ_p²) )    (1)

where p̃ = ⌊p/R⌋ is the ground-truth center point in low-resolution coordinates, (x, y) indexes the predicted heatmap, and σ_p is an object-size-adaptive standard deviation.
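As a concrete illustration of equation (1), the ground-truth splatting can be sketched in plain Python. This is a toy single-channel version; the list-based heatmap and the function name are illustrative conveniences, not part of the patent:

```python
import math

def splat_gaussian(heatmap, center, sigma):
    """Render one ground-truth keypoint onto a single-class heatmap
    (a 2-D list indexed [y][x]) with the Gaussian of equation (1).
    Overlapping Gaussians of the same class keep the element-wise max."""
    px, py = center  # low-resolution keypoint p~ = floor(p / R)
    for y in range(len(heatmap)):
        for x in range(len(heatmap[0])):
            g = math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * sigma ** 2))
            heatmap[y][x] = max(heatmap[y][x], g)  # overlap rule: take max
    return heatmap
```

The value at the keypoint itself is exactly 1, decaying toward 0 with distance, which matches the role of Y_xyc as a soft positive label.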
When two Gaussians of the same class overlap, the element-wise maximum is taken. Zhou et al. predict the heatmap Ŷ using Hourglass, ResNet, DLA, etc. as the backbone network. The feature map produced by the backbone is then fed into a detection module consisting of three branches: the Heatmap, Width-Height and Offset prediction branches. Each branch consists of a 3 × 3 and a 1 × 1 convolutional layer. The number of final Heatmap output channels equals the number of classes in the dataset (for example, the VOC dataset contains 20 classes, so the Heatmap branch outputs 20 channels), while the Width-Height and Offset branches each output 2 channels, representing the width and height of the target at the predicted center point and the offsets of the center point's horizontal and vertical coordinates. Finally, the relevant information is extracted from the output heatmap to obtain the detection result for the input image.
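The decoding step described above — reading peaks from the Heatmap branch and box geometry from the Width-Height and Offset branches — can be sketched as follows. This is a hedged single-class toy: the 3 × 3 local-maximum test stands in for the max-pooling peak extraction CenterNet uses instead of box NMS, and the names and score threshold are illustrative assumptions:

```python
def decode_centers(heatmap, wh, offset, score_thresh=0.3, stride=4):
    """Turn the three branch outputs into boxes (single class for brevity).
    heatmap[y][x]: peak score; wh[y][x] = (w, h); offset[y][x] = (dx, dy).
    A cell is kept if it passes the score threshold and is a maximum over
    its 3x3 neighbourhood."""
    H, W = len(heatmap), len(heatmap[0])
    boxes = []
    for y in range(H):
        for x in range(W):
            s = heatmap[y][x]
            if s < score_thresh:
                continue
            neigh = [heatmap[j][i]
                     for j in range(max(0, y - 1), min(H, y + 2))
                     for i in range(max(0, x - 1), min(W, x + 2))]
            if s < max(neigh):  # not the local peak
                continue
            dx, dy = offset[y][x]
            w, h = wh[y][x]
            # offset refines the center, stride maps back to input coords
            cx, cy = (x + dx) * stride, (y + dy) * stride
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, s))
    return boxes
```

Because each target contributes one peak, no box-level NMS is needed afterwards, which is the speed advantage discussed below.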
CenterNet has many advantages for aircraft target detection in remote sensing images. First, it does not require manually set thresholds to separate foreground from background, reducing the requirements on positive and negative samples of the dataset. Second, because each target is detected through a single center point, the computation-heavy and time-consuming non-maximum suppression (NMS) is unnecessary, which improves detection speed. It also avoids the difficulty that anchor mechanisms cause for small-target and dense-target detection. However, because of shooting height, angle and similar factors, many objects in a remote sensing picture are small targets occupying only a handful of pixels, which poses a great challenge: applied directly to remote sensing image detection, the CenterNet algorithm struggles to perform well. The CenterNet network structure must therefore be improved so that it better adapts to remote sensing image datasets and achieves higher detection precision.
To make the CenterNet algorithm better suited to aircraft target detection in remote sensing images, and to address the low detection accuracy and high false detection rate for small targets, the invention proposes a multi-scale channel attention detection method based on the CenterNet algorithm. The network structure is shown in fig. 2; compared with the original CenterNet structure it introduces the extension structure in the figure and is improved mainly in the following two aspects:
(1) To improve small-target detection performance, the invention introduces a horizontal connection module to correlate the features of different layers. This fuses deep features with shallow features, effectively combining the strong semantic information of deep features with the strong position and texture information of shallow features, which measurably improves small-target detection.
(2) To improve detection precision, a channel attention module is added inside the horizontal connection module to adaptively recalibrate the feature responses of different channels, improving the network's feature extraction capability.
The center-point network adopts an encoder-decoder structure and learns high-level semantic information through successive convolution operations. However, targets in remote sensing images tend to be small and dense, and small-target features can be washed out by a series of convolutions, causing missed and false detections. To improve the feature representation of small objects, the invention introduces a horizontal connection module that fuses the features of a given feature layer with those of a higher-level feature map. As shown in fig. 2, in the network structure of the CenterNet algorithm the invention merges the Conv1 and Conv7 layers, the Conv2 and Conv6 layers, and the Conv3 and Conv5 layers respectively. Because these feature layers have different spatial sizes, a Feature Fusion module processes them before fusion. For this module the invention designs the two structures shown in fig. 3, named C-CenterNet and T-CenterNet: C-CenterNet applies a standard 1 × 1 convolution so that the feature layers have matching sizes before fusion, while T-CenterNet tests replacing the standard convolution of C-CenterNet with dilated convolution. Since feature values in different layers have different scales, batch normalization and ReLU activation are applied after the convolution.
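A minimal sketch of the fusion idea follows, assuming nearest-neighbour upsampling brings the deep map to the shallow map's spatial size and a bare ReLU stands in for the batch-normalization-plus-ReLU step; the learned 1 × 1 convolution is omitted, and all names are illustrative, not from the patent:

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D feature map (list of rows)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in (0, 1)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))                   # duplicate each row
    return out

def fuse(shallow, deep):
    """Bring the deep (semantic) map to the shallow (positional) map's
    size, add element-wise, then apply a ReLU as a stand-in for the
    BN + ReLU that follows the fusion convolution."""
    up = upsample2x(deep)
    return [[max(0.0, s + d) for s, d in zip(srow, drow)]
            for srow, drow in zip(shallow, up)]
```

The additive combination is what lets strong deep semantics reinforce shallow position and texture cues for small objects.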
An attention mechanism can focus on local information in an image, locating information of interest and suppressing useless information. To make the model attend more to channels carrying valid information, the invention introduces a Squeeze-and-Excitation attention module (SE-Net) into the horizontal fusion module. As shown in fig. 4, the module first applies a Squeeze operation to the H × W feature map with C input channels to obtain a 1 × 1 feature map with C channels, corresponding to the Global Pooling operation in fig. 4. It then applies an Excitation operation to the result, corresponding to the two fully connected layers and the Sigmoid operation in fig. 4, to obtain the weight of each channel. Finally, a Scale operation multiplies the original feature map by the weight of the corresponding channel to obtain a new feature map, completing the strengthening of informative channels and the suppression of uninformative ones.
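The Squeeze, Excitation and Scale steps can be sketched in plain Python. The two weight matrices stand in for the module's learned fully connected layers and are supplied by the caller; everything here is an illustrative toy, not the patent's implementation:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def se_block(channels, w1, w2):
    """channels: list of C 2-D feature maps; w1: C -> C/r matrix,
    w2: C/r -> C matrix (the two 'fully connected' layers)."""
    # Squeeze: global average pool each channel down to one scalar
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in channels]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives per-channel weights
    hidden = [max(0.0, sum(w * s for w, s in zip(wrow, squeezed)))
              for wrow in w1]
    weights = [sigmoid(sum(w * h for w, h in zip(wrow, hidden)))
               for wrow in w2]
    # Scale: reweight every value of each original channel
    return [[[v * wt for v in row] for row in ch]
            for ch, wt in zip(channels, weights)]
```

Channels whose excitation weight approaches 1 are passed through; channels whose weight approaches 0 are suppressed, which is exactly the update/suppression behaviour described above.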
The "Feature Fusion" structure after introducing the channel attention mechanism is shown in fig. 5. The arrow marks in fig. 6 compare results before and after attention is introduced: without the attention mechanism the network produces some false detections, and after the attention mechanism is introduced the false detection rate drops, improving detection accuracy.
The loss function of the algorithm of the invention consists of three parts: the center-point prediction loss L_k, the offset loss L_off, and the width-height loss L_size.
When predicting center points on the heatmap, CenterNet generates many candidate center points, while the center point of each target is unique, so negative samples vastly outnumber positive ones. The authors therefore use pixel-level logistic regression with a modified focal loss to counter this imbalance of positive and negative samples, shown in equation (2):

L_k = −(1/N) Σ_xyc { (1 − Ŷ_xyc)^α · log(Ŷ_xyc),                  if Y_xyc = 1
                     (1 − Y_xyc)^β · (Ŷ_xyc)^α · log(1 − Ŷ_xyc),  otherwise }    (2)

where α and β are focal-loss hyperparameters (set to 2 and 4 respectively in the experiments), N is the number of keypoints in the image and serves mainly to normalize all the positive focal-loss terms, Ŷ_xyc is the predicted value and Y_xyc is the ground-truth label. When Y_xyc = 1, the prediction Ŷ_xyc of an easily distinguished sample is close to 1, so (1 − Ŷ_xyc)^α is close to 0 and the final L_k contribution is small; conversely, for a hard-to-distinguish sample the prediction Ŷ_xyc is close to 0, (1 − Ŷ_xyc)^α is large, and the final L_k is large. When Y_xyc ≠ 1, the prediction Ŷ_xyc should theoretically be 0; if it is large, (Ŷ_xyc)^α increases to impose a penalty, and if it is close to 0, (Ŷ_xyc)^α is small, reducing that term's share of the loss. The factor (1 − Y_xyc)^β, active when Y_xyc ≠ 1, weakens the loss contribution of the negative samples surrounding each center point.
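Equation (2) can be written out directly for one heatmap channel. This is a plain-Python toy sketch; the clamping epsilon and function name are illustrative assumptions:

```python
import math

def center_focal_loss(pred, gt, alpha=2.0, beta=4.0):
    """Pixel-wise modified focal loss of equation (2) on one channel.
    pred, gt: 2-D lists; gt holds the Gaussian-splatted ground truth,
    with gt == 1 exactly at object centres."""
    loss, n = 0.0, 0
    eps = 1e-12  # keep the logs finite
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            p = min(max(p, eps), 1.0 - eps)
            if g == 1.0:  # positive location
                n += 1
                loss += (1 - p) ** alpha * math.log(p)
            else:         # negative, down-weighted near centres by (1-g)^beta
                loss += (1 - g) ** beta * p ** alpha * math.log(1 - p)
    return -loss / max(n, 1)
```

An easy sample (prediction near its label) contributes a small loss; a hard one contributes a large loss, which is the focusing behaviour discussed above.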
Because the feature map produced by the backbone extraction network has one quarter the resolution of the input image, each pixel of the output feature map corresponds to a 4 × 4 region of the original image, which introduces a sizeable quantization error. The authors therefore introduce a center-point offset Ô ∈ R^(W/R × H/R × 2) and train it with an L1 loss, shown in equation (3):

L_off = (1/N) Σ_p | Ô_p̃ − (p/R − p̃) |    (3)

where N is the number of keypoints in the image, p is the center point of the target box, R is the downsampling factor (4 in the invention), p̃ = ⌊p/R⌋, and Ô_p̃ is the predicted offset at p̃.
After CenterNet predicts all centers, it regresses the object size s_k = (x₂ − x₁, y₂ − y₁) for each object k. To reduce the computational burden, a single size prediction Ŝ ∈ R^(W/R × H/R × 2) is shared across target classes, and the width-height loss L_size is trained with an L1 loss, shown in equation (4):

L_size = (1/N) Σ_{k=1..N} | Ŝ_p̃k − s_k |    (4)
The total loss L_det is the weighted sum of the branch losses and satisfies equation (5):

L_det = L_k + λ_off · L_off + λ_size · L_size    (5)

where the weights λ_off, λ_size and λ_m are set to 1, 0.1 and 0.3 respectively.
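Equations (3), (4) and (5) can be sketched together in plain Python. The tuple-based inputs and default weights mirror the values stated above; the helper names are illustrative assumptions:

```python
def l1(a, b):
    return abs(a - b)

def offset_loss(pred_off, centers, R=4):
    """Equation (3): L1 between the predicted offset at each keypoint and
    the fractional part lost by downsampling, p/R - floor(p/R)."""
    total = 0.0
    for (px, py), (ox, oy) in zip(centers, pred_off):
        tx, ty = px / R - px // R, py / R - py // R
        total += l1(ox, tx) + l1(oy, ty)
    return total / len(centers)

def size_loss(pred_wh, gt_wh):
    """Equation (4): L1 between predicted and true (width, height)."""
    total = sum(l1(pw, gw) + l1(ph, gh)
                for (pw, ph), (gw, gh) in zip(pred_wh, gt_wh))
    return total / len(gt_wh)

def total_loss(l_k, l_off, l_size, lam_off=1.0, lam_size=0.1):
    """Equation (5): weighted sum of the three branch losses."""
    return l_k + lam_off * l_off + lam_size * l_size
```

For example, a center at input-image coordinates (18, 9) with R = 4 maps to low-resolution cell (4, 2) with target offset (0.5, 0.25), so predicting exactly that offset drives L_off to zero.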
Results and analysis of the experiments
Data set and experimental environment
To verify the feasibility of the algorithm structure, the experiments select the aircraft-category pictures of the UCAS-AOD remote sensing image data and the RSOD dataset for network training and testing. The UCAS-AOD dataset, produced by the University of Chinese Academy of Sciences, contains 1,000 aircraft remote sensing pictures and 7,482 aircraft samples; its data are concentrated and target orientations are uniformly distributed. The RSOD dataset, produced by Wuhan University, contains 446 aircraft remote sensing images and 4,993 aircraft bounding-box samples; the images vary in brightness and contrast and contain interference such as occlusion, shadow and distortion. During the experiments, pictures are randomly selected from the dataset samples as the training set, keeping a 9:1 ratio between the training and test sets. The experimental environment configuration is shown in table 1 below.
Table 1 experimental environment configuration
Evaluation index
The invention adopts the standard evaluation indexes of current target detection, including Precision, Recall, the number of pictures the model processes per second (FPS) and mean Average Precision (mAP). FPS reflects the model's processing speed, and mAP measures detection precision across multiple recall levels. The related formulas are:

Precision = TP / (TP + FP)    (6)

Recall = TP / (TP + FN)    (7)

AP = ∫₀¹ P(R) dR,  mAP = mean of AP over all classes    (8)
where TP is the number of positive samples correctly classified as positive, FN is the number of positive samples wrongly classified as negative, and FP is the number of negative samples wrongly classified as positive; TP + FP is the total number of samples predicted as positive, and TP + FN is the total number of actual positive samples.
Whether a prediction counts as TP or FP is determined by the Intersection over Union (IoU): when IoU exceeds a set threshold the result is marked TP, otherwise FP. Different confidence thresholds yield different numbers of detection boxes: a high threshold yields fewer boxes, while a low threshold yields more.
IoU is obtained from equation (9):

IoU = area(B_p ∩ B_gt) / area(B_p ∪ B_gt)    (9)

where B_p is the predicted box and B_gt is the ground-truth box.
FPS is the number of pictures the model can detect per second and represents the detection speed; it is computed from equation (10):

FPS = N / T    (10)

where N is the number of test samples and T is the time required to test all of them.
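Equations (6), (7), (9) and (10) can be sketched as small helper functions; the names and the (x1, y1, x2, y2) box convention are illustrative assumptions:

```python
def iou(a, b):
    """Equation (9): boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def precision_recall(tp, fp, fn):
    """Equations (6) and (7) from the raw counts."""
    return tp / (tp + fp), tp / (tp + fn)

def fps(n_images, total_seconds):
    """Equation (10): images processed per second."""
    return n_images / total_seconds
```

For instance, two unit-overlap 2 × 2 boxes give IoU = 1/7, well below a typical 0.5 threshold, so such a detection would be counted as FP.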
Details of training
In the experiments the downsampling rate R is 4; iterative training uses the Adam optimizer, and input images are uniformly scaled to 512 × 512 resolution. The initial learning rate is 1e-3 with batch_size 4; after 50 epochs the learning rate is reduced by a factor of 10 and another 50 epochs are trained with batch_size 4. In addition, to accelerate convergence, the ResNet-50-based backbone is initialized with pre-trained weights from the ImageNet classification task.
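The stated schedule — 1e-3 for the first 50 epochs, then reduced by a factor of 10 for the remaining 50 — can be sketched as a step function; the function name and arguments are illustrative, not from the patent:

```python
def learning_rate(epoch, base_lr=1e-3, drop_epoch=50, drop_factor=10):
    """Step learning-rate schedule: base_lr until drop_epoch,
    then base_lr / drop_factor for the remaining epochs."""
    return base_lr if epoch < drop_epoch else base_lr / drop_factor
```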
Analysis of Experimental results
UCAS-AOD remote sensing image data set experimental result
After 100 training epochs on the UCAS-AOD remote sensing image dataset the loss essentially reaches a steady state. The loss curves of the original algorithm and the algorithm of the invention during training are compared in fig. 7; the loss value is the weighted sum of the center-point prediction loss, the offset loss and the width-height loss. As the figure shows, loss stabilizes both before and after the improvement, but the four improved model designs converge faster than the original, and their final convergence values are better. Furthermore, SC-CenterNet and ST-CenterNet, which add the attention mechanism, improve on C-CenterNet and T-CenterNet in both convergence rate and convergence value. Among all variants, ST-CenterNet performs best, converging fastest and to the lowest value.
Based on equations (6) and (7) above, the PR curves of the model outputs before and after the improvement can be calculated; fig. 8 compares the original algorithm with the best-performing ST-CenterNet, where (a) is the PR curve of the original algorithm and (b) is that of the algorithm of the invention.
To verify the detection performance of the algorithm, different detection networks were also trained and compared on the same experimental platform and training dataset, including single-stage and two-stage target detection algorithms; the test results are shown in table 2 below. The improved ST-CenterNet network structure of the invention raises the mean average precision (mAP) by 6.22% and 7.23% over the currently popular Faster R-CNN and SSD respectively. Compared with Faster R-CNN, both detection precision and detection speed improve substantially. Under the same conditions the detection speed is slightly lower than that of the original network, but real-time detection is still achieved, demonstrating the feasibility of the algorithm.
TABLE 2 UCAS-AOD remote sensing aircraft data set test results
[Table 2 appears as an image in the original publication]
RSOD data set experimental results
To further test the algorithm of the invention, the RSOD remote sensing data set is additionally selected for training and testing, with the training and test sets likewise split randomly at a ratio of 9:1. Compared with the UCAS-AOD data set, its picture samples are more complex and contain interference factors such as occlusion, shadow, and distortion. After 100 epochs, the total loss before and after the improvement reaches a steady state; the total loss curves before and after improvement are shown in FIG. 9. As the figure shows, the convergence speed and final converged value of all four improved model designs are superior to those of the original algorithm. Comparing the loss curves of C-CenterNet against SC-CenterNet and of T-CenterNet against ST-CenterNet shows that once the attention mechanism is introduced into the horizontal feature fusion module, the convergence speed of the model improves greatly and the final converged value is better than without it.
The PR curves of the model detection outputs before and after the improvement are calculated with equations 6 and 7 during testing and compared against ST-CenterNet, the best-performing variant, as shown in FIG. 10, where (a) in FIG. 10 is the PR curve of the original algorithm and (b) is the PR curve of the algorithm of the invention.
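The PR curves in FIGS. 8 and 10 are obtained by sweeping a confidence threshold over the ranked detections. A minimal NumPy sketch of that computation is given below; it uses the standard precision and recall definitions presumably behind equations 6 and 7 (which are not reproduced in this passage), plus the common all-point-interpolated AP as the area under the curve.

```python
import numpy as np

def pr_curve(scores, is_tp, n_gt):
    """Precision/recall pairs swept over the confidence threshold.
    scores: detection confidences; is_tp: 1 if the detection matched a
    ground-truth box (IoU above threshold), else 0; n_gt: number of
    ground-truth objects."""
    order = np.argsort(-np.asarray(scores))          # highest score first
    tp = np.cumsum(np.asarray(is_tp)[order])
    fp = np.cumsum(1 - np.asarray(is_tp)[order])
    precision = tp / np.maximum(tp + fp, 1)          # TP / (TP + FP)
    recall = tp / max(n_gt, 1)                       # TP / (TP + FN)
    return precision, recall

def average_precision(precision, recall):
    """AP as the area under the PR curve (all-point interpolation)."""
    p = np.concatenate(([1.0], precision))
    r = np.concatenate(([0.0], recall))
    # enforce a monotonically decreasing precision envelope
    p = np.maximum.accumulate(p[::-1])[::-1]
    return float(np.sum((r[1:] - r[:-1]) * p[1:]))
```

For example, three detections with scores [0.9, 0.8, 0.7], match flags [1, 1, 0], and two ground-truth objects give precision [1, 1, 2/3] and recall [0.5, 1, 1].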
A comparative experiment was also conducted between the two-stage algorithm Faster R-CNN, the single-stage algorithm SSD, and the improved algorithm of the invention under the same experimental conditions; the comparison results are shown in Table 3.
TABLE 3 RSOD remote sensing airplane data set test results
[Table 3 appears as an image in the original publication]
As Table 3 shows, ST-CenterNet, the variant of the invention with the best detection effect, improves precision by 26.42% and 43.41% over Faster R-CNN and SSD respectively, and improves detection speed by 72 frames/s and 35 frames/s respectively. Compared with the original CenterNet algorithm, detection precision improves by 18.25%, the false detection and missed detection rates of the network are reduced, and the detection rate of small targets improves to a certain extent. In addition, the FPS is 0.06% lower than that of the original algorithm, which is within an acceptable range and still superior to the Faster R-CNN and SSD algorithms, demonstrating that the network structure is robust for aircraft target detection in remote sensing images.
3.2.3 visual comparative analysis
To observe the network detection effect before and after the improvement more intuitively, the ST-CenterNet network structure with the highest post-improvement precision is selected, several groups of representative pictures are chosen from the experimental samples for testing, and the visualizations of the algorithm before and after improvement are compared; the detection results are shown in FIG. 11.
As the comparison in FIG. 11 shows, when the targets are small and dispersed, both versions detect them well, but the original algorithm suffers from false detections, and the target confidences of the improved algorithm are generally higher than before. When the targets are small and dense, the yellow marks show that the original algorithm misses a large number of detections, while the miss rate of the improved algorithm is reduced; isolated false detections remain, as indicated by the blue marks in the figure. When targets lie near buildings with complex backgrounds, the original algorithm produces many false detections, whereas the false detection rate of the improved algorithm drops, leaving only isolated cases for further optimization. The comparison shows that the improved algorithm improves small-target detection to a certain extent and reduces both the false detection and missed detection rates.
The invention also visually compares the improved ST-CenterNet with the classic two-stage detection algorithm Faster R-CNN and the single-stage detection algorithm SSD; the comparison results are shown in FIG. 12.
As FIG. 12 shows, Faster R-CNN yields higher target confidences than the algorithm of the invention but still misses some small targets; measured experimentally on single pictures, the detection speed of the algorithm of the invention is more than 8 times that of Faster R-CNN. Compared with the SSD algorithm, SSD detects well when targets are clearly distinguished from the background, but on small targets it detects only two valid targets and has a high miss rate, whereas the algorithm of the invention still detects small targets effectively. The comparison demonstrates the good robustness and detection effect of the algorithm.
Ablation experiment
To further verify the effectiveness of each improved module in the algorithm, the RSOD remote sensing aircraft data set, whose samples are more complex, and the ST-CenterNet model, which achieved the highest precision in the experiments, are selected for an ablation experiment. The results are shown in Table 4 below.
TABLE 4 ablation experiments on RSOD data sets
[Table 4 appears as an image in the original publication]
As Table 4 shows, adding the horizontal fusion module alone improves both accuracy and recall over the original algorithm, raising precision by 16.63%. After the attention mechanism is introduced into the horizontal fusion module to optimize the per-channel weights, accuracy improves further: the average precision rises by 1.62% over the attention-free variant and by 18.25% over the original algorithm, demonstrating the effectiveness of each module of the algorithm.
The invention applies the anchor-free CenterNet algorithm, which balances precision and speed, to aircraft target detection in remote sensing images and proposes a multi-scale channel attention detection method to address the high false detection rate, the difficulty of small-target detection, and related problems of the original CenterNet on remote sensing images. First, feature extraction is performed using an encoder-decoder structure. Second, to address the low detection precision for small targets, a horizontal connection module is introduced to fuse deep and shallow features and thereby improve small-target detection performance. Third, to improve detection accuracy and reduce the false detection rate, a channel attention module is introduced into the horizontal connection module to optimize the response between channels, focus on information of interest, and suppress unneeded information. Finally, comparison experiments against other mainstream algorithms on the public UCAS-AOD and RSOD remote sensing data sets show that the AP on the UCAS-AOD data set reaches 96.78%, improvements of 16%, 6.22%, and 7.23% over the original CenterNet, Faster R-CNN, and SSD300 respectively, while a competitive detection speed is maintained. The experimental results show that the method achieves high detection precision for aircraft targets in remote sensing images, retains the speed advantage of a single-stage detection model, and has practical value.
Furthermore, it should be understood that although the present description refers to embodiments, not every embodiment contains only a single independent technical solution; this manner of description is merely for clarity. Those skilled in the art should treat the description as a whole, and the technical solutions in the embodiments may be combined as appropriate to form other embodiments understandable to those skilled in the art.

Claims (5)

1. A deep learning target detection method based on center point regression, characterized by comprising the following specific steps:
introducing a horizontal connection module into the original CenterNet network structure to correlate the features among different layers, and fusing deep features with shallow features to improve small target detection performance;
introducing a channel attention module into the horizontal connection module to adaptively calibrate the feature responses among different channels, thereby improving the feature extraction capability of the network;
and finally, performing comparison experiments on the public UCAS-AOD and RSOD remote sensing data sets.
2. The method of claim 1, characterized in that the horizontal connection module is a feature fusion module comprising C-CenterNet and T-CenterNet, wherein C-CenterNet aligns the feature layers before fusion through a standard 1×1 convolution, and T-CenterNet replaces the standard convolution in C-CenterNet with a dilated convolution for testing; since the feature values in different layers have different scales, batch normalization and ReLU activation are applied after the convolution.
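The lateral branch described in this claim, a 1×1 convolution (a per-pixel channel mix) followed by batch normalization and ReLU, can be sketched in NumPy as follows. This is an illustrative sketch under assumptions, not the patent's implementation: the single-map (non-batched) layout and the kernel shape are chosen for brevity.

```python
import numpy as np

def conv1x1_bn_relu(x, w, gamma=1.0, beta=0.0, eps=1e-5):
    """1x1 convolution, then batch normalization, then ReLU.
    x: (C_in, H, W) feature map; w: (C_out, C_in) kernel.
    A 1x1 convolution mixes channels at each spatial location
    without changing the spatial size."""
    c_out = w.shape[0]
    y = np.tensordot(w, x, axes=([1], [0]))            # (C_out, H, W)
    # per-channel normalization over the spatial positions
    mean = y.reshape(c_out, -1).mean(axis=1)[:, None, None]
    var = y.reshape(c_out, -1).var(axis=1)[:, None, None]
    y = gamma * (y - mean) / np.sqrt(var + eps) + beta
    return np.maximum(y, 0.0)                          # ReLU
```

The normalization step is what lets features from layers with very different value scales be added without one branch dominating the fusion.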
3. The method according to claim 2, characterized in that the channel attention module is the squeeze-and-excitation attention module SE-Net; the module first performs a squeeze operation on an H×W feature map with C input channels to obtain a 1×1 feature map with C channels, then performs an excitation operation on the resulting feature map to obtain the inter-channel weight values, and finally multiplies the original feature map by the weight of the corresponding channel through a scale operation to obtain a new feature map, thereby strengthening channels containing valid information and suppressing channels containing useless information.
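The squeeze, excitation, and scale operations of this claim can be sketched in NumPy as follows. The two fully connected layers `w1`/`w2` and the reduction ratio of 4 are illustrative assumptions (SE-Net typically uses a bottleneck of two FC layers; the patent does not specify the ratio here).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(x, w1, w2):
    """Squeeze-and-Excitation over a (C, H, W) feature map.
    Squeeze: global average pool -> (C,) descriptor.
    Excitation: FC reduction + ReLU, FC expansion + sigmoid -> per-channel
    weights in (0, 1).
    Scale: multiply each channel of x by its learned weight."""
    c = x.shape[0]
    squeeze = x.reshape(c, -1).mean(axis=1)      # (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)       # (C/r,) after reduction
    weights = sigmoid(w2 @ hidden)               # (C,), each in (0, 1)
    return x * weights[:, None, None]
```

Because each weight lies in (0, 1), the block can only attenuate channels relative to their input, which is how low-information channels are suppressed.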
4. The deep learning target detection method based on center point regression, characterized in that comparison experiments are performed on the public UCAS-AOD and RSOD remote sensing data sets; pictures are randomly selected from the data set samples as the training set, keeping the ratio of the training set to the test set at 9:1; the down-sampling rate R in the experiments is 4; an Adam optimizer is used for iterative training; input images are uniformly scaled to a resolution of 512×512; the initial learning rate during training is 1e-3 with batch_size 4; after 50 epochs of training, the learning rate is reduced by a factor of 10 and, with batch_size kept at 4, another 50 epochs are trained; in addition, to speed up convergence, the ResNet-50-based backbone is initialized during training with pre-trained weights obtained from an image classification task.
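The training hyperparameters stated in this claim can be collected into a small configuration sketch; the dictionary layout and function name below are illustrative, not part of the patent, but the values are the ones the claim specifies.

```python
# Hyperparameters as stated in claim 4 (layout is illustrative).
TRAIN_CONFIG = {
    "input_size": (512, 512),   # images uniformly scaled to 512x512
    "downsample_R": 4,          # output stride of the detection heads
    "batch_size": 4,
    "optimizer": "Adam",
    "total_epochs": 100,        # 50 epochs + 50 epochs after LR drop
}

def learning_rate(epoch):
    """Two-stage schedule from claim 4: 1e-3 for the first 50 epochs,
    then reduced by a factor of 10 for the remaining 50."""
    return 1e-3 if epoch < 50 else 1e-4
```

With a 512×512 input and R = 4, the predicted heatmaps have spatial size 128×128, which is the resolution at which center points are regressed.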
5. The method for detecting the deep learning target based on center point regression according to claim 4, characterized in that, to verify the detection performance of the method, different detection networks are trained and compared on the same experimental platform and training data set, the comparison covering single-stage and two-stage target detection algorithms, to obtain the test results.
CN202110930245.4A 2021-08-13 2021-08-13 Deep learning target detection method based on center point regression Active CN113688830B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110930245.4A CN113688830B (en) 2021-08-13 2021-08-13 Deep learning target detection method based on center point regression

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110930245.4A CN113688830B (en) 2021-08-13 2021-08-13 Deep learning target detection method based on center point regression

Publications (2)

Publication Number Publication Date
CN113688830A true CN113688830A (en) 2021-11-23
CN113688830B CN113688830B (en) 2024-04-26

Family

ID=78579852

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110930245.4A Active CN113688830B (en) 2021-08-13 2021-08-13 Deep learning target detection method based on center point regression

Country Status (1)

Country Link
CN (1) CN113688830B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113989498A (en) * 2021-12-27 2022-01-28 北京文安智能技术股份有限公司 Training method of target detection model for multi-class garbage scene recognition
CN113989825A (en) * 2021-11-25 2022-01-28 航天信息股份有限公司 Bill image detection method and device and storage medium
CN114638878A (en) * 2022-03-18 2022-06-17 北京安德医智科技有限公司 Two-dimensional echocardiogram pipe diameter detection method and device based on deep learning

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020181685A1 (en) * 2019-03-12 2020-09-17 南京邮电大学 Vehicle-mounted video target detection method based on deep learning
AU2020103905A4 (en) * 2020-12-04 2021-02-11 Chongqing Normal University Unsupervised cross-domain self-adaptive medical image segmentation method based on deep adversarial learning
CN112395958A (en) * 2020-10-29 2021-02-23 中国地质大学(武汉) Remote sensing image small target detection method based on four-scale depth and shallow layer feature fusion
CN112580664A (en) * 2020-12-15 2021-03-30 哈尔滨理工大学 Small target detection method based on SSD (solid State disk) network
CN112686304A (en) * 2020-12-29 2021-04-20 山东大学 Target detection method and device based on attention mechanism and multi-scale feature fusion and storage medium
CN112966747A (en) * 2021-03-04 2021-06-15 北京联合大学 Improved vehicle detection method based on anchor-frame-free detection network
WO2021115159A1 (en) * 2019-12-09 2021-06-17 中兴通讯股份有限公司 Character recognition network model training method, character recognition method, apparatuses, terminal, and computer storage medium therefor
CN113191334A (en) * 2021-05-31 2021-07-30 广西师范大学 Plant canopy dense leaf counting method based on improved CenterNet

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020181685A1 (en) * 2019-03-12 2020-09-17 南京邮电大学 Vehicle-mounted video target detection method based on deep learning
WO2021115159A1 (en) * 2019-12-09 2021-06-17 中兴通讯股份有限公司 Character recognition network model training method, character recognition method, apparatuses, terminal, and computer storage medium therefor
CN112395958A (en) * 2020-10-29 2021-02-23 中国地质大学(武汉) Remote sensing image small target detection method based on four-scale depth and shallow layer feature fusion
AU2020103905A4 (en) * 2020-12-04 2021-02-11 Chongqing Normal University Unsupervised cross-domain self-adaptive medical image segmentation method based on deep adversarial learning
CN112580664A (en) * 2020-12-15 2021-03-30 哈尔滨理工大学 Small target detection method based on SSD (solid State disk) network
CN112686304A (en) * 2020-12-29 2021-04-20 山东大学 Target detection method and device based on attention mechanism and multi-scale feature fusion and storage medium
CN112966747A (en) * 2021-03-04 2021-06-15 北京联合大学 Improved vehicle detection method based on anchor-frame-free detection network
CN113191334A (en) * 2021-05-31 2021-07-30 广西师范大学 Plant canopy dense leaf counting method based on improved CenterNet

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张筱晗; 姚力波; 吕亚飞; 韩鹏; 李健伟: "Multi-directional ship target detection in remote sensing images based on center points", Acta Photonica Sinica, no. 04 *
邱博; 刘翔; 石蕴玉; 尚岩峰: "A lightweight multi-target real-time detection model", Journal of Beijing University of Aeronautics and Astronautics, no. 09 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113989825A (en) * 2021-11-25 2022-01-28 航天信息股份有限公司 Bill image detection method and device and storage medium
CN113989498A (en) * 2021-12-27 2022-01-28 北京文安智能技术股份有限公司 Training method of target detection model for multi-class garbage scene recognition
CN114638878A (en) * 2022-03-18 2022-06-17 北京安德医智科技有限公司 Two-dimensional echocardiogram pipe diameter detection method and device based on deep learning
CN114638878B (en) * 2022-03-18 2022-11-11 北京安德医智科技有限公司 Two-dimensional echocardiogram pipe diameter detection method and device based on deep learning

Also Published As

Publication number Publication date
CN113688830B (en) 2024-04-26

Similar Documents

Publication Publication Date Title
CN112819804B (en) Insulator defect detection method based on improved YOLOv convolutional neural network
CN110991311B (en) Target detection method based on dense connection deep network
CN108921051B (en) Pedestrian attribute identification network and technology based on cyclic neural network attention model
CN113688830A (en) Deep learning target detection method based on central point regression
CN107529650B (en) Closed loop detection method and device and computer equipment
CN107633226B (en) Human body motion tracking feature processing method
CN111079739B (en) Multi-scale attention feature detection method
CN111914924B (en) Rapid ship target detection method, storage medium and computing equipment
Li et al. Coda: Counting objects via scale-aware adversarial density adaption
CN113420819B (en) Lightweight underwater target detection method based on CenterNet
CN105139429B (en) A kind of fire detection method based on flame notable figure and spatial pyramid histogram
CN113326735B (en) YOLOv 5-based multi-mode small target detection method
CN115187786A (en) Rotation-based CenterNet2 target detection method
CN111582091A (en) Pedestrian identification method based on multi-branch convolutional neural network
CN114821356B (en) Optical remote sensing target detection method for accurate positioning
CN111192240B (en) Remote sensing image target detection method based on random access memory
CN113920159A (en) Infrared aerial small target tracking method based on full convolution twin network
CN113609904B (en) Single-target tracking algorithm based on dynamic global information modeling and twin network
CN115761888A (en) Tower crane operator abnormal behavior detection method based on NL-C3D model
CN116310386A (en) Shallow adaptive enhanced context-based method for detecting small central Net target
CN115223056A (en) Multi-scale feature enhancement-based optical remote sensing image ship target detection method
CN115331162A (en) Cross-scale infrared pedestrian detection method, system, medium, equipment and terminal
CN112862766B (en) Insulator detection method and system based on image data expansion technology
Wang et al. Research on vehicle detection based on faster R-CNN for UAV images
CN112989952A (en) Crowd density estimation method and device based on mask guidance

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant