CN114565959A - Target detection method and device based on YOLO-SD-Tiny - Google Patents


Info

Publication number
CN114565959A
CN114565959A
Authority
CN
China
Prior art keywords
yolo
tiny
feature
target detection
area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210152654.0A
Other languages
Chinese (zh)
Inventor
周斌 (Zhou Bin)
沈振冈 (Shen Zhengang)
李文豪 (Li Wenhao)
李艳红 (Li Yanhong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Etah Information Technology Co ltd
Original Assignee
Wuhan Etah Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Etah Information Technology Co ltd filed Critical Wuhan Etah Information Technology Co ltd
Priority to CN202210152654.0A priority Critical patent/CN114565959A/en
Publication of CN114565959A publication Critical patent/CN114565959A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target detection method and device based on YOLO-SD-Tiny, relating to the field of target detection. The method comprises the following steps: replacing the activation function used in the last CSP-Body and CBL of the YOLOv4-Tiny backbone feature extraction network with the Mish activation function, then extracting information from the picture to be detected to obtain effective feature layers; performing Self-DeConvolution upsampling on the effective feature layers according to the feature pyramid network FPN and outputting the result; and predicting the upsampled output values with a YOLO Head. The invention is suitable for general equipment, in particular low-performance equipment with limited computing power, and improves both the accuracy and the speed of target detection.

Description

Target detection method and device based on YOLO-SD-Tiny
Technical Field
The invention relates to the field of target detection, in particular to a target detection method and device based on YOLO-SD-Tiny.
Background
Face detection is a very important computer vision task and an important branch of target detection. Deep-learning-based target detection algorithms fall into two categories: those based on region proposals and those that are not.
Region-proposal-based target detection algorithms mainly include R-CNN, Fast R-CNN, Faster R-CNN and the like. They work in two steps: a series of candidate boxes is first generated, and classification and coordinate regression are then performed by a convolutional neural network. These algorithms achieve high accuracy, but their models are often too large and their real-time performance poor.
Target detection algorithms without region proposals mainly comprise the YOLO (You Only Look Once) series, which merges candidate-box generation and classification into a single stage, greatly reducing the computational complexity of the neural network at the cost of somewhat lower accuracy than the two-stage, region-proposal-based methods. The YOLO family has continued to develop; currently deployed algorithms such as YOLOv4 are greatly improved in both speed and precision over the early members of the series.
In recent years, some carefully designed network models and operators have been proposed that greatly reduce computation and parameter counts. For example, YOLOv4-Tiny, proposed on the basis of YOLOv4, is a simplified version of YOLOv4 and belongs to the lightweight models: it has only about 6 million parameters, roughly one tenth of the original, so its detection speed is greatly improved.
Mainstream target detection algorithms often require high-compute equipment, while mobile and embedded devices cannot support such complex models. A target detector with high accuracy and high real-time performance that adapts well to devices of different computing power is currently lacking.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a target detection method based on YOLO-SD-Tiny which is suitable for general equipment, in particular low-performance equipment with limited computing power, and which improves both the accuracy and the speed of target detection.
In order to achieve the above purposes, the technical scheme adopted by the invention is as follows:
A target detection method based on YOLO-SD-Tiny comprises the following steps:
replacing the activation function used in the last CSP-Body and CBL of the YOLOv4-Tiny backbone feature extraction network with the Mish activation function, then extracting information from the picture to be detected to obtain effective feature layers;
performing Self-DeConvolution upsampling on the effective feature layer according to the feature pyramid network FPN and outputting the result;
and predicting the upsampled output value with a YOLO Head.
In some embodiments, performing the Self-DeConvolution upsampling on the effective feature layer according to the feature pyramid network FPN and outputting the result includes:
compressing the channel number of the effective feature layer F of shape H × W × C to C_r by a 1 × 1 convolution;
setting the upsampling rate to σ and, based on the compressed effective feature layer F, using a convolution to predict, for each point l_t of the output feature layer F′ of shape σH × σW × C, an upsampling kernel W_{l_t} associated with the position information, where l_t = (x_t, y_t) corresponds to the point l = (⌊x_t/σ⌋, ⌊y_t/σ⌋) of F;
reshaping the obtained kernel by the weighted-sum operator to W_{l_t} of size k_area × k_area, where W_{l_t} = θ(N(F_l, k_encoder)), l_t is a point of the output feature layer F′, θ is the weighted-sum operator, N(F_l, k) denotes the k × k neighborhood centered on point l in the effective feature layer F, k_area is the reassembly neighborhood size, and k_encoder is a smaller neighborhood than k_area;
mapping each point l_t of the output feature layer F′ back to the corresponding point l of the effective feature layer F, taking out the k_area × k_area region centered on l, and computing its dot product with the predicted upsampling kernel W_{l_t} of that point to obtain the output value.
In some embodiments, after obtaining the output value, the method further includes:
normalizing the predicted kernel tensor of shape σH × σW × k_area × k_area by softmax, so that the weights of each kernel sum to 1.
in some embodiments, when predicting the upsampled output value using the YOLO Head, the CIOU loss is used as the bounding box regression loss.
In some embodiments, when predicting the upsampled output value by using the YOLO Head, the GHM loss is used as the classification loss.
The invention also provides a target detection device based on YOLO-SD-Tiny which is suitable for general equipment, in particular low-performance equipment with limited computing power, and which improves both the accuracy and the speed of target detection.
In order to achieve the above purposes, the technical scheme adopted by the invention is as follows:
A YOLO-SD-Tiny-based target detection device comprises:
a backbone feature extraction network, formed by replacing the activation function used in the last CSP-Body and CBL of the YOLOv4-Tiny backbone feature extraction network with the Mish activation function, and used to extract information from the picture to be detected to obtain effective feature layers;
a feature pyramid network FPN, configured to perform Self-DeConvolution upsampling on the effective feature layer and output the result;
a YOLO Head, used to predict the upsampled output value.
In some embodiments, the feature pyramid network FPN comprises a Self-DeConvolution computation unit, which comprises:
an upsampling kernel prediction module configured to:
compress the channel number of the effective feature layer F of shape H × W × C to C_r by a 1 × 1 convolution;
set the upsampling rate to σ and, based on the compressed effective feature layer F, use a convolution to predict, for each point l_t of the output feature layer F′ of shape σH × σW × C, an upsampling kernel W_{l_t} associated with the position information, where l_t = (x_t, y_t) corresponds to the point l = (⌊x_t/σ⌋, ⌊y_t/σ⌋) of F;
reshape the obtained kernel by the weighted-sum operator to W_{l_t} of size k_area × k_area, where W_{l_t} = θ(N(F_l, k_encoder)), θ is the weighted-sum operator, N(F_l, k) denotes the k × k neighborhood centered on point l in the effective feature layer F, k_area is the reassembly neighborhood size, and k_encoder is a smaller neighborhood than k_area;
a feature traversal module configured to: map each point l_t of the output feature layer F′ back to the corresponding point l of the effective feature layer F, take out the k_area × k_area region centered on l, and compute its dot product with the predicted upsampling kernel W_{l_t} of that point to obtain the output value.
In some embodiments, after obtaining the output value, the upsampling kernel prediction module is further configured to:
normalize the predicted kernel tensor of shape σH × σW × k_area × k_area by softmax, so that the weights of each kernel sum to 1.
in some embodiments, when predicting the upsampled output value using the YOLO Head, the CIOU loss is used as the bounding box regression loss.
In some embodiments, when predicting the upsampled output value by using the YOLO Head, the GHM loss is used as the classification loss.
Compared with the prior art, the invention has the following advantages:
Aiming at the problems that target detection models are too large to deploy on low-performance equipment and have poor real-time performance, the invention provides the YOLO-SD-Tiny model. An MCSP-Body based on the Mish activation function is introduced into the backbone feature extraction network so that information flows into the network better, and an SD module is introduced into the feature pyramid network to accelerate feature fusion and enlarge the receptive field. According to the experimental analysis, on the OccludeFace data set the proposed YOLO-SD-Tiny improves AP by 6.35% and detection speed by 9.64% compared with YOLOv4-Tiny, alleviating the problems of detection speed and accuracy to a certain extent.
Drawings
FIG. 1 is a flow chart of a target detection method based on YOLO-SD-Tiny in the embodiment of the present invention;
FIG. 2 is a schematic diagram of a YOLO-SD-Tiny overall network model in an embodiment of the present invention;
FIG. 3 is a comparison graph of Mish activation function and LeakyReLU activation function curves in an embodiment of the present invention;
FIG. 4 is a flow chart of upsampling kernel prediction in an embodiment of the present invention;
FIG. 5 is a flow chart of feature traversal in an embodiment of the present invention.
Detailed Description
For the purpose of making the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, an embodiment of the present invention provides a method for detecting a target based on YOLO-SD-Tiny, the method includes the following steps:
S1, the activation function used in the last CSP-Body and CBL of the YOLOv4-Tiny backbone feature extraction network is replaced with the Mish activation function, after which information is extracted from the picture to be detected to obtain effective feature layers.
The YOLOv4-Tiny backbone feature extraction network consists of, in sequence, two CBLs, three CSP-Bodies and one CBL.
It is worth noting that CBL stands for Convolution, Batch Normalization and the LeakyReLU activation function. A CSP-Body consists of three CBL structures and one MaxPool. The CSP-Body splits the feature map passed down from the upper layer into two parts and then merges them through a cross-stage hierarchical structure; through its residual structure it enhances the learning capacity of the neural network, reducing memory occupation and computation while keeping the network's precision intact.
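As a hedged illustration of the cross-stage idea described above, a minimal NumPy sketch of the split-process-merge pattern follows. It is not the patent's exact CSP-Body (which stacks three CBLs and a MaxPool with learned convolutions); the stand-in transform replaces the conv branch for illustration only.

```python
import numpy as np

def csp_split_merge(x, transform):
    """Sketch of the CSP cross-stage idea (hypothetical simplification):
    split the incoming feature map along the channel axis, run only one
    part through the processing branch, then concatenate both parts again.
    x: array of shape (C, H, W); transform: function applied to one half."""
    c = x.shape[0] // 2
    part1, part2 = x[:c], x[c:]           # cross-stage split
    part2 = transform(part2)              # only this half is processed
    return np.concatenate([part1, part2], axis=0)  # merge across stages

# toy stand-in for the CBL branch: a LeakyReLU-like nonlinearity
features = np.random.randn(8, 4, 4)
out = csp_split_merge(features, lambda t: np.maximum(t, 0.1 * t))
```

The untouched half acts like a shortcut, which is the mechanism the patent credits for reduced computation at unchanged precision.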
Referring to fig. 2, in the present embodiment the LeakyReLU activation function used in the fifth module (the last CSP-Body) and the sixth module (the last CBL) is replaced with the Mish activation function; the modified structures are denoted MCSP-Body and CBM, respectively. Upsampling is performed by Self-DeConvolution (SD for short), so the model of the embodiment of the invention is called YOLO-SD-Tiny.
With a 416 × 416 input, the overall YOLO-SD-Tiny network model is shown in fig. 2. As can be seen from fig. 2, YOLO-SD-Tiny is divided into three parts: the backbone feature extraction network, the feature pyramid network and the YOLO Head. The backbone feature extraction network consists of two CBLs, two CSP-Bodies, one MCSP-Body and one CBM. The MCSP-Body replaces the CBL structure inside the CSP-Body with a CBM structure based on the Mish activation function, letting information flow into the network better.
The activation function performs the nonlinear mapping from a neuron's input to its output and is of great significance to the training of a neural network. Activation functions commonly used by neural networks include Sigmoid, Tanh, ReLU and LeakyReLU, but each has certain disadvantages. Taking ReLU as an example, when the input is negative the gradient becomes zero, causing the gradient to vanish; LeakyReLU allows a slight negative gradient for negative inputs and, to some extent, avoids the vanishing gradient they cause.
The Mish activation function is computed as follows:
f(x) = x · tanh(softplus(x)) = x · tanh(ln(1 + e^x))
The value range of the Mish activation function is approximately [−0.31, +∞). A comparison of the Mish and LeakyReLU activation function curves is shown in FIG. 3. As can be seen from fig. 3, the Mish activation function permits slight negative values and sets no upper bound, which yields better gradient flow; after being mapped by this smooth activation function, the network's inputs can carry information deeper into the network and obtain better accuracy.
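The formula and the claimed value range can be checked numerically. A small NumPy sketch (the LeakyReLU slope of 0.01 is an illustrative choice for comparison, not taken from the patent):

```python
import numpy as np

def mish(x):
    """Mish activation: f(x) = x * tanh(softplus(x)) = x * tanh(ln(1 + e^x))."""
    return x * np.tanh(np.log1p(np.exp(x)))

def leaky_relu(x, slope=0.01):
    """LeakyReLU for comparison: small negative slope instead of zero."""
    return np.where(x >= 0, x, slope * x)

# Sample the curve: Mish is smooth, unbounded above, and dips slightly
# below zero (minimum near -0.31) before approaching 0 for large negatives.
xs = np.linspace(-6, 6, 1001)
ys = mish(xs)
```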
S2, performing Self-DeConvolution upsampling on the effective feature layer according to the feature pyramid network FPN and outputting the result.
The FPN, i.e. the feature pyramid network, is a top-down feature fusion method. It upsamples the higher-level, more abstract and semantically stronger feature map and laterally connects the upsampled result to the feature map of the previous level, so that high-level features are fused into shallow features and help them detect targets better. Traditional upsampling is interpolation-based; interpolation cannot exploit the semantic information of the feature map, and its receptive field is small.
For this reason, in this embodiment upsampling is performed by Self-DeConvolution (SD for short). SD involves two modules: an upsampling kernel prediction module and a feature traversal module. For an effective feature layer F of shape H × W × C and a given integer upsampling rate σ, SD produces an output feature layer F′ of shape σH × σW × C. For a point l_t = (x_t, y_t) of F′, a corresponding point l = (x, y) can be found in F, where x = ⌊x_t/σ⌋ and y = ⌊y_t/σ⌋. The k × k neighborhood of l is denoted N(F_l, k).
Step S2 comprises an upsampling kernel prediction process and a feature traversal process, specifically:
S21, compressing the channel number of the effective feature layer F of shape H × W × C to C_r by a 1 × 1 convolution.
S22, setting the upsampling rate to σ and, based on the compressed effective feature layer F, using a convolution of kernel size k_encoder × k_encoder to predict, for each point l_t of the output feature layer F′, an upsampling kernel W_{l_t} associated with the position information, where l = (⌊x_t/σ⌋, ⌊y_t/σ⌋).
S23, reshaping the obtained kernel by the weighted-sum operator to W_{l_t} of size k_area × k_area, where W_{l_t} = θ(N(F_l, k_encoder)), l_t is a point of the output feature layer F′, θ is the weighted-sum operator, N(F_l, k) denotes the k × k neighborhood centered on point l in the effective feature layer F, k_area is the reassembly neighborhood size, and k_encoder is a smaller neighborhood than k_area.
Referring to FIG. 4, it can be understood that in the upsampling kernel prediction module the channel number is first compressed to C_r by a 1 × 1 convolution, after which a k_encoder × k_encoder convolution predicts the upsampling kernel W_{l_t} for each point l_t of the output feature layer F′. The parameter count of this stage is k_encoder × k_encoder × C_r × σ² × k_area², where k_encoder = k_area − 1.
S24, mapping each point l_t of the output feature layer F′ back to the corresponding point l of the effective feature layer F, taking out the k_area × k_area region centered on l, and computing its dot product with the predicted upsampling kernel W_{l_t} of that point to obtain the output value.
It will be appreciated that the feature traversal procedure is shown in FIG. 5: each point l_t of the output feature layer is mapped back to the corresponding point l in the effective feature layer F, the k_area × k_area region centered on l is taken out, and its dot product with the predicted upsampling kernel of that point gives the output value. Different channels at the same position share the same upsampling kernel. Within the neighborhood N(F_l, k_area) of a point l in F, each pixel contributes differently to the corresponding output pixel l_t of F′, and the contribution depends on the content of the features rather than on spatial distance. The reassembled feature map therefore carries stronger semantics than the original, because every output pixel can focus on information from the relevant points in its local region.
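The feature traversal step above can be sketched in NumPy. This is an illustrative reimplementation under stated assumptions: the kernels here are random and softmax-normalised rather than predicted by the learned k_encoder × k_encoder convolution, and border points use zero padding.

```python
import numpy as np

def sd_upsample(F, kernels, sigma, k_area):
    """Sketch of the SD feature-traversal step.
    F:       effective feature layer, shape (C, H, W)
    kernels: one k_area x k_area reassembly kernel per output point,
             shape (sigma*H, sigma*W, k_area, k_area), already normalised
    Returns the output feature layer F' of shape (C, sigma*H, sigma*W)."""
    C, H, W = F.shape
    r = k_area // 2
    Fp = np.pad(F, ((0, 0), (r, r), (r, r)))       # zero-pad the borders
    out = np.zeros((C, sigma * H, sigma * W))
    for yt in range(sigma * H):
        for xt in range(sigma * W):
            y, x = yt // sigma, xt // sigma         # map l_t back to l
            patch = Fp[:, y:y + k_area, x:x + k_area]
            # all channels at this position share the same kernel
            out[:, yt, xt] = (patch * kernels[yt, xt]).sum(axis=(1, 2))
    return out

C, H, W, sigma, k_area = 3, 4, 4, 2, 3
F = np.random.randn(C, H, W)
raw = np.random.randn(sigma * H, sigma * W, k_area, k_area)
# softmax-normalise each kernel so its weights sum to 1
flat = np.exp(raw.reshape(sigma * H, sigma * W, -1))
kernels = (flat / flat.sum(-1, keepdims=True)).reshape(raw.shape)
Fup = sd_upsample(F, kernels, sigma, k_area)
```

A sanity check of the mapping: a kernel that is 1 at its centre and 0 elsewhere reduces SD to nearest-neighbour upsampling.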
And S3, predicting the up-sampled output value by using a YOLO Head.
It should be noted that the loss function of YOLO-SD-Tiny in this embodiment is divided into three parts, namely, confidence loss, classification loss, and bounding box regression loss.
In a preferred embodiment, the CIOU loss function is used as the bounding-box regression loss. The IOU is the ratio of the intersection to the union of the prediction box and the real box and serves as the measure of bounding-box regression accuracy. The IOU and the CIOU loss are computed as follows:
IOU = |B ∩ B^gt| / |B ∪ B^gt|
where B is the prediction box and B^gt is the real box.
L_CIOU = 1 − IOU + ρ²(b, b^gt) / c² + αv
where b and b^gt are the centre points of the prediction box and the real box respectively, ρ is the Euclidean distance between them, α is a weight function, v is a parameter measuring the consistency of the bounding boxes' aspect ratios, and c is the diagonal length of the smallest closure area that contains both the prediction box and the real box:
v = (4 / π²) · (arctan(w^gt / h^gt) − arctan(w / h))², α = v / ((1 − IOU) + v)
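A minimal NumPy sketch of the CIOU loss for two axis-aligned boxes, following the formulas above (the small stabiliser added to the denominator of α is an implementation detail assumed here, not part of the patent):

```python
import numpy as np

def ciou_loss(box, box_gt):
    """CIOU loss for boxes given as (x1, y1, x2, y2):
    L_CIOU = 1 - IOU + rho^2(b, b_gt) / c^2 + alpha * v   (a sketch)."""
    x1, y1, x2, y2 = box
    g1, h1, g2, h2 = box_gt
    # intersection over union
    iw = max(0.0, min(x2, g2) - max(x1, g1))
    ih = max(0.0, min(y2, h2) - max(y1, h1))
    inter = iw * ih
    union = (x2 - x1) * (y2 - y1) + (g2 - g1) * (h2 - h1) - inter
    iou = inter / union
    # squared centre distance rho^2 and diagonal c^2 of the enclosing box
    rho2 = ((x1 + x2) / 2 - (g1 + g2) / 2) ** 2 \
         + ((y1 + y2) / 2 - (h1 + h2) / 2) ** 2
    cw = max(x2, g2) - min(x1, g1)
    ch = max(y2, h2) - min(y1, h1)
    c2 = cw ** 2 + ch ** 2
    # aspect-ratio consistency term v and its weight alpha
    v = (4 / np.pi ** 2) * (np.arctan((g2 - g1) / (h2 - h1))
                            - np.arctan((x2 - x1) / (y2 - y1))) ** 2
    alpha = v / ((1 - iou) + v + 1e-9)   # stabiliser assumed, not in patent
    return 1 - iou + rho2 / c2 + alpha * v
```

Identical boxes give zero loss; disjoint boxes are additionally penalised by their normalised centre distance, which is what speeds up regression compared with plain IOU loss.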
In terms of classification loss, the GHM (Gradient Harmonizing Mechanism) loss is introduced to address the imbalance between positive and negative samples and the problem of especially hard samples (outliers). The gradient modulus length g of outliers is much larger than that of average samples, and forcing the model to focus on these samples can reduce its accuracy. To attenuate both the easy samples and the especially hard samples at the same time, the gradient density GD(g) is proposed, computed as follows:
GD(g) = (1 / l_ε(g)) · Σ_{k=1}^{N} δ_ε(g_k, g)
where δ_ε(g_k, g) indicates whether, among samples 1..N, the gradient modulus length of sample k falls in the interval [g − ε/2, g + ε/2), and l_ε(g) is the length of that interval. The physical meaning of the gradient density GD(g) is therefore the number of samples per unit length in the neighbourhood of gradient modulus length g. The GHM loss is then obtained by weighting the cross entropy of each sample by the inverse of its gradient density:
β_i = N / GD(g_i)
L_GHM-C = (1 / N) · Σ_{i=1}^{N} β_i · L_CE(p_i, p_i*) = Σ_{i=1}^{N} L_CE(p_i, p_i*) / GD(g_i)
where N is the total number of samples, L_CE(p_i, p_i*) is the binary cross-entropy loss, p ∈ [0, 1] is the probability predicted by the model, and p* ∈ {0, 1} is the ground-truth label of the class.
Therefore, in the embodiment of the invention, the loss function of YOLO-SD-Tiny uses the CIOU loss as the bounding-box regression loss, to accelerate bounding-box regression, and the GHM loss as the classification loss, to address the imbalance between positive and negative samples and the problem of especially hard samples (outliers).
The overall YOLO-SD-Tiny target detection process in the embodiment of the invention is as follows:
First, the input image is divided into an S × S grid. Each grid cell is responsible only for predicting targets whose centre point falls inside it and computes 3 prediction boxes, each corresponding to 5 + C values, where C is the total number of categories in the data set and 5 stands for the predicted bounding-box centre coordinates (x, y), the predicted box width and height (w, h), and the confidence. Then the class confidence predicted by the network is computed; it depends on the probability P(n_object) that a target falls into the grid cell, the class accuracy P(n_class | n_object), and the intersection over union (IOU):
Score = P(n_class | n_object) · P(n_object) · IOU_pred^truth
If the target centre falls into the grid cell, P(n_object) = 1, otherwise 0; IOU_pred^truth is the intersection over union between the prediction box and the real box. Finally, DIOU-NMS is used to screen the highest-scoring prediction box as the target detection box, and the output feature maps are 26 × 26 and 13 × 13, achieving the localization and classification of the target. It should be noted that NMS is a necessary post-processing step in target detection, intended to remove duplicate boxes and keep the most accurate one; DIOU-NMS holds that two boxes whose centre points are far apart may lie on different objects and should not be deleted (this is its biggest difference from plain NMS).
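A minimal sketch of DIOU-NMS as just described, keeping overlapping boxes whose centres are far apart; the threshold value and the greedy loop structure are illustrative assumptions, not parameters taken from the patent:

```python
import numpy as np

def iou(a, b):
    """Plain IOU of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) \
          + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def diou_nms(boxes, scores, thresh=0.5):
    """Greedy NMS where the suppression criterion is IOU minus the
    normalised squared centre distance, so overlapping boxes with
    far-apart centres (likely different objects) are kept."""
    order = np.argsort(scores)[::-1]       # highest score first
    keep = []
    while len(order) > 0:
        i = order[0]
        keep.append(int(i))
        rest = []
        for j in order[1:]:
            a, b = boxes[i], boxes[j]
            d2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 \
               + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
            cw = max(a[2], b[2]) - min(a[0], b[0])
            ch = max(a[3], b[3]) - min(a[1], b[1])
            diou = iou(a, b) - d2 / (cw ** 2 + ch ** 2)
            if diou <= thresh:             # keep j for the next round
                rest.append(j)
        order = np.array(rest, dtype=int)
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
kept = diou_nms(boxes, scores, thresh=0.5)
```

Here the second box heavily overlaps the first and is suppressed, while the distant third box survives.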
It is worth noting that there are various performance evaluation indexes in the field of target detection. For example, a model can be evaluated with the most widely used Precision and Recall, computed as follows:
P = TP / (TP + FP)
R = TP / (TP + FN)
The precision P evaluates the prediction results: TP (True Positive) is the number of positive samples correctly predicted by the model, and FP (False Positive) is the number of negative samples predicted as positive. The recall R evaluates the samples, indicating how many of all positive samples are correctly predicted; FN (False Negative) is the number of positive samples the model predicts as negative.
The AP is the area enclosed by the PR curve, formed by the Precision and Recall of a single class under different confidence thresholds, and the coordinate axes; it considers precision and recall together and gives a comprehensive evaluation of single-class detection. The FPS is the number of images the model can process per second; the larger the FPS, the faster the model's detection speed.
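The precision and recall formulas can be illustrated directly (binary labels assumed; this is a generic sketch, not code from the patent):

```python
import numpy as np

def precision_recall(pred, truth):
    """P = TP / (TP + FP), R = TP / (TP + FN) for binary 0/1 labels."""
    pred = np.asarray(pred)
    truth = np.asarray(truth)
    tp = int(np.sum((pred == 1) & (truth == 1)))   # correct positives
    fp = int(np.sum((pred == 1) & (truth == 0)))   # negatives called positive
    fn = int(np.sum((pred == 0) & (truth == 1)))   # missed positives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# toy example: 3 true positives, 1 false positive, 1 false negative
pred  = [1, 1, 1, 0, 0, 1]
truth = [1, 0, 1, 0, 1, 1]
P, R = precision_recall(pred, truth)
```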
The YOLO-SD-Tiny algorithm of the embodiment of the invention is compared with YOLOv4-Tiny on the OccludeFace data set, and ablation experiments are performed on YOLO-SD-Tiny (with MCSP-Body) and YOLO-SD-Tiny (with GHM & CIOU) to verify the influence of each module on the model; the experimental results are shown in Table 1. The MCSP-Body introduced on the basis of the Mish activation function improves AP by 0.67% over YOLOv4-Tiny: the gradient does not vanish, and the smooth activation function lets information embed better into the network, improving detection accuracy. Introducing the GHM loss in the classification loss and the CIOU loss in the bounding-box regression improves AP by 2.09% over the original YOLOv4-Tiny model, showing that the CIOU, which jointly considers overlap area, centre distance and aspect ratio, and the GHM loss, which addresses the imbalance of positive and negative samples and hard samples, can both raise detection accuracy. The full YOLO-SD-Tiny improves AP by 6.35% and detection speed (FPS) by 9.64% compared with YOLOv4-Tiny. Combining the experimental data in Table 1 verifies that the improvements proposed by the invention effectively raise both detection precision and detection speed.
TABLE 1
[Table 1: ablation results on OccludeFace; the original table image is not reproduced. The relative improvements over the YOLOv4-Tiny baseline reported in the text are: +0.67% AP with MCSP-Body; +2.09% AP with GHM & CIOU; +6.35% AP and +9.64% FPS for the full YOLO-SD-Tiny.]
In conclusion, aiming at the problems that target detection models are too large to deploy on low-performance equipment and have poor real-time performance, the invention provides the YOLO-SD-Tiny model, introduces an MCSP-Body based on the Mish activation function into the backbone feature extraction network so that information flows into the network better, and introduces an SD module into the feature pyramid network to accelerate feature fusion and enlarge the receptive field. According to the experimental analysis, on the OccludeFace data set the proposed YOLO-SD-Tiny improves AP by 6.35% and detection speed by 9.64% compared with YOLOv4-Tiny, alleviating the problems of detection speed and accuracy to a certain extent.
Meanwhile, the embodiment of the invention also provides a target detection device based on YOLO-SD-Tiny, comprising a backbone feature extraction network, a feature pyramid network FPN and a YOLO Head.
The backbone feature extraction network is formed by replacing the activation function used in the last CSP-Body and CBL of the YOLOv4-Tiny backbone feature extraction network with the Mish activation function, and is used to extract information from the picture to be detected to obtain effective feature layers.
The feature pyramid network FPN is used to perform Self-DeConvolution upsampling on the effective feature layer and output the result. The YOLO Head is used to predict the upsampled output value.
In some embodiments, feature pyramid network FPN includes a Self-deconvolation computation unit that includes an upsampling kernel prediction module and a feature traversal module.
An upsampling kernel prediction module, configured to:
compress the number of channels of the effective feature layer F, of shape H×W×C, to C_r by a 1×1 convolution;
set the upsampling rate to σ and, based on the compressed effective feature layer F, perform a convolution with a kernel of size k_encoder×k_encoder to predict, for each point l_t of the output feature layer F′ (of shape σH×σW), an upsampling kernel W_{l_t} associated with the location information; the obtained kernel W_{l_t} is, via the weighted-sum operator θ, reshaped into the form σH×σW×k_area×k_area; wherein F′ is the output feature layer, l_t is a point in F′, θ is the weighted-sum operator, k_area is a neighborhood of a point in the effective feature layer F, and k_encoder is a neighborhood of a region smaller than k_area.
A feature traversal module, configured to: map each point l_t of the output feature layer F′ back to the corresponding point l of the effective feature layer F, take out the k_area×k_area region centered on l, and perform a dot product of the region with the upsampling kernel predicted for that point to obtain the output value.
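The kernel-prediction and reassembly procedure described above resembles content-aware upsampling. The NumPy sketch below illustrates only the reassembly step, under stated assumptions: the kernel-prediction convolutions are replaced by a pre-computed kernel tensor passed in as an argument, softmax normalization is applied per kernel, and the name `sd_upsample` is hypothetical, not from the specification:

```python
import numpy as np

def sd_upsample(F, kernels, sigma=2, k_area=5):
    """Content-aware upsampling sketch.
    F       : effective feature layer, shape (H, W, C)
    kernels : predicted upsampling kernels, shape
              (sigma*H, sigma*W, k_area*k_area); in the described
              module these come from a 1x1 channel compression
              followed by a k_encoder x k_encoder convolution.
    Returns the upsampled layer of shape (sigma*H, sigma*W, C)."""
    H, W, C = F.shape
    pad = k_area // 2
    Fp = np.pad(F, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros((sigma * H, sigma * W, C), dtype=F.dtype)
    for lt_y in range(sigma * H):
        for lt_x in range(sigma * W):
            # map the output point l_t back to its source point l
            ly, lx = lt_y // sigma, lt_x // sigma
            # softmax-normalize the kernel so its weights sum to 1
            w = kernels[lt_y, lt_x]
            w = np.exp(w - w.max())
            w = w / w.sum()
            w = w.reshape(k_area, k_area, 1)
            # k_area x k_area region centered on l, dot product with kernel
            region = Fp[ly:ly + k_area, lx:lx + k_area]
            out[lt_y, lt_x] = (region * w).sum(axis=(0, 1))
    return out
```

Because each kernel is normalized to sum to 1, a constant feature map stays constant after reassembly, which is one way to sanity-check an implementation of this step.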
In some embodiments, after obtaining the output value, the upsampling kernel prediction module is further configured to normalize the σH×σW×k_area×k_area kernel by softmax so that the sum of the kernel weights is 1.
in some embodiments, when predicting the upsampled output value using the YOLO Head, the CIOU loss is used as the bounding box regression loss.
In some embodiments, when predicting the upsampled output value by using the YOLO Head, the GHM loss is used as the classification loss.
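For reference, the GHM classification loss re-weights each example's cross-entropy by the inverse density of its gradient norm g = |p − y|, so that the huge number of very easy negatives and the few extreme outliers do not dominate training — this is the "positive and negative samples and hard-to-classify samples" balancing mentioned above. A simplified sketch of the conventional binned formulation (illustrative, not this specification's implementation):

```python
import numpy as np

def ghm_c_loss(pred_prob, target, bins=10):
    """GHM-C sketch: binary cross-entropy re-weighted by gradient density.
    pred_prob : predicted probabilities in (0, 1), shape (N,)
    target    : 0/1 labels, shape (N,)"""
    eps = 1e-7
    p = np.clip(pred_prob, eps, 1 - eps)
    g = np.abs(p - target)                 # gradient norm per example
    edges = np.linspace(0, 1, bins + 1)
    weights = np.zeros_like(g)
    n = len(g)
    for i in range(bins):
        # examples falling in this gradient-norm bin
        in_bin = (g >= edges[i]) & (g < edges[i + 1] + (i == bins - 1) * eps)
        num = in_bin.sum()
        if num > 0:
            weights[in_bin] = n / num      # inverse gradient density
    weights /= weights.mean()              # keep the overall loss scale
    bce = -(target * np.log(p) + (1 - target) * np.log(1 - p))
    return float((weights * bce).mean())
```

Examples in crowded gradient-norm bins receive small weights, while examples in sparse bins are emphasized, harmonizing the contribution of easy and hard samples.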
According to the target detection device based on YOLO-SD-Tiny, the MCSP-Body based on the Mish activation function is introduced into the trunk feature extraction network part, so that information flows through the network better; and the SD module is introduced into the feature pyramid network part to accelerate feature fusion and enlarge the receptive field. According to the analysis of the experimental results, on the OccludeFace data set the proposed YOLO-SD-Tiny improves AP by 6.35% and detection speed by 9.64% compared with YOLOv4-Tiny, solving the problems of detection speed and accuracy to a certain extent.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments. While the invention has been described with reference to specific embodiments, the scope of the invention is not limited thereto, and those skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A target detection method based on YOLO-SD-Tiny is characterized by comprising the following steps:
replacing the activation function adopted in the last CSP-Body and the CBL in the YOLOv4-Tiny trunk feature extraction network with the Mish activation function, and extracting information from the picture to be detected to obtain an effective feature layer;
performing Self-DeConvolution upsampling on the effective feature layer according to the feature pyramid network FPN and outputting the result;
and predicting the upsampled output value by using a YOLO Head.
2. The YOLO-SD-Tiny-based target detection method of claim 1, wherein performing Self-DeConvolution upsampling on the effective feature layer according to the feature pyramid network FPN and outputting the result comprises:
compressing the number of channels of the effective feature layer F, of shape H×W×C, to C_r by a 1×1 convolution;
setting the upsampling rate to σ and, based on the compressed effective feature layer F, performing a convolution with a kernel of size k_encoder×k_encoder to predict, for each point l_t of the output feature layer F′ of shape σH×σW, an upsampling kernel W_{l_t} associated with the location information, the obtained kernel W_{l_t} being reshaped, via the weighted-sum operator θ, into the form σH×σW×k_area×k_area, wherein F′ is the output feature layer, l_t is a point in F′, θ is the weighted-sum operator, k_area is a neighborhood of a point in the effective feature layer F, and k_encoder is a neighborhood of a region smaller than k_area;
mapping the point l_t of the output feature layer F′ back to the corresponding point l of the effective feature layer F, taking out the k_area×k_area region centered on l, and performing a dot product of the region with the upsampling kernel W_{l_t} predicted for that point to obtain an output value.
3. The YOLO-SD-Tiny-based target detection method of claim 2, further comprising, after obtaining the output value:
normalizing the σH×σW×k_area×k_area kernel by softmax so that the sum of the kernel weights is 1.
4. The YOLO-SD-Tiny-based target detection method of claim 1, wherein when predicting the upsampled output value using the YOLO Head, the CIOU loss is used as the bounding box regression loss.
5. The YOLO-SD-Tiny-based target detection method of claim 1, wherein when predicting the upsampled output value using the YOLO Head, the GHM loss is used as the classification loss.
6. A target detection device based on YOLO-SD-Tiny, characterized by comprising:
the system comprises a trunk feature extraction network and a detection module, wherein the trunk feature extraction network is formed by replacing an activation function adopted in the last CSP-Body and CBL in the YOLOV4-Tiny trunk feature extraction network with a Mish activation function and is used for extracting information of a picture to be detected to obtain an effective feature layer;
a feature pyramid network FPN, configured to perform Self-deconvoltation upsampling on the effective feature layer and output the upsampled;
YOLO Head, which is used to predict the up-sampled output value.
7. The YOLO-SD-Tiny-based target detection apparatus of claim 6, wherein the feature pyramid network FPN comprises a Self-DeConvolution calculation unit, the Self-DeConvolution calculation unit comprising:
an upsampling kernel prediction module to:
compressing the number of channels of the significant feature layer F with the shape of H × W × C to C by 1 × 1 convolutionr
Setting the up-sampling rate to sigma, based on the compressedThe effective feature layer F is convolved
Figure FDA0003511198410000021
For outputting feature layers
Figure FDA0003511198410000031
A point l intPredicting an upsampling kernel associated with location information
Figure FDA0003511198410000032
Wherein
Figure FDA0003511198410000033
The obtained core
Figure FDA0003511198410000034
Obtained after a weighted sum operator reshape
Figure FDA0003511198410000035
Wherein
Figure FDA0003511198410000036
Wherein
Figure FDA0003511198410000037
For outputting feature layers
Figure FDA0003511198410000038
Midpoint ltTheta is the weighted sum operator,
Figure FDA0003511198410000039
kareaa neighborhood of a point in the significance signature F, kencoderIs a ratio of kareaA neighborhood of one smaller region;
a feature traversal module to: outputting feature layers
Figure FDA00035111984100000310
Point l intMapping back to the point l corresponding to the effective characteristic layer F, and taking out the k taking l as the centerarea×kareaRegion, and up-sampling kernel of the predicted point
Figure FDA00035111984100000311
And performing dot product to obtain an output value.
8. The YOLO-SD-Tiny-based target detection apparatus of claim 7, wherein after obtaining the output value the upsampling kernel prediction module is further configured to:
normalize the σH×σW×k_area×k_area kernel by softmax so that the sum of the kernel weights is 1.
9. The YOLO-SD-Tiny-based target detection apparatus of claim 6, wherein when predicting the upsampled output value using the YOLO Head, the CIOU loss is used as the bounding box regression loss.
10. The YOLO-SD-Tiny-based target detection apparatus of claim 6, wherein when predicting the upsampled output value using the YOLO Head, the GHM loss is used as the classification loss.
CN202210152654.0A 2022-02-18 2022-02-18 Target detection method and device based on YOLO-SD-Tiny Pending CN114565959A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210152654.0A CN114565959A (en) 2022-02-18 2022-02-18 Target detection method and device based on YOLO-SD-Tiny


Publications (1)

Publication Number Publication Date
CN114565959A true CN114565959A (en) 2022-05-31

Family

ID=81713124

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210152654.0A Pending CN114565959A (en) 2022-02-18 2022-02-18 Target detection method and device based on YOLO-SD-Tiny

Country Status (1)

Country Link
CN (1) CN114565959A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115731533A (en) * 2022-11-29 2023-03-03 淮阴工学院 Vehicle-mounted target detection method based on improved YOLOv5
CN115731533B (en) * 2022-11-29 2024-04-05 淮阴工学院 Vehicle-mounted target detection method based on improved YOLOv5
CN115546473A (en) * 2022-12-01 2022-12-30 珠海亿智电子科技有限公司 Target detection method, apparatus, device and medium
CN117218606A (en) * 2023-11-09 2023-12-12 四川泓宝润业工程技术有限公司 Escape door detection method and device, storage medium and electronic equipment
CN117218606B (en) * 2023-11-09 2024-02-02 四川泓宝润业工程技术有限公司 Escape door detection method and device, storage medium and electronic equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination