CN115272894A - Unmanned aerial vehicle-oriented image target detection method and device, electronic equipment and storage medium - Google Patents

Unmanned aerial vehicle-oriented image target detection method and device, electronic equipment and storage medium

Info

Publication number
CN115272894A
CN115272894A (application CN202210917031.8A)
Authority
CN
China
Prior art keywords
model
target detection
aerial vehicle
unmanned aerial
pruning
Prior art date
Legal status
Pending
Application number
CN202210917031.8A
Other languages
Chinese (zh)
Inventor
王素玉
张磊
张宏宇
周伯翔
Current Assignee
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202210917031.8A priority Critical patent/CN115272894A/en
Publication of CN115272894A publication Critical patent/CN115272894A/en
Pending legal-status Critical Current

Classifications

    • G06V 20/17 — Scenes; scene-specific elements; terrestrial scenes taken from planes or by drones (G: Physics › G06: Computing; calculating or counting › G06V: Image or video recognition or understanding)
    • G06N 3/08 — Learning methods (G06N: Computing arrangements based on specific computational models › G06N 3/02: Neural networks)
    • G06V 10/764 — Classification, e.g. of video objects (G06V 10/70: Recognition or understanding using pattern recognition or machine learning)
    • G06V 10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting (G06V 10/77: Processing image or video features in feature spaces, e.g. PCA, ICA or SOM)

Abstract

The invention discloses an unmanned aerial vehicle-oriented image target detection method and device, electronic equipment and a storage medium. The method comprises: acquiring an unmanned aerial vehicle aerial image data set and processing the image data set; constructing an unmanned aerial vehicle image target detection model; and training the target detection model with the image data set to obtain a final model. The unmanned aerial vehicle image target detection model is constructed by: replacing the backbone network of the Yolov5s model with a MobileNetV3_Small network; cutting the original MobileNetV3_Small network by removing the last 4 layers originally designed for the classification task, wherein the last 4 layers comprise 3 convolutional layers and 1 pooling layer; and taking the spatial pyramid pooling structure SPPF as the last layer of the MobileNetV3_Small network. By constructing and training this lightweight MobileNetV3_Small network, the model guarantees both speed and precision.

Description

Unmanned aerial vehicle-oriented image target detection method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of images, in particular to an unmanned aerial vehicle-oriented image target detection method and device, electronic equipment and a storage medium.
Background
In deep neural networks, model performance is typically improved by stacking many convolutional layers, but increasing the number of network layers introduces a large number of redundant parameters. In practical applications, these redundant parameters increase prediction time and memory requirements, so model lightweighting becomes especially important on performance-limited devices such as unmanned aerial vehicles. At present, there are two main approaches to model lightweighting: compressing an existing complex model, or designing a lightweight network structure. MobileNetV3 is a lightweight network structure that effectively improves inference speed and performance on mobile and embedded terminals; pruning-based lightweighting improves the deployability of a model under limited hardware resources. However, a single lightweighting method can hardly balance precision and speed. The invention therefore designs multiple lightweighting strategies oriented to unmanned aerial vehicle aerial image target detection, reducing model parameters and computation while preserving target detection precision as much as possible.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides an unmanned aerial vehicle-oriented image target detection method, an unmanned aerial vehicle-oriented image target detection device, electronic equipment and a storage medium.
The invention discloses an unmanned aerial vehicle-oriented image target detection method, which comprises the following steps:
acquiring an unmanned aerial vehicle aerial image data set, and processing the image data set;
constructing an unmanned aerial vehicle image target detection model;
training the target detection model by the image data set to obtain a final model;
wherein, the construction of the unmanned aerial vehicle image target detection model comprises the following steps:
replacing the backbone network of the Yolov5s model with a MobileNetV3_Small network;
cutting the original MobileNetV3_Small network by removing the last 4 layers originally designed for the classification task, wherein the last 4 layers comprise 3 convolutional layers and 1 pooling layer;
and taking the spatial pyramid pooling structure SPPF as the last layer of the MobileNetV3_Small network.
Preferably, training the target detection model with the image data set comprises:
performing sparse training on the target detection model by using L1 regularization;
after sparse training, calculating a pruning evaluation index from the BN layer scale parameter and the sum of absolute values of the filter weights;
based on a set pruning threshold, pruning channels corresponding to the pruning evaluation indexes lower than the pruning threshold;
and taking the cut model as a student model, taking the original Yolov5s model as a teacher model, and carrying out knowledge distillation training under the supervision of the teacher model to obtain the final model.
Preferably, performing sparse training on the target detection model using L1 regularization comprises:
determining the pruning channels from the BN layers, whose normalization formula is:
Ẑ = (Z_in − μ_c) / √(σ_c² + ε)
performing sparse training on the scale parameters of the BN layers so that their values continuously approach 0, the scale-and-shift formula being:
Z_out = γ·Ẑ + β
the loss function of the target detection model being:
L = Σ_{(x,y)} l(f(x, W), y) + λ·Σ_{γ∈Γ} g(γ)
in the formulas: Z_in and Z_out are the input features and output features, respectively; γ is the scale parameter; β is the bias; μ_c and σ_c are the mean and variance of the current batch; ε is a small constant preventing division by zero; l(f(x, W), y) is the original loss function; λ·Σ_{γ∈Γ} g(γ) is the penalty term on the convolutional layer weight parameters and the BN layer scale parameters; λ is the regularization coefficient; x and W are the input image features and the convolutional layer weight parameters, respectively; and g(γ) is the L1 norm, i.e. g(γ) = |γ|.
Preferably, after sparse training, calculating the pruning evaluation index from the BN layer scale parameter and the sum of absolute values of the filter weights comprises:
the sum of absolute values of a filter is:
K_m = Σ_{i=1..J} |W_i|
in the formula: K_m is the sum of the absolute values of the weight parameters of filter m; |W_i| is the absolute value of the weight of the i-th convolution kernel in filter m; J is the total number of convolution kernels in the current filter m;
the pruning evaluation index is calculated as:
s_i = γ × K_i
in the formula: s_i is the pruning score of filter i, γ is the scale parameter of the BN layer connected after filter i, and K_i is the sum of the absolute values of filter i.
Preferably, based on the set pruning threshold, pruning the channels whose pruning evaluation index is lower than the pruning threshold comprises:
sorting the pruning evaluation indexes in ascending order;
setting the pruning proportion to 50%, and taking the score at the 50% position of the ascending order as the pruning threshold;
and if all pruning evaluation indexes of a layer are smaller than the pruning threshold, retaining the two channels with the largest pruning evaluation indexes.
Preferably, taking the pruned model as the student model and the original Yolov5s model as the teacher model, and performing knowledge distillation training under the supervision of the teacher model to obtain the final model comprises the following steps:
the loss function of the student model comprises a regression loss function, a classification loss function and a confidence loss function;
the confidence loss function is:
f_obj = f_obj(o^gt, ô) + λ_D·f_obj(ô^T, ô)
the classification loss function is:
f_cl = f_cl(p^gt, p̂) + λ_D·ô^T·f_cl(p^T, p̂)
the regression loss function is:
f_bb = f_bb(b^gt, b̂) + λ_D·ô^T·f_bb(b^T, b̂)
the loss function of the student model is:
L_student = f_obj + f_cl + f_bb
in the formulas: o^gt, p^gt and b^gt are the confidence label, category label and prediction box position label of the student model, respectively; ô, p̂ and b̂ are the confidence score, category score and prediction box coordinates output by the student model, respectively; λ_D is a balance parameter; ô^T is the confidence predicted by the teacher model; and p^T and b^T are the category score and box coordinates predicted by the teacher model.
Preferably, processing the image data set comprises performing a normalization operation on each image, mapping all pixel values to a range of 0 to 1.
The invention also provides an image target detection device for the unmanned aerial vehicle, which comprises:
the acquisition module is used for acquiring an unmanned aerial vehicle aerial image data set and processing the image data set;
the construction module is used for constructing an unmanned aerial vehicle image target detection model;
the training module is used for training the target detection model with the image data set to obtain a final model;
wherein, the constructing of the unmanned aerial vehicle image target detection model comprises the following steps:
replacing the backbone network of the Yolov5s model with a MobileNetV3_Small network;
cutting the original MobileNetV3_Small network by removing the last 4 layers originally designed for the classification task, wherein the last 4 layers comprise 3 convolutional layers and 1 pooling layer;
and taking the spatial pyramid pooling structure SPPF as the last layer of the MobileNetV3_Small network.
The invention also provides an electronic device comprising at least one processing unit and at least one memory unit, wherein the memory unit stores a computer program which, when executed by the processing unit, causes the processing unit to perform the above-mentioned method.
The invention also provides a storage medium storing a computer program executable by an electronic device, which when run on the electronic device causes the electronic device to perform the above-mentioned method.
Compared with the prior art, the invention has the beneficial effects that:
according to the invention, by constructing and training the lightweight MobileNetV3_Small network, the model guarantees both speed and precision.
Drawings
FIG. 1 is a schematic view of a flow structure of an unmanned aerial vehicle-oriented image target detection method according to the present invention;
FIG. 2 is a schematic structural diagram of the SPPF in the unmanned aerial vehicle-oriented image target detection method of the present invention;
fig. 3 is a network structure diagram of a target detection model in the unmanned aerial vehicle-oriented image target detection method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The invention is described in further detail below with reference to the attached drawing figures:
referring to fig. 1, the invention discloses an unmanned aerial vehicle-oriented image target detection method, which comprises the following steps:
acquiring an unmanned aerial vehicle aerial image data set, and processing the image data set;
in this embodiment, the VisDrone unmanned aerial vehicle aerial image data set is used. So that input images can be trained on and predicted normally, each image is preprocessed: it is normalized so that all pixel values are mapped into the range 0 to 1, and the input size is limited to 640 × 640 for convenient input into the model.
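A minimal sketch of this preprocessing step (OpenCV-based; plain resizing is assumed here for simplicity, whereas a deployed Yolov5-style pipeline would typically use letterbox padding):

    import cv2
    import numpy as np

    def preprocess(image_path: str, img_size: int = 640) -> np.ndarray:
        """Resize an aerial image to the model input size and map pixels to [0, 1]."""
        img = cv2.imread(image_path)                 # HWC, BGR, uint8
        img = cv2.resize(img, (img_size, img_size))  # limit input size to 640 x 640
        img = img[:, :, ::-1].transpose(2, 0, 1)     # BGR -> RGB, HWC -> CHW
        img = np.ascontiguousarray(img, dtype=np.float32) / 255.0  # normalize to [0, 1]
        return img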
Constructing an unmanned aerial vehicle image target detection model;
specifically, referring to fig. 2, constructing the unmanned aerial vehicle image target detection model includes:
replacing the backbone network of the Yolov5s model with a MobileNetV3_Small network;
cutting the original MobileNetV3_Small network by removing the last 4 layers originally designed for the classification task, wherein the last 4 layers comprise 3 convolutional layers and 1 pooling layer;
and taking the spatial pyramid pooling structure SPPF as the last layer of the MobileNetV3_Small network.
Referring to fig. 3, the MobileNetV3 inverted residual module, which is the main component of the MobileNetV3_Small backbone network, comprises a 1 × 1 convolution, a BN layer, a ReLU activation function, a 3 × 3 depthwise separable convolution, and an SE attention mechanism. Specifically, the 13th layer (a 1 × 1 convolutional layer), the 14th layer (a 7 × 7 global average pooling layer), and the 15th and 16th layers (1 × 1 convolutions) are deleted, and a modified spatial pyramid pooling structure (SPPF) is used instead as the last layer of the backbone network.
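An illustrative PyTorch sketch of this backbone modification (assumptions: torchvision's MobileNetV3-Small layout, where truncating features[:-1] drops the classification-oriented head; YOLOv5's published SPPF design; and the 96-channel width tied to that layout — not the patent's exact implementation):

    import torch
    import torch.nn as nn
    from torchvision.models import mobilenet_v3_small

    class SPPF(nn.Module):
        """Fast spatial pyramid pooling (YOLOv5 style): three chained 5x5 max-pools
        whose outputs are concatenated with the input and fused by a 1x1 conv."""
        def __init__(self, c1, c2, k=5):
            super().__init__()
            c_ = c1 // 2
            self.cv1 = nn.Sequential(nn.Conv2d(c1, c_, 1, bias=False),
                                     nn.BatchNorm2d(c_), nn.SiLU())
            self.cv2 = nn.Sequential(nn.Conv2d(c_ * 4, c2, 1, bias=False),
                                     nn.BatchNorm2d(c2), nn.SiLU())
            self.m = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

        def forward(self, x):
            x = self.cv1(x)
            y1 = self.m(x)
            y2 = self.m(y1)
            return self.cv2(torch.cat([x, y1, y2, self.m(y2)], 1))

    # Truncate MobileNetV3-Small: torchvision keeps the final 1x1 convs and the
    # global pooling (the classification-oriented layers) at/after the end of
    # `features`, so taking features[:-1] approximates removing those 4 layers;
    # SPPF is then appended as the last layer of the backbone.
    mnet = mobilenet_v3_small(pretrained=True)           # torchvision >= 0.9
    backbone = nn.Sequential(*list(mnet.features)[:-1],  # ends at 96 channels
                             SPPF(96, 96))               # output width: assumption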
Training the target detection model by using the image data set to obtain a final model;
in this embodiment, training the target detection model with the image data set includes:
performing sparse training on a target detection model by using L1 regularization;
specifically, the pruning channels are determined from the BN layers, whose normalization formula is:
Ẑ = (Z_in − μ_c) / √(σ_c² + ε)
sparse training is performed on the scale parameters of the BN layers so that their values continuously approach 0, the scale-and-shift formula being:
Z_out = γ·Ẑ + β
in the formulas: the scale parameter γ and the bias β both participate in the back propagation of the detection network and are trainable parameters; performing sparse training on the BN scale parameter γ means adding an L1 regularization constraint so that its value continuously approaches 0; Z_in and Z_out are the input features and output features, respectively; μ_c and σ_c are the mean and variance of the current batch; and the parameter ε prevents the denominator from being 0.
The loss function of the target detection model is:
L = Σ_{(x,y)} l(f(x, W), y) + λ·Σ_{γ∈Γ} g(γ)
The first term of the formula is the original loss function; the second term is the penalty on the convolutional layer weight parameters and the BN layer scale parameters; λ is the regularization coefficient, i.e. the sparsity rate, and the larger it is, the stronger the constraint; x and W denote the input image features and the convolutional layer weight parameters, respectively; and g(γ) denotes the L1 norm, i.e. g(γ) = |γ|. During sparse training, the convolution kernel weight parameters and the BN layer scale parameters are constrained simultaneously: the smaller the convolution kernel weights and the BN layer scale parameter, the less important the corresponding information. After sparse training is completed, when a scale parameter γ approaches 0, the features of that channel are multiplied by a sufficiently small scale parameter regardless of the size of the previous layer's convolution output, so the output of the channel also becomes sufficiently small, cutting off the correspondence between the input and output feature channels. The method improves on pruning by scale parameters alone: the BN layer parameter is taken as the criterion for judging the importance of a pruning channel, while the filter's convolution parameters also participate in the judgment. Concretely, a penalty on the weight parameters is added to the loss function, so that many weights close to 0 appear in the filters after sparse training.
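A minimal PyTorch sketch of this sparse-training step (an assumption in network-slimming style: the L1 subgradient is added to the existing gradients after backpropagation; the hyperparameter value is illustrative):

    import torch
    import torch.nn as nn

    def add_l1_sparsity_grad(model: nn.Module, lamb: float = 1e-4) -> None:
        """Add the subgradient of lamb * (sum|W_conv| + sum|gamma_BN|) to the
        existing gradients, pushing BN scales and conv weights toward 0."""
        for m in model.modules():
            if isinstance(m, (nn.BatchNorm2d, nn.Conv2d)) and m.weight.grad is not None:
                m.weight.grad.add_(lamb * torch.sign(m.weight.detach()))

    # usage inside the training loop (lamb plays the role of the sparsity rate λ):
    #   loss = criterion(model(imgs), targets)
    #   loss.backward()
    #   add_l1_sparsity_grad(model, lamb=1e-4)
    #   optimizer.step()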
After sparse training, the pruning evaluation index is calculated from the BN layer scale parameter and the sum of absolute values of the filter weights;
specifically, the sum of absolute values within a filter represents the importance of that filter:
K_m = Σ_{i=1..J} |W_i|
in the formula: K_m is the sum of the absolute values of the weight parameters of filter m; |W_i| is the absolute value of the weight of the i-th convolution kernel in filter m; J is the total number of convolution kernels in the current filter m;
combining the sparsely trained scale parameter γ with the sum of absolute values of the filter fuses the judgment information of the filter and the BN layer into the final pruning evaluation index s:
s_i = γ × K_i
in the formula: s_i is the pruning score of filter i, γ is the scale parameter of the BN layer connected after filter i, and K_i is the sum of the absolute values of filter i.
Based on the set pruning threshold, channels whose pruning evaluation index is lower than the threshold are pruned;
specifically, after the above operations are completed, a set S of filter pruning scores is obtained. The scores in the set are sorted in ascending order, the pruning proportion is set to 50%, and the score at the 50% position of the ascending order is taken as the pruning threshold:
θ = sort_R(S)
in the formula: θ is the final threshold for the pruning operation, and sort_R(S) takes the score at ratio R of the ascending-sorted filter score set S.
If all pruning evaluation indexes of a layer are smaller than the pruning threshold, the two largest are retained. Filters below the threshold and their corresponding BN layer scale parameters γ are pruned away, and the feature maps corresponding to the remaining filters and scale parameters are reduced accordingly. Cutting the convolution kernels corresponding to these channels reduces both the parameters and the computation of the network through channel pruning, yielding a lightweight network model with less computation and a smaller memory footprint. The unmanned aerial vehicle image target detection method thus continues pruning on top of the MobileNet lightweight backbone replacement to obtain an even lighter target detection model; the specific experimental data after pruning are given in the experimental section.
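A minimal PyTorch sketch of this scoring-and-thresholding step (assumptions: per-layer application, the use of |γ|, and the exact form of the top-2 safeguard are illustrative, not the patent's exact implementation):

    import torch
    import torch.nn as nn

    def pruning_scores(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> torch.Tensor:
        """s_i = gamma_i * K_i: BN scale times the sum of absolute filter weights."""
        K = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # one value per output filter
        gamma = bn.weight.detach().abs()                   # |gamma| per channel (assumption)
        return gamma * K

    def channels_to_keep(scores: torch.Tensor, ratio: float = 0.5) -> torch.Tensor:
        """theta = score at the `ratio` position of the ascending sort; channels
        below theta are pruned, but at least the two largest are always kept."""
        sorted_scores, _ = torch.sort(scores)              # ascending order
        theta = sorted_scores[int(len(scores) * ratio)]    # pruning threshold
        keep = scores >= theta
        if keep.sum() < 2:                                 # all below threshold:
            keep = torch.zeros_like(keep)                  # retain the two largest
            keep[torch.topk(scores, k=2).indices] = True
        return keep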
The pruned model is taken as the student model and the original Yolov5s model as the teacher model, and knowledge distillation training is performed under the supervision of the teacher model to obtain the final model.
Specifically, this comprises the following steps:
the loss function of the student model comprises a regression loss function, a classification loss function and a confidence loss function;
the confidence loss function is:
f_obj = f_obj(o^gt, ô) + λ_D·f_obj(ô^T, ô)
in the formula: o^gt, ô and ô^T denote the confidence label, the student model's confidence prediction and the teacher model's confidence prediction, respectively, and λ_D balances the two loss parts. The first term is the student model's original confidence loss function; the second term is the knowledge distillation loss, in which the confidence label o^gt is replaced by the teacher model's prediction output ô^T, thereby transferring knowledge to the student model;
the classification loss function is:
f_cl = f_cl(p^gt, p̂) + λ_D·ô^T·f_cl(p^T, p̂)
that is, the classification loss of the student model is likewise composed of two parts; the difference is that the second, teacher-supervised part is weighted by the confidence measure parameter ô^T, which expresses the probability that each anchor box contains a detected target. Its significance is that if an anchor box is background, its confidence value is small and the whole second (teacher) term becomes negligible, which prevents the student model from learning unimportant background information and thus accelerates convergence.
Similarly, the regression loss function is:
f_bb = f_bb(b^gt, b̂) + λ_D·ô^T·f_bb(b^T, b̂)
Finally, the single-stage detection algorithm computes the final loss on the output feature map of the last convolutional layer. The student model loss comprises the target detection loss function and the distillation loss function, so the loss function of the student model is finally:
L_student = f_obj + f_cl + f_bb
in the formulas: o^gt, p^gt and b^gt are the confidence label, category label and prediction box position label of the student model, respectively; ô, p̂ and b̂ are the confidence score, category score and prediction box coordinates output by the student model, respectively; λ_D is a balance parameter; ô^T is the confidence predicted by the teacher model; and p^T and b^T are the category score and box coordinates predicted by the teacher model.
Further, the student model is a single-stage target detection network, which differs from two-stage detection models based on candidate regions (the RCNN series): Yolo outputs detection box coordinates, category scores and confidence scores at the last layer of the network. Suppose the final prediction output feature matrix has the form M × (C + 5), where M is the number of anchor boxes generated by each cell of the prediction feature map, C is the number of preset classification categories, and the number 5 covers the 4 position coordinates of a detection box plus 1 confidence that the current anchor box contains a target. In candidate-region-based two-stage detection models, the knowledge distillation loss is applied by passing the features output by the teacher model's last convolutional layer directly to the student model. A single-stage detection algorithm, however, has no candidate regions: it predicts directly on the final output feature map, generating three anchors at each position. For example, if the final prediction branch outputs a 40 × 40 feature map, 4800 anchor boxes are generated, a large number of which contain no target and only background regions. If all of these were passed to the student model, the network would keep optimizing the coordinates and classification of background regions, making convergence of the student model difficult. A two-stage detection model has few candidate-region detection boxes, most of which contain targets to be detected; therefore a knowledge distillation method based on a confidence measure is designed for the single-stage detection algorithm. Since the single-stage final prediction includes confidence predictions, the knowledge distillation loss is constrained with the confidence score of the prediction output: the teacher model contributes to the student model's final loss function only where its predicted target confidence is high.
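The following is a minimal PyTorch sketch of such a confidence-gated distillation loss, not the patent's exact implementation: the dict-based interface, the choice of binary cross-entropy and smooth-L1 as the underlying terms f_obj/f_cl/f_bb, and sigmoid-probability outputs are all assumptions.

    import torch
    import torch.nn.functional as F

    def distillation_loss(stu, tea, gt, lambda_d=1.0):
        """Confidence-gated distillation for a single-stage detector (sketch).
        stu/tea/gt: dicts with per-anchor 'obj' (N,), 'cls' (N,C), 'box' (N,4);
        'obj' and 'cls' are assumed to be sigmoid probabilities."""
        t_obj = tea['obj'].detach()  # teacher confidence, gates the cls/box terms

        # confidence: ground-truth term + plain distillation term
        f_obj = F.binary_cross_entropy(stu['obj'], gt['obj']) \
              + lambda_d * F.binary_cross_entropy(stu['obj'], t_obj)

        # classification: teacher term weighted per-anchor by t_obj, so
        # background anchors (low teacher confidence) contribute almost nothing
        cls_kd = F.binary_cross_entropy(stu['cls'], tea['cls'].detach(),
                                        reduction='none')
        f_cl = F.binary_cross_entropy(stu['cls'], gt['cls']) \
             + lambda_d * (t_obj.unsqueeze(1) * cls_kd).mean()

        # box regression: same confidence gating
        box_kd = F.smooth_l1_loss(stu['box'], tea['box'].detach(),
                                  reduction='none')
        f_bb = F.smooth_l1_loss(stu['box'], gt['box']) \
             + lambda_d * (t_obj.unsqueeze(1) * box_kd).mean()

        return f_obj + f_cl + f_bb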
To complete the knowledge distillation process in the single-stage detection network, besides adding the distillation loss to the loss function, one further very important operation is needed: non-maximum suppression (NMS) on the feature map. The feature map finally generated during training of a single-stage detection algorithm comprises many cells, each generating three anchor boxes by default, and multiple anchor boxes can predict the same object during inference; after end-to-end prediction, highly overlapping detection boxes in the output of the last convolutional layer are filtered out by NMS. During knowledge distillation training, however, the teacher model's prediction process also produces excessively overlapping detection boxes, and passing this redundant information to the student model causes the student model to overfit. Therefore a feature-map-based NMS is used during teacher model inference: for example, if the last layer outputs n × n cells and adjacent cells predict the same category, they have a high probability of detecting the same target, so when adjacent cells output the same category, their corresponding distillation loss is set to 0; the neighbourhood size was set to 3 × 3 in the experiments. Finally, adding the distillation loss function to the student model and applying the feature-map-based NMS to the teacher model's prediction output completes the training process in which the teacher model guides the student model.
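A rough sketch of such a feature-map NMS mask under stated assumptions (the patent suppresses adjacent cells that predict the same class; this sketch approximates that by keeping only local confidence maxima in a 3 × 3 window; the function name and tensor shapes are hypothetical):

    import torch.nn.functional as F

    def distill_nms_mask(t_conf, window=3):
        """Keep a cell's distillation loss only where its teacher confidence is
        the maximum within its 3x3 neighbourhood, zeroing overlapping duplicates.
        t_conf: (B, 1, H, W) teacher confidence on the final feature map."""
        local_max = F.max_pool2d(t_conf, window, stride=1, padding=window // 2)
        return (t_conf == local_max).float()  # 1 = keep loss, 0 = suppress duplicate

    # usage: multiply the per-cell distillation loss map by the mask before reduction
    #   kd_loss = (distill_nms_mask(teacher_conf) * kd_loss_map).mean()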
Training: the optimal model parameters are stored, and the stored model parameters are loaded at prediction time.
The experimental environment was based on the Ubuntu 16.04 operating system, using Python 3.8 and PyTorch 1.7. The whole experiment was improved and tested on the basis of the yolov5 detection framework, and experimental comparisons and analyses were carried out for each method.
Table 1: influence of the lightweight backbone network MobileNetV3 on algorithm performance (table content not reproduced in the text)
Table 2: channel pruning lightweight experimental results based on convolutional layer weights and BN layer sparse training (table content not reproduced in the text)
In the experiments, the complex model YoloV5x is used to guide knowledge distillation training of the pruned lightweight model; the experimental data are shown in Table 3. A distillation strategy based on output response maps is adopted, the knowledge distillation loss function based on the confidence scale parameter is applied for training, and the knowledge learned by YoloV5x is distilled into Mobile-yolos and the pruned models. Compared with the data in Table 2, the parameter and computation amounts of Distillation-Mobile-yolo5s after distillation do not increase, while mAP50 increases by 1.7; the knowledge distillation gain is most obvious for the Mobile-yolos-30% pruned model, whose mAP improves by 3.9 over the model before distillation, with FPS reaching 16; distilling the Mobile-yolos-50% model yields Distillation-Mobile-yolos-50%, whose inference speed reaches 21 FPS. The experimental data show that guiding the training of the student model with the teacher model greatly improves the student model's generalization capability.
Table 3: knowledge distillation experimental data based on the confidence measure parameter (table content not reproduced in the text)
Analysis of the experimental results shows that using the lightweight backbone network MobileNetV3 together with pruning based on convolutional layer weight parameters and BN layer sparse training greatly reduces model parameters and computation, accompanied by a loss of detection precision. Knowledge distillation is then used, with the large model guiding the training of the small model, so that the precision of the small model approaches the detection precision of the large model.
The invention also provides an unmanned aerial vehicle-oriented image target detection device, which comprises:
the acquisition module is used for acquiring an unmanned aerial vehicle aerial image data set and processing the image data set;
the construction module is used for constructing an unmanned aerial vehicle image target detection model;
the training module is used for training the target detection model with the image data set to obtain a final model;
wherein, constructing the unmanned aerial vehicle image target detection model comprises:
replacing the backbone network of the Yolov5s model with a MobileNetV3_Small network;
cutting the original MobileNetV3_Small network by removing the last 4 layers originally designed for the classification task, wherein the last 4 layers comprise 3 convolutional layers and 1 pooling layer;
and taking the spatial pyramid pooling structure SPPF as the last layer of the MobileNetV3_Small network.
The invention also provides an electronic device comprising at least one processing unit and at least one memory unit, wherein the memory unit stores a computer program which, when executed by the processing unit, causes the processing unit to perform the above method.
The present invention also provides a storage medium storing a computer program executable by an electronic device, which when run on the electronic device causes the electronic device to perform the above-mentioned method.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. An unmanned aerial vehicle-oriented image target detection method is characterized by comprising the following steps:
acquiring an unmanned aerial vehicle aerial image data set, and processing the image data set;
constructing an unmanned aerial vehicle image target detection model;
training the target detection model by using the image data set to obtain a final model;
wherein, the constructing of the unmanned aerial vehicle image target detection model comprises the following steps:
replacing the backbone network of the Yolov5s model with a MobileNetV3_Small network;
cutting the original MobileNetV3_Small network by removing the last 4 layers originally designed for the classification task, wherein the last 4 layers comprise 3 convolutional layers and 1 pooling layer;
and taking the spatial pyramid pooling structure SPPF as the last layer of the MobileNetV3_Small network.
2. The unmanned aerial vehicle-oriented image target detection method of claim 1, wherein training the target detection model with the image data set comprises:
performing sparse training on the target detection model by using L1 regularization;
after sparse training, calculating a pruning evaluation index from the BN layer scale parameter and the sum of absolute values of the filter weights;
based on a set pruning threshold, pruning channels corresponding to the pruning evaluation indexes lower than the pruning threshold;
and taking the cut model as a student model, taking the original Yolov5s model as a teacher model, and carrying out knowledge distillation training under the supervision of the teacher model to obtain the final model.
3. The unmanned aerial vehicle-oriented image target detection method of claim 2, wherein performing sparse training on the target detection model using L1 regularization comprises:
determining the pruning channels from the BN layers, whose normalization formula is:
Ẑ = (Z_in − μ_c) / √(σ_c² + ε)
performing sparse training on the scale parameters of the BN layers so that their values continuously approach 0, the scale-and-shift formula being:
Z_out = γ·Ẑ + β
the loss function of the target detection model being:
L = Σ_{(x,y)} l(f(x, W), y) + λ·Σ_{γ∈Γ} g(γ)
in the formulas: Z_in and Z_out are the input features and output features, respectively; γ is the scale parameter; β is the bias; μ_c and σ_c are the mean and variance of the current batch; ε is a small constant preventing division by zero; l(f(x, W), y) is the original loss function; λ·Σ_{γ∈Γ} g(γ) is the penalty term on the convolutional layer weight parameters and the BN layer scale parameters; λ is the regularization coefficient; x and W are the input image features and the convolutional layer weight parameters, respectively; and g(γ) is the L1 norm, i.e. g(γ) = |γ|.
4. The unmanned aerial vehicle-oriented image target detection method of claim 3, wherein, after sparse training, calculating the pruning evaluation index from the BN layer scale parameter and the sum of absolute values of the filter weights comprises:
the sum of absolute values of a filter is:
K_m = Σ_{i=1..J} |W_i|
in the formula: K_m is the sum of the absolute values of the weight parameters of filter m; |W_i| is the absolute value of the weight of the i-th convolution kernel in filter m; J is the total number of convolution kernels in the current filter m;
the pruning evaluation index is calculated as:
s_i = γ × K_i
in the formula: s_i is the pruning score of filter i, γ is the scale parameter of the BN layer connected after filter i, and K_i is the sum of the absolute values of filter i.
5. The unmanned aerial vehicle-oriented image target detection method of claim 4, wherein based on a set pruning threshold, pruning channels for which the pruning evaluation index is lower than the pruning threshold comprises:
sorting the pruning evaluation indexes in ascending order;
setting the pruning proportion to 50%, and taking the score at the 50% position of the ascending order as the pruning threshold;
and if all pruning evaluation indexes of a layer are smaller than the pruning threshold, retaining the two channels with the largest pruning evaluation indexes.
6. The unmanned aerial vehicle-oriented image target detection method as claimed in claim 5, wherein taking the pruned model as a student model, taking the original Yolov5s model as a teacher model, and performing knowledge distillation training under the supervision of the teacher model to obtain the final model comprises:
the loss function of the student model comprises a regression loss function, a classification loss function and a confidence loss function;
the confidence loss function is:
f_obj = f_obj(o^gt, ô) + λ_D·f_obj(ô^T, ô)
the classification loss function is:
f_cl = f_cl(p^gt, p̂) + λ_D·ô^T·f_cl(p^T, p̂)
the regression loss function is:
f_bb = f_bb(b^gt, b̂) + λ_D·ô^T·f_bb(b^T, b̂)
and the loss function of the student model is:
L_student = f_obj + f_cl + f_bb
in the formulas: o^gt, p^gt and b^gt are the confidence label, category label and prediction box position label of the student model, respectively; ô, p̂ and b̂ are the confidence score, category score and prediction box coordinates output by the student model, respectively; λ_D is a balance parameter; ô^T is the confidence predicted by the teacher model; and p^T and b^T are the category score and box coordinates predicted by the teacher model.
7. An unmanned aerial vehicle-oriented image target detection method as claimed in claim 1, wherein processing the image data set comprises performing a normalization operation on each image to map all pixel values to a range of 0 to 1.
8. An unmanned aerial vehicle-oriented image target detection device, characterized by comprising:
the acquisition module is used for acquiring an unmanned aerial vehicle aerial image data set and processing the image data set;
the construction module is used for constructing an unmanned aerial vehicle image target detection model;
the training module is used for training the target detection model with the image data set to obtain a final model;
wherein, the constructing of the unmanned aerial vehicle image target detection model comprises the following steps:
replacing the backbone network of the Yolov5s model with a MobileNetV3_Small network;
cutting the original MobileNetV3_Small network by removing the last 4 layers originally designed for the classification task, wherein the last 4 layers comprise 3 convolutional layers and 1 pooling layer;
and taking the spatial pyramid pooling structure SPPF as the last layer of the MobileNetV3_Small network.
9. An electronic device, comprising at least one processing unit and at least one memory unit, wherein the memory unit stores a computer program that, when executed by the processing unit, causes the processing unit to perform the method of any one of claims 1 to 7.
10. A storage medium storing a computer program executable by an electronic device, the program causing the electronic device to perform the method of any one of claims 1 to 7 when the program is run on the electronic device.
CN202210917031.8A 2022-08-01 2022-08-01 Unmanned aerial vehicle-oriented image target detection method and device, electronic equipment and storage medium Pending CN115272894A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210917031.8A CN115272894A (en) 2022-08-01 2022-08-01 Unmanned aerial vehicle-oriented image target detection method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN115272894A true CN115272894A (en) 2022-11-01

Family

ID=83746988

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210917031.8A Pending CN115272894A (en) 2022-08-01 2022-08-01 Unmanned aerial vehicle-oriented image target detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115272894A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116229301A (en) * 2023-05-09 2023-06-06 南京瀚海伏羲防务科技有限公司 Lightweight unmanned aerial vehicle obstacle detection model, detection method and detection system
CN116229301B (en) * 2023-05-09 2023-10-27 南京瀚海伏羲防务科技有限公司 Lightweight unmanned aerial vehicle obstacle detection model, detection method and detection system
CN116721420A (en) * 2023-08-10 2023-09-08 南昌工程学院 Semantic segmentation model construction method and system for ultraviolet image of electrical equipment
CN116721420B (en) * 2023-08-10 2023-10-20 南昌工程学院 Semantic segmentation model construction method and system for ultraviolet image of electrical equipment
CN117456170A (en) * 2023-12-22 2024-01-26 苏州镁伽科技有限公司 Target detection method and device, electronic equipment and storage medium
CN117456170B (en) * 2023-12-22 2024-03-19 苏州镁伽科技有限公司 Target detection method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110598029B (en) Fine-grained image classification method based on attention transfer mechanism
CN115272894A (en) Unmanned aerial vehicle-oriented image target detection method and device, electronic equipment and storage medium
CN112150821B (en) Lightweight vehicle detection model construction method, system and device
CN110929577A (en) Improved target identification method based on YOLOv3 lightweight framework
CN112464911A (en) Improved YOLOv 3-tiny-based traffic sign detection and identification method
CN112541532B (en) Target detection method based on dense connection structure
CN110428413B (en) Spodoptera frugiperda imago image detection method used under lamp-induced device
CN115331172A (en) Workshop dangerous behavior recognition alarm method and system based on monitoring video
CN116187398B (en) Method and equipment for constructing lightweight neural network for unmanned aerial vehicle ocean image detection
CN116110022B (en) Lightweight traffic sign detection method and system based on response knowledge distillation
CN112819024B (en) Model processing method, user data processing method and device and computer equipment
CN113592825A (en) YOLO algorithm-based real-time coal gangue detection method
CN113436174A (en) Construction method and application of human face quality evaluation model
CN116385879A (en) Semi-supervised sea surface target detection method, system, equipment and storage medium
CN115393690A (en) Light neural network air-to-ground observation multi-target identification method
CN115346135A (en) Optical remote sensing image ship target identification method based on convolutional neural network
CN113420651A (en) Lightweight method and system of deep convolutional neural network and target detection method
CN117034090A (en) Model parameter adjustment and model application methods, devices, equipment and media
CN111178370B (en) Vehicle searching method and related device
CN113627240B (en) Unmanned aerial vehicle tree species identification method based on improved SSD learning model
CN115205573A (en) Image processing method, device and equipment
CN113947723A (en) High-resolution remote sensing scene target detection method based on size balance FCOS
CN113139612A (en) Image classification method, training method of classification network and related products
CN116861261B (en) Training method, deployment method, system, medium and equipment for automatic driving model
CN110728292A (en) Self-adaptive feature selection algorithm under multi-task joint optimization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination