CN116363610A - Improved YOLOv5-based aerial vehicle rotating target detection method - Google Patents


Info

Publication number
CN116363610A
Authority
CN
China
Prior art keywords
aerial vehicle
target detection
detection model
improved
yolov5
Prior art date
Legal status
Pending
Application number
CN202310347944.5A
Other languages
Chinese (zh)
Inventor
李永军
李耀
张心茹
张大蔚
李博
罗金成
李超越
陈锦智敏
Current Assignee
Henan University
Original Assignee
Henan University
Application filed by Henan University
Priority to CN202310347944.5A
Publication of CN116363610A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/54 Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/98 Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/17 Terrestrial scenes taken from planes or by drones
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08 Detecting or categorising vehicles
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

The invention relates to the technical field of rotating vehicle target detection, and in particular to an aerial vehicle rotating target detection method based on improved YOLOv5, which comprises the following steps: constructing an aerial vehicle rotating target detection model based on improved YOLOv5; constructing an aerial vehicle detection data set; training the aerial vehicle rotating target detection network using a bounding box loss function, a confidence loss function, a category loss function and an angle classification loss function; testing the trained model, evaluating it according to the test results, and judging whether it meets the actual requirements; if it meets the actual requirements, using the trained model for vehicle detection under aerial photography conditions; if it does not, correcting the model and continuing training until the requirements can be met. The final model detects aerial rotating vehicle targets well, effectively addressing the poor detection quality and large computational load that rotating vehicle targets pose under aerial conditions.

Description

Improved YOLOv5-based aerial vehicle rotating target detection method
Technical Field
The invention relates to the technical field of rotating vehicle target detection, and in particular to an aerial vehicle rotating target detection method based on improved YOLOv5.
Background
With the continuous development of technology in recent years, the field of artificial intelligence has attracted sustained attention and achieved good results in many application areas; intelligent image recognition and detection technologies have likewise advanced, driving rapid progress in vehicle target detection. In addition, with the development of the modern economy and society, living standards keep improving and the number of automobiles keeps growing, and vehicle target detection is the basis for vehicle identification and vehicle tracking. Traditional target detection algorithms suffer from low detection efficiency, poor results and high resource consumption, and cannot accurately detect vehicles under complex conditions. With the development of machine learning and GPU parallel computing in recent years, numerous target detection algorithms have emerged; the YOLO series is among the most widely used. Through continuous version iteration and algorithm optimization, YOLO offers good performance advantages, but it still falls some way short of accurate detection in aerial vehicle images.
One rotating target detection method based on YOLOv5 adopts a coordinate-offset regression algorithm and adds an attribute-Net module behind the YOLOv5 Backbone for feature extraction, reducing noise in the feature map. During training it adds an offset loss function and an attribute-Net module loss function, processes the rotation of the predicted target box, replaces the horizontal-box loss function with a rotating-box loss function, and uses matrix operations to improve computational efficiency. The resulting model detects rotating targets in aerial images well and effectively addresses the detection of small rotating targets. However, target rotation is not simple; the rotation algorithm selected there leaves room for improvement and remains some distance from practical application.
The prior art also proposes a vehicle target detection algorithm based on improved YOLOv5: a one-shot aggregation (OSA) module is introduced under the YOLOv5s network framework to optimize the backbone network and strengthen its feature extraction capability; a non-local attention mechanism is adopted for feature enhancement; and candidate boxes are screened using non-maximum suppression. This method optimizes the backbone structure and, compared with the original YOLOv5s model, improves the detection speed and the average precision. However, it only detects vehicle targets under complex conditions with horizontal boxes and cannot detect them with rotating boxes, so its performance on aerial rotating vehicle targets is poor and leaves room for improvement.
Disclosure of Invention
The summary of the invention is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. The summary of the invention is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In order to solve the technical problem of poor detection of aerial rotating vehicle targets, the invention provides an aerial vehicle rotating target detection method based on improved YOLOv5.
The invention provides an aerial vehicle rotating target detection method based on improved YOLOv5, which comprises the following steps:
constructing an aerial vehicle rotating target detection model based on improved YOLOv5;
establishing an aerial vehicle rotating target detection data set and preprocessing it to obtain the data set of the aerial vehicle rotating target detection model based on improved YOLOv5;
training the constructed aerial vehicle rotating target detection model based on improved YOLOv5 with the data set of the aerial vehicle rotating target detection model based on improved YOLOv5;
testing the trained aerial vehicle rotating target detection model based on improved YOLOv5 and obtaining a test result;
evaluating, according to the test result, whether the performance of the trained aerial vehicle rotating target detection model based on improved YOLOv5 meets the actual application requirements;
if the performance of the trained aerial vehicle rotating target detection model based on improved YOLOv5 meets the actual application requirements, using the trained model for aerial vehicle rotating target detection;
if the performance does not meet the actual application requirements, correcting the parameters of the aerial vehicle rotating target detection model based on improved YOLOv5 and repeating the training until the performance meets the actual application requirements, then using the trained model for aerial vehicle rotating target detection.
Further, constructing the aerial vehicle rotating target detection model based on improved YOLOv5 comprises:
constructing a Focus module, wherein the Focus module performs a slicing operation on the input image;
designing a Diamond Mapping Unit module in the shallow Backbone network of the aerial vehicle rotating target detection model based on improved YOLOv5, wherein the Diamond Mapping Unit module improves the feature extraction process of the shallow network;
applying the Ghost mapping technique to the Backbone network and the Neck network of the aerial vehicle rotating target detection model based on improved YOLOv5 to construct the Cheap Backbone Network and the Cheap Neck Network, in which the Cheap CSP module is a key component;
introducing an adaptive spatial feature fusion (ASFF) mechanism into the Head network of the aerial vehicle rotating target detection model based on improved YOLOv5 to form an ASFF-Head network, wherein the ASFF mechanism performs adaptive spatial fusion on the feature maps from the Cheap Backbone Network and the Cheap Neck Network and regenerates three feature maps of different scales;
and adding an angle class prediction branch to the ASFF-Head network of the aerial vehicle rotating target detection model based on improved YOLOv5, and computing the bounding box loss, confidence loss, class loss and angle classification loss on the obtained feature maps of different scales to complete the prediction.
Further, establishing an aerial vehicle rotating target detection data set and preprocessing it to obtain the data set of the aerial vehicle rotating target detection model based on improved YOLOv5 comprises the following steps:
shooting urban roads under different weather conditions with an unmanned aerial vehicle platform, the resulting data set being the aerial vehicle rotating target detection data set; preprocessing the acquired vehicle images by flipping and cropping; and annotating the preprocessed vehicle images with the rotation annotation tool roLabelImg to produce the label files;
and randomly selecting a preset proportion of the preprocessed vehicle images, together with their corresponding label files, as the training set, and taking the remaining preprocessed vehicle images and their corresponding label files as the test set, the training set and the test set together forming the data set of the aerial vehicle rotating target detection model based on improved YOLOv5.
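The random split described above may be sketched as follows; the 80/20 ratio and the file names are assumptions for illustration, not values taken from the patent:

```python
import random

def split_dataset(image_files, label_files, train_ratio=0.8, seed=0):
    """Randomly split paired image/label files into a training and a test set.

    `train_ratio` stands in for the patent's "preset proportion"; the 80/20
    value is an assumption for illustration.
    """
    pairs = list(zip(image_files, label_files))
    random.Random(seed).shuffle(pairs)  # reproducible shuffle
    n_train = int(len(pairs) * train_ratio)
    return pairs[:n_train], pairs[n_train:]

# Hypothetical file names for illustration.
images = [f"img_{i:04d}.jpg" for i in range(10)]
labels = [f"img_{i:04d}.txt" for i in range(10)]
train_set, test_set = split_dataset(images, labels)
```

Each image stays paired with its own label file through the shuffle, which is the property the patent's training/test split relies on.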
Further, training the constructed aerial vehicle rotating target detection model based on improved YOLOv5 with the data set of the aerial vehicle rotating target detection model based on improved YOLOv5 comprises the following steps:
initializing the training parameters of the aerial vehicle rotating target detection model based on improved YOLOv5;
inputting the training set into the constructed aerial vehicle rotating target detection model based on improved YOLOv5, and iteratively training the rotating vehicle target detection model with the bounding box loss function, the confidence loss function, the category loss function and the angle classification loss function to obtain the trained aerial vehicle rotating target detection model based on improved YOLOv5.
Further, testing the trained aerial vehicle rotating target detection model based on improved YOLOv5 and obtaining a test result comprises the following step:
inputting the test set into the trained aerial vehicle rotating target detection model based on improved YOLOv5 for testing and obtaining the test result.
Further, evaluating, according to the test result, whether the performance of the trained aerial vehicle rotating target detection model based on improved YOLOv5 meets the actual application requirements comprises:
evaluating the trained aerial vehicle rotating target detection model based on improved YOLOv5, according to the test result, using precision, recall and mean average precision (mAP).
Further, the Diamond Mapping Unit module performs feature extraction through two branches: the first branch consists of a max pooling operation and a 1×1 convolution; the second branch consists of a 1×1 convolution and a 3×3 convolution.
Further, the Cheap CSP module consists of three convolutions and one GhostBottleneck module, the GhostBottleneck module being formed by stacking two Ghost modules.
Further, the bounding box loss function CIoU is:

L_CIoU = 1 - IoU + ρ²(b, b^gt)/c² + αv,

v = (4/π²) · (arctan(w^gt/h^gt) - arctan(w/h))²,

α = v/((1 - IoU) + v),

wherein IoU is the intersection-over-union of the predicted box and the ground-truth box; b and b^gt denote the centers of the predicted box and of the ground-truth bounding box respectively; ρ² is the squared distance between the two center points; c is the diagonal length of the smallest box enclosing both boxes; v measures the consistency of the aspect ratios and α is its trade-off coefficient; w and h are the width and height of the predicted box, and w^gt and h^gt the width and height of the ground-truth box; π is the radian measure of 180°, and arctan is the arctangent function;
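A minimal sketch of the CIoU computation, written for horizontal boxes in (x1, y1, x2, y2) form (in the patent's scheme the rotation itself is handled by the separate angle classification branch, so the box loss keeps this horizontal form):

```python
import math

def ciou_loss(box_p, box_g):
    """CIoU bounding-box loss for axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1p, y1p, x2p, y2p = box_p
    x1g, y1g, x2g, y2g = box_g
    # Intersection-over-union.
    iw = max(0.0, min(x2p, x2g) - max(x1p, x1g))
    ih = max(0.0, min(y2p, y2g) - max(y1p, y1g))
    inter = iw * ih
    area_p = (x2p - x1p) * (y2p - y1p)
    area_g = (x2g - x1g) * (y2g - y1g)
    iou = inter / (area_p + area_g - inter)
    # Squared center distance rho^2 and enclosing-box diagonal c^2.
    rho2 = ((x1p + x2p) / 2 - (x1g + x2g) / 2) ** 2 \
         + ((y1p + y2p) / 2 - (y1g + y2g) / 2) ** 2
    cw = max(x2p, x2g) - min(x1p, x1g)
    ch = max(y2p, y2g) - min(y1p, y1g)
    c2 = cw ** 2 + ch ** 2
    # Aspect-ratio consistency v and its trade-off weight alpha.
    wp, hp = x2p - x1p, y2p - y1p
    wg, hg = x2g - x1g, y2g - y1g
    v = (4 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(wp / hp)) ** 2
    alpha = v / ((1 - iou) + v) if v else 0.0  # avoid 0/0 for identical boxes
    return 1 - iou + rho2 / c2 + alpha * v
```

Identical boxes give a loss of 0; disjoint boxes give a loss above 1, driven by the center-distance term.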
the confidence loss function l_obj is:

l_obj = -∑_{i=0}^{S²} ∑_{j=0}^{B} I_{ij}^{obj} [Ĉ_i ln(C_i) + (1 - Ĉ_i) ln(1 - C_i)] - λ_noobj ∑_{i=0}^{S²} ∑_{j=0}^{B} I_{ij}^{noobj} [Ĉ_i ln(C_i) + (1 - Ĉ_i) ln(1 - C_i)],

wherein S² is the number of grids and B is the number of anchors in each grid; I_{ij}^{obj} indicates whether there is a target in the j-th anchor of the i-th grid, being 1 if there is a target and 0 otherwise, and I_{ij}^{noobj} is its complement; Ĉ_i is the true value, C_i is the predicted value, and λ_noobj is the constraint coefficient;
the class loss function l_cls is:

l_cls = -∑_{i=0}^{S²} I_{ij}^{obj} ∑_{c∈classes} [P̂_i(c) ln(P_i(c)) + (1 - P̂_i(c)) ln(1 - P_i(c))],

wherein S² is the number of grids; I_{ij}^{obj} indicates whether there is a target in the j-th anchor of the i-th grid, being 1 if there is a target and 0 otherwise; P̂_i(c) is the true class probability and P_i(c) is the predicted class probability;
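Both the confidence loss and the class loss above are sums of binary cross-entropy terms; the shared term may be sketched as:

```python
import math

def bce(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy term shared by the confidence and class losses.

    `eps` guards the logarithms against predictions of exactly 0 or 1.
    """
    return -(y_true * math.log(y_pred + eps)
             + (1 - y_true) * math.log(1 - y_pred + eps))

def class_loss(true_probs, pred_probs):
    """l_cls for one grid cell containing a target: sum of per-class BCE terms."""
    return sum(bce(t, p) for t, p in zip(true_probs, pred_probs))
```

A perfect prediction drives the loss to (numerically) zero, while a 0.5 prediction against a positive label costs ln 2.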
the angle classification loss function loss(z, y) is:

loss(z, y) = mean{l_0, …, l_{N-1}},

l_n = sum{l_{n,0}, l_{n,1}, …, l_{n,179}},

l_{n,a} = -[y_{n,a} ln(δ(z_{n,a})) + (1 - y_{n,a}) ln(1 - δ(z_{n,a}))],

y_{n,a} = CSL(x),

CSL(x) = g(x) for θ - r < x < θ + r, and CSL(x) = 0 otherwise,

where g(x) is the window function, r is the radius of the window function, and θ is the angle of the bounding box; N is the number of samples; a ∈ [0, 180), giving 180 categories in total; δ is the sigmoid function; z_{n,a} is the predicted probability of the n-th sample at the a-th angle, i.e. the predicted value; y_{n,a} is the label of the n-th sample at the a-th angle under the CSL(x) expression, i.e. the true value. From the predicted value and the true value at each angle of the n-th sample, l_{n,a} is computed in turn; summing the 180 angle results of the n-th sample gives l_n, and averaging the results of the N samples gives the angle loss at this iteration.
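The CSL labelling above may be sketched as follows; the Gaussian window and the radius of 6 are common choices in the CSL literature, assumed here since the patent only states that g(x) is a window function of radius r:

```python
import math

def csl_labels(theta, radius=6, num_angles=180):
    """Circular Smooth Label vector for a ground-truth angle theta (in degrees).

    Assumptions for illustration: a Gaussian window g(x) with sigma = radius/3,
    and radius = 6. Angles wrap around, so 179 deg and 0 deg are neighbours.
    """
    labels = []
    for a in range(num_angles):
        # Circular distance between angle class a and theta.
        d = min(abs(a - theta), num_angles - abs(a - theta))
        if d < radius:
            labels.append(math.exp(-d * d / (2 * (radius / 3) ** 2)))
        else:
            labels.append(0.0)
    return labels
```

The label peaks at 1 on the true angle, decays smoothly inside the window, and is 0 outside it, which is what lets neighbouring angle classes share credit during training.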
Further, the ASFF mechanism performs adaptive spatial fusion on the feature maps from the Cheap Backbone Network and the Cheap Neck Network, and the formula for regenerating the three feature maps of different scales is:

ASFF^k = α^k · P^{1→k} + β^k · P^{2→k} + γ^k · P^{3→k},

α^k + β^k + γ^k = 1,

wherein k = 1, 2, 3; ASFF^k is the feature map output after fusion at the k-th layer; α^k, β^k and γ^k are the weight matrices of the k-th layer, with α^k + β^k + γ^k = 1 as a constraint; P^{1→k}, P^{2→k} and P^{3→k} denote the feature maps obtained by adjusting the sizes and channel numbers of the other two layers to those of the k-th layer.
The invention aims to provide an aerial vehicle rotating target detection method based on improved YOLOv5 with high reliability and practicability, in order to improve the accuracy with which the original YOLOv5 network detects aerial rotating vehicle targets. The technical idea of the invention is as follows: construct an aerial vehicle rotating target detection deep learning model based on improved YOLOv5, establish a data set, and preprocess it; evaluate and correct the model through training and testing; and finally detect aerial vehicles with the trained aerial vehicle rotating target detection network.
The invention has the following beneficial effects:
Firstly, an angle classification loss is added: the CSL technique converts the angle regression problem into an angle classification problem, and the model network is trained with the bounding box loss function, the confidence loss function, the class loss function and the angle classification loss function.
Secondly, a Diamond Mapping Unit module is designed in the shallow Backbone network. Compared with the original shallow network design, the invention minimizes the information loss of the original input image, improves the feature extraction process of the shallow network, represents target features effectively and improves vehicle detection performance.
Thirdly, an adaptive spatial feature fusion (ASFF) mechanism is introduced into the Head network to construct the ASFF-Head network. This mechanism learns the relations between feature maps of different scales; compared with the feature maps generated directly by the original Head network, the ASFF mechanism lets every spatial position of the feature maps adaptively fuse feature information from different levels.
Fourthly, a lightweight network structure is built with the Ghost mapping technique in the Backbone network and the Neck network. Compared with generating feature maps directly with Conv, the invention uses the Ghost mapping technique to generate the complete set of similar feature maps from a subset of them, compressing the number of network parameters and optimizing the gradient update process.
Drawings
In order to illustrate the embodiments of the invention and the technical solutions of the prior art more clearly, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the invention; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of an aerial vehicle rotational target detection method based on improved YOLOv5 in accordance with the present invention;
FIG. 2 is a further flow chart according to the present invention;
FIG. 3 is a schematic diagram of a network architecture according to the present invention;
FIG. 4 is a diagram of recognition result 1 according to the present invention;
FIG. 5 is a diagram of recognition result 2 according to the present invention;
FIG. 6 is a diagram of recognition result 3 according to the present invention;
FIG. 7 is a diagram of recognition result 4 according to the present invention.
Detailed Description
In order to further describe the technical means and effects adopted by the present invention to achieve its intended purpose, the specific implementation, structure, features and effects of the technical solution according to the present invention are described in detail below with reference to the accompanying drawings and preferred embodiments. In the following description, different instances of "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The invention provides an aerial vehicle rotating target detection method based on improved YOLOv5, which comprises the following steps:
constructing an aerial vehicle rotating target detection model based on improved YOLOv5;
establishing an aerial vehicle rotating target detection data set and preprocessing it to obtain the data set of the aerial vehicle rotating target detection model based on improved YOLOv5;
training the constructed aerial vehicle rotating target detection model based on improved YOLOv5 with the data set of the aerial vehicle rotating target detection model based on improved YOLOv5;
testing the trained aerial vehicle rotating target detection model based on improved YOLOv5 and obtaining a test result;
evaluating, according to the test result, whether the performance of the trained aerial vehicle rotating target detection model based on improved YOLOv5 meets the actual application requirements;
if the performance of the trained aerial vehicle rotating target detection model based on improved YOLOv5 meets the actual application requirements, using the trained model for aerial vehicle rotating target detection;
if the performance does not meet the actual application requirements, correcting the parameters of the aerial vehicle rotating target detection model based on improved YOLOv5 and repeating the training until the performance meets the actual application requirements, then using the trained model for aerial vehicle rotating target detection.
Each step is developed in detail below:
referring to FIG. 1, a flow chart of some embodiments of an improved Yolov 5-based method for detecting an aerial vehicle rotation target in accordance with the present invention is shown. The method for detecting the aerial vehicle rotating target based on the improved YOLOv5 comprises the following steps of:
and S1, constructing an aerial vehicle rotation target detection model based on the improved YOLOv 5.
In some embodiments, an improved YOLOv 5-based aerial vehicle rotational target detection model may be constructed.
It should be noted that, another flowchart of the present invention may be shown in fig. 2. The network structure of the improved YOLOv 5-based aerial vehicle rotational target detection model may be as shown in fig. 3.
As an example, this step may include the steps of:
First, a Focus module is constructed.
The Focus module performs a slicing operation on the input image.
For example, a Focus module is constructed. Before a picture enters the Backbone of YOLOv5, the Focus module performs a slicing operation on it: a value is taken from every other pixel of the picture, yielding four complementary pictures that look similar, with no information lost. The W and H information is thus concentrated into the channel space, and the input channels are expanded by a factor of 4; that is, relative to the original RGB three-channel mode, the spliced picture has 12 channels. Finally, a convolution is applied to the new picture, producing a twice-downsampled feature map with no information loss.
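The slicing described above may be sketched on a plain nested-list image (each pixel a list of channel values); an RGB input becomes a half-resolution, 12-channel map with every original value retained:

```python
def focus_slice(image):
    """Focus-style slicing: sample every other pixel into four complementary
    sub-images and concatenate them along the channel axis.

    `image` is a list of H rows, each a list of W pixels, each pixel a list
    of channel values; H and W are assumed even. An RGB input (3 channels)
    becomes an (H/2) x (W/2) map with 12 channels.
    """
    h, w = len(image), len(image[0])
    out = []
    for y in range(0, h, 2):
        row = []
        for x in range(0, w, 2):
            # Concatenate the four complementary samples of the 2x2 block.
            px = (image[y][x] + image[y + 1][x]
                  + image[y][x + 1] + image[y + 1][x + 1])
            row.append(px)
        out.append(row)
    return out
```

Every original channel value appears exactly once in the output, which is the "no information lost" property the patent relies on.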
Second, a Diamond Mapping Unit module is designed in the shallow Backbone network of the aerial vehicle rotating target detection model based on improved YOLOv5.
The Diamond Mapping Unit module improves the feature extraction process of the shallow network and enhances the expressive capability of the network. It performs feature extraction through two branches: the first branch consists of a max pooling operation and a 1×1 convolution; the second branch consists of a 1×1 convolution and a 3×3 convolution.
For example, the Diamond Mapping Unit module in the shallow Backbone network of the aerial vehicle rotating target detection model based on improved YOLOv5 may divide the feature map produced by the first step of step S1 into two branches. The first branch undergoes a max pooling operation, which strengthens the main features of the feature map and halves the resolution; a 1×1 convolution is then applied to the pooled feature map to increase the number of channels. The second branch first passes through a 1×1 convolution to reduce the number of channels, then downsamples the resulting feature map with a 3×3 convolution of stride 2; the result is fused with the feature map generated by the first branch in the channel dimension and fed into a 1×1 convolution to reduce the number of channels.
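The first branch's max pooling step may be sketched as follows; the 1×1 and 3×3 convolutions of the module are omitted, so this only illustrates how the dominant activations are kept while the resolution halves:

```python
def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 over a 2-D map of scalar activations.

    This models only the pooling step of the Diamond Mapping Unit's first
    branch; the module's convolutions are left out of this sketch.
    """
    out = []
    for y in range(0, len(fmap) - 1, 2):
        row = []
        for x in range(0, len(fmap[0]) - 1, 2):
            # Keep the strongest activation in each 2x2 neighbourhood.
            row.append(max(fmap[y][x], fmap[y][x + 1],
                           fmap[y + 1][x], fmap[y + 1][x + 1]))
        out.append(row)
    return out
```

A 4×4 map becomes a 2×2 map holding the maximum of each block, which is the "main features strengthened, resolution halved" behaviour described above.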
Third, the Ghost mapping technique is applied to the Backbone network and the Neck network of the aerial vehicle rotating target detection model based on improved YOLOv5 to construct the Cheap Backbone Network and the Cheap Neck Network.
The Cheap CSP module is a key component for reducing the computational load and the network parameters. The Cheap CSP module consists mainly of three convolutions and one GhostBottleneck module; the GhostBottleneck module is mainly formed by stacking two Ghost modules.
For example, the Ghost mapping technique can be applied to the Backbone network and the Neck network of the aerial vehicle rotating target detection model based on improved YOLOv5 to build the lightweight Cheap Backbone Network and Cheap Neck Network. The Ghost module first obtains part of the feature maps through an ordinary convolution, applies low-cost transformations to those feature maps to generate more similar feature maps, and finally fuses the two groups of feature maps. The specific formula can be:

P = Q * f′ + b,

wherein Q is the input feature map of height h and width w with c channels, * is the convolution operation, f′ is the convolution kernel mapping the c input channels to m output channels, b is the bias term, and P is the generated feature map of height h′ and width w′. A series of linear operations is then applied to each feature map in P to regenerate s feature maps each:

p_{ij} = Φ_{ij}(p_i), i = 1, …, m, j = 1, …, s,

wherein p_i is the i-th feature map in P, and Φ_{ij}(p_i) is the j-th linear operation (the last being the identity mapping) applied to p_i to generate the j-th ghost feature map. Finally, n = m × s feature maps [p_{11}, p_{12}, …, p_{ms}] are output.
Cheap Backbone Network and Cheap Neck Network utilize the Ghost mapping technique to compress the amount of parameters of the network and reduce computational load compared to the original network.
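A minimal sketch of the Ghost module's two stages follows. For brevity the "cheap" transformation is a per-map scaling rather than the depthwise 3×3 convolution used in GhostNet, and all weights are random stand-ins; only the m-intrinsic-plus-ghost structure is the point.

```python
import numpy as np

def ghost_module(x, m, s, rng):
    """Ghost module sketch: an ordinary 1x1 convolution generates m intrinsic
    feature maps P = Q*f' + b; cheap linear operations Phi_ij then derive s-1
    "ghost" maps from each intrinsic map, with the identity mapping kept as
    the last operation, giving n = m*s maps in total."""
    c, h, w = x.shape
    f = rng.standard_normal((m, c))          # 1x1 kernel f'
    b = rng.standard_normal((m, 1, 1))       # bias term
    P = np.einsum('mc,chw->mhw', f, x) + b   # intrinsic maps, shape (m, h, w)
    out = []
    for i in range(m):
        for _ in range(s - 1):
            # stand-in cheap op: per-map scaling (GhostNet uses depthwise 3x3)
            out.append(rng.standard_normal() * P[i])
        out.append(P[i])                     # identity kept as the last op
    return np.stack(out)                     # n = m*s feature maps

rng = np.random.default_rng(0)
y = ghost_module(rng.standard_normal((8, 16, 16)), m=4, s=2, rng=rng)
print(y.shape)  # (8, 16, 16): 4 intrinsic + 4 ghost maps
```

The savings come from the ordinary convolution producing only m maps instead of n = m×s, with the remainder generated by operations far cheaper than full convolution.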
Fourth, an adaptive spatial feature fusion (ASFF) mechanism is introduced into the Head network of the improved-YOLOv5-based aerial vehicle rotating target detection model to form an ASFF-Head network. The ASFF mechanism adaptively fuses, in the spatial dimension, the feature maps from the Cheap Backbone Network and Cheap Neck Network, and regenerates three feature maps of different scales (denoted P1, P2 and P3), thereby alleviating the problem of inconsistent features across layers.
For example, the ASFF mechanism in the fourth step (included as an example in step S1) adjusts the three layers of feature maps to the same resolution and channel dimension, fuses them during training according to the formula below, and then regenerates three feature maps of different scales. The adaptive spatial fusion of the feature maps from the Cheap Backbone Network and Cheap Neck Network is:

ASFF_k = P_(1→k)·α_k + P_(2→k)·β_k + P_(3→k)·γ_k

α_k + β_k + γ_k = 1

where k = 1, 2, 3; ASFF_k is the feature map output after fusion at the k-th layer; α_k, β_k, γ_k are the weight matrices of the k-th layer, and α_k + β_k + γ_k = 1 is a constraint. P_(1→k), P_(2→k), P_(3→k) represent the feature maps obtained by adjusting the size and channel number of the other two layers to match the k-th layer.
When k = 1, P1 keeps its size and channel number unchanged, so P_(1→1) is P1 itself; P2 is downsampled by a factor of two to match the size and channel number of P1, giving P_(2→1); P3 is downsampled by a factor of four, giving P_(3→1). α_1, β_1, γ_1 are the weight matrices of the 1st layer; the three adjusted feature maps are multiplied by their respective weight matrices and summed to obtain ASFF_1.
When k = 2, P1 is expanded by interpolation to the same size as P2 and its channel number is adjusted by two-fold upsampling to match P2, giving P_(1→2); P2 keeps its size and channel number unchanged, so P_(2→2) is P2 itself; P3 is downsampled by a factor of two to match the size and channel number of P2, giving P_(3→2). α_2, β_2, γ_2 are the weight matrices of the 2nd layer; the adjusted feature maps are multiplied by their respective weight matrices and summed to obtain ASFF_2.
When k = 3, P1 is expanded by interpolation to the same size as P3 and its channel number is adjusted by four-fold upsampling to match P3, giving P_(1→3); P2 is expanded by interpolation to the same size as P3 and its channel number is adjusted by two-fold upsampling, giving P_(2→3); P3 keeps its size and dimension unchanged, so P_(3→3) is P3 itself. α_3, β_3, γ_3 are the weight matrices of the 3rd layer; the adjusted feature maps are multiplied by their respective weight matrices and summed to obtain ASFF_3.
The weights α_k, β_k, γ_k are obtained by passing the features through a 1×1 convolution layer with 3 output channels and applying the Softmax function, so that each weight lies in the range [0, 1] and the constraint α_k + β_k + γ_k = 1 holds.
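The per-level fusion can be sketched as follows. Nearest-neighbour resizing stands in for the interpolation/downsampling described above, channel adjustment is omitted (all levels share one channel count), and random logits stand in for the 1×1 convolution that produces the weights; Softmax over the three logits enforces α_k + β_k + γ_k = 1.

```python
import numpy as np

def softmax(w, axis=0):
    e = np.exp(w - w.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def downsample(x, r):  # nearest-neighbour: keep every r-th pixel
    return x[:, ::r, ::r]

def upsample(x, r):    # nearest-neighbour interpolation
    return x.repeat(r, axis=1).repeat(r, axis=2)

def asff_level(P1, P2, P3, k, rng):
    """Fuse three pyramid levels at level k; levels are (C, S, S) arrays."""
    targets = {1: P1, 2: P2, 3: P3}
    size = targets[k].shape[1]
    resized = []
    for P in (P1, P2, P3):
        s = P.shape[1]
        resized.append(P if s == size else
                       downsample(P, s // size) if s > size else
                       upsample(P, size // s))
    # In the model the logits come from a 1x1 conv with 3 output channels;
    # random values stand in here. Softmax gives alpha_k, beta_k, gamma_k.
    wgt = softmax(rng.standard_normal((3, 1, size, size)), axis=0)
    return sum(wgt[i] * resized[i] for i in range(3))

rng = np.random.default_rng(0)
C = 8
P1, P2, P3 = (rng.standard_normal((C, s, s)) for s in (32, 16, 8))
out = asff_level(P1, P2, P3, k=2, rng=rng)
print(out.shape)  # (8, 16, 16)
```

Because the weights are spatial maps rather than scalars, each position of the fused map mixes the three levels with its own learned proportions.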
Fifth, an angle class prediction branch is added to the ASFF-Head network of the improved-YOLOv5-based aerial vehicle rotating target detection model, and the bounding box loss, confidence loss, class loss and angle classification loss are calculated for the obtained feature maps of different scales, thereby completing the prediction.
For example, an angle class prediction branch may be added to the ASFF-Head network, the angle regression problem may be converted into an angle classification problem using the circular smooth label (CSL), and the bounding box loss, confidence loss, class loss and angle classification loss may be calculated for the feature maps of different scales obtained in the fourth step (included as an example in step S1), thereby completing the prediction.
For example, the functions corresponding to the bounding box loss, confidence loss, class loss, and angle classification loss are respectively:
the bounding box loss function CIoU is:
CIoU = 1 − IoU + ρ²(b, b_gt)/c² + αv

v = (4/π²)·(arctan(w_gt/h_gt) − arctan(w/h))²

α = v / ((1 − IoU) + v)

where IoU is the intersection-over-union of the predicted box and the real box, b denotes the centre of the predicted box, b_gt denotes the centre of the real target bounding box, ρ²(b, b_gt) is the squared distance between the two centre points, c is the diagonal length of the smallest box enclosing both, v measures aspect-ratio consistency and α is its trade-off weight, w and h represent the width and height of the predicted box, w_gt and h_gt represent the width and height of the real box respectively, π is the radian of 180°, and arctan is the arctangent function.
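A self-contained computation of the CIoU terms for axis-aligned boxes follows; it is a sketch of the standard CIoU definition (here returning the CIoU score, so the loss is 1 − CIoU), not the patent's exact implementation.

```python
import math

def ciou(box_p, box_g):
    """CIoU score between boxes given as (x1, y1, x2, y2); loss = 1 - ciou."""
    x1, y1, x2, y2 = box_p
    g1, h1, g2, h2 = box_g
    # intersection-over-union
    iw = max(0.0, min(x2, g2) - max(x1, g1))
    ih = max(0.0, min(y2, h2) - max(y1, h1))
    inter = iw * ih
    union = (x2 - x1) * (y2 - y1) + (g2 - g1) * (h2 - h1) - inter
    iou = inter / union
    # squared centre distance rho^2 over squared enclosing-box diagonal c^2
    rho2 = ((x1 + x2) / 2 - (g1 + g2) / 2) ** 2 + ((y1 + y2) / 2 - (h1 + h2) / 2) ** 2
    c2 = (max(x2, g2) - min(x1, g1)) ** 2 + (max(y2, h2) - min(y1, h1)) ** 2
    # aspect-ratio consistency v and its trade-off weight alpha
    v = (4 / math.pi ** 2) * (math.atan((g2 - g1) / (h2 - h1))
                              - math.atan((x2 - x1) / (y2 - y1))) ** 2
    alpha = v / ((1 - iou) + v + 1e-9)
    return iou - rho2 / c2 - alpha * v

print(ciou((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes -> 1.0
```

Unlike plain IoU, the score stays informative for non-overlapping boxes: the centre-distance term still produces a gradient toward the target.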
The confidence loss function l_obj is:

l_obj = − Σ_{i=0}^{S²} Σ_{j=0}^{B} I_ij^obj [Ĉ_i ln(C_i) + (1 − Ĉ_i) ln(1 − C_i)] − λ_noobj Σ_{i=0}^{S²} Σ_{j=0}^{B} I_ij^noobj [Ĉ_i ln(C_i) + (1 − Ĉ_i) ln(1 − C_i)]

where S² represents the number of grid cells, B represents the number of anchors in each grid cell, I_ij^obj indicates whether there is a target in the j-th anchor of the i-th grid cell (1 if a target is present, 0 otherwise), I_ij^noobj is its complement, Ĉ_i represents the true value, C_i represents the predicted value, and λ_noobj represents a constraint coefficient.
The class loss function l_cls is:

l_cls = − Σ_{i=0}^{S²} I_ij^obj Σ_{c∈classes} [P̂_i(c) ln(P_i(c)) + (1 − P̂_i(c)) ln(1 − P_i(c))]

where S² represents the number of grid cells, I_ij^obj indicates whether there is a target in the j-th anchor of the i-th grid cell (1 if a target is present, 0 otherwise), P̂_i(c) represents the true class probability, and P_i(c) represents the predicted class probability.
The angle classification loss function loss(z, y) is:

loss(z, y) = mean{l_0, …, l_(N−1)},

l_n = sum{l_(n,0), l_(n,1), …, l_(n,179)},

l_(n,a) = −[y_(n,a) ln(δ(z_(n,a))) + (1 − y_(n,a)) ln(1 − δ(z_(n,a)))],

y_(n,a) = CSL(x),

CSL(x) = g(x) if θ − r < x < θ + r, and 0 otherwise,

where g(x) is a window function (a Gaussian function may be selected), r is the radius of the window function, and θ is the angle of the bounding box. N is the number of samples. a ∈ [0, 180), for a total of 180 categories. δ is the sigmoid function. z_(n,a) is the predicted probability of the n-th sample at the a-th angle, with maximum value 1, i.e. the predicted value. y_(n,a) is the label of the n-th sample at the a-th angle under the CSL(x) expression, i.e. the true value, with maximum value 1. For each angle of the n-th sample, the predicted value and true value are passed through l_(n,a) in turn; the 180 angle results of the n-th sample are summed to obtain l_n, and the calculation results of the N samples are averaged to obtain the angle loss.
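A sketch of generating the circular smooth label vector for one bounding-box angle follows. The Gaussian window with σ = r/3, the radius r = 6 and the hard cut-off at the window radius are illustrative assumptions; the key properties are the peak of 1 at the true angle and the circular (wrap-around) distance.

```python
import math

def csl_labels(theta, r=6, n_angles=180):
    """Circular Smooth Label: a Gaussian window g(x) of radius r centred on
    the true angle theta, wrapped circularly over n_angles classes."""
    labels = []
    for a in range(n_angles):
        # circular distance between class a and the true angle theta
        d = min((a - theta) % n_angles, (theta - a) % n_angles)
        # inside the window: Gaussian value (sigma = r/3 assumed); outside: 0
        labels.append(math.exp(-d * d / (2 * (r / 3) ** 2)) if d < r else 0.0)
    return labels

y = csl_labels(90)
print(y[90], y[0])  # 1.0 at the true angle, 0.0 far away
```

Because neighbouring angle classes receive smoothly decaying targets, a prediction one bin away from the truth is penalised far less than one 90° away, which is exactly what plain one-hot angle classification fails to capture.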
And S2, establishing an aerial vehicle rotating target detection data set and preprocessing to obtain an aerial vehicle rotating target detection model data set based on the improved YOLOv 5.
In some embodiments, a dataset based on an improved YOLOv5 aerial vehicle rotational target detection model may be constructed.
As an example, this step may include the steps of:
the first step, adopting an unmanned plane platform to shoot urban roads under different weather conditions, adopting a method of overturning and cutting to preprocess the acquired vehicle images, and adopting a rotary labeling tool ropylelmg to label the preprocessed vehicle images to manufacture a label file.
The unmanned aerial vehicle platform is adopted to shoot urban roads under different weather conditions, and the obtained data set is an aerial vehicle rotation target detection data set.
For example, a UAV platform can photograph urban roads under different weather conditions; the collected images are preprocessed by methods such as flipping and cropping, and the preprocessed vehicle images are annotated with the rotated-annotation tool roLabelImg to produce label files. For example, a UAV equipped with a high-definition camera can photograph vehicle targets on different road sections and in different time periods, at different heights and angles, and under different weather conditions; aerial images containing no vehicle information are discarded, and the remaining images are flipped and cropped to expand the training data set and enrich the image data.
Second, vehicle images of a preset proportion, together with their corresponding label files, are randomly selected from the preprocessed vehicle images as the training set, and the remaining vehicle images and their corresponding label files serve as the test set.
The training set and the test set together constitute the dataset of the improved-YOLOv5-based aerial vehicle rotating target detection model. The preset proportion may be set in advance; for example, it may be 60%.
For example, 60% of the sample images and the corresponding tag files in the first step included as an example in step S2 may be randomly selected as the training set, and 40% of the sample images and the corresponding tag files may be selected as the test set.
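The 60/40 random split can be sketched with the standard library; the file names and the paired image/label tuples are hypothetical.

```python
import random

def split_dataset(samples, train_ratio=0.6, seed=0):
    """Randomly split (image, label-file) pairs into train and test sets."""
    items = list(samples)
    random.Random(seed).shuffle(items)   # fixed seed for a reproducible split
    n_train = int(len(items) * train_ratio)
    return items[:n_train], items[n_train:]

# hypothetical image/label pairs as produced by the annotation step
pairs = [(f"img_{i:04d}.jpg", f"img_{i:04d}.xml") for i in range(100)]
train, test = split_dataset(pairs)
print(len(train), len(test))  # 60 40
```

Keeping each image paired with its label file before shuffling guarantees that no annotation ends up in a different split than its image.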
And S3, training the constructed aerial vehicle rotating target detection model based on the improved YOLOv5 by adopting a dataset of the aerial vehicle rotating target detection model based on the improved YOLOv 5.
In some embodiments, an improved YOLOv5 based aerial vehicle rotational target detection model may be trained.
As an example, this step may include the steps of:
first, training parameters of an aerial vehicle rotational target detection model based on improved YOLOv5 are initialized.
For example, in the training process, the epoch is set to 250, the batch size batch-size is 8, the initial learning rate lr is 0.01, the momentum and weight decay are 0.937 and 0.0005, respectively, and the training warm-up round is set to 3.
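The values listed above can be collected into a single hyperparameter block; this is an illustrative sketch of how such settings are typically grouped, not the patent's actual training script.

```python
# Hypothetical hyperparameter block mirroring the training settings listed above.
hyp = {
    "epochs": 250,           # training rounds
    "batch_size": 8,
    "lr0": 0.01,             # initial learning rate
    "momentum": 0.937,
    "weight_decay": 0.0005,
    "warmup_epochs": 3,      # training warm-up rounds
}
print(hyp["epochs"], hyp["lr0"])
```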
Secondly, inputting the training set into a constructed aerial vehicle rotating target detection model based on the improved YOLOv5, and performing iterative training on the rotating vehicle target detection model by using a boundary box loss function, a confidence loss function, a category loss function and an angle classification loss function to obtain the trained aerial vehicle rotating target detection model based on the improved YOLOv 5.
The preset number of training epochs may be set in advance; for example, it may be 300 epochs.
For example, the training set from the second step of step S2 may be fed into the improved-YOLOv5-based aerial vehicle rotating target detection model constructed in step S1, and the model may be iteratively trained for 300 epochs using the bounding box loss function, confidence loss function, class loss function and angle classification loss function, obtaining a preliminarily trained aerial vehicle rotating target detection model. During training, when the model reads the training set, whether an aerial image contains a vehicle target is determined from the label data, and aerial images in which less than half of a vehicle target is visible are automatically removed from the training set to avoid interfering with model training.
And S4, testing the improved Yolov 5-based aerial vehicle rotating target detection model after training, and obtaining a test result.
In some embodiments, the test set may be input into a trained improved YOLOv 5-based aerial vehicle rotation target detection model for testing, resulting in test results.
As an example, the test set from the second step of step S2 may be input into the aerial vehicle rotating target detection model trained in step S3 for detection; the obtained detection results constitute the test result of the trained improved-YOLOv5-based aerial vehicle rotating target detection model.
And S5, evaluating whether the performance of the trained aerial vehicle rotation target detection model based on the improved YOLOv5 meets the actual application requirement or not according to the test result.
In some embodiments, the trained improved YOLOv 5-based aerial vehicle rotation target detection model may be evaluated based on test results using accuracy, recall, and average accuracy.
As an example, the aerial vehicle rotating target detection model may be evaluated with Precision, Recall and mean average precision (mAP) according to the test result of step S4. Recall indicates how many of the positive examples in the samples are predicted correctly ("find all"), i.e. the proportion of all positive examples that are found. Precision indicates how many of the samples predicted as positive are truly positive ("find pairs"), i.e. the proportion of true positives among the predictions. In the field of target detection, mAP is a very important measure of a detection algorithm's performance; it is obtained by averaging the average precision (AP) of all class detections.
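The three metrics can be sketched as follows. The AP routine uses the non-interpolated (all-point) form, which is one common convention; the class names and counts in the demo are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Precision: share of predictions that are true positives ("find pairs").
    Recall: share of ground-truth positives that are found ("find all")."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def average_precision(scores, labels):
    """Non-interpolated AP: walk detections in descending confidence and
    accumulate precision * delta-recall; labels[i] is 1 for a correct match."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    n_pos = sum(labels)
    ap = prev_recall = 0.0
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        recall = tp / n_pos
        ap += (tp / (tp + fp)) * (recall - prev_recall)
        prev_recall = recall
    return ap

def mean_ap(ap_per_class):
    # mAP: the mean of the per-class AP values
    return sum(ap_per_class.values()) / len(ap_per_class)

p, r = precision_recall(tp=80, fp=20, fn=40)
ap = average_precision([0.9, 0.8, 0.3], [1, 1, 0])  # perfectly ranked list
print(p, r, ap)
```

A perfectly ranked detection list (all true positives scored above all false positives, every ground truth found) yields AP = 1.0, which is a useful sanity check for any AP implementation.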
And S6, if the performance of the trained aerial vehicle rotating target detection model based on the improved YOLOv5 meets the actual application requirements, using the trained aerial vehicle rotating target detection model based on the improved YOLOv5 for aerial vehicle rotating target detection.
In some embodiments, if the evaluation result of the aerial vehicle rotation target detection model obtained in step S5 meets the actual requirement, the aerial vehicle rotation target detection model based on the improved YOLOv5 is applied to the actual aerial rotation vehicle target detection.
And S7, if the performance of the aerial vehicle rotating target detection model based on the improved YOLOv5 after training does not meet the actual application requirement, correcting parameters of the aerial vehicle rotating target detection model based on the improved YOLOv5, and repeating training on the aerial vehicle rotating target detection model based on the improved YOLOv5 until the performance of the aerial vehicle rotating target detection model based on the improved YOLOv5 meets the actual application requirement, and using the trained aerial vehicle rotating target detection model based on the improved YOLOv5 for detecting the aerial vehicle rotating target.
In some embodiments, if the evaluation result obtained in step S5 on the aerial vehicle rotation target detection model does not meet the actual requirement, the parameters of the model constructed in step S1 are corrected and then the process goes to step S3 to retrain.
The recognition results of the present invention are shown in Figs. 4, 5, 6 and 7. In summary, an angle classification loss is added: the CSL technique converts the angle regression problem into an angle classification problem, and the model network is trained with the bounding box loss function, confidence loss function, class loss function and angle classification loss function. First, the Diamond Mapping Unit module is designed in the shallow Backbone network; compared with the original shallow network design, it minimizes the information loss of the original input image, improves the feature extraction process of the shallow network, represents target features effectively and improves vehicle detection performance. The adaptive spatial feature fusion ASFF mechanism is introduced into the Head network to construct the ASFF-Head network; this mechanism learns the relations between feature maps of different scales, so that, compared with the feature maps generated directly by the original Head network, every spatial position of a feature map can adaptively fuse feature information from different levels. Finally, a lightweight network structure is constructed in the Backbone and Neck networks using the Ghost mapping technique: instead of generating the full feature map directly with ordinary convolution (Conv), similar feature maps are generated cheaply from a subset of the feature maps, thereby compressing the number of network parameters and optimizing the gradient update process.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention and are intended to be included within the scope of the invention.

Claims (10)

1. An aerial vehicle rotating target detection method based on improved YOLOv5 is characterized by comprising the following steps of:
constructing an aerial vehicle rotation target detection model based on improved YOLOv 5;
establishing an aerial vehicle rotating target detection data set and preprocessing to obtain an aerial vehicle rotating target detection model data set based on improved YOLOv 5;
training the constructed aerial vehicle rotating target detection model based on the improved YOLOv5 by adopting a dataset of the aerial vehicle rotating target detection model based on the improved YOLOv 5;
testing an aerial vehicle rotation target detection model based on improved YOLOv5 after training is completed, and obtaining a test result;
according to the test result, evaluating whether the performance of the trained aerial vehicle rotation target detection model based on the improved YOLOv5 meets the actual application requirement;
if the performance of the trained aerial vehicle rotating target detection model based on the improved YOLOv5 meets the actual application requirements, using the trained aerial vehicle rotating target detection model based on the improved YOLOv5 for aerial vehicle rotating target detection;
if the performance of the aerial vehicle rotating target detection model based on the improved YOLOv5 after training does not meet the actual application requirement, correcting parameters of the aerial vehicle rotating target detection model based on the improved YOLOv5, repeating training on the aerial vehicle rotating target detection model based on the improved YOLOv5 until the performance of the aerial vehicle rotating target detection model based on the improved YOLOv5 meets the actual application requirement, and using the trained aerial vehicle rotating target detection model based on the improved YOLOv5 for detecting the aerial vehicle rotating target.
2. The method for detecting the rotation target of the aerial vehicle based on the improved YOLOv5 according to claim 1, wherein the constructing the detection model of the rotation target of the aerial vehicle based on the improved YOLOv5 comprises:
constructing a Focus module, wherein the Focus module performs slicing operation on an input image;
designing a Diamond Mapping Unit module in the Backbone shallow network of the improved-YOLOv5-based aerial vehicle rotation target detection model, wherein the Diamond Mapping Unit module is used for improving the feature extraction process of the shallow network;
constructing a Cheap Backbone Network and a Cheap Neck Network by applying a Ghost mapping technique to the Backbone network and the Neck network of the improved-YOLOv5-based aerial vehicle rotation target detection model, wherein a Cheap CSP module is a key component;
introducing an adaptive space feature fusion ASFF mechanism into a Head network based on an improved YOLOv5 aerial vehicle rotation target detection model to form an ASFF-Head network, wherein the ASFF mechanism carries out adaptive space fusion on feature graphs from Cheap Backbone Network and Cheap Neck Network to regenerate three feature graphs with different scales;
and adding an angle class prediction branch into an ASFF-Head network based on an aerial vehicle rotation target detection model of improved YOLOv5, and calculating bounding box loss, confidence coefficient loss, class loss and angle class loss for the obtained feature graphs with different scales so as to complete prediction.
3. The method for detecting the rotational target of the aerial vehicle based on the improved YOLOv5 according to claim 1, wherein the steps of establishing an aerial vehicle rotational target detection data set and preprocessing the data set to obtain a data set of an aerial vehicle rotational target detection model based on the improved YOLOv5 comprise the following steps:
shooting urban roads under different weather conditions by adopting an unmanned aerial vehicle platform, preprocessing the acquired vehicle images by flipping and cropping, annotating the preprocessed vehicle images with the rotated-annotation tool roLabelImg, and producing label files, wherein the dataset obtained by shooting urban roads under different weather conditions with the unmanned aerial vehicle platform is the aerial vehicle rotating target detection dataset;
and randomly selecting a vehicle image with a preset duty ratio and a corresponding tag file in the preprocessed vehicle image as a training set, and taking the rest vehicle image and the corresponding tag file in the preprocessed vehicle image as a test set, wherein the training set and the test set form a data set of an aerial vehicle rotating target detection model based on improved YOLOv 5.
4. A method for detecting an aerial vehicle rotational target based on modified YOLOv5 as claimed in claim 3, wherein training the constructed aerial vehicle rotational target detection model based on modified YOLOv5 using the dataset of the aerial vehicle rotational target detection model based on modified YOLOv5 comprises:
initializing training parameters of an aerial vehicle rotating target detection model based on improved YOLOv 5;
inputting the training set into a constructed aerial vehicle rotating target detection model based on improved YOLOv5, and performing iterative training on the rotating vehicle target detection model by using a bounding box loss function, a confidence loss function, a category loss function and an angle classification loss function to obtain the trained aerial vehicle rotating target detection model based on improved YOLOv 5.
5. The method for detecting an aerial vehicle rotating target based on improved YOLOv5 of claim 4, wherein the test training is completed based on an aerial vehicle rotating target detection model of improved YOLOv5, and the method comprises the following steps:
and inputting the test set into a trained aerial vehicle rotation target detection model based on improved YOLOv5 for testing, and obtaining a test result.
6. The method for detecting an aerial vehicle rotation target based on improved YOLOv5 of claim 5, wherein the step of evaluating whether the performance of the trained aerial vehicle rotation target detection model based on improved YOLOv5 meets the actual application requirements according to the test result comprises:
and according to the test result, evaluating the trained aerial vehicle rotation target detection model based on the improved YOLOv5 by using the accuracy, the recall and the average accuracy.
7. The improved YOLOv 5-based method of claim 2, wherein the Diamond Mapping Unit module performs feature extraction through two branches, wherein the first branch consists of a max pooling operation and a 1×1 convolution operation, and the second branch consists of a 1×1 convolution and a 3×3 convolution.
8. The improved YOLOv 5-based aerial vehicle rotation target detection method of claim 2, wherein the Cheap CSP module consists of three convolutions and a GhostBottleneck module, wherein the GhostBottleneck module is formed by stacking two Ghost modules.
9. The method for detecting an aerial vehicle rotation target based on improved YOLOv5 of claim 4, wherein the bounding box loss function CIoU is:
CIoU = 1 − IoU + ρ²(b, b_gt)/c² + αv

v = (4/π²)·(arctan(w_gt/h_gt) − arctan(w/h))²

α = v / ((1 − IoU) + v)

wherein IoU is the intersection-over-union of the predicted box and the real box, b denotes the centre of the predicted box, b_gt denotes the centre of the real target bounding box, ρ²(b, b_gt) is the squared distance between the two centre points, c is the diagonal length of the smallest box enclosing both, v measures aspect-ratio consistency and α is its trade-off weight, w and h represent the width and height of the predicted box, w_gt and h_gt represent the width and height of the real box respectively, π is the radian of 180°, and arctan is the arctangent function;

the confidence loss function l_obj is:

l_obj = − Σ_{i=0}^{S²} Σ_{j=0}^{B} I_ij^obj [Ĉ_i ln(C_i) + (1 − Ĉ_i) ln(1 − C_i)] − λ_noobj Σ_{i=0}^{S²} Σ_{j=0}^{B} I_ij^noobj [Ĉ_i ln(C_i) + (1 − Ĉ_i) ln(1 − C_i)]

wherein S² represents the number of grid cells, B represents the number of anchors in each grid cell, I_ij^obj indicates whether there is a target in the j-th anchor of the i-th grid cell (1 if a target is present, 0 otherwise), I_ij^noobj is its complement, Ĉ_i represents the true value, C_i represents the predicted value, and λ_noobj represents a constraint coefficient;

the class loss function l_cls is:

l_cls = − Σ_{i=0}^{S²} I_ij^obj Σ_{c∈classes} [P̂_i(c) ln(P_i(c)) + (1 − P̂_i(c)) ln(1 − P_i(c))]

wherein S² represents the number of grid cells, I_ij^obj indicates whether there is a target in the j-th anchor of the i-th grid cell (1 if a target is present, 0 otherwise), P̂_i(c) represents the true class probability, and P_i(c) represents the predicted class probability;

the angle classification loss function loss(z, y) is:

loss(z, y) = mean{l_0, …, l_(N−1)},

l_n = sum{l_(n,0), l_(n,1), …, l_(n,179)},

l_(n,a) = −[y_(n,a) ln(δ(z_(n,a))) + (1 − y_(n,a)) ln(1 − δ(z_(n,a)))],

y_(n,a) = CSL(x),

CSL(x) = g(x) if θ − r < x < θ + r, and 0 otherwise,

wherein g(x) is the window function, r is the radius of the window function, and θ is the angle of the bounding box; N is the number of samples; a ∈ [0, 180), for a total of 180 categories; δ is the sigmoid function; z_(n,a) is the predicted probability of the n-th sample at the a-th angle, with maximum value 1, i.e. the predicted value; y_(n,a) is the label of the n-th sample at the a-th angle under the CSL(x) expression, i.e. the true value, with maximum value 1; for each angle of the n-th sample, the predicted value and the true value are passed through l_(n,a) in turn, the 180 angle results of the n-th sample are summed to obtain l_n, and the calculation results of the N samples are averaged to obtain the angle loss.
10. The improved YOLOv 5-based aerial vehicle rotation target detection method of claim 2, wherein the ASFF mechanism performs adaptive spatial fusion on feature maps from Cheap Backbone Network and Cheap Neck Network, and regenerates the formulas corresponding to the three feature maps with different scales as follows:
ASFF_k = P_(1→k)·α_k + P_(2→k)·β_k + P_(3→k)·γ_k

α_k + β_k + γ_k = 1

wherein k = 1, 2, 3; ASFF_k is the feature map output after fusion at the k-th layer; α_k, β_k, γ_k denote the weight matrices of the k-th layer, and α_k + β_k + γ_k = 1 is a constraint; P_(1→k), P_(2→k), P_(3→k) represent the feature maps obtained by adjusting the size and channel number of the other two layers based on the k-th layer.
CN202310347944.5A 2023-03-31 2023-03-31 Improved YOLOv 5-based aerial vehicle rotating target detection method Pending CN116363610A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310347944.5A CN116363610A (en) 2023-03-31 2023-03-31 Improved YOLOv 5-based aerial vehicle rotating target detection method

Publications (1)

Publication Number Publication Date
CN116363610A true CN116363610A (en) 2023-06-30

Family

ID=86920641

Country Status (1)

Country Link
CN (1) CN116363610A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117095412A (en) * 2023-10-19 2023-11-21 四川泓宝润业工程技术有限公司 Natural gas digital instrument character detection and recognition method, device and storage medium
CN117095412B (en) * 2023-10-19 2023-12-15 四川泓宝润业工程技术有限公司 Natural gas digital instrument character detection and recognition method, device and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination