CN111985451A - Unmanned aerial vehicle scene detection method based on YOLOv4 - Google Patents

Unmanned aerial vehicle scene detection method based on YOLOv4

Info

Publication number
CN111985451A
CN111985451A
Authority
CN
China
Prior art keywords
unmanned aerial
aerial vehicle
training
scene detection
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010921511.2A
Other languages
Chinese (zh)
Inventor
韩玉洁
曹杰
万思钰
刘琨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202010921511.2A priority Critical patent/CN111985451A/en
Publication of CN111985451A publication Critical patent/CN111985451A/en
Withdrawn legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an unmanned aerial vehicle scene detection method based on YOLOv4, which comprises the following steps: S1, establishing a proprietary data set and dividing it into a training set and a test set in a fixed proportion; S2, establishing a network structure; S3, using a pre-training model and setting its specific training parameters to obtain a training model; S4, carrying out iterative training until the loss function converges, to obtain an unmanned aerial vehicle scene detection model; S5, testing the model with the test set and judging whether it meets the requirements; if not, returning to step S4 until the test result meets the requirements; S6, outputting an unmanned aerial vehicle scene detection model that meets the requirements; and S7, carrying out target detection on sequence images with the unmanned aerial vehicle scene detection model. Compared with the prior art, the improved model occupies little memory, and improves the average intersection-over-union (IoU) by 5.26%, the accuracy by 3.30%, and the recall rate by 1.08%.

Description

Unmanned aerial vehicle scene detection method based on YOLOv4
Technical Field
The invention relates to the field of unmanned aerial vehicles, in particular to an unmanned aerial vehicle scene detection method based on YOLOv4.
Background
An unmanned aerial vehicle is small and flexible and has a wide viewing angle; in recent years it has been widely applied in agricultural plant protection, disaster detection, security protection, aerial video and other fields. Deep learning schemes have greatly advanced the accuracy and speed of target recognition, but most existing work assumes ground-level viewing angles, whereas targets in unmanned aerial vehicle images vary in scale, are small in size and low in resolution, so existing models cannot be applied directly to target recognition in unmanned aerial vehicle images. Two-stage target detection frameworks represented by Faster R-CNN require more hardware resources and run slowly, and are therefore unsuitable for real-time scenes.
In order to solve the problems of numerous small targets, low pixel counts, multiple scales, the limited resources of the unmanned aerial vehicle hardware platform and the high real-time requirement in unmanned aerial vehicle images, the invention improves and trains a landing-scene multi-target recognition model for unmanned aerial vehicle images based on the YOLOv4 network.
Disclosure of Invention
In view of this, the present invention provides an unmanned aerial vehicle scene detection method based on YOLOv4, which mainly solves the following problem: existing target recognition models detect poorly the variable-scale, small, low-resolution targets in unmanned aerial vehicle images, making it difficult to judge the scene in which the unmanned aerial vehicle is located.
In order to achieve the above purpose, the invention provides the following technical scheme:
the invention discloses an unmanned aerial vehicle scene detection method based on YOLOv4, which comprises the following steps:
s1, establishing a proprietary data set; dividing a training set and a test set according to a certain proportion;
s2, establishing a network structure, wherein the network structure is based on an improved YOLOv4 network, with CSPdarknet53 as the backbone network, a spatial pyramid pooling module and a path aggregation network module as the neck, and the YOLOv3 head as the prediction output;
s3, firstly, training the network structure obtained in the step S2 by using an ImageNet large-scale data set to obtain a pre-training model, and then setting specific training parameters for the network structure;
s4, carrying out iterative training on the pre-training model by using a training set until a loss function is converged to obtain an unmanned aerial vehicle scene detection model;
s5, testing the unmanned aerial vehicle scene detection model by using the test set, judging whether the unmanned aerial vehicle scene detection model meets the requirements, if not, continuing to perform the step S4, and continuing to perform iterative training until the test result meets the requirements;
s6, outputting an unmanned aerial vehicle scene detection model meeting the requirements;
and S7, carrying out target detection on the sequence images by using the unmanned aerial vehicle scene detection model meeting the requirements in the step S6, and identifying the scene where the unmanned aerial vehicle is located.
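Steps S4 to S6 above form a train-until-converged, test-until-qualified loop; a minimal Python sketch of that control flow is shown below (all function arguments are hypothetical stand-ins, not part of the patent; the thresholds follow the 0.5 loss convergence value and the 93.33% mAP@0.5 requirement stated later):

```python
def build_detection_model(pretrained_model, train_step, loss, evaluate,
                          loss_threshold=0.5, map_threshold=0.9333):
    """S4-S6: iterate until the loss converges, test on the test set,
    and resume training whenever the test result falls short."""
    model = pretrained_model
    while True:
        # S4: iterative training until the loss function converges
        while loss(model) > loss_threshold:
            model = train_step(model)
        # S5: test; if mAP@0.5 meets the requirement, output the model (S6)
        if evaluate(model) >= map_threshold:
            return model
        # otherwise continue training (back to step S4)
        model = train_step(model)
```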
Further, in the step S1, the creating of the proprietary data set includes the following steps:
s1.1, acquiring basic data samples, wherein the basic data samples comprise: pictures captured from videos shot by an unmanned aerial vehicle, and pictures taken from aerial data sets on the network; the pictures comprise: pictures containing at least one of the six targets of automobile, ship, playground, basketball court, bridge and port, and pictures containing none of these six targets;
s1.2, labeling the targets in the basic data samples and processing the labels into the format required by the YOLO network, wherein each label comprises: category, center point abscissa, center point ordinate, target width and target height;
s1.3, expanding the pictures which are subjected to the tags and the pictures which are not subjected to the tags by adopting a data enhancement method to obtain a proprietary data set, wherein the proprietary data set comprises 1000 pictures.
Further, in step S1, the ratio of the training set to the test set is: 9:1.
Further, in the step S3, the specific training parameters are: the batch size is 64, each image is 608x608, the batch subdivision is 16, the maximum number of training batches is 20000, and the initial learning rate is 0.0013.
Further, in the step S4, the convergence value of the loss function is 0.5.
Further, in the step S5, meeting the requirements means performing performance evaluation on the unmanned aerial vehicle scene detection model, with mAP@0.5 of 93.33% or more.
The invention has the beneficial effects that:
the unmanned aerial vehicle scene detection model provided by the invention occupies less memory than an RCNN series network, only occupies 2G video memory, and can be used for an unmanned aerial vehicle airborne hardware platform and other devices with insufficient hardware resources; the improved YOLOV4 is improved by 5.26% compared with the average intersection of the original network, the accuracy is improved by 3.30% compared with the original network, and the recall rate is improved by 1.08%.
Drawings
Fig. 1 is a flowchart of the unmanned aerial vehicle scene detection method based on YOLOv4.
Fig. 2 is a YOLOv4 network framework diagram.
Fig. 3 is a detailed structure diagram of the YOLOv4 network.
Fig. 4 is a graph of the loss function as a function of the number of training iterations.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention discloses an unmanned aerial vehicle scene detection method based on YOLOv4; compared with existing models, the unmanned aerial vehicle scene detection model provided by the invention occupies little memory, using only 2 GB of video memory, and its overall performance is markedly improved after training.
Example 1
Referring to fig. 1, fig. 2 and fig. 3, embodiment 1 discloses an unmanned aerial vehicle scene detection method based on YOLOv4, including:
s1, establishing a proprietary data set; dividing a training set and a test set according to a ratio of 9: 1;
the data of the proprietary data set is from basic data samples, which are obtained through two channels, the first channel being: intercepting pictures formed by videos shot by an unmanned aerial vehicle, wherein the second channel is the pictures in aerial data sets on a network; the pictures are classified into two types, wherein the first type of pictures contain at least one of six targets of automobiles, ships, playgrounds, basketball courts, bridges and ports, and the second type of pictures do not contain six targets of automobiles, ships, playgrounds, basketball courts, bridges and ports.
In this embodiment, the targets in the pictures may be tagged with the labelImg tagging tool or another tagging tool with the same function. The labelImg tool outputs xml files, and since the label files for YOLO training are in txt format, the xml files need to be converted into txt format, either manually or in batch by a tool; each label comprises: category, center point abscissa, center point ordinate, target width and target height.
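The labelImg-to-YOLO conversion described above can be sketched as follows. The Pascal VOC field names are standard labelImg output; the class list and its ordering are a hypothetical example, since the patent does not fix one:

```python
import xml.etree.ElementTree as ET

# hypothetical class ordering for the six targets named in the patent
CLASSES = ["car", "ship", "playground", "basketball_court", "bridge", "port"]

def voc_to_yolo(xml_text):
    """Convert one labelImg (Pascal VOC) annotation into YOLO txt lines:
    class_id, normalized center x, center y, box width, box height."""
    root = ET.fromstring(xml_text)
    w = float(root.find("size/width").text)
    h = float(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        cls = CLASSES.index(obj.find("name").text)
        b = obj.find("bndbox")
        xmin, ymin = float(b.find("xmin").text), float(b.find("ymin").text)
        xmax, ymax = float(b.find("xmax").text), float(b.find("ymax").text)
        cx, cy = (xmin + xmax) / 2 / w, (ymin + ymax) / 2 / h
        bw, bh = (xmax - xmin) / w, (ymax - ymin) / h
        lines.append(f"{cls} {cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}")
    return lines
```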
And finally, expanding the pictures which are subjected to the tags and the pictures which are not subjected to the tags by adopting a data enhancement method to obtain a proprietary data set, wherein the proprietary data set comprises 1000 pictures.
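The 9:1 division of the 1000-picture proprietary data set can be sketched as below (the fixed seed and the helper name are illustrative choices, not specified by the patent):

```python
import random

def split_dataset(image_paths, train_ratio=0.9, seed=42):
    """Shuffle and divide the proprietary data set 9:1 into train/test."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)  # fixed seed for a reproducible split
    cut = int(len(paths) * train_ratio)
    return paths[:cut], paths[cut:]
```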
S2, establishing a network structure, wherein the network structure is based on an improved YOLOv4 network, with CSPdarknet53 as the backbone network, a spatial pyramid pooling module and a path aggregation network module as the neck, and the YOLOv3 head as the prediction output; fig. 2 and fig. 3 show frame diagrams of the network structure;
the multi-Channel (CSP) only directly performs convolution operation on one part of the feature map, and the convolution result is combined with the other part of the original features, so that the learning capability of the neural network can be enhanced, the accuracy can be kept while the weight is reduced, the calculation bottleneck can be reduced, and the memory cost can be reduced.
The spatial pyramid pooling module integrates the spatial-pyramid feature-matching method into the convolutional neural network, and can generate output of a fixed size regardless of the size of the input image.
The path aggregation network module performs up-sampling followed by down-sampling, improving the flow of information through the neural network and enhancing the feature pyramid.
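Of the three neck modules, the pooling step is the easiest to sketch. Below is a numpy version of the stride-1 multi-kernel max pooling used by the YOLOv4-style SPP block, which pools at several scales and concatenates along the channel axis; the kernel sizes 5, 9 and 13 are the reference YOLOv4 values, assumed here because the patent does not state them:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def maxpool2d_same(x, k):
    """Stride-1 max pooling with 'same' padding over a (C, H, W) map."""
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)), constant_values=-np.inf)
    win = sliding_window_view(xp, (k, k), axis=(1, 2))  # (C, H, W, k, k)
    return win.max(axis=(-2, -1))

def spp(x, kernels=(5, 9, 13)):
    """YOLOv4-style SPP: concatenate the input map with its pooled
    versions along the channel axis, quadrupling the channel count."""
    return np.concatenate([x] + [maxpool2d_same(x, k) for k in kernels], axis=0)
```

With a 19x19 feature map (the 608-input YOLOv4 grid size), the spatial size is preserved and the channel count is multiplied by four.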
S3, firstly, training the network structure obtained in the step S2 with the ImageNet large-scale data set to obtain a pre-training model that contains initialized feature parameters for a wide range of objects, and then setting specific training parameters for the network structure. Specifically, the training parameters are: the number of pictures sent to the network per batch is 64, each picture is 608x608, the batch subdivision is 16 (to reduce video-memory use), the maximum number of training batches is 20000, and the initial learning rate is 0.0013.
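In the Darknet framework these training parameters sit at the top of the network .cfg file; a fragment matching the values above might look as follows (other fields omitted):

```ini
[net]
batch=64            # pictures sent to the network per batch
subdivisions=16     # split each batch to reduce video-memory use
width=608
height=608
max_batches=20000   # maximum number of training batches
learning_rate=0.0013
```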
S4, inputting the training set into the pre-training model for iterative training until the loss function converges, the convergence value of the loss function being 0.5, to obtain the unmanned aerial vehicle scene detection model. The shape of the training loss curve shown in fig. 4 indicates that the learning rate is set reasonably; training occupies 2 GB of video memory, saving computing resources compared with two-stage target detection frameworks.
The overall performance indexes of the model obtained after training are shown in Table 1: compared with the original network, the average intersection-over-union is improved by 5.26%, the accuracy by 3.30%, and the recall rate by 1.08%, and mAP@0.5 reaches 93%.
TABLE 1
Model             Accuracy  Recall  Average IoU  mAP@0.5
YOLOv4            0.91      0.93    0.76         0.89
Improved YOLOv4   0.94      0.94    0.80         0.93
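The percentage gains quoted in the text are relative improvements over the Table 1 baseline; a quick arithmetic check, together with the intersection-over-union measure the table reports (helper names are illustrative):

```python
def relative_gain(new, old):
    """Relative improvement in percent, as quoted in the text."""
    return round((new - old) / old * 100, 2)

def iou(box_a, box_b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Table 1: accuracy 0.91 -> 0.94, recall 0.93 -> 0.94, average IoU 0.76 -> 0.80
print(relative_gain(0.94, 0.91))  # 3.3
print(relative_gain(0.94, 0.93))  # 1.08
print(relative_gain(0.80, 0.76))  # 5.26
```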
S5, testing the unmanned aerial vehicle scene detection model with the test set and judging whether it meets the requirements; if not, returning to step S4 and continuing iterative training until the test result meets the requirements. Specifically, the requirement is that mAP@0.5 is 93.33% or more.
And S6, outputting the unmanned aerial vehicle scene detection model meeting the requirements.
S7, carrying out target detection on sequence images with the unmanned aerial vehicle scene detection model that meets the requirements in step S6. Specifically, after the unmanned aerial vehicle transmits the video shot by its airborne camera to the ground station, the ground station detects the images in the sequence and identifies the scene in which the unmanned aerial vehicle is located.
Matters not described in detail in the present invention are well known to those skilled in the art.
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.

Claims (6)

1. A method for unmanned aerial vehicle scene detection based on YOLOv4 is characterized by comprising the following steps:
s1, establishing a proprietary data set; dividing a training set and a test set according to a certain proportion;
s2, establishing a network structure, wherein the network structure is based on an improved YOLOv4 network, with CSPdarknet53 as the backbone network, a spatial pyramid pooling module and a path aggregation network module as the neck, and the YOLOv3 head as the prediction output;
s3, firstly, training the network structure obtained in the step S2 by using an ImageNet large-scale data set to obtain a pre-training model, and then setting specific training parameters for the network structure;
s4, carrying out iterative training on the pre-training model by using a training set until a loss function is converged to obtain an unmanned aerial vehicle scene detection model;
s5, testing the unmanned aerial vehicle scene detection model by using the test set, judging whether the unmanned aerial vehicle scene detection model meets the requirements, if not, continuing to perform the step S4, and continuing to perform iterative training until the test result meets the requirements;
s6, outputting an unmanned aerial vehicle scene detection model meeting the requirements;
and S7, carrying out target detection on the sequence images by using the unmanned aerial vehicle scene detection model meeting the requirements in the step S6, and identifying the scene where the unmanned aerial vehicle is located.
2. The unmanned aerial vehicle scene detection method based on YOLOv4 of claim 1, wherein in the step S1, establishing the proprietary data set comprises the following steps:
s1.1, acquiring basic data samples, wherein the basic data samples comprise: intercepting a picture formed by shooting a video by an unmanned aerial vehicle, and intercepting a picture in an aerial data set on a network;
the picture comprises: pictures containing six targets of an automobile, a ship, an playground, a basketball court, a bridge and a port, and pictures containing six targets of the automobile, the ship, the playground, the basketball court, the bridge and the port;
s1.2, labeling the targets of the pictures in the basic data samples and processing the labels into the format required by the YOLO network, wherein each label comprises: category, center point abscissa, center point ordinate, target width and target height;
s1.3, expanding the pictures which are subjected to the tags and the pictures which are not subjected to the tags by adopting a data enhancement method to obtain a proprietary data set, wherein the proprietary data set comprises 1000 pictures.
3. The method of claim 1, wherein in step S1, the ratio of the training set to the test set is: 9:1.
4. The method of claim 1, wherein the specific training parameters in step S3 are: the batch size is 64, each image is 608x608, the batch subdivision is 16, the maximum number of training batches is 20000, and the initial learning rate is 0.0013.
5. The method of claim 1, wherein in step S4, the convergence value of the loss function is 0.5.
6. The method of claim 1, wherein in step S5, meeting the requirements means performing performance evaluation on the unmanned aerial vehicle scene detection model, with mAP@0.5 of 93.33% or more.
CN202010921511.2A 2020-09-04 2020-09-04 Unmanned aerial vehicle scene detection method based on YOLOv4 Withdrawn CN111985451A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010921511.2A CN111985451A (en) 2020-09-04 2020-09-04 Unmanned aerial vehicle scene detection method based on YOLOv4

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010921511.2A CN111985451A (en) 2020-09-04 2020-09-04 Unmanned aerial vehicle scene detection method based on YOLOv4

Publications (1)

Publication Number Publication Date
CN111985451A true CN111985451A (en) 2020-11-24

Family

ID=73447540

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010921511.2A Withdrawn CN111985451A (en) 2020-09-04 2020-09-04 Unmanned aerial vehicle scene detection method based on YOLOv4

Country Status (1)

Country Link
CN (1) CN111985451A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112287899A (en) * 2020-11-26 2021-01-29 山东捷讯通信技术有限公司 Unmanned aerial vehicle aerial image river drain detection method and system based on YOLO V5
CN112465794A (en) * 2020-12-10 2021-03-09 无锡卡尔曼导航技术有限公司 Golf ball detection method based on YOLOv4 and embedded platform
CN112508076A (en) * 2020-12-02 2021-03-16 国网江西省电力有限公司建设分公司 Intelligent identification method and system for abnormal state of power engineering
CN112561996A (en) * 2020-12-08 2021-03-26 江苏科技大学 Target detection method in autonomous underwater robot recovery docking
CN113158962A (en) * 2021-05-06 2021-07-23 北京工业大学 Swimming pool drowning detection method based on YOLOv4
CN113160219A (en) * 2021-05-12 2021-07-23 北京交通大学 Real-time railway scene analysis method for unmanned aerial vehicle remote sensing image
CN113361347A (en) * 2021-05-25 2021-09-07 东南大学成贤学院 Job site safety detection method based on YOLO algorithm
CN113420607A (en) * 2021-05-31 2021-09-21 西南电子技术研究所(中国电子科技集团公司第十研究所) Multi-scale target detection and identification method for unmanned aerial vehicle
CN113657261A (en) * 2021-08-16 2021-11-16 成都民航空管科技发展有限公司 Real-time target detection method and device based on airport remote tower panoramic video
CN113702393A (en) * 2021-09-29 2021-11-26 安徽理工大学 Intrinsic safety type mining conveyor belt surface damage detection system and detection method
WO2022267686A1 (en) * 2021-06-24 2022-12-29 广州汽车集团股份有限公司 Adaptive processing method, apparatus and system for automatic driving and new scene
JP2023527615A (en) * 2021-04-28 2023-06-30 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Target object detection model training method, target object detection method, device, electronic device, storage medium and computer program


Similar Documents

Publication Publication Date Title
CN111985451A (en) Unmanned aerial vehicle scene detection method based on YOLOv4
CN110532878B (en) Driver behavior identification method based on lightweight convolutional neural network
CN112380921A (en) Road detection method based on Internet of vehicles
CN114202672A (en) Small target detection method based on attention mechanism
CN111461083A (en) Rapid vehicle detection method based on deep learning
CN113723377B (en) Traffic sign detection method based on LD-SSD network
CN109376580B (en) Electric power tower component identification method based on deep learning
CN111680705B (en) MB-SSD method and MB-SSD feature extraction network suitable for target detection
CN112307853A (en) Detection method of aerial image, storage medium and electronic device
US20220114396A1 (en) Methods, apparatuses, electronic devices and storage media for controlling image acquisition
CN111597920A (en) Full convolution single-stage human body example segmentation method in natural scene
CN117671509A (en) Remote sensing target detection method and device, electronic equipment and storage medium
CN111553337A (en) Hyperspectral multi-target detection method based on improved anchor frame
CN117789077A (en) Method for predicting people and vehicles for video structuring in general scene
CN117437615A (en) Foggy day traffic sign detection method and device, storage medium and electronic equipment
CN112364864A (en) License plate recognition method and device, electronic equipment and storage medium
CN112115737B (en) Vehicle orientation determining method and device and vehicle-mounted terminal
CN116682085A (en) Lane line detection method based on geometric feature extraction and position information coding
CN115690770A (en) License plate recognition method based on space attention characteristics in non-limited scene
CN111104965A (en) Vehicle target identification method and device
CN115965831A (en) Vehicle detection model training method and vehicle detection method
CN113947723B (en) High-resolution remote sensing scene target detection method based on size balance FCOS
CN112686147B (en) Vehicle and wheel subordinate relation prediction method, system, storage medium and terminal
CN113610838A (en) Bolt defect data set expansion method
CN113011268A (en) Intelligent vehicle navigation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20201124

WW01 Invention patent application withdrawn after publication