CN115937636A - Traffic target detection method for unmanned driving based on deep learning - Google Patents


Info

Publication number
CN115937636A
CN115937636A
Authority
CN
China
Prior art keywords
target detection
model
data set
module
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211703954.XA
Other languages
Chinese (zh)
Inventor
Zhu Yongjian (朱勇建)
Li Changxu (李长旭)
Wang Dong (王栋)
Zhang Yu (张裕)
Wang Jiayu (王嘉钰)
Liu Yunxiang (刘云翔)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Institute of Technology
Original Assignee
Shanghai Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Institute of Technology filed Critical Shanghai Institute of Technology
Priority to CN202211703954.XA priority Critical patent/CN115937636A/en
Publication of CN115937636A publication Critical patent/CN115937636A/en
Pending legal-status Critical Current

Classifications

    • Y: General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02: Technologies or applications for mitigation or adaptation against climate change
    • Y02T: Climate change mitigation technologies related to transportation
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a deep-learning-based traffic target detection method for unmanned driving. Starting from the mature YOLOv5 model, an ACmix module, which fuses convolution with a self-attention mechanism, is inserted in front of the SPP module, and a multi-scale target detection layer is added. The BDD100K data set is downloaded and processed to construct the training, validation and test sets, which are then fed to the improved YOLOv5-based traffic target detection model for training, testing and evaluation. In the model construction stage, the introduced ACmix module makes target feature extraction more effective. In the training stage, images containing no traffic targets are deleted from the data set, preventing them from interfering with training and accelerating network convergence. In the evaluation stage, the accuracy and speed of the model are tuned by adjusting its width and depth to meet the requirements of practical application.

Description

Traffic target detection method for unmanned driving based on deep learning
Technical Field
The invention relates to a traffic target detection method for unmanned driving based on deep learning.
Background
With the continuous development of automatic driving technology, target detection methods are receiving more and more attention. Because real driving roads are complex and diverse, fast and accurate target detection plays an important role in automatic driving. In a road environment, images captured by the camera have cluttered backgrounds, traffic targets of widely varying sizes, and problems such as dynamic objects and occlusion. Deep-learning-based target detection is one of the most important research directions in computer vision. With the development of artificial intelligence and the continuous upgrading and iteration of computer hardware, target detection has gradually evolved from traditional feature-extraction methods toward deep-learning techniques. Among deep-learning detectors, single-stage models represented by YOLO are fast, reasonably accurate, compact and easy to improve, but their detection precision on small, low-resolution targets is low, and missed and false detections occur easily.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention provides a traffic target detection method for unmanned driving based on deep learning.
In order to achieve the above purpose, the technical solution for solving the technical problem is as follows:
a traffic target detection method for unmanned driving based on deep learning comprises the following steps:
S1: downloading and processing a BDD100K data set, including a training data set and a test data set;
S2: adding a multi-scale target detection layer to the model;
S3: introducing an ACmix module into the YOLOv5 network model to enhance the feature expression and learning capacity of the model and reduce the model computation overhead.
Further, S1 includes the following: converting the downloaded BDD100K data set from json format to the txt format used by YOLOv5, and constructing the traffic target detection training and test data sets from the real-world images in the BDD100K data set; the training data consist of 100,000 images covering six sample classes (bus, car, truck, person, bike and motor) and are divided at a train:val:test ratio of 7:2:1 into a 70,000-image training set, a 20,000-image validation set and a 10,000-image test set for model training, validation and testing.
Further, in S1: bus denotes medium and large buses; car denotes passenger cars in various forms, including sedans, minibuses and SUVs; truck denotes small, medium and large trucks, including pickups; person denotes pedestrians; bike denotes bicycles; motor denotes motorcycles.
Further, the S2 includes the following contents:
S2-1, adding an upsampling layer in the upsampling module of the PAN feature fusion network of the YOLOv5 target detection model, namely a 4× upsampling scale added on top of the existing 8×, 16× and 32× scales;
S2-2, adding a Concat fusion layer in the PAN feature fusion network of the YOLOv5 target detection model, and fusing, through the added Concat fusion layer, the 4× upsampling layer added in S2-1 with the same-size feature map obtained during backbone feature extraction, generating a 4× upsampled feature map;
S2-3, adding a small target detection layer that uses the 4× upsampled feature map from S2-2 for small target detection, so that the deep-learning-based traffic target detection model for unmanned driving gains prediction layers at 4 scales for the multi-scale detection of the Head part;
and S2-4, for the small target detection layer added in S2-3, adding a group of anchor boxes of small target size, obtained with a K-means adaptive algorithm so that they match the size characteristics of small targets.
Further, in S3:
An ACmix module is introduced into the YOLOv5 network model; specifically, the ACmix module is inserted at the tail of the YOLOv5 backbone network, namely between the last CBL module and the SPP module, improving the model feature expression capability while reducing the model computation overhead.
Thanks to the above technical scheme, compared with the prior art the invention has the following advantages and positive effects: by fusing multi-scale features and exploiting the combined strengths of convolution and self-attention, accuracy is improved and the amount of computation is reduced, taking both precision and detection efficiency into account.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below. In the drawings:
FIG. 1 is a flow chart of a traffic target detection method for unmanned driving based on deep learning of the present invention;
FIG. 2 is a schematic diagram of a detection network model architecture of the present invention;
fig. 3 is a schematic diagram of the ACmix structure of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1, the invention discloses a traffic target detection method for unmanned driving based on deep learning, which comprises the following steps:
s1: the BDD100K data sets, including the training data set and the test data set, are downloaded and processed.
Further, S1 includes the following:
Converting the downloaded BDD100K data set from json format to the txt format used by YOLOv5, and constructing the traffic target detection training and test data sets from the real-world images in the BDD100K data set; the training data consist of 100,000 images covering six sample classes (bus, car, truck, person, bike and motor) and are divided at a train:val:test ratio of 7:2:1 into a 70,000-image training set, a 20,000-image validation set and a 10,000-image test set for model training, validation and testing.
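As an illustration, the json-to-txt conversion and the 7:2:1 split described above can be sketched in Python. The BDD100K field names (`labels`, `category`, `box2d`) and the class indices used here are assumptions for this sketch, not part of the patent:

```python
import random

# Assumed class indices for the six categories named above
CLASSES = {"bus": 0, "car": 1, "truck": 2, "person": 3, "bike": 4, "motor": 5}
IMG_W, IMG_H = 1280, 720  # BDD100K frames are 1280 x 720

def bdd_to_yolo(record):
    """Convert one parsed BDD100K image record to YOLO txt lines.

    Returns [] for an image with no traffic targets, which can then be
    dropped from the training data as the patent describes.
    """
    lines = []
    for obj in record.get("labels", []):
        if obj.get("category") not in CLASSES or "box2d" not in obj:
            continue  # skip unused categories and non-box annotations
        b = obj["box2d"]
        xc = (b["x1"] + b["x2"]) / 2 / IMG_W  # normalised centre x
        yc = (b["y1"] + b["y2"]) / 2 / IMG_H  # normalised centre y
        w = (b["x2"] - b["x1"]) / IMG_W       # normalised box width
        h = (b["y2"] - b["y1"]) / IMG_H       # normalised box height
        lines.append(f"{CLASSES[obj['category']]} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
    return lines

def split_dataset(names, seed=0):
    """Shuffle image names and split them 7:2:1 into train/val/test."""
    names = sorted(names)
    random.Random(seed).shuffle(names)
    a, b = len(names) * 7 // 10, len(names) * 9 // 10  # 70% / 90% cut points
    return names[:a], names[a:b], names[b:]
```

The per-image records would come from `json.load` over the downloaded BDD100K label file; the exact schema should be checked against the data set release actually used.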
S2: and adding a multi-scale target detection layer to the model.
Further, S2 includes the following:
S2-1: as shown in FIG. 2, an upsampling layer is added in the upsampling module of the PAN feature fusion network of the YOLOv5 target detection model, namely a 4× upsampling scale added on top of the existing 8×, 16× and 32× scales;
S2-2: a Concat fusion layer is added in the PAN feature fusion network of the YOLOv5 target detection model, and through it the 4× upsampling layer added in S2-1 is fused with the same-size feature map obtained during backbone feature extraction; 4-level Spatial Pyramid Pooling (SPP) is adopted to enlarge the receptive field, and SPP performs multi-scale feature fusion over the 4 feature maps of different sizes, realizing the multi-scale feature fusion of the Neck part; the specific Neck structure is shown in FIG. 2;
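The upsampling and Concat operations of S2-1 and S2-2 can be illustrated with a minimal pure-Python sketch, with feature maps represented as nested lists (a real implementation would operate on tensors):

```python
def upsample2x(fm):
    # Nearest-neighbour 2x upsampling of one 2-D feature map
    out = []
    for row in fm:
        wide = [v for v in row for _ in range(2)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))                     # duplicate each row
    return out

def concat_channels(a, b):
    # Channel-wise Concat of feature maps stored as [channel][h][w]:
    # the spatial sizes must match, the channel lists are simply joined
    return a + b
```

Applying `upsample2x` twice to a stride-16 map yields a stride-4 map that can be concatenated with the same-size backbone feature map, mirroring the added 4× branch.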
S2-3: a small target detection layer is added, using the 4× upsampled feature map from S2-2 for small target detection, so that the improved YOLOv5-based detection model has prediction layers at 4 scales, namely the 4×, 8×, 16× and 32× upsampling feature layers; for a 512 × 512 input image, the four feature scales obtained after adding the detection layer are the 128 × 128, 64 × 64, 32 × 32 and 16 × 16 feature layers, used to realize the multi-scale detection of the Head part; the specific Head structure is shown in FIG. 2;
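The four feature scales follow directly from the network strides; a one-line check (the stride values are the standard YOLOv5 ones plus the added stride-4 branch):

```python
def head_grid_sizes(img_size, strides=(4, 8, 16, 32)):
    # One detection grid per scale; stride 4 is the added small-target layer
    return [img_size // s for s in strides]

print(head_grid_sizes(512))  # -> [128, 64, 32, 16], matching the scales above
```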
S2-4: for the small target detection layer added in S2-3, a group of small-size anchor boxes (anchors) is added, obtained with a K-means adaptive algorithm so that they match the size characteristics of small targets; the small-scale anchors are assigned to the finer grid of the 128 × 128 feature layer added in S2-3, so that in total 12 anchors (3 per scale) cover the 4 detection scales;
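A minimal sketch of K-means anchor fitting with the usual 1 − IoU distance, in pure Python; box widths and heights are in pixels and must be positive, and a real run would feed in the BDD100K ground-truth boxes:

```python
import random

def iou_wh(box, anchor):
    # IoU between two (w, h) pairs, both treated as centred at the origin
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """K-means over (w, h) boxes using 1 - IoU as the distance."""
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)  # random initial anchors
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:  # assign each box to its highest-IoU anchor
            i = max(range(k), key=lambda j: iou_wh(b, anchors[j]))
            clusters[i].append(b)
        new = []
        for i, c in enumerate(clusters):
            if not c:
                new.append(anchors[i])  # keep the anchor of an empty cluster
                continue
            new.append((sum(b[0] for b in c) / len(c),
                        sum(b[1] for b in c) / len(c)))
        if new == anchors:  # converged
            break
        anchors = new
    return sorted(anchors)
```

With the extra 128 × 128 scale, the fitted anchors would be split into 4 groups of 3 by area, smallest group to the stride-4 layer.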
S3: an ACmix module is introduced into the YOLOv5 network model to enhance the feature expression and learning capacity of the model and reduce the model computation overhead.
Further, S3 includes the following:
An ACmix module is inserted between the last CBL module and the SPP module in the backbone network.
Specifically, ACmix includes two stages.
Stage I: the input features are projected by three 1 × 1 convolutions, and each projection is reshaped into N pieces, yielding a rich set of 3 × N intermediate feature maps.
Stage II: on the self-attention path, the intermediate features are gathered into N groups of 3 feature maps, one from each 1 × 1 convolution; within a group the 3 maps serve as queries, keys and values, following the traditional multi-head self-attention model.
On the convolution path with kernel size k, a lightweight fully connected layer (3N → k^2 N) generates k^2 feature maps per group, N groups in total. By shifting and aggregating the generated features, the input is processed in a convolutional manner, gathering information from local receptive fields as conventional convolution does. Finally, the outputs of the two paths are added.
In Stage I, convolution and self-attention actually share the same operation, namely projecting the input feature map with 1 × 1 convolutions. In Stage II, ACmix adds computation only through the lightweight fully connected layer and group convolution; this overhead is linear in the channel size C and smaller than the cost of Stage I.
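The complexity claim can be illustrated with rough multiply-add counts. This is a deliberate simplification that ignores the attention computation itself and the head count N; the numbers are only meant to show the linear-in-C overhead of Stage II against the quadratic-in-C Stage I projections:

```python
def stage1_flops(c, h, w):
    # Stage I: three 1x1 convolutions projecting C channels to C channels,
    # shared by the convolution and the self-attention paths -> O(C^2) per pixel
    return 3 * c * c * h * w

def stage2_fc_flops(c, h, w, k=3):
    # Stage II convolution path: lightweight fully connected layer producing
    # k^2 feature maps per group -> linear in the channel size C per pixel
    return 3 * k * k * c * h * w

# For a typical late-backbone map (C=256, 20x20), the Stage II overhead
# is far below the shared Stage I projection cost:
print(stage1_flops(256, 20, 20), stage2_fc_flops(256, 20, 20))
```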
Compared with the prior art, the invention improves the backbone and Neck networks of YOLOv5, adds a multi-scale target detection layer and introduces the ACmix module. This reduces the computation overhead of the model, increases its speed, and makes the feature extraction network pay more attention to shallow features, so that both shallow detail features and deep high-level semantic features are extracted more thoroughly and the robustness of the model is better.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (5)

1. A traffic target detection method for unmanned driving based on deep learning is characterized by comprising the following steps:
S1: downloading and processing a BDD100K data set, including a training data set and a test data set;
S2: adding a multi-scale target detection layer to the YOLOv5 target detection model;
S3: introducing an ACmix module into the YOLOv5 target detection model to enhance the feature expression and learning capacity of the model and reduce the model computation overhead.
2. The traffic target detection method for unmanned driving based on deep learning according to claim 1, wherein the S1 comprises:
converting the downloaded BDD100K data set from json format to the txt format used by YOLOv5, and constructing the traffic target detection training and test data sets from the real-world images in the BDD100K data set, wherein the training data consist of 100,000 images comprising six sample classes: bus, car, truck, person, bike and motor, divided at a train:val:test ratio of 7:2:1 into a 70,000-image training set, a 20,000-image validation set and a 10,000-image test set for model training, validation and testing.
3. The traffic target detection method for unmanned driving based on deep learning according to claim 2, wherein in S1,
bus denotes medium and large buses;
car denotes passenger cars in various forms, including sedans, minibuses and SUVs;
truck denotes small, medium and large trucks, including pickups;
person denotes pedestrians;
bike denotes bicycles;
motor denotes motorcycles.
4. The traffic target detection method for unmanned driving based on deep learning according to claim 1, wherein the S2 comprises:
S2-1, adding an upsampling layer in the upsampling module of the PAN feature fusion network of the YOLOv5 target detection model, namely a 4× upsampling scale added on top of the existing 8×, 16× and 32× scales;
S2-2, adding a Concat fusion layer in the PAN feature fusion network of the YOLOv5 target detection model, and fusing, through the added Concat fusion layer, the added 4× upsampling layer with the same-size feature map obtained during backbone feature extraction, generating a 4× upsampled feature map;
S2-3, adding a small target detection layer that uses the 4× upsampled feature map for small target detection, so that the deep-learning-based traffic target detection model for unmanned driving gains prediction layers at 4 scales for the multi-scale detection of the Head part;
and S2-4, for the added small target detection layer, adding a group of anchor boxes of small target size, obtained with a K-means adaptive algorithm so that they match the size characteristics of small targets.
5. The traffic target detection method for unmanned driving based on deep learning according to claim 1, wherein in S3:
the ACmix module introduced into the YOLOv5 target detection model is inserted at the tail of the YOLOv5 backbone network, namely between the last CBL module and the SPP module in the backbone network, improving the model feature expression capability and reducing the model computation overhead.
CN202211703954.XA 2022-12-29 2022-12-29 Traffic target detection method for unmanned driving based on deep learning Pending CN115937636A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211703954.XA CN115937636A (en) 2022-12-29 2022-12-29 Traffic target detection method for unmanned driving based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211703954.XA CN115937636A (en) 2022-12-29 2022-12-29 Traffic target detection method for unmanned driving based on deep learning

Publications (1)

Publication Number Publication Date
CN115937636A 2023-04-07

Family

ID=86550659

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211703954.XA Pending CN115937636A (en) 2022-12-29 2022-12-29 Traffic target detection method for unmanned driving based on deep learning

Country Status (1)

Country Link
CN (1) CN115937636A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116721351A (en) * 2023-07-06 2023-09-08 内蒙古电力(集团)有限责任公司内蒙古超高压供电分公司 Remote sensing intelligent extraction method for road environment characteristics in overhead line channel


Similar Documents

Publication Publication Date Title
CN110135366B (en) Shielded pedestrian re-identification method based on multi-scale generation countermeasure network
CN110222604B (en) Target identification method and device based on shared convolutional neural network
CN114202743A (en) Improved fast-RCNN-based small target detection method in automatic driving scene
CN112489072B (en) Vehicle-mounted video perception information transmission load optimization method and device
CN116342894B (en) GIS infrared feature recognition system and method based on improved YOLOv5
CN115937636A (en) Traffic target detection method for unmanned driving based on deep learning
CN116994047A (en) Small sample image defect target detection method based on self-supervision pre-training
CN115743101A (en) Vehicle track prediction method, and track prediction model training method and device
CN114267025A (en) Traffic sign detection method based on high-resolution network and light-weight attention mechanism
CN116258940A (en) Small target detection method for multi-scale features and self-adaptive weights
CN113076988B (en) Mobile robot vision SLAM key frame self-adaptive screening method based on neural network
CN113902753A (en) Image semantic segmentation method and system based on dual-channel and self-attention mechanism
CN113901931A (en) Knowledge distillation model-based behavior recognition method for infrared and visible light videos
CN111612803A (en) Vehicle image semantic segmentation method based on image definition
CN112101113A (en) Lightweight unmanned aerial vehicle image small target detection method
CN114359689B (en) Dynamic target detection and tracking method
CN115861861A (en) Lightweight acceptance method based on unmanned aerial vehicle distribution line inspection
Jiangzhou et al. Research on real-time object detection algorithm in traffic monitoring scene
CN116189012A (en) Unmanned aerial vehicle ground small target detection method based on improved YOLOX
Shao et al. Research on yolov5 vehicle object detection algorithm based on attention mechanism
Li et al. Infrared Small Target Detection Algorithm Based on ISTD-CenterNet.
CN115359271B (en) Large-scale invariance deep space small celestial body image matching method
Wang et al. Research on Vehicle Object Detection Based on Deep Learning
CN112861733B (en) Night traffic video significance detection method based on space-time double coding
CN113076898B (en) Traffic vehicle target detection method, device, equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination