CN116343150A - Road sign target detection method based on improved YOLOv7 - Google Patents

Road sign target detection method based on improved YOLOv7 Download PDF

Info

Publication number
CN116343150A
CN116343150A (application CN202310293143.5A)
Authority
CN
China
Prior art keywords
yolov7
improved
detection
layer
module
Prior art date
Legal status
Pending
Application number
CN202310293143.5A
Other languages
Chinese (zh)
Inventor
邓月明
谢竞
陈正浩
Current Assignee
Hunan Normal University
Original Assignee
Hunan Normal University
Priority date
Filing date
Publication date
Application filed by Hunan Normal University filed Critical Hunan Normal University
Priority to CN202310293143.5A priority Critical patent/CN116343150A/en
Publication of CN116343150A publication Critical patent/CN116343150A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V 20/588 Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/776 Validation; Performance evaluation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a road sign target detection method based on an improved YOLOv7, which comprises the steps of: obtaining the public road sign data set CCTSDB-2021; dividing the data set into a training set and a test set; carrying out data enhancement on the images; constructing an improved YOLOv7-SDB network model and importing the training set into the detection model for training; and carrying out target detection on the test set with the trained detection model. According to the invention, the outputs of the 6 Conv convolutions in each ELAN block of the Backbone are concat-spliced with a CBS block, so that the number of channels is increased without increasing each piece of characteristic information; a GAM attention module is introduced after each concat operation in the Head to improve detection precision; in the spatial pyramid pooling module, SPPCSPC is replaced by SPPFCSPC following the idea of SPPF, so that speed is improved and the amount of computation is reduced while the receptive field remains unchanged; and the original Mosaic data enhancement is changed to Mosaic-9 data enhancement, enriching the data set samples. Compared with the original YOLOv7 model, the detection precision and detection speed of the method are obviously improved.

Description

Road sign target detection method based on improved YOLOv7
Technical Field
The invention relates to the field of computer vision target detection, and in particular to a road sign target detection method based on an improved YOLOv7.
Background
Road traffic sign detection refers to detecting and identifying the various traffic signs on roads, such as speed-limit signs, prohibition signs, indication signs, warning signs and the like. Target detection of road traffic signs is an important task in computer vision, aiming at identifying and locating the various traffic signs on roads. The technology is of great significance in fields such as automatic driving and intelligent traffic systems.
At present, algorithms for road traffic sign target detection fall mainly into two categories: those based on traditional vision methods and those based on deep learning. Algorithms based on traditional vision mainly rely on features such as template matching, edge detection and shape matching, for example the Canny edge-detection algorithm and color-segmentation algorithms based on the color distribution of the image. Deep-learning-based methods extract information from the image through convolutional neural network learning and use various optimization strategies to identify and locate targets. Algorithms currently in widespread use include R-CNN, Faster R-CNN, YOLO, SSD and the like.
The YOLO series is widely used in industry for its good detection performance and detection speed. Compared with previous generations, YOLOv7 brings new features, including a multi-scale feature fusion (MSFF) layer, a label smoothing (SL) algorithm, a class-balanced (CB) loss function, and so on. Although the YOLOv7 algorithm includes certain optimizations in its multi-scale fusion and deep deformable convolution network design, its detection effect can still be limited for some small and slender targets, which easily leads to missed detections and similar problems.
Disclosure of Invention
The invention provides a road sign target detection method based on an improved YOLOv7 model, built on the original YOLOv7 model, and aims to improve the detection precision of small and medium-sized road signs at the cost of only a modest increase in the parameter count.
The invention is realized by the following technical scheme; the road sign target detection method based on the improved YOLOv7 provided by the invention comprises the following steps:
1. acquiring the public road sign data set CCTSDB-2021, which comprises 20491 road sign pictures in 3 categories: indication signs, warning signs and prohibition signs;
2. taking 7500 pictures from the data set and dividing them into a training set and a test set in a ratio of 8:2;
3. performing Mosaic-9 data enhancement processing on the selected pictures;
4. carrying out K-means cluster analysis on the real label samples in the data set;
5. constructing a road sign detection model YOLOv7-SDB based on the improved YOLOv7;
6. importing the training set into the detection model for training to obtain a road sign detection model;
7. carrying out road sign detection on the test set with the trained detection model.
the invention provides a preferable scheme, wherein the Mosaic-9 data enhancement process comprises the following steps: randomly selecting 9 pictures, performing augmentation operation on the 9 pictures respectively, and pasting the 9 pictures to the corresponding positions of the masks with the same size as the final output image respectively; augmentation operations include random clipping, scaling, alignment, and gamut variation; and obtaining a new picture and enriching the data set.
According to the preferred scheme provided by the invention, the K-means clustering algorithm clusters the widths and heights of all target frames contained in the training set to obtain the 9 most representative width-height combinations, and the 9 anchors are divided into 3 groups according to their width and height.
According to the preferred scheme provided by the invention, the structural improvements of the road sign detection model YOLOv7-SDB based on the improved YOLOv7 comprise: firstly, performing a concat splicing operation between the 6 Conv convolutions in each ELAN block in the Backbone part of the detection network and a CBS block; then introducing a GAM (Global Attention Mechanism) attention module after each concat operation in the Head part; and finally replacing the SPPCSPC module in the Head part with an SPPFCSPC module, following the idea of SPPF. The final 32-times-downsampled feature map C5 output by the Backbone has its channel count reduced from 1024 to 512 by the SPPFCSPC module. The P5, P4 and P3 layers are fused with the C4 and C3 layers in a top-down manner, and then fused with the P4 and P5 layers in a bottom-up manner. Finally, three detection layers of different sizes, 80×80, 40×40 and 20×20, are obtained, where 80×80 is used for detecting small targets, 40×40 for medium targets and 20×20 for large targets.
In a preferred embodiment of the present invention, the GAM attention module includes: the GAM attention mechanism stably improves model detection performance through a design that reduces feature information loss and amplifies global cross-dimension interaction features. Defining the input feature map as $F_1 \in \mathbb{R}^{C \times H \times W}$, the intermediate state $F_2$ and the final output $F_3$ are given by formula (1):

$$F_2 = M_c(F_1) \otimes F_1, \qquad F_3 = M_s(F_2) \otimes F_2 \tag{1}$$

where $M_c$ and $M_s$ are the channel attention map and the spatial attention map, respectively, and $\otimes$ denotes element-wise multiplication.

The input features $F_1$ pass through the channel sub-attention unit and then the spatial sub-attention unit, which outputs $F_3$. In the channel attention subunit, a 3D permutation is used to preserve information across the three dimensions, and a multi-layer perceptron (MLP) amplifies the cross-dimensional channel-spatial dependencies. In the spatial attention subunit, two convolution layers fuse the spatial information so that it is fully attended to.
In a preferred scheme provided by the invention, the SPPFCSPC module comprises: the SPPFCSPC module is built on the SPPCSPC module following the idea of SPPF; the difference is that the SPPFCSPC module applies the 3 max-pooling layers sequentially rather than in parallel, achieving a speed improvement while the receptive field remains unchanged.
According to the preferred scheme provided by the invention, the super-parameter setting of the road sign detection model based on the improved YOLOv7 is as follows:
the Learning rate (Learning rate) was set to 0.01, the optimizer was chosen to be random gradient descent (SGD), the Momentum (Momentum) was set to 0.937, the number of training iterations (Batch) was 300 rounds, the Batch Size (batch_size) was 16, and the sio_loss was used as the boundary loss function.
The road sign target detection method based on the improved YOLOv7 disclosed by the invention takes the original YOLOv7 target detection network model as its basis and optimizes and improves it in combination with the road sign data set CCTSDB-2021; it can detect small and medium-sized road signs in real time and improves the detection precision and detection speed of the model.
Drawings
Fig. 1 is a flowchart of a road sign target detection method based on improved YOLOv 7.
Fig. 2 is a schematic diagram of a detection network structure of a road sign target detection method based on improved YOLOv 7.
Fig. 3 is a schematic diagram of the structure of the GAM attention module.
Fig. 4 is a training effect diagram of the YOLOv7-SDB road identification detection network.
Fig. 5 is a diagram showing the detection effect of the YOLOv7-SDB road identification detection network.
Fig. 6 is a schematic diagram of a confusion matrix for the YOLOv7-SDB road identification detection network.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of the method of the present invention; the road sign target detection method based on the improved YOLOv7 provided by the invention specifically comprises the following steps:
step 1: and acquiring a public road identification image, and dividing a target detection data set required by training and testing.
The operation in step 1 is as follows: take 7500 pictures from the CCTSDB-2021 data set and divide them into a training set and a test set in a ratio of 8:2. A Python script creates the summary files train.txt and val.txt, which store, line by line, the absolute path of each picture together with its label positions and categories; finally, the separated label files and .jpg pictures of the training set and test set are placed under the same directory.
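This split step can be sketched as follows. The 8:2 ratio, the 7500-picture subset and the train.txt/val.txt file names come from the description above; the fixed random seed and the path format are assumptions, and the real script additionally records label positions and categories on each line:

```python
import random

def split_dataset(image_paths, train_ratio=0.8, seed=0):
    """Shuffle image paths and split them into training and test lists (8:2)."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)  # deterministic shuffle for reproducibility
    cut = int(len(paths) * train_ratio)
    return paths[:cut], paths[cut:]

def write_summary(paths, txt_path):
    """Write one image path per line, as the train.txt / val.txt summary files do."""
    with open(txt_path, "w") as f:
        f.write("\n".join(paths) + "\n")
```

For 7500 pictures this yields 6000 training and 1500 test images.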
Step 2: perform Mosaic-9 data enhancement processing on the selected pictures.
The operation in step 2 is as follows: randomly select 9 pictures, apply augmentation operations to each of them, and paste the 9 pictures onto the corresponding positions of a mask the same size as the final output image; the augmentation operations include random cropping, scaling, arrangement and color gamut variation; a new picture is thus obtained, enriching the data set.
Random cropping: through random cropping of the image, targets appear at different positions of the original picture in different proportions.
Scaling: the original picture is scaled in size.
Arrangement: the combination of the original pictures and the arrangement of the frames.
Color gamut variation: the brightness, saturation and tone of the original picture are changed.
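The pasting step can be sketched as a simple 3×3 grid. This is a simplification: the fixed, equal-sized grid and the nearest-neighbour resize are assumptions made here for brevity, whereas real Mosaic-9 implementations typically jitter the tile boundaries and adjust the label boxes accordingly:

```python
import numpy as np

def mosaic9(images, out_size=1920):
    """Paste 9 already-augmented HxWx3 uint8 images onto one canvas in a 3x3 grid."""
    tile = out_size // 3
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    for idx, img in enumerate(images):
        r, c = divmod(idx, 3)  # row/column of this tile in the 3x3 layout
        # dependency-free nearest-neighbour resize to tile x tile
        ys = np.linspace(0, img.shape[0] - 1, tile).astype(int)
        xs = np.linspace(0, img.shape[1] - 1, tile).astype(int)
        canvas[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile] = img[ys][:, xs]
    return canvas
```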
Step 3: carry out K-means cluster analysis on the real label samples in the data set.
The operation in step 3 is as follows: the K-means clustering algorithm clusters the widths and heights of all target frames contained in the training set to obtain the 9 most representative width-height combinations, and the 9 anchors are divided into 3 groups by width and height. The 9 prior frame sizes obtained are [6,7], [11,10], [14,15], [19,19], [25,25], [20,52], [37,36], [63,65] and [98,119].
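The anchor clustering can be sketched as below. The 1−IoU distance is the variant commonly used for YOLO anchor fitting and is an assumption here; the patent only states that K-means is applied to the box widths and heights:

```python
import numpy as np

def kmeans_anchors(wh, k=9, iters=50, seed=0):
    """Cluster (width, height) pairs into k anchors using a 1 - IoU distance."""
    rng = np.random.default_rng(seed)
    boxes = np.asarray(wh, dtype=float)
    centers = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        # IoU of every box against every center, assuming shared top-left corners
        inter = (np.minimum(boxes[:, None, 0], centers[None, :, 0]) *
                 np.minimum(boxes[:, None, 1], centers[None, :, 1]))
        union = boxes[:, 0:1] * boxes[:, 1:2] + centers[:, 0] * centers[:, 1] - inter
        assign = np.argmax(inter / union, axis=1)  # nearest center = highest IoU
        for j in range(k):
            if np.any(assign == j):
                centers[j] = boxes[assign == j].mean(axis=0)
    # sort by area so the 3 size groups (small/medium/large) fall out naturally
    return centers[np.argsort(centers[:, 0] * centers[:, 1])]
```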
Step 4: construct the road sign detection model YOLOv7-SDB based on the improved YOLOv7; the network structure is shown in fig. 2.
The operation in step 4 is as follows: firstly, perform a concat splicing operation between the 6 Conv convolutions in each ELAN block in the Backbone part of the detection network and a CBS block; then introduce a GAM attention module after each concat operation in the Head to increase the feature extraction capability of the network. The GAM attention module is shown in fig. 3.
The GAM attention mechanism stably improves model detection performance through a design that reduces feature information loss and amplifies global cross-dimension interaction features. Defining the input feature map as $F_1 \in \mathbb{R}^{C \times H \times W}$, the intermediate state $F_2$ and the final output $F_3$ are given by formula (1):

$$F_2 = M_c(F_1) \otimes F_1, \qquad F_3 = M_s(F_2) \otimes F_2 \tag{1}$$

where $M_c$ and $M_s$ are the channel attention map and the spatial attention map, respectively, and $\otimes$ denotes element-wise multiplication.

The input features $F_1$ pass through the channel sub-attention unit and then the spatial sub-attention unit, which outputs $F_3$. In the channel attention subunit, a 3D permutation is used to preserve information across the three dimensions, and a multi-layer perceptron (MLP) amplifies the cross-dimensional channel-spatial dependencies. In the spatial attention subunit, two convolution layers fuse the spatial information so that it is fully attended to.
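The two-stage gating of formula (1) can be sketched in a few lines. This is a structural toy, not the full module: the real channel subunit uses a reduction-ratio MLP and the spatial subunit uses two 7×7 convolutions, both simplified away here:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gam_attention(f1, w1, w2):
    """Structural sketch of GAM: F2 = Mc(F1) * F1, then F3 = Ms(F2) * F2.
    f1: (C, H, W) feature map; w1, w2: (C, C) weights of a toy channel MLP."""
    # channel subunit: 3D permutation to (H, W, C), 2-layer MLP, permute back
    perm = f1.transpose(1, 2, 0)                     # (H, W, C)
    mc = sigmoid(perm @ w1 @ w2).transpose(2, 0, 1)  # (C, H, W) channel attention map
    f2 = mc * f1
    # spatial subunit (simplified): gate built from the channel-mean map
    ms = sigmoid(f2.mean(axis=0, keepdims=True))     # (1, H, W) spatial attention map
    f3 = ms * f2
    return f3
```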
Finally, the SPPCSPC module of the Head part is replaced by the SPPFCSPC module, following the idea of SPPF. The SPPFCSPC module is built on the SPPCSPC module; the difference is that it applies the 3 max-pooling layers sequentially rather than in parallel, obtaining a speed improvement without changing the receptive field.
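The claim that chaining pools leaves the receptive field unchanged is easy to verify on a 1-D toy. Stride 1 and 'same' padding are assumed, as in the real modules, which pool 2-D feature maps with 5×5, 9×9 and 13×13 kernels; two chained 5-pools cover a 9-window, three cover a 13-window:

```python
import math

def maxpool1d(x, k):
    """1-D max pool, stride 1, 'same' padding - enough to show the SPPF trick."""
    pad = k // 2
    padded = [-math.inf] * pad + list(x) + [-math.inf] * pad
    return [max(padded[i:i + k]) for i in range(len(x))]

def sppcspc_pools(x):
    """SPPCSPC style: three independent pools of size 5, 9 and 13 in parallel."""
    return [maxpool1d(x, 5), maxpool1d(x, 9), maxpool1d(x, 13)]

def sppfcspc_pools(x):
    """SPPFCSPC style: one size-5 pool applied three times in sequence."""
    p1 = maxpool1d(x, 5)
    p2 = maxpool1d(p1, 5)   # equivalent receptive field: 9
    p3 = maxpool1d(p2, 5)   # equivalent receptive field: 13
    return [p1, p2, p3]
```

Because the sequential version reuses intermediate results, it examines far fewer elements per output while producing identical feature maps.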
The final 32-times-downsampled feature map C5 output by the Backbone has its channel count reduced from 1024 to 512 by the SPPFCSPC module. The P5, P4 and P3 layers are fused with the C4 and C3 layers in a top-down manner, and then fused with the P4 and P5 layers in a bottom-up manner. Finally, three detection layers of different sizes, 80×80, 40×40 and 20×20, are obtained, where 80×80 is used for detecting small targets, 40×40 for medium targets and 20×20 for large targets.
For the road identification detection model based on the improved YOLOv7, the learning rate is set to 0.01, stochastic gradient descent (SGD) is chosen as the optimizer, the momentum is set to 0.937, the number of training epochs is 300, the batch size is 16, and the SIoU loss is used as the bounding-box loss function.
The SIoU boundary loss function mainly comprises 4 parts: angle loss (Angle cost), distance loss (Distance cost), shape loss (Shape cost) and intersection-over-union loss (IoU cost).

Angle loss (Angle cost):

$$\Lambda = 1 - 2\sin^2\!\left(\arcsin\frac{c_h}{\sigma} - \frac{\pi}{4}\right)$$

where $c_h$ is the difference in height between the center points of the real frame and the predicted frame, and $\sigma$ is the distance between the center points of the real frame and the predicted frame; in fact, $c_h/\sigma$ is equal to $\sin\alpha$, where $\alpha$ is the angle between the line joining the two center points and the horizontal.

Distance loss (Distance cost):

$$\Delta = \sum_{t=x,y}\left(1 - e^{-\gamma\rho_t}\right), \qquad \gamma = 2 - \Lambda$$

$$\rho_x = \left(\frac{b^{gt}_{c_x} - b_{c_x}}{c_w}\right)^2, \qquad \rho_y = \left(\frac{b^{gt}_{c_y} - b_{c_y}}{c_h}\right)^2$$

where $c_w$ and $c_h$ are the width and height of the minimum bounding rectangle of the real frame and the predicted frame, $(b^{gt}_{c_x}, b^{gt}_{c_y})$ are the center coordinates of the real frame, and $(b_{c_x}, b_{c_y})$ are the center coordinates of the predicted frame.

Shape loss (Shape cost):

$$\Omega = \sum_{t=w,h}\left(1 - e^{-\omega_t}\right)^{\theta}, \qquad \omega_w = \frac{|w - w^{gt}|}{\max(w, w^{gt})}, \qquad \omega_h = \frac{|h - h^{gt}|}{\max(h, h^{gt})}$$

where $w, h$ and $w^{gt}, h^{gt}$ are the width and height of the predicted frame and the real frame, respectively, and $\theta$ controls the degree of attention paid to the shape loss.

Intersection-over-union loss (IoU cost):

$$IoU = \frac{|B \cap B^{gt}|}{|B \cup B^{gt}|}$$

where $B$ is the predicted frame area and $B^{gt}$ is the real frame area.

Finally, the total loss function is:

$$L_{box} = 1 - IoU + \frac{\Delta + \Omega}{2}$$
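The four terms assemble into the box loss as follows. This is a numerical sketch under the usual (cx, cy, w, h) box convention; the default θ = 4 is an assumption taken from the SIoU literature, not stated in the patent:

```python
import math

def siou_loss(pred, gt, theta=4.0):
    """SIoU bounding-box loss sketch. Boxes are (cx, cy, w, h) tuples."""
    (px, py, pw, ph), (gx, gy, gw, gh) = pred, gt
    # IoU term
    x1, y1 = max(px - pw / 2, gx - gw / 2), max(py - ph / 2, gy - gh / 2)
    x2, y2 = min(px + pw / 2, gx + gw / 2), min(py + ph / 2, gy + gh / 2)
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    iou = inter / (pw * ph + gw * gh - inter)
    # angle cost: Lambda = 1 - 2 sin^2(arcsin(ch / sigma) - pi/4)
    sigma = math.hypot(gx - px, gy - py) or 1e-9   # center distance
    ch = abs(gy - py)                              # center height difference
    lam = 1 - 2 * math.sin(math.asin(min(ch / sigma, 1.0)) - math.pi / 4) ** 2
    # distance cost over the minimum enclosing box (Cw, Ch)
    Cw = max(px + pw / 2, gx + gw / 2) - min(px - pw / 2, gx - gw / 2)
    Ch = max(py + ph / 2, gy + gh / 2) - min(py - ph / 2, gy - gh / 2)
    gamma = 2 - lam
    delta = ((1 - math.exp(-gamma * ((gx - px) / Cw) ** 2)) +
             (1 - math.exp(-gamma * ((gy - py) / Ch) ** 2)))
    # shape cost
    omega = ((1 - math.exp(-abs(pw - gw) / max(pw, gw))) ** theta +
             (1 - math.exp(-abs(ph - gh) / max(ph, gh))) ** theta)
    return 1 - iou + (delta + omega) / 2
```

A perfectly matching prediction gives a loss of zero; any offset, scale or shape mismatch increases it.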
Step 5: evaluate the training results of the road sign detection model YOLOv7-SDB based on the improved YOLOv7.
The operation in step 5 is as follows: the parameter count of the model, the detection speed (FPS) on the test set and the mean average precision (mAP) were measured and compared with the original YOLOv7 target detection method; the results are shown in Table 1 below.
[Table 1: comparison of parameter count, detection speed (FPS) and mAP between the original YOLOv7 and YOLOv7-SDB; the table image is not reproduced here.]
TABLE 1
As shown by the results in Table 1, compared with the original YOLOv7 network, the model of the invention improves mAP by 12.4% on the basis of a modest increase in parameters and improves the detection speed (FPS) by nearly 20%, satisfying both detection accuracy and real-time requirements.
The model training environment of the invention is: an Intel(R) Xeon(R) W-2102 @ 2.90GHz CPU, a GeForce RTX 2080Ti GPU, 64GB of RAM, the Ubuntu 18.04.3 LTS operating system and the PyTorch deep learning framework.

Claims (7)

1. The road identification target detection method based on the improved YOLOv7 is characterized by comprising the following steps of:
1.1. acquiring the public road sign data set CCTSDB-2021, which comprises 20491 road sign pictures in 3 categories: indication signs, warning signs and prohibition signs;
1.2. taking 7500 pictures from the data set and dividing them into a training set and a test set in a ratio of 8:2;
1.3. performing Mosaic-9 data enhancement processing on the selected pictures;
1.4. carrying out K-means cluster analysis on the real label samples in the data set;
1.5. constructing a road sign detection model YOLOv7-SDB based on the improved YOLOv7;
1.6. importing the training set into the detection model for training to obtain the road sign detection model;
1.7. carrying out road sign detection on the test set with the trained detection model.
2. The improved-YOLOv7-based road sign target detection method of claim 1, wherein the Mosaic-9 data enhancement process of step 1.3 comprises:
randomly selecting 9 pictures, applying augmentation operations to each of them, and pasting the 9 pictures onto the corresponding positions of a mask the same size as the final output image; the augmentation operations include random cropping, scaling, arrangement and color gamut variation; a new picture is thus obtained, enriching the data set.
3. The improved-YOLOv7-based road sign target detection method of claim 1, wherein the K-means clustering algorithm in step 1.4 clusters the widths and heights of all target frames contained in the training set to obtain the 9 most representative width-height combinations, and the 9 anchors are divided into 3 groups according to their width and height.
4. The improved-YOLOv7-based road sign target detection method of claim 1, wherein the structural improvements of the road sign detection model YOLOv7-SDB based on the improved YOLOv7 in step 1.5 comprise:
firstly, performing a concat splicing operation between the 6 Conv convolutions in each ELAN block in the Backbone part of the detection network and a CBS block; then introducing a GAM (Global Attention Mechanism) attention module after each concat operation in the Head part; and finally replacing the SPPCSPC module in the Head part with an SPPFCSPC module, following the idea of SPPF. The final 32-times-downsampled feature map C5 output by the Backbone has its channel count reduced from 1024 to 512 by the SPPFCSPC module. The P5, P4 and P3 layers are fused with the C4 and C3 layers in a top-down manner, and then fused with the P4 and P5 layers in a bottom-up manner. Finally, three detection layers of different sizes, 80×80, 40×40 and 20×20, are obtained, where 80×80 is used for detecting small targets, 40×40 for medium targets and 20×20 for large targets.
5. The improved YOLOv 7-based road marking target detection method of claim 1, wherein the GAM attention module comprises:
the GAM attention mechanism stably improves model detection performance through a design that reduces feature information loss and amplifies global cross-dimension interaction features; defining the input feature map as $F_1 \in \mathbb{R}^{C \times H \times W}$, the intermediate state $F_2$ and the final output $F_3$ satisfy formula (1):

$$F_2 = M_c(F_1) \otimes F_1, \qquad F_3 = M_s(F_2) \otimes F_2 \tag{1}$$

where $M_c$ and $M_s$ are the channel attention map and the spatial attention map, respectively, and $\otimes$ denotes element-wise multiplication; the input features $F_1$ pass through the channel sub-attention unit and then the spatial sub-attention unit, which outputs $F_3$; in the channel attention subunit, a 3D permutation preserves information across the three dimensions and a multi-layer perceptron (MLP) amplifies the cross-dimensional channel-spatial dependencies; in the spatial attention subunit, two convolution layers fuse the spatial information so that it is fully attended to.
6. The improved YOLOv 7-based road marking target detection method of claim 1, wherein the SPPFCSPC module comprises:
the SPPFCSPC module is based on the SPPCSPC module according to the idea of SPPF, with the difference that the SPPFCSPC module processes the 3 largest pooling layers sequentially, and can achieve a speed boost with unchanged receptive field.
7. The improved-YOLOv7-based road sign target detection method of claim 1, wherein, for the road sign detection model based on the improved YOLOv7, the learning rate is set to 0.01, stochastic gradient descent (SGD) is chosen as the optimizer, the momentum is set to 0.937, the number of training epochs is 300, the batch size is 16, and the SIoU loss is used as the bounding-box loss function.
CN202310293143.5A 2023-03-24 2023-03-24 Road sign target detection method based on improved YOLOv7 Pending CN116343150A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310293143.5A CN116343150A (en) 2023-03-24 2023-03-24 Road sign target detection method based on improved YOLOv7

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310293143.5A CN116343150A (en) 2023-03-24 2023-03-24 Road sign target detection method based on improved YOLOv7

Publications (1)

Publication Number Publication Date
CN116343150A true CN116343150A (en) 2023-06-27

Family

ID=86894411

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310293143.5A Pending CN116343150A (en) 2023-03-24 2023-03-24 Road sign target detection method based on improved YOLOv7

Country Status (1)

Country Link
CN (1) CN116343150A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117315934A (en) * 2023-09-25 2023-12-29 阜阳交通能源投资有限公司 Expressway traffic flow real-time monitoring and congestion prediction system based on unmanned aerial vehicle
CN117152846A (en) * 2023-10-30 2023-12-01 云南师范大学 Student behavior recognition method, device and system and computer readable storage medium
CN117152846B (en) * 2023-10-30 2024-01-26 云南师范大学 Student behavior recognition method, device and system and computer readable storage medium
CN117315614A (en) * 2023-11-28 2023-12-29 南昌大学 Traffic target detection method based on improved YOLOv7
CN117315614B (en) * 2023-11-28 2024-03-29 南昌大学 Traffic target detection method based on improved YOLOv7
CN117689731A (en) * 2024-02-02 2024-03-12 陕西德创数字工业智能科技有限公司 Lightweight new energy heavy-duty truck battery pack identification method based on improved YOLOv5 model
CN117689731B (en) * 2024-02-02 2024-04-26 陕西德创数字工业智能科技有限公司 Lightweight new energy heavy-duty battery pack identification method based on improved YOLOv model
CN118247286A (en) * 2024-05-30 2024-06-25 佛山科学技术学院 Yolov-based steel surface defect detection method and yolov-based steel surface defect detection system

Similar Documents

Publication Publication Date Title
CN116343150A (en) Road sign target detection method based on improved YOLOv7
CN112200161B (en) Face recognition detection method based on mixed attention mechanism
Zhang et al. CDNet: A real-time and robust crosswalk detection network on Jetson nano based on YOLOv5
CN102289686B (en) Method for identifying classes of moving targets based on transfer learning
CN111582339B (en) Vehicle detection and recognition method based on deep learning
CN114495029B (en) Traffic target detection method and system based on improved YOLOv4
CN113762209A (en) Multi-scale parallel feature fusion road sign detection method based on YOLO
CN112084901A (en) GCAM-based high-resolution SAR image airport runway area automatic detection method and system
Ghosh On-road vehicle detection in varying weather conditions using faster R-CNN with several region proposal networks
CN107346420A (en) Text detection localization method under a kind of natural scene based on deep learning
CN105354568A (en) Convolutional neural network based vehicle logo identification method
CN106326858A (en) Road traffic sign automatic identification and management system based on deep learning
CN112016605B (en) Target detection method based on corner alignment and boundary matching of bounding box
CN108921120B (en) Cigarette identification method suitable for wide retail scene
CN110619279A (en) Road traffic sign instance segmentation method based on tracking
CN110032952B (en) Road boundary point detection method based on deep learning
CN113378815B (en) Scene text positioning and identifying system and training and identifying method thereof
CN113239753A (en) Improved traffic sign detection and identification method based on YOLOv4
CN116206112A (en) Remote sensing image semantic segmentation method based on multi-scale feature fusion and SAM
CN108960175A (en) A kind of licence plate recognition method based on deep learning
CN113554030A (en) Multi-type license plate recognition method and system based on single character attention
CN117975418A (en) Traffic sign detection method based on improved RT-DETR
CN114066937B (en) Multi-target tracking method for large-scale remote sensing image
CN117912058A (en) Cattle face recognition method
CN111881914B (en) License plate character segmentation method and system based on self-learning threshold

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination