CN115424242A - Improved yolov3 traffic sign identification method, equipment and medium - Google Patents

Improved yolov3 traffic sign identification method, equipment and medium

Info

Publication number
CN115424242A
CN115424242A (Application CN202211055743.XA)
Authority
CN
China
Prior art keywords
model
traffic sign
improved
yolov3 model
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211055743.XA
Other languages
Chinese (zh)
Inventor
王传钊
谢乐成
吴锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Changan Automobile Co Ltd
Original Assignee
Chongqing Changan Automobile Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Changan Automobile Co Ltd filed Critical Chongqing Changan Automobile Co Ltd
Priority to CN202211055743.XA priority Critical patent/CN115424242A/en
Publication of CN115424242A publication Critical patent/CN115424242A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/582Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of traffic signs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of traffic sign identification, in particular to a traffic sign identification method, equipment and medium based on improved yolov3. The method comprises the steps of obtaining an image data set for traffic sign detection, carrying out data enhancement processing on the image data set, and dividing it into an image training set and an image verification set; improving the Yolov3 detection network structure by changing the backbone network of the Yolov3 model to a MobileNetV3 model, modifying the loss function of the Yolov3 model to a focal_loss function, and adding a multi-scale fusion module to obtain an improved Yolov3 model; training the improved Yolov3 model with the image training set and the image verification set, and evaluating the trained improved Yolov3 model; and inputting traffic sign image data into the trained improved Yolov3 model and outputting a traffic sign recognition result. The invention can improve the model's recognition rate for traffic signs and the detection precision for small targets.

Description

Improved yolov3 traffic sign identification method, equipment and medium
Technical Field
The invention relates to the technical field of traffic sign identification, in particular to a traffic sign identification method, equipment and medium based on improved yolov3.
Background
The traffic sign detection system is an important component of intelligent driving. A camera mounted on the automobile captures the road scene in real time, and the captured pictures are fed into the detection system for traffic sign detection, providing the driver with effective road traffic information. Because the traffic information ahead can be anticipated in advance, the driver can make a timely judgment, gains more reaction time, and the probability of traffic accidents is reduced.
The traffic sign detection algorithm mainly faces the following challenges: (1) Traffic signs occupy a small proportion of actually captured images and therefore count as small targets, and information about small targets is easily lost while the algorithm processes the image, so traffic sign detection is relatively difficult to realize. (2) Traffic signs of the same type share the same background color and are highly similar, differing only in their specific contents, so the traffic sign categories are difficult to distinguish during actual detection.
At present there are many target detection algorithms in deep learning, and targets are basically detected with CNNs (convolutional neural networks). By algorithm type they mainly fall into two groups: two-stage and one-stage. Two-stage methods mainly include R-CNN, Faster R-CNN and Mask R-CNN; one-stage methods include SSD and the yolov1-5 series. They are very capable of detecting traffic signs in images, but these methods suffer from low detection accuracy or slow detection speed.
Disclosure of Invention
In order to solve the technical problems in the prior art, the invention provides a traffic sign identification method, equipment and medium based on improved yolov3; the improved yolov3 model optimizes the loss function, and multi-scale detection fusion improves the identification precision of small-target traffic signs.
The first purpose of the invention is to provide a traffic sign identification method based on improved yolov3.
It is a second object of the invention to provide a computer apparatus.
A third object of the present invention is to provide a storage medium.
The first purpose of the invention can be achieved by adopting the following technical scheme:
S1, acquiring an image data set for traffic sign detection, performing data enhancement processing on the image data set, and dividing it into an image training set and an image verification set;
S2, improving the Yolov3 detection network structure: changing the backbone network structure of the Yolov3 model to a MobileNetV3 model, modifying the loss function of the Yolov3 model to a focal_loss function, and adding a multi-scale fusion module to obtain an improved Yolov3 model;
S3, training the improved Yolov3 model by using the image training set and the image verification set, and evaluating the trained improved Yolov3 model;
S4, inputting traffic sign image data into the trained improved Yolov3 model, and outputting a traffic sign recognition result.
In a preferred technical scheme, step S2 comprises the following steps:
changing the backbone network structure of the Yolov3 model to a MobileNetV3 model, wherein the MobileNetV3 overall structure comprises depthwise separable convolutions, SE modules and bottleneck structures;
modifying the loss function of the Yolov3 model to a focal_loss function;
and adding a multi-scale fusion module, wherein the multi-scale fusion module fuses the features of different scales extracted by the MobileNetV3 model to obtain a feature fusion map containing information of each scale.
In a preferred technical solution, step S3 includes:
3.1, initializing the internal parameters of the improved Yolov3 model, wherein the internal parameters comprise: input picture size, initial learning rate, termination learning rate, epoch and batch;
3.2, converting the pictures and data labels into 3-channel matrix data in RGB format, performing forward reasoning on the data through the model in sequence, and calculating the loss through the loss function;
3.3, updating and adjusting the internal parameters of the improved Yolov3 model through backward gradient propagation;
3.4, repeating steps 3.2-3.3 in sequence until the parameters are no longer updated, and obtaining and storing the trained improved Yolov3 model;
and testing the recall rate of the trained improved Yolov3 model and its FPS when predicting on video, and evaluating the trained improved Yolov3 model.
The second purpose of the invention can be achieved by adopting the following technical scheme:
a computer device comprising a processor and a memory for storing a program executable by the processor, wherein the processor, when executing the program stored in the memory, implements the above improved traffic sign identification method based on yolov 3.
The third purpose of the invention can be achieved by adopting the following technical scheme:
a storage medium storing a program which, when executed by a processor, implements the above improved yolov 3-based traffic sign recognition method.
Compared with the prior art, the invention has the following advantages and beneficial effects:
the invention provides an improved yolov 3-based traffic sign identification method, equipment and medium, which adopt Mobilnetv3 as a backbone of a network to optimize a loss function, adopt multi-scale detection fusion to effectively overcome the defects of unbalanced data distribution and large proportion difference between positive and negative samples, identify objects with different sizes and then fuse identification results, improve the identification rate of a model and improve the detection precision of small targets.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings according to the structures shown in these drawings without creative effort.
FIG. 1 is a schematic flow chart of a traffic sign detection method according to an embodiment of the present invention;
Fig. 2 is a schematic visualization of the MobileNetV3 network structure in the embodiment of the present invention;
Fig. 3 is a data label in Yolo format in an embodiment of the invention.
Detailed Description
The technical solutions of the present invention will be described in further detail with reference to the accompanying drawings and examples. It is obvious that the described examples are some, but not all, examples of the present invention, and the embodiments of the present invention are not limited thereto. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Example 1:
As shown in Fig. 1, the traffic sign identification method based on improved yolov3 comprises the following steps:
s1, an image data set for detecting the traffic sign is obtained, and the image data set is divided into a training set and an image verification set.
Open-source traffic sign data sets were first investigated. Open-source data sets currently containing more than 10,000 pictures include the CCTSDB data set and the TT100K data set. The TT100K data set has too many sign categories, its data are relatively sparse, and its long-tail effect is obvious. The CCTSDB data set is more evenly distributed, although it has fewer categories (3).
The CCTSDB open-source data set is a traffic sign data set collected by Changsha University of Science and Technology, and the TT100K open-source data set is a traffic sign data set collected and labeled by Tsinghua University.
The open-source CCTSDB data set comprises two folders, train and test, where train contains around 16,000 training pictures and test contains 800 test pictures. The CCTSDB data set covers 3 types of traffic signs: indication signs, prohibition signs and warning signs. The advantage of this classification is that the data distribution is more balanced and the long-tail effect is weak.
In this embodiment, the CCTSDB traffic sign data set is used as the data set of the invention, and data enhancement processing is performed on it. The data enhancement processing includes Mosaic processing and Mixup processing; data enhancement is used to improve the robustness of the model, make full use of the feature information of the original data, and improve the recognition rate of the model.
Mosaic processing: 4 pictures are used at a time and spliced together randomly; each picture keeps its own bounding boxes, and after combination they form a new picture. During splicing the pictures are placed in the upper, lower, left and right regions and do not affect each other.
Mixup treatment: and performing mixed enhancement on the images, averaging the images, recalculating the label values of the images, mixing the images among different classes, and expanding a training data set. For example, the pictures of partial warning signs and indicating signs are overlapped by taking red warning signs as obvious features and indicating signs as background features through feature fusion. Therefore, the data set is amplified, the complexity of the data set is enhanced, continuous data samples can be provided for different classes, and the robustness of the model is improved.
S2, improving the Yolov3 detection network structure: changing the backbone network structure of the Yolov3 model to a MobileNetV3 model, modifying the loss function of the Yolov3 model to a focal_loss function, adding a multi-scale fusion module, identifying objects of different sizes and fusing the identification results, so as to obtain the improved Yolov3 model.
S21, changing the backbone network structure of the yolov3 model to a MobileNetV3 model. MobileNetV3 is a lightweight model for extracting feature layers in a deep neural network; it has few parameters, high accuracy and a stable structure.
The yolov3 model is a classic model in deep learning, computer vision and target detection; the backbone network structure refers to the part of a deep learning model structure used for extracting image features.
The MobileNetV3 overall structure comprises depthwise separable convolutions, SE modules and bottleneck structures. The depthwise separable convolution replaces an ordinary 3x3 convolution with a 3x3 depthwise convolution plus a 1x1 pointwise convolution, reducing the amount of computation. The SE (squeeze-and-excitation) module lets the model automatically suppress unnecessary features and emphasize salient ones. The bottleneck structure reduces the dimensionality of the model input, improving the recognition rate and reducing computation. A schematic visualization of the MobileNetV3 network structure is shown in Fig. 2: first, the channels of the feature map are expanded to increase the number of features; feature extraction is then performed by the depthwise separable convolution; finally, the feature map is reduced in dimension and the number of channels decreases. A sketch of one such block follows.
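As an illustration of the structure described above, the following PyTorch sketch shows a MobileNetV3-style bottleneck with an SE module. The channel counts, reduction ratio and activation choices are assumptions and may differ from the network actually used in the patent.

```python
import torch
import torch.nn as nn

class SEModule(nn.Module):
    """Squeeze-and-excitation: global-pool, then re-weight each channel."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Hardsigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)  # suppress or emphasize channels

class MobileNetV3Bottleneck(nn.Module):
    """1x1 expand -> 3x3 depthwise -> SE -> 1x1 project, with optional residual."""
    def __init__(self, in_ch, expand_ch, out_ch, stride=1):
        super().__init__()
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, expand_ch, 1, bias=False),      # expand channels
            nn.BatchNorm2d(expand_ch), nn.Hardswish(),
            nn.Conv2d(expand_ch, expand_ch, 3, stride, 1,
                      groups=expand_ch, bias=False),          # depthwise 3x3
            nn.BatchNorm2d(expand_ch), nn.Hardswish(),
            SEModule(expand_ch),
            nn.Conv2d(expand_ch, out_ch, 1, bias=False),      # project back down
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out
```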
S22, modifying the loss function of the yolov3 model to a focal_loss function. Here loss refers to the loss function in object detection (including coordinate loss and class loss). focal_loss is based on the binary cross entropy CE; it is a dynamically scaled cross-entropy loss whose scaling factor reduces the weight of easily distinguished samples during training, so the focus quickly shifts to hard samples. focal_loss addresses the imbalance between positive and negative samples in one-stage models and effectively alleviates uneven data distribution and an excess of negative samples. A sketch follows.
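A minimal sketch of a binary focal loss in PyTorch, using the common alpha/gamma formulation; alpha = 0.25 and gamma = 2.0 are illustrative defaults, not values disclosed by the patent:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Dynamically scaled binary cross entropy: easy samples are down-weighted
    by (1 - p_t)**gamma so training focuses on hard, rare samples."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = targets * p + (1 - targets) * (1 - p)              # probability of the true class
    alpha_t = targets * alpha + (1 - targets) * (1 - alpha)  # positive/negative balancing
    return (alpha_t * (1.0 - p_t) ** gamma * ce).mean()
```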
S23, adding a multi-scale fusion module; the multi-scale fusion module fuses the features of different scales extracted by the MobileNetV3 model to obtain a feature fusion map containing information of each scale, which is helpful for detecting and recognizing targets of different sizes. A sketch of such a fusion step follows.
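The following PyTorch sketch shows one assumed form of such fusion, upsampling a deeper, low-resolution feature map and concatenating it with a shallower, higher-resolution one (FPN-style); the channel sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    """Fuse a deep low-resolution feature map with a shallower high-resolution one
    by 1x1 reduction, upsampling and channel concatenation (FPN-style, simplified)."""
    def __init__(self, deep_ch, shallow_ch, out_ch):
        super().__init__()
        self.reduce = nn.Conv2d(deep_ch, out_ch, 1)
        self.fuse = nn.Conv2d(out_ch + shallow_ch, out_ch, 3, padding=1)

    def forward(self, deep_feat, shallow_feat):
        x = self.reduce(deep_feat)
        x = F.interpolate(x, size=shallow_feat.shape[-2:], mode="nearest")  # upsample to match
        x = torch.cat([x, shallow_feat], dim=1)                             # concatenate scales
        return self.fuse(x)

# e.g. fusing a 13x13x512 map with a 26x26x256 map extracted by the backbone
fusion = MultiScaleFusion(deep_ch=512, shallow_ch=256, out_ch=256)
fused = fusion(torch.randn(1, 512, 13, 13), torch.randn(1, 256, 26, 26))   # -> (1, 256, 26, 26)
```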
S3, training the improved Yolov3 model by utilizing the image training set and the image verification set;
before training the improved Yolov3 model, firstly writing a script, and converting an original label of a data set into a format label required by Yolov 3. As shown in fig. 3, the data label of Yolo format, the data set format of Yolo algorithm is:
the first bit represents the id number of the corresponding tag, the second bit represents the x coordinate proportion of the center coordinate of the target object in the image, the third bit represents the y coordinate proportion of the center coordinate of the target object in the image, the fourth bit represents the proportion occupied by w of the target object frame in the image, and the fifth bit represents the proportion occupied by h of the target object frame in the image. Note in particular that the interval between must be one space.
Secondly, a data directory is arranged with the following layout:

myData
    images          # stores the annotated original pictures
        train
        test
        val
    labels          # stores the label files corresponding to the images
        train
        test
        val
myData is the total save path of the data set, read by the model. The images path stores the annotated original pictures and has 3 folders under it: the train folder stores the image training set, the test folder stores the image verification set, i.e. the data used to evaluate the model's indexes during training, and the val folder is used to evaluate the model's indexes after training is finished. The labels path stores the annotation files corresponding to the images path, with train, test and val in one-to-one correspondence. The code reads the data and labels in a fixed format, so the directory must be arranged in this format. A small helper that creates this layout is shown below.
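For reference, an illustrative convenience script (not part of the patent) that creates this layout:

```python
from pathlib import Path

# Create the expected myData layout (illustrative helper only).
root = Path("myData")
for group in ("images", "labels"):
    for split in ("train", "test", "val"):
        (root / group / split).mkdir(parents=True, exist_ok=True)
```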
The improved Yolov3 model is trained by using the image training set and the image verification set, specifically comprising the following steps:
3.1, initializing the internal parameters of the improved Yolov3 model, wherein the internal parameters comprise: input picture size, initial learning rate, termination learning rate, epoch and batch.
The internal parameters include: the input picture size, the initial learning rate (the learning rate when training starts), the termination learning rate (the learning rate decays gradually until this threshold), the epoch (one full traversal of all pictures is one epoch) and the batch (its size is set according to the GPU memory).
Initializing the internal parameters gives the model training a good starting point. Illustrative values are sketched below.
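A minimal configuration sketch follows; every value is an assumption for illustration, since the patent does not disclose concrete settings:

```python
# Illustrative hyperparameters only; the patent does not state concrete values.
hyperparams = {
    "input_size": (416, 416),  # input picture size fed to the network
    "lr_init": 1e-3,           # learning rate when training starts
    "lr_end": 1e-6,            # learning rate decays towards this threshold
    "epochs": 100,             # one epoch = one full traversal of all training pictures
    "batch_size": 16,          # chosen according to the available GPU memory
}
```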
3.2, converting the pictures and data labels into 3-channel matrix data in RGB format, performing forward reasoning on the data through the model in sequence, and calculating the loss through the loss function.
Matrix data: in general, a convolutional neural network operates on a picture only after converting it into n-dimensional matrix data.
Forward reasoning: by analogy, take f(x) = ax + b, where f(x) is the model, x is the input matrix data, and a and b are the parameters to be trained. Forward reasoning means sending the matrix data into the model, running the computation once, and outputting a result.
Loss calculation: assume that in f(x) = ax + b, when x = 2 the ground-truth value is 100. Then f(2) = 2a + b and loss(100, f(2)) = k, where loss is the loss function and k is the penalty that measures the gap between the prediction f(2) and the ground truth 100.
3.3, updating and adjusting the internal parameters of the improved Yolov3 model through backward gradient propagation.
The a and b assumed in step 3.2 above are the internal parameters of the model f(x). Backward gradient propagation means using the computed loss and its gradient to continuously optimize and adjust a and b until suitable values are found, so that for a given x the model outputs a value arbitrarily close to the correct one. The toy loop below makes steps 3.2-3.3 concrete.
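A toy numerical sketch of forward reasoning, loss calculation and backward-gradient updates for the f(x) = ax + b example; squared error stands in for the real YOLO loss and all values are illustrative:

```python
# Toy version of steps 3.2-3.3: forward reasoning, loss, and a backward-gradient update.
a, b = 0.0, 0.0          # the "internal parameters" being trained
x, y_true = 2.0, 100.0   # the example from the text: input x = 2, ground truth 100
lr = 0.01                # learning rate

for _ in range(2000):
    y_pred = a * x + b                # forward reasoning
    loss = (y_pred - y_true) ** 2     # squared-error loss (stand-in for the YOLO loss)
    grad = 2 * (y_pred - y_true)      # d(loss)/d(y_pred)
    a -= lr * grad * x                # backward gradient propagation: adjust a
    b -= lr * grad                    # and b
print(a, b, a * x + b)                # a*x + b converges towards 100
```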
3.4, repeating steps 3.2-3.3 in sequence until the parameters are no longer updated, and obtaining and storing the trained improved Yolov3 model.
This step requires continuously feeding in picture matrix data and updating the parameters. Finally, the training result and the model are stored. The model is stored so that it can conveniently be used later for the required predictions and evaluations; it is saved with TensorFlow's own model-saving module.
3.5, testing the recall rate of the trained improved Yolov3 model and its FPS during video prediction, and evaluating the trained improved Yolov3 model.
Before testing, 2 scenes, daytime and nighttime, are selected for comprehensive evaluation. Multiple scenes are selected because the robustness of the model needs to be checked in testing, and indexes measured over multiple scenes are more accurate evaluation indexes.
In this embodiment, 800 pictures are selected to evaluate the model, 500 daytime pictures and 300 nighttime pictures, with a resolution of 1280x720. The recall rate is tested: recall mainly reflects how many targets the model misses, and the higher the recall, the fewer the missed detections. The FPS is tested: FPS reflects the model's inference speed. Both measurements are sketched below.
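A minimal measurement sketch; counting true positives and false negatives against the ground truth is assumed to happen elsewhere, and `model_fn` is a hypothetical inference callable:

```python
import time

def recall(true_positives, false_negatives):
    """Recall = TP / (TP + FN); the higher the recall, the fewer missed detections."""
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0

def average_latency_ms(model_fn, frames):
    """Average per-frame inference time over a clip; FPS is simply 1000 / latency_ms."""
    start = time.perf_counter()
    for frame in frames:
        model_fn(frame)
    latency_ms = (time.perf_counter() - start) / len(frames) * 1000.0
    return latency_ms, 1000.0 / latency_ms
```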
The recall rate of the trained improved Yolov3 model is shown in Table 1. Table 1 compares the recall indexes of the original model and the improved model, evaluated by distance and by scene, which makes the model evaluation more rigorous.
TABLE 1
Recall               0-30 m day      30-50 m day     50 m+ day
Yolov3_tiny_row      0.933           0.754           0.43
Yolov3_Mobilnetv3    0.952           0.786           0.55

Recall               0-30 m night    30-50 m night   50 m+ night
Yolov3_tiny_row      0.862           0.642           0.32
Yolov3_Mobilnetv3    0.883           0.703           0.41
The inference speed of the trained improved Yolov3 model is shown in Table 2. Table 2 compares the running speeds of the two models, averaged over 100 consecutive frames, so the result is accurate and reliable.
TABLE 2
Model                Average time per frame (100 consecutive frames)
Yolov3_tiny_row      48 ms
Yolov3_Mobilnetv3    40 ms
The evaluation results show that the overall recall of the improved yolov3 model is consistently higher than that of the original model, because the feature extraction of MobileNetV3 is superior to the original backbone. The improved yolov3 model is significantly better than the original model at detecting small targets at long distances (50 m+) because focal_loss is used. The inference speed of the improved yolov3 model is also better than that of the original model, because the improved model has significantly fewer parameters.
S4, inputting the traffic sign image data into the trained improved Yolov3 model, and outputting a traffic sign recognition result.
The traffic sign image data are input into the trained improved Yolov3 model; the improved Yolov3 model outputs a series of matrix information, and through matrix mapping the sign category corresponding to each target frame in the image (warning, indication or prohibition) is obtained. The improved Yolov3 model is fast, has a high small-target recognition rate and a high scene coverage rate. A post-processing sketch follows.
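A minimal post-processing sketch, assuming a hypothetical output matrix whose rows are (x, y, w, h, objectness, class scores...); a real pipeline would also apply non-maximum suppression:

```python
import numpy as np

CLASS_NAMES = ["warning", "indication", "prohibition"]  # the 3 CCTSDB categories

def decode_predictions(pred_matrix, conf_thresh=0.5):
    """Map raw output rows to (box, class name, confidence) tuples."""
    results = []
    for row in pred_matrix:
        box, objectness, class_scores = row[:4], row[4], row[5:]
        cls_id = int(np.argmax(class_scores))
        confidence = float(objectness * class_scores[cls_id])
        if confidence >= conf_thresh:
            results.append((box, CLASS_NAMES[cls_id], confidence))
    return results
```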
Example 2:
the present embodiment provides a computer device, which may be a server, a computer, or the like, and includes a processor, a memory, an input device, a display, and a network interface connected by a system bus, where the processor is configured to provide computing and control capabilities, the memory includes a nonvolatile storage medium and an internal memory, the nonvolatile storage medium stores an operating system, a computer program, and a database, the internal memory provides an environment for the operating system and the computer program in the nonvolatile storage medium to run, and when the processor executes the computer program stored in the memory, the traffic sign identification method based on improved Yolov3 of embodiment 1 is implemented as follows:
S1, acquiring an image data set for traffic sign detection, performing data enhancement processing on the image data set, and dividing it into an image training set and an image verification set;
S2, improving the Yolov3 detection network structure: changing the backbone network structure of the Yolov3 model to a MobileNetV3 model, modifying the loss function of the Yolov3 model to a focal_loss function, and adding a multi-scale fusion module to obtain an improved Yolov3 model;
S3, training the improved Yolov3 model by using the image training set and the image verification set, and evaluating the trained improved Yolov3 model;
S4, inputting traffic sign image data into the trained improved Yolov3 model, and outputting a traffic sign recognition result.
Step S2 comprises the following steps:
changing the backbone network structure of the Yolov3 model to a MobileNetV3 model, wherein the MobileNetV3 overall structure comprises depthwise separable convolutions, SE modules and bottleneck structures;
modifying the loss function of the Yolov3 model to a focal_loss function;
and adding a multi-scale fusion module, wherein the multi-scale fusion module fuses the features of different scales extracted by the MobileNetV3 model to obtain a feature fusion map containing information of each scale.
Step S3 comprises the following steps:
3.1, initializing the internal parameters of the improved Yolov3 model, wherein the internal parameters comprise: input picture size, initial learning rate, termination learning rate, epoch and batch;
3.2, converting the pictures and data labels into 3-channel matrix data in RGB format, performing forward reasoning on the data through the model in sequence, and calculating the loss through the loss function;
3.3, updating and adjusting the internal parameters of the improved Yolov3 model through backward gradient propagation;
3.4, repeating steps 3.2-3.3 in sequence until the parameters are no longer updated, and obtaining and storing the trained improved Yolov3 model;
and testing the recall rate of the trained improved Yolov3 model and its FPS when predicting on video, and evaluating the trained improved Yolov3 model.
Example 3:
the present embodiment provides a storage medium, which is a computer-readable storage medium, and stores a computer program, and when the program is executed by a processor, and the processor executes the computer program stored in the memory, the method for recognizing a traffic sign based on improved Yolov3 of the foregoing embodiment 1 is implemented as follows:
S1, acquiring an image data set for traffic sign detection, performing data enhancement processing on the image data set, and dividing it into an image training set and an image verification set;
S2, improving the Yolov3 detection network structure: changing the backbone network structure of the Yolov3 model to a MobileNetV3 model, modifying the loss function of the Yolov3 model to a focal_loss function, and adding a multi-scale fusion module to obtain an improved Yolov3 model;
S3, training the improved Yolov3 model by using the image training set and the image verification set, and evaluating the trained improved Yolov3 model;
S4, inputting traffic sign image data into the trained improved Yolov3 model, and outputting a traffic sign recognition result.
Step S2 comprises the following steps:
changing the backbone network structure of the Yolov3 model to a MobileNetV3 model, wherein the MobileNetV3 overall structure comprises depthwise separable convolutions, SE modules and bottleneck structures;
modifying the loss function of the Yolov3 model to a focal_loss function;
and adding a multi-scale fusion module, wherein the multi-scale fusion module fuses the features of different scales extracted by the MobileNetV3 model to obtain a feature fusion map containing information of each scale.
Step S3 comprises the following steps:
3.1, initializing the internal parameters of the improved Yolov3 model, wherein the internal parameters comprise: input picture size, initial learning rate, termination learning rate, epoch and batch;
3.2, converting the pictures and data labels into 3-channel matrix data in RGB format, performing forward reasoning on the data through the model in sequence, and calculating the loss through the loss function;
3.3, updating and adjusting the internal parameters of the improved Yolov3 model through backward gradient propagation;
3.4, repeating steps 3.2-3.3 in sequence until the parameters are no longer updated, and obtaining and storing the trained improved Yolov3 model;
and testing the recall rate of the trained improved Yolov3 model and its FPS when predicting on video, and evaluating the trained improved Yolov3 model.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited thereto; any other changes, modifications, substitutions, combinations and simplifications that do not depart from the spirit and principle of the present invention should be regarded as equivalents and are intended to be included within the scope of the present invention.

Claims (7)

1. The traffic sign identification method based on the improved Yolov3 is characterized by comprising the following steps:
S1, acquiring an image data set for traffic sign detection, performing data enhancement processing on the image data set, and dividing it into an image training set and an image verification set;
S2, improving the Yolov3 detection network structure: changing the backbone network structure of the Yolov3 model to a MobileNetV3 model, modifying the loss function of the Yolov3 model to a focal_loss function, and adding a multi-scale fusion module to obtain an improved Yolov3 model;
S3, training the improved Yolov3 model by using the image training set and the image verification set, and evaluating the trained improved Yolov3 model;
S4, inputting traffic sign image data into the trained improved Yolov3 model, and outputting a traffic sign recognition result.
2. The traffic sign recognition method based on improved Yolov3 according to claim 1, wherein the image data set for traffic sign detection is the CCTSDB open-source data set, and the CCTSDB open-source data set comprises indication signs, prohibition signs and warning signs.
3. The traffic sign recognition method based on improved Yolov3 according to claim 1, wherein the data enhancement processing on the image data set comprises:
performing Mosaic processing on the image data set: 4 pictures are used at a time and spliced together randomly, each picture keeps its own bounding boxes, and after combination they form a new picture; during splicing the pictures are placed in the upper, lower, left and right regions and do not affect each other;
performing Mixup processing on the image data set: mixed-class enhancement is performed on the images, the images are averaged, their label values are recalculated, images of different classes are mixed, and the training data set is expanded.
4. The traffic sign recognition method based on improved Yolov3 according to claim 1, wherein the step S2 comprises:
changing the backbone network structure of the Yolov3 model to a MobileNetV3 model, wherein the MobileNetV3 overall structure comprises depthwise separable convolutions, SE modules and bottleneck structures;
modifying the loss function of the Yolov3 model to a focal_loss function;
and adding a multi-scale fusion module, wherein the multi-scale fusion module fuses the features of different scales extracted by the MobileNetV3 model to obtain a feature fusion map containing information of each scale.
5. The traffic sign recognition method based on improved Yolov3 according to claim 4, wherein the step S3 comprises:
S31, initializing the internal parameters of the improved Yolov3 model, wherein the internal parameters comprise: input picture size, initial learning rate, termination learning rate, epoch and batch;
S32, converting the pictures and data labels into 3-channel matrix data in RGB format, performing forward reasoning on the data through the model in sequence, and calculating the loss through the loss function;
S33, updating and adjusting the internal parameters of the improved Yolov3 model through backward gradient propagation;
S34, repeating steps S32-S33 in sequence until the internal parameters of the improved Yolov3 model are no longer updated, and obtaining and storing the trained improved Yolov3 model;
S35, testing the recall rate of the trained improved Yolov3 model and its FPS when predicting on video, and evaluating the trained improved Yolov3 model.
6. A computer device comprising a processor and a memory for storing a program executable by the processor, wherein the processor implements the traffic sign recognition method based on improved Yolov3 according to any one of claims 1 to 5 when executing the program stored in the memory.
7. A storage medium storing a program, wherein the program, when executed by a processor, implements the traffic sign recognition method based on improved Yolov3 according to any one of claims 1 to 5.
CN202211055743.XA 2022-08-31 2022-08-31 Improved yolov3 traffic sign identification method, equipment and medium Pending CN115424242A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211055743.XA CN115424242A (en) 2022-08-31 2022-08-31 Improved yolov3 traffic sign identification method, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211055743.XA CN115424242A (en) 2022-08-31 2022-08-31 Improved yolov3 traffic sign identification method, equipment and medium

Publications (1)

Publication Number Publication Date
CN115424242A true CN115424242A (en) 2022-12-02

Family

ID=84199695

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211055743.XA Pending CN115424242A (en) 2022-08-31 2022-08-31 Improved yolov3 traffic sign identification method, equipment and medium

Country Status (1)

Country Link
CN (1) CN115424242A (en)

Similar Documents

Publication Publication Date Title
CN109978893B (en) Training method, device, equipment and storage medium of image semantic segmentation network
CN109086811B (en) Multi-label image classification method and device and electronic equipment
CN113033604B (en) Vehicle detection method, system and storage medium based on SF-YOLOv4 network model
CN111126514A (en) Image multi-label classification method, device, equipment and medium
CN111274926B (en) Image data screening method, device, computer equipment and storage medium
CN112949578B (en) Vehicle lamp state identification method, device, equipment and storage medium
CN110599453A (en) Panel defect detection method and device based on image fusion and equipment terminal
CN115830399B (en) Classification model training method, device, equipment, storage medium and program product
CN112613434A (en) Road target detection method, device and storage medium
CN116964588A (en) Target detection method, target detection model training method and device
CN115131634A (en) Image recognition method, device, equipment, storage medium and computer program product
CN117576073A (en) Road defect detection method, device and medium based on improved YOLOv8 model
CN117079276B (en) Semantic segmentation method, system, equipment and medium based on knowledge distillation
CN116413740B (en) Laser radar point cloud ground detection method and device
CN112418020A (en) Attention mechanism-based YOLOv3 illegal billboard intelligent detection method
CN111832463A (en) Deep learning-based traffic sign detection method
CN116071557A (en) Long tail target detection method, computer readable storage medium and driving device
CN115424242A (en) Improved yolov3 traffic sign identification method, equipment and medium
CN115588191A (en) Cell sorting method and system based on image acoustic flow control cell sorting model
CN111126271B (en) Bayonet snap image vehicle detection method, computer storage medium and electronic equipment
CN113963238A (en) Construction method of multitask perception recognition model and multitask perception recognition method
CN112560853A (en) Image processing method, device and storage medium
Jiang Street parking sign detection, recognition and trust system
CN116612466B (en) Content identification method, device, equipment and medium based on artificial intelligence
CN115359346B (en) Small micro-space identification method and device based on street view picture and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination