CN115424242A - Improved yolov3 traffic sign identification method, equipment and medium - Google Patents

Improved yolov3 traffic sign identification method, equipment and medium

Info

Publication number
CN115424242A
CN115424242A (Application CN202211055743.XA)
Authority
CN
China
Prior art keywords
model
traffic sign
improved
yolov3 model
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211055743.XA
Other languages
Chinese (zh)
Inventor
王传钊
谢乐成
吴锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Changan Automobile Co Ltd
Original Assignee
Chongqing Changan Automobile Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Changan Automobile Co Ltd filed Critical Chongqing Changan Automobile Co Ltd
Priority to CN202211055743.XA priority Critical patent/CN115424242A/en
Publication of CN115424242A publication Critical patent/CN115424242A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/582Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of traffic signs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of traffic sign identification, in particular to a traffic sign identification method, equipment and medium based on improved yolov3. The method comprises the steps of obtaining an image data set for traffic sign detection, carrying out data enhancement processing on the image data set, and dividing it into an image training set and an image verification set; improving the Yolov3 detection network structure by changing the backbone network of the Yolov3 model to a MobileNetV3 model, modifying the loss function of the Yolov3 model to a focal_loss function, and adding a multi-scale fusion module to obtain an improved Yolov3 model; training the improved Yolov3 model with the image training set and the image verification set, and evaluating the trained improved Yolov3 model; and inputting traffic sign image data into the trained improved Yolov3 model and outputting a traffic sign recognition result. The invention can improve the model's recognition rate for traffic signs and the detection precision for small targets.

Description

Improved yolov3 traffic sign identification method, equipment and medium
Technical Field
The invention relates to the technical field of traffic sign identification, in particular to a traffic sign identification method, equipment and medium based on improved yolov3.
Background
The traffic sign detection system is an important component of intelligent driving. A camera mounted on the automobile captures the road scene in real time, and the captured pictures are fed into the detection system for traffic sign detection, providing the driver with effective road traffic information. Because the traffic information ahead can be anticipated in advance, the driver can make a timely judgment, gains more reaction time, and the probability of traffic accidents is reduced.
The traffic sign detection algorithm mainly faces the following challenges: (1) Traffic signs occupy a small proportion of actually captured images and therefore count as small targets, and information about small targets is easily lost while the algorithm processes the image, so traffic sign detection is relatively difficult to realize. (2) Traffic signs of the same type share the same background color and are highly similar, differing only in their specific contents, so the traffic sign categories are difficult to distinguish during actual detection.
At present there are many target detection algorithms in deep learning, and targets are basically detected with CNNs (convolutional neural networks). By algorithm type they mainly fall into two groups: two-stage and one-stage. Two-stage methods mainly include R-CNN, Faster R-CNN and Mask R-CNN; one-stage methods include SSD and the yolov1-5 series. They are very capable of detecting traffic signs in images, but these methods suffer from low detection accuracy or slow detection speed.
Disclosure of Invention
In order to solve the technical problems in the prior art, the invention provides a traffic sign identification method, equipment and medium based on improved yolov3; the improved yolov3 model optimizes the loss function, and multi-scale detection fusion improves the identification precision of small-target traffic signs.
The first purpose of the invention is to provide a traffic sign identification method based on improved yolov3.
It is a second object of the invention to provide a computer apparatus.
A third object of the present invention is to provide a storage medium.
The first purpose of the invention can be achieved by adopting the following technical scheme:
S1, acquiring an image data set for traffic sign detection, performing data enhancement processing on the image data set, and dividing it into an image training set and an image verification set;
S2, improving the Yolov3 detection network structure: changing the backbone network structure of the Yolov3 model to a MobileNetV3 model, modifying the loss function of the Yolov3 model to a focal_loss function, and adding a multi-scale fusion module to obtain an improved Yolov3 model;
S3, training the improved Yolov3 model by using the image training set and the image verification set, and evaluating the trained improved Yolov3 model;
S4, inputting traffic sign image data into the trained improved Yolov3 model, and outputting a traffic sign recognition result.
In a preferred technical scheme, step S2 comprises the following steps:
changing the backbone network structure of the Yolov3 model to a MobileNetV3 model, wherein the MobileNetV3 overall structure comprises depthwise separable convolutions, SE modules and bottleneck structures;
modifying the loss function of the Yolov3 model to a focal_loss function;
and adding a multi-scale fusion module, wherein the multi-scale fusion module fuses the features of different scales extracted by the MobileNetV3 model to obtain a feature fusion map containing information of each scale.
In a preferred technical solution, step S3 includes:
3.1, initializing the internal parameters of the improved Yolov3 model, wherein the internal parameters comprise: input picture size, initial learning rate, termination learning rate, epoch and batch;
3.2, converting the pictures and data labels into 3-channel matrix data in RGB format, performing forward reasoning on the data through the model in sequence, and calculating the loss through the loss function;
3.3, updating and adjusting the internal parameters of the improved Yolov3 model through backward gradient propagation;
3.4, repeating steps 3.2-3.3 in sequence until the parameters are no longer updated, and obtaining and storing the trained improved Yolov3 model;
and testing the recall rate of the trained improved Yolov3 model and its FPS when predicting on video, and evaluating the trained improved Yolov3 model.
The second purpose of the invention can be achieved by adopting the following technical scheme:
a computer device comprising a processor and a memory for storing a program executable by the processor, wherein the processor, when executing the program stored in the memory, implements the above improved traffic sign identification method based on yolov 3.
The third purpose of the invention can be achieved by adopting the following technical scheme:
a storage medium storing a program which, when executed by a processor, implements the above improved yolov 3-based traffic sign recognition method.
Compared with the prior art, the invention has the following advantages and beneficial effects:
the invention provides an improved yolov 3-based traffic sign identification method, equipment and medium, which adopt Mobilnetv3 as a backbone of a network to optimize a loss function, adopt multi-scale detection fusion to effectively overcome the defects of unbalanced data distribution and large proportion difference between positive and negative samples, identify objects with different sizes and then fuse identification results, improve the identification rate of a model and improve the detection precision of small targets.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings according to the structures shown in these drawings without creative effort.
FIG. 1 is a schematic flow chart of a traffic sign detection method according to an embodiment of the present invention;
Fig. 2 is a schematic visualization of the MobileNetV3 network structure in the embodiment of the present invention;
Fig. 3 is a data label in Yolo format in an embodiment of the invention.
Detailed Description
The technical solutions of the present invention will be described in further detail with reference to the accompanying drawings and examples. It is obvious that the described examples are some, but not all, examples of the present invention, and the embodiments of the present invention are not limited thereto. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Example 1:
As shown in Fig. 1, the traffic sign identification method based on improved yolov3 comprises the following steps:
s1, an image data set for detecting the traffic sign is obtained, and the image data set is divided into a training set and an image verification set.
Open-source traffic sign data sets were first investigated. Open-source data sets currently containing more than 10,000 pictures include the CCTSDB data set and the TT100K data set. The TT100K data set has too many sign categories, its data are relatively sparse, and its long-tail effect is obvious. The CCTSDB data set is more evenly distributed, although it has fewer categories (3).
The CCTSDB open-source data set is a traffic sign data set collected by Changsha University of Science and Technology, and the TT100K open-source data set is a traffic sign data set collected and labeled by Tsinghua University.
The open-source CCTSDB data set comprises two folders, train and test, where train contains around 16,000 training pictures and test contains 800 test pictures. The CCTSDB data set covers 3 types of traffic signs: indication signs, prohibition signs and warning signs. The advantage of this classification is that the data distribution is more balanced and the long-tail effect is weak.
In this embodiment, the CCTSDB traffic sign data set is used as the data set of the invention, and data enhancement processing is performed on it. The data enhancement processing includes Mosaic processing and Mixup processing; data enhancement is used to improve the robustness of the model, make full use of the feature information of the original data, and improve the recognition rate of the model.
Mosaic processing: 4 pictures are used at a time and spliced together randomly; each picture keeps its own bounding boxes, and after combination they form a new picture. During splicing the pictures are placed in the upper, lower, left and right regions and do not affect each other.
Mixup treatment: and performing mixed enhancement on the images, averaging the images, recalculating the label values of the images, mixing the images among different classes, and expanding a training data set. For example, the pictures of partial warning signs and indicating signs are overlapped by taking red warning signs as obvious features and indicating signs as background features through feature fusion. Therefore, the data set is amplified, the complexity of the data set is enhanced, continuous data samples can be provided for different classes, and the robustness of the model is improved.
S2, improving the Yolov3 detection network structure: changing the backbone network structure of the Yolov3 model to a MobileNetV3 model, modifying the loss function of the Yolov3 model to a focal_loss function, adding a multi-scale fusion module, identifying objects of different sizes and fusing the identification results, so as to obtain the improved Yolov3 model.
S21, changing the backbone network structure of the yolov3 model to a MobileNetV3 model. MobileNetV3 is a lightweight model for extracting feature layers in a deep neural network; it has few parameters, high accuracy and a stable structure.
The yolov3 model is a classic model in deep learning, computer vision and target detection; the backbone network structure refers to the part of a deep learning model structure used for extracting image features.
The MobileNetV3 overall structure comprises depthwise separable convolutions, SE modules and bottleneck structures. The depthwise separable convolution replaces an ordinary 3x3 convolution with a 3x3 depthwise convolution plus a 1x1 pointwise convolution, reducing the amount of computation. The SE (squeeze-and-excitation) module lets the model automatically suppress unnecessary features and emphasize salient ones. The bottleneck structure reduces the dimensionality of the model input, improving the recognition rate and reducing computation. A schematic visualization of the MobileNetV3 network structure is shown in Fig. 2: first, the channels of the feature map are expanded to increase the number of features; feature extraction is then performed by the depthwise separable convolution; finally, the feature map is reduced in dimension and the number of channels decreases. A sketch of one such block follows.
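As an illustration of the structure described above, the following PyTorch sketch shows a MobileNetV3-style bottleneck with an SE module. The channel counts, reduction ratio and activation choices are assumptions and may differ from the network actually used in the patent.

```python
import torch
import torch.nn as nn

class SEModule(nn.Module):
    """Squeeze-and-excitation: global-pool, then re-weight each channel."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Hardsigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)  # suppress or emphasize channels

class MobileNetV3Bottleneck(nn.Module):
    """1x1 expand -> 3x3 depthwise -> SE -> 1x1 project, with optional residual."""
    def __init__(self, in_ch, expand_ch, out_ch, stride=1):
        super().__init__()
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, expand_ch, 1, bias=False),      # expand channels
            nn.BatchNorm2d(expand_ch), nn.Hardswish(),
            nn.Conv2d(expand_ch, expand_ch, 3, stride, 1,
                      groups=expand_ch, bias=False),          # depthwise 3x3
            nn.BatchNorm2d(expand_ch), nn.Hardswish(),
            SEModule(expand_ch),
            nn.Conv2d(expand_ch, out_ch, 1, bias=False),      # project back down
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out
```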
S22, modifying the loss function of the yolov3 model to a focal_loss function. Here loss refers to the loss function in object detection (including coordinate loss and class loss). focal_loss is based on the binary cross entropy CE; it is a dynamically scaled cross-entropy loss whose scaling factor reduces the weight of easily distinguished samples during training, so the focus quickly shifts to hard samples. focal_loss addresses the imbalance between positive and negative samples in one-stage models and effectively alleviates uneven data distribution and an excess of negative samples. A sketch follows.
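A minimal sketch of a binary focal loss in PyTorch, using the common alpha/gamma formulation; alpha = 0.25 and gamma = 2.0 are illustrative defaults, not values disclosed by the patent:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Dynamically scaled binary cross entropy: easy samples are down-weighted
    by (1 - p_t)**gamma so training focuses on hard, rare samples."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = targets * p + (1 - targets) * (1 - p)              # probability of the true class
    alpha_t = targets * alpha + (1 - targets) * (1 - alpha)  # positive/negative balancing
    return (alpha_t * (1.0 - p_t) ** gamma * ce).mean()
```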
S23, adding a multi-scale fusion module; the multi-scale fusion module fuses the features of different scales extracted by the MobileNetV3 model to obtain a feature fusion map containing information of each scale, which is helpful for detecting and recognizing targets of different sizes. A sketch of such a fusion step follows.
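The following PyTorch sketch shows one assumed form of such fusion, upsampling a deeper, low-resolution feature map and concatenating it with a shallower, higher-resolution one (FPN-style); the channel sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    """Fuse a deep low-resolution feature map with a shallower high-resolution one
    by 1x1 reduction, upsampling and channel concatenation (FPN-style, simplified)."""
    def __init__(self, deep_ch, shallow_ch, out_ch):
        super().__init__()
        self.reduce = nn.Conv2d(deep_ch, out_ch, 1)
        self.fuse = nn.Conv2d(out_ch + shallow_ch, out_ch, 3, padding=1)

    def forward(self, deep_feat, shallow_feat):
        x = self.reduce(deep_feat)
        x = F.interpolate(x, size=shallow_feat.shape[-2:], mode="nearest")  # upsample to match
        x = torch.cat([x, shallow_feat], dim=1)                             # concatenate scales
        return self.fuse(x)

# e.g. fusing a 13x13x512 map with a 26x26x256 map extracted by the backbone
fusion = MultiScaleFusion(deep_ch=512, shallow_ch=256, out_ch=256)
fused = fusion(torch.randn(1, 512, 13, 13), torch.randn(1, 256, 26, 26))   # -> (1, 256, 26, 26)
```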
S3, training the improved Yolov3 model by utilizing the image training set and the image verification set;
before training the improved Yolov3 model, firstly writing a script, and converting an original label of a data set into a format label required by Yolov 3. As shown in fig. 3, the data label of Yolo format, the data set format of Yolo algorithm is:
the first bit represents the id number of the corresponding tag, the second bit represents the x coordinate proportion of the center coordinate of the target object in the image, the third bit represents the y coordinate proportion of the center coordinate of the target object in the image, the fourth bit represents the proportion occupied by w of the target object frame in the image, and the fifth bit represents the proportion occupied by h of the target object frame in the image. Note in particular that the interval between must be one space.
Secondly, a data directory is arranged with the following layout:

myData
    images          # stores the annotated original pictures
        train
        test
        val
    labels          # stores the label files corresponding to the images
        train
        test
        val
myData is the total save path of the data set, read by the model. The images path stores the annotated original pictures and has 3 folders under it: the train folder stores the image training set, the test folder stores the image verification set, i.e. the data used to evaluate the model's indexes during training, and the val folder is used to evaluate the model's indexes after training is finished. The labels path stores the annotation files corresponding to the images path, with train, test and val in one-to-one correspondence. The code reads the data and labels in a fixed format, so the directory must be arranged in this format. A small helper that creates this layout is shown below.
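For reference, an illustrative convenience script (not part of the patent) that creates this layout:

```python
from pathlib import Path

# Create the expected myData layout (illustrative helper only).
root = Path("myData")
for group in ("images", "labels"):
    for split in ("train", "test", "val"):
        (root / group / split).mkdir(parents=True, exist_ok=True)
```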
The improved Yolov3 model is trained by using the image training set and the image verification set, specifically comprising the following steps:
3.1, initializing the internal parameters of the improved Yolov3 model, wherein the internal parameters comprise: input picture size, initial learning rate, termination learning rate, epoch and batch.
The internal parameters include: the input picture size, the initial learning rate (the learning rate when training starts), the termination learning rate (the learning rate decays gradually until this threshold), the epoch (one full traversal of all pictures is one epoch) and the batch (its size is set according to the GPU memory).
Initializing the internal parameters gives the model training a good starting point. Illustrative values are sketched below.
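A minimal configuration sketch follows; every value is an assumption for illustration, since the patent does not disclose concrete settings:

```python
# Illustrative hyperparameters only; the patent does not state concrete values.
hyperparams = {
    "input_size": (416, 416),  # input picture size fed to the network
    "lr_init": 1e-3,           # learning rate when training starts
    "lr_end": 1e-6,            # learning rate decays towards this threshold
    "epochs": 100,             # one epoch = one full traversal of all training pictures
    "batch_size": 16,          # chosen according to the available GPU memory
}
```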
3.2, converting the pictures and data labels into 3-channel matrix data in RGB format, performing forward reasoning on the data through the model in sequence, and calculating the loss through the loss function.
Matrix data: in general, a convolutional neural network operates on a picture only after converting it into n-dimensional matrix data.
Forward reasoning: by analogy, take f(x) = ax + b, where f(x) is the model, x is the input matrix data, and a and b are the parameters to be trained. Forward reasoning means sending the matrix data into the model, running the computation once, and outputting a result.
Loss calculation: assume that in f(x) = ax + b, when x = 2 the ground-truth value is 100. Then f(2) = 2a + b and loss(100, f(2)) = k, where loss is the loss function and k is the penalty that measures the gap between the prediction f(2) and the ground truth 100.
3.3, updating and adjusting the internal parameters of the improved Yolov3 model through backward gradient propagation.
The a and b assumed in step 3.2 above are the internal parameters of the model f(x). Backward gradient propagation means using the computed loss and its gradient to continuously optimize and adjust a and b until suitable values are found, so that for a given x the model outputs a value arbitrarily close to the correct one. The toy loop below makes steps 3.2-3.3 concrete.
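A toy numerical sketch of forward reasoning, loss calculation and backward-gradient updates for the f(x) = ax + b example; squared error stands in for the real YOLO loss and all values are illustrative:

```python
# Toy version of steps 3.2-3.3: forward reasoning, loss, and a backward-gradient update.
a, b = 0.0, 0.0          # the "internal parameters" being trained
x, y_true = 2.0, 100.0   # the example from the text: input x = 2, ground truth 100
lr = 0.01                # learning rate

for _ in range(2000):
    y_pred = a * x + b                # forward reasoning
    loss = (y_pred - y_true) ** 2     # squared-error loss (stand-in for the YOLO loss)
    grad = 2 * (y_pred - y_true)      # d(loss)/d(y_pred)
    a -= lr * grad * x                # backward gradient propagation: adjust a
    b -= lr * grad                    # and b
print(a, b, a * x + b)                # a*x + b converges towards 100
```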
3.4, repeating steps 3.2-3.3 in sequence until the parameters are no longer updated, and obtaining and storing the trained improved Yolov3 model.
This step requires continuously feeding in picture matrix data and updating the parameters. Finally, the training result and the model are stored. The model is stored so that it can conveniently be used later for the required predictions and evaluations; it is saved with TensorFlow's own model-saving module.
3.5, testing the recall rate of the trained improved Yolov3 model and its FPS during video prediction, and evaluating the trained improved Yolov3 model.
Before testing, 2 scenes, daytime and nighttime, are selected for comprehensive evaluation. Multiple scenes are selected because the robustness of the model needs to be checked in testing, and indexes measured over multiple scenes are more accurate evaluation indexes.
In this embodiment, 800 pictures are selected to evaluate the model, 500 daytime pictures and 300 nighttime pictures, with a resolution of 1280x720. The recall rate is tested: recall mainly reflects how many targets the model misses, and the higher the recall, the fewer the missed detections. The FPS is tested: FPS reflects the model's inference speed. Both measurements are sketched below.
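A minimal measurement sketch; counting true positives and false negatives against the ground truth is assumed to happen elsewhere, and `model_fn` is a hypothetical inference callable:

```python
import time

def recall(true_positives, false_negatives):
    """Recall = TP / (TP + FN); the higher the recall, the fewer missed detections."""
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0

def average_latency_ms(model_fn, frames):
    """Average per-frame inference time over a clip; FPS is simply 1000 / latency_ms."""
    start = time.perf_counter()
    for frame in frames:
        model_fn(frame)
    latency_ms = (time.perf_counter() - start) / len(frames) * 1000.0
    return latency_ms, 1000.0 / latency_ms
```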
The recall rate of the trained improved Yolov3 model is shown in Table 1. Table 1 compares the recall indexes of the original model and the improved model, evaluated by distance and by scene, which makes the model evaluation more rigorous.
TABLE 1
Recall               0-30 m day      30-50 m day     50 m+ day
Yolov3_tiny_row      0.933           0.754           0.43
Yolov3_Mobilnetv3    0.952           0.786           0.55

Recall               0-30 m night    30-50 m night   50 m+ night
Yolov3_tiny_row      0.862           0.642           0.32
Yolov3_Mobilnetv3    0.883           0.703           0.41
The inference speed of the trained improved Yolov3 model is shown in Table 2. Table 2 compares the running speeds of the two models, averaged over 100 consecutive frames, so the result is accurate and reliable.
TABLE 2
Model                Average time per frame (100 consecutive frames)
Yolov3_tiny_row      48 ms
Yolov3_Mobilnetv3    40 ms
The evaluation results show that the overall recall of the improved yolov3 model is consistently higher than that of the original model, because the feature extraction of MobileNetV3 is superior to the original backbone. The improved yolov3 model is significantly better than the original model at detecting small targets at long distances (50 m+) because focal_loss is used. The inference speed of the improved yolov3 model is also better than that of the original model, because the improved model has significantly fewer parameters.
S4, inputting the traffic sign image data into the trained improved Yolov3 model, and outputting a traffic sign recognition result.
The traffic sign image data are input into the trained improved Yolov3 model; the improved Yolov3 model outputs a series of matrix information, and through matrix mapping the sign category corresponding to each target frame in the image (warning, indication or prohibition) is obtained. The improved Yolov3 model is fast, has a high small-target recognition rate and a high scene coverage rate. A post-processing sketch follows.
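A minimal post-processing sketch, assuming a hypothetical output matrix whose rows are (x, y, w, h, objectness, class scores...); a real pipeline would also apply non-maximum suppression:

```python
import numpy as np

CLASS_NAMES = ["warning", "indication", "prohibition"]  # the 3 CCTSDB categories

def decode_predictions(pred_matrix, conf_thresh=0.5):
    """Map raw output rows to (box, class name, confidence) tuples."""
    results = []
    for row in pred_matrix:
        box, objectness, class_scores = row[:4], row[4], row[5:]
        cls_id = int(np.argmax(class_scores))
        confidence = float(objectness * class_scores[cls_id])
        if confidence >= conf_thresh:
            results.append((box, CLASS_NAMES[cls_id], confidence))
    return results
```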
Example 2:
the present embodiment provides a computer device, which may be a server, a computer, or the like, and includes a processor, a memory, an input device, a display, and a network interface connected by a system bus, where the processor is configured to provide computing and control capabilities, the memory includes a nonvolatile storage medium and an internal memory, the nonvolatile storage medium stores an operating system, a computer program, and a database, the internal memory provides an environment for the operating system and the computer program in the nonvolatile storage medium to run, and when the processor executes the computer program stored in the memory, the traffic sign identification method based on improved Yolov3 of embodiment 1 is implemented as follows:
S1, acquiring an image data set for traffic sign detection, performing data enhancement processing on the image data set, and dividing it into an image training set and an image verification set;
S2, improving the Yolov3 detection network structure: changing the backbone network structure of the Yolov3 model to a MobileNetV3 model, modifying the loss function of the Yolov3 model to a focal_loss function, and adding a multi-scale fusion module to obtain an improved Yolov3 model;
S3, training the improved Yolov3 model by using the image training set and the image verification set, and evaluating the trained improved Yolov3 model;
S4, inputting traffic sign image data into the trained improved Yolov3 model, and outputting a traffic sign recognition result.
Step S2 comprises the following steps:
changing the backbone network structure of the Yolov3 model to a MobileNetV3 model, wherein the MobileNetV3 overall structure comprises depthwise separable convolutions, SE modules and bottleneck structures;
modifying the loss function of the Yolov3 model to a focal_loss function;
and adding a multi-scale fusion module, wherein the multi-scale fusion module fuses the features of different scales extracted by the MobileNetV3 model to obtain a feature fusion map containing information of each scale.
Step S3 comprises the following steps:
3.1, initializing the internal parameters of the improved Yolov3 model, wherein the internal parameters comprise: input picture size, initial learning rate, termination learning rate, epoch and batch;
3.2, converting the pictures and data labels into 3-channel matrix data in RGB format, performing forward reasoning on the data through the model in sequence, and calculating the loss through the loss function;
3.3, updating and adjusting the internal parameters of the improved Yolov3 model through backward gradient propagation;
3.4, repeating steps 3.2-3.3 in sequence until the parameters are no longer updated, and obtaining and storing the trained improved Yolov3 model;
and testing the recall rate of the trained improved Yolov3 model and its FPS when predicting on video, and evaluating the trained improved Yolov3 model.
Example 3:
the present embodiment provides a storage medium, which is a computer-readable storage medium, and stores a computer program, and when the program is executed by a processor, and the processor executes the computer program stored in the memory, the method for recognizing a traffic sign based on improved Yolov3 of the foregoing embodiment 1 is implemented as follows:
S1, acquiring an image data set for traffic sign detection, performing data enhancement processing on the image data set, and dividing it into an image training set and an image verification set;
S2, improving the Yolov3 detection network structure: changing the backbone network structure of the Yolov3 model to a MobileNetV3 model, modifying the loss function of the Yolov3 model to a focal_loss function, and adding a multi-scale fusion module to obtain an improved Yolov3 model;
S3, training the improved Yolov3 model by using the image training set and the image verification set, and evaluating the trained improved Yolov3 model;
S4, inputting traffic sign image data into the trained improved Yolov3 model, and outputting a traffic sign recognition result.
Step S2 comprises the following steps:
changing the backbone network structure of the Yolov3 model to a MobileNetV3 model, wherein the MobileNetV3 overall structure comprises depthwise separable convolutions, SE modules and bottleneck structures;
modifying the loss function of the Yolov3 model to a focal_loss function;
and adding a multi-scale fusion module, wherein the multi-scale fusion module fuses the features of different scales extracted by the MobileNetV3 model to obtain a feature fusion map containing information of each scale.
Step S3 comprises the following steps:
3.1, initializing the internal parameters of the improved Yolov3 model, wherein the internal parameters comprise: input picture size, initial learning rate, termination learning rate, epoch and batch;
3.2, converting the pictures and data labels into 3-channel matrix data in RGB format, performing forward reasoning on the data through the model in sequence, and calculating the loss through the loss function;
3.3, updating and adjusting the internal parameters of the improved Yolov3 model through backward gradient propagation;
3.4, repeating steps 3.2-3.3 in sequence until the parameters are no longer updated, and obtaining and storing the trained improved Yolov3 model;
and testing the recall rate of the trained improved Yolov3 model and its FPS when predicting on video, and evaluating the trained improved Yolov3 model.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited thereto; any other changes, modifications, substitutions, combinations and simplifications that do not depart from the spirit and principle of the present invention should be regarded as equivalents and are intended to be included within the scope of the present invention.

Claims (7)

1. The traffic sign identification method based on the improved Yolov3 is characterized by comprising the following steps:
S1, acquiring an image data set for traffic sign detection, performing data enhancement processing on the image data set, and dividing it into an image training set and an image verification set;
S2, improving the Yolov3 detection network structure: changing the backbone network structure of the Yolov3 model to a MobileNetV3 model, modifying the loss function of the Yolov3 model to a focal_loss function, and adding a multi-scale fusion module to obtain an improved Yolov3 model;
S3, training the improved Yolov3 model by using the image training set and the image verification set, and evaluating the trained improved Yolov3 model;
S4, inputting traffic sign image data into the trained improved Yolov3 model, and outputting a traffic sign recognition result.
2. The traffic sign recognition method based on improved Yolov3 according to claim 1, wherein the image data set for traffic sign detection is the CCTSDB open-source data set, and the CCTSDB open-source data set comprises indication signs, prohibition signs and warning signs.
3. The traffic sign recognition method based on improved Yolov3 according to claim 1, wherein the data enhancement processing on the image data set comprises:
performing Mosaic processing on the image data set: 4 pictures are used at a time and spliced together randomly, each picture keeps its own bounding boxes, and after combination they form a new picture; during splicing the pictures are placed in the upper, lower, left and right regions and do not affect each other;
performing Mixup processing on the image data set: mixed-class enhancement is performed on the images, the images are averaged, their label values are recalculated, images of different classes are mixed, and the training data set is expanded.
4. The traffic sign recognition method based on improved Yolov3 according to claim 1, wherein the step S2 comprises:
changing the backbone network structure of the Yolov3 model to a MobileNetV3 model, wherein the MobileNetV3 overall structure comprises depthwise separable convolutions, SE modules and bottleneck structures;
modifying the loss function of the Yolov3 model to a focal_loss function;
and adding a multi-scale fusion module, wherein the multi-scale fusion module fuses the features of different scales extracted by the MobileNetV3 model to obtain a feature fusion map containing information of each scale.
5. The traffic sign recognition method based on improved Yolov3 according to claim 4, wherein the step S3 comprises:
S31, initializing the internal parameters of the improved Yolov3 model, wherein the internal parameters comprise: input picture size, initial learning rate, termination learning rate, epoch and batch;
S32, converting the pictures and data labels into 3-channel matrix data in RGB format, performing forward reasoning on the data through the model in sequence, and calculating the loss through the loss function;
S33, updating and adjusting the internal parameters of the improved Yolov3 model through backward gradient propagation;
S34, repeating steps S32-S33 in sequence until the internal parameters of the improved Yolov3 model are no longer updated, and obtaining and storing the trained improved Yolov3 model;
S35, testing the recall rate of the trained improved Yolov3 model and its FPS when predicting on video, and evaluating the trained improved Yolov3 model.
6. A computer device comprising a processor and a memory for storing a program executable by the processor, wherein the processor implements the traffic sign recognition method based on improved Yolov3 according to any one of claims 1 to 5 when executing the program stored in the memory.
7. A storage medium storing a program, wherein the program, when executed by a processor, implements the traffic sign recognition method based on improved Yolov3 according to any one of claims 1 to 5.
CN202211055743.XA 2022-08-31 2022-08-31 Improved yolov3 traffic sign identification method, equipment and medium Pending CN115424242A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211055743.XA CN115424242A (en) 2022-08-31 2022-08-31 Improved yolov3 traffic sign identification method, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211055743.XA CN115424242A (en) 2022-08-31 2022-08-31 Improved yolov3 traffic sign identification method, equipment and medium

Publications (1)

Publication Number Publication Date
CN115424242A true CN115424242A (en) 2022-12-02

Family

ID=84199695

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211055743.XA Pending CN115424242A (en) 2022-08-31 2022-08-31 Improved yolov3 traffic sign identification method, equipment and medium

Country Status (1)

Country Link
CN (1) CN115424242A (en)

Similar Documents

Publication Publication Date Title
CN109978893B (en) Training method, device, equipment and storage medium of image semantic segmentation network
CN109086811B (en) Multi-label image classification method and device and electronic equipment
CN113033604B (en) Vehicle detection method, system and storage medium based on SF-YOLOv4 network model
CN111126514A (en) Image multi-label classification method, device, equipment and medium
CN111274926B (en) Image data screening method, device, computer equipment and storage medium
CN112949578B (en) Vehicle lamp state identification method, device, equipment and storage medium
CN110599453A (en) Panel defect detection method and device based on image fusion and equipment terminal
CN115830399B (en) Classification model training method, device, equipment, storage medium and program product
CN112613434A (en) Road target detection method, device and storage medium
CN116964588A (en) Target detection method, target detection model training method and device
CN115131634A (en) Image recognition method, device, equipment, storage medium and computer program product
CN117576073A (en) Road defect detection method, device and medium based on improved YOLOv8 model
CN117079276B (en) Semantic segmentation method, system, equipment and medium based on knowledge distillation
CN116413740B (en) Laser radar point cloud ground detection method and device
CN112418020A (en) Attention mechanism-based YOLOv3 illegal billboard intelligent detection method
CN111832463A (en) Deep learning-based traffic sign detection method
CN116071557A (en) Long tail target detection method, computer readable storage medium and driving device
CN115424242A (en) Improved yolov3 traffic sign identification method, equipment and medium
CN115588191A (en) Cell sorting method and system based on image acoustic flow control cell sorting model
CN111126271B (en) Bayonet snap image vehicle detection method, computer storage medium and electronic equipment
CN113963238A (en) Construction method of multitask perception recognition model and multitask perception recognition method
CN112560853A (en) Image processing method, device and storage medium
Jiang Street parking sign detection, recognition and trust system
CN116612466B (en) Content identification method, device, equipment and medium based on artificial intelligence
CN115359346B (en) Small micro-space identification method and device based on street view picture and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination