CN109447034B - Traffic sign detection method in automatic driving based on YOLOv3 network - Google Patents


Info

Publication number: CN109447034B
Application number: CN201811354012.9A
Authority: CN (China)
Other versions: CN109447034A (Chinese, zh)
Inventor: 王超
Assignee (original and current): Beijing Information Science and Technology University
Priority/filing date: 2018-11-14
Publication of application CN109447034A: 2019-03-08
Grant and publication of CN109447034B: 2021-04-06
Legal status: Active (granted)
Prior art keywords: YOLOv3 network, training, set data, detection

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/50: Context or environment of the image
    • G06V20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58: Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/582: Recognition of traffic signs
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; extraction of features in feature space; blind source separation
    • G06F18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G06F18/232: Non-hierarchical techniques
    • G06F18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213: Non-hierarchical techniques with a fixed number of clusters, e.g. K-means clustering
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07: Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A traffic sign detection method for automatic driving based on a YOLOv3 network, belonging to the field of traffic sign detection. The invention addresses two problems of the existing YOLOv3 target detection algorithm: low detection accuracy and a detection speed that cannot meet real-time requirements. The invention provides an improved loss function that reduces the influence of large-target errors on the detection of small targets, improving detection accuracy for small-sized targets; an improved activation function that retains negative values while reducing the changes and information propagated to the next layer, strengthening the algorithm's robustness to noise; and K-means clustering of the real frames in the traffic sign data set, which provides priors for target frame positions and accelerates network convergence. The detection accuracy (mAP) of the traffic sign detection model on the test set reaches 92.88%, and the detection speed reaches 35 FPS, fully meeting real-time requirements. The invention can be applied in the field of traffic sign detection.

Description

Traffic sign detection method in automatic driving based on YOLOv3 network
Technical Field
The invention belongs to the field of traffic sign detection, and particularly relates to a traffic sign detection method in automatic driving.
Background
Object detection is an important research direction in the field of automatic driving. Detection targets fall into two categories: stationary objects and moving objects. Stationary objects include traffic lights, traffic signs, lanes, and obstacles; moving objects include vehicles, pedestrians, and non-motorized vehicles. Traffic sign detection provides rich and necessary navigation information to a driverless car while it is driving, and is foundational work of great significance.
The traditional target detection method mainly comprises the following steps: preprocessing, candidate region selection, target feature extraction, and feature classification. Commonly used features are SIFT (Scale-Invariant Feature Transform), HOG (Histogram of Oriented Gradients), and Haar. Common classifiers include the SVM (Support Vector Machine), RF (Random Forest), and AdaBoost. This approach places high design demands on the target features: if the designed features are poor, the accuracy of the final model is low even with the best classifier. Such features are also highly target-specific, able to detect only a particular class of target, and generalize poorly. Moreover, the extracted features are all low-level features of the target and cannot express its real high-level semantic features.
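For concreteness, the classical pipeline described above can be sketched as follows. This is a minimal illustration using scikit-image's HOG extractor and scikit-learn's SVM, assuming pre-cropped candidate windows; it is not part of the patented method.

```python
# Minimal sketch of the classical HOG + SVM pipeline described above, using
# scikit-image and scikit-learn. It assumes pre-cropped, equally sized
# grayscale candidate windows; the patent does not prescribe this code.
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def hog_features(windows):
    """Extract a HOG descriptor from each grayscale window (2-D array)."""
    return np.array([hog(w, orientations=9, pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2)) for w in windows])

def train_classical_detector(windows, labels):
    """Train the SVM classification stage on HOG features.
    labels: 1 = traffic sign, 0 = background."""
    clf = SVC(kernel="rbf")
    clf.fit(hog_features(windows), labels)
    return clf
```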
Deep learning has produced abundant research results in the field of computer vision in recent years, particularly in target detection. Extracting target features with a convolutional neural network greatly reduces the many drawbacks of manually designed features. R-CNN is a convolutional-neural-network-based target detection model proposed by Girshick et al. in 2014. It first extracts a large number of candidate regions from the whole picture with a selective search algorithm, then resizes the candidate regions to a fixed size and feeds them into a convolutional neural network for feature extraction, and finally classifies them with an SVM classifier. The mAP (mean Average Precision) of R-CNN reaches 62.4%, but its high algorithmic complexity makes detection slow. In response, researchers have proposed many improved algorithms based on target candidate regions. SPP-net [8] fixes the feature map to the required size by placing a pyramid pooling layer after the last convolutional layer. Fast R-CNN proposes a multi-task loss function, adding a target-location loss after the conventional loss to correct the location information. Faster R-CNN slides a window over the feature map output by the last convolutional layer, creates anchor frames of different sizes centered on each window position, and maps the anchors back to the original picture to obtain candidate regions. R-FCN adopts an FCN (Fully Convolutional Networks) structure and constructs position-sensitive score maps with a special convolutional layer. Researchers have also proposed many regression-based target detection algorithms, such as YOLO (You Only Look Once), SSD (Single Shot MultiBox Detector), YOLOv2, and YOLOv3. Among these, YOLOv3 is one of the best-performing target detection algorithms at present, building on many results of previous researchers. With an input size of 416 × 416, its detection accuracy reaches 55.3% while its detection time is only 29 ms. Although the existing YOLOv3 target detection algorithm has achieved certain results, its detection accuracy is still not high, and its detection speed cannot meet real-time requirements.
Disclosure of Invention
The invention aims to solve the problems of the existing YOLOv3 network target detection algorithm: low detection accuracy and a detection speed that cannot meet real-time requirements.
The technical scheme adopted by the invention for solving the technical problems is as follows:
the method for detecting the traffic sign in automatic driving based on the YOLOv3 network comprises the following steps:
firstly, manufacturing training set data and test set data with traffic identification target labels based on a GTSDB data set;
clustering real target frames labeled in the training set data, obtaining initial candidate target frames of the traffic identification type targets predicted in the training set data by adopting an area intersection ratio IOU as a rating index, and taking the initial candidate target frames as initial network parameters of a YOLOv3 network; calling initial network parameters of a YOLOv3 network, inputting training set data into the YOLOv3 network for training until loss function values output by the training set data are less than or equal to a threshold value Q1Or stopping training when the set maximum iteration number N is reached to obtain a trained YOLOv3 network;
inputting the test set data into the well-trained YOLOv3 network, and if the detection precision corresponding to the test set data is more than or equal to the precision threshold Q2Then, the trained YOLOv3 network is used as the final YOLOv3 network;
if the detection precision corresponding to the test set data is smaller than the precision threshold Q2Continuing to train the well-trained YOLOv3 network obtained in the step two until the detection precision corresponding to the test set data is greater than or equal to the precision threshold Q2The YOLOv3 network at this time is used as a final YOLOv3 network;
and inputting the collected images containing the traffic signs in the automatic driving into a final YOLOv3 network to detect the traffic signs.
The invention has the following beneficial effects. The invention provides a traffic sign detection method in automatic driving based on a YOLOv3 network. An improved loss function is provided: by weighting the width-height loss term of the detected target, the size of the real target frame is taken into account, which reduces the influence of large-target errors on the detection of small targets and improves detection accuracy for small targets. An improved activation function is provided: when x is 0 or negative, Softplus shifted down by log 2 is adopted, so negative values are retained while the changes and information propagated to the next layer are reduced, strengthening the algorithm's robustness to noise. Finally, the real frames are clustered with the K-means algorithm, providing priors for target frame positions and accelerating network convergence. The results show that the detection accuracy of the proposed traffic sign detection model on the test set is greatly improved: mAP reaches 92.88%, the detection speed reaches 35 FPS, fully meeting real-time requirements, and the convergence speed during training is improved by about 66.67%.
Drawings
FIG. 1 is a graph of a currently common activation function ReLU;
x represents input and y represents output;
FIG. 2 is a graph of an activation function Leaky-ReLU (Leaky Rectified Linear Unit);
FIG. 3 is a graph of the activation function Softplus-ReLU applied in the present invention;
FIG. 4 is a schematic diagram of the influence of K-means clustering of initial candidate frames on model performance;
wherein: the gray rectangles represent the values of the original method, and the black rectangles represent the values of the K-means clustering method;
FIG. 5 compares the loss curves of training with the method of the present invention (K-means clustering) and training without clustering;
FIG. 6 compares the detection effect of conventional YOLOv3 with that of the present invention using the improved loss function;
Detailed Description
Embodiment one: the traffic sign detection method in automatic driving based on the YOLOv3 network of this embodiment specifically comprises the following steps:
step one, constructing training set data and test set data with traffic sign target labels based on the GTSDB data set;
step two, clustering the real target frames labeled in the training set data, using the area intersection-over-union (IOU) as the rating index to obtain initial candidate target frames for the predicted traffic sign targets in the training set data, and taking the initial candidate target frames as initial network parameters of the YOLOv3 network (this speeds up convergence during training); loading the initial network parameters of the YOLOv3 network and inputting the training set data into the YOLOv3 network for training until the loss function value output on the training set data is less than or equal to a threshold Q1 or the set maximum number of iterations N is reached, then stopping training to obtain a trained YOLOv3 network;
step three, inputting the test set data into the trained YOLOv3 network; if the detection accuracy (mAP) on the test set data is greater than or equal to the accuracy threshold Q2, taking the trained YOLOv3 network as the final YOLOv3 network;
if the detection accuracy on the test set data is less than the accuracy threshold Q2, continuing to train the trained YOLOv3 network obtained in step two until the detection accuracy on the test set data is greater than or equal to Q2, and taking the YOLOv3 network at that point as the final YOLOv3 network;
and step four, inputting collected images containing traffic signs during automatic driving into the final YOLOv3 network to detect the traffic signs.
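As a reading aid, the following Python sketch mirrors the training and testing procedure of steps two and three. Every helper name here (build_yolov3, train_step, evaluate_map, next_batch, initial_anchors, train_data, test_data) is a hypothetical placeholder; the patent specifies the procedure but not an implementation.

```python
# Hedged sketch of steps two and three; all helper names are hypothetical
# placeholders, not APIs from the patent. Threshold values follow
# embodiments six, eight and nine.
Q1, Q2, N = 0.1, 0.90, 50000   # loss threshold, mAP threshold, max iterations

def train_until_stopped(network, train_data):
    """Step two: train until loss <= Q1 or N iterations are reached."""
    for _ in range(N):
        batch = train_data.next_batch(batch_size=256)       # embodiment six
        loss = network.train_step(batch, learning_rate=0.0001)
        if loss <= Q1:
            break
    return network

network = build_yolov3(initial_anchors)   # anchors from K-means clustering
network = train_until_stopped(network, train_data)
# Step three: keep training until the test-set mAP reaches Q2.
while evaluate_map(network, test_data) < Q2:
    network = train_until_stopped(network, train_data)
```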
Embodiment two: this embodiment differs from embodiment one in that the specific process of step one is as follows:
the GTSDB data set comprises M images in total; after the traffic sign targets in the M images are labeled, the labeled M images are randomly divided into a training set and a test set.
Embodiment three: this embodiment differs from embodiment two in that the data volume ratio of the training set to the test set is 8:1.
Embodiment four: this embodiment differs from embodiment three in that the real target frames labeled in the training set data are clustered, and the area intersection-over-union (IOU) is used as the rating index to obtain the initial candidate target frames for the predicted traffic sign targets in the training set data. The specific process is as follows:
the real target frames of the training set data are clustered with the K-means algorithm, and the area IOU between a predicted candidate target frame and the real target frames is taken as the rating index, i.e., a predicted candidate target frame is taken as an initial candidate target frame when its area IOU is not less than 0.5;
the area intersection-over-union IOU (Intersection over Union) is expressed as follows:

$$ IOU = \frac{box_{pred} \cap box_{truth}}{box_{pred} \cup box_{truth}} $$

wherein: $box_{pred}$ represents the area of the predicted candidate target frame, and $box_{truth}$ represents the area of the real target frame;
the distance Dis(box, centroid) between all real target frames and the initial candidate target frames is expressed as:

Dis(box, centroid) = 1 − IOU(box, centroid)

wherein: Dis(box, centroid) represents the distance between all real target frames in the training set data and the initial candidate target frames, and IOU(box, centroid) represents the average intersection-over-union between all real target frames in the training set data and the initial candidate target frames.
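To make the clustering step concrete, here is a minimal, self-contained NumPy sketch. Comparing boxes by width and height only (as if co-centred) and k = 9 follow the usual YOLO anchor-clustering convention and the 9 clustered frames mentioned in the examples below; they are assumptions, since the patent only gives the IOU and distance formulas.

```python
# Self-contained NumPy sketch of K-means anchor clustering with the
# Dis = 1 - IOU distance. Boxes are compared by width and height only,
# as if co-centred (an assumed convention, not disclosed by the patent).
import numpy as np

def iou_wh(boxes, centroids):
    """Pairwise IOU between (w, h) rows of `boxes` and `centroids`."""
    w = np.minimum(boxes[:, None, 0], centroids[None, :, 0])
    h = np.minimum(boxes[:, None, 1], centroids[None, :, 1])
    inter = w * h
    union = (boxes[:, 0] * boxes[:, 1])[:, None] \
          + (centroids[:, 0] * centroids[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    """Cluster ground-truth (width, height) pairs; returns k anchor sizes."""
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        # assign each box to the nearest centroid under Dis = 1 - IOU
        assign = np.argmin(1.0 - iou_wh(boxes, centroids), axis=1)
        new = np.array([boxes[assign == j].mean(axis=0) if np.any(assign == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids

# e.g. anchors = kmeans_anchors(np.array(ground_truth_wh), k=9)
```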
Embodiment five: this embodiment differs from embodiment four in the process of loading the initial network parameters of the YOLOv3 network and training on the training set data until the loss function value output on the training set data is less than or equal to the threshold Q1 or the set maximum number of iterations N is reached, which yields the trained YOLOv3 network. The specific process is as follows:
the initial network parameters of the YOLOv3 network are loaded, the training set data are input into the YOLOv3 network for training, the weight and bias values of the YOLOv3 network's convolutional layers are continually adjusted, and the loss function value loss(object) on the training set data output is computed:

$$
\begin{aligned}
loss(object) ={}& \lambda_{coord}\sum_{i=0}^{K\times K}\sum_{j=0}^{M} I_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right]\\
&+\lambda_{coord}\sum_{i=0}^{K\times K}\sum_{j=0}^{M} I_{ij}^{obj}\,\omega_i\left[(w_i-\hat{w}_i)^2+(h_i-\hat{h}_i)^2\right]\\
&-\sum_{i=0}^{K\times K}\sum_{j=0}^{M} I_{ij}^{obj}\left[\hat{C}_i\log C_i+(1-\hat{C}_i)\log(1-C_i)\right]\\
&-\lambda_{noobj}\sum_{i=0}^{K\times K}\sum_{j=0}^{M} I_{ij}^{noobj}\left[\hat{C}_i\log C_i+(1-\hat{C}_i)\log(1-C_i)\right]\\
&-\sum_{i=0}^{K\times K} I_{i}^{obj}\sum_{c\in classes}\left[\hat{p}_i(c)\log p_i(c)+(1-\hat{p}_i(c))\log(1-p_i(c))\right]
\end{aligned}
$$

wherein: the coordinate terms use a sum-of-squared-error loss, while the confidence and class terms use binary cross-entropy losses;
$\lambda_{coord}$ is the penalty coefficient for coordinate prediction; $\lambda_{noobj}$ is the penalty coefficient for confidence when no traffic sign target is contained; $K \times K$ is the number of grid cells into which an input image is divided; $M$ is the number of target frames predicted by each grid cell; $x_i$, $y_i$, $w_i$ and $h_i$ respectively represent the abscissa, ordinate, width and height of the center point of the predicted traffic sign, and $\hat{x}_i$, $\hat{y}_i$, $\hat{w}_i$ and $\hat{h}_i$ respectively represent the abscissa, ordinate, width and height of the center point of the real traffic sign; $\omega_i$ is the width-height loss weighting determined by the size of the real target frame, which enlarges the loss weight of small targets; $I_{ij}^{obj}$ indicates that the $j$-th candidate target frame of the $i$-th grid cell is responsible for detecting the object, and $I_{ij}^{noobj}$ indicates that it is not; $C_i$ and $\hat{C}_i$ respectively represent the predicted and real confidence that the $i$-th grid cell contains a traffic sign target; $p_i(c)$ and $\hat{p}_i(c)$ respectively represent the predicted and real probability that the traffic sign in the $i$-th grid cell belongs to class $c$; $c$ denotes a class and $classes$ denotes the total number of classes;
training continues until the loss function value output on the training set data is less than or equal to the threshold Q1 or the set maximum number of iterations N is reached; training then stops, and the network obtained at that point is taken as the trained YOLOv3 network.
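For illustration only, a NumPy sketch of a loss with this structure follows. The concrete size weight 2 − ŵ·ĥ (width and height normalised to [0, 1]) is our assumed form of the real-frame-size weighting ω, and the array layout and λ values (the classical YOLO choices 5.0 / 0.5) are likewise assumptions, not values disclosed by the patent.

```python
# Illustrative NumPy sketch of a loss with the structure above. The size
# weight 2 - w_hat * h_hat is an ASSUMED concrete form of omega_i; lambda
# values and array layout are also assumptions, not patent disclosures.
import numpy as np

def bce(p, q, eps=1e-9):
    """Binary cross entropy of prediction p against target q."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(q * np.log(p) + (1.0 - q) * np.log(1.0 - p))

def improved_yolo_loss(pred, truth, obj, lam_coord=5.0, lam_noobj=0.5):
    """pred, truth: (cells, M, 5 + classes) arrays holding x, y, w, h, C, p(c).
    obj: (cells, M) 0/1 mask, 1 where a frame is responsible for a target."""
    noobj = 1.0 - obj
    xy = (pred[..., 0] - truth[..., 0]) ** 2 + (pred[..., 1] - truth[..., 1]) ** 2
    size_w = 2.0 - truth[..., 2] * truth[..., 3]      # assumed form of omega_i
    wh = size_w * ((pred[..., 2] - truth[..., 2]) ** 2
                   + (pred[..., 3] - truth[..., 3]) ** 2)
    conf = bce(pred[..., 4], truth[..., 4])
    cls = bce(pred[..., 5:], truth[..., 5:]).sum(axis=-1)
    return (lam_coord * (obj * (xy + wh)).sum()
            + (obj * conf).sum()
            + lam_noobj * (noobj * conf).sum()
            + (obj * cls).sum())
```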
Embodiment six: this embodiment differs from embodiment five in that, when the training set data are input into the YOLOv3 network for training in step two, the learning rate is set to 0.0001 and the batch_size is set to 256.
In actual training, the values of the learning rate and batch_size may be adjusted appropriately to improve training accuracy.
Embodiment seven: as shown in FIG. 3, this embodiment differs from embodiment six in that the activation function adopted by the convolutional layers of the YOLOv3 network is defined as:

$$ y = \begin{cases} x, & x > 0 \\ \log\left(1 + e^{x}\right) - \log 2, & x \le 0 \end{cases} $$

wherein: x represents the input (the information from the previous layer of the YOLOv3 network), and y represents the nonlinear output. When x is positive, the convolutional layers of the YOLOv3 network use the same form as the activation function ReLU; when x is 0 or negative, Softplus shifted down by log 2 is adopted, and as x continues to decrease the activation function gradually converges to −log 2.
This means that Softplus-ReLU has smaller derivative values in the negative domain, which reduces the changes and information propagated to the next layer. Softplus-ReLU is therefore strongly robust to noisy information, and its complexity is relatively low.
The activation function of this embodiment differs from those used in traditional target detection algorithms. The currently common activation function ReLU (Rectified Linear Unit) converges faster than traditional activation functions such as sigmoid and tanh. Its formula is defined as follows:

$$ y = \begin{cases} x, & x > 0 \\ 0, & x \le 0 \end{cases} $$

As shown in FIG. 1, however, because all inputs falling in the negative domain are mapped to 0, it may happen as training progresses that a neuron's weights can no longer be updated: the gradient flowing through that neuron is always 0 from that point on, i.e., the ReLU neuron dies irreversibly during training.
All three versions of YOLO use Leaky-ReLU (Leaky Rectified Linear Unit) as the activation function. As shown in FIG. 2, it is identical to ReLU for positive x, but for x equal to 0 or negative the output is not 0; instead a linear function with a small slope is used, preserving the output for negative inputs. Although Leaky-ReLU admits negative values, it cannot guarantee noise robustness in the deactivated state. Its formula is defined as follows:

$$ y = \begin{cases} x, & x > 0 \\ \alpha x, & x \le 0 \end{cases} $$

where α is a small positive constant slope (0.1 in YOLO).
In view of the above problems, this embodiment proposes an improved activation function, Softplus-ReLU, which is applied to every convolutional layer of the network. It is identical to ReLU for positive x; when x is 0 or negative, Softplus shifted down by log 2 is used. As the input continues to decrease, the function gradually converges to −log 2.
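The piecewise definition translates directly into code; a minimal NumPy sketch follows (the function name softplus_relu is ours, not the patent's).

```python
# Minimal NumPy sketch of the Softplus-ReLU defined above: identity for
# x > 0; Softplus shifted down by log 2 for x <= 0, so the function is
# continuous at 0 and tends to -log 2 as x decreases.
import numpy as np

def softplus_relu(x):
    # min(x, 0) keeps exp() from overflowing on the (unused) positive branch
    return np.where(x > 0, x, np.log1p(np.exp(np.minimum(x, 0))) - np.log(2.0))

# softplus_relu(np.array([-10.0, 0.0, 3.0])) -> approx. [-0.6931, 0.0, 3.0],
# showing continuity at 0 and convergence to -log 2.
```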
Embodiment eight: this embodiment differs from embodiment one in that the threshold Q1 takes the value 0.1.
Embodiment nine: this embodiment differs from embodiment one in that the accuracy threshold Q2 takes the value 90%.
Examples
To verify the effectiveness of the proposed improvements and evaluate the performance of the traffic sign detection model, four groups of comparative experiments were performed: (1) the influence of using or not using K-means clustering of the initial candidate frames on model accuracy, recall, convergence speed, and other indices; (2) the effect of different activation functions on the performance of the target detection model; (3) the difference in detection effect between the model with the improved loss function and the unmodified model; (4) a comparison of the detection performance of the improved model against other mainstream models. The main evaluation indices of final detection performance are the mean Average Precision (mAP) and the detection rate in Frames Per Second (FPS). The aim is to improve detection speed as much as possible while maintaining detection accuracy.
The experimental environment is configured as follows: the CPU is an Intel Xeon E5-2620 v3 processor with 64 GB of memory, the graphics card is an Nvidia GeForce GTX TITAN X, the CUDA version is 8.0.44, the OpenCV version is 2.4.13, and the operating system is Ubuntu 16.04. The GTSDB (German Traffic Sign Detection Benchmark) data set is used; it contains 900 images of 1360 × 800 pixels, with target sizes ranging from 16 × 16 to 128 × 128 pixels. The road scene pictures cover varied conditions such as severe illumination changes, interference from similarly colored backgrounds, motion blur, and partial occlusion. The network parameters are configured as follows: momentum 0.9; decay 0.0005; max_batches 50000; learning rate 0.001; steps 30000 and 40000; scales 0.1 and 0.1. The pre-trained model darknet53.conv.74 is loaded as the network's initial parameters during training, which greatly shortens training time. Meanwhile, the angle, exposure, saturation, hue, and size of the input pictures are adjusted to enhance the robustness of the model.
Performance analysis of clustering-selected initial candidate frames: to compare the influence of clustering-selected initial candidate frames on detection performance, training was first performed with YOLOv3's original candidate-frame parameters and then with the 9 candidate-frame parameters obtained by clustering; the performance of the resulting models on the test set is shown in FIG. 4. At a detection threshold of 0.5, the model using clustered initial candidate frames is clearly better than the model using the original candidate frames in precision, recall, and FPS. Recall and precision improve by 2.88 and 3.41 percentage points respectively, the average IOU improves by 3.39 percentage points, and the model can detect 2 more pictures per second.
As shown in FIG. 5, the model using K-means-clustered candidate frames begins to converge after about 900 training iterations, while the model without clustering starts to converge at 1500 iterations; K-means clustering thus improves convergence speed by nearly 70%. The reason is that the clustered initial candidate-frame parameters are closer to the width-height characteristics of traffic sign targets, making it easier to continually approach the real target frames during optimization.
Performance analysis of different activation functions: to verify the influence of different activation functions on the traffic sign detection model, four activation functions were tested: ReLU, Softplus, Leaky-ReLU, and the improved Softplus-ReLU proposed by the invention. As shown in Table 1, the mAPs of the models using ReLU and Softplus are comparable; with Leaky-ReLU, mAP improves by 1.63 percentage points, benefiting from Leaky-ReLU's retention of output for negative inputs. The proposed activation function Softplus-ReLU achieves the highest mAP, improving on ReLU and Leaky-ReLU by 4.42 and 2.79 percentage points respectively, because it retains the advantages of both: fast convergence and strong robustness to noise.
TABLE 1 Comparison of the performance of detection models trained with different activation functions
[Table 1 was supplied as an image and is not reproduced here.]
Performance analysis of the improved loss function: to verify the effectiveness of the improved loss function for small-target detection, two traffic sign detection models were trained with all other parameters held constant; the detection results are shown in FIG. 6. The first row shows the pictures to be detected, each containing multiple small targets; the second row shows the detection results of conventional YOLOv3; the third row shows the detection results with the improved loss function of the invention. The improved model is clearly better: YOLOv3 misses all small targets smaller than 30 × 30 pixels in the three pictures, while the proposed loss function balances the losses of large and small targets so that small targets carry a larger loss weight and are learned better, and ultimately all traffic signs in the pictures are detected. The improvement to the loss function proposed by the invention is therefore effective for detecting objects such as traffic signs.
Performance comparison: the proposed model was compared with other mainstream models for detection performance, with all models trained on the same data set. The proposed traffic sign detection model has the highest detection accuracy (mAP reaches 92.88%), benefiting from the stronger classification ability brought by the improved activation function and the stronger small-target recognition brought by the improved loss function. Because the model's Darknet-53 backbone has more layers than YOLOv2's Darknet-19, its detection speed is lower than YOLOv2's but higher than the other detection models'. The final improved model runs at 35 FPS, above the 24 FPS persistence-of-vision standard for real-time detection, fully meeting real-time requirements.
Moreover, the method of the invention applies not only to traffic sign detection but also to the recognition of small targets in images generally. Other embodiments of the invention are possible, and various changes and modifications may be made by those skilled in the art without departing from its spirit and scope; all such changes and modifications are intended to fall within the scope of the appended claims.

Claims (8)

1. A method for detecting traffic signs in automatic driving based on a YOLOv3 network, characterized by comprising the following steps:
step one, constructing training set data and test set data with traffic sign target labels based on the GTSDB data set;
step two, clustering the real target frames labeled in the training set data, using the area intersection-over-union (IOU) as the rating index to obtain initial candidate target frames for the predicted traffic sign targets in the training set data, and taking the initial candidate target frames as initial network parameters of the YOLOv3 network; loading the initial network parameters of the YOLOv3 network and inputting the training set data into the YOLOv3 network for training until the loss function value output on the training set data is less than or equal to a threshold Q1 or the set maximum number of iterations N is reached, then stopping training to obtain a trained YOLOv3 network;
wherein loading the initial network parameters of the YOLOv3 network and training until the loss function value output on the training set data is less than or equal to the threshold Q1 or the set maximum number of iterations N is reached proceeds as follows:
the initial network parameters of the YOLOv3 network are loaded, the training set data are input into the YOLOv3 network for training, the weight and bias values of the YOLOv3 network's convolutional layers are continually adjusted, and the loss function value loss(object) on the training set data output is computed:

$$
\begin{aligned}
loss(object) ={}& \lambda_{coord}\sum_{i=0}^{K\times K}\sum_{j=0}^{M} I_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right]\\
&+\lambda_{coord}\sum_{i=0}^{K\times K}\sum_{j=0}^{M} I_{ij}^{obj}\,\omega_i\left[(w_i-\hat{w}_i)^2+(h_i-\hat{h}_i)^2\right]\\
&-\sum_{i=0}^{K\times K}\sum_{j=0}^{M} I_{ij}^{obj}\left[\hat{C}_i\log C_i+(1-\hat{C}_i)\log(1-C_i)\right]\\
&-\lambda_{noobj}\sum_{i=0}^{K\times K}\sum_{j=0}^{M} I_{ij}^{noobj}\left[\hat{C}_i\log C_i+(1-\hat{C}_i)\log(1-C_i)\right]\\
&-\sum_{i=0}^{K\times K} I_{i}^{obj}\sum_{c\in classes}\left[\hat{p}_i(c)\log p_i(c)+(1-\hat{p}_i(c))\log(1-p_i(c))\right]
\end{aligned}
$$

wherein: $\lambda_{coord}$ is the penalty coefficient for coordinate prediction; $\lambda_{noobj}$ is the penalty coefficient for confidence when no traffic sign target is contained; $K \times K$ is the number of grid cells into which an input image is divided; $M$ is the number of target frames predicted by each grid cell; $x_i$, $y_i$, $w_i$ and $h_i$ respectively represent the abscissa, ordinate, width and height of the center point of the predicted traffic sign, and $\hat{x}_i$, $\hat{y}_i$, $\hat{w}_i$ and $\hat{h}_i$ respectively represent the abscissa, ordinate, width and height of the center point of the real traffic sign; $\omega_i$ is the width-height loss weighting determined by the size of the real target frame; $I_{ij}^{obj}$ indicates that the $j$-th candidate target frame of the $i$-th grid cell is responsible for detecting the traffic sign target, and $I_{ij}^{noobj}$ indicates that it is not; $C_i$ and $\hat{C}_i$ respectively represent the predicted and real confidence that the $i$-th grid cell contains a traffic sign target; $p_i(c)$ and $\hat{p}_i(c)$ respectively represent the predicted and real probability that the traffic sign in the $i$-th grid cell belongs to class $c$; $c$ denotes a class and $classes$ denotes the total number of classes;
training continues until the loss function value output on the training set data is less than or equal to the threshold Q1 or the set maximum number of iterations N is reached; training then stops, and the network obtained when training stops is taken as the trained YOLOv3 network;
step three, inputting the test set data into the trained YOLOv3 network; if the detection accuracy on the test set data is greater than or equal to the accuracy threshold Q2, taking the trained YOLOv3 network as the final YOLOv3 network;
if the detection accuracy on the test set data is less than the accuracy threshold Q2, continuing to train the trained YOLOv3 network obtained in step two until the detection accuracy on the test set data is greater than or equal to Q2, and taking the YOLOv3 network at that point as the final YOLOv3 network;
and step four, inputting collected images containing traffic signs during automatic driving into the final YOLOv3 network to detect the traffic signs.
2. The method for detecting the traffic sign in automatic driving based on the YOLOv3 network as claimed in claim 1, wherein the specific process of the first step is as follows:
the GTSDB data set comprises M images in total, and after the traffic identification type targets in the M images are labeled, the labeled M images are randomly divided into a training set and a test set.
3. The method of claim 2, wherein the training set and the test set have a data volume ratio of 8: 1.
4. the method as claimed in claim 3, wherein the method for detecting the traffic sign in the automatic driving based on the YOLOv3 network is characterized in that real target frames labeled in the training set data are clustered, and an area intersection ratio IOU is used as a rating index to obtain an initial candidate target frame of the traffic sign class target predicted in the training set data, and the specific process is as follows:
clustering real target frames of the training set data by adopting a K-means algorithm, and taking the area intersection ratio IOU of the predicted candidate target frames and the real target frames as a rating index, namely taking the predicted candidate target frames as initial candidate target frames when the area intersection ratio IOU is not less than 0.5;
the area intersection-over-union IOU is expressed as follows:

$$ IOU = \frac{box_{pred} \cap box_{truth}}{box_{pred} \cup box_{truth}} $$

wherein: $box_{pred}$ represents the area of the predicted candidate target frame, and $box_{truth}$ represents the area of the real target frame;
the distance Dis (box, centroid) between all real target bounding boxes and the initial candidate target bounding box is expressed as:
Dis(box,centroid)=1-IOU(box,centroid)
wherein: dis (box, centroid) represents the distance between all real target frames and the initial candidate target frame in the training set data, and IOU (box, centroid) represents the average intersection ratio between all real target frames and the initial candidate target frame in the training set data.
5. The method according to claim 4, wherein, when the training set data are input into the YOLOv3 network for training in step two, the learning rate is set to 0.0001 and the batch_size is set to 256.
6. The method for detecting traffic signs in automatic driving based on the YOLOv3 network of claim 5, wherein the activation function adopted by the convolutional layer of the YOLOv3 network is defined as:
$$ y = \begin{cases} x, & x > 0 \\ \log\left(1 + e^{x}\right) - \log 2, & x \le 0 \end{cases} $$

wherein: x represents the input (the information from the previous layer of the YOLOv3 network), and y represents the nonlinear output; when x is positive, the convolutional layers of the YOLOv3 network use the same form as the activation function ReLU; when x is 0 or negative, Softplus shifted down by log 2 is adopted, and as x continues to decrease the activation function gradually converges to −log 2.
7. The method for detecting traffic signs in automatic driving based on the YOLOv3 network of claim 1, wherein the threshold Q1 takes the value 0.1.
8. The method for detecting traffic signs in automatic driving based on the YOLOv3 network of claim 1, wherein the accuracy threshold Q2 takes the value 90%.
CN201811354012.9A 2018-11-14 2018-11-14 Traffic sign detection method in automatic driving based on YOLOv3 network Active CN109447034B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811354012.9A CN109447034B (en) 2018-11-14 2018-11-14 Traffic sign detection method in automatic driving based on YOLOv3 network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811354012.9A CN109447034B (en) 2018-11-14 2018-11-14 Traffic sign detection method in automatic driving based on YOLOv3 network

Publications (2)

Publication Number Publication Date
CN109447034A CN109447034A (en) 2019-03-08
CN109447034B true CN109447034B (en) 2021-04-06

Family

ID=65552920

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811354012.9A Active CN109447034B (en) 2018-11-14 2018-11-14 Traffic sign detection method in automatic driving based on YOLOv3 network

Country Status (1)

Country Link
CN (1) CN109447034B (en)

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109977817B (en) * 2019-03-14 2021-04-27 南京邮电大学 Motor train unit bottom plate bolt fault detection method based on deep learning
CN109948617A (en) * 2019-03-29 2019-06-28 南京邮电大学 A kind of invoice image position method
CN110070087A (en) * 2019-05-05 2019-07-30 广东三维家信息科技有限公司 Image identification method and device
CN110070074B (en) * 2019-05-07 2022-06-14 安徽工业大学 Method for constructing pedestrian detection model
CN110245582A (en) * 2019-05-25 2019-09-17 天津大学 A method of based on classification component single in deep learning for identification bitmap
CN110287747A (en) * 2019-07-01 2019-09-27 深圳江行联加智能科技有限公司 A kind of bar code detection method based on end-to-end depth network
CN110490099B (en) * 2019-07-31 2022-10-21 武汉大学 Subway public place pedestrian flow analysis method based on machine vision
CN110443208A (en) * 2019-08-08 2019-11-12 南京工业大学 A kind of vehicle target detection method, system and equipment based on YOLOv2
CN110598620B (en) * 2019-09-06 2022-05-06 腾讯科技(深圳)有限公司 Deep neural network model-based recommendation method and device
CN110796186A (en) * 2019-10-22 2020-02-14 华中科技大学无锡研究院 Dry and wet garbage identification and classification method based on improved YOLOv3 network
CN110929577A (en) * 2019-10-23 2020-03-27 桂林电子科技大学 Improved target identification method based on YOLOv3 lightweight framework
CN110838112A (en) * 2019-11-08 2020-02-25 上海电机学院 Insulator defect detection method based on Hough transform and YOLOv3 network
CN111104965A (en) * 2019-11-25 2020-05-05 河北科技大学 Vehicle target identification method and device
CN111444821B (en) * 2020-03-24 2022-03-25 西北工业大学 Automatic identification method for urban road signs
CN111553199A (en) * 2020-04-07 2020-08-18 厦门大学 Motor vehicle traffic violation automatic detection technology based on computer vision
CN111695638A (en) * 2020-06-16 2020-09-22 兰州理工大学 Improved YOLOv3 candidate box weighted fusion selection strategy
CN111709381A (en) * 2020-06-19 2020-09-25 桂林电子科技大学 Road environment target detection method based on YOLOv3-SPP
CN111899227A (en) * 2020-07-06 2020-11-06 北京交通大学 Automatic railway fastener defect acquisition and identification method based on unmanned aerial vehicle operation
CN112654999B (en) * 2020-07-21 2022-01-28 华为技术有限公司 Method and device for determining labeling information
CN112085620A (en) * 2020-08-25 2020-12-15 广西电网有限责任公司电力科学研究院 Safety supervision method and system serving power production operation scene
CN112052817B (en) * 2020-09-15 2023-09-05 中国人民解放军海军大连舰艇学院 Improved YOLOv3 model side-scan sonar sunken ship target automatic identification method based on transfer learning
CN112132032A (en) * 2020-09-23 2020-12-25 平安国际智慧城市科技股份有限公司 Traffic sign detection method and device, electronic equipment and storage medium
CN112464705A (en) * 2020-10-13 2021-03-09 泰安市泰山森林病虫害防治检疫站 Method and system for detecting pine wood nematode disease tree based on YOLOv3-CIoU
CN112434583B (en) * 2020-11-14 2023-04-07 武汉中海庭数据技术有限公司 Lane transverse deceleration marking line detection method and system, electronic equipment and storage medium
CN112560933A (en) * 2020-12-10 2021-03-26 中邮信息科技(北京)有限公司 Model training method and device, electronic equipment and medium
CN113780111B (en) * 2021-08-25 2023-11-24 哈尔滨工程大学 Pipeline connector defect accurate identification method based on optimized YOLOv3 algorithm
CN114241717A (en) * 2021-12-17 2022-03-25 广州西麦科技股份有限公司 Electric shock prevention safety early warning method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107563392A (en) * 2017-09-07 2018-01-09 西安电子科技大学 The YOLO object detection methods accelerated using OpenCL
CN108537117A (en) * 2018-03-06 2018-09-14 哈尔滨思派科技有限公司 A kind of occupant detection method and system based on deep learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107169421B (en) * 2017-04-20 2020-04-28 华南理工大学 Automobile driving scene target detection method based on deep convolutional neural network
CN108629288B (en) * 2018-04-09 2020-05-19 华中科技大学 Gesture recognition model training method, gesture recognition method and system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107563392A (en) * 2017-09-07 2018-01-09 西安电子科技大学 The YOLO object detection methods accelerated using OpenCL
CN108537117A (en) * 2018-03-06 2018-09-14 哈尔滨思派科技有限公司 A kind of occupant detection method and system based on deep learning

Also Published As

Publication number Publication date
CN109447034A (en) 2019-03-08

Similar Documents

Publication Publication Date Title
CN109447034B (en) Traffic sign detection method in automatic driving based on YOLOv3 network
Wang et al. Data-driven based tiny-YOLOv3 method for front vehicle detection inducing SPP-net
CN110619369B (en) Fine-grained image classification method based on feature pyramid and global average pooling
CN107563372B (en) License plate positioning method based on deep learning SSD frame
Tian et al. Traffic sign detection using a multi-scale recurrent attention network
US10410096B2 (en) Context-based priors for object detection in images
CN112766188B (en) Small target pedestrian detection method based on improved YOLO algorithm
CN111553201B (en) Traffic light detection method based on YOLOv3 optimization algorithm
CN110263786B (en) Road multi-target identification system and method based on feature dimension fusion
CN112464911A (en) Improved YOLOv 3-tiny-based traffic sign detection and identification method
CN111723829B (en) Full-convolution target detection method based on attention mask fusion
Yang et al. Real-time pedestrian and vehicle detection for autonomous driving
CN113420607A (en) Multi-scale target detection and identification method for unmanned aerial vehicle
CN110008899B (en) Method for extracting and classifying candidate targets of visible light remote sensing image
Yao et al. Traffic sign detection algorithm based on improved YOLOv4-Tiny
Yang et al. Instance segmentation and classification method for plant leaf images based on ISC-MRCNN and APS-DCCNN
CN112733942A (en) Variable-scale target detection method based on multi-stage feature adaptive fusion
CN114049572A (en) Detection method for identifying small target
CN113159215A (en) Small target detection and identification method based on fast Rcnn
Wei et al. Pedestrian detection in underground mines via parallel feature transfer network
CN112560717A (en) Deep learning-based lane line detection method
Zhao et al. Real-time moving pedestrian detection using contour features
Zhao et al. Vehicle detection based on improved yolov3 algorithm
Zhang et al. Traffic Sign Detection and Recognition Based on Deep Learning.
Wang et al. CDFF: a fast and highly accurate method for recognizing traffic signs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant