CN111985476A

CN111985476A - Traffic sign target detection method based on SSD algorithm

Info

Publication number: CN111985476A
Application number: CN202010877065.XA
Authority: CN
Inventors: 陈炳才; 刘顺民; 马致明
Original assignee: Xinjiang Normal University
Current assignee: Xinjiang Normal University
Priority date: 2020-08-27
Filing date: 2020-08-27
Publication date: 2020-11-24

Abstract

Existing target detection algorithms have low recognition accuracy on traffic signs, weak generalization capability and difficulty detecting small targets, so they cannot really be applied in practice. To address these problems, the invention provides a traffic sign target detection method based on the SSD algorithm, in which a feature pyramid replaces the multi-scale feature layers of SSD for target detection. First, the picture data are preprocessed. Second, the processed data are divided into a training set and a test set. Third, the algorithm is improved. Fourth, the data are used to train the model. Finally, the trained model is tested. Compared with the original SSD algorithm, the average precision is improved by 5.4 percent, which ensures the precision of traffic sign detection and improves the detection performance of the algorithm.

Description

Traffic sign target detection method based on SSD algorithm

Technical Field

The invention belongs to the technical field of target detection and driverless (autonomous) driving, and particularly relates to a traffic sign target detection method based on the SSD algorithm.

Background Art

With the rapid development of driverless technology, autonomous driving is gradually entering the commercial stage. Traffic signs play an important role in the safe driving of vehicles, so the requirements on traffic sign detection performance keep rising. Traditional methods mainly use histograms of oriented gradients (HOG), color histograms and edge features to construct different feature spaces and extract the color and shape features of the traffic signs in a picture. However, these methods have low detection accuracy, poor robustness and weak generalization capability, struggle to detect small targets, and can hardly meet the requirements of practical production and applications.

With the construction of large-scale training data sets, the continuous growth of hardware computing power and the continuous development of deep learning, deep network structures have achieved great success in various visual tasks. Detection algorithms have developed rapidly, producing a series of fast and accurate deep convolutional network detectors such as R-CNN, Fast R-CNN, Faster R-CNN, the YOLO series and SSD. Unlike traditional algorithms, deep convolutional networks have a degree of invariance to geometric change, deformation and illumination, which overcomes the problems of the traditional methods and gives them some generalization capability. These mainstream algorithms fall into two classes: two-stage detection algorithms and single-stage detection algorithms. A two-stage detection algorithm first generates candidate regions that may contain an object, then further classifies and refines these candidates to obtain the final detection result; examples are R-CNN, Fast R-CNN and Faster R-CNN. A single-stage detection algorithm, such as the YOLO series and SSD, has no explicit candidate-region step and directly classifies and localizes objects to give the final result. Compared with two-stage algorithms, single-stage algorithms are fast, are less prone to false positives caused by background errors, and can quickly learn generalized features of objects.

Disclosure of Invention

The invention aims to solve the problems described in the background art by providing a traffic sign target detection method based on the SSD algorithm, so that traffic signs that are far from the vehicle, and therefore small in the image, can be detected well and the detection capability is improved.

A traffic sign target detection method based on the SSD algorithm comprises the following specific steps.

First, the traffic sign data set is preprocessed before training.

Second, the processed data set is divided into a training set and a test set.

Third, the SSD algorithm is improved: SSD detects targets on feature maps of multiple scales; on this basis, a feature pyramid replaces the multi-scale feature layers for detection, classification and box regression give the target positions, and the final result is obtained through non-maximum suppression.

Fourth, the model is trained with the improved algorithm.

Fifth, traffic signs are detected with the trained model.
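The final stage of the third step, non-maximum suppression, can be illustrated with a minimal greedy sketch. It is class-agnostic for brevity (SSD applies it per class), boxes are assumed to be in `(x1, y1, x2, y2)` corner format, and the 0.45 overlap threshold is taken from the original SSD paper because the patent does not state a value.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_thresh: float = 0.45) -> List[int]:
    """Greedy NMS: returns indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard every remaining box that overlaps the kept box too strongly.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep
```

For example, `nms([(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)], [0.9, 0.8, 0.7])` keeps boxes 0 and 2: the second box overlaps the first with an IoU of 81/119 (about 0.68) and is suppressed.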

Further, the data labels stored in txt files in the original data set are converted into the PASCAL VOC format.
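The label conversion can be sketched with Python's standard `xml.etree.ElementTree`. The patent does not describe the layout of the original txt files, so the one-object-per-line format `label x1 y1 x2 y2` and the helper names `parse_txt_labels` and `to_voc_xml` are hypothetical; only the element names (`annotation`, `size`, `object`, `bndbox`, `xmin` and so on) follow the PASCAL VOC convention.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# HYPOTHETICAL txt layout: one object per line, "label x1 y1 x2 y2".
Obj = Tuple[str, int, int, int, int]


def parse_txt_labels(text: str) -> List[Obj]:
    """Parse the assumed txt label layout into (label, x1, y1, x2, y2) tuples."""
    objs: List[Obj] = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        name, x1, y1, x2, y2 = parts
        objs.append((name, int(x1), int(y1), int(x2), int(y2)))
    return objs


def to_voc_xml(filename: str, width: int, height: int, objs: List[Obj]) -> str:
    """Build a minimal PASCAL VOC style <annotation> document as a string."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    for tag, val in (("width", width), ("height", height), ("depth", 3)):
        ET.SubElement(size, tag).text = str(val)
    for name, x1, y1, x2, y2 in objs:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        ET.SubElement(obj, "difficult").text = "0"
        box = ET.SubElement(obj, "bndbox")
        for tag, val in (("xmin", x1), ("ymin", y1), ("xmax", x2), ("ymax", y2)):
            ET.SubElement(box, tag).text = str(val)
    return ET.tostring(root, encoding="unicode")
```

Each returned string would typically be written to one `.xml` file per picture in the VOC `Annotations` directory.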

Further, the 2500 pictures in the data set are divided into a training set and a test set at a ratio of 7:3, giving 1750 training pictures and 750 test pictures.
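A reproducible 7:3 split can be sketched as follows. The file names and the fixed seed are illustrative assumptions; the patent states only the ratio and the resulting counts.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(items: Sequence[str], train_ratio: float = 0.7,
                  seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle reproducibly, then cut into (train, test) at the given ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_ratio)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical file names standing in for the 2500 pictures.
images = [f"{i:05d}.jpg" for i in range(2500)]
train_set, test_set = split_dataset(images)  # 1750 and 750 pictures
```

Using a dedicated `random.Random(seed)` instance keeps the split identical across runs without touching the global random state.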

Further, the 1x1 feature map generated by Conv11_2 in the SSD algorithm is upsampled by bilinear interpolation to a 3x3 feature map, which is then convolved with a 3x3 kernel with padding and stride both equal to 1, so its spatial size stays 3x3; the convolved 3x3 feature map is fused with Conv10_2 to serve as the 3x3 feature map of the prediction layer. Next, this 3x3 feature map is again upsampled by bilinear interpolation to a 5x5 feature map, convolved with the same parameters, and the resulting 5x5 feature map is fused with Conv9_2 to obtain the 5x5 feature map of the prediction layer. The 10x10, 19x19 and 38x38 feature maps of the SSD prediction layer are generated in turn by the same upsampling and feature fusion procedure. In this way six feature maps are obtained, and they are used to construct a feature pyramid for target detection.
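Two details of this construction can be checked in a small pure-Python sketch: a 3x3 convolution with padding 1 and stride 1 leaves the spatial size unchanged, so each convolved map still matches its lateral layer, and bilinear interpolation resizes a map to the next pyramid size. The half-pixel sampling convention below (the one PyTorch calls `align_corners=False`) is an assumption; the patent does not say which convention it uses.

```python
import math
from typing import List

Grid = List[List[float]]


def conv_out_size(n: int, kernel: int = 3, padding: int = 1, stride: int = 1) -> int:
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


def _source_coord(dst: int, in_size: int, out_size: int) -> float:
    # ASSUMPTION: half-pixel mapping (align_corners=False style), clamped at 0.
    return max(0.0, (dst + 0.5) * in_size / out_size - 0.5)


def bilinear_resize(grid: Grid, out_h: int, out_w: int) -> Grid:
    """Resize a single-channel map with bilinear interpolation."""
    in_h, in_w = len(grid), len(grid[0])
    out: Grid = []
    for i in range(out_h):
        y = _source_coord(i, in_h, out_h)
        y0 = int(math.floor(y))
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = _source_coord(j, in_w, out_w)
            x0 = int(math.floor(x))
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bottom = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out
```

With these helpers, `conv_out_size(n)` returns `n` for every pyramid size (1, 3, 5, 10, 19, 38), and `bilinear_resize([[0.0, 4.0]], 1, 4)` yields `[[0.0, 1.0, 3.0, 4.0]]`. Upsampling the 1x1 Conv11_2 map simply replicates its value over the 3x3 grid.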

Further, during model training the ground truth must be matched with the predictions. A predicted box is taken as a positive sample when its intersection over union (IoU) with a ground-truth box is greater than 0.5; this threshold yields more positives and keeps positive and negative samples from becoming extremely unbalanced. However, because there are few ground-truth boxes, this alone does not solve the imbalance. For this problem the SSD algorithm uses hard negative mining and trains with negative and positive samples kept at a ratio of 3:1, which yields a better trained model.
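The matching rule and the 3:1 hard negative mining can be sketched in plain Python. In SSD the boxes being matched are the default (prior) boxes; the extra step that gives every ground-truth box its single best-overlapping prior comes from the original SSD matching strategy rather than from the text above, the per-box negative losses are assumed to be supplied by the caller (SSD ranks negatives by confidence loss), and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed corner format


def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_boxes(priors: Sequence[Box], gts: Sequence[Box], thresh: float = 0.5) -> List[int]:
    """For each prior box, the matched ground-truth index, or -1 for background."""
    matched = [-1] * len(priors)
    for pi, p in enumerate(priors):
        best_g, best_iou = -1, 0.0
        for gi, g in enumerate(gts):
            overlap = iou(p, g)
            if overlap > best_iou:
                best_g, best_iou = gi, overlap
        if best_iou > thresh:  # the IoU > 0.5 rule from the text
            matched[pi] = best_g
    # From the SSD paper: each ground truth also keeps its single best prior.
    if priors:
        for gi, g in enumerate(gts):
            best_p = max(range(len(priors)), key=lambda k: iou(priors[k], g))
            matched[best_p] = gi
    return matched


def hard_negative_mining(matched: Sequence[int], neg_losses: Sequence[float],
                         ratio: int = 3) -> List[int]:
    """Indices of the highest-loss negatives, at most `ratio` per positive."""
    num_pos = sum(1 for m in matched if m >= 0)
    negatives = sorted((i for i, m in enumerate(matched) if m < 0),
                       key=lambda i: neg_losses[i], reverse=True)
    return negatives[: ratio * num_pos]
```

Only the selected negatives and all positives contribute to the classification loss, which is how the 3:1 ratio is enforced.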

Furthermore, the information in each layer is fully utilized: low-level features are good at localizing targets, while high-level features give good classification results. Fusing the low-level and high-level features combines the localization and classification information of the targets, and predicting on the different fused feature layers improves the recognition accuracy.

The invention has a stronger detection capability for small targets and a better overall detection effect, which makes it better suited to traffic sign detection.

Drawings

Fig. 1 is a flow chart of the steps of the present invention.

Fig. 2 is a diagram of the algorithm structure of the present invention.

Fig. 3 compares the traffic sign detection results of the present invention with those of the original SSD algorithm.

Detailed Description

Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

Fig. 1 shows the flow chart of the traffic sign target detection method based on the SSD algorithm according to the present invention, which specifically comprises the following steps:

Step 1, preprocess the traffic sign data set before training;

Step 2, divide the processed data set into a training set and a test set;

Step 3, improve the SSD algorithm: SSD detects targets on feature maps of multiple scales; on this basis, a feature pyramid replaces the multi-scale feature layers for detection, classification and box regression give the target positions, and the final result is obtained through non-maximum suppression;

Step 4, train the model with the improved algorithm;

Step 5, detect traffic signs with the trained model.

The data set in step 1 is the traffic panel data set (TPD), selected from a Chinese traffic sign data set. Its label information is stored in txt files, and the labels need to be converted into the PASCAL VOC format. The data set is divided at a ratio of 7:3, with 1750 pictures used as the training set and 750 as the test set.

Fig. 2 is a structural diagram of the algorithm of the present invention, that is, of the improvement made in step 3. The improvement comprises the following steps.

Step 3.1, first the picture is input into the network.

Step 3.2, the 1x1 feature map generated by Conv11_2 is used as the 1x1 feature map in the prediction layer.

Step 3.3, the 1x1 feature map generated by Conv11_2 is upsampled by bilinear interpolation to a 3x3 feature map, which is convolved with a 3x3 kernel with padding and stride both equal to 1; the convolved 3x3 feature map is fused with Conv10_2 to obtain the 3x3 feature map of the prediction layer.

Step 3.4, the 3x3 feature map from the previous step is again upsampled by bilinear interpolation to a 5x5 feature map, convolved with the same parameters as in the previous step, and the resulting 5x5 feature map is fused with Conv9_2 to obtain the 5x5 feature map of the prediction layer.

Step 3.5, the 5x5 feature map from the previous step is again upsampled by bilinear interpolation to a 10x10 feature map, convolved with the same parameters as in the previous step, and the resulting 10x10 feature map is fused with Conv8_2 to obtain the 10x10 feature map of the prediction layer.

Step 3.6, the 10x10 feature map from the previous step is again upsampled by bilinear interpolation to a 19x19 feature map, convolved with the same parameters as in the previous step, and the resulting 19x19 feature map is fused with Conv7 to obtain the 19x19 feature map of the prediction layer.

Step 3.7, the 19x19 feature map from the previous step is again upsampled by bilinear interpolation to a 38x38 feature map, convolved with the same parameters as in the previous step, and the resulting 38x38 feature map is fused with Conv4_3 to obtain the 38x38 feature map of the prediction layer. Six feature maps are thus obtained, and they are used to construct a feature pyramid for target detection.
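Steps 3.2 to 3.7 can be sketched as a single top-down loop over single-channel maps in pure Python. Several details are assumptions rather than statements of the patent: the fusion operation is taken to be element-wise addition, maps are square with one channel, the 3x3 kernel is supplied by the caller, and the 19x19 and 38x38 lateral layers are named Conv7 and Conv4_3 as in the standard SSD300 network.

```python
import math
from typing import Dict, List, Tuple

Grid = List[List[float]]

# Lateral SSD300 layers, coarsest first, with their spatial sizes.
# ASSUMPTION: 19x19 -> Conv7 and 38x38 -> Conv4_3 (standard SSD300 naming).
LATERALS = [("Conv11_2", 1), ("Conv10_2", 3), ("Conv9_2", 5),
            ("Conv8_2", 10), ("Conv7", 19), ("Conv4_3", 38)]


def _taps(dst: int, in_size: int, out_size: int) -> Tuple[int, int, float]:
    # Half-pixel source coordinate (align_corners=False style), clamped at 0.
    src = max(0.0, (dst + 0.5) * in_size / out_size - 0.5)
    lo = int(math.floor(src))
    return lo, min(lo + 1, in_size - 1), src - lo


def upsample_bilinear(g: Grid, size: int) -> Grid:
    """Bilinearly resize a square map to size x size."""
    n = len(g)
    out: Grid = []
    for i in range(size):
        y0, y1, wy = _taps(i, n, size)
        row = []
        for j in range(size):
            x0, x1, wx = _taps(j, n, size)
            top = g[y0][x0] * (1 - wx) + g[y0][x1] * wx
            bottom = g[y1][x0] * (1 - wx) + g[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


def conv3x3_same(g: Grid, k: Grid) -> Grid:
    """Single-channel 3x3 convolution (cross-correlation), padding 1, stride 1."""
    n = len(g)

    def px(i: int, j: int) -> float:
        return g[i][j] if 0 <= i < n and 0 <= j < n else 0.0  # zero padding

    return [[sum(k[a][b] * px(i + a - 1, j + b - 1) for a in range(3) for b in range(3))
             for j in range(n)] for i in range(n)]


def build_pyramid(feats: Dict[str, Grid], kernel: Grid) -> List[Grid]:
    """Top-down pass: upsample, 3x3 conv, then fuse (assumed: add) with the lateral."""
    levels = [feats[LATERALS[0][0]]]  # step 3.2: the 1x1 Conv11_2 map is used as-is
    for name, size in LATERALS[1:]:
        top_down = conv3x3_same(upsample_bilinear(levels[-1], size), kernel)
        lateral = feats[name]
        levels.append([[a + b for a, b in zip(ra, rb)]
                       for ra, rb in zip(top_down, lateral)])
    return levels  # six maps: 1x1, 3x3, 5x5, 10x10, 19x19, 38x38
```

In a real network each level has many channels and the fused maps feed the classification and box-regression heads; the sketch only traces how information flows from the 1x1 map down to the 38x38 map.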

In step 4, the weights of the convolutional layers of the feature extraction network are initialized from a VGG16 convolutional neural network pre-trained on ImageNet classification. During training, the ground truth is matched with the predictions: a predicted box is a positive sample when its intersection over union with a ground-truth box is greater than 0.5, which keeps positive and negative samples from becoming extremely unbalanced. Because there are few ground-truth boxes, however, this alone does not solve the imbalance, so the SSD algorithm uses hard negative mining and trains with negative and positive samples at a ratio of 3:1. The training parameters are: weight_decay 0.0005, learning_rate 0.0001, momentum 0.9 and batch_size 4, for 50,000 training iterations. The improved SSD algorithm was implemented and evaluated on the TPD data set on a Windows 10 64-bit operating system with the PyTorch framework.
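The listed hyperparameters suggest stochastic gradient descent with momentum, although the optimizer is not named. The sketch below shows one such update in the form used by PyTorch's `torch.optim.SGD` with dampening 0 and no Nesterov momentum, together with the number of epochs implied by 50,000 iterations at batch size 4 over the 1750 training pictures; the helper names are hypothetical.

```python
from typing import List

# Hyperparameters as listed in the text.
CONFIG = {"weight_decay": 0.0005, "learning_rate": 0.0001, "momentum": 0.9,
          "batch_size": 4, "iterations": 50_000}


def sgd_momentum_step(w: List[float], grad: List[float], buf: List[float],
                      lr: float, momentum: float, weight_decay: float) -> None:
    """One in-place SGD update with L2 weight decay and classical momentum."""
    for i in range(len(w)):
        g = grad[i] + weight_decay * w[i]  # fold the L2 penalty into the gradient
        buf[i] = momentum * buf[i] + g     # update the velocity buffer
        w[i] -= lr * buf[i]                # apply the parameter update


def approx_epochs(iterations: int, batch_size: int, train_size: int) -> float:
    """Passes over the training set implied by an iteration budget."""
    return iterations * batch_size / train_size
```

With the values above, `approx_epochs(50_000, 4, 1750)` is 200,000 / 1750, roughly 114 passes over the training set.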

The final experimental results of the invention show an average precision 5.4% higher than that of the original SSD algorithm. In Fig. 3, (a) is the detection result of the original SSD algorithm on a picture and (b) is the detection result of the invention. The invention has a stronger detection capability and higher detection accuracy for traffic signs that are far from the vehicle and therefore small, whereas the original SSD algorithm has difficulty detecting such small targets and is less accurate than the invention.
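The patent reports an average precision gain without stating how AP is computed. As an illustration only, the sketch below computes per-class AP with the all-point interpolation used by PASCAL VOC from 2010 on. It assumes each detection has already been labeled a true or false positive (for example, by an IoU of at least 0.5 with a not-yet-matched ground truth); mAP would then be the mean of this value over classes.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], is_tp: Sequence[bool], num_gt: int) -> float:
    """Per-class AP with all-point interpolation (PASCAL VOC 2010+ style)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    for i in order:  # walk detections from most to least confident
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    # Precision envelope: make precision non-increasing as recall grows.
    for k in range(len(mpre) - 2, -1, -1):
        mpre[k] = max(mpre[k], mpre[k + 1])
    # Sum the rectangle areas wherever recall increases.
    return sum((mrec[k] - mrec[k - 1]) * mpre[k] for k in range(1, len(mrec)))
```

For three detections scored 0.9, 0.8 and 0.7, labeled true, false and true against two ground-truth signs, the result is 0.5 × 1 + 0.5 × 2/3 = 5/6.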

Claims

1. A traffic sign target detection method based on an SSD algorithm is characterized by comprising the following steps:

firstly, preprocessing the traffic sign data set before training;

secondly, dividing the processed data set into a training set and a test set;

thirdly, improving the SSD algorithm: SSD detects targets on feature maps of multiple scales; on this basis, a feature pyramid replaces the multi-scale feature layers for detection, classification and box regression give the target positions, and the final result is obtained through non-maximum suppression;

fourthly, training a model with the improved algorithm;

and fifthly, detecting traffic signs with the trained model.

2. The method of claim 1, wherein the data labels stored in the txt files of the original data set are converted into the PASCAL VOC format.

3. The traffic sign target detection method based on the SSD algorithm, characterized in that the 2500 pictures in the data set are divided into a training set and a test set at a ratio of 7:3, wherein the training set has 1750 pictures and the test set has 750 pictures.

4. The method as claimed in claim 1, wherein the traffic sign target detection method based on the SSD algorithm comprises: upsampling the 1x1 feature map generated by Conv11_2 in the SSD algorithm by bilinear interpolation to generate a 3x3 feature map; convolving this feature map with a 3x3 kernel with padding and stride both equal to 1; fusing the convolved 3x3 feature map with Conv10_2 to obtain the 3x3 feature map of the prediction layer; upsampling the 3x3 feature map from the previous step again by bilinear interpolation to generate a 5x5 feature map, convolving it with the same parameters as in the previous step, and fusing the output 5x5 feature map with Conv9_2 to obtain the 5x5 feature map of the prediction layer; and generating in turn, by the same upsampling and feature fusion, the 10x10, 19x19 and 38x38 feature maps of the SSD prediction layer, thereby obtaining six feature maps, which are used to construct a feature pyramid for target detection.

5. The traffic sign target detection method based on the SSD algorithm, characterized in that during model training the ground truth is matched with the predictions, and a predicted box whose intersection over union with a ground-truth box is greater than 0.5 is a positive sample, which keeps positive and negative samples from becoming extremely unbalanced; because there are too few ground-truth boxes for this to solve the problem well, the SSD algorithm adopts hard negative mining and trains with negative and positive samples kept at a ratio of 3:1, which yields a better trained model.

6. The traffic sign target detection method based on the SSD algorithm, characterized in that the information in each layer is fully utilized: low-level features are good at localizing targets and high-level features give good classification results, so fusing the low-level and high-level features combines the localization and classification information of the targets, and prediction is performed on the different feature layers, improving the recognition accuracy.