CN107194343B - Traffic light detection method based on position-sensitive convolution and Fire models - Google Patents

Traffic light detection method based on position-sensitive convolution and Fire models

Info

Publication number
CN107194343B
CN107194343B (application number CN201710342500.7A)
Authority
CN
China
Prior art keywords
traffic lights
network
firenet
relevant
convolutional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710342500.7A
Other languages
Chinese (zh)
Other versions
CN107194343A (en
Inventor
王琦
李学龙
孟照铁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University
Priority to CN201710342500.7A priority Critical patent/CN107194343B/en
Publication of CN107194343A publication Critical patent/CN107194343A/en
Application granted
Publication of CN107194343B publication Critical patent/CN107194343B/en
Legal status: Active

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/584Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of vehicle lights or traffic lights
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The present invention provides a traffic light detection method based on position-sensitive convolution and Fire models. The method serially connects multiple Fire modules into a backbone convolutional network, FireNet, adds a position-sensitive convolutional layer after FireNet's last feature map, trains the network, and uses the trained network to detect traffic lights. Because a Fire module has few parameters and low memory requirements, the network is well suited to running on embedded devices; as the backbone, FireNet has strong feature-representation ability and can fully mine the information of traffic lights in different scenes, obtaining more accurate feature representations; the position-sensitive convolutional layer added after FireNet's last feature map improves localization accuracy, so that small traffic lights of many classes are detected effectively.

Description

Traffic light detection method based on position-sensitive convolution and Fire models
Technical field
The invention belongs to the technical field of computer vision and object detection, and in particular relates to a traffic light detection method based on position-sensitive convolution and Fire models.
Background art
Accurately and quickly determining the position and state of the traffic lights ahead while a vehicle is moving is an important capability in advanced driver assistance and autonomous driving. However, the complexity of real traffic scenes, including rapidly changing illumination, weather, occlusion, and differences in imaging-device resolution, makes traffic light detection in real scenes relatively difficult. With the rise of deep learning, impressive results have been achieved in many computer-vision fields such as object detection, image recognition, and image segmentation, and applying deep learning to traffic light detection can better handle the problems above. Existing traffic light detection methods fall into three categories:
The first category is based on image processing. The image is processed by operations such as thresholding and morphological transformation to obtain regions of interest; these regions are then screened layer by layer using specific prior knowledge, such as region connectivity, aspect ratio, shape, and relative position, until only the regions containing traffic lights remain; the light color is then determined by color thresholds or by using a special color space. R. de Charette et al. propose a traffic light detection method in "R. de Charette and F. Nashashibi, Real time visual traffic lights recognition based on Spot Light Detection and adaptive traffic lights templates, IEEE Intelligent Vehicles Symposium, pp. 358-363, 2009": candidate regions are obtained by morphological transformation and thresholding, screened by aspect ratio, and the light state is finally obtained by template matching. Its shortcoming is that it cannot adapt to changing scenes; the thresholds are overly sensitive and not robust enough.
The second category is based on map localization. Traffic-light positions measured by precise GPS and annotated manually provide accurate priors; when the vehicle approaches a light, candidate regions are obtained by geometric transformation and then classified. V. John et al., in "V. John, K. Yoneda, Z. Liu, and S. Mita, Saliency Map Generation by the Convolutional Neural Network for Real-Time Traffic Light Detection Using Template Matching, IEEE Trans. Computational Imaging, vol. 1, no. 3, pp. 159-173, Sept. 2015", propose generating an offline saliency map from GPS; when approaching a light, the in-vehicle camera parameters and triangulation give the region where the light appears, and a convolutional neural network with template matching then identifies the light class. Its shortcoming is excessive reliance on sensor equipment, which makes the cost too high for the same performance.
The third category is based on machine learning. Shi et al., in "Z. Shi, Z. Zhou, and C. Zhang, Real-Time Traffic Light Detection With Adaptive Background Suppression Filter, IEEE Trans. Intelligent Transportation Systems, vol. 17, no. 3, pp. 690-700, Oct. 2015", propose learning from the training samples to filter the background adaptively, obtaining target regions of interest that are then classified. Machine-learning methods effectively avoid manually setting many thresholds, and the learned models have stronger generalization ability. Deep learning, as a branch of machine learning, has more powerful learning ability than traditional machine-learning models and is currently becoming the mainstream approach in object detection.
Summary of the invention
To improve detection accuracy while detecting multiple types of traffic lights, the present invention proposes a traffic light detection method based on position-sensitive convolution and Fire models. The main idea is to serially connect multiple Fire modules into a backbone convolutional network, FireNet, and to add a position-sensitive convolutional layer after FireNet's last feature map, improving localization accuracy while reducing the number of network parameters. The network can effectively detect small traffic lights of many classes and cope with complex scenes such as varying illumination and weather, achieving good detection results.
A traffic light detection method based on position-sensitive convolution and Fire models, characterized by the following steps:
Step 1: Collect traffic-light samples under various conditions in real traffic scenes and annotate each captured frame, marking the bounding box and specific class of each traffic light. Split the annotated images into a training set and a test set for training and validating the network model, with more training images than test images. The classes include, for both horizontal and vertical lights: green, red, green straight-ahead, green left-turn, red straight-ahead, and red left-turn.
Step 2: Serially connect N Fire modules into a backbone convolutional network FireNet for feature extraction, where each Fire module is composed of M1 convolutional layers with 1 × 1 kernels and M2 convolutional layers with 3 × 3 kernels, with N > 5, M1 > 2, M2 > 2.
Step 3: Use the training set from Step 1 to pre-train FireNet for classification, obtaining initialization weights for the network.
Step 4: Add a lightweight anchor-based detection layer after the last feature map of FireNet; add a position-sensitive convolutional layer after the same feature map; then add a pooling layer combining the anchor-based detection layer and the position-sensitive convolutional layer, yielding the final network.
Step 5: Initialize the final network from Step 4 with the weights from Step 3, train it on the training set from Step 1, and detect traffic lights in test images with the trained network, obtaining their positions and specific classes.
The beneficial effects of the present invention are as follows. Using serially connected Fire modules as the backbone convolutional network FireNet requires fewer parameters and less running memory than a general convolutional network, making it better suited to running on embedded devices. As the backbone, FireNet has strong feature-representation ability and can fully mine the information of traffic lights in different scenes, obtaining more accurate feature representations. The position-sensitive convolutional layer added after FireNet's last feature map improves localization accuracy and effectively detects small traffic lights; it also separates coupled convolutional features to obtain spatial information for the different parts of a traffic light, enabling more accurate position regression. Because the network is fully convolutional, the accumulation of processing errors is reduced, further improving detection precision, while classification across different scales and classes is handled uniformly.
Brief description of the drawings
Fig. 1 is a flowchart of the traffic light detection method based on position-sensitive convolution and Fire models of the present invention
Fig. 2 is a schematic diagram of the Fire module used in the method of the present invention
Fig. 3 is a schematic diagram of the position-sensitive convolutional layer used in the method of the present invention
Fig. 4 shows detection results obtained with the method of the present invention
Specific embodiment
The present invention is further described below with reference to the drawings and embodiments; the present invention includes but is not limited to the following embodiment.
As shown in Fig. 1, the traffic light detection method based on position-sensitive convolution and Fire models of the present invention is implemented as follows:
1. Preparing the dataset
Images are annotated to generate the datasets needed for training and testing. Specifically, traffic-light images under various conditions of weather, location, and time are collected during real driving with equipment such as an in-vehicle camera; the captured images are annotated frame by frame, marking the bounding box and specific class of each traffic light; the annotated images are split into a training set and a test set for training and validating the network model, with more training images than test images. The classes include green, red, green straight-ahead, green left-turn, red straight-ahead, and red left-turn lights, in both horizontal and vertical orientations.
2. Designing the Fire module and combining modules into FireNet
A Fire module is composed of several convolutional layers with 1 × 1 and 3 × 3 kernels; in this embodiment, it consists of 3 layers with 1 × 1 kernels and 3 layers with 3 × 3 kernels, as shown in Fig. 2. Using convolutional layers with 1 × 1 kernels greatly reduces the number of parameters and yields a smaller model, while a 3 × 3 kernel is the smallest kernel that can learn local image structure. According to the specific requirements on model size, several Fire modules are cascaded to form the backbone convolutional network FireNet; in this embodiment, 9 Fire modules are connected in series.
3. Pre-training the FireNet model
The training set from Step 1 is used to pre-train the backbone network FireNet for classification. In this embodiment, pre-training uses stochastic gradient descent to optimize the classification loss function L_cls = -log p_u, with p_u = e^(x_u) / Σ_{i=1..N} e^(x_i), where u is the true class label, N is the number of classes, e is the natural constant, x_i is the network's score for class i, and p_u is the softmax probability of the true class. Minimizing this loss yields the network's initialization weights.
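The pre-training objective above can be written out as a short sketch; the class scores used in the example are hypothetical:

```python
import math

def softmax(scores):
    """Numerically stable softmax over raw class scores x_i."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [v / total for v in exps]

def cls_loss(scores, u):
    """Classification loss L_cls = -log p_u for true class label u."""
    return -math.log(softmax(scores)[u])

# Equal scores over two classes give p_u = 0.5, so L_cls = log 2.
print(cls_loss([0.0, 0.0], 0))   # ~0.6931
```

The loss shrinks as the true class's score rises above the others, which is exactly what the pre-training step drives the backbone to do.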
4. Adding the anchor-based lightweight detection layer
A lightweight anchor-based detection layer is added after the last feature map of the backbone network FireNet, applying the anchor idea proposed by Ren et al. in "S. Ren, K. He, R. Girshick, and J. Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, in Proc. Neural Information Processing Systems, pp. 91-99, 2015" to design a lightweight sliding-window detection network that generates candidate regions. In this embodiment, the sliding-window detection network is built from a convolutional layer with 3 × 3 kernels and uses 4 scales (25, 50, 80, 120) and 3 aspect ratios (1:2, 1:1, 2:1), for 12 anchors in total.
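The 12-anchor layout can be sketched as follows. The parameterization below keeps each anchor's area at scale², as in Faster R-CNN; whether the patent uses exactly this parameterization is an assumption:

```python
def make_anchors(cx, cy, scales=(25, 50, 80, 120), ratios=(0.5, 1.0, 2.0)):
    """Anchor boxes (x1, y1, x2, y2) centered at one sliding-window
    position (cx, cy): one box per (scale, aspect ratio) pair.

    For ratio r = h/w, width = s / sqrt(r) and height = s * sqrt(r),
    so every anchor keeps area s * s."""
    boxes = []
    for s in scales:
        for r in ratios:
            w, h = s / r ** 0.5, s * r ** 0.5
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

anchors = make_anchors(640.0, 360.0)
print(len(anchors))   # 4 scales x 3 ratios = 12 anchors
```

Sliding this set over every position of the last feature map is what turns FireNet's features into candidate regions.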
5. Adding the position-sensitive convolutional layer
A position-sensitive convolutional layer is added after the last output of FireNet to extract the spatial position information of the traffic lights. The layer used in this embodiment is shown in Fig. 3: the target's location information is divided into five parts (top-left, top-right, bottom-left, bottom-right, and center), each corresponding to features of a different part of the traffic light.
6. Adding the pooling layer
A pooling layer is added to combine the anchor-based detection layer and the position-sensitive convolutional layer, yielding the final network.
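A minimal sketch of the five-part pooling described above: each part of a candidate box is pooled only from its own part-specific score map, and the five responses are then merged into one score. The exact sub-region boundaries (quadrants plus a middle-half center window) and the final averaging are assumptions; the patent fixes only the five-part division of Fig. 3:

```python
def region_mean(score_map, x1, y1, x2, y2):
    """Mean of an H x W grid over the integer window [y1, y2) x [x1, x2)."""
    vals = [score_map[y][x] for y in range(y1, y2) for x in range(x1, x2)]
    return sum(vals) / len(vals)

def position_sensitive_score(maps, roi):
    """Score one RoI with five part-specific score maps.

    maps: dict with keys 'tl', 'tr', 'bl', 'br', 'center'; each value is
    an H x W grid of responses for that part of a traffic light.
    Each part is pooled from its own map over the matching sub-region of
    the RoI, then the five pooled responses are averaged."""
    x1, y1, x2, y2 = roi
    mx, my = (x1 + x2) // 2, (y1 + y2) // 2   # quadrant split point
    qw, qh = (x2 - x1) // 4, (y2 - y1) // 4   # center = middle half
    parts = {
        'tl': (x1, y1, mx, my),
        'tr': (mx, y1, x2, my),
        'bl': (x1, my, mx, y2),
        'br': (mx, my, x2, y2),
        'center': (x1 + qw, y1 + qh, x2 - qw, y2 - qh),
    }
    pooled = [region_mean(maps[p], *box) for p, box in parts.items()]
    return sum(pooled) / len(pooled)
```

Because each map responds to only one part, a box covering the whole light scores high on all five maps while a box covering only part of it does not, which is what sharpens localization.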
7. Training the modified network
The final network obtained in Step 6 is initialized with the weights from Step 3 and then trained on the training set from Step 1. In this embodiment, stochastic gradient descent is used to optimize a multi-task loss function:
L(p, u, t^u, v) = L_cls(p, u) + λ[u ≥ 1] L_loc(t^u, v)
where L_cls is the classification loss; p and u are the predicted and true classes, respectively, and p_u is the softmax probability of the true class; L_loc is the localization loss, t^u is the predicted position, and v is the manually annotated result; [u ≥ 1] equals 1 when u ≥ 1 and 0 otherwise; each annotation is a bounding box in which (x, y) is the top-left coordinate, w the width, and h the height; λ is a hyperparameter balancing the two losses, set to λ = 2 in this embodiment.
The multi-task loss is optimized with stochastic gradient descent until convergence, and the resulting weights are saved, giving the trained network.
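The training objective of this step (classification loss plus a λ-weighted localization loss applied only to foreground boxes) can be sketched numerically. Smooth L1 is used here as the localization loss, a common choice in anchor-based detectors; the patent does not specify the exact form of L_loc, so this is an assumption:

```python
import math

def smooth_l1(d):
    """Smooth L1 penalty on one coordinate difference (an assumed form
    of L_loc; the patent does not fix the localization loss)."""
    return 0.5 * d * d if abs(d) < 1 else abs(d) - 0.5

def multitask_loss(p, u, t_u, v, lam=2.0):
    """L = L_cls(p, u) + lam * [u >= 1] * L_loc(t_u, v).

    p:   softmax class probabilities, so L_cls = -log p[u]
    u:   true class label (u = 0 meaning background)
    t_u: predicted box (x, y, w, h); v: annotated box (x, y, w, h)
    lam: balancing hyperparameter (2 in this embodiment)"""
    l_cls = -math.log(p[u])
    indicator = 1 if u >= 1 else 0   # background boxes get no box loss
    l_loc = sum(smooth_l1(ti - vi) for ti, vi in zip(t_u, v))
    return l_cls + lam * indicator * l_loc

# A background sample (u = 0) contributes only the classification term.
print(multitask_loss([0.5, 0.5], 0, (0, 0, 10, 20), (5, 5, 10, 20)))  # = -log 0.5
```

The indicator term means box regression is learned only where an annotated traffic light actually exists.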
8. Traffic light detection
Test images are detected using the network trained in Step 7, giving the position and specific class of each traffic light.
This embodiment runs on a K40 GPU with 12 GB of memory under the CentOS operating system, with experiments implemented in Python. The training and test data both come from real road-scene video captured by a dash camera at a resolution of 1280 × 720. Every frame is annotated; 9000 images are selected for the training set and 2000 for the test set. The selected images cover complex weather conditions such as rain and dense fog, as well as other adverse conditions such as strong light, backlight, and occlusion. They contain the two major classes of horizontal and vertical traffic lights, each of which includes green, red, green straight-ahead, green left-turn, red straight-ahead, and red left-turn lights.
Fig. 4 shows the detection result for one of the test images. Although the dataset used here is relatively difficult and the traffic-light targets are generally small, repeated tests give an average detection precision of about 68.26% and an average recall of about 89.74% over all traffic-light classes, showing that the method effectively detects traffic lights of many classes while keeping the miss rate low. Thanks to the Fire-module design, the final network-weight model is only 4.4 MB when stored in double precision, far smaller than the weight models of VGG16 or ResNet101, so it can be used on in-vehicle embedded platforms. The position-sensitive convolutional layer yields more accurate localization and also helps in detecting small, distant traffic lights.

Claims (1)

1. A traffic light detection method based on position-sensitive convolution and Fire models, characterized by the following steps:
Step 1: Collect traffic-light samples under various conditions in real traffic scenes and annotate each captured frame, marking the bounding box and specific class of each traffic light; split the annotated images into a training set and a test set for training and validating the network model, with more training images than test images; the classes include, for both horizontal and vertical lights: green, red, green straight-ahead, green left-turn, red straight-ahead, and red left-turn;
Step 2: Serially connect N Fire modules into a backbone convolutional network FireNet for feature extraction, where each Fire module is composed of M1 convolutional layers with 1 × 1 kernels and M2 convolutional layers with 3 × 3 kernels, with N > 5, M1 > 2, M2 > 2;
Step 3: Use the training set from Step 1 to pre-train FireNet for classification, obtaining initialization weights for the network;
Step 4: Add a lightweight anchor-based detection layer after the last feature map of FireNet; add a position-sensitive convolutional layer after the same feature map; then add a pooling layer combining the anchor-based detection layer and the position-sensitive convolutional layer, yielding the final network;
Step 5: Initialize the final network from Step 4 with the weights from Step 3, train it on the training set from Step 1, and detect traffic lights in test images with the trained network, obtaining their positions and specific classes.
CN201710342500.7A 2017-05-16 2017-05-16 Traffic light detection method based on position-sensitive convolution and Fire models Active CN107194343B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710342500.7A CN107194343B (en) 2017-05-16 2017-05-16 Traffic light detection method based on position-sensitive convolution and Fire models

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710342500.7A CN107194343B (en) 2017-05-16 2017-05-16 Traffic light detection method based on position-sensitive convolution and Fire models

Publications (2)

Publication Number Publication Date
CN107194343A CN107194343A (en) 2017-09-22
CN107194343B true CN107194343B (en) 2019-11-22

Family

ID=59872740

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710342500.7A Active CN107194343B (en) 2017-05-16 2017-05-16 Traffic light detection method based on position-sensitive convolution and Fire models

Country Status (1)

Country Link
CN (1) CN107194343B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108108761B (en) * 2017-12-21 2020-05-01 西北工业大学 Rapid traffic signal lamp detection method based on deep feature learning
CN110659540A (en) * 2018-06-29 2020-01-07 北京京东尚科信息技术有限公司 Traffic light detection method and device
CN110663971B (en) * 2018-07-02 2022-03-29 天津工业大学 Red date quality classification method based on double-branch deep fusion convolutional neural network
CN109035808A (en) * 2018-07-20 2018-12-18 上海斐讯数据通信技术有限公司 A kind of traffic lights switching method and system based on deep learning
CN111160282B (en) * 2019-12-31 2023-03-24 合肥湛达智能科技有限公司 Traffic light detection method based on binary Yolov3 network
CN113077630B (en) * 2021-04-30 2022-06-28 安徽江淮汽车集团股份有限公司 Traffic light detection method, device, equipment and storage medium based on deep learning

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106651765A (en) * 2016-12-30 2017-05-10 深圳市唯特视科技有限公司 Method for automatically generating thumbnail by use of deep neutral network

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106651765A (en) * 2016-12-30 2017-05-10 深圳市唯特视科技有限公司 Method for automatically generating thumbnail by use of deep neutral network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks; S. Ren et al.; in Proc. Neural Information Processing Systems; pp. 91-99; 2015 *
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and &lt;0.5MB model size; Forrest N. Iandola et al.; arXiv:1602.07360v4 [cs.CV]; pp. 1-13; Nov. 4, 2016 *

Also Published As

Publication number Publication date
CN107194343A (en) 2017-09-22

Similar Documents

Publication Publication Date Title
CN107194343B (en) Traffic light detection method based on position-sensitive convolution and Fire models
CN108596101B (en) Remote sensing image multi-target detection method based on convolutional neural network
CN107576960B (en) Target detection method and system for visual radar space-time information fusion
US10817731B2 (en) Image-based pedestrian detection
CN108229366B (en) Deep learning vehicle-mounted obstacle detection method based on radar and image data fusion
Liu et al. Multiscale U-shaped CNN building instance extraction framework with edge constraint for high-spatial-resolution remote sensing imagery
CN108694386B (en) Lane line detection method based on parallel convolution neural network
CN110163187B (en) F-RCNN-based remote traffic sign detection and identification method
CN105160309B (en) Three lanes detection method based on morphological image segmentation and region growing
CN104700414B Fast ranging method for pedestrians ahead based on a vehicle-mounted binocular camera
CN105930819A System for real-time recognition of urban traffic lights based on monocular vision and an integrated GPS navigation system
CN103679674B (en) Method and system for splicing images of unmanned aircrafts in real time
CN110942000A (en) Unmanned vehicle target detection method based on deep learning
CN108710875A Method and device for counting road vehicles in aerial images based on deep learning
CN109543600A Drivable-region detection method, system, and application
CN110009010A (en) Wide area optical remote sensing target detection method based on the re-detection of interest region
CN115439424A (en) Intelligent detection method for aerial video image of unmanned aerial vehicle
CN106295605A Traffic light detection and recognition method
CN113111727B (en) Feature alignment-based method for detecting rotating target in remote sensing scene
CN113255589B (en) Target detection method and system based on multi-convolution fusion network
Crommelinck et al. Interactive cadastral boundary delineation from UAV data
CN207068060U Three-dimensional traffic-accident-scene reconstruction system based on UAV aerial photography
CN110909656A (en) Pedestrian detection method and system with integration of radar and camera
CN113219472B (en) Ranging system and method
CN114048536A (en) Road structure prediction and target detection method based on multitask neural network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant