CN110163187A - Remote road traffic sign detection recognition methods based on F-RCNN - Google Patents


Info

Publication number
CN110163187A
CN110163187A (application CN201910474058.2A)
Authority
CN
China
Prior art keywords
traffic sign
training
rcnn
output
elm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910474058.2A
Other languages
Chinese (zh)
Other versions
CN110163187B (en)
Inventor
杜娟
刘志刚
刘贤梅
王辉
刘苗苗
王梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Jiuzhou Longteng Scientific And Technological Achievement Transformation Co ltd
Original Assignee
Northeast Petroleum University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeast Petroleum University
Priority to CN201910474058.2A
Publication of CN110163187A
Application granted
Publication of CN110163187B
Active legal status
Anticipated expiration legal status

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/582Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of traffic signs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a long-distance road traffic sign detection and recognition method based on F-RCNN, comprising: 1. preprocessing a traffic sign image sample set; 2. pre-training the VGG-16 in F-RCNN; 3. inputting the traffic sign training dataset into VGG-16 to complete feature extraction; 4. constructing a fusion feature map; 5. the region proposal network RPN in F-RCNN performing region generation from the fusion feature map to obtain traffic sign candidate regions; 6. inputting all candidate regions into the RoI-Pooling layer in F-RCNN to generate fixed-size feature vectors; 7. feeding the feature vectors into an extreme learning machine network, which outputs the class and position of each traffic sign; 8. training the F-RCNN model with a contribution-adaptive loss function; 9. completing traffic sign detection and recognition in actual scenes. The present invention achieves long-distance traffic sign detection and recognition with high recognition accuracy.

Description

Remote road traffic sign detection recognition methods based on F-RCNN
One, technical field:
The present invention relates to the field of intelligent transportation for driverless and assisted driving, and to solving the problem of long-distance detection and recognition of road traffic signs; in particular, it relates to a long-distance road traffic sign detection and recognition method based on F-RCNN.
Two, background technique:
In the field of intelligent transportation, traffic sign detection and recognition is an important research problem for systems such as driverless and assisted driving. Much research has been carried out at home and abroad, but large deficiencies remain and the methods cannot yet be applied in practice, for the following reasons: (1) some detection and recognition methods use the public German traffic sign datasets GTSRB and GTSDB, in which the traffic sign occupies a very large proportion of the image; such detection works only at short range and is difficult to adapt to high-speed driving, and GTSDB covers few detectable sign types, so it cannot satisfy actual needs; (2) some detection methods train their models on self-collected datasets whose sign variation is severely insufficient and whose quantity is too small; compared with models trained on GTSRB and GTSDB, they adapt even worse to complex traffic conditions and changing environments; (3) some detection methods are based on simple features such as color and shape; under sign deformation, motion blur, and similar conditions in actual traffic scenes, these methods are not robust and are difficult to apply in practice; (4) in addition, almost all existing inventions perform short-range detection and recognition, and their accuracy in long-distance traffic sign detection and recognition is too low for practical application.
Long-distance traffic sign detection and recognition is of great significance for the timely response of intelligent transportation systems and for driving safety. Since the distance between the sign and the camera of the intelligent system determines the proportion the traffic sign occupies in the actual scene, the farther the detection distance, the smaller the sign appears in the scene. In computer vision, long-distance traffic sign detection and recognition therefore belongs to the problem of small object detection and recognition, which is a recognized difficulty of the field; existing methods struggle to achieve high detection and recognition accuracy.
Three, summary of the invention:
The object of the present invention is to provide a long-distance road traffic sign detection and recognition method based on F-RCNN, to solve the low accuracy of existing short-range detection and recognition methods when applied to long-distance traffic sign detection and recognition.
The technical solution adopted by the present invention to solve the technical problem is as follows. This F-RCNN-based long-distance traffic sign detection and recognition method comprises:
Step 1. Preprocess the traffic sign image sample set;
Step 2. Pre-train the feature extraction network VGG-16 in F-RCNN using an image classification benchmark dataset;
The feature extraction network VGG-16 is pre-trained on the public ImageNet image dataset, and the trained parameters serve as the initial state of the network. The VGG-16 network contains 5 convolutional stages and 5 pooling layers, and uses ReLU as the activation function;
Step 3. Input the traffic sign training dataset into the pre-trained feature extraction network VGG-16, and perform convolution and pooling operations on the images to complete feature extraction;
The traffic sign training dataset is split into several batches in mini-batch fashion;
Step 4. Construct the fusion feature map using max pooling, dilated convolution, regularization, and aggregation operations;
(1) The outputs of the convolutional stages of the feature extraction network VGG-16 are denoted, from front to back, conv1, conv2, conv3, conv4, and conv5. Each traffic scene with 2048 × 2048 resolution is first converted into a 2048 × 2048 × 3 numerical matrix and then input into VGG-16; the image size shrinks stage by stage while the number of channels grows. The output dimensions of the stages are, in order: conv1: 2048 × 2048 × 64; conv2: 1024 × 1024 × 128; conv3: 512 × 512 × 256; conv4: 256 × 256 × 512; conv5: 128 × 128 × 512;
(2) conv5 is expanded using dilated convolution with dilation rate 3, giving a size of 512 × 512 × 512 after expansion; convolution is then applied with kernel parameters: size 3 × 3, stride 1, padding 1, and 256 kernels. The output is denoted dilated-conv5, with matrix dimension 512 × 512 × 256;
(3) conv1 is downsampled using max pooling and its channels are replicated 4 times; the downsampled output is denoted pooling-conv1, with matrix dimension 512 × 512 × 256;
(4) L2 regularization is applied to pooling-conv1, conv3, and dilated-conv5 respectively to eliminate scale differences between the layers, where the L2 regularization of a pixel x = (x1, x2, ..., xd) across its channels is computed as:

x̂ = x / ‖x‖2, with ‖x‖2 = (x1² + x2² + ... + xd²)^(1/2)

where x̂ denotes the pixel after regularization and d denotes the number of channels of the pixel;
(5) The three feature matrices pooling-conv1, conv3, and dilated-conv5 are aggregated element-wise in the spatial domain, producing a fusion feature map that contains both high-resolution and abstract semantic information, with matrix dimension 512 × 512 × 256;
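The construction of the fusion feature map in steps (1)-(5) can be sketched in NumPy at toy scale. All shapes below are scaled down 64× from the patent's 2048 × 2048 input, and the dilated convolution on conv5 is replaced by a random stand-in tensor of the correct shape (an assumption made only to keep the sketch self-contained):

```python
import numpy as np

def max_pool(x, k):
    """Non-overlapping k x k max pooling over an (H, W, C) feature map."""
    H, W, C = x.shape
    return x.reshape(H // k, k, W // k, k, C).max(axis=(1, 3))

def l2_normalize(x, eps=1e-12):
    """Per-pixel L2 normalization across the channel axis (step (4))."""
    norm = np.sqrt((x ** 2).sum(axis=-1, keepdims=True)) + eps
    return x / norm

rng = np.random.default_rng(0)
conv1 = rng.random((32, 32, 4))         # shallow stage: high resolution
conv3 = rng.random((8, 8, 16))          # middle stage: the target shape
dilated_conv5 = rng.random((8, 8, 16))  # stand-in for the dilated-conv5 output

# step (3): downsample conv1 4x spatially and replicate its channels 4x
pooling_conv1 = np.tile(max_pool(conv1, 4), (1, 1, 4))
assert pooling_conv1.shape == conv3.shape == dilated_conv5.shape

# steps (4)-(5): remove scale differences, then aggregate element-wise
fused = sum(l2_normalize(f) for f in (pooling_conv1, conv3, dilated_conv5))
print(fused.shape)  # (8, 8, 16)
```

At full scale the same operations map conv1 (2048 × 2048 × 64) and dilated-conv5 (512 × 512 × 256) onto conv3's 512 × 512 × 256 grid before the element-wise aggregation.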
Step 5. The region proposal network RPN in F-RCNN performs region generation from the fusion feature map to obtain traffic sign candidate regions;
(1) A 3 × 3 convolution kernel slides over the fusion feature map; for each pixel of the feature map, taking that point as center and using the aspect ratios 1:1, 1:2, and 2:1 together with the 4 areas 16, 32, 64, and 128, 12 anchor boxes are generated on the original input image;
(2) After sliding, the number of generated anchor boxes is 512 × 512 × 12;
(3) Anchor boxes extending beyond the boundary of the original input image are removed;
(4) Non-maximum suppression with a threshold of 0.7 is used to remove heavily overlapping duplicate anchor boxes;
(5) Positive and negative samples are determined according to the intersection over union (IoU) between each anchor box and the ground-truth targets in the sample: anchors with IoU > 0.7 are positive samples, anchors with IoU < 0.3 are negative samples, and anchors with IoU between 0.3 and 0.7 are removed, where the IoU of an anchor box A and a ground-truth box B is computed as:

IoU = area(A ∩ B) / area(A ∪ B)
(6) By translation invariance, each anchor box corresponds to one region proposal box on the fusion feature map;
(7) All region proposal boxes pass through the fully connected layer of the region proposal network RPN to yield the object candidate regions;
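The anchor generation and IoU labeling rules of step 5 can be sketched as follows. Treating the patent's four "areas" 16/32/64/128 as the side length of the square (1:1) anchor is an assumption made for illustration:

```python
import itertools
import math

def make_anchors(cx, cy, scales=(16, 32, 64, 128), ratios=(1.0, 0.5, 2.0)):
    """12 anchor boxes (x1, y1, x2, y2) centered at (cx, cy):
    4 scales x 3 aspect ratios, each anchor keeping area scale**2."""
    anchors = []
    for s, r in itertools.product(scales, ratios):
        w, h = s * math.sqrt(r), s / math.sqrt(r)
        anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(0.0, iw) * max(0.0, ih)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def label_anchor(v):
    """Patent rule: IoU > 0.7 positive, IoU < 0.3 negative, else removed."""
    return "positive" if v > 0.7 else "negative" if v < 0.3 else "removed"

print(len(make_anchors(1024, 1024)))    # 12 anchors per feature-map pixel
print(512 * 512 * 12)                   # 3145728 anchors before filtering
print(label_anchor(iou((0, 0, 10, 10), (0, 0, 10, 9))))  # positive
```

The count 512 × 512 × 12 matches sub-step (2): one set of 12 anchors for every pixel of the 512 × 512 fusion feature map.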
Step 6. All candidate regions are input into the RoI-Pooling layer in F-RCNN to generate fixed-size feature vectors;
(1) The RoI-Pooling layer divides each object candidate region into 8 parts along the horizontal and vertical directions respectively, and applies max-pooling downsampling to each part;
(2) In this way, even though the candidate regions differ in size, the sampled results are identical, generating feature vectors of fixed dimension 8 × 8 × 256;
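A minimal sketch of the RoI max pooling of step 6: an (H, W, C) region of any size is divided into an 8 × 8 grid of cells and the maximum of each cell is kept, so every candidate region yields the same fixed output shape:

```python
import numpy as np

def roi_max_pool(feature, out_size=8):
    """Divide an (H, W, C) candidate-region feature into out_size x out_size
    cells and take the channel-wise max of each cell."""
    H, W, C = feature.shape
    out = np.empty((out_size, out_size, C))
    ys = np.linspace(0, H, out_size + 1).astype(int)
    xs = np.linspace(0, W, out_size + 1).astype(int)
    for i in range(out_size):
        for j in range(out_size):
            cell = feature[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            out[i, j] = cell.max(axis=(0, 1))
    return out

# candidate regions of different sizes all map to the same 8 x 8 x C shape
for h, w in [(23, 41), (64, 64), (9, 100)]:
    print(roi_max_pool(np.random.rand(h, w, 256)).shape)  # (8, 8, 256)
```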
Step 7. The feature vectors are fed into the extreme learning machine networks used for classification and regression, which output the class and position of each traffic sign;
The extreme learning machine (ELM) structures used are: (1) the ELM for traffic sign classification: 4096 hidden nodes and 44 output nodes, each output node representing one traffic sign class with values in (0, 1); during classification the class of the maximum output node is taken as the traffic sign class; (2) the ELM for traffic sign position regression: 4096 hidden nodes and 4 output nodes, representing the center coordinates, width, and height of the traffic sign;
The learning algorithm of the extreme learning machine is as follows:
(1) The input/output relation of the extreme learning machine ELM can be expressed as:

Oj = Σ(i=1..4096) βi g(wi · xj + θi), j = 1, 2, ..., N

where X = (x1, x2, ..., xN) are the feature vectors output by the RoI-Pooling layer; for the j-th feature vector the desired output is Tj = (tj1, tj2, ..., tjk)ᵀ and the actual ELM output is Oj = [oj1, oj2, ..., ojk]ᵀ. For the classification ELM, k = 44; for the regression ELM, k = 4. wi = [wi1, wi2, ..., win]ᵀ is the weight vector between the i-th hidden neuron and the input neurons, βi = [βi1, βi2, ..., βik]ᵀ is the weight vector between the i-th hidden neuron and the k output neurons, θi is the threshold of the i-th hidden node, i = 1, 2, ..., 4096, and g(·) is the activation function;
(2) The learning objective of the ELM is to minimize the error function E, the squared error between the actual and the desired outputs:

E = Σ(j=1..N) ‖Oj − Tj‖²

that is, there exist βi, wi, and θi such that:

Σ(i=1..4096) βi g(wi · xj + θi) = Tj, j = 1, 2, ..., N
The hidden layer output matrix H of the ELM is the N × 4096 matrix with entries H(j, i) = g(wi · xj + θi), so the ELM output can be written as:

Hβ = T
(3) By the least squares principle, the hidden layer output weights β are computed as:

β = Hᵀ (I/C + H Hᵀ)⁻¹ T

where H is the hidden layer output matrix of the ELM, Hᵀ is its transpose, I is the identity matrix, C is a constant, and T is the desired output matrix of the ELM.
The computed output weights are substituted into Hβ to compute the value of each ELM output node. For the classification ELM, the node number of the maximum value among the 44 output nodes gives the traffic sign class; for the position-regression ELM, the 4 outputs give the 4 positional parameters of the sign, namely its center coordinates, width, and height;
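The closed-form ELM training of step 7 can be sketched at toy scale (64 hidden nodes instead of 4096; the sigmoid is used as the activation g, which is an assumption, since the patent leaves g unspecified):

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_in, n_out, C = 64, 16, 4, 100.0  # scaled down from 4096 hidden nodes

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_fit(X, T):
    """Random hidden layer, then the regularized least-squares output weights
    beta = H^T (I/C + H H^T)^(-1) T from the patent's formula."""
    W = rng.standard_normal((n_in, n_hidden))      # random input weights w_i
    theta = rng.standard_normal(n_hidden)          # random thresholds theta_i
    H = sigmoid(X @ W + theta)                     # hidden layer output matrix
    beta = H.T @ np.linalg.solve(np.eye(len(H)) / C + H @ H.T, T)
    return W, theta, beta

def elm_predict(X, W, theta, beta):
    return sigmoid(X @ W + theta) @ beta           # O = H beta

X = rng.standard_normal((200, n_in))               # RoI feature vectors
T = rng.standard_normal((200, n_out))              # regression targets (k = 4)
W, theta, beta = elm_fit(X, T)
print(elm_predict(X, W, theta, beta).shape)  # (200, 4)
```

Only β is learned; W and θ stay at their random values, which is what makes ELM training a single linear solve rather than iterative gradient descent.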
Step 8. Train the F-RCNN model using the contribution-adaptive loss function (Contribution Adaptive Loss Function, CA);
(1) The training objective of the region proposal network RPN in the F-RCNN model is to minimize the classification and localization loss; its loss function can be formalized as:

L({pi}, {ti}) = (1/Ncls) Σi L_RPN-CA(pi, pi*) + λ (1/Nreg) Σi pi* Lreg(ti, ti*)

where pi is the predicted probability that the i-th anchor box is a target object and pi* is the ground-truth label of the target object; ti is the coordinate information of the predicted box, comprising the center coordinates (xi, yi), width wi, and height hi, and ti* is the coordinate information of the ground-truth box, likewise comprising center coordinates (xi*, yi*), width wi*, and height hi*. L_RPN-CA denotes the contribution-adaptive classification loss of the RPN network, Ncls denotes the total number of anchor boxes, Nreg denotes the size of the feature map, and λ is a balancing coefficient. Lreg denotes the regression loss over all bounding boxes, using the smooth L1 loss:

Lreg(ti, ti*) = smoothL1(ti − ti*)

where

smoothL1(x) = 0.5x², if |x| < 1; |x| − 0.5, otherwise
Therefore the contribution-adaptive loss function of the RPN may be defined as:

L_RPN-CA = −(1 − pt)³ log(pt)

where (1 − pt)³ is the contribution-adaptive adjustment factor and pt is the predicted probability of the true class. For hard, easily misclassified negative samples the class probability pt → 0, so the adjustment factor tends to 1 and the contribution of such samples to the total loss is unaffected; for easy positive samples pt → 1, so the adjustment factor tends to 0 and the contribution of easy positive samples to the total loss drops to 0. By adaptively and dynamically regulating the contribution of hard and easy samples to the total loss, F-RCNN training is made to focus on hard negative samples, effectively improving training efficiency;
(2) The contribution-adaptive loss function of the fully connected layer network is defined as:

L_FC-CA = −Σk tk (1 − qk)³ log(qk)

where L_FC-CA denotes the classification loss, k indexes the k-th target class, qk denotes the predicted probability that the sample belongs to class k, and tk is 1 for the true class and 0 otherwise;
Step 9. Start the color camera and photograph the actual traffic scene. Before the scene is input into the model it is preprocessed and its resolution set to 2048 × 2048; it is then input into F-RCNN and steps 3 to 7 are repeated, completing traffic sign detection and recognition for the actual scene.
Step 1 in the above scheme is specifically:
(1) Using the Tsinghua-Tencent 100K dataset jointly released by Tsinghua University and Tencent, 44 classes of common traffic signs are selected as the long-distance detection and recognition objects;
(2) The Tsinghua-Tencent 100K dataset is divided into a training set and a test set at a ratio of 1:2;
(3) To guarantee sample balance during model training, every traffic sign class has at least 100 scene instances in the training set; if a class has fewer than 100 scene instances, it is padded using repeated sampling.
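The padding rule of sub-step (3) amounts to oversampling rare classes up to the 100-instance floor; a stdlib sketch (class names and counts are illustrative only):

```python
import random
from collections import Counter

def balance(samples, labels, min_count=100, seed=0):
    """Pad any class with fewer than min_count instances by repeatedly
    sampling its existing instances; classes at or above the floor are
    left unchanged. Returns (sample, label) pairs."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    out = []
    for y, items in by_class.items():
        padded = list(items)
        while len(padded) < min_count:
            padded.append(rng.choice(items))
        out.extend((s, y) for s in padded)
    return out

data = balance([f"img{i}" for i in range(130)],
               ["stop"] * 120 + ["yield"] * 10)
print(Counter(y for _, y in data))  # stop stays at 120, yield padded to 100
```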
The beneficial effects are:
1. The present invention requires no hand-designed features, achieves long-distance detection of 44 kinds of common traffic signs, and has high recognition accuracy, helping intelligent systems respond in time, improving driving safety, and preventing traffic accidents.
2. The training sample set used by the present invention, and its differences from and advantages over previous inventions.
Cause analysis: when training models, existing inventions mainly use two kinds of sample sets: the first is a sample set collected by hand-held photography, and the second is the German traffic sign datasets GTSRB and GTSDB. Because traffic signs are of many kinds and influenced by diverse factors, a hand-collected sample set can hardly cover all their variations, such as illumination, motion blur, weather, and viewing angle. On the other hand, GTSDB and GTSRB split the detection and recognition tasks into two independent datasets, so the two tasks connect poorly (a detection model must be trained separately on GTSDB and a recognition model on GTSRB, and the two models must then be merged by some mechanism before traffic sign detection and recognition can be completed). Moreover, GTSDB provides few detectable traffic sign types, which cannot meet actual requirements; its traffic scene resolution is only 1360 × 800 and its traffic signs are relatively large, so it belongs to short-range detection. Meanwhile, in GTSRB, the traffic sign occupies 90% of the scene. Models trained on GTSDB and GTSRB therefore cannot adapt to long-distance traffic sign detection and recognition in practice.
Advantage analysis: based on the above analysis, the present invention uses the Tsinghua-Tencent 100K large-scale traffic sign dataset jointly released by Tsinghua University and Tencent. Its advantages are as follows:
(1) It is cropped from real Tencent street-view images and contains 100,000 scene pictures and 30,000 traffic signs in total, covering the three major categories of prohibition, warning, and indication signs, most illumination and weather variations, and many traffic sign types. Every traffic scene photo has a resolution of 2048 × 2048, and traffic signs of (0, 32] pixels and (32, 96] pixels occupy 41.6% and 49.1% of the dataset respectively, so it is very suitable for training a long-distance traffic sign detection and recognition model; (2) a target detection model trained on this dataset adapts far better to complex and changeable long-distance traffic sign detection and recognition, providing greater safety and a shorter response time for intelligent navigation equipment in driverless and assisted driving.
3. The invented fusion feature map (Fusion Feature Map): cause analysis and advantages of the invention.
Cause analysis: existing target detection models detect and recognize large targets in a picture well, but perform poorly on targets that occupy a very small proportion of the image. In the long-distance traffic sign detection and recognition problem, the traffic scene is large while the signs to be recognized are very small; existing target detection models have very low accuracy on this problem, and small object detection and recognition is currently a research difficulty of computer vision. The essential reason is that after a traffic sign passes forward through the convolutional neural network VGG-16 responsible for feature extraction in the target detection model, repeated convolution and pooling operations sharply lower the resolution of the feature map (the output of the last convolutional layer), whose size is only 1/16 of the original, while it also contains much background noise unrelated to the sign.
Advantage analysis: to address this deficiency, the present invention exploits the characteristics of different convolutional layers (shallow layers have high resolution but low semantic information, while deep layers have low resolution but high semantic information) and invents a fusion feature map technique that integrates the features of different convolutional layers into a fusion feature map through dilated convolution, pooling, regularization, and aggregation. The advantages of this fusion feature map are as follows:
(1) The technique of constructing the fusion feature map with dilated convolution and max pooling is first proposed by the present invention. Dilated convolution expands the deep convolutional layer without losing any high-level abstract semantic information; after expansion according to the dilation rate, the deep layer has the same dimensions as the shallower layers, so aggregation can be completed;
(2) The feature map of plain VGG-16 is 1/16 the size of the input image, whereas the fusion feature map is enlarged to 1/4 of the input image; at the same time, each fused pixel carries both high-level abstract semantic information and higher-resolution detail, which provides a great advantage for detecting small traffic signs.
4. Building the fully connected layer of F-RCNN with an extreme learning machine model: cause analysis and advantages of the invention.
Cause analysis: the present invention uses the extreme learning machine (Extreme Learning Machine, ELM) as the fully connected layer network model because the number of parameters of the F-RCNN fully connected layer is huge, reaching 4096 × 4096. With an ordinary neural network model such as a BP neural network, training is not only extremely slow and unstable but also prone to overfitting, which seriously affects model performance.
Advantage analysis: the extreme learning machine is a neural network model with fast learning ability. During training, after the input samples are mapped into the random-weight space of the hidden layer, the hidden layer output weights are quickly computed with the Moore-Penrose generalized inverse. Learning is extremely fast, which simplifies the training of the fully connected layer, effectively reduces the training time of the model, and improves its training speed.
5. The invented contribution-adaptive loss function (Contribution Adaptive Loss Function, CA): cause analysis and advantages of the invention.
Cause analysis: during target detection model training, most samples are easy positive samples that readily achieve high accuracy, while a few are hard negative samples that easily cause detection and recognition errors. This resembles a student's learning process: mastering one wrongly answered problem improves one's level far more than repeatedly practicing dozens of easy problems. A target detection model is a kind of artificial intelligence model whose training process is built by imitating human learning. However, the learning algorithms of existing target detection models do not distinguish hard negative samples from easy positive samples, and since easy positive samples greatly outnumber hard negative ones, these models, after many training iterations, achieve high training accuracy but still low recognition accuracy in practical application.
Advantage analysis: for the above reasons, the present invention invents a contribution-adaptive loss function (Contribution Adaptive Loss Function, CA) in the training algorithm of F-RCNN. The advantages of the algorithm are as follows:
(1) According to the detection and recognition outcome of each sample, it automatically adjusts that sample's contribution to the loss function, increasing the contribution of hard negative samples to the F-RCNN training loss and correspondingly reducing the influence of easy positive samples on model training.
(2) The CA loss function effectively distinguishes the two kinds of samples, so that the training of the target detection model F-RCNN focuses more on hard negative samples and continually strengthens the model on such samples until they are detected and recognized correctly, thereby continually and effectively improving the learning ability and recognition accuracy of the model.
(3) In addition, this adjustment is dynamic: once a hard negative sample is classified correctly, the CA loss function reclassifies it, through the contribution-adaptive coefficient, as an easy positive sample. Conversely, if training oscillation causes an easy positive sample to be misclassified, the CA loss function re-adjusts it back to a hard negative sample. This dynamic regulation effectively guarantees the convergence of the model training process.
Four, description of the drawings:
Fig. 1 is the internal structure diagram of the F-RCNN target detection model of the invention.
Fig. 2 is the flow chart of the long-distance traffic sign detection and recognition method of the invention.
Fig. 3 shows the 44 kinds of common traffic signs for long-distance detection and recognition of the invention, divided into three categories: indication, warning, and prohibition; * denotes a family of signs, where il*: il100, il60, il80; ph*: ph4, ph4.5, ph5; pm*: pm20, pm30, pm55; pl*: pl5, pl20, pl30, pl40, pl50, pl60, pl70, pl80, pl100, pl120.
Fig. 4 is an Accuracy-Recall comparison of the detection and recognition accuracy of the F-RCNN of the present invention against the common target detection model Faster R-CNN on small traffic signs of (0, 32] pixels.
Fig. 5 is the same Accuracy-Recall comparison on traffic signs of (32, 96] pixels.
Fig. 6 is the same Accuracy-Recall comparison on traffic signs of (96, 200] pixels.
Fig. 7 shows traffic sign detection and recognition results of the method of the invention in actual traffic scenes.
Five, specific embodiments:
The present invention is further described below with reference to the drawings:
The present invention proposes a novel target detection model, named the fusion region convolutional neural network (Fusion Region Convolutional Neural Networks, F-RCNN), and on the learning algorithm side proposes the contribution-adaptive loss function (Contribution Adaptive Loss Function, CA). The model structure mainly comprises 5 components, briefly described as follows (of which (2) and (5) differ significantly from other target detection models and are invented specifically for the long-distance traffic sign detection and recognition problem):
(1) Convolutional neural network VGG-16: mainly responsible for computing, layer by layer through convolution and pooling, the image features of the traffic scene input to the model;
(2) Fusion feature map (Fusion Feature Map): a technique proposed by the present invention specifically for long-distance traffic sign detection. After the traffic scene has passed through VGG-16, the fusion feature map generated by max pooling, dilated convolution, regularization, and aggregation possesses both high resolution and rich high-level semantic information, and provides the feature extraction for traffic signs in the target detection model. It is the most important component of the F-RCNN network and greatly influences the detection accuracy of the model;
(3) Region proposal network (Region Proposal Network, RPN): generates a certain number of target proposal regions from the fusion feature map;
(4) Region-of-interest pooling layer (Region of Interest Pooling Layer, RoI-Pooling): further extracts the features of traffic signs, pooling the target proposal regions computed by the RPN into fixed-length feature vectors;
(5) Fully connected network (Fully Connected Network, FC): responsible for computing the specific class and position of each traffic sign. To improve training efficiency, the present invention uses the extreme learning machine model as the FC, which saves computing resources and effectively shortens the training time of the model.
As shown in Fig. 1 and Fig. 2, this F-RCNN-based long-distance traffic sign detection and recognition method is as follows:
Step 1. Preprocess the traffic sign image sample set;
(1) Using the Tsinghua-Tencent 100K dataset jointly released by Tsinghua University and Tencent, 44 classes of common traffic signs are selected as the long-distance detection and recognition objects;
(2) The Tsinghua-Tencent 100K dataset is divided into a training set and a test set at a ratio of 1:2;
(3) To guarantee sample balance during model training, every traffic sign class has at least 100 scene instances in the training set; if a class has fewer than 100 scene instances, it is padded using repeated sampling;
Step 2. Pre-train the feature extraction network VGG-16 in F-RCNN using an image classification benchmark dataset;
This step plays a significant role in the training of the F-RCNN model. If VGG-16 is not pre-trained but is instead trained directly on the traffic sign data set, it is difficult for F-RCNN to reach high detection and recognition precision. Therefore the present invention pre-trains the VGG-16 network on the public ImageNet image data set, and the parameters after training serve as the initial state of the network. The network comprises 5 convolutional layers and 5 pooling layers, and uses ReLU as the activation function;
Step 3. The traffic sign training data set is input to VGG-16, and convolution and pooling operations are performed on the images to complete feature extraction;
The detailed process is as follows: using the mini-batch strategy, the training traffic sign image data set is split into several batches, each batch containing only a few pictures. This not only greatly reduces the computation of F-RCNN, but also lets the model parameters be updated by gradients computed from the error of each batch of data, which reduces randomness and accelerates training convergence;
Step 4. A fusion feature map is constructed using max pooling, dilated convolution, regularization and aggregation operations;
The fusion feature map proposed by the present invention is calculated in the following specific steps:
(1) The outputs of the convolutional stages of the VGG-16 model are denoted, from front to back, conv1, conv2, conv3, conv4 and conv5. Each traffic scene with 2048 × 2048 resolution is first converted into a 2048 × 2048 × 3 numerical matrix and input to VGG-16, in which the image size constantly shrinks while the channel number constantly grows. The output matrix dimensions of the stages are, in order, conv1: 2048 × 2048 × 64; conv2: 1024 × 1024 × 128; conv3: 512 × 512 × 256; conv4: 256 × 256 × 512; conv5: 128 × 128 × 512;
(2) conv5 is expanded using dilated convolution with dilation rate 3; the size after expansion is 512 × 512 × 512. Convolution is then carried out with kernel parameters: size 3 × 3, stride 1, padding 1, 256 kernels. The output after calculation is denoted dilated-conv5, with matrix dimension 512 × 512 × 256;
(3) conv1 is down-sampled using max pooling and its channels are replicated to expand 4-fold; the output after down-sampling is denoted pooling-conv1, with matrix dimension 512 × 512 × 256;
(4) L2 regularization is computed separately for pooling-conv1, conv3 and dilated-conv5 to eliminate scale effects, where the L2 regularization is computed as
$\hat{x} = x / \|x\|_2$, with $\|x\|_2 = \left(\sum_{i=1}^{d} x_i^2\right)^{1/2}$,
where $\hat{x}$ denotes the pixel after regularization and d denotes the number of channels of the pixel.
(5) The three feature matrices pooling-conv1, conv3 and dilated-conv5 are directly aggregated in space, generating a fusion feature map that simultaneously contains resolution and abstract semantic information, with matrix dimension 512 × 512 × 256.
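Steps (3)-(5) above can be sketched in plain NumPy. The dilated-convolution branch of step (2) is taken here as a precomputed input, and treating the spatial aggregation of step (5) as an element-wise sum is an assumption inferred from the equal input and output dimensions (concatenation would give 768 channels, not 256):

```python
import numpy as np

def l2_normalize(fmap, eps=1e-12):
    # Step (4): L2-normalize each pixel vector across the channel axis.
    norm = np.sqrt((fmap ** 2).sum(axis=-1, keepdims=True))
    return fmap / (norm + eps)

def max_pool(fmap, k):
    # Step (3): k x k max pooling with stride k over an (H, W, C) map.
    h, w, c = fmap.shape
    return fmap.reshape(h // k, k, w // k, k, c).max(axis=(1, 3))

def fuse(conv1, conv3, dilated_conv5):
    # conv1 (2048x2048x64) -> pool x4 -> 512x512x64 -> tile channels x4 -> 512x512x256,
    # then normalize all three branches and aggregate element-wise.
    p1 = np.tile(max_pool(conv1, 4), (1, 1, 4))
    maps = [l2_normalize(m) for m in (p1, conv3, dilated_conv5)]
    return sum(maps)
```

With the full-size inputs of the patent, `fuse` returns the 512 × 512 × 256 fusion feature map; the sketch works for any shapes with the same 4:1 spatial ratio.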
Step 5. The region proposal network RPN in F-RCNN performs region generation according to the fusion feature map to obtain the candidate regions of traffic signs;
(1) A 3 × 3 convolution kernel slides over the fusion feature map; for each pixel on the feature map, taking that point as the center, 12 anchor boxes are generated on the original input image using 3 aspect ratios (1:1, 1:2, 2:1) and 4 areas (16, 32, 64, 128);
(2) After sliding, the number of generated anchor boxes is 512 × 512 × 12;
(3) Anchor boxes exceeding the boundary of the original input image are removed;
(4) Anchor boxes with excessive overlap are removed using non-maximum suppression with threshold 0.7;
(5) Positive and negative samples are determined according to the intersection over union (IoU) of each anchor box with the real targets in the sample: anchor boxes with IoU > 0.7 are positive samples, anchor boxes with IoU < 0.3 are negative samples, and anchor boxes with IoU between 0.3 and 0.7 are removed. IoU is calculated as
$\mathrm{IoU} = \mathrm{area}(A \cap B) \,/\, \mathrm{area}(A \cup B)$,
where A is the anchor box and B is the ground-truth box;
(6) By translation invariance, each anchor box corresponds to one region proposal box on the fusion feature map;
(7) Finally, all region proposal boxes pass through the fully connected layer of the region proposal network RPN to obtain the target candidate regions.
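Steps (1) and (5) above can be sketched as follows. Interpreting the four scale values 16, 32, 64, 128 as box areas in pixels is an assumption (they could equally denote side lengths); the IoU formula is the standard one stated in step (5):

```python
def iou(box_a, box_b):
    # Boxes as (x1, y1, x2, y2); IoU = area(A ∩ B) / area(A ∪ B).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(box_a) + area(box_b) - inter
    return inter / union if union > 0 else 0.0

def anchors_at(cx, cy, ratios=((1, 1), (1, 2), (2, 1)), areas=(16, 32, 64, 128)):
    # 3 aspect ratios x 4 areas = 12 anchor boxes centered at (cx, cy).
    boxes = []
    for rw, rh in ratios:
        for a in areas:
            w = (a * rw / rh) ** 0.5  # choose w, h with w*h = a and w:h = rw:rh
            h = a / w
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes
```

An anchor is kept as a positive sample when `iou(anchor, gt) > 0.7` and as a negative sample when it is below 0.3, mirroring step (5).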
Step 6. All candidate regions are input to the RoI-Pooling layer in F-RCNN to generate fixed-size feature vectors;
(1) For each target candidate region, the RoI-Pooling layer divides it into 8 parts along the horizontal and vertical directions respectively, and performs max-pooling down-sampling on each part;
(2) In this way, even though the candidate regions differ in size, the sampling results are consistent, generating feature vectors of fixed dimension 8 × 8 × 256;
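The 8 × 8 max pooling over a candidate region can be sketched in NumPy as below; the even splitting of cell boundaries is an implementation assumption:

```python
import numpy as np

def roi_max_pool(fmap, roi, out_size=8):
    # fmap: (H, W, C) feature map; roi: (x1, y1, x2, y2) in feature-map coords.
    # Split the RoI into out_size x out_size cells and max-pool each cell,
    # so any RoI yields a fixed (out_size, out_size, C) output.
    x1, y1, x2, y2 = roi
    xs = np.linspace(x1, x2, out_size + 1).astype(int)
    ys = np.linspace(y1, y2, out_size + 1).astype(int)
    c = fmap.shape[2]
    out = np.zeros((out_size, out_size, c), dtype=fmap.dtype)
    for i in range(out_size):
        for j in range(out_size):
            cell = fmap[ys[i]:max(ys[i + 1], ys[i] + 1),
                        xs[j]:max(xs[j + 1], xs[j] + 1)]
            out[i, j] = cell.max(axis=(0, 1))
    return out
```

With the 256-channel fusion feature map this yields the fixed 8 × 8 × 256 feature vector of step (2) regardless of the RoI size.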
Step 7. The feature vectors are sent into the extreme learning machine networks used for classification and regression, which output the category and position of the traffic sign;
The extreme learning machine structure used by the present invention: (1) the ELM for traffic sign classification: 4096 hidden nodes and 44 output nodes, each output node representing one class of traffic sign, with values in (0, 1); during classification the node with the maximum output is taken as the traffic sign class; (2) the ELM for traffic sign position regression: 4096 hidden nodes and 4 output nodes, representing respectively the center point coordinates and the width and height of the traffic sign.
The learning algorithm of the extreme learning machine used by the present invention is specifically as follows:
(1) The input/output relation of the extreme learning machine ELM can be expressed as
$O_j = \sum_{i=1}^{4096} \beta_i \, g(w_i \cdot x_j + \theta_i), \quad j = 1, 2, \dots, N$,
where $X = (x_1, x_2, \dots, x_N)$ is the feature vectors output by the RoI-Pooling layer, the desired output for the j-th feature vector is $T_j = (t_{j1}, t_{j2}, \dots, t_{jk})^T$, and the actual output of the ELM is $O_j = [o_{j1}, o_{j2}, \dots, o_{jk}]^T$. For the classification ELM network k = 44; for the regression ELM network k = 4. $w_i = [w_{i1}, w_{i2}, \dots, w_{in}]^T$ is the weight vector between the i-th hidden neuron and the input neurons, $\beta_i = [\beta_{i1}, \beta_{i2}, \dots, \beta_{ik}]^T$ is the weight vector between the i-th hidden neuron and the k output neurons, $\theta_i$ is the threshold of the i-th hidden node, $i = 1, 2, \dots, 4096$, and $g(\cdot)$ is the activation function.
(2) The learning objective of the ELM is to minimize the error function E, where E is the sum of squared errors between the actual and desired outputs:
$E = \sum_{j=1}^{N} \| O_j - T_j \|^2$.
There exist $\beta_i$, $w_i$ and $\theta_i$ such that $\sum_{j=1}^{N} \| O_j - T_j \| = 0$. In addition, the hidden layer output matrix H of the ELM is
$H = \big[ g(w_i \cdot x_j + \theta_i) \big]_{N \times 4096}$.
Therefore the ELM output may be expressed as:
$H\beta = T$
(3) According to the least squares principle, the hidden layer output weights $\beta$ are calculated as
$\beta = H^T \left( I/C + H H^T \right)^{-1} T$,
where H is the output matrix of the ELM hidden layer, $H^T$ is the transpose of H, I is the identity matrix, C is a constant, and T is the desired output matrix corresponding to the actual output matrix O of the ELM.
The calculated output weights are substituted into $O = H\beta$ to compute the value of each output node of the ELM. For the ELM classifying traffic signs, the node number with the maximum output value among the 44 output nodes is the class of the traffic sign. For the ELM used for traffic sign detection, the 4 outputs represent the 4 localization parameters of the sign, namely the center point coordinates and the width and height.
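The closed-form training of step (3) can be sketched as below. The sigmoid activation and the small hidden size are illustrative choices (the invention uses 4096 hidden nodes); the random input weights are fixed and never trained, which is what makes ELM training a single least-squares solve:

```python
import numpy as np

def elm_train(X, T, hidden=64, C=1e3, seed=0):
    # Random input weights W and thresholds theta (never trained),
    # sigmoid hidden activation, then the regularized least-squares
    # output weights: beta = H^T (I/C + H H^T)^(-1) T.
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], hidden))
    theta = rng.standard_normal(hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + theta)))
    beta = H.T @ np.linalg.solve(np.eye(len(H)) / C + H @ H.T, T)
    return W, theta, beta

def elm_predict(X, W, theta, beta):
    # Forward pass O = H beta with the same random hidden mapping.
    H = 1.0 / (1.0 + np.exp(-(X @ W + theta)))
    return H @ beta
```

For the classification ELM, `np.argmax` over the 44 output columns of `elm_predict` gives the sign class, as described above.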
Step 8. The F-RCNN model is trained using the contribution adaptive loss function (Contribution Adaptive Loss Function, CA);
(1) The training objective of the region proposal network RPN in the F-RCNN model is to minimize the classification and localization loss. Its loss function can be formalized as
$L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{RPN\text{-}CA}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)$,
where $p_i$ denotes the predicted probability that the i-th anchor box is a target object, $p_i^*$ denotes the true label of the target object, $t_i$ is the coordinate information of the predicted box, including the center coordinates $(x_i, y_i)$, width $w_i$ and height $h_i$, and $t_i^*$ is the coordinate information of the true box, likewise including the center coordinates $(x_i^*, y_i^*)$, width $w_i^*$ and height $h_i^*$. $L_{RPN\text{-}CA}$ denotes the contribution adaptive classification loss of the RPN network, $N_{cls}$ denotes the total number of anchor boxes, $N_{reg}$ denotes the size of the feature map, and $\lambda$ is a balancing coefficient. $L_{reg}$ denotes the regression loss over all bounding boxes; it uses the smooth L1 loss, specifically defined as
$L_{reg}(t_i, t_i^*) = \mathrm{smooth}_{L1}(t_i - t_i^*)$, where $\mathrm{smooth}_{L1}(x) = 0.5x^2$ if $|x| < 1$, and $|x| - 0.5$ otherwise.
Therefore, the contribution adaptive loss function of the RPN may be defined as
$L_{RPN\text{-}CA} = -(1 - p_t)^3 \log(p_t)$,
where $(1 - p_t)^3$ is the contribution adaptive loss adjustment coefficient and $p_t$ is the probability the network assigns to the true class of the anchor. For hard, easily misclassified negative samples, the class probability $p_t \to 0$, so the contribution adaptive loss adjustment coefficient tends to 1 and the contribution of such samples to the total loss is essentially unaffected. For easy positive samples, $p_t \to 1$, so the coefficient tends to 0 and their contribution to the total loss drops to 0. This coefficient adaptively and dynamically regulates the contribution of easy and hard samples to the total loss, letting F-RCNN training focus on hard negative samples and effectively improving training efficiency.
(2) The contribution adaptive loss function of the fully connected layer network is defined as
$L_{FC\text{-}CA} = -(1 - q_k)^3 \log(q_k)$,
where $L_{FC\text{-}CA}$ denotes the multi-class focal classification loss, k denotes the kth target class, and $q_k$ denotes the predicted probability that the sample belongs to the kth class of target.
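The contribution adaptive classification term with its cubic adjustment coefficient can be sketched as follows; the $p_t$ convention for negative anchors follows the focal-loss style formulation implied by the discussion above:

```python
import math

def ca_loss(p, is_positive, gamma=3):
    # Contribution adaptive classification loss: -(1 - p_t)^gamma * log(p_t),
    # where p_t is the probability assigned to the true class of the sample.
    p_t = p if is_positive else 1.0 - p
    p_t = min(max(p_t, 1e-12), 1.0 - 1e-12)  # numerical safety
    return -((1.0 - p_t) ** gamma) * math.log(p_t)
```

A confidently misclassified negative (`p` near 1, `is_positive=False`) keeps a near-full loss, while a confidently correct positive contributes almost nothing, which is exactly the hard-negative emphasis described above. The same expression with `q_k` for the true class gives the fully connected layer's multi-class form.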
Step 9. The color camera is started and the actual traffic scene is photographed. The scene must be pre-processed before being input to the model: the resolution is set to 2048 × 2048, the image is then input to the model, and steps 3 to 7 are repeated to complete the traffic sign detection and recognition of the actual scene.
Embodiment:
The present invention uses Tsinghua-Tencent 100K as the model training data set and performs detection and recognition on 44 kinds of common traffic signs (see Fig. 3). In this data set, the resolution of each traffic scene is 2048 × 2048; signs of 0-32 pixels and 32-96 pixels account for 41.6% and 49.1% of the data set respectively, i.e., 90.7% of the traffic signs occupy less than 1% of the traffic scene, which belongs to the remote traffic sign detection and recognition situation.
Explanation of F-RCNN model training and field test data (the test is intended to verify the feasibility of steps 1 to 8 of the present invention)
During training of the F-RCNN model, to eliminate the imbalance of the sample set, classes with fewer than 100 pictures are resampled at each training iteration so that the number of pictures exceeds 1000. The ratio of the training set to the test set is 2:1. In the specific implementation, the present invention uses F1-measure, a common metric for measuring accuracy, as the detection and recognition index; the larger this index, the higher the detection and recognition precision.
Table 1 gives the comparison between the F-RCNN of the present invention and the common target detection model Faster R-CNN in recognizing remote traffic signs. In addition, this embodiment successively compares the recognition precision after removing the relevant techniques of the present invention.
For convenience of description, the following notation is used:
(1) the F-RCNN detection model is denoted F0;
(2) if the fusion feature map technique is not used in the F-RCNN detection model, the framework is denoted F1; on the basis of F1, if the contribution adaptive loss function is additionally not used to train the network, the framework is denoted F2.
From the result data given in Table 1, the detection and recognition method of the invented F-RCNN achieves remote detection and recognition precision on the 44 kinds of common traffic signs clearly higher than that of the common Faster R-CNN, by 30-40%, which effectively demonstrates the validity and practicability of the inventive method. Fig. 4, Fig. 5 and Fig. 6 give the precision-recall curve comparisons of the detection and recognition results for traffic signs of different sizes; it is evident that the Accuracy-Recall curves of the inventive method at 0-32 pixels, 32-96 pixels and 96-200 pixels are substantially better than those of the Faster R-CNN model.
In addition, from the comparison of F0, F1 and F2, if the fusion feature map is not used (F1), the detection and recognition precision drops by about 10 percentage points on average; if neither the fusion feature map nor the contribution adaptive loss function is used (F2), the detection and recognition precision drops by about a further 16 percentage points. Therefore, the actual detection comparison shows that the proposed fusion feature map and contribution adaptive loss function have an obvious effect on improving the recognition precision of remote traffic signs.
Table 1. Comparison of recognition accuracy for the 44 kinds of common traffic signs (%)
In addition, this embodiment compares training time and detection time. The training time of Faster R-CNN is 107 hours, that of F0 (the F-RCNN of the present invention) is 68 hours, that of F1 is 66 hours, and that of F2 is 63 hours. This shows: (1) the F-RCNN model, by using the extreme learning machine ELM as the fully connected network, shortens the model training time by about 30%; (2) the training-time comparison of F0, F1 and F2 effectively shows that the fusion feature map and contribution adaptive loss function proposed by the present invention add very little computation to the F-RCNN model, effectively saving computing resources while improving remote recognition precision.
The invention discloses a remote traffic sign detection and recognition method based on Fusion Region Convolutional Neural Networks (F-RCNN), which mainly addresses the deficiencies of existing traffic sign detection and recognition methods: short detection distance, few detected types and low recognition precision. First, using deep learning and computer vision methods, the remote traffic sign detection and recognition tasks are uniformly integrated into the F-RCNN target detection model; and, addressing the shortage of detection types in existing methods (based on the GTSDB and GTSRB data sets or on hand-collected data sets), the present invention uses the Tsinghua-Tencent 100K data set jointly released by Tsinghua University and Tencent as the model training data set. Secondly, to improve the feature representation ability of F-RCNN for small-size traffic signs in remote detection, a fusion feature map technique is invented, which fuses the features of different convolutional layers into a new feature map via max pooling, dilated convolution, regularization and aggregation; this feature map possesses both high resolution and rich high-level semantic information. Meanwhile, to improve the learning ability of the F-RCNN model and further increase detection and recognition precision, a contribution adaptive loss function is invented, which distinguishes easy and hard samples by adjusting sample losses, letting the model focus on hard negative samples during learning and training and effectively improving training efficiency. In addition, an extreme learning machine is adopted as the fully connected layer network of the F-RCNN model, effectively shortening the training time of the model and saving computing resources. Practice proves that the present invention has high detection and recognition precision and can perform remote detection and recognition of 44 kinds of common traffic signs in actual life.

Claims (2)

1. A remote traffic sign detection and recognition method based on F-RCNN, characterized in that:
Step 1. The traffic sign image sample set is pre-processed;
Step 2. The feature extraction network VGG-16 in F-RCNN is pre-trained using an image classification benchmark data set;
Pre-training is performed on the feature extraction network VGG-16 using the public ImageNet image data set, and the parameters after training serve as the initial state of the network; the VGG-16 network comprises 5 convolutional layers and 5 pooling layers, and uses ReLU as the activation function;
Step 3. The traffic sign training data set is input to the pre-trained feature extraction network VGG-16; convolution and pooling operations are performed on the images to complete feature extraction;
Using the mini-batch strategy, the training traffic sign image data set is split into several batches;
Step 4. A fusion feature map is constructed using max pooling, dilated convolution, regularization and aggregation operations;
(1) the outputs of the convolutional stages of the feature extraction network VGG-16 are denoted, from front to back, conv1, conv2, conv3, conv4 and conv5; each traffic scene with 2048 × 2048 resolution is first converted into a 2048 × 2048 × 3 numerical matrix and input to VGG-16, in which the image size constantly shrinks while the channel number constantly grows; the output matrix dimensions of the stages are, in order, conv1: 2048 × 2048 × 64; conv2: 1024 × 1024 × 128; conv3: 512 × 512 × 256; conv4: 256 × 256 × 512; conv5: 128 × 128 × 512;
(2) conv5 is expanded using dilated convolution with dilation rate 3; the size after expansion is 512 × 512 × 512; convolution is then carried out with kernel parameters: size 3 × 3, stride 1, padding 1, 256 kernels; the output after calculation is denoted dilated-conv5, with matrix dimension 512 × 512 × 256;
(3) conv1 is down-sampled using max pooling and its channels are replicated to expand 4-fold; the output after down-sampling is denoted pooling-conv1, with matrix dimension 512 × 512 × 256;
(4) L2 regularization is computed separately for pooling-conv1, conv3 and dilated-conv5 to eliminate scale effects, where the L2 regularization is computed as
$\hat{x} = x / \|x\|_2$, with $\|x\|_2 = \left(\sum_{i=1}^{d} x_i^2\right)^{1/2}$,
where $\hat{x}$ denotes the pixel after regularization and d denotes the number of channels of the pixel;
(5) the three feature matrices pooling-conv1, conv3 and dilated-conv5 are directly aggregated in space, generating a fusion feature map that simultaneously contains resolution and abstract semantic information, with matrix dimension 512 × 512 × 256;
Step 5. The region proposal network RPN in F-RCNN performs region generation according to the fusion feature map to obtain the candidate regions of traffic signs;
(1) a 3 × 3 convolution kernel slides over the fusion feature map; for each pixel on the feature map, taking that point as the center, 12 anchor boxes are generated on the original input image using the 3 aspect ratios 1:1, 1:2, 2:1 and the 4 areas 16, 32, 64, 128;
(2) after sliding, the number of generated anchor boxes is 512 × 512 × 12;
(3) anchor boxes exceeding the boundary of the original input image are removed;
(4) anchor boxes with excessive overlap are removed using non-maximum suppression with threshold 0.7;
(5) positive and negative samples are determined according to the intersection over union (IoU) of each anchor box with the real targets in the sample: anchor boxes with IoU > 0.7 are positive samples, anchor boxes with IoU < 0.3 are negative samples, and anchor boxes with IoU between 0.3 and 0.7 are removed, where IoU is calculated as
$\mathrm{IoU} = \mathrm{area}(A \cap B) \,/\, \mathrm{area}(A \cup B)$,
with A the anchor box and B the ground-truth box;
(6) by translation invariance, each anchor box corresponds to one region proposal box on the fusion feature map;
(7) all region proposal boxes pass through the fully connected layer of the region proposal network RPN to obtain the target candidate regions;
Step 6. All candidate regions are input to the RoI-Pooling layer in F-RCNN to generate fixed-size feature vectors;
(1) for each target candidate region, the RoI-Pooling layer divides it into 8 parts along the horizontal and vertical directions respectively, and performs max-pooling down-sampling on each part;
(2) in this way, even though the candidate regions differ in size, the sampling results are consistent, generating feature vectors of fixed dimension 8 × 8 × 256;
Step 7. The feature vectors are sent into the extreme learning machine networks used for classification and regression, which output the category and position of the traffic sign;
The extreme learning machine structure used: (1) the ELM for traffic sign classification: 4096 hidden nodes and 44 output nodes, each output node representing one class of traffic sign, with values in (0, 1); during classification the node with the maximum output is taken as the traffic sign class; (2) the ELM for traffic sign position regression: 4096 hidden nodes and 4 output nodes, representing respectively the center point coordinates and the width and height of the traffic sign;
The learning algorithm of the extreme learning machine used is specifically as follows:
(1) the input/output relation of the extreme learning machine ELM is expressed as
$O_j = \sum_{i=1}^{4096} \beta_i \, g(w_i \cdot x_j + \theta_i), \quad j = 1, 2, \dots, N$,
where $X = (x_1, x_2, \dots, x_N)$ is the feature vectors output by the RoI-Pooling layer, the desired output for the j-th feature vector is $T_j = (t_{j1}, t_{j2}, \dots, t_{jk})^T$, and the actual output of the ELM is $O_j = [o_{j1}, o_{j2}, \dots, o_{jk}]^T$; for the classification ELM network k = 44, and for the regression ELM network k = 4; $w_i = [w_{i1}, w_{i2}, \dots, w_{in}]^T$ is the weight vector between the i-th hidden neuron and the input neurons, $\beta_i = [\beta_{i1}, \beta_{i2}, \dots, \beta_{ik}]^T$ is the weight vector between the i-th hidden neuron and the k output neurons, $\theta_i$ is the threshold of the i-th hidden node, $i = 1, 2, \dots, 4096$, and $g(\cdot)$ is the activation function;
(2) the learning objective of the ELM is to minimize the error function E, where E is the sum of squared errors between the actual and desired outputs, expressed as
$E = \sum_{j=1}^{N} \| O_j - T_j \|^2$;
there exist $\beta_i$, $w_i$ and $\theta_i$ such that $\sum_{j=1}^{N} \| O_j - T_j \| = 0$;
the hidden layer output matrix H of the ELM is
$H = \big[ g(w_i \cdot x_j + \theta_i) \big]_{N \times 4096}$;
therefore the ELM output is expressed as:
$H\beta = T$
(3) according to the least squares principle, the hidden layer output weights $\beta$ are calculated as
$\beta = H^T \left( I/C + H H^T \right)^{-1} T$,
where H is the output matrix of the ELM hidden layer, $H^T$ is the transpose of H, I is the identity matrix, C is a constant, and T is the desired output matrix corresponding to the actual output matrix O of the ELM.
The calculated output weights are substituted into $O = H\beta$ to compute the value of each output node of the ELM; for the ELM classifying traffic signs, the node number with the maximum output value among the 44 output nodes is the class of the traffic sign; for the ELM used for traffic sign detection, the 4 outputs represent the 4 localization parameters of the sign, namely the center point coordinates and the width and height;
Step 8. The F-RCNN model is trained using the contribution adaptive loss function;
(1) the training objective of the region proposal network RPN in the F-RCNN model is to minimize the classification and localization loss; its loss function is formalized as
$L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{RPN\text{-}CA}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)$,
where $p_i$ denotes the predicted probability that the i-th anchor box is a target object, $p_i^*$ denotes the true label of the target object, $t_i$ is the coordinate information of the predicted box, including the center coordinates $(x_i, y_i)$, width $w_i$ and height $h_i$, and $t_i^*$ is the coordinate information of the true box, including the center coordinates $(x_i^*, y_i^*)$, width $w_i^*$ and height $h_i^*$; $L_{RPN\text{-}CA}$ denotes the contribution adaptive classification loss of the RPN network, $N_{cls}$ denotes the total number of anchor boxes, $N_{reg}$ denotes the size of the feature map, and $\lambda$ is a balancing coefficient; $L_{reg}$ denotes the regression loss over all bounding boxes and uses the smooth L1 loss, specifically defined as
$L_{reg}(t_i, t_i^*) = \mathrm{smooth}_{L1}(t_i - t_i^*)$, where $\mathrm{smooth}_{L1}(x) = 0.5x^2$ if $|x| < 1$, and $|x| - 0.5$ otherwise;
therefore, the contribution adaptive loss function of the RPN is defined as
$L_{RPN\text{-}CA} = -(1 - p_t)^3 \log(p_t)$,
where $(1 - p_t)^3$ is the contribution adaptive loss adjustment coefficient; for hard, easily misclassified negative samples the class probability $p_t \to 0$, so the contribution adaptive loss adjustment coefficient tends to 1 and the contribution of such samples to the total loss is unaffected; for easy positive samples $p_t \to 1$, so the coefficient tends to 0 and their contribution to the total loss drops to 0; using the contribution adaptive loss adjustment coefficient $(1 - p_t)^3$, the contribution of easy and hard samples to the total loss is adaptively and dynamically regulated, letting F-RCNN training focus on hard negative samples and effectively improving training efficiency;
(2) the contribution adaptive loss function of the fully connected layer network is defined as
$L_{FC\text{-}CA} = -(1 - q_k)^3 \log(q_k)$,
where $L_{FC\text{-}CA}$ denotes the multi-class focal classification loss, k denotes the kth target class, and $q_k$ denotes the predicted probability that the sample belongs to the kth class of target;
Step 9. The color camera is started and the actual traffic scene is photographed; the scene must be pre-processed before being input to the model: the resolution is set to 2048 × 2048, the image is then input to F-RCNN, and steps 3 to 7 are repeated to complete the traffic sign detection and recognition of the actual scene.
2. The remote traffic sign detection and recognition method based on F-RCNN according to claim 1, characterized in that said step 1 specifically comprises:
(1) using the Tsinghua-Tencent 100K data set jointly released by Tsinghua University and Tencent, 44 classes of common traffic signs are selected as remote detection and recognition objects;
(2) the Tsinghua-Tencent 100K data set is divided into a training set and a test set at a ratio of 1:2;
(3) to guarantee sample balance during model training, each class of traffic sign has at least 100 scene instances in the training set; if the scene instances of a class of sign are fewer than 100, they are padded by the method of repeated sampling.
CN201910474058.2A 2019-06-02 2019-06-02 F-RCNN-based remote traffic sign detection and identification method Active CN110163187B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910474058.2A CN110163187B (en) 2019-06-02 2019-06-02 F-RCNN-based remote traffic sign detection and identification method

Publications (2)

Publication Number Publication Date
CN110163187A true CN110163187A (en) 2019-08-23
CN110163187B CN110163187B (en) 2022-09-02


Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110619889A (en) * 2019-09-19 2019-12-27 Oppo广东移动通信有限公司 Sign data identification method and device, electronic equipment and storage medium
CN110781744A (en) * 2019-09-23 2020-02-11 杭州电子科技大学 Small-scale pedestrian detection method based on multi-level feature fusion
CN110826544A (en) * 2019-12-23 2020-02-21 深圳市豪恩汽车电子装备股份有限公司 Traffic sign detection and identification system and method
CN110956115A (en) * 2019-11-26 2020-04-03 证通股份有限公司 Scene recognition method and device
CN111062885A (en) * 2019-12-09 2020-04-24 中国科学院自动化研究所 Mark detection model training and mark detection method based on multi-stage transfer learning
CN111209975A (en) * 2020-01-13 2020-05-29 北京工业大学 Ship target identification method based on multitask learning
CN111310615A (en) * 2020-01-23 2020-06-19 天津大学 Small target traffic sign detection method based on multi-scale information and residual error network
CN111461060A (en) * 2020-04-22 2020-07-28 上海应用技术大学 Traffic sign identification method based on deep learning and extreme learning machine
CN111580151A (en) * 2020-05-13 2020-08-25 浙江大学 SSNet model-based earthquake event time-of-arrival identification method
CN111597899A (en) * 2020-04-16 2020-08-28 浙江工业大学 Scenic spot ground plastic bottle detection method
CN111611998A (en) * 2020-05-21 2020-09-01 中山大学 Adaptive feature block extraction method based on candidate region area and width and height
CN111723854A (en) * 2020-06-08 2020-09-29 杭州像素元科技有限公司 Method and device for detecting traffic jam of highway and readable storage medium
CN111738300A (en) * 2020-05-27 2020-10-02 复旦大学 Optimization algorithm for detecting and identifying traffic signs and signal lamps
CN111986125A (en) * 2020-07-16 2020-11-24 浙江工业大学 Method for multi-target task instance segmentation
CN112052778A (en) * 2020-09-01 2020-12-08 腾讯科技(深圳)有限公司 Traffic sign identification method and related device
CN112528977A (en) * 2021-02-10 2021-03-19 北京优幕科技有限责任公司 Target detection method, target detection device, electronic equipment and storage medium
CN113642430A (en) * 2021-07-29 2021-11-12 江苏大学 High-precision visual positioning method and system for underground parking lot based on VGG + NetVLAD
US20210357683A1 (en) * 2020-10-22 2021-11-18 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for determining target anchor, device and storage medium
WO2021244079A1 (en) * 2020-06-02 2021-12-09 苏州科技大学 Method for detecting image target in smart home environment
CN110941970B (en) * 2019-12-05 2023-05-30 深圳牛图科技有限公司 High-speed dimension code positioning and identifying system based on full convolution neural network
CN117274957A (en) * 2023-11-23 2023-12-22 西南交通大学 Road traffic sign detection method and system based on deep learning

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106372577A (en) * 2016-08-23 2017-02-01 北京航空航天大学 Deep learning-based traffic sign automatic identifying and marking method
CN107122776A (en) * 2017-04-14 2017-09-01 重庆邮电大学 A kind of road traffic sign detection and recognition methods based on convolutional neural networks
CN107301383A (en) * 2017-06-07 2017-10-27 华南理工大学 A kind of pavement marking recognition methods based on Fast R CNN
US20180144202A1 (en) * 2016-11-22 2018-05-24 Ford Global Technologies, Llc Brake Light Detection
CN109492526A (en) * 2018-09-27 2019-03-19 桂林电子科技大学 Traffic sign recognition method based on LDCNN model and NHE algorithm
CN109815906A (en) * 2019-01-25 2019-05-28 华中科技大学 Method for traffic sign detection and system based on substep deep learning

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106372577A (en) * 2016-08-23 2017-02-01 北京航空航天大学 Deep learning-based traffic sign automatic identifying and marking method
US20180144202A1 (en) * 2016-11-22 2018-05-24 Ford Global Technologies, Llc Brake Light Detection
CN107122776A (en) * 2017-04-14 2017-09-01 重庆邮电大学 A kind of road traffic sign detection and recognition methods based on convolutional neural networks
CN107301383A (en) * 2017-06-07 2017-10-27 华南理工大学 A kind of pavement marking recognition methods based on Fast R CNN
CN109492526A (en) * 2018-09-27 2019-03-19 桂林电子科技大学 Traffic sign recognition method based on LDCNN model and NHE algorithm
CN109815906A (en) * 2019-01-25 2019-05-28 华中科技大学 Method for traffic sign detection and system based on substep deep learning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
QINGPENG LI等: "HSF-Net: Multiscale Deep Feature Embedding for", 《IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING》 *
ZHIGANG LIU等: "MR-CNN: A Multi-Scale Region-Based Convolutional Neural Network for Small Traffic Sign Recognition", 《IEEE ACCESS》 *
孙伟等: "基于CNN多层特征和ELM的交通标志识别", 《 电子科技大学学报》 *
黄知超等: "基于轻量WACNN的交通标志识别", 《应用激光》 *

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110619889B (en) * 2019-09-19 2022-03-15 Oppo广东移动通信有限公司 Sign data identification method and device, electronic equipment and storage medium
CN110619889A (en) * 2019-09-19 2019-12-27 Oppo广东移动通信有限公司 Sign data identification method and device, electronic equipment and storage medium
CN110781744A (en) * 2019-09-23 2020-02-11 杭州电子科技大学 Small-scale pedestrian detection method based on multi-level feature fusion
CN110956115B (en) * 2019-11-26 2023-09-29 证通股份有限公司 Scene recognition method and device
CN110956115A (en) * 2019-11-26 2020-04-03 证通股份有限公司 Scene recognition method and device
CN110941970B (en) * 2019-12-05 2023-05-30 深圳牛图科技有限公司 High-speed 2D-code positioning and recognition system based on a fully convolutional neural network
CN111062885A (en) * 2019-12-09 2020-04-24 中国科学院自动化研究所 Mark detection model training and mark detection method based on multi-stage transfer learning
CN111062885B (en) * 2019-12-09 2023-09-12 中国科学院自动化研究所 Mark detection model training and mark detection method based on multi-stage transfer learning
CN110826544A (en) * 2019-12-23 2020-02-21 深圳市豪恩汽车电子装备股份有限公司 Traffic sign detection and identification system and method
CN111209975A (en) * 2020-01-13 2020-05-29 北京工业大学 Ship target identification method based on multitask learning
CN111310615A (en) * 2020-01-23 2020-06-19 天津大学 Small target traffic sign detection method based on multi-scale information and residual error network
CN111597899A (en) * 2020-04-16 2020-08-28 浙江工业大学 Scenic spot ground plastic bottle detection method
CN111597899B (en) * 2020-04-16 2023-08-11 浙江工业大学 Scenic spot ground plastic bottle detection method
CN111461060A (en) * 2020-04-22 2020-07-28 上海应用技术大学 Traffic sign identification method based on deep learning and extreme learning machine
CN111580151A (en) * 2020-05-13 2020-08-25 浙江大学 SSNet model-based earthquake event time-of-arrival identification method
CN111580151B (en) * 2020-05-13 2021-04-20 浙江大学 SSNet model-based earthquake event time-of-arrival identification method
CN111611998A (en) * 2020-05-21 2020-09-01 中山大学 Adaptive feature block extraction method based on candidate region area and width and height
CN111738300A (en) * 2020-05-27 2020-10-02 复旦大学 Optimization algorithm for detecting and identifying traffic signs and signal lamps
WO2021244079A1 (en) * 2020-06-02 2021-12-09 苏州科技大学 Method for detecting image target in smart home environment
CN111723854A (en) * 2020-06-08 2020-09-29 杭州像素元科技有限公司 Method and device for detecting traffic jam of highway and readable storage medium
CN111723854B (en) * 2020-06-08 2023-08-29 杭州像素元科技有限公司 Expressway traffic jam detection method, equipment and readable storage medium
CN111986125A (en) * 2020-07-16 2020-11-24 浙江工业大学 Method for multi-target task instance segmentation
CN112052778A (en) * 2020-09-01 2020-12-08 腾讯科技(深圳)有限公司 Traffic sign identification method and related device
US20210357683A1 (en) * 2020-10-22 2021-11-18 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for determining target anchor, device and storage medium
US11915466B2 (en) * 2020-10-22 2024-02-27 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for determining target anchor, device and storage medium
WO2022170742A1 (en) * 2021-02-10 2022-08-18 北京优幕科技有限责任公司 Target detection method and apparatus, electronic device and storage medium
CN112528977A (en) * 2021-02-10 2021-03-19 北京优幕科技有限责任公司 Target detection method, target detection device, electronic equipment and storage medium
CN112528977B (en) * 2021-02-10 2021-07-02 北京优幕科技有限责任公司 Target detection method, target detection device, electronic equipment and storage medium
CN113642430A (en) * 2021-07-29 2021-11-12 江苏大学 High-precision visual positioning method and system for underground parking lot based on VGG + NetVLAD
CN113642430B (en) * 2021-07-29 2024-05-14 江苏大学 VGG+NetVLAD-based high-precision visual positioning method and system for underground parking garage
CN117274957A (en) * 2023-11-23 2023-12-22 西南交通大学 Road traffic sign detection method and system based on deep learning
CN117274957B (en) * 2023-11-23 2024-03-01 西南交通大学 Road traffic sign detection method and system based on deep learning

Also Published As

Publication number Publication date
CN110163187B (en) 2022-09-02

Similar Documents

Publication Publication Date Title
CN110163187A (en) Remote road traffic sign detection and recognition method based on F-RCNN
CN110188705B (en) Remote traffic sign detection and identification method suitable for vehicle-mounted system
Gao et al. Object classification using CNN-based fusion of vision and LIDAR in autonomous vehicle environment
CN109344736B (en) Static image crowd counting method based on joint learning
CN104978580B (en) Insulator recognition method for UAV inspection of power transmission lines
CN112488210A (en) Three-dimensional point cloud automatic classification method based on graph convolution neural network
CN112434672B (en) Marine human body target detection method based on improved YOLOv3
CN108805070A (en) Deep learning pedestrian detection method based on embedded terminals
CN108830188A (en) Vehicle detection method based on deep learning
CN106841216A (en) Automatic tunnel defect identification device based on panoramic image CNN
CN108647741A (en) Image classification method and system based on transfer learning
CN107945153A (en) Road surface crack detection method based on deep learning
CN112949647B (en) Three-dimensional scene description method and device, electronic equipment and storage medium
CN110163836A (en) Deep learning-based excavator detection method for high-altitude inspection
CN114842208B (en) Deep learning-based power grid harmful bird species target detection method
Ye et al. Real-time object detection network in UAV-vision based on CNN and transformer
CN110197152A (en) Road target recognition method for automated driving systems
CN106991666A (en) Disease image recognition method suitable for multi-size image information
CN112434723B (en) Day/night image classification and object detection method based on attention network
CN113807464A (en) Unmanned aerial vehicle aerial image target detection method based on improved YOLO V5
CN109376676A (en) Safety early-warning method for highway engineering site personnel based on a UAV platform
CN113870160B (en) Point cloud data processing method based on transformer neural network
CN113032613B (en) Three-dimensional model retrieval method based on interactive attention convolution neural network
CN106780546A (en) Identification method for motion-blurred coded points based on convolutional neural networks
CN113095251B (en) Human body posture estimation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240205

Address after: 230000 Room 203, building 2, phase I, e-commerce Park, Jinggang Road, Shushan Economic Development Zone, Hefei City, Anhui Province

Patentee after: Hefei Jiuzhou Longteng Scientific and Technological Achievement Transformation Co., Ltd.

Country or region after: China

Address before: 163319 No. 99 Xuefu Street, Daqing Hi-tech Development Zone, Heilongjiang Province

Patentee before: Northeast Petroleum University

Country or region before: China
