CN108710826A

CN108710826A - A deep learning pattern recognition method for traffic signs

Info

Publication number: CN108710826A
Application number: CN201810329234.9A
Authority: CN
Inventors: 张秀玲 (Zhang Xiuling); 张逞逞 (Zhang Chengcheng); 周凯旋 (Zhou Kaixuan)
Original assignee: Yanshan University
Current assignee: Yanshan University
Priority date: 2018-04-13
Filing date: 2018-04-13
Publication date: 2018-10-26

Abstract

The invention discloses a deep learning pattern recognition method for traffic signs, comprising the following steps: preprocessing the test samples and training samples of traffic sign images; designing a residual deep learning network, based on a convolutional neural network, that fuses multi-scale features through weighting operations, and training it so that the network extracts features automatically instead of relying on hand-crafted features; and training a classifier on deep color features to recognize the traffic sign test samples. By combining image feature weighting with a multi-scale convolutional fusion network for traffic sign pattern recognition, the invention markedly improves the training efficiency of the network and effectively addresses problems of existing traffic sign recognition methods, such as insufficient accuracy and real-time performance, complex network structures, long training times, and poor stability and robustness. The trained network reaches a recognition accuracy of 97% on traffic sign images from 43 classes.

Description

A deep learning pattern recognition method for traffic signs

Technical field

The present invention relates to the field of traffic sign recognition, and in particular to an image pattern recognition method based on a deep network model.

Background

While a vehicle is driving, pattern recognition of traffic sign images is an important component of an intelligent traffic control system, and the accuracy with which different traffic instructions are recognized at this stage is vital for the subsequent control of the vehicle. A camera captures the various kinds of traffic data in the road environment; the system recognizes them, determines the type of the current traffic instruction, and feeds it back to the vehicle's control mechanism, which ensures safe driving by controlling the actuators, so that intelligent vehicles can ultimately meet production standards for driverless operation. High-precision traffic sign pattern recognition methods are therefore worth pursuing and studying.

Traffic sign recognition (TSR) is an important branch of in-vehicle driver assistance systems and remains an open problem. Traffic signs carry a great deal of important traffic information, such as speed limits for the current road, changes in the road ahead, and restrictions on driving behavior. Recognizing the signs on the road quickly, accurately and efficiently in the assistance system, and feeding them back to the driver or the control system, is therefore highly significant for ensuring driving safety and avoiding traffic accidents. Common approaches to road sign recognition include shape-based methods, methods that combine feature extraction with a classifier, and deep learning methods. Shape-based methods have poor robustness and perform badly in complex environments. Methods that combine feature extraction with a classifier recognize well but are computationally expensive and adapt poorly to changing environments. Deep learning can recognize raw images directly and extract latent features that reflect the essence of the data, with sufficient learning depth. Convolutional neural networks share local weights and offer a degree of real-time performance and robustness in complex environments and under changing viewing angles. A recognition method that can accurately identify road signs in road scenes therefore needs to be designed.

Traditional traffic sign pattern recognition methods use template matching, neural networks, or combinations of image features with classifiers, and achieve good results by exploiting image features such as SIFT, HOG, Haar and ORB. These hand-crafted features resist interference poorly, have theoretical shortcomings, and struggle to meet the demands of high-precision image recognition. Traffic sign recognition methods based on deep networks are simple and effective, but their training time is unsatisfactory and the trained models are too large, so they are rarely used in practice. In recent years, researchers in the field have studied lightweight deep networks for image recognition and its applications with very good results. However, because of the complexity of neural networks and the shortcomings of existing optimization techniques, identification models based on neural network filter systems have complex structures and suffer from long training times and poor stability.

Traffic sign images fall into several classes, such as mandatory, prohibitory and warning signs, and images within the same class also vary considerably, which complicates recognition. In urban road environments, sign images are affected by time of day, viewing angle and weather, which interfere with color and shape cues, so accurately detecting the sign and completely cropping the sign region is difficult.

On the other hand, intelligent optimization algorithms have developed rapidly in recent decades and can find optimal solutions to many nonlinear optimization problems. Applying intelligent optimization techniques to the training of quantum neural networks improves modeling accuracy and efficiency, and is a promising research direction of practical value in nonlinear modeling. It also provides technical support and a theoretical basis for high-precision, efficient online traffic sign pattern recognition.

In summary, no effective solution currently exists for the low accuracy, parameter redundancy and over-fitting of traffic sign recognition in the prior art.

Summary of the invention

The purpose of the present invention is to provide a deep learning pattern recognition method for traffic signs that improves recognition accuracy, simplifies the structural parameters and eliminates over-fitting, thereby providing a reliable basis for the control system and a strong guarantee for improving driverless technology in intelligent transportation.

To achieve the above object, the following technical solution is adopted. The method of the invention comprises the following steps:

Step 1: input traffic sign images as test samples and training samples, and preprocess them;

Step 2: design a convolutional neural network model and train it on the samples, so that through training the network automatically extracts deep color features without hand-crafted design;

Step 3: train a classifier on the deep color features to identify the traffic signs in the test samples, thereby performing traffic sign recognition.

Further, in step 1, the preprocessing of the samples comprises:

Mean subtraction, applied to both training and test samples: the training-sample mean is subtracted from every training image, and the same training-sample mean is subtracted from every test image;

Rotation of randomly selected training samples, to increase the angular diversity of the images; the rotation angle is limited to a set range;

Horizontal stretching or compression of randomly selected training samples, to increase the visual diversity of the images; the amount of stretching or compression is limited to a set range;

Resizing all training and test samples to a common size.
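A minimal sketch of these preprocessing steps in plain Python, on grayscale images stored as nested lists. The function names, the ±15° rotation limit and the 0.9 to 1.1 horizontal stretch range are illustrative assumptions, since the text only says the ranges are bounded; the rotation warp itself would normally be done with an image library and is not implemented here.

```python
import random

def mean_image(images):
    """Per-pixel mean of equally sized 2-D images (lists of rows)."""
    n = len(images)
    h, w = len(images[0]), len(images[0][0])
    return [[sum(img[i][j] for img in images) / n for j in range(w)]
            for i in range(h)]

def subtract_mean(image, mean):
    """Subtract the *training* mean; used for both training and test images."""
    return [[p - m for p, m in zip(row, mrow)] for row, mrow in zip(image, mean)]

def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize; changing only out_w gives a horizontal stretch."""
    in_h, in_w = len(image), len(image[0])
    return [[image[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]

def sample_augmentation(rng, max_angle=15.0, stretch=(0.9, 1.1)):
    """Draw a bounded rotation angle (degrees) and horizontal scale factor.
    The limits are hypothetical placeholders for the bounded ranges in the text."""
    return rng.uniform(-max_angle, max_angle), rng.uniform(*stretch)

train = [[[1, 2], [3, 4]], [[3, 4], [5, 6]]]
mean = mean_image(train)
test_img = subtract_mean([[2, 3], [4, 5]], mean)   # test image, training mean
angle, sx = sample_augmentation(random.Random(0))
stretched = resize_nearest(train[0], 2, 4)          # width doubled
```

Subtracting the training mean from the test images (rather than a test-set mean) keeps both sets in the same input coordinates the network was trained on.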

Further, in step 2, an n-layer convolutional neural network residual model with weighted feature fusion is designed;

The traffic sign database used contains 43 classes of signs, so the network output is $a^i$, $i = 1, 2, 3, \dots, m$, where $m = 43$ is the number of outputs; the number of weighted network channels is $k = 6$. The formula for each layer of the network is then as follows:

where $l$ denotes the $l$-th network layer, $w_{lj}$ the channels of the $l$-th layer, and $x_l$ the input of the $l$-th layer.

Further, in step 2, training the convolutional neural network on the samples to extract deep color features comprises:

Building each convolutional layer in the feature dimension using random, sparse connection tables, and performing convolution and pooling on the traffic sign images with a convolutional neural network built from multiple convolutional layers;

Weighting the feature maps of the different channels with Squeeze-Excitation network modules, within which fully connected layers reinforce the image features;

Fusing feature channel modules of different sizes into multi-scale color features;

Normalizing features at different depths and fusing them into deep color features;

Learning the residual mapping of the convolutional neural network from the input to the first layer of each scale feature fusion layer and the underlying mapping fitted by the feature fusion layers of each size.

Further, building each convolutional layer in the feature dimension using random, sparse connection tables and performing convolution on the traffic sign images with a convolutional network built from multiple convolutional layers comprises:

The convolutional layers combine random sparse connection tables with dense networks in the feature dimension to form a layer-by-layer structure: the data statistics of the previous layer are analyzed and clustered into groups of highly correlated neurons, which form the neurons of the next layer and connect to the neurons of the previous layer. Correlated neurons concentrate in local regions of the input image, so they can be covered by small convolutional layers at the next layer, while the small number of more spread-out neuron groups are covered by larger convolutions. To fuse multi-scale features, convolutional layers of size 3 × 3 and 5 × 5 and pooling-layer filters of size 3 × 3 are used, and all output filter banks are concatenated as the input to the next layer; a 1 × 1 convolution is added before the computationally expensive 3 × 3 and 5 × 5 convolutions.

Further, weighting the feature maps of the different channels with Squeeze-Excitation network modules and reinforcing the image features with fully connected layers within the module comprises:

In the convolutional channel of each feature scale, the image passes through shallow convolution and pooling operations and then enters the Squeeze-Excitation network module; within the Squeeze-Excitation module, pooling is performed by max pooling, which takes the maximum of the feature points in each local neighborhood;

The resulting feature group, of size 1 × 1 × n (n being the number of channels), is fed into a three-layer fully connected network that outputs a 1 × 1 × n vector; this n-dimensional vector weights the feature maps of the different channels, automatically enhancing the image feature information.

Further, when fusing features of different sizes into multi-scale color features, the multi-scale feature fusion uses 6 network channels, each consisting of two parts: shallow convolution-pooling extraction and an SE network module. The shallow parts of the 6 channels use 3 × 3 convolutional filters (two channels), 5 × 5 convolutional filters (two channels) and 3 × 3 pooling filters (two channels); the two channels of the same scale differ only in the number of shallow layers and are otherwise identical.

Further, when normalizing features at different depths and fusing them into deep color features, normalization is performed within each pixel of the merged feature map vectors, and the channels of each vector are scaled independently according to a scaling factor. The features from the six single-channel shallow networks and the features after the deep Squeeze-Excitation networks are normalized in size and then merged, so that the deep and shallow features of the image are combined.
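One possible reading of this normalize-then-merge step, sketched in plain Python: each pixel's channel vector is L2-normalized, each channel is then multiplied by its own scale factor, and the shallow and deep maps are concatenated. The choice of the L2 norm and the function names are assumptions; the text does not say which norm is used, and in a trained network the scale factors would be learned rather than fixed.

```python
import math

def normalize_and_scale(fmap, scales, eps=1e-12):
    """fmap[c][i][j]: C channels of H x W. L2-normalise the C-vector at each
    pixel, then scale channel c by scales[c] (assumed to be learnable)."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for i in range(H):
        for j in range(W):
            norm = math.sqrt(sum(fmap[c][i][j] ** 2 for c in range(C))) + eps
            for c in range(C):
                out[c][i][j] = scales[c] * fmap[c][i][j] / norm
    return out

def concat_channels(*fmaps):
    """Merge feature maps of equal H x W along the channel axis."""
    return [channel for fmap in fmaps for channel in fmap]

shallow = [[[3.0]], [[4.0]]]      # 2 channels, 1 x 1 pixel
deep = [[[0.0]], [[50.0]]]        # much larger magnitude
merged = concat_channels(normalize_and_scale(shallow, [1.0, 1.0]),
                         normalize_and_scale(deep, [1.0, 1.0]))
```

Normalizing before merging keeps large-magnitude deep activations from swamping the shallow ones, and the per-channel scales let training restore whatever relative weighting is useful.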

Further, the residual mapping of the convolutional neural network is learned from the input to the first layer of each scale feature fusion layer and the underlying mapping fitted by the feature fusion layers: after the multi-scale feature fusion layers with 64, 128 and 256 filters, the residual mapping of the convolutional network is learned using the underlying mapping fitted by the network fusion layers. Each scale feature fusion layer is stacked 3 times, forming a 9-layer feature fusion residual network.
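The stacking pattern (three filter widths, each repeated three times) can be written down as a small layer plan. This is a hypothetical helper for illustration only; it just enumerates the configuration stated above and does not build a network.

```python
def build_fusion_plan(stage_filters=(64, 128, 256), repeats=3):
    """List the stacked multi-scale fusion layers, one dict per layer."""
    plan = []
    for stage, filters in enumerate(stage_filters, start=1):
        for block in range(1, repeats + 1):
            plan.append({"stage": stage, "block": block, "filters": filters})
    return plan

plan = build_fusion_plan()
for layer in plan:
    print(layer)
```

Keeping the widest stage at 256 filters stays well below the roughly 1000-filter level at which the description reports residual learning becoming unstable.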

Further, the specific steps of step 2 are:

2-1. Determine the control parameters for training the convolutional neural network used for traffic sign pattern recognition:

The control parameters specifically include: the initial population size NP for network training, the maximum number of learning generations N, the target value e of the training performance, and the number of training samples M;

2-2. Processing of the training sample data:

The preprocessed traffic sign images are fed directly into the network; the multilayer convolutional neural network at the front end extracts features automatically, and the features are then passed to the activation function of the output layer;

2-3. Configuration of the multilayer convolutional neural network:

The multilayer convolutional neural network model uses a fully connected topology and consists of a feature extraction layer, feature weighting channels, feature fusion layers and an output layer; the feature extraction layer, also called the shallow feature extraction layer, consists of convolution-pooling layers at 6 different scales;

2-4. Configuration of the feature weighting channels:

Given an input x with $c_1$ feature layers, a weighting transformation of the feature layers yields a feature with $c_2$ feature channels. The channel statistics are generated with global average pooling. Formally, the statistic $z \in \mathbb{R}^{C}$ is produced by shrinking U over the spatial dimensions W × H, and the c-th element of z is computed as:

$z_c = F_{sq}(u_c) = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} u_c(i, j)$

A weight is then generated for each feature channel through parameters that are learned to explicitly model the correlation between feature channels; a sigmoid activation is chosen:

$s = F_{ex}(z, W) = \sigma(g(z, W)) = \sigma(W_2\,\delta(W_1 z))$

where δ denotes the ReLU function, $W_1 \in \mathbb{R}^{C_3 \times C}$ and $W_2 \in \mathbb{R}^{C \times C_3}$; that is, a dimensionality-reduction layer with parameters $W_1$ is followed by a dimensionality-increasing layer with parameters $W_2$, with $C_3$ set to 16;

To limit model complexity and aid generalization, Dropout is applied in the two fully connected (FC) layers. Dropout reduces co-adaptation between neuron nodes and improves generalization; the node dropout rate is set to 0.5;
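As an illustration of what a 0.5 dropout rate does to the FC activations, here is a plain-Python sketch of inverted dropout, the common variant that rescales surviving units during training so inference needs no change. The function name and the choice of the inverted variant are assumptions; the text only specifies the rate.

```python
import random

def dropout(values, rate=0.5, training=True, rng=None):
    """Inverted dropout on a flat list of activations."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(values)           # inference: identity
    rng = rng or random.Random()
    keep = 1.0 - rate
    # Each unit survives with probability `keep`; survivors are rescaled by
    # 1/keep so the expected value of every activation is unchanged.
    return [v / keep if rng.random() < keep else 0.0 for v in values]

out = dropout([1.0] * 100, rate=0.5, rng=random.Random(0))
```

Because each unit is dropped independently on every training step, no unit can rely on a specific partner being present, which is the reduction in co-adaptation described above.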

The output weights of the Excitation step are treated as the importance of each feature channel after feature selection, and they are applied channel by channel to the preceding features by multiplication, completing the recalibration of the original features along the channel dimension:

$x'_c = F_{scale}(u_c, s_c) = s_c \cdot u_c$

where $X' = [x'_1, x'_2, \dots, x'_C]$, and $F_{scale}(u_c, s_c)$ denotes the channel-wise multiplication of the feature map $u_c \in \mathbb{R}^{W \times H}$ by the scalar $s_c$;
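The Squeeze, Excitation and Scale operations above can be traced on a toy tensor in plain Python. This is a sketch with hand-picked weights: $C = 2$ channels and a hidden width $C_3 = 1$ (the text uses $C_3 = 16$ on real feature maps), and Dropout is omitted because it only acts during training. Function names are illustrative.

```python
import math

def squeeze(u):
    """Global average pooling: z_c = mean of channel c over its W x H positions."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in u]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def excitation(z, W1, W2):
    """s = sigmoid(W2 @ relu(W1 @ z)); W1 is C3 x C, W2 is C x C3."""
    hidden = [max(0.0, h) for h in matvec(W1, z)]
    return [1.0 / (1.0 + math.exp(-a)) for a in matvec(W2, hidden)]

def scale(u, s):
    """x'_c = s_c * u_c, applied to every position of channel c."""
    return [[[s_c * x for x in row] for row in ch] for ch, s_c in zip(u, s)]

def se_block(u, W1, W2):
    z = squeeze(u)
    s = excitation(z, W1, W2)
    return scale(u, s), z, s

u = [[[1.0, 1.0], [1.0, 1.0]],        # channel 0, 2 x 2
     [[3.0, 3.0], [3.0, 3.0]]]        # channel 1, 2 x 2
out, z, s = se_block(u, W1=[[0.5, 0.5]], W2=[[1.0], [-1.0]])
```

With these weights channel 0 receives a gate of about 0.88 and channel 1 about 0.12, so the block amplifies one channel relative to the other even though channel 1 had the larger raw activations; in training, $W_1$ and $W_2$ are learned.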

2-5. Configuration of the feature fusion model block:

The output features of the six branches are concatenated, giving a k-dimensional final output. The output features of the network fusion layer undergo Batch Normalization, followed by out_dim 1 × 1 convolutions and ReLU activation. The input layer of the feature fusion module also passes through out_dim 1 × 1 convolutions and ReLU activation and is combined with the output features of the multi-scale fusion to form a residual network model, whose output serves as the input features of the next feature fusion module;
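A plain-Python sketch of this fusion block on tiny tensors: concatenate the branch outputs, batch-normalize each channel, apply a 1 × 1 convolution with ReLU, and add the module input after its own 1 × 1 convolution with ReLU. Two assumptions are labeled here: "combined" is read as element-wise addition (the residual form), and Batch Normalization is shown without its learned scale and shift. Function and argument names are hypothetical.

```python
import math

def concat_channels(branches):
    """Stack branch outputs (lists of H x W channels) along the channel axis."""
    return [ch for branch in branches for ch in branch]

def batch_norm(batch, eps=1e-5):
    """Normalise every channel over all samples and positions (gamma=1, beta=0)."""
    out = [[None] * len(sample) for sample in batch]
    for c in range(len(batch[0])):
        vals = [x for sample in batch for row in sample[c] for x in row]
        mean = sum(vals) / len(vals)
        std = math.sqrt(sum((x - mean) ** 2 for x in vals) / len(vals) + eps)
        for n, sample in enumerate(batch):
            out[n][c] = [[(x - mean) / std for x in row] for row in sample[c]]
    return out

def conv1x1_relu(fmap, W):
    """1x1 convolution (W is out_dim x C_in, no bias) followed by ReLU."""
    H, Wd = len(fmap[0]), len(fmap[0][0])
    return [[[max(0.0, sum(w * fmap[c][i][j] for c, w in enumerate(row)))
              for j in range(Wd)] for i in range(H)] for row in W]

def add_maps(a, b):
    return [[[x + y for x, y in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]

def fusion_module(inputs, branch_outputs, W_fuse, W_skip):
    """inputs[n]: module input of sample n; branch_outputs[n]: its branch
    outputs (six in the text). Returns fused path + shortcut per sample."""
    fused = batch_norm([concat_channels(br) for br in branch_outputs])
    return [add_maps(conv1x1_relu(f, W_fuse), conv1x1_relu(x, W_skip))
            for x, f in zip(inputs, fused)]

x = [[[1.0, 2.0]]]                            # 1 channel, 1 x 2
branches = [[[[1.0, 3.0]]], [[[2.0, 2.0]]]]   # two branches, 1 channel each
y = fusion_module([x], [branches], W_fuse=[[1.0, 1.0]], W_skip=[[1.0]])
```

Here out_dim is 1 and only two branches are used to keep the numbers traceable; both 1 × 1 convolutions must produce the same out_dim so that the two paths can be added.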

2-6. Configuration of the residual learning model block:

A residual learning model block fits the underlying mapping H(x) of the stacked network layers, where x denotes the input to the first of these layers. If multiple nonlinear layers can asymptotically approximate complex functions, they can equally asymptotically approximate the residual function H(x) − x. These nonlinear layers are therefore made to approximate the residual function F(x) = H(x) − x, so that the original function becomes F(x) + x;

2-7. Record the network parameters learned during training, yielding a modified 9-layer convolutional neural network model with a multilayer activation function and 43 outputs;

After the network has been built according to the above steps, traffic sign recognition is performed.

Compared with the prior art, the invention has the following advantages: by inputting traffic sign images as test and training samples and preprocessing them, training a convolutional neural network on the training samples to extract deep color features, and training a classifier on the deep color features to identify the traffic signs in the test samples, the invention improves the accuracy of traffic sign recognition, simplifies the structural parameters and eliminates over-fitting.

Description of the drawings

Fig. 1 is the flow chart of the method for the present invention.

Fig. 2 is a schematic diagram of the multi-scale feature fusion network module in the method of the present invention.

Fig. 3 is a schematic diagram of the Squeeze-Excitation module in the method of the present invention.

Fig. 4 is a schematic diagram of the operations of the Squeeze-Excitation module in the method of the present invention.

Fig. 5 shows the residual learning model added after the multi-scale feature fusion layers with 64, 128 and 256 filters in the method of the present invention.

Fig. 6 shows the recognition accuracy obtained when the present invention is used for traffic sign recognition.

Detailed description of the embodiments

The present invention is further described below with reference to specific examples and the accompanying drawings.

The feature representation is the activation of the image at the CNN layers, and its size should decrease gently through the CNN. Higher-dimensional features are easier to process, train faster and converge more easily. Spatial aggregation can be performed on a lower-dimensional embedding without much loss, which indicates strong correlation, and hence redundancy, between adjacent neurons.

Balance the depth and width of the network: increasing width and depth together, in appropriate proportion, distributes the computation budget across the network in a balanced way.

Fig. 1 is the flow chart of the present invention, which comprises the following steps:

Step 1: input traffic sign images as test samples and training samples, and preprocess them;

Step 2: design a convolutional neural network model, train the network on the samples, and extract deep color features;

Step 3: train a classifier on the deep image features to identify the traffic signs in the test samples.

According to embodiments of the present invention, the deep-learning-based traffic sign recognition method comprises:

Design phase of the network model: the design of the overall network model mainly addresses two problems. A large network with many parameters is prone to over-fitting and consumes excessive computing resources, and the learning ability of the network should be improved without increasing the number of parameters. In typical large network structures, the loss of a deeper network is often no lower than that of its shallower counterpart; this patent solves that problem effectively by using residual mappings to reformulate the functions learned by the network layers so that the residuals approach zero. At the same time, it fuses the output features of network layers of different scales, achieving multi-scale fusion of image features. To let the network learn the deep features of the input traffic sign images more comprehensively and combine local with global features, the deep and shallow network structures are fused through Squeeze-Excitation network modules. Step 2 comprises the following operations, executed in order:

(1) Design of the deep learning network model: to overcome the inefficiency of numerical computation on non-uniform sparse data structures and to improve the learning ability of the network model, the convolutional layers use random, sparse connection tables in the feature dimension, combined with dense networks. Building such a layer-by-layer structure requires analyzing the data statistics of the previous layer and clustering them into groups of highly correlated neurons. These groups form the neurons of the next layer and connect to the neurons of the previous layer. In the lower layers close to the data, correlated neurons concentrate in local regions of the input image, so a large amount of feature information ends up concentrated in the same local region and can be covered by small convolutional layers at the next layer, while the small number of more spread-out neuron groups can be covered by larger convolutions. For pixel alignment, the convolutional layers that fuse multi-scale features use 1 × 1, 3 × 3 and 5 × 5 filters, and the pooling layer uses 3 × 3 filters. The outputs of all filter banks are concatenated as the input to the next layer;

To keep the features invariant to image rotation, translation, scaling and similar transformations, max pooling takes the maximum of the feature points within each local neighborhood. This reduces the shift in the estimated mean caused by convolutional-layer parameter errors and preserves more of the image's fine texture information.
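A minimal plain-Python max pooling (no padding) shows both uses mentioned in this description: a 3 × 3 pooling window, and a 2 × 2 window with stride 2 that halves the resolution. The last two calls illustrate the small-shift invariance: a peak moved within one pooling window produces the same output. The function name is illustrative.

```python
def max_pool2d(image, k=2, stride=2):
    """Max pooling over k x k windows of a 2-D list (no padding)."""
    H, W = len(image), len(image[0])
    return [[max(image[i + di][j + dj] for di in range(k) for dj in range(k))
             for j in range(0, W - k + 1, stride)]
            for i in range(0, H - k + 1, stride)]

img = [[r * 4 + c for c in range(4)] for r in range(4)]   # values 0..15
halved = max_pool2d(img, k=2, stride=2)       # 4 x 4 -> 2 x 2
overlap = max_pool2d(img, k=3, stride=1)      # 3 x 3 window
a = max_pool2d([[9, 0], [0, 0]])
b = max_pool2d([[0, 0], [0, 9]])              # same peak, shifted
```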

Because these model blocks are stacked, their correlation statistics inevitably change. As higher-level features are captured by higher layers, their spatial concentration decreases, so the filter size should grow as the network deepens. However, 5 × 5 convolutions bring a huge computational cost: if the output of the previous layer is 100 × 100 × 128, then after a 5 × 5 convolution with 256 outputs the output is 100 × 100 × 256, and the convolutional layer has 128 × 5 × 5 × 256 parameters, which clearly entails heavy computation. Adding pooling to the feature fusion layer increases the computation further, because the number of pooling output filters equals the number of filters in the previous layer; merging the pooling output with the convolutional outputs therefore increases the number of outputs from stage to stage. Even if an Inception structure can cover the optimal sparse structure, this computational inefficiency can make the computation explode over successive stages.

To avoid the huge computation of 5 × 5 convolutions while keeping the sparse structure, the computation is compressed: 1 × 1 convolutions are applied before the computationally expensive 3 × 3 and 5 × 5 convolutions. The structure of the network model block is shown in Fig. 2.
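The parameter arithmetic behind this design choice can be checked directly. The 100 × 100 × 128 input and the 256-output 5 × 5 convolution come from the example above; the 1 × 1 reduction to 16 channels is a hypothetical width borrowed from the branch widths described later, and "same" padding is assumed so that the output stays 100 × 100.

```python
def conv_params(c_in, c_out, k):
    """Number of weights in a k x k convolution (bias omitted)."""
    return c_in * k * k * c_out

H = W = 100                                        # spatial size from the example
direct = conv_params(128, 256, 5)                  # 128 x 5 x 5 x 256
# Hypothetical bottleneck: reduce 128 -> 16 channels with a 1x1 conv first.
reduced = conv_params(128, 16, 1) + conv_params(16, 256, 5)
mults_direct = direct * H * W                      # one multiply per weight per output pixel
mults_reduced = reduced * H * W
```

The direct 5 × 5 layer needs 819,200 weights and about 8.2 billion multiplications, while the reduced version needs 104,448 weights, roughly a 7.8-fold saving for this particular choice of bottleneck width.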

The feature fusion network model consists of stacked convolutional layers, with max pooling added to halve the resolution of the network. For memory efficiency during training, the multi-scale feature fusion modules are best used in the upper layers of the network. This network model allows the number of neurons at each stage to be increased significantly without blowing up the computation. The dimension-reducing multi-scale feature fusion model lets each layer pass a large number of inputs on to the next layer. In the multi-scale feature fusion structure, the input is reduced by 1 × 1 convolutions before each larger convolution is computed; visual information is thus processed at multiple scales and then aggregated, so that the next layer can obtain abstract features at different scales simultaneously.

(2) To add another dimension, feature fusion across channels, a "feature recalibration" strategy is used. Specifically, the importance of each feature channel is learned automatically, and according to this importance, useful features are promoted and features of little use for the current task are suppressed.

Fig. 3 is a schematic diagram of the improved SE (Squeeze-Excitation) module. Given an input x with $c_1$ feature layers, a series of general transformations such as convolution produces a feature with $c_2$ feature layers. Unlike a traditional CNN, the features thus obtained are then recalibrated through three operations.

The first is the Squeeze operation, which compresses features along the spatial dimensions, turning each two-dimensional feature channel into a single real number. This number has, in a sense, a global receptive field, and the output dimension matches the number of input feature channels. It characterizes the global distribution of responses across the feature channels and lets even layers close to the input obtain a global receptive field, which is useful in many tasks. It is implemented by generating channel statistics with global average pooling. Formally, the statistic $z \in \mathbb{R}^{C}$ is produced by shrinking U over the spatial dimensions W × H, and the c-th element of z is computed as:

$z_c = F_{sq}(u_c) = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} u_c(i, j)$

Next comes the Excitation operation, which generates a weight for each feature channel through parameters that are learned to explicitly model the correlation between feature channels. To meet these criteria, a sigmoid activation is chosen:

$s = F_{ex}(z, W) = \sigma(g(z, W)) = \sigma(W_2\,\delta(W_1 z))$

where δ denotes the ReLU function, $W_1 \in \mathbb{R}^{C_3 \times C}$ and $W_2 \in \mathbb{R}^{C \times C_3}$; that is, a dimensionality-reduction layer with parameters $W_1$ is followed by a dimensionality-increasing layer with parameters $W_2$, with $C_3$ set to 16.

Next comes the Dropout operation. To limit model complexity and aid generalization, the gating mechanism is parameterized by forming a bottleneck of two fully connected (FC) layers around the nonlinearity. Dropout makes each neuron work together with randomly selected other neurons, which reduces co-adaptation between neuron nodes and improves generalization; the node dropout rate is set to 0.5.

Finally comes the Reweight operation: the output weights of the Excitation step are treated as the importance of each feature channel after feature selection, and they are applied channel by channel to the preceding features by multiplication, completing the recalibration of the original features along the channel dimension:

$x'_c = F_{scale}(u_c, s_c) = s_c \cdot u_c$

where $X' = [x'_1, x'_2, \dots, x'_C]$, and $F_{scale}(u_c, s_c)$ denotes the channel-wise multiplication of the feature map $u_c \in \mathbb{R}^{W \times H}$ by the scalar $s_c$.

Fig. 4 shows an example of embedding the SE module into the feature fusion module. The dimensions beside each box give the output of that layer. Global average pooling is used as the Squeeze operation. Two fully connected layers then form a bottleneck that models the correlation between channels and outputs as many weights as there are input feature channels: the first fully connected layer reduces the feature dimension, and after ReLU activation a second fully connected layer restores it to the original dimension. Compared with a single fully connected layer, this has two advantages: 1) it adds more nonlinearity and can better fit complex correlations between channels; 2) it greatly reduces the number of parameters and the amount of computation. A sigmoid gate then produces normalized weights between 0 and 1, and finally a Scale operation applies the normalized weights to the features of each channel.
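The parameter saving of the two-layer bottleneck over a single C × C fully connected layer can be counted directly. This sketch takes the hidden width as $C_3 = 16$ from the definitions above and ignores biases; the helper names are illustrative.

```python
def fc_params(n_in, n_out):
    """Weights of a fully connected layer (bias omitted)."""
    return n_in * n_out

def se_gate_params(C, hidden):
    """Two FC layers: C -> hidden -> C (the SE bottleneck)."""
    return fc_params(C, hidden) + fc_params(hidden, C)

# channels -> (single C x C layer, bottleneck with hidden width 16)
comparison = {C: (fc_params(C, C), se_gate_params(C, 16)) for C in (64, 128, 256)}
```

The saving only holds when the hidden width is less than half the channel count: for the 64, 128 and 256 channel widths used here it is 2, 4 and 8 times fewer weights, but at C = 16 the bottleneck (512 weights) would be larger than a single 16 × 16 layer (256 weights).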

(3) A network module of this kind has 10 layers, which makes it a relatively deep network, so how to propagate gradients effectively back through all layers is an important problem. A residual learning model block fits the underlying mapping H(x) of the stacked layers, where x denotes the input to the first of these layers. If multiple nonlinear layers can asymptotically approximate complex functions, they can equally asymptotically approximate the residual function H(x) − x. These nonlinear layers are therefore made to approximate the residual function F(x) = H(x) − x, so that the original function becomes F(x) + x.

Although both forms can asymptotically approximate the desired function, they differ in how easy they are to learn. If the added layers are constructed as identity mappings, a deeper model should have a training error no greater than its shallower counterpart. When the identity mapping is optimal, the solver can simply drive the weights of the nonlinear layers toward zero to approach it. If the optimal function is closer to an identity mapping than to a zero mapping, it is easier to find the perturbations with reference to the identity mapping.

Each building block is defined as $y = F(x, \{W_i\}) + x$, where x and y are the input vector of the first layer and the output vector of the last layer of the block, respectively, and $F(x, \{W_i\})$ is the residual mapping to be learned.

Taking a two-layer residual building block as an example, $F = W_2\,\sigma(W_1 x)$, where σ denotes ReLU activation and biases are omitted. The shortcut connection in $y = F(x, \{W_i\}) + x$ introduces neither extra parameters nor extra computational complexity. The dimensions of x and F in $y = F(x, \{W_i\}) + x$ must be equal; when they differ, a linear projection $W_s$ matches the dimensions: $y = F(x, \{W_i\}) + W_s x$. A residual building block with a single layer is similar to a linear layer, $y = W_1 x + x$, and brings no optimization benefit to deep networks. A three-layer residual building block is therefore used, as shown in Fig. 5.
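The two-layer building block and its projection shortcut can be sketched on plain vectors. With all weights zero the block returns its input unchanged, which is the "push the weights toward zero to approach the identity" argument above. Names are illustrative, and the sketch shows the two-layer form rather than the three-layer block the patent actually uses.

```python
def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def relu(v):
    return [max(0.0, x) for x in v]

def residual_block(x, W1, W2, Ws=None):
    """y = F(x, {W_i}) + x with F = W2 relu(W1 x); if Ws is given,
    use the projection shortcut y = F(x, {W_i}) + Ws x."""
    f = matvec(W2, relu(matvec(W1, x)))
    shortcut = list(x) if Ws is None else matvec(Ws, x)
    if len(f) != len(shortcut):
        raise ValueError("F(x) and the shortcut differ in size; pass Ws")
    return [a + b for a, b in zip(f, shortcut)]

x = [1.0, -2.0, 3.0]
zeros_1 = [[0.0] * 3 for _ in range(2)]   # W1: 2 x 3
zeros_2 = [[0.0] * 2 for _ in range(3)]   # W2: 3 x 2
identity_out = residual_block(x, zeros_1, zeros_2)
projected = residual_block([1.0, 2.0], W1=[[1.0, 0.0]], W2=[[1.0], [1.0], [0.0]],
                           Ws=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

In the second call F maps a 2-vector to a 3-vector, so the identity shortcut would not fit and the projection $W_s$ is required.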

Research has found that residual learning becomes unstable when the number of filters in a residual learning block exceeds 1000. ResNet-50, ResNet-101 and ResNet-152 peak at the res4 layer, which has 1024 filters, and show a clear downward inflection at the res5 layer, which has 2048 filters. Hence, when the number of filters exceeds 1000, ResNet becomes unstable and the network may "die" early in training. Reducing the learning rate or adding extra batch normalization to the residual learning block does not solve this problem. The network of the present invention therefore uses at most 256 filters, and residual learning models are added after the multi-scale feature fusion layers with 64, 128 and 256 filters, as shown in Fig. 5.

The operations of steps (1), (2) and (3) are described in further detail below with reference to an embodiment of the present invention:

The image dataset is fed into the network designed in the present invention for deep learning. At the input layer each image is resized to 120 × 120 × 3 and passed to convolutional layer conv1, followed by ReLU activation, which yields 64 feature maps that are then normalized. These are fed into the first feature fusion layer. Each feature fusion module contains 6 feature operation branches, as shown in Fig. 2, built from convolutional layers with kernel sizes of 3 × 3 and 5 × 5 and a pooling layer filter of size 3 × 3, to which 1 × 1 convolution kernels are added to reduce the thickness (channel depth) of the feature maps.

In the feature fusion module, the feature operations are divided into six branches that use convolution kernels of different scales to handle the multi-scale problem. The six branches are as follows:

1. 16 convolution kernels of size 1 × 1, followed by ReLU activation; two stacked layers of 16 convolution kernels of size 3 × 3, followed by ReLU activation; then 64 convolution kernels of size 1 × 1. The result then splits into two paths: one enters a Squeeze-Excitation network and the other passes through 64 convolution kernels of size 1 × 1, and the two paths are finally merged.

2. 16 convolution kernels of size 1 × 1, followed by ReLU activation; a single layer of 16 convolution kernels of size 3 × 3, followed by ReLU activation; then 64 convolution kernels of size 1 × 1. The result then splits into a Squeeze-Excitation path and a path of 64 convolution kernels of size 1 × 1, which are finally merged.

3. 16 convolution kernels of size 1 × 1, followed by ReLU activation; two stacked layers of 16 convolution kernels of size 5 × 5, followed by ReLU activation; then 64 convolution kernels of size 1 × 1. The result then splits into a Squeeze-Excitation path and a path of 64 convolution kernels of size 1 × 1, which are finally merged.

4. 16 convolution kernels of size 1 × 1, followed by ReLU activation; a single layer of 16 convolution kernels of size 5 × 5, followed by ReLU activation; then 64 convolution kernels of size 1 × 1. The result then splits into a Squeeze-Excitation path and a path of 64 convolution kernels of size 1 × 1, which are finally merged.

5. Two stacked average pooling layers with 3 × 3 kernels over the 64-channel features, followed by ReLU activation and BatchNormalization. The result then splits into a Squeeze-Excitation path and a path of 64 convolution kernels of size 1 × 1, which are finally merged.

6. A single average pooling layer with a 3 × 3 kernel over the 64-channel features, followed by ReLU activation and BatchNormalization. The result then splits into a Squeeze-Excitation path and a path of 64 convolution kernels of size 1 × 1, which are finally merged.
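The six branches can be summarized as a small configuration. This sketch assumes stride-1, same-padded layers and treats each branch as ending in 64 channels once its Squeeze-Excitation path and 1 × 1 path are merged, which is consistent with the 384-dimensional fused output given in the following paragraph; the `Branch` class and its fields are hypothetical. It shows the concatenated channel count and how stacking two 3 × 3 (or 5 × 5) layers enlarges the receptive field.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Branch:
    name: str
    layers: List[Tuple[str, int]]  # (layer kind, kernel size) in order; stride 1 assumed
    out_channels: int = 64         # each branch ends in 64 channels after its internal merge

    def receptive_field(self) -> int:
        # each stacked stride-1 layer with kernel k widens the receptive field by k - 1
        return 1 + sum(k - 1 for _, k in self.layers)


BRANCHES = [
    Branch("1: 1x1 -> 3x3 -> 3x3 -> 1x1", [("conv", 1), ("conv", 3), ("conv", 3), ("conv", 1)]),
    Branch("2: 1x1 -> 3x3 -> 1x1", [("conv", 1), ("conv", 3), ("conv", 1)]),
    Branch("3: 1x1 -> 5x5 -> 5x5 -> 1x1", [("conv", 1), ("conv", 5), ("conv", 5), ("conv", 1)]),
    Branch("4: 1x1 -> 5x5 -> 1x1", [("conv", 1), ("conv", 5), ("conv", 1)]),
    Branch("5: avgpool 3x3 -> avgpool 3x3", [("avgpool", 3), ("avgpool", 3)]),
    Branch("6: avgpool 3x3", [("avgpool", 3)]),
]


def fused_channels(branches: List[Branch]) -> int:
    """Concatenating branch outputs along the channel axis sums their channel counts."""
    return sum(b.out_channels for b in branches)


if __name__ == "__main__":
    for b in BRANCHES:
        rf = b.receptive_field()
        print(f"branch {b.name}: receptive field {rf}x{rf}")
    print("fused channels:", fused_channels(BRANCHES))
```

Under these assumptions two stacked 3 × 3 convolutions see the same 5 × 5 neighbourhood as one 5 × 5 convolution, so the single-layer and two-layer branches of each scale cover different context sizes.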

The output features of the six branches are concatenated into a 384-dimensional feature, which is fed into the residual learning block. The output features of the network fusion layer are normalized with BatchNormalization and passed through out_dim 1 × 1 convolution kernels and ReLU activation. The input of the feature fusion module is likewise passed through out_dim 1 × 1 convolution kernels and ReLU activation and, together with the output features of the multi-scale fusion, forms a residual network unit whose result is the input feature of the next feature fusion module. The out_dim values described above are 64, 128 and 256; the subsequent combinations of feature fusion modules and residual learning blocks are similar and are not described again here.
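A sketch of one fusion-plus-residual unit, treating a 1 × 1 convolution as the same matrix applied at every spatial position. Assumptions not stated in the text: feature maps are flattened to a list of per-position channel vectors, batch normalization omits its learned scale and shift, and the main path and shortcut are combined by element-wise addition (the text only says they are used together). Names such as `fusion_residual_unit` and `random_matrix` are hypothetical.

```python
import math
import random
from typing import List

Pixels = List[List[float]]  # one channel vector per spatial position
Matrix = List[List[float]]


def conv1x1(pixels: Pixels, weight: Matrix) -> Pixels:
    """A 1x1 convolution applies the same (out_dim x in_dim) matrix at every position."""
    return [[sum(w * v for w, v in zip(row, px)) for row in weight] for px in pixels]


def relu(pixels: Pixels) -> Pixels:
    return [[max(0.0, v) for v in px] for px in pixels]


def batch_norm(pixels: Pixels, eps: float = 1e-5) -> Pixels:
    """Normalize each channel over all positions (learned scale/shift omitted)."""
    n, channels = len(pixels), len(pixels[0])
    mean = [sum(px[c] for px in pixels) / n for c in range(channels)]
    var = [sum((px[c] - mean[c]) ** 2 for px in pixels) / n for c in range(channels)]
    return [[(px[c] - mean[c]) / math.sqrt(var[c] + eps) for c in range(channels)]
            for px in pixels]


def fusion_residual_unit(module_input: Pixels, fused: Pixels,
                         w_main: Matrix, w_short: Matrix) -> Pixels:
    """Main path: BN -> 1x1 conv (out_dim) -> ReLU on the fused features.
    Shortcut: 1x1 conv (out_dim) -> ReLU on the module input.
    The two paths are added element-wise (an assumption)."""
    main = relu(conv1x1(batch_norm(fused), w_main))
    short = relu(conv1x1(module_input, w_short))
    return [[a + b for a, b in zip(p, q)] for p, q in zip(main, short)]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.05) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    positions, in_ch, fused_ch, out_dim = 4, 64, 384, 64
    x = [[rng.random() for _ in range(in_ch)] for _ in range(positions)]
    fused = [[rng.random() for _ in range(fused_ch)] for _ in range(positions)]
    y = fusion_residual_unit(x, fused, random_matrix(out_dim, fused_ch, rng),
                             random_matrix(out_dim, in_ch, rng))
    print(len(y), "positions x", len(y[0]), "channels")
```

Both paths end in out_dim channels, so the unit's output width is set by out_dim (64, 128 or 256) regardless of the 384-channel fused input.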

The output features of the final residual module are normalized, merged and fed into an average pooling layer, giving a 1 × 1 × 512 output. This passes through a dropout layer with a 50% drop ratio and finally into a linear layer with a softmax loss, which serves as the classifier. Because there are 43 classes, the softmax output is a 43 × 1 vector.
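The classification head can be sketched as global average pooling, 50% dropout, a linear layer and a softmax over 43 classes. This plain-Python illustration rests on assumptions: the spatial size before pooling is not stated in the text, dropout is the common inverted variant, and `classify` and its arguments are hypothetical names.

```python
import math
import random
from typing import List

FeatureMaps = List[List[List[float]]]  # [channel][row][col]


def global_average_pool(maps: FeatureMaps) -> List[float]:
    """Average each channel's H x W map to one value (C maps -> a 1 x 1 x C vector)."""
    return [sum(sum(row) for row in fm) / (len(fm) * len(fm[0])) for fm in maps]


def dropout(v: List[float], rate: float, rng: random.Random, training: bool) -> List[float]:
    """Inverted dropout: in training, zero each unit with probability `rate` and
    rescale survivors by 1 / (1 - rate); at test time pass values through."""
    if not training or rate == 0.0:
        return list(v)
    return [0.0 if rng.random() < rate else a / (1.0 - rate) for a in v]


def linear(v: List[float], weight: List[List[float]], bias: List[float]) -> List[float]:
    return [sum(w * a for w, a in zip(row, v)) + b for row, b in zip(weight, bias)]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(maps: FeatureMaps, weight, bias, rng: random.Random, training: bool = False):
    pooled = global_average_pool(maps)             # 1 x 1 x 512 in the embodiment
    dropped = dropout(pooled, 0.5, rng, training)  # 50% dropout
    return softmax(linear(dropped, weight, bias))  # 43 x 1 class probabilities


if __name__ == "__main__":
    rng = random.Random(0)
    maps = [[[rng.random() for _ in range(2)] for _ in range(2)] for _ in range(512)]
    weight = [[rng.gauss(0.0, 0.01) for _ in range(512)] for _ in range(43)]
    probs = classify(maps, weight, [0.0] * 43, rng)
    print(len(probs), "classes; predicted class", probs.index(max(probs)))
```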

After repeated training of the network, the solver (optimizer) file parameters of the deep learning network are set as follows: a base learning rate of 0.0001, updated with a step policy; a stepsize of 1000; a maximum of 2000 iterations; and a weight decay of 0.0002.
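The solver settings above describe a step learning-rate schedule. The parameter names resemble a Caffe-style solver file, although the text does not name the framework. The decay factor `GAMMA` is not given in the text, so 0.1 below is an assumption, and momentum is omitted from the hypothetical `sgd_update` helper.

```python
BASE_LR = 0.0001
STEPSIZE = 1000
MAX_ITER = 2000
WEIGHT_DECAY = 0.0002
GAMMA = 0.1  # assumption: the per-step decay factor is not stated in the text


def step_lr(iteration: int, base_lr: float = BASE_LR, gamma: float = GAMMA,
            stepsize: int = STEPSIZE) -> float:
    """'step' policy: multiply the base rate by gamma once every `stepsize` iterations."""
    return base_lr * gamma ** (iteration // stepsize)


def sgd_update(w: float, grad: float, lr: float, weight_decay: float = WEIGHT_DECAY) -> float:
    """Plain SGD step with L2 weight decay added to the gradient (momentum omitted)."""
    return w - lr * (grad + weight_decay * w)


if __name__ == "__main__":
    for it in (0, 999, 1000, MAX_ITER - 1):
        print(it, step_lr(it))
```

With a maximum of 2000 iterations and a stepsize of 1000, the learning rate changes only once during training.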

Traffic sign classification stage: the deep learning network retains its softmax classifier, but classifying with the softmax of the entire network model every time incurs a heavy computational cost, is prone to overfitting, and does not guarantee that softmax applied to the features output by the final convolutional layer gives the best classification result. Moreover, changing the parameters of the softmax classification requires retraining the entire deep learning network. To solve these problems, an SVM classifier is trained on the output features of each network layer, the training results are compared, and the features of the layer with the highest accuracy are selected as the final features of traffic sign images. This makes parameter adjustment flexible and avoids retraining the network.
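The layer-selection idea can be sketched as: train one linear SVM per candidate layer's features and keep the layer with the best score. This is a simplified binary (+1/-1) Pegasos-style linear SVM in plain Python; the patent does not specify the SVM variant or kernel, 43-class recognition would need an extension such as one-vs-rest, and scoring on a separate validation split (rather than on the training results the text mentions) is an assumption. All function and layer names are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def with_bias(X: List[List[float]]) -> List[List[float]]:
    """Append a constant 1 so the bias is learned as an ordinary weight."""
    return [row + [1.0] for row in X]


def train_linear_svm(X, y, lam: float = 0.01, epochs: int = 20, seed: int = 0) -> List[float]:
    """Pegasos-style subgradient descent on the regularized hinge loss; y in {+1, -1}."""
    rng = random.Random(seed)
    Xb = with_bias(X)
    w = [0.0] * len(Xb[0])
    order = list(range(len(Xb)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, Xb[i]) < 1.0      # margin checked with the current w
            w = [(1.0 - eta * lam) * wj for wj in w]   # regularization shrink
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def accuracy(w: List[float], X, y) -> float:
    preds = [1 if dot(w, row) >= 0.0 else -1 for row in with_bias(X)]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def select_best_layer(layer_features: Dict[str, Tuple[list, list]], y_train, y_val):
    """Train one SVM per layer and keep the layer with the highest validation accuracy."""
    scores = {}
    for name, (X_train, X_val) in layer_features.items():
        w = train_linear_svm(X_train, y_train)
        scores[name] = accuracy(w, X_val, y_val)
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    y = [1, 1, -1, -1]
    informative = [[3.0], [2.0], [-2.0], [-3.0]]
    uninformative = [[0.0] for _ in range(4)]
    layers = {"layer_a": (informative, informative), "layer_b": (uninformative, uninformative)}
    print(select_best_layer(layers, y, y))
```

Because only an SVM is retrained per layer, changing the classifier's parameters does not require retraining the convolutional network itself.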

As can be seen from the above, the technical solution provided by the invention takes input traffic sign images as test samples and training samples and preprocesses them, trains a convolutional neural network on the training samples to extract deep color features, and trains a classifier on the deep color features to recognize the traffic signs in the test samples. These technical means improve the accuracy of traffic sign recognition, simplify the structural parameters, and eliminate overfitting.

The embodiment described above is only a description of a preferred embodiment of the present invention and does not limit its scope. Without departing from the design spirit of the present invention, all modifications and improvements made to the technical solution of the present invention by those of ordinary skill in the art shall fall within the scope of protection determined by the claims of the present invention.

Claims

1. A traffic sign deep learning pattern recognition method, the method comprising the following steps:

Step 1, preprocessing the test samples and training samples of traffic sign images;

Step 2, designing a convolutional neural network model and training the network with the samples, so that through training the network automatically extracts deep color features, thereby excluding manual intervention;

Step 3, training a classifier with the deep color features and recognizing the traffic signs in the test samples, thereby performing traffic sign recognition.

2. The traffic sign deep learning pattern recognition method according to claim 1, characterized in that in step 1, the preprocessing of the training samples comprises:

mean subtraction, applied to both training and test samples: the training-sample mean is subtracted from each training sample image, and the same training-sample mean is subtracted from each test sample image;

rotating randomly selected training samples by an added angle, increasing the angular diversity of the images;

stretching or compressing randomly selected training samples in the horizontal direction, increasing the visual diversity of the images;

resizing the training samples and test samples to a uniform size.
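A sketch of the four preprocessing operations on single-channel images stored as nested lists. Assumptions not in the text: nearest-neighbour resampling, a per-pixel mean image, a rotation range of ±10°, a horizontal scale range of 0.9 to 1.1, a selection probability of 0.5, and augmenting selected images in place rather than adding copies. Images are resized first so that the mean image is well defined. All function names are hypothetical.

```python
import math
import random
from typing import List

Image = List[List[float]]  # a single-channel image as rows of pixel values


def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    h, w = len(img), len(img[0])
    return [[img[r * h // out_h][c * w // out_w] for c in range(out_w)] for r in range(out_h)]


def stretch_horizontal(img: Image, factor: float) -> Image:
    """Stretch (factor > 1) or compress (factor < 1) along the width only."""
    new_w = max(1, round(len(img[0]) * factor))
    return resize_nearest(img, len(img), new_w)


def rotate_nearest(img: Image, degrees: float, fill: float = 0.0) -> Image:
    """Rotate about the image centre with nearest-neighbour sampling; uncovered pixels get `fill`."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    a = math.radians(degrees)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            dy, dx = r - cy, c - cx
            # inverse mapping: find the source pixel that lands on (r, c)
            sr = round(cos_a * dy + sin_a * dx + cy)
            sc = round(-sin_a * dy + cos_a * dx + cx)
            if 0 <= sr < h and 0 <= sc < w:
                out[r][c] = img[sr][sc]
    return out


def mean_image(images: List[Image]) -> Image:
    n, h, w = len(images), len(images[0]), len(images[0][0])
    return [[sum(img[r][c] for img in images) / n for c in range(w)] for r in range(h)]


def subtract(img: Image, mean: Image) -> Image:
    return [[v - m for v, m in zip(row, mrow)] for row, mrow in zip(img, mean)]


def preprocess(train: List[Image], test: List[Image], size: int, rng: random.Random,
               max_angle: float = 10.0, stretch=(0.9, 1.1)):
    # 1. unify sizes first so that a per-pixel mean image is well defined
    train = [resize_nearest(im, size, size) for im in train]
    test = [resize_nearest(im, size, size) for im in test]
    # 2. augment randomly chosen training images, then restore the common size
    augmented = []
    for im in train:
        if rng.random() < 0.5:
            im = rotate_nearest(im, rng.uniform(-max_angle, max_angle))
        if rng.random() < 0.5:
            im = resize_nearest(stretch_horizontal(im, rng.uniform(*stretch)), size, size)
        augmented.append(im)
    # 3. subtract the *training* mean from both training and test images
    mean = mean_image(augmented)
    return [subtract(im, mean) for im in augmented], [subtract(im, mean) for im in test]
```

Using the training mean for the test images as well, as the claim specifies, keeps both sets centred in the same way without leaking test statistics into training.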

3. The traffic sign deep learning pattern recognition method according to claim 1, characterized in that in step 2, an n-layer convolutional neural network model with weighted feature fusion and residual learning is designed;

The traffic sign database used contains 43 classes of signs, so the network output is $a^i$, $i = 1, 2, \dots, m$, where m is the number of outputs and m = 43; the number of weighted network channels is k, with k = 6. The calculation formula of each layer of the network is as follows:

4. The traffic sign deep learning pattern recognition method according to claim 1, characterized in that in step 2, training the convolutional neural network with the samples and extracting deep color features comprises:

building each convolutional layer in the feature dimension using random, sparse connection tables, and constructing a convolutional neural network from multiple convolutional layers to perform convolution and pooling operations on the traffic sign images;

weighting the image feature maps of the different channels with Squeeze-Excitation network modules, and using the fully connected layers in the modules to enhance the image features;

learning the residual mapping of the convolutional neural network from the input to the first layer of each scale feature fusion layer and the underlying mapping fitted by the network's feature fusion layers.

5. The traffic sign deep learning pattern recognition method according to claim 4, characterized in that building each convolutional layer in the feature dimension using random, sparse connection tables and performing convolution on the traffic sign images with a convolutional network constructed from multiple convolutional layers comprises:

The convolutional layers form a layer-by-layer structure in the feature dimension by packing random, sparse connection tables into dense blocks: the statistics of the previous layer's data are analyzed and clustered into groups of highly correlated neurons, and these groups form the neurons of the next layer and connect to the neurons of the previous layer. Correlated neurons are concentrated in local regions of the input image and are covered by small convolutional layers in the next layer, while the smaller number of more spread-out neuron groups are covered by larger convolutions. The multi-scale feature fusion uses convolutional layers of sizes 3 × 3 and 5 × 5 and a pooling layer filter of size 3 × 3, and all output filter banks are concatenated to form the input of the next layer; 1 × 1 convolution kernels are added before the computationally expensive 3 × 3 and 5 × 5 convolution kernels.
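A worked count shows why the 1 × 1 kernels are placed before the 3 × 3 and 5 × 5 kernels. The sketch assumes a 64-channel input (the conv1 output in the embodiment) and 16 kernels per convolution as in the branch description, and compares against a hypothetical direct convolution without the 1 × 1 reduction; `conv_macs_per_pixel` is an illustrative helper.

```python
def conv_macs_per_pixel(kernel: int, in_ch: int, out_ch: int) -> int:
    """Multiply-accumulates per output pixel for a k x k convolution (bias ignored)."""
    return kernel * kernel * in_ch * out_ch


IN_CH, MID_CH, OUT_CH = 64, 16, 16  # 64-channel conv1 output; 16 kernels per convolution

direct_5x5 = conv_macs_per_pixel(5, IN_CH, OUT_CH)
bottleneck_5x5 = conv_macs_per_pixel(1, IN_CH, MID_CH) + conv_macs_per_pixel(5, MID_CH, OUT_CH)
direct_3x3 = conv_macs_per_pixel(3, IN_CH, OUT_CH)
bottleneck_3x3 = conv_macs_per_pixel(1, IN_CH, MID_CH) + conv_macs_per_pixel(3, MID_CH, OUT_CH)

if __name__ == "__main__":
    print(f"5x5: {direct_5x5} direct vs {bottleneck_5x5} with 1x1 reduction")
    print(f"3x3: {direct_3x3} direct vs {bottleneck_3x3} with 1x1 reduction")
```

Under these assumptions the 5 × 5 case drops from 25,600 to 7,424 multiply-accumulates per output pixel (roughly 3.4×), and the 3 × 3 case from 9,216 to 3,328.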

6. The traffic sign deep learning pattern recognition method according to claim 4, characterized in that weighting the image feature maps of different channels with Squeeze-Excitation network modules and using the fully connected layers in the modules to enhance the image features comprises:

In each scale-feature convolutional channel, the image passes through shallow convolution and pooling operations and then enters the Squeeze-Excitation network module; in the Squeeze-Excitation module, pooling is performed by taking the maximum of the feature points within a local neighborhood (max pooling);

The resulting feature group, of size 1 × 1 × n (n being the number of channels), is fed into a three-layer fully connected network that maps it to a 1 × 1 × n output; this n-dimensional vector weights the image feature maps of the different channels, automatically enhancing the image feature information.

7. The traffic sign deep learning pattern recognition method according to claim 4, characterized in that features of different sizes are fused into multi-scale color features using 6 network channels, each channel comprising two parts: shallow convolution and pooling extraction, and an SE network module. The shallow structures of the 6 channels use, respectively, two 3 × 3 convolutional layer filters, two 5 × 5 convolutional layer filters and two 3 × 3 pooling layer filters; the two channels of the same scale differ only in the number of shallow layers and are otherwise identical.

8. The traffic sign deep learning pattern recognition method according to claim 4, characterized in that the different deep color features are normalized and fused in depth: normalization is performed within each pixel's vector of the concatenated feature map, and each channel of that vector is scaled independently by a scaling factor; the features after the six single-channel shallow networks and the features after the deep Squeeze-Excitation networks are fused after feature-size normalization, so that the deep and shallow features of the image information are combined.
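One reading of this per-pixel normalization is sketched below: each pixel's channel vector is L2-normalized, each channel is then multiplied by its own scale factor, and shallow and deep features are joined per pixel. The choice of the L2 norm, the concatenation along channels and the function names are assumptions; in a trained network the scale factors would be learned parameters.

```python
import math
from typing import List


def l2_normalize_and_scale(pixel: List[float], gamma: List[float],
                           eps: float = 1e-12) -> List[float]:
    """Normalize one pixel's channel vector to unit L2 norm, then scale each channel by its own factor."""
    norm = math.sqrt(sum(v * v for v in pixel)) + eps
    return [g * v / norm for g, v in zip(gamma, pixel)]


def normalize_feature_map(pixels: List[List[float]], gamma: List[float]) -> List[List[float]]:
    return [l2_normalize_and_scale(px, gamma) for px in pixels]


def fuse(shallow: List[List[float]], deep: List[List[float]],
         gamma_shallow: List[float], gamma_deep: List[float]) -> List[List[float]]:
    """Normalize shallow and deep features separately, then concatenate them per pixel."""
    return [a + b for a, b in zip(normalize_feature_map(shallow, gamma_shallow),
                                  normalize_feature_map(deep, gamma_deep))]
```

Normalizing first keeps features of very different magnitudes (shallow versus deep) from dominating each other after fusion, and the per-channel factors let the network restore a useful scale.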

9. The traffic sign deep learning pattern recognition method according to claim 4, characterized in that the residual mapping of the convolutional neural network is learned from the input to the first layer of each scale feature fusion layer and the underlying mapping fitted by the feature fusion layers: after the multi-scale feature fusion layers with 64, 128 and 256 filters, the residual mapping of the convolutional network is learned with respect to the underlying mapping fitted by the network fusion layers; each scale feature fusion layer is stacked 3 times, forming a 9-layer feature fusion residual network.
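The stage layout in this claim can be written as a short plan. In this hypothetical sketch, three filter counts are each stacked three times, giving nine fusion-plus-residual units, and the 256-filter cap from the description is checked explicitly; `build_stage_plan` and the constant names are illustrative.

```python
from typing import List, Sequence

FILTER_COUNTS = (64, 128, 256)  # multi-scale feature fusion stages named in the text
REPEATS_PER_STAGE = 3           # each scale's fusion layer is stacked 3 times
MAX_FILTERS = 256               # cap used to avoid the instability reported above 1000 filters


def build_stage_plan(filters: Sequence[int] = FILTER_COUNTS,
                     repeats: int = REPEATS_PER_STAGE) -> List[int]:
    """Return the out_dim of every fusion + residual unit, in order."""
    plan = [f for f in filters for _ in range(repeats)]
    if max(plan) > MAX_FILTERS:
        raise ValueError("filter count exceeds the cap")
    return plan


if __name__ == "__main__":
    plan = build_stage_plan()
    print(len(plan), "units:", plan)
```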

10. The traffic sign deep learning pattern recognition method according to claim 4, characterized in that the specific steps of step 2 are:

The specific control parameters include: the initial population size NP for network training, the maximum number of learning generations N, the desired value e of the network training performance, and the number of training samples M;

2-2, processing of the training sample data:

The preprocessed traffic sign images are input directly into the network, so that the multilayer convolutional neural network at the front end of the network performs automatic feature extraction; the result is then passed to the output-layer activation function;

2-3, setting of the multilayer convolutional neural network:

The multilayer convolutional neural network model uses a fully connected topology and consists of a feature extraction layer, feature weighting channels, a feature fusion layer and an output layer; the feature extraction layer, also called the shallow feature extraction layer, consists of convolution and pooling layers of 6 different scales;

2-4, setting of the feature weighting channels:

Given an input x with $c_1$ feature layers, a transformation followed by a weighting operation over the feature layers yields a feature with $c_2$ channels. Formally, a statistic $z \in \mathbb{R}^{C}$ is generated by shrinking U over its spatial dimensions $W \times H$, where the c-th element of z is calculated by the following formula:

A weight is generated for each feature channel through parameters that are learned to explicitly model the correlation between feature channels; sigmoid activation is used:

$s = F_{ex}(z, W) = \sigma(g(z, W)) = \sigma(W_2\,\delta(W_1 z))$

where $\delta$ denotes the ReLU function, $W_1 \in \mathbb{R}^{C_3 \times C}$ and $W_2 \in \mathbb{R}^{C \times C_3}$; that is, a dimension-reduction layer with parameters $W_1$ is followed by a dimension-increasing layer with parameters $W_2$, and $C_3$ is set to 16;

To limit model complexity and aid generalization, Dropout is applied in the two fully connected (FC) layers; dropout weakens the co-adaptation between neuron nodes and improves generalization. The node dropout rate is set to 0.5;

The output weights of the Excitation step are regarded as the importance of each feature channel after feature selection, and are then applied channel by channel to the previous features by multiplication, completing the recalibration of the original features along the channel dimension:

$x'_c = F_{scale}(u_c, s_c) = s_c \cdot u_c$
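A plain-Python sketch of the squeeze, excitation and scale steps. Following claim 6, the squeeze uses max pooling over each channel (the standard SE formulation uses global average pooling, and the squeeze formula itself is not reproduced in this text); dropout in the FC layers is omitted because it only applies during training. Function names and the tiny sizes in the usage are illustrative assumptions.

```python
import math
from typing import List

FeatureMaps = List[List[List[float]]]  # [channel][row][col]


def squeeze_max(u: FeatureMaps) -> List[float]:
    """Squeeze: one statistic per channel, here the maximum over the H x W map."""
    return [max(max(row) for row in fm) for fm in u]


def matvec(w: List[List[float]], v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def sigmoid(x: float) -> float:
    # numerically stable for large negative x
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def excitation(z: List[float], w1, w2) -> List[float]:
    """s = sigmoid(W2 * relu(W1 * z)); W1 is C3 x C (reduce), W2 is C x C3 (expand)."""
    hidden = [max(0.0, h) for h in matvec(w1, z)]
    return [sigmoid(a) for a in matvec(w2, hidden)]


def scale(u: FeatureMaps, s: List[float]) -> FeatureMaps:
    """x'_c = s_c * u_c: reweight every value of channel c by its scalar weight."""
    return [[[s_c * v for v in row] for row in fm] for fm, s_c in zip(u, s)]


def se_block(u: FeatureMaps, w1, w2) -> FeatureMaps:
    return scale(u, excitation(squeeze_max(u), w1, w2))


if __name__ == "__main__":
    C, C3 = 4, 2  # the text uses C3 = 16; tiny sizes keep the demo readable
    u = [[[float(c + r + k) for k in range(2)] for r in range(2)] for c in range(C)]
    w1 = [[0.1] * C for _ in range(C3)]
    w2 = [[0.5] * C3 for _ in range(C)]
    print(excitation(squeeze_max(u), w1, w2))
```

Because every weight $s_c$ lies in (0, 1), the module can only attenuate channels relative to one another, which is how it emphasizes informative channels over less useful ones.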

2-5, setting of the feature fusion module:

The output features of the six branches are concatenated into a k-dimensional output feature. The output features of the network fusion layer are normalized with BatchNormalization and passed through out_dim 1 × 1 convolution kernels and ReLU activation; the input of the feature fusion module also passes through out_dim 1 × 1 convolution kernels and ReLU activation and, together with the output features of the multi-scale fusion, serves as the input feature of the next feature fusion module, forming a residual network model;

2-6, setting of the residual learning block:

The residual learning block treats $H(x)$ as the underlying mapping to be fitted by the stacked network layers, where x denotes the input to the first of these stacked layers. If multiple nonlinear layers can asymptotically approximate complicated functions, they can equally approximate the residual function $H(x) - x$. These nonlinear layers are therefore made to approximate the residual function $F(x) = H(x) - x$, so that the original function becomes $F(x) + x$;

2-7, recording the network parameters learned in training to obtain an improved convolutional neural network with a 9-layer network model, multilayer activation functions and 43 outputs;