CN108985217A

CN108985217A - Traffic sign recognition method and system based on a deep spatial network

Info

Publication number: CN108985217A
Application number: CN201810751516.8A
Authority: CN
Inventors: 侯振杰; 朱军; 林恩; 莫宇剑; 王涛; 林锦雄
Original assignee: Changzhou University
Current assignee: Changzhou University
Priority date: 2018-07-10
Filing date: 2018-07-10
Publication date: 2018-12-11

Abstract

The invention belongs to the technical field of traffic information processing and discloses a traffic sign recognition method and system based on a deep spatial network. A spatial transformer network is added to extract image information with spatial invariance, compensating for the inability of a CNN to extract such information effectively. The VGG network is improved by fusing the features extracted at each stage of the network, and the fused image features are processed with spatial pyramid pooling to obtain more feature information from traffic sign images. Experimental results show that, on the basis of the enhanced network features, the accuracy of traffic sign image recognition is improved.

Description

Traffic sign recognition method and system based on a deep spatial network

Technical field

The invention belongs to the technical field of traffic information processing, and in particular relates to a traffic sign recognition method and system based on a deep spatial network.

Background technique

At present, the prior art commonly used in the field is as follows:

Sun Wei [] et al. proposed extracting multi-layer features with a CNN, applying pyramid pooling to pool the features of each layer at multiple scales, combining these features into a multi-attribute traffic sign feature vector, and feeding that vector into an ELM classifier to recognize traffic signs quickly.

Traffic sign recognition (TSR)^[1] is an important component of intelligent transportation systems (ITS). It uses image and information processing technology to make traffic management more accurate and timely and to ensure driving safety. A traffic sign recognition system mainly comprises two parts: sign detection and sign recognition. Common sign detection methods include color segmentation^[2], template matching based on sign shape^[3], and combinations of the two^[4], and these techniques are all fairly mature. The difficulty of TSR lies mainly in recognition. Because traffic sign recognition is carried out in complex outdoor environments, it poses great challenges to researchers, chiefly^[5]: the colors of signs fade; outdoor lighting conditions are complex; obstacles occlude signs; and signs are blurred by high-speed motion.

In recent years, image processing techniques represented by convolutional neural networks (CNN) have made great progress in object recognition, semantic segmentation and object detection, and have become a research focus in intelligent transportation. A convolutional neural network takes the traffic image directly as model input and extracts effective features automatically, which improves the efficiency of image recognition. On this basis, many researchers have explored applying effective deep learning models to traffic sign recognition. Sermanet et al.^[6] fed multi-scale images into the network for training, combined the feature maps of different convolutional layers into a final feature vector, and passed it to fully connected layers for recognition. They also augmented the data to increase sample diversity and meet the model's need for spatial invariance. The method recognizes signs well, but acquiring multi-scale image features increases the model training time. Jin et al.^[7], inspired by the hinge loss function of the support vector machine (SVM), proposed using the hinge loss to learn the convolution kernels of a CNN, so that feature extraction and image recognition are performed automatically, but this also causes a heavy computational burden. Xie et al.^[8] added the Fisher criterion to a deep learning network, obtained features that satisfy spatial invariance, and classified them with an SVM. Wang et al.^[9] proposed pre-training the different convolutional layers of a CNN independently, retaining the network weight parameters, and adding batch normalization (BN) throughout, which greatly shortens the training time, although the recognition rate for traffic sign images still has room for improvement. Zhang et al.^[10] built a CNN that combines several pooling methods; without GPU acceleration, the network performs well in both training time and recognition rate. Zeng et al.^[11] used a CNN as the feature extractor and an extreme learning machine (ELM) with an added kernel function as the classifier, which greatly shortens the time needed to recognize traffic signs, but the features obtained by the network lack diversity, and the results for danger signs and end-of-prohibition signs in particular are far below those of other methods.

In summary, the problems of the prior art are:

(1) Existing deep-learning-based traffic sign recognition methods cannot give the model sufficient generalization to spatial transformations such as rotation, and extract too few features from small-target images, so traffic sign images are recognized with poor accuracy.

(2) Existing deep-learning-based traffic sign recognition methods use data augmentation (transformations such as translation, rotation and scaling by certain amounts). This increases sample diversity, but it also greatly increases the time needed to train the deep learning network on traffic signs. Although recognition accuracy can be ensured, fast and effective recognition of traffic signs is difficult to achieve.

(3) The neural networks selected by existing deep-learning-based traffic sign recognition methods are very deep and have many training parameters, so they cannot be applied effectively to traffic sign recognition.

Difficulty and significance of solving the above technical problems:

Traffic sign images are all small-target images. With the classical LeNet-5 network it is difficult to obtain enough features, which lowers the final recognition rate. The VGG network model solves this problem well, because VGG uses smaller convolution kernels to obtain image features, so the network can express more detail. The invention further improves the conventional VGG network to reduce the number of parameters required during training, so that traffic signs in an image can be recognized quickly while recognition accuracy is maintained. In addition, many traffic signs in public datasets are not frontal views but carry some transformation, so the original images need to be corrected. Conventional data augmentation can hardly apply an accurate, appropriate transformation to each traffic sign. The invention therefore constructs a spatial transformer network that corrects these images adaptively, which helps the deep network model express image features and complete the classification.

[1]Wu T,Ranganathan A.A practical system for road marking detection and recognition[C]//Intelligent Vehicles Symposium.IEEE,2012:25-30.

[2]Luo L,Li X.A Method to Search for Color Segmentation Threshold in Traffic Sign Detection.[C]//International Conference on Image and Graphics.IEEE,2010:774-777.

[3]Landesa-Vázquez I,Parada-Loira F,Alba-Castro J L.Fast real-time multiclass traffic sign detection based on novel shape and texture descriptors[C]//International IEEE Conference on Intelligent Transportation Systems.IEEE,2010:1388-1395.

[4]Bahlmann C,Zhu Y,Ramesh V,et al.A system for traffic sign detection,tracking,and recognition using color,shape,and motion information [C]//Intelligent Vehicles Symposium,2005.Proceedings.IEEE.IEEE,2005:255-260.

[5]Ellahyani A,Ansari M E,Jaafari I E.Traffic sign detection and recognition based on random forests[M].Elsevier Science Publishers B.V.2016.

[6]Sermanet P,Lecun Y.Traffic sign recognition with multi-scale Convolutional Networks[C]//International Joint Conference on Neural Networks.IEEE,2012:2809-2813.

[7]Jin J,Fu K,Zhang C.Traffic Sign Recognition With Hinge Loss Trained Convolutional Neural Networks[J].IEEE Transactions on Intelligent Transportation Systems,2014,15(5):1991-2000.

[8]Xie Jin,Cai Zixing,Deng Haitao,et al.Classification of Traffic Signs Based on Deep Learning of Image Invariant Features[J].Journal of Computer-Aided Design&Computer Graphics,2017,29(4):632-640.(in Chinese)

[9]Wang Xiaobin,Huang Jinjie,Liu Wenju.Traffic Sign Recognition Based on Optimized Convolutional Neural Network Structure[J].Journal of Computer Applications,2017,37(2):530-534.(in Chinese)

[10]Zhang J,Huang Q,Wu H,et al.A Shallow Network with Combined Pooling for Fast Traffic Sign Recognition[J].Information,2017,8(2):45.

[11]Huang Z,Yu Y,Gu J.A novel method for traffic sign recognition based on extreme learning machine[C]//Intelligent Control and Automation.IEEE,2015:1451-1456.

[12]Jaderberg M,Simonyan K,Zisserman A,et al.Spatial Transformer Networks[J].2015:2017-2025.

[13]Goodfellow I J,Warde-Farley D,Mirza M,et al.Maxout Networks[J] .Computer Science,2013:1319-1327.

[14]Simonyan K,Zisserman A.Very Deep Convolutional Networks for Large-Scale Image Recognition[J].Computer Science,2014.

[15]He K,Zhang X,Ren S,et al.Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition[J].IEEE Transactions on Pattern Analysis&Machine Intelligence,2015,37(9):1904-1916.

[16]Srivastava N,Hinton G,Krizhevsky A,et al.Dropout:a simple way to prevent neural networks from overfitting[J].Journal of Machine Learning Research,2014,15(1):1929-1958.

[17]Dan C,Meier U,Masci J,et al.A committee of neural networks for traffic sign classification[C]//International Joint Conference on Neural Networks.IEEE,2011:1918-1921.

[18]Stallkamp J,Schlipsing M,Salmen J,et al.Man vs.computer: benchmarking machine learning algorithms for traffic sign recognition.[J] .Neural Networks,2012,32(2):323-332.

[19]Berkaya S K,Gunduz H,Ozsen O,et al.On circular traffic sign detection and recognition[J].Expert Systems with Applications,2016,48:67-75.

[20]Zaklouta F,Stanciulescu B,Hamdoun O.Traffic sign classification using K-d trees and Random Forests[C]//International Joint Conference on Neural Networks.IEEE,2011:2151-2155.

Summary of the invention

In view of the problems of the prior art, the present invention provides a traffic sign recognition method and system based on a deep spatial network.

The invention is realized as follows. A traffic sign recognition method based on a deep spatial network comprises:

First, the traffic sign image is fed into a spatial transformer network. A localization network produces the affine transformation matrix of the image, and backpropagation through the neural network is used to obtain the optimal affine transformation matrix, yielding an image with more spatial invariance;

Then, an improved VGG network extracts features of the transformed image at different stages of the network, and the features of the different stages are fused to increase the diversity of the traffic sign image features. The improved VGG network uses smaller convolution kernels to obtain more image detail, which helps the features to be expressed better in the deep learning network layers. The fused features pass through spatial pyramid pooling to yield a feature vector of a specified size, so that the features obtained by the network can be used to recognize traffic signs in different classifiers. At the same time, the image features are further enhanced and output as the features of the traffic sign;

Finally, a softmax classifier classifies the traffic sign.

Further, before the image is input to the network model for training, a spatial transformer network is added, which enhances the diversity of the spatial structure of the image and provides sufficient spatial feature information for the CNN to recognize traffic signs;

The spatial transformer network applies an affine transformation to the image. In the affine transformation, a 2 × 3 transformation matrix translates, scales and rotates the traffic sign image;

The transformation formula is expressed as

where θ denotes the parameters of the transformation matrix, (x, y) denotes a pixel of the image, and α denotes the state of the image after rotation. The image undergoes the spatial transformations of translation, scaling and rotation in turn;
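The formula referred to here is not reproduced in the text. Assuming the usual homogeneous-coordinate form of a 2 × 3 affine map, it could be written as follows, where (x′, y′) is an assumed name for the transformed pixel position and the entries θ_11 to θ_23 are the six matrix parameters. How α enters the original formula cannot be recovered from the text, so it is not shown.

```latex
\begin{pmatrix} x' \\ y' \end{pmatrix}
=
\begin{pmatrix}
\theta_{11} & \theta_{12} & \theta_{13} \\
\theta_{21} & \theta_{22} & \theta_{23}
\end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
```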

After preprocessing to remove the effect of illumination, the traffic sign image passes through a localization network containing multiple convolution and pooling layers, which yields the initial 2 × 3 transformation matrix A1. A sampling grid is built from the transformation matrix A1 to obtain the desired output image A2. Using the correspondence between the input and output images, bilinear interpolation is applied to the input image to produce the transformed image A3, which is the input to the recognition module;

The spatial transformer network comprises three parts: a localization network, a grid generator and a sampler;

In the localization network, the input is the traffic sign image U ∈ R^{H×W×C}, where W and H are the width and height of the input image and C is the number of channels. The spatial transformation layer outputs θ = f_loc(U). The structure of the localization network is as follows: the input image passes through the function f_loc, and the output of the localization network contains 6 neurons, which form the transformation matrix θ required for the image.

Further, the grid generator creates a sampling grid from the parameters obtained by the localization network. The grid is a set of points at which the input is sampled to produce the desired transformed output. The value corresponding to each output pixel is obtained by inverse transformation, and the transformation formula is as follows:

Here the two coordinate pairs in the formula denote the input and output coordinate points of the image respectively, and θ denotes the output of the localization network. The coordinate values are normalized so that x_i, y_i ∈ [-1, 1];
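The formula referred to here is not reproduced in the text. Assuming it follows the grid generator of the cited spatial transformer paper [12], the inverse mapping from each output (target) point to its input (source) point could be written as below. The superscripts s and t and the symbols T_θ and G_i are notation borrowed from that paper, not from this document.

```latex
\begin{pmatrix} x_i^{s} \\ y_i^{s} \end{pmatrix}
= T_\theta(G_i)
=
\begin{pmatrix}
\theta_{11} & \theta_{12} & \theta_{13} \\
\theta_{21} & \theta_{22} & \theta_{23}
\end{pmatrix}
\begin{pmatrix} x_i^{t} \\ y_i^{t} \\ 1 \end{pmatrix}
```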

The sampler relates each output value to the gray values of the pixels of the input image. The sampling grid and the input image are used to perform bilinear sampling, and the sampling formula is expressed as:

where U ∈ R^{H×W×C} denotes the input image, the transformed output image has size H′ × W′ × C, H′ and W′ are the height and width of the sampling grid, and C is the number of channels, the same as in the original input traffic sign image. The partial derivatives in the sampling formula can be computed, so the transformation matrix is updated continuously by backpropagation through the improved VGG network. This yields more spatial characteristics of the image and provides diverse features for the convolutional neural network (CNN) to recognize the image.
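The sampling formula is likewise not reproduced in the text. Assuming it is the bilinear sampling kernel of the cited spatial transformer paper [12], it could be written as below. The symbol V for the output image, the indices n, m and i, and the source coordinates (x_i^s, y_i^s) are notation borrowed from that paper.

```latex
V_i^{c} = \sum_{n=1}^{H} \sum_{m=1}^{W} U_{nm}^{c}\,
\max\bigl(0,\, 1 - |x_i^{s} - m|\bigr)\,
\max\bigl(0,\, 1 - |y_i^{s} - n|\bigr),
\qquad i \in [1, \dots, H'W'],\quad c \in [1, \dots, C]
```

Each factor is piecewise linear in the source coordinates, which is why gradients can flow back to the transformation parameters.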

Further, the feature extraction method based on the improved VGG comprises:

Multiple convolution kernels of size 3 × 3 are used in place of a 5 × 5 convolution kernel to extract the edge features of the image in the CNN. The input of the network is the grayscale image processed by the spatial transformer network. Three convolutional layers and one pooling layer form a VGG module, and there are 3 VGG modules in total. The numbers of convolution kernels in the network layers are 32, 32, 32, 32, 32, 32, 64, 64 and 64, and the feature sizes obtained at the different VGG module stages are 16 × 16 × 64, 8 × 8 × 64 and 4 × 4 × 64;

The features obtained by the different VGG modules are fused. Because the feature dimensions differ, a 2 × 2 pooling operation is applied to the features obtained by VGG1, downsampling them to 8 × 8 × 32 to give feature 1. Feature 1 is fused with the features obtained by VGG2 to give feature 2, of size 8 × 8 × (32 + 32) = 8 × 8 × 64. Feature 2 is downsampled and fused with the VGG3 features to give feature 3, of size 4 × 4 × (64 + 64) = 4 × 4 × 128. After the VGG module features are fused, more detail features are learned by the network. The fused features are then enhanced with pyramid pooling.

Further, pyramid pooling is applied to the fused features to enhance them further and obtain more sufficient target features for recognizing traffic sign images. Pyramid pooling uses pooling kernels of different scales to convert the feature maps produced by convolution into a feature vector of fixed size; here it is used to obtain feature information at different scales, enhancing the features and improving traffic sign recognition.

Another object of the present invention is to provide a computer program that implements the traffic sign recognition method based on a deep spatial network.

Another object of the present invention is to provide an information data processing terminal that implements the traffic sign recognition method based on a deep spatial network.

Another object of the present invention is to provide a computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to execute the traffic sign recognition method based on a deep spatial network.

Another object of the present invention is to provide a traffic sign recognition system based on a deep spatial network that implements the traffic sign recognition method based on a deep spatial network, the system comprising:

A spatially invariant image acquisition module, which feeds the traffic sign image into the spatial transformer network to obtain an image with more spatial invariance;

An image feature processing module, which uses the improved VGG network to extract features of the transformed image at different stages of the network and fuses the features of the different stages; the fused features pass through spatial pyramid pooling, which further enhances the image features, and are output as the features of the traffic sign;

A traffic sign classification module, which classifies the traffic sign with a softmax classifier.

Another object of the present invention is to provide traffic sign recognition equipment equipped with the traffic sign recognition system based on a deep spatial network.

In conclusion advantages of the present invention and good effect are as follows:

The invention proposes a kind of deep space network models to be applied to Traffic Sign Recognition.Spatial alternation network is added, The deficiency of the image information with space-invariance cannot effectively be extracted by making up CNN；Improve VGG network, each rank of converged network layer The feature that section is extracted handles fused characteristics of image using spatial pyramid method, obtains the spy of more Traffic Sign Images Reference breath.The experimental results showed that improving the accuracy of identification Traffic Sign Images on the basis of enhancing network characterization.

Brief description of the drawings

Fig. 1 is a flowchart of the traffic sign recognition method based on a deep spatial network provided by an embodiment of the present invention.

Fig. 2 is a structural schematic diagram of the STN provided by an embodiment of the present invention.

Fig. 3 is a diagram of the improved VGG network model provided by an embodiment of the present invention.

Fig. 4 is a diagram of the SPP model provided by an embodiment of the present invention.

Fig. 5 shows input images and the corresponding images output by the STN, provided by an embodiment of the present invention.

Fig. 6 compares the cost function before and after adding the STN, provided by an embodiment of the present invention.

Fig. 7 compares the training set recognition rate before and after adding the STN, provided by an embodiment of the present invention.

Fig. 8 is a schematic diagram of the traffic sign recognition system based on a deep spatial network provided by an embodiment of the present invention.

In the figures: 1, spatially invariant image acquisition module; 2, image feature processing module; 3, traffic sign classification module.

Specific embodiment

To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.

Existing deep-learning-based traffic sign recognition methods cannot give the model sufficient generalization to spatial transformations such as rotation and extract too few features from small-target images. To address these problems, a traffic sign recognition method based on a deep spatial transformer network is proposed. First, the traffic sign image is fed into a spatial transformer network to obtain an image with more spatial invariance. Then, an improved VGG network extracts features of the transformed image at different stages of the network, and the features of the different stages are fused. The fused features pass through spatial pyramid pooling, which further enhances the image features, and are output as the features of the traffic sign. Finally, a softmax classifier classifies the traffic sign. Experiments on a public dataset show that the method effectively improves traffic sign recognition.

The invention is further described below with reference to a detailed analysis.

1 Traffic sign recognition based on a spatial transformer network

The proposed traffic sign model based on a deep spatial network introduces the spatial transformer network (STN)^[12] into the CNN model. It adaptively applies a spatial transformation to the traffic sign image to obtain more spatially invariant feature information. The transformed image is the input to the convolutional neural network, where multiple convolution and pooling operations extract the features of the traffic sign image. The features obtained by the network layers are fused to obtain diverse features, a pyramid pooling model strengthens the image features, and the result is passed to the fully connected layers to complete the recognition of the image. The overall framework is shown in Fig. 1.

1.1 Image diversity enhancement based on the spatial transformer network

Traffic sign images are affected by lighting conditions, motion blur and different shooting angles, so the spatial structure of the image changes and its features become more varied. A conventional CNN model can recognize some of these images, but it does not solve the problem of spatial invariance; for example, some 60 km/h speed limit signs that are rotated by a certain angle cannot be recognized effectively. The invention proposes adding a spatial transformer network before the image is input to the network model for training, which enhances the diversity of the spatial structure of the image and provides sufficient spatial feature information for the CNN to recognize traffic signs. In essence, the spatial transformer network applies an affine transformation to the image, in which a 2 × 3 transformation matrix translates, scales and rotates the traffic sign image, among other operations.

The transformation formula can be expressed as

where θ denotes the parameters of the transformation matrix, (x, y) denotes a pixel of the image, and α denotes the state of the image after rotation. The image undergoes the spatial transformations of translation, scaling and rotation in turn.
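As an illustration of such a 2 × 3 matrix, the following NumPy sketch composes a translation, a scaling and a rotation into a single affine matrix and applies it to pixel coordinates. The function names, the parameter names (tx, ty, s, angle) and the exact composition order are assumptions made for this example and are not taken from the formula above.

```python
import numpy as np

def affine_matrix(tx, ty, s, angle):
    """Build a 2x3 affine matrix: translate, then scale, then rotate (assumed order)."""
    translate = np.array([[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]])
    scale = np.diag([s, s, 1.0])
    c, si = np.cos(angle), np.sin(angle)
    rotate = np.array([[c, -si, 0.0], [si, c, 0.0], [0.0, 0.0, 1.0]])
    # Matrices act right-to-left, so the translation is applied first.
    full = rotate @ scale @ translate
    return full[:2, :]  # drop the homogeneous row to get the 2x3 form

def transform_points(theta, points):
    """Apply a 2x3 affine matrix to an (N, 2) array of (x, y) points."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return homogeneous @ theta.T

theta = affine_matrix(tx=1.0, ty=0.0, s=2.0, angle=np.pi / 2)
moved = transform_points(theta, np.array([[0.0, 0.0], [1.0, 1.0]]))
```

In a spatial transformer the six entries of the matrix are not hand-built like this but predicted by the localization network and learned by backpropagation.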

After preprocessing to remove the effect of illumination, the traffic sign image passes through a localization network containing multiple convolution and pooling layers, which yields the initial 2 × 3 transformation matrix A1. A sampling grid is built from the transformation matrix A1 to obtain the desired output image A2. Using the correspondence between the input and output images, bilinear interpolation is applied to the input image to produce the transformed image A3, which is the input to the recognition module. The spatial transformer network mainly comprises three parts: a localization network, a grid generator and a sampler. Fig. 2 shows the structure of the STN.

In the localization network, the input is the traffic sign image U ∈ R^{H×W×C}, where W and H are the width and height of the input image and C is the number of channels. The spatial transformation layer then outputs θ = f_loc(U). The structure of the localization network is similar to that of a convolutional neural network, containing convolutional layers, pooling layers, fully connected layers and so on. The difference is that when the input image passes through the function f_loc, the output of the localization network contains 6 neurons, which form the transformation matrix θ required for the image.

The grid generator creates a sampling grid from the parameters obtained by the localization network. The grid is a set of points at which the input is sampled to produce the desired transformed output. The value corresponding to each output pixel is obtained by inverse transformation, and the transformation formula is as follows:

Here the two coordinate pairs in the formula denote the input and output coordinate points of the image respectively, and θ denotes the output of the localization network. To simplify the interpolation performed by the sampler, the coordinate values are normalized so that x_i, y_i ∈ [-1, 1].
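The grid generator step can be sketched in NumPy as below. This is a minimal illustration, not the embodiment's code: the function name affine_grid, the 4 × 4 output size and the identity parameters standing in for the six outputs of the localization network are all assumptions.

```python
import numpy as np

def affine_grid(theta, out_h, out_w):
    """Return source coordinates of shape (out_h, out_w, 2) for a 2x3 matrix theta."""
    # Normalized target coordinates in [-1, 1], one pair per output pixel.
    xt, yt = np.meshgrid(np.linspace(-1.0, 1.0, out_w),
                         np.linspace(-1.0, 1.0, out_h))
    target = np.stack([xt, yt, np.ones_like(xt)], axis=-1)  # (out_h, out_w, 3)
    # Inverse mapping: each output pixel asks where to read in the input.
    return target @ theta.T  # last axis holds (x_source, y_source)

# Six values as they might come from the localization network (identity here).
loc_output = np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0])
theta = loc_output.reshape(2, 3)
grid = affine_grid(theta, out_h=4, out_w=4)
```

With the identity parameters the source grid equals the target grid, so the sampler would reproduce the input unchanged.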

The sampler relates each output value to the gray values of the pixels of the input image. The sampling grid and the input image are used to perform bilinear sampling, and the sampling formula is expressed as:

where U ∈ R^{H×W×C} denotes the input image, the transformed output image has size H′ × W′ × C, H′ and W′ are the height and width of the sampling grid, and C is the number of channels, the same as in the original input traffic sign image. Because the partial derivatives in the sampling formula can be computed, the transformation matrix is updated continuously by backpropagation from the recognition module. This yields more spatial characteristics of the image, provides diverse features for the convolutional neural network to recognize the image, and raises the recognition rate for traffic sign images.
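A bilinear sampler of this kind can be sketched in NumPy as follows. The function name and the test image are assumptions for illustration, and coordinates that fall outside the image are clamped to the border here, which is a simplification chosen for the sketch rather than something stated in the text.

```python
import numpy as np

def bilinear_sample(image, grid):
    """Sample image (H, W, C) at normalized grid coords (H', W', 2) -> (H', W', C)."""
    h, w, _ = image.shape
    # Map normalized coordinates in [-1, 1] to pixel positions.
    x = (grid[..., 0] + 1.0) * (w - 1) / 2.0
    y = (grid[..., 1] + 1.0) * (h - 1) / 2.0
    x0 = np.clip(np.floor(x).astype(int), 0, w - 1)
    y0 = np.clip(np.floor(y).astype(int), 0, h - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    y1 = np.clip(y0 + 1, 0, h - 1)
    # Fractional offsets act as interpolation weights (clamped at the border).
    wx = np.clip(x - x0, 0.0, 1.0)[..., None]
    wy = np.clip(y - y0, 0.0, 1.0)[..., None]
    top = image[y0, x0] * (1 - wx) + image[y0, x1] * wx
    bottom = image[y1, x0] * (1 - wx) + image[y1, x1] * wx
    return top * (1 - wy) + bottom * wy

image = np.arange(16, dtype=float).reshape(4, 4, 1)
ys, xs = np.meshgrid(np.linspace(-1, 1, 4), np.linspace(-1, 1, 4), indexing="ij")
identity_grid = np.stack([xs, ys], axis=-1)
resampled = bilinear_sample(image, identity_grid)
center = bilinear_sample(image, np.zeros((1, 1, 2)))
```

Sampling on the identity grid returns the input image, and sampling at the normalized origin averages the four central pixels.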

1.2 Feature extraction based on the improved VGG

In a CNN model, convolutional layers and pooling layers alternate to extract image features, and the fully connected layers classify the traffic sign image. A convolutional layer extracts image features by convolution. Image features express the relationship between a pixel and its neighbours, and they are obtained by a convolution that weights adjacent pixels. The traffic sign image is filtered by convolution kernels and mapped by an activation function to generate the output feature maps. The convolution operation can be expressed as

where the formula relates the i-th input of network layer l to the i-th output of network layer l, g denotes the activation function, M_i denotes the i-th selected set of inputs, and * denotes convolving the kernel w with the associated feature maps of layer l-1; the results are then summed and a bias parameter b is added.
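The convolution formula itself is not reproduced in the text. A common way of writing a convolutional layer that matches the description above is sketched below. The index names i and j, and the subscripted kernel w_ij and bias b_j, are assumptions chosen for the sketch.

```latex
x_j^{l} = g\Bigl( \sum_{i \in M_j} x_i^{l-1} * w_{ij}^{l} + b_j^{l} \Bigr)
```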

A pooling layer generally follows a convolutional layer and alternates with it, and together they form the feature extractor for the image. Training directly on the feature maps produced by convolution as feature vectors would incur a huge computational cost, so pooling is needed to reduce the dimension of the feature maps. Goodfellow et al.^[13] proposed using the max pooling function as the activation function of the pooling layer: during pooling, the maximum value within the receptive field is taken directly as the pooling result. This kind of pooling gives the CNN model better noise robustness and also reduces the feature dimension.
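Max pooling over 2 × 2 windows can be illustrated with a short NumPy sketch. The function name and the toy 4 × 4 feature map are assumptions for the example, and the sketch requires even height and width.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on an (H, W, C) array; H and W must be even."""
    h, w, c = feature_map.shape
    # Split each spatial axis into (blocks, 2) and take the max inside each block.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2, c)
    return blocks.max(axis=(1, 3))

fmap = np.arange(16, dtype=float).reshape(4, 4, 1)
pooled = max_pool_2x2(fmap)
```

The 4 × 4 map is reduced to 2 × 2, keeping only the largest value of each window.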

Because traffic sign images are small-target images, choosing large convolution kernels for the convolutional layers of a CNN model loses many important detail features and harms target recognition. The invention proposes an improved VGG^[14] network model in which multiple 3 × 3 convolution kernels replace a 5 × 5 convolution kernel, compensating for the incomplete extraction of image edge features by the CNN. Small convolution kernels have two advantages: they reduce the number of parameters the neural network has to learn, and they apply more nonlinear mappings, which increases the fitting and expressive capacity of the network. The improved VGG model is shown in Fig. 3. The input of the network is the grayscale image processed by the spatial transformer network. Three convolutional layers and one pooling layer form a VGG module, and there are 3 VGG modules in total. The numbers of convolution kernels in the network layers are 32, 32, 32, 32, 32, 32, 64, 64 and 64, and the feature sizes obtained at the different VGG module stages are 16 × 16 × 64, 8 × 8 × 64 and 4 × 4 × 64.
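The parameter saving from small kernels can be checked with simple arithmetic. The sketch below compares two stacked 3 × 3 layers, which together cover a 5 × 5 receptive field, with a single 5 × 5 layer. The helper name and the choice of 32 input and output channels are assumptions for the example.

```python
def conv_params(kernel, in_ch, out_ch):
    """Weights plus biases of one convolutional layer."""
    return kernel * kernel * in_ch * out_ch + out_ch

channels = 32
two_3x3 = 2 * conv_params(3, channels, channels)  # two stacked 3x3 layers
one_5x5 = conv_params(5, channels, channels)      # one 5x5 layer
```

Under these assumptions the stacked 3 × 3 layers need 18,496 parameters against 25,632 for the 5 × 5 layer, and they apply two nonlinearities instead of one.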

Reference [6] fuses the features of different layers of a convolutional neural network, which makes image features easier to express and effectively raises the recognition rate for traffic sign images. Here, the features obtained by the different VGG modules are fused. Because the feature dimensions differ, a 2 × 2 pooling operation is applied to the features obtained by VGG1, downsampling them to 8 × 8 × 32 to give feature 1. Feature 1 is fused with the features obtained by VGG2 to give feature 2, of size 8 × 8 × (32 + 32) = 8 × 8 × 64. Feature 2 is downsampled and fused with the VGG3 features to give feature 3, of size 4 × 4 × (64 + 64) = 4 × 4 × 128. After the VGG module features are fused, more detail features are learned by the network. In the next step, the fused features are enhanced with pyramid pooling to improve the recognition rate of images in the deep network model.
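The shape bookkeeping of this fusion can be traced with a small NumPy sketch. It assumes that fusion means concatenation along the channel axis, that downsampling is 2 × 2 max pooling, and that the three stages output 16 × 16 × 32, 8 × 8 × 32 and 4 × 4 × 64 as the fusion arithmetic above implies. The random arrays stand in for real feature maps.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an (H, W, C) array."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

rng = np.random.default_rng(0)
vgg1 = rng.standard_normal((16, 16, 32))  # placeholder for VGG1 output
vgg2 = rng.standard_normal((8, 8, 32))    # placeholder for VGG2 output
vgg3 = rng.standard_normal((4, 4, 64))    # placeholder for VGG3 output

feature1 = max_pool_2x2(vgg1)                                       # 8 x 8 x 32
feature2 = np.concatenate([feature1, vgg2], axis=-1)                # 8 x 8 x 64
feature3 = np.concatenate([max_pool_2x2(feature2), vgg3], axis=-1)  # 4 x 4 x 128
```

The final array has the 4 × 4 × 128 shape that is passed on to pyramid pooling.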

1.3 traffic sign characteristic strengthenings

Spatial pyramid pond method (Spatial Pyramid Pooling, SPP)^[15]Be proposed to solve it is defeated The problem of entering graphical rule variability, network with full articulamentum during connecting, using pyramid pond method, so that appointing The characteristic pattern of meaning size can be converted into the feature vector of specified dimension.Assuming that CNN network model, which obtains characteristic pattern, to be drawn It is divided into 2²×2²Then a characteristic area carries out characteristic area using three kinds of different size of scales (4 × 4,2 × 2,1 × 1) It divides, available 16+4+1=21 block, the maximum value in each region is acquired using maximum pond method, thus may be used in total To obtain the feature vector of 21 fixed dimensions.However select that fixed scale size divides characteristic pattern is nonoverlapping piece of area The marginal information of image is lost in domain, is unfavorable for the identification of overall profile information, leads to the decline of recognition accuracy.

The present invention uses pyramid pooling to further enhance the fused feature and obtain more complete target features for recognizing traffic sign images. Fig. 4 shows the spatial pyramid pooling of the fused feature used in the experiments. The fused feature from the VGG network layers is 4 × 4 × 128, where 128 is the number of channels. The fused feature is pooled at 3 different scales using max pooling, which gives feature vectors of dimension 1, 4 and 16 respectively, 21 dimensions in total per channel. The feature vector length is therefore 21 × 128 = 2688, compared with 2048 for the original fused feature, so more feature information is available for expression in the network model.

Using SPP adapts effectively to variation in target scale and captures image edge information more flexibly. Overlapping pooling strengthens the correlation between a boundary and its adjacent regions and blurs the boundaries between image blocks, so that edge pixels also contribute feature information, which helps extract and recognize the overall contour of traffic sign images. Applying pyramid pooling to the fused feature yields more feature information about the traffic sign image, which is passed to the fully connected layer; a softmax classifier completes the final classification. A dropout module^[16] is added to the fully connected part to prevent overfitting.

The invention is further described below with reference to experimental results and analysis.

2 Experimental results and analysis

The experiments were run on an HP Pavilion notebook with a 64-bit Windows 10 operating system, a 2.6 GHz CPU, 8 GB of memory and an NVIDIA GTX 960M graphics card, using the TensorFlow deep learning framework to train on the data.

2.1 Data set

The experiments use the public German Traffic Sign Recognition Benchmark (GTSRB), which has 39209 training images and 12630 test images covering 43 classes of traffic sign. Each original color image contains one traffic sign with a border of about 10% around it. Image sizes range from 15 × 15 to 250 × 250 pixels and the images are not necessarily square. The network is trained on grayscale images, because their recognition rate is very close to that of color images while training completes faster, which improves recognition efficiency. A deep learning network takes target images of identical size, so the traffic sign images are scaled to 32 × 32.
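The two input-preparation steps named here, grayscale conversion and scaling to 32 × 32, can be sketched as follows. The text does not say which grayscale weights or which interpolation were used; the BT.601 luminance weights and nearest-neighbour resizing below are assumptions chosen for brevity, and the function names are illustrative.

```python
def to_gray(rgb_image):
    """Convert rows of (r, g, b) tuples to luminance (BT.601 weights, assumed)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def resize_nearest(image, out_h=32, out_w=32):
    """Nearest-neighbour resize of a 2-D list; handles non-square inputs."""
    in_h, in_w = len(image), len(image[0])
    return [[image[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]


# Hypothetical 15 x 20 color image filled with a single color.
rgb = [[(255, 0, 0)] * 20 for _ in range(15)]
small = resize_nearest(to_gray(rgb))
print(len(small), len(small[0]))  # 32 32
```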

2.2 Network model design

The traffic sign recognition algorithm based on the deep space network first requires a network model for extracting image features. To extract features from traffic sign images, a convolutional neural network with VGG modules is chosen, called the VCNN model. A VGG module uses three convolution kernels of size 3 × 3 in place of a single 5 × 5 convolution kernel, which compensates for the CNN's incomplete extraction of image edge features. A local response normalization (LRN) layer is added to prevent overfitting. The network model used is shown in Table 1; the deep learning network contains 9 convolutional layers, 3 pooling layers, 3 LRN layers and 2 fully connected layers.
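The trade-off behind replacing a large kernel with stacked 3 × 3 kernels can be checked with a little arithmetic. The sketch below assumes stride-1 convolutions with equal input and output channel counts and ignores biases; the helper names are illustrative and not part of the described network.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Receptive field of stacked stride-1 convolutions with one kernel size."""
    return num_layers * (kernel - 1) + 1


def conv_weights(kernel, in_ch, out_ch, num_layers=1):
    """Weight count (no biases) of num_layers convolutions of equal width."""
    return num_layers * kernel * kernel * in_ch * out_ch


print(stacked_receptive_field(2))   # 5
print(stacked_receptive_field(3))   # 7
print(conv_weights(3, 32, 32, num_layers=2), conv_weights(5, 32, 32))  # 18432 25600
```

Under these assumptions two stacked 3 × 3 layers already span a 5 × 5 window and three span 7 × 7, in each case with fewer weights than a single kernel of the same span.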

Table 1 Network model used by the present invention

Before images are fed to the network model for training, the influence of different lighting conditions on traffic sign recognition must be removed. The contrast enhancement methods proposed in document^[6] are used, mainly three preprocessing methods: grayscale transformation, histogram equalization and contrast-limited adaptive histogram equalization (CLAHE), which reduce the influence of illumination on recognition. The original data set Data1 is processed by grayscale transformation, histogram equalization and CLAHE to give data sets Data2, Data3 and Data4 respectively. The 4 data sets are each trained on the CNN model and the VCNN model. Table 2 compares the recognition results of the data sets under the two network models; the initial learning rate is set to 0.001 and a dropout module is added to the fully connected layer to reduce overfitting.

Table 2 Comparison of the CNN model and the VCNN model

The experimental results show that the VCNN model outperforms the CNN model on all 4 data sets. On Data2 the accuracy of traffic sign recognition improves by up to 2.2%, showing that the VCNN model with small convolution kernels captures more image detail features and improves both the effectiveness of training and the accuracy of recognition. The highest recognition rate, 96.87%, is obtained on Data4, showing that CLAHE effectively removes redundant information from traffic sign images.
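Of the preprocessing steps named in this section, global histogram equalization is the simplest to illustrate. The sketch below is a plain textbook version for 8-bit gray levels, not the implementation used in the experiments, and it does not cover CLAHE, which additionally works on local tiles with a clipped histogram.

```python
def equalize_histogram(gray, levels=256):
    """Global histogram equalization of a 2-D list of integer gray levels."""
    flat = [p for row in gray for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    # Cumulative distribution of gray levels.
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(flat)
    if n == cdf_min:          # constant image: nothing to stretch
        return [row[:] for row in gray]
    scale = (levels - 1) / (n - cdf_min)
    return [[round((cdf[p] - cdf_min) * scale) for p in row] for row in gray]


dark = [[50, 50, 51], [51, 52, 52], [53, 53, 54]]  # low-contrast toy patch
bright = equalize_histogram(dark)
print(bright[0][0], bright[2][2])  # darkest level maps to 0, brightest to 255
```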

2.3 STN increases image diversity

An STN is added in front of the VCNN network to apply a spatial transformation to each image adaptively, giving traffic sign features with spatial invariance. A localization network is designed inside the STN, as shown in Table 3. It consists of convolutional layers, pooling layers and fully connected layers; the last fully connected layer outputs 6 neurons, which correspond to the parameters of the affine transformation matrix. Back-propagation through the neural network continually updates the transformation matrix, which increases the diversity of traffic sign feature representations and improves recognition accuracy.

Table 3 Localization network model

Fig. 5 compares test images from the Data4 data set before and after they pass through the STN: the first row shows the test images and the second row the images after the STN transformation. As the spatial transformer network learns, the geometric noise and background information of the image are gradually removed and only the region of interest of the input image is retained. The result is fed to the VCNN network, which completes the traffic sign recognition.
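How the six outputs of the localization network steer the transformation can be sketched by generating the sampling grid directly. The code assumes the usual spatial transformer convention of target coordinates normalized to [-1, 1] and a 2 × 3 matrix laid out row by row; the function name and the example matrices are illustrative.

```python
def affine_grid(theta, out_h, out_w):
    """Map each normalized output coordinate to a source coordinate.

    theta is the 2 x 3 matrix [[a, b, tx], [c, d, ty]] built from the six
    outputs of the localization network; out_h and out_w must exceed 1.
    """
    grid = []
    for i in range(out_h):
        # Normalized target coordinates in [-1, 1].
        y_t = -1.0 + 2.0 * i / (out_h - 1)
        row = []
        for j in range(out_w):
            x_t = -1.0 + 2.0 * j / (out_w - 1)
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            row.append((x_s, y_s))
        grid.append(row)
    return grid


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
zoom_in = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]]   # samples only the central half
print(affine_grid(identity, 3, 3)[0][0])   # (-1.0, -1.0)
print(affine_grid(zoom_in, 3, 3)[0][0])    # (-0.5, -0.5)
```

A matrix like `zoom_in` reads only the central part of the input, which is one way a learned transform can discard background and keep the region of interest.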

Fig. 6 and Fig. 7 compare the cost function and the recognition rate of the VCNN network before and after the STN is added. Fig. 6(a) shows that with the STN the network model converges faster, because image noise and complex background information are removed; the network model is optimized and reaches a higher traffic sign recognition rate at a lower cost value. Fig. 6(b) shows a final cost value of 0.256 for the VCNN+STN network against 0.275 for the VCNN network, further indicating that VCNN+STN completes traffic sign recognition at a lower cost value. Fig. 7(a) shows that with the STN the network obtains more features with spatial invariance, which helps the model express traffic sign features and improves its recognition accuracy. Fig. 7(b) shows a training-set recognition rate of 99.5% for VCNN+STN against 99.1% for VCNN, an improvement of 0.4%. On the test set, the model with the STN reaches 97.45% on Data4 against 96.87% for VCNN, an improvement of 0.58%. The spatial transformer network therefore enhances the diversity of image spatial structure and effectively raises the recognition rate of traffic signs, as the following table shows.

Table 4 Traffic sign recognition effect before and after STN network

The table above compares the deep learning network before and after the STN is added. With the STN, the recognition rate improves on both the training set and the test set of traffic signs: by 0.4% on the training set and by 0.58% on the test set, showing that adding the STN gives a higher traffic sign recognition rate. In terms of loss, the network with the STN reaches its recognition rate at a smaller loss value, showing that the improved network recognizes traffic signs more quickly and with better results.

2.4 SPP feature enhancement

Because traffic sign images are small-target images, the present invention uses SPP to enhance traffic sign features. Document^[6] combines features from different stages of the network layers and effectively improves the traffic sign recognition rate. The features of different network layers are therefore combined to give a feature of dimension 4 × 4 × 128. Pyramid pooling is applied to the combined feature over 3 feature-region sizes, 4 × 4, 2 × 2 and 1 × 1, giving a feature vector of 16 + 4 + 1 = 21 dimensions per channel, so the feature vector length is 21 × 128 = 2688. The enhanced feature is fed to the fully connected layer and a softmax classifier completes traffic sign recognition. A dropout module is added to prevent the network from overfitting. When dropout = 0.1, most neurons are suppressed and the network cannot recognize traffic signs effectively; when dropout = 1, all neurons are active and overfitting occurs easily, which degrades recognition. To evaluate different dropout values, the VCNN+STN+SPP network is iterated 20 times for each value and the effect on traffic sign recognition is compared, as shown in Table 4. When dropout = 0.4 the recognition rate is highest, reaching 98.42%, which is 0.97% higher than VCNN+STN; when dropout = 0.1 the recognition rate is lowest, 97.54%, which is still 0.09% higher than VCNN+STN. Pyramid pooling therefore effectively enhances image features and further improves the accuracy of traffic sign recognition.
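The description of dropout = 1 as "all neurons active" suggests that the dropout value here is a keep probability. Under that reading, which is an assumption, a minimal inverted-dropout sketch looks like this; the function and variable names are illustrative.

```python
import random


def dropout(activations, keep_prob, rng):
    """Inverted dropout: keep each unit with probability keep_prob and rescale."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)
acts = [1.0] * 1000
for keep_prob in (0.1, 0.4, 1.0):
    kept = sum(1 for v in dropout(acts, keep_prob, rng) if v != 0.0)
    print(keep_prob, kept)  # roughly keep_prob * 1000 units stay active
```

With a keep probability of 0.1 only about a tenth of the units carry signal in each step, which is consistent with the weak result reported for that setting.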

Table 4 Influence of different dropout values on feature enhancement

Table 5 and Table 6 compare the proposed method with the results of current mainstream algorithms. Table 5 compares it with 4 algorithms: the Committee of CNNs method^[17], which took first place in the public German traffic sign benchmark contest, the Multi-scale CNN^[6] method, the Human average^[18] method and the GABOR+LBP+HOG^[19] method. On the GTSRB data set, the classification accuracy of the present invention is higher than that of the multi-scale CNN method and of the GABOR+LBP+HOG multi-feature fusion method, and slightly lower than that of the Committee of CNNs method and of direct recognition by the human eye. The Committee of CNNs method recognizes traffic signs with multiple networks and needs 37 h of training time, which is especially time-consuming and greatly reduces real-time recognition efficiency during classification. The improved CNN network model proposed by the present invention needs 102.1 s to complete training and 6.2 s to recognize all the test images, so it can recognize traffic signs effectively in real time.

Table 5 Comparison with classic algorithms

To study the performance of the algorithm in more depth, Table 6 compares the recognition results of the proposed algorithm and other algorithms on traffic sign sub-datasets. The traffic sign data set is divided into 6 sub-datasets: speed limit signs (dataset1), other prohibitory signs (dataset2), derestriction signs (dataset3), mandatory signs (dataset4), danger signs (dataset5) and other signs (dataset6). The proposed algorithm has a higher recognition rate on 4 sub-datasets (speed limit signs, other prohibitory signs, mandatory signs and other signs), but a lower recognition rate than other classic algorithms on derestriction signs and danger signs. Derestriction signs are gray in the original images, so much edge information is lost after grayscale processing. Danger sign images are deformed by scale normalization, and signs of this kind have very similar shapes, so the network confuses them and misclassifies. The proposed method sets aside the effects of partial edge-information loss and image deformation: it uses the spatial transformer network to obtain diverse spatial features and adds the SPP network to strengthen network-layer features, further improving the traffic sign recognition rate.

Table 6 Recognition results on the different traffic sign sub-datasets

The invention is further described below with reference to its effects.

In the traffic sign recognition method based on the deep space network, a spatial transformer network is introduced to increase the diversity of image spatial features, remove geometric noise and superfluous background information, and retain only the region of interest of the image. With the STN in place, the features of different network layers are fused and the SPP method is introduced: the fused feature is down-sampled with pooling kernels of different sizes to strengthen image features and give the network ample features for recognizing traffic signs, raising the traffic sign recognition rate.

Enhancing and capturing image features after scale normalization optimizes the network model, so that a higher traffic sign recognition rate is reached at a lower cost. However, grayscale conversion and scale normalization lose part of the edge information. In further work, images will be converted to other color spaces and scale-independent methods will be used to obtain more edge information and improve the accuracy of traffic sign recognition.

As shown in Fig. 8, an embodiment of the present invention provides a traffic sign recognition system based on a deep space network, comprising:

A spatial-invariance image acquisition module 1, which feeds traffic sign images into the spatial transformer network to obtain images with more spatial invariance;

An image feature processing module 2, which uses the improved VGG network to extract features of the transformed image at different stages of the network and fuses the features of the different stages; the fused feature passes through spatial pyramid pooling, which further enhances the image features, and is output as the feature of the traffic sign;

A traffic sign classification module 3, which classifies traffic signs with a softmax classifier.
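The classification step of module 3 can be sketched as a softmax over 43 class scores. This is a generic, numerically stable softmax rather than the trained classifier itself; the scores are made up and the function names are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return (predicted class index, probability) for one set of scores."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


scores = [0.0] * 43      # GTSRB has 43 classes; these scores are made up
scores[7] = 5.0
print(classify(scores)[0])   # 7
```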

The above embodiments may be implemented wholly or partly in software, hardware, firmware or any combination thereof. When implemented wholly or partly as a computer program product, the computer program product comprises one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the processes or functions described in the embodiments of the present invention are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wire (such as coaxial cable, optical fiber or digital subscriber line (DSL)) or wirelessly (such as infrared, radio or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk or magnetic tape), an optical medium (for example, a DVD) or a semiconductor medium (for example, a solid state disk (SSD)).

The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its protection scope.

Claims

1. A traffic sign recognition method based on a deep space network, characterized in that the traffic sign recognition method based on a deep space network comprises:

First, feeding the traffic sign image into a spatial transformer network, obtaining the affine transformation matrix of the image with a localization network, and obtaining the optimal affine transformation matrix through back-propagation of the neural network, so as to obtain images with more spatial invariance;

Then, using an improved VGG network to extract features of the transformed image at different stages of the network and fusing the features of the different stages, which increases the diversity of traffic sign image features; the improved VGG network uses smaller convolution kernels to obtain more image detail features for expression in the deep learning network layers; the fused feature passes through spatial pyramid pooling to obtain a feature vector of specified size, so that traffic signs can be recognized in different classifiers; at the same time, the image features are further enhanced and output as the feature of the traffic sign;

Finally, classifying the traffic signs with a softmax classifier.

2. The traffic sign recognition method based on a deep space network as claimed in claim 1, characterized in that a spatial transformer network is added before the image is input to the network model for training, which enhances the diversity of image spatial structure and provides ample spatial feature information for the CNN network to recognize traffic signs;

The spatial transformer network applies an affine transformation to the image; during the affine transformation, a transformation matrix of dimension 2 × 3 translates, scales and rotates the traffic sign image;

The transformation formula is expressed as

where θ denotes the parameters of the transformation matrix, (x, y) denotes a pixel of the image, α denotes the state of the image after rotation, and the image undergoes the spatial transformations of translation, scaling and rotation in turn;

After preprocessing to remove illumination effects, the traffic sign image passes through a localization network containing multiple convolution and pooling operations, giving an initial transformation matrix A1 of dimension 2 × 3; a sampling grid is built from the transformation matrix A1 to obtain the desired output image A2; using the correspondence between the input and output images, bilinear interpolation is applied to the input image, and the resulting transformed image A3 is the input of the recognition module;

In the localization network, the input is the traffic sign image U, where W and H are the width and height of the input image and C denotes the number of channels; the spatial transformation layer outputs θ = f_loc(U); the localization network is structured as follows: the input image passes through the function f_loc, and the output of the localization network comprises 6 neurons, which form the transformation matrix θ required for the image.
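As an illustration of how translation, scaling and rotation fit into a single 2 × 3 matrix, the sketch below builds one from a rotation angle, a uniform scale and a translation. The order of composition (rotate, scale, then translate) and the parameter names are assumptions made for the example; the claim itself does not fix them.

```python
import math


def affine_matrix(tx=0.0, ty=0.0, scale=1.0, alpha=0.0):
    """2 x 3 matrix for rotation by alpha, uniform scaling, then translation."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [[scale * c, -scale * s, tx],
            [scale * s,  scale * c, ty]]


def apply_affine(theta, x, y):
    """Apply a 2 x 3 affine matrix to the point (x, y)."""
    return (theta[0][0] * x + theta[0][1] * y + theta[0][2],
            theta[1][0] * x + theta[1][1] * y + theta[1][2])


theta = affine_matrix(tx=0.5, scale=2.0, alpha=math.pi / 2)
print(apply_affine(theta, 1.0, 0.0))  # approximately (0.5, 2.0)
```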

3. The traffic sign recognition method based on a deep space network as claimed in claim 2, characterized in that

The grid generator creates the sampling grid from the parameters obtained by the localization network; the grid is a set of points at which the input map is sampled to produce the desired transformed output; the value corresponding to each output pixel is obtained by inverse transformation, with the following transformation formula:

where the coordinate symbols in the formula denote the input and output coordinate points of the image respectively, and θ denotes the output of the localization network; the coordinate values are normalized so that x_i, y_i ∈ [-1, 1];

The sampler relates each output value to the gray values of all pixels of the input, and bilinear sampling is carried out using the sampling grid and the input image; the sampling formula is expressed as:

where the symbols in the formula denote the input image and the image after the output transformation respectively, H′ and W′ denote the height and width of the sampling grid, and C denotes the number of channels, which is the same as that of the original input traffic sign image.
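The bilinear sampling step can be sketched for a single-channel image as follows. This is a minimal illustration that assumes coordinates normalized to [-1, 1] and clamps neighbours at the image border; it is a standard bilinear interpolation and is not claimed to match the formula of the claim term by term.

```python
def bilinear_sample(image, x, y):
    """Sample a 2-D gray image at normalized coordinates x, y in [-1, 1]."""
    h, w = len(image), len(image[0])
    # Map normalized coordinates to pixel positions.
    px = (x + 1.0) * (w - 1) / 2.0
    py = (y + 1.0) * (h - 1) / 2.0
    x0 = min(max(int(px), 0), w - 1)
    y0 = min(max(int(py), 0), h - 1)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = px - x0, py - y0
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1 - fy) * top + fy * bottom


img = [[0.0, 10.0], [20.0, 30.0]]
print(bilinear_sample(img, 0.0, 0.0))    # 15.0, the centre of the four pixels
print(bilinear_sample(img, -1.0, -1.0))  # 0.0, the top-left pixel
```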

4. The traffic sign recognition method based on a deep space network as claimed in claim 1, characterized in that the feature extraction method based on the improved VGG comprises:

Using multiple convolution kernels of size 3 × 3 in place of a 5 × 5 convolution kernel to extract image edge features in the CNN network; the network input is the grayscale image processed by the spatial transformer network; 3 convolutional layers and 1 pooling layer form a VGG module, and there are 3 VGG modules in total; the numbers of convolution kernels of the network layers are 32, 32, 32, 32, 32, 32, 64, 64, 64, and the feature sizes obtained at the different stages of the VGG modules are 16 × 16 × 64, 8 × 8 × 64 and 4 × 4 × 64;

Fusing the features obtained by the different VGG modules: because the feature dimensions differ, the VGG1 feature is 2 × 2 pooled, down-sampling it to 8 × 8 × 32, to give feature 1, which is fused with the VGG2 feature to give feature 2, of size 8 × 8 × (32+32) = 8 × 8 × 64; feature 2 is down-sampled and fused with the VGG3 feature to give feature 3, of size 4 × 4 × (64+64) = 4 × 4 × 128. After the VGG module features are fused, the network learns the detail features; the fused feature is then enhanced by pyramid pooling.

5. The traffic sign recognition method based on a deep space network as claimed in claim 1, characterized in that pyramid pooling is used to further enhance the fused feature and obtain more complete target features for recognizing traffic sign images.

6. A computer program implementing the traffic sign recognition method based on a deep space network as claimed in any one of claims 1 to 5.

7. An information data processing terminal implementing the traffic sign recognition method based on a deep space network as claimed in any one of claims 1 to 5.

8. A computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to execute the traffic sign recognition method based on a deep space network as claimed in any one of claims 1 to 5.

9. A traffic sign recognition system based on a deep space network, implementing the traffic sign recognition method based on a deep space network described in claim 8, characterized in that the traffic sign recognition system based on a deep space network comprises:

A spatial-invariance image acquisition module, which feeds traffic sign images into the spatial transformer network to obtain images with more spatial invariance;

An image feature processing module, which uses the improved VGG network to extract features of the transformed image at different stages of the network and fuses the features of the different stages; the fused feature passes through spatial pyramid pooling, which further enhances the image features, and is output as the feature of the traffic sign;

10. A traffic sign recognition device equipped with the traffic sign recognition system based on a deep space network as claimed in claim 9.