CN110188705A - Long-range traffic sign detection and recognition method suitable for onboard systems - Google Patents

Long-range traffic sign detection and recognition method suitable for onboard systems

Info

Publication number
CN110188705A
CN110188705A (application CN201910474059.7A; granted publication CN110188705B)
Authority
CN
China
Prior art keywords
convolution
attention
sample
channel
traffic sign
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910474059.7A
Other languages
Chinese (zh)
Other versions
CN110188705B (en)
Inventor
刘志刚
杜娟
田枫
韩玉祥
高雅田
张可佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeast Petroleum University
Original Assignee
Northeast Petroleum University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeast Petroleum University filed Critical Northeast Petroleum University
Priority to CN201910474059.7A priority Critical patent/CN110188705B/en
Publication of CN110188705A publication Critical patent/CN110188705A/en
Application granted granted Critical
Publication of CN110188705B publication Critical patent/CN110188705B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V 20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V 20/582 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of traffic signs

Abstract

The present invention relates to a long-range traffic sign detection and recognition method suitable for onboard systems, comprising: 1. preprocessing a traffic sign image sample set; 2. constructing a lightweight convolutional neural network to extract the convolutional features of traffic signs; 3. building attention feature maps with a channel-spatial attention module embedded in the lightweight convolutional neural network; 4. generating target candidate regions with a region proposal network (RPN); 5. introducing context region information into the candidate regions generated by the RPN to enhance the classification features of the signs; 6. feeding the feature vectors into fully connected layers to output the class and position of each traffic sign; 7. establishing an attention loss function and training the FL-CNN model; 8. repeating steps 2 to 7 to complete training of the FL-CNN model on the sample set; 9. repeating steps 2 to 6 to perform traffic sign detection and recognition in real scenes. The present invention realizes long-range traffic sign detection and recognition with an accuracy of up to 92%.

Description

Long-range traffic sign detection and recognition method suitable for onboard systems
One, Technical Field:
The present invention relates to the field of intelligent transportation for autonomous and assisted driving and addresses the long-range detection and recognition of traffic signs; it relates in particular to a long-range traffic sign detection and recognition method suitable for onboard systems.
Two, Background Technique:
In the field of intelligent transportation, traffic sign detection and recognition is an important research problem for systems such as autonomous driving and assisted driving. Much research has been carried out at home and abroad, but significant shortcomings remain and existing methods cannot yet be applied in practice, for the following reasons: (1) traditional detection and recognition methods based on hand-designed features such as color and shape are not robust to sign deformation, motion blur, weather and similar conditions in real traffic scenes, and are therefore difficult to apply in practice; (2) for existing detection and recognition methods based on deep learning, the exported parameter files are huge and require large memory and disk storage at run time, so they cannot run directly on onboard systems with limited power supply and low hardware performance, and their practicality is poor; (3) in terms of data sets, some methods directly use self-collected data in which the quantity and variety of signs are limited, making it hard for the model to work in practice; other methods are based on the public German traffic sign data sets GTSRB and GTSDB, but the signs in these data sets are relatively large and the number of detected sign types is small. Since the data set has a significant effect on model performance, these methods currently belong to short-range detection and recognition.
The onboard systems used in autonomous and assisted driving are embedded systems with limited power supply and low hardware performance, and cannot directly run huge deep learning models. Meanwhile, detecting and recognizing signs at long range gives a moving vehicle more response time and plays an important role in improving the safety of intelligent driving. Technically, long-range traffic sign detection and recognition is a small-object recognition problem in complex backgrounds, which is a difficult problem in the current computer vision field, and existing methods struggle to achieve high detection and recognition accuracy.
Three, Summary of the Invention:
The object of the present invention is to provide a long-range traffic sign detection and recognition method suitable for onboard systems, which solves the problem that existing short-range detection and recognition methods have low accuracy when performing long-range traffic sign detection and recognition.
The technical solution adopted by the present invention to solve this technical problem is the following long-range traffic sign detection and recognition method suitable for onboard systems:
Step 1. Preprocess the traffic sign image sample set;
Step 2. Construct a lightweight convolutional neural network and extract the convolutional features of traffic signs;
(1) Using depthwise separable convolution, the joint channel-space mapping of the standard convolutions in the original VGG-16 is separated into two independent mappings, which reduces the number of parameters and the disk storage of the model. The lightweight convolutional neural network contains 5 convolutional layers in total, where each convolutional layer consists of a depthwise convolution and a pointwise convolution, with ReLU as the activation function;
(2) In the lightweight convolutional neural network, the computation of a depthwise separable convolution compares with that of the original standard convolution as follows:
Denote a convolution kernel as (D_K, D_K, C), where D_K is the width and height of the kernel and C is its number of channels. In the convolution computation, depthwise separable convolution converts the original N cross-channel standard convolutions of size (D_K, D_K, M) into M depthwise convolutions of size (D_K, D_K, 1) plus N pointwise convolutions of size (1, 1, M), where the depthwise convolution is computed per channel and the pointwise convolution is computed across channels. Denote the input feature map as {D_F, D_F, M} and the output feature map as {D_F, D_F, N}, where D_F is the width and height of the feature map. The computation of each convolution is then as follows:
1. Computation of the standard convolution: Count_s = D_K × D_K × M × N × D_F × D_F;
2. Computation of the depthwise convolution: Count_d = D_K × D_K × M × D_F × D_F;
3. Computation of the pointwise convolution: Count_p = M × N × D_F × D_F;
Therefore, the ratio of the computation of the depthwise separable convolution to that of the standard convolution is (Count_d + Count_p) / Count_s = 1/N + 1/D_K^2; each time a depthwise separable convolution replaces an original standard convolution, the computation is reduced to 1/N + 1/D_K^2 of the original.
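For illustration, the following is a minimal sketch of one such depthwise separable convolutional layer and of the computation ratio derived above, written in Python with PyTorch; the framework choice and the channel sizes in the example are assumptions for illustration only, not values prescribed by the patent.

```python
# Minimal sketch (assumes PyTorch); channel sizes below are illustrative only.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=3):
        super().__init__()
        # Depthwise convolution: one (D_K, D_K, 1) filter per input channel.
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size,
                                   padding=kernel_size // 2, groups=in_channels)
        # Pointwise convolution: N filters of size (1, 1, M) across channels.
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.pointwise(self.relu(self.depthwise(x))))

def cost_ratio(d_k, n_out):
    # (Count_d + Count_p) / Count_s = 1/N + 1/D_K^2
    return 1.0 / n_out + 1.0 / (d_k * d_k)

x = torch.randn(1, 64, 56, 56)            # example input feature map {D_F, D_F, M}
layer = DepthwiseSeparableConv(64, 128)    # M = 64, N = 128
print(layer(x).shape, cost_ratio(3, 128))  # roughly 0.12 of the standard convolution cost
```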
Step 3. Build attention feature maps with the channel-spatial attention module embedded in the lightweight convolutional neural network;
The attention feature map imitates the human attention mechanism: it enhances the convolutional features of small traffic signs in the scene and suppresses features of irrelevant background information, saving computing resources and improving detection accuracy. The channel-spatial attention module is embedded into the depthwise separable convolutional layers and applies feature attention in two dimensions, channel and space, to the output feature map of each depthwise separable convolutional layer. Channel attention exploits the correlation and importance between channels and focuses on "what" is most significant in an image; spatial attention focuses on the positions of targets in the image, i.e. "where" detection and recognition are most effective;
Let the output feature map of a depthwise separable convolutional layer be U = [u_1, u_2, ..., u_C], where u_i ∈ R^{H×W}, C is the number of channels, R^{H×W} denotes the set of real H × W matrices, and H and W are the height and width of the feature map respectively. After the channel-spatial attention module, the constructed attention feature map is Y = [y_1, y_2, ..., y_C], y_i ∈ R^{H×W}. The specific calculation process is as follows:
Step 3.1 Channel attention
To compute channel attention, the spatial dimension of each channel of the feature map is first compressed along the channel direction, and the spatial information is aggregated using three kinds of global pooling: maximum, average and stochastic. Maximum and average pooling preserve the texture and background features of the image respectively, while stochastic pooling lies between the two;
First, global max pooling compresses each u_i into a component of the channel attention mask S_max, defined as s_i^max = max_{h,w} u_i(h, w);
Then, global average pooling and global stochastic pooling compress each u_i into the channel attention mask components S_mean and S_sto respectively, defined as s_i^mean = (1 / (H × W)) Σ_{h,w} u_i(h, w), with s_i^sto obtained by sampling a value from u_i with probability proportional to its activation,
where i = 1, 2, ..., C;
Next, the channel attention mask components built by the three kinds of pooling are each fed into a multilayer perceptron; aggregation is completed through element-wise multiplication with the weight parameters, accumulation and the activation function, which further increases nonlinearity. The channel attention mask S = [s_1, s_2, ..., s_C] of the feature map U is defined as follows:
S = σ(W_1 δ(W_0 S_max) + W_1 δ(W_0 S_mean) + W_1 δ(W_0 S_sto))
where σ is the sigmoid function and δ is the ReLU function. W_0 and W_1 are the weights of the multilayer perceptron; these parameters are shared across the three channel attention mask components;
Finally, the channel attention mask is expanded back to the original input feature map U, and each channel of U is weighted according to the mask. The new feature map after channel attention mapping is X = [x_1, x_2, ..., x_C], x_i ∈ R^{H×W}, defined as x_i = s_i · u_i;
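The channel attention computation above can be sketched as follows; this assumes PyTorch, and the exact shapes of the shared weights W_0 and W_1 (implemented here as a shared two-layer perceptron without a reduction ratio) as well as the stochastic-pooling sampling scheme are illustrative assumptions.

```python
# Channel attention sketch (assumes PyTorch); W0/W1 shapes are illustrative.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.w0 = nn.Linear(channels, channels)   # shared MLP weights W_0
        self.w1 = nn.Linear(channels, channels)   # shared MLP weights W_1
        self.relu = nn.ReLU()

    def _stochastic_pool(self, u):
        # Sample one activation per channel with probability proportional to its value.
        b, c, h, w = u.shape
        flat = u.reshape(b * c, h * w).clamp(min=1e-6)
        idx = torch.multinomial(flat, 1)
        return flat.gather(1, idx).reshape(b, c)

    def forward(self, u):
        s_max = u.amax(dim=(2, 3))                       # global max pooling -> S_max
        s_mean = u.mean(dim=(2, 3))                      # global average pooling -> S_mean
        s_sto = self._stochastic_pool(u)                 # global stochastic pooling -> S_sto
        s = torch.sigmoid(self.w1(self.relu(self.w0(s_max)))
                          + self.w1(self.relu(self.w0(s_mean)))
                          + self.w1(self.relu(self.w0(s_sto))))   # S = sigma(...)
        return u * s.unsqueeze(-1).unsqueeze(-1)         # x_i = s_i * u_i
```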
Step 3.2 Spatial attention
To compute the spatial attention mapping and build the feature relations between pixels or regions in the feature map, the number of channels of the feature map is first compressed: a group of pointwise convolutions aggregates the original input feature map of the depthwise separable convolutional layer across channels, and the aggregated feature map is denoted M ∈ R^{H×W};
Next, since the feature maps of different layers have very different characteristics (shallow feature maps have higher resolution, while deep feature maps, by contrast, contain more abstract semantic features), when constructing spatial attention the present invention, in order to reduce parameters and computation, builds the spatial attention mask by region and by pixel for shallow and deep feature maps respectively; the spatial attention mask is N ∈ R^{H×W}, defined as follows:
N = Softmax(Conv(M, o, k, s, p))
where Conv(·) denotes the convolution operation, the output channel o = 1, the convolution kernel size is k = 1 for shallow convolutional layers and k = 3 for deep convolutional layers, and s = 1 and p = 0 are the stride and padding of the convolution respectively. In addition, to eliminate the influence of different feature map scales, the spatial attention mask is normalized with the Softmax function;
Step 3.3 Attention feature map
On the basis of the feature map X obtained after channel attention mapping, the spatial attention mask N ∈ R^{H×W} is extended: according to the spatial attention mask, spatial attention is applied to each channel of X, finally generating the output feature map Y of the depthwise separable convolutional layer, which serves as the input of the next depthwise separable convolutional layer and is defined as y_i = N ⊙ x_i,
where ⊙ denotes element-wise multiplication;
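Under the same assumptions, here is a minimal sketch of the spatial attention mask and of the combined channel-spatial attention module. The kernel size (1 for shallow layers, 3 for deep layers), stride 1 and padding 0 follow the description above; computing the mask from the channel-attended map and resizing a shrunken mask back to the feature-map size are illustrative simplifications not specified in the patent.

```python
# Spatial attention and combined module sketch (assumes PyTorch and the
# ChannelAttention class above); kernel_size=1 for shallow layers, 3 for deep layers.
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttention(nn.Module):
    def __init__(self, channels, kernel_size=1):
        super().__init__()
        self.squeeze = nn.Conv2d(channels, 1, kernel_size=1)            # pointwise cross-channel aggregation -> M
        self.conv = nn.Conv2d(1, 1, kernel_size, stride=1, padding=0)   # Conv(M, o=1, k, s=1, p=0)

    def forward(self, x):
        b, _, h, w = x.shape
        m = self.squeeze(x)                             # M in R^{H x W}
        n = self.conv(m)
        hh, ww = n.shape[2], n.shape[3]
        n = F.softmax(n.reshape(b, -1), dim=1).reshape(b, 1, hh, ww)   # Softmax normalization
        if (hh, ww) != (h, w):
            # With p=0 and k=3 the mask shrinks; resize back so it matches X (illustrative choice).
            n = F.interpolate(n, size=(h, w), mode="nearest")
        return x * n                                    # y_i = N (elementwise) x_i

class ChannelSpatialAttention(nn.Module):
    def __init__(self, channels, kernel_size=1):
        super().__init__()
        self.channel = ChannelAttention(channels)
        self.spatial = SpatialAttention(channels, kernel_size)

    def forward(self, u):
        return self.spatial(self.channel(u))            # U -> X -> Y
```

In this sketch the module's output Y would simply feed the next depthwise separable convolutional layer, as described above.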
Step 4. Generate target candidate regions with the region proposal network (RPN) on the basis of the attention feature map;
The regions in the traffic scene where traffic signs are likely to appear are located, and FL-CNN then classifies the signs according to these regions;
Step 5. Introduce context region information into the target candidate regions generated by the RPN to enhance the classification features of the signs;
Since a target candidate region produced in step 4 may contain only part of the features of a traffic sign, the spatially adjacent features of the candidate region are introduced to enhance the classification features of the sign (a sketch follows the steps below). The specific steps are as follows:
(1) For convenience of description, denote a target candidate region as p = (p_x, p_y, p_w, p_h), where (p_x, p_y) is the center of the region and (p_w, p_h) are its width and height. On the attention feature map output by the last depthwise separable convolution, context regions are created by applying scale factors to the width and height of the candidate region, with the same center coordinates as the corresponding target candidate region; the relationship between a context region and its candidate region is that the i-th context region keeps the center (p_x, p_y) while p_w and p_h are enlarged by the i-th pair of scale factors, where i is the index of the context region;
(2) For each target candidate region and its context regions, RoI-Pooling divides the region into 7 parts in both the horizontal and vertical directions and applies max pooling as a down-sampling step in each part, so that regions of different sizes still produce outputs of a consistent dimension, generating 3 feature vectors of fixed size 7 × 7 × 512;
(3) These are concatenated along the spatial dimension to form a 3 × 7 × 7 × 512 feature vector;
(4) A 1 × 1 convolution compresses the feature vector formed in (3) back to 7 × 7 × 512, so that the dimension of the feature vector after introducing the context regions meets the node requirement of the fully connected layer. The 1 × 1 convolution learns the nonlinear relation between background and target: when an introduced context region contains complex background, the convolution parameters suppress this background; conversely, if local features of the target are introduced, the convolution parameters enhance these features;
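A minimal sketch of the context-region pooling described in steps (1) to (4), assuming PyTorch and torchvision's roi_pool; the two context scale factors (1.5 and 2.0) and the channel-wise stacking of the three pooled maps before the 1 × 1 compression are illustrative assumptions, since the patent does not fix these values here.

```python
# Context-region pooling sketch (assumes PyTorch + torchvision);
# the scale factors 1.5 and 2.0 are illustrative, not taken from the patent.
import torch
import torch.nn as nn
from torchvision.ops import roi_pool

class ContextRoIPool(nn.Module):
    def __init__(self, channels=512, scales=(1.5, 2.0), output_size=7, spatial_scale=1.0 / 16):
        super().__init__()
        self.scales = scales
        self.output_size = output_size
        self.spatial_scale = spatial_scale
        # 1x1 convolution compressing the 3 stacked regions back to `channels` maps.
        self.compress = nn.Conv2d(channels * (1 + len(scales)), channels, kernel_size=1)

    def _scale_boxes(self, boxes, factor):
        # boxes: (K, 5) = (batch_idx, x1, y1, x2, y2); keep the center, scale width and height.
        cx, cy = (boxes[:, 1] + boxes[:, 3]) / 2, (boxes[:, 2] + boxes[:, 4]) / 2
        w, h = (boxes[:, 3] - boxes[:, 1]) * factor, (boxes[:, 4] - boxes[:, 2]) * factor
        return torch.stack([boxes[:, 0], cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], dim=1)

    def forward(self, feat, boxes):
        pooled = [roi_pool(feat, boxes, self.output_size, self.spatial_scale)]
        for s in self.scales:
            pooled.append(roi_pool(feat, self._scale_boxes(boxes, s),
                                   self.output_size, self.spatial_scale))
        stacked = torch.cat(pooled, dim=1)          # (K, 3*512, 7, 7)
        return self.compress(stacked)               # (K, 512, 7, 7)
```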
Step 6. Feed the feature vector into the fully connected layers used for classification and regression, and output the class and position of the traffic sign;
Two fully connected (Fully Connected, FC) networks are used. The first FC network is used for traffic sign classification, with 4096 hidden nodes and 44 output nodes; each output node represents one class of traffic sign, with values in (0, 1), and at classification time the maximum output node is taken as the traffic sign class. The second FC network is used for traffic sign position regression, with 4096 hidden nodes and 4 output nodes, representing the center coordinates, width and height of the traffic sign;
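A minimal sketch of the two fully connected heads described above, assuming PyTorch; the flattened 7 × 7 × 512 input follows the pooled feature size of step 5, and returning raw scores (with softmax applied at loss time) is an illustrative choice.

```python
# Classification and regression heads sketch (assumes PyTorch).
import torch.nn as nn

class FLCNNHeads(nn.Module):
    def __init__(self, in_features=7 * 7 * 512, num_classes=44):
        super().__init__()
        self.cls_head = nn.Sequential(                 # first FC network: 44-way classification
            nn.Flatten(), nn.Linear(in_features, 4096), nn.ReLU(),
            nn.Linear(4096, num_classes))
        self.reg_head = nn.Sequential(                 # second FC network: (cx, cy, w, h) regression
            nn.Flatten(), nn.Linear(in_features, 4096), nn.ReLU(),
            nn.Linear(4096, 4))

    def forward(self, x):
        return self.cls_head(x), self.reg_head(x)
```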
Step 7. Establish the attention loss function and train the FL-CNN model;
To ensure that the model is sufficiently trained and to improve its generalization ability, an attention loss function is established to effectively distinguish hard samples from easy ones, suppressing the loss of easy samples and enhancing the loss of hard samples. The training of the FL-CNN model consists of two parts, the RPN network and the fully connected network; the loss of the RPN network includes a binary classification loss and a regression loss, and the loss of the fully connected network includes a multi-class classification loss and a regression loss;
(1) The attention loss function of the RPN network takes the form L = (1/N_cls) Σ_i L_cls^F(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*), where:
p_i denotes the predicted probability that the i-th anchor is a target object and p_i* denotes the true label of the target; t_i is a vector containing the center coordinates, width and height of the predicted box, and t_i* is the information vector of the true box; N_cls is the total number of anchors, N_reg is the size of the feature map, λ is a balancing coefficient, L_reg is the regression loss over all bounding boxes, and L_cls^F is the attention binary classification loss proposed by the present invention, defined as follows:
L_cls^F = -σ(-Kx) log σ(x) for a foreground sample and L_cls^F = -σ(Kx) log σ(-x) for a background sample, where σ is the sigmoid function, -log σ(x) is the cross-entropy loss of a foreground sample with score x, -log σ(-x) is that of a background sample, and K is a constant. This loss function has the following property: if a sample is easy to classify, i.e. σ(x) → 1 for a foreground sample or σ(-x) → 1 for a background sample, then with K taken as a large value the modulation factor σ(-Kx) → 0 in the foreground sample loss and σ(Kx) → 0 in the background sample loss; if a sample is hard to classify, the modulation factors of the foreground and background sample losses tend to σ(-Kx) → 1 and σ(Kx) → 1 respectively. The attention loss function therefore effectively distinguishes hard and easy samples; by suppressing the loss of easy samples, it makes the learning and training of the RPN pay more attention to hard samples, guaranteeing that the RPN network is sufficiently trained;
(2) The attention loss function of the fully connected network is similar in form; its attention multi-class classification loss for a sample whose true class is k is L_cls^F = -δ(-Kx_k) log δ(x_k),
where δ is the softmax function. This loss is similar to that of the RPN, comprising a multi-class classification loss and a regression loss, and the attention multi-class classification loss behaves like the attention binary classification loss: when the prediction probability δ(x_k) of the true class tends to 1, the weight δ(-Kx_k) → 0; conversely, when the prediction probability of the true class tends to 0, the weight δ(-Kx_k) → 1;
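A minimal sketch of both attention classification losses, written directly from the definitions above and assuming PyTorch; the value K = 5 is an illustrative assumption, since the patent only states that K is a constant.

```python
# Attention loss sketches (assumes PyTorch); K=5 is illustrative, the patent only says K is a constant.
import torch
import torch.nn.functional as F

def attention_binary_loss(logits, labels, k=5.0):
    """RPN head: logits are raw anchor scores x, labels are 1 (foreground) / 0 (background)."""
    p_fg = torch.sigmoid(logits).clamp(min=1e-8)                  # sigma(x)
    p_bg = torch.sigmoid(-logits).clamp(min=1e-8)                 # sigma(-x)
    loss_fg = torch.sigmoid(-k * logits) * (-torch.log(p_fg))     # sigma(-Kx) * (-log sigma(x))
    loss_bg = torch.sigmoid(k * logits) * (-torch.log(p_bg))      # sigma(Kx) * (-log sigma(-x))
    return torch.where(labels == 1, loss_fg, loss_bg).mean()

def attention_multiclass_loss(logits, targets, k=5.0):
    """FC head: logits (N, 44) raw class scores, targets (N,) true class indices."""
    log_p = F.log_softmax(logits, dim=1)                          # log delta(x)
    weight = F.softmax(-k * logits, dim=1)                        # delta(-Kx)
    idx = targets.unsqueeze(1)
    return -(weight.gather(1, idx) * log_p.gather(1, idx)).mean() # -delta(-Kx_k) * log delta(x_k)
```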
Step 8. Repeat steps 2 to 7 to complete training of the FL-CNN model on the sample set;
Step 9. Start the color camera and photograph the actual traffic scene; before the scene is fed into the model it is preprocessed, with the resolution set to 2048 × 2048, and it is then input into the FL-CNN model; steps 2 to 6 are repeated to complete traffic sign detection and recognition in the actual scene.
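A minimal sketch of the step 9 preprocessing, assuming PIL and torchvision transforms; only the 2048 × 2048 resolution comes from the patent, the rest is illustrative glue code.

```python
# Step 9 preprocessing sketch (assumes PIL + torchvision).
from PIL import Image
import torchvision.transforms as T

preprocess = T.Compose([
    T.Resize((2048, 2048)),        # set the scene resolution to 2048 x 2048
    T.ToTensor(),
])

def prepare_scene(path):
    image = Image.open(path).convert("RGB")
    return preprocess(image).unsqueeze(0)   # (1, 3, 2048, 2048), ready to feed FL-CNN
```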
Step 1 in the above scheme is specifically:
(1) The Tsinghua-Tencent 100K data set jointly released by Tsinghua University and Tencent is used, and 44 classes of commonly used traffic signs are selected as the objects of long-range detection and recognition;
(2) The Tsinghua-Tencent 100K data set is divided into a training set and a test set at a ratio of 1:2;
(3) To guarantee sample balance when training the FL-CNN model, each class of traffic sign in the training set has at least 100 scene instances; if the number of scene instances of a class is below 100, it is filled up by repeated sampling (sketched below).
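A minimal sketch of the repeated-sampling fill-up in item (3), assuming the per-class scene lists are available as plain Python lists; all names here are illustrative.

```python
# Repeated-sampling sketch to ensure >= 100 scene instances per class (illustrative names).
import random

def balance_training_set(scenes_by_class, min_instances=100, seed=0):
    """scenes_by_class: dict mapping class name -> list of scene image paths."""
    rng = random.Random(seed)
    balanced = {}
    for cls, scenes in scenes_by_class.items():
        filled = list(scenes)
        while len(filled) < min_instances and scenes:
            filled.append(rng.choice(scenes))     # resample existing scenes with replacement
        balanced[cls] = filled
    return balanced
```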
Step 4 in the above scheme is specifically:
(1) On the attention feature map output by the last depthwise separable convolution, each pixel is taken as an anchor point, and 9 anchor boxes are generated in the original traffic scene using the 3 ratios 1:1, 1:2 and 2:1 and 3 sizes, the 3 sizes being 4, 8 and 16;
(2) Anchor boxes that exceed the boundary of the original input image are removed;
(3) Redundant overlapping anchor boxes are removed using non-maximum suppression with a threshold of 0.7;
(4) Positive and negative samples are determined according to the intersection over union (IoU) between an anchor box and the real targets in the sample, where anchors with IoU > 0.7 are positive samples, anchors with IoU < 0.3 are negative samples, and anchor boxes with IoU between 0.3 and 0.7 are removed; the IoU is calculated as IoU(A, B) = area(A ∩ B) / area(A ∪ B) (a sketch is given after this list);
(5) According to translation invariance, each anchor box corresponds to a region proposal box on the fused feature map;
(6) All region proposal boxes pass through the fully connected layer of the region proposal network RPN to obtain the target candidate regions.
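A minimal sketch of the IoU computation and the anchor labelling rule in item (4), for axis-aligned boxes in (x1, y1, x2, y2) form; plain Python, no framework assumed.

```python
# IoU and anchor labelling sketch; boxes are (x1, y1, x2, y2) tuples.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def label_anchor(anchor, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    best = max((iou(anchor, g) for g in gt_boxes), default=0.0)
    if best > pos_thresh:
        return 1        # positive sample
    if best < neg_thresh:
        return 0        # negative sample
    return -1           # anchors with IoU between 0.3 and 0.7 are removed
```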
Beneficial effects:
1. The present invention mainly uses deep learning to solve the long-range detection and recognition of traffic signs. The method can run directly on onboard systems with limited power supply and low hardware performance. Test results show that the method effectively saves computing resources: the model size is only 76 MB and the recognition accuracy reaches 92%, meeting the requirements of long-range traffic sign detection and recognition for onboard systems.
2. Reasons for establishing the attention lightweight convolutional neural network, and its advantages:
Analysis of the reasons: existing traffic sign detection models all extract image features directly with convolutional neural networks. Although this approach can automatically learn and extract features from large data sets, the exported model parameters of a convolutional neural network occupy very large storage space (for example, the VGG-16 network reaches 527 MB) and require large memory and power consumption at run time. Onboard systems are embedded, low-power systems with low hardware performance, and therefore cannot directly run such huge models for traffic sign detection. Current practice is to run the detection and recognition model on a remote server: after the vehicle-mounted camera captures the traffic scene, the image is transmitted over the network to the server side, and the detection and recognition results are then received back over the network. This mode of application is severely limited by network conditions; if the transmission network fails or is congested, the driving system may not respond in time, which is fatal for an intelligent driving system.
Analysis of the benefits: to address these problems, the present invention designs a lightweight convolutional neural network based on attention (Focus Lightweight Convolutional Neural Network, FL-CNN). The model has the following advantages:
(1) FL-CNN uses depthwise separable convolution instead of the original standard convolution in every convolutional layer during feature extraction, which greatly reduces the number of model parameters and compresses the storage space of the model to 76 MB, making it suitable for onboard systems;
(2) FL-CNN introduces a channel-spatial attention module in the detection stage, which compresses, masks and expands the original input feature map of each depthwise separable convolutional layer to construct attention feature maps with feature suppression or enhancement, so that the model can quickly and accurately detect targets in complex backgrounds. The attention feature map enhances the salient features and the positional information of the image in the two dimensions of channel and space, which not only lets the model focus on this information during computation and saves computing resources, but also lets the model, during detection, improve detection speed and accuracy through the attended channel features and the enhanced regions;
(3) FL-CNN introduces a context attention mechanism in the recognition stage. Because the target regions extracted in the detection stage often contain only an incomplete sign region, part of the key information of the sign is lost. The context attention mechanism introduces the local information of the traffic sign through the context of the target region, enhances the classification features of the sign and effectively prevents the incomplete sign regions from the detection stage from degrading classification. Meanwhile, a pointwise convolution dynamically learns the nonlinear relation between the target and its context regions: if background interference is introduced, it is suppressed by reducing the pointwise convolution parameter values; if local information of the sign is introduced, it is reinforced by increasing the pointwise convolution parameter values. The context attention mechanism therefore plays an important role in the classification of traffic signs.
3. Reasons for establishing the attention loss function, and its advantages.
Analysis of the reasons: during training, as the number of iterations grows, more and more samples are detected and recognized correctly. Although the original cross-entropy loss function can suppress the loss of such easy samples to a certain extent, this portion of samples is so numerous that it still strongly affects the overall training loss, so the training of the remaining hard samples is "drowned out" and the generalization ability of the model is seriously affected; that is, although the training accuracy of the model is very high, its detection and recognition ability in practical applications is weak.
Analysis of the benefits: to address this problem, and to ensure that the model is sufficiently trained and its generalization ability improved, the present invention proposes an attention loss function. Its advantages are as follows:
(1) The function enhances or suppresses the training loss of a sample using a loss modulation factor, effectively distinguishing hard and easy samples. By suppressing the loss of easy samples, it greatly increases the proportion of hard-sample losses in the total training loss. On this basis, the training process of the model shifts from treating all samples equally to paying more attention to hard samples;
(2) The model's attention to the training of hard samples, built on the basis that easy samples are already trained correctly, effectively prevents hard samples from being "drowned out" by the large number of easy samples during training. Meanwhile, when a hard sample is trained correctly in some iteration, it is converted into an easy sample according to the changed loss modulation factor; and if an easy sample starts to be mis-trained, it is converted back into a hard sample. The loss modulation process is therefore dynamic;
(3) Because the model pays more attention to hard samples in the training stage, it can fully mine the data feature patterns implicit in the sample set during training. After training, the model has stronger generalization ability and higher detection and recognition accuracy in specific applications.
4. The present invention uses the large-scale Tsinghua-Tencent 100K traffic sign data set jointly released by Tsinghua University and Tencent. Its advantages are as follows:
(1) The data set is formed by splitting street view images captured by Tencent in real scenes; it contains 100,000 scene pictures and 30,000 traffic signs in total, covering the three major categories of prohibition, warning and indication signs in a wide variety of colors and shapes. Each traffic scene photo has a resolution of 2048 × 2048, and traffic signs of (0, 32] pixels and (32, 96] pixels account for 41.6% and 49.1% of the data set respectively, so it is very suitable for training a long-range traffic sign detection and recognition model; (2) the data set covers most variations in illumination, weather and similar conditions, so a target detection model trained on it can adapt to complex and changing long-range traffic sign detection and recognition, providing more safety and more response time for the intelligent navigation equipment used in autonomous and assisted driving.
5. The present invention is based on the VGG-16 convolutional neural network and replaces the standard convolutions in the convolutional neural network with depthwise separable convolutions, changing the direct cross-channel convolution computation into the combined computation of a per-channel depthwise convolution and a cross-channel pointwise convolution, thereby compressing the model parameters, constructing a lightweight convolutional neural network, saving computing resources and reducing hardware storage.
Four, Description of the Drawings:
Fig. 1 is the internal structure diagram of the FL-CNN target detection and recognition model of the present invention.
Fig. 2 is the channel-spatial attention module of the present invention.
Fig. 3 is the flow chart of the method of the present invention.
Fig. 4 shows the 44 kinds of common traffic signs for long-range detection and recognition of the present invention; these traffic signs are divided into three categories: indication, warning and prohibition. In the figure, * denotes a family of signs, where il*: il100, il60, il80; ph*: ph4, ph4.5, ph5; pm*: pm20, pm30, pm55; pl*: pl5, pl20, pl30, pl40, pl50, pl60, pl70, pl80, pl100, pl120.
Fig. 5 is a statistical chart comparing the detection and recognition accuracy of the present invention and other methods on the 44 kinds of common traffic signs.
Five, Specific Embodiments:
The present invention is further described below with reference to the drawings:
This long-range traffic sign detection and recognition method suitable for onboard systems effectively addresses the shortcomings of low recognition accuracy, huge models and short detection distance, and proposes a new target detection framework named the attention-based lightweight convolutional neural network (Focus Lightweight Convolutional Neural Network, FL-CNN). First, in the detection stage, FL-CNN reduces the number of model parameters using depthwise separable convolution, effectively compressing the model; at the same time, a channel-spatial attention module is introduced that compresses, masks and expands the original input feature map of each depthwise separable convolutional layer to construct attention feature maps with feature suppression or enhancement, improving the small-target detection ability of the model. Second, in the recognition stage, FL-CNN introduces an attention context-region mechanism that brings in local information of the traffic sign, enhances the classification features of the sign, and prevents incomplete sign regions from the detection stage from degrading classification. Finally, in the training stage, FL-CNN introduces an attention loss function that distinguishes hard and easy samples and improves the training and generalization ability of the model. The model is trained on the long-range traffic sign detection data set Tsinghua-Tencent 100K released by Tsinghua University and Tencent.
The details are as follows:
Step 1. Preprocess the traffic sign image sample set;
(1) The Tsinghua-Tencent 100K data set jointly released by Tsinghua University and Tencent is used, and 44 classes of commonly used traffic signs are selected as the objects of long-range detection and recognition;
(2) The Tsinghua-Tencent 100K data set is divided into a training set and a test set at a ratio of 1:2;
(3) To guarantee sample balance during model training, each class of traffic sign in the training set has at least 100 scene instances; if the number of scene instances of a class is below 100, it is filled up by repeated sampling.
Step 2. Construct the lightweight convolutional neural network and extract the convolutional features of traffic signs;
(1) This step plays an important role in compressing the sign detection and recognition model. Using depthwise separable convolution, the joint channel-space mapping of the standard convolutions in the original VGG-16 is separated into two independent mappings, effectively reducing the number of parameters and the disk storage of the model. The network contains 5 convolutional layers in total, where each convolutional layer consists of a depthwise convolution and a pointwise convolution, with ReLU as the activation function;
(2) In this lightweight convolutional neural network, the computation of a depthwise separable convolution compares with that of the original standard convolution as follows:
Denote a convolution kernel as (D_K, D_K, C), where D_K is the width and height of the kernel and C is its number of channels. In the convolution computation, depthwise separable convolution converts the original N cross-channel standard convolutions of size (D_K, D_K, M) into M depthwise convolutions of size (D_K, D_K, 1) plus N pointwise convolutions of size (1, 1, M), where the depthwise convolution is computed per channel and the pointwise convolution is computed across channels. Denote the input feature map as {D_F, D_F, M} and the output feature map as {D_F, D_F, N}, where D_F is the width and height of the feature map. The computation of each convolution is then as follows:
1. Computation of the standard convolution: Count_s = D_K × D_K × M × N × D_F × D_F;
2. Computation of the depthwise convolution: Count_d = D_K × D_K × M × D_F × D_F;
3. Computation of the pointwise convolution: Count_p = M × N × D_F × D_F;
Therefore, the ratio of the computation of the depthwise separable convolution to that of the standard convolution is (Count_d + Count_p) / Count_s = 1/N + 1/D_K^2; each time a depthwise separable convolution replaces an original standard convolution, the computation is reduced to 1/N + 1/D_K^2 of the original.
Step 3. Build attention feature maps with the channel-spatial attention module embedded in the lightweight convolutional neural network;
The main purpose of this step is to design an attention feature map that imitates the human attention mechanism, enhancing the convolutional features of small traffic signs in the scene and suppressing features of irrelevant background information, thereby saving computing resources and improving detection accuracy. To this end, the present invention proposes a channel-spatial attention module that can be embedded into the depthwise separable convolutional layers and applies feature attention (suppression or enhancement) in the two dimensions of channel and space to the output feature map of each depthwise separable convolutional layer. Channel attention exploits the correlation and importance between channels and focuses on "what" is most significant in an image; spatial attention, by contrast, focuses on the positions of targets in the image, i.e. "where" detection and recognition are most effective.
Let the output feature map of a depthwise separable convolutional layer be U = [u_1, u_2, ..., u_C], where u_i ∈ R^{H×W}, C is the number of channels, R^{H×W} denotes the set of real H × W matrices, and H and W are the height and width of the feature map respectively. After the channel-spatial attention module, the constructed attention feature map is Y = [y_1, y_2, ..., y_C], y_i ∈ R^{H×W}. The specific calculation process is described in steps 3.1, 3.2 and 3.3.
Step 3.1 Channel attention
To compute channel attention, the present invention first compresses, along the channel direction, the spatial dimension of each channel of the feature map, aggregating the spatial information with three kinds of global pooling: maximum, average and stochastic. Maximum and average pooling preserve the texture and background features of the image respectively, while stochastic pooling lies between the two.
First, global max pooling compresses each u_i into a component of the channel attention mask S_max, defined as s_i^max = max_{h,w} u_i(h, w).
Then, global average pooling and global stochastic pooling compress each u_i into the channel attention mask components S_mean and S_sto respectively, defined as s_i^mean = (1 / (H × W)) Σ_{h,w} u_i(h, w), with s_i^sto obtained by sampling a value from u_i with probability proportional to its activation,
where i = 1, 2, ..., C.
Next, the channel attention mask components built by the three kinds of pooling are each fed into a multilayer perceptron; aggregation is completed through element-wise multiplication with the weight parameters, accumulation and the activation function, which further increases nonlinearity. The channel attention mask S = [s_1, s_2, ..., s_C] of the feature map U is defined as follows:
S = σ(W_1 δ(W_0 S_max) + W_1 δ(W_0 S_mean) + W_1 δ(W_0 S_sto))
where σ is the sigmoid function and δ is the ReLU function. W_0 and W_1 are the weights of the multilayer perceptron; these parameters are shared across the three channel attention mask components.
Finally, the channel attention mask is expanded back to the original input feature map U, and each channel of U is weighted according to the mask. The new feature map after channel attention mapping is X = [x_1, x_2, ..., x_C], x_i ∈ R^{H×W}, defined as x_i = s_i · u_i.
Step 3.2 Spatial attention
To compute the spatial attention mapping and build the feature relations between pixels or regions in the feature map, the number of channels of the feature map is first compressed: a group of pointwise convolutions aggregates the original input feature map of the depthwise separable convolutional layer across channels, and the aggregated feature map is denoted M ∈ R^{H×W}.
Next, since the feature maps of different layers have very different characteristics (shallow feature maps have higher resolution, while deep feature maps, by contrast, contain more abstract semantic features), when constructing spatial attention the present invention, in order to reduce parameters and computation, builds the spatial attention mask by region and by pixel for shallow and deep feature maps respectively. For convenience of description, the spatial attention mask is denoted N ∈ R^{H×W} and defined as follows:
N = Softmax(Conv(M, o, k, s, p))
where Conv(·) denotes the convolution operation, the output channel o = 1, the convolution kernel size is k = 1 for shallow convolutional layers and k = 3 for deep convolutional layers, and s = 1 and p = 0 are the stride and padding of the convolution respectively. In addition, to eliminate the influence of different feature map scales, the spatial attention mask is normalized with the Softmax function.
Step 3.3 Attention feature map
On the basis of the feature map X obtained after channel attention mapping, the spatial attention mask N ∈ R^{H×W} is extended: according to the spatial attention mask, spatial attention is applied to each channel of X, finally generating the output feature map Y of the depthwise separable convolutional layer, which serves as the input of the next depthwise separable convolutional layer and is defined as y_i = N ⊙ x_i,
where ⊙ denotes element-wise multiplication.
Step 4. Generate target candidate regions with the region proposal network (RPN) on the basis of the attention feature map.
The goal of this step is to locate the regions in the traffic scene where traffic signs are likely to appear; FL-CNN then classifies the signs according to these regions. The details of this step are as follows:
(1) On the attention feature map output by the last depthwise separable convolution, each pixel is taken as an anchor point, and 9 anchor boxes are generated in the original traffic scene using 3 sizes (4, 8, 16) and 3 ratios (1:1, 1:2, 2:1);
(2) Anchor boxes that exceed the boundary of the original input image are removed;
(3) Redundant overlapping anchor boxes are removed using non-maximum suppression with a threshold of 0.7;
(4) Positive and negative samples are determined according to the intersection over union (IoU) between an anchor box and the real targets in the sample, where anchors with IoU > 0.7 are positive samples, anchors with IoU < 0.3 are negative samples, and anchor boxes with IoU between 0.3 and 0.7 are removed. The IoU is calculated as IoU(A, B) = area(A ∩ B) / area(A ∪ B);
(5) According to translation invariance, each anchor box corresponds to a region proposal box on the fused feature map;
(6) All region proposal boxes pass through the fully connected layer of the region proposal network RPN to obtain the target candidate regions.
Step 5. Introduce context region information into the target candidate regions generated by the RPN to enhance the classification features of the signs;
Since a target candidate region produced in step 4 may contain only part of the features of a traffic sign, the present invention introduces context region information, bringing in the spatially adjacent features of the target candidate region to enhance the classification features of the sign. The specific steps are as follows:
(1) For convenience of description, denote a target candidate region as p = (p_x, p_y, p_w, p_h), where (p_x, p_y) is the center of the region and (p_w, p_h) are its width and height. On the attention feature map output by the last depthwise separable convolution, context regions are created by applying scale factors to the width and height of the candidate region, with the same center coordinates as the corresponding target candidate region; the relationship between a context region and its candidate region is that the i-th context region keeps the center (p_x, p_y) while p_w and p_h are enlarged by the i-th pair of scale factors, where i is the index of the context region.
(2) For each target candidate region and its context regions, RoI-Pooling divides the region into 7 parts in both the horizontal and vertical directions and applies max pooling as a down-sampling step in each part, so that regions of different sizes still produce outputs of a consistent dimension, generating 3 feature vectors of fixed size 7 × 7 × 512.
(3) These are concatenated along the spatial dimension to form a 3 × 7 × 7 × 512 feature vector;
(4) Finally, a 1 × 1 convolution compresses the feature vector formed in (3) back to 7 × 7 × 512. This step not only makes the dimension of the feature vector after introducing the context regions meet the node requirement of the fully connected layer; notably, the 1 × 1 convolution can also learn the nonlinear relation between background and target: when an introduced context region contains complex background, the convolution parameters suppress this background; conversely, if local features of the target are introduced, the convolution parameters enhance these features.
Step 6. Feed the feature vector into the fully connected layers used for classification and regression, and output the class and position of the traffic sign;
Two fully connected networks (Fully Connected Network, FC) are used. The first FC network is used for traffic sign classification, with 4096 hidden nodes and 44 output nodes; each output node represents one class of traffic sign, with values in (0, 1), and at classification time the maximum output node is taken as the traffic sign class. The second FC network is used for traffic sign position regression, with 4096 hidden nodes and 4 output nodes, representing the center coordinates, width and height of the traffic sign.
Step 7. Establish the attention loss function and train the FL-CNN model
To ensure that the model is sufficiently trained and to improve its generalization ability, the present invention proposes an attention loss function that effectively distinguishes hard samples from easy ones, suppressing the loss of easy samples and enhancing the loss of hard samples. The training of the FL-CNN model consists of two parts, the RPN network and the fully connected network; the loss of the RPN network includes a binary classification loss and a regression loss, and the loss of the fully connected network includes a multi-class classification loss and a regression loss.
(1) The attention loss function of the RPN network takes the form L = (1/N_cls) Σ_i L_cls^F(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*), where:
p_i denotes the predicted probability that the i-th anchor is a target object and p_i* denotes the true label of the target; t_i is a vector containing the center coordinates, width and height of the predicted box, and t_i* is the information vector of the true box; N_cls is the total number of anchors, N_reg is the size of the feature map, λ is a balancing coefficient, L_reg is the regression loss over all bounding boxes, and L_cls^F is the attention binary classification loss proposed by the present invention, defined as follows:
L_cls^F = -σ(-Kx) log σ(x) for a foreground sample and L_cls^F = -σ(Kx) log σ(-x) for a background sample, where σ is the sigmoid function, -log σ(x) is the cross-entropy loss of a foreground sample with score x, -log σ(-x) is that of a background sample, and K is a constant. This loss function has the following property: if a sample is easy to classify, i.e. σ(x) → 1 for a foreground sample or σ(-x) → 1 for a background sample, then with K taken as a large value the modulation factor σ(-Kx) → 0 in the foreground sample loss and σ(Kx) → 0 in the background sample loss. If a sample is hard to classify, the modulation factors of the foreground and background sample losses tend to σ(-Kx) → 1 and σ(Kx) → 1 respectively. The attention loss function proposed by the present invention therefore effectively distinguishes hard and easy samples; by suppressing the loss of easy samples, it makes the learning and training of the RPN pay more attention to hard samples, so that the RPN network is effectively and sufficiently trained.
(2) The attention loss function of the fully connected network is similar in form; its attention multi-class classification loss for a sample whose true class is k is L_cls^F = -δ(-Kx_k) log δ(x_k),
where δ is the softmax function. This loss is similar to that of the RPN, comprising a multi-class classification loss and a regression loss, and the attention multi-class classification loss proposed by the present invention behaves like the attention binary classification loss: when the prediction probability δ(x_k) of the true class tends to 1, the weight δ(-Kx_k) → 0; conversely, when the prediction probability of the true class tends to 0, the weight δ(-Kx_k) → 1.
Step 8. Repeat steps 2 to 7 to complete training of the FL-CNN model on the sample set;
Step 9. Start the color camera and photograph the actual traffic scene; before the scene is fed into the model it is preprocessed, with the resolution set to 2048 × 2048, and it is then input into the FL-CNN model; steps 2 to 6 are repeated to complete traffic sign detection and recognition in the actual scene.
In terms of the learning algorithm of the model of the present invention, a new attention loss function is introduced so that the model learns fully from the sample set. In terms of structure, the model mainly consists of the following 5 components, of which (1), (2) and (4) differ significantly from other models and were designed specifically for the low power consumption and weak hardware performance of onboard systems and for improving detection and recognition accuracy:
(1) Lightweight convolutional neural network: the convolutional neural network is constructed with depthwise separable convolution, which reduces the number of model parameters, compresses the model, and automatically extracts the convolutional features of traffic signs from the traffic scene layer by layer;
(2) Channel-spatial attention module: this is a technique proposed by the present invention specifically for long-range traffic sign detection and an important component of the model. It can be embedded into every convolutional layer of the lightweight convolutional neural network to construct attention feature maps, suppressing or enhancing the information in the original feature map, effectively saving the computing resources of the model and improving its detection ability;
(3) Region proposal network (Region Proposal Network, RPN): a certain number of target proposal regions are generated from the attention feature map output by the last depthwise separable convolutional layer;
(4) Context region pooling layer: this is a technique proposed by the present invention that effectively enhances the information of the target region and is an important component of the model. The classification features of the attended context region information are constructed through region pooling, regularization, concatenation and compression;
(5) Fully connected network (Fully Connected Network, FC): this part is mainly responsible for the final class and position calculation of the traffic sign from the compressed attention context region features.
Embodiment:
The resolution of every traffic scene in the model training data set Tsinghua-Tencent 100K is 2048 × 2048 pixels; traffic signs of size (0, 32] pixels and (32, 96] pixels account for 41.6% and 49.1% of the data set respectively, i.e. the size of 90.7% of the traffic signs occupies less than 1% of the traffic scene, which corresponds to long-range traffic sign detection and recognition.
(1) Data set processing: during training of the FL-CNN model of the present invention, to keep the sample set balanced, resampling is applied to the classes whose traffic signs appear in fewer than 100 traffic scenes. The ratio of the training set to the test set is 1:2;
(2) Comparison index: in the specific tests, the present invention uses the common accuracy metric F1-measure as the detection and recognition index; a larger value of this index indicates higher detection and recognition accuracy;
(3) Comparison models: to verify the validity of the FL-CNN of the present invention, detection accuracy is compared with the most common target detection frameworks Fast R-CNN and Faster R-CNN;
(4) In addition, to verify in detail the detection accuracy of the present invention on traffic signs at different distances, we compare the detection accuracy of the present invention on traffic signs of different sizes (different distances), covering three size ranges: (0, 32] pixels, (32, 96] pixels and (96, 200] pixels, where traffic signs of (0, 32] pixels and (32, 96] pixels are long-range and traffic signs of (96, 200] pixels are medium-range.
FL-CNN model training and test data description (the tests are intended to verify the feasibility of steps 1 to 8 of the method):
First, we compare the disk storage occupied by each model after export. Fast R-CNN and Faster R-CNN both use the VGG-16 network and occupy 558 MB and 582 MB respectively after parameter export, while the FL-CNN model of the present invention occupies only 76 MB. Relative to Fast R-CNN and Faster R-CNN, the storage of our FL-CNN model is reduced by about 80%, and it can run directly on onboard systems with limited power supply and low hardware performance. This shows that using depthwise separable convolution effectively reduces the number of model parameters and the disk storage; it also indicates that the channel-spatial attention module and the attention context regions of the present invention have little effect on model storage and effectively save computing resources.
Second, we compare the traffic sign detection and recognition accuracy of our FL-CNN with the common target detection frameworks Fast R-CNN and Faster R-CNN. As can be clearly seen from Table 1, in the detection and recognition accuracy for traffic signs at the three distances, the present invention is significantly better than Fast R-CNN and Faster R-CNN. Relative to Faster R-CNN, the accuracy on (0, 32] pixel signs is improved by 55 percentage points, the accuracy on (32, 96] pixel and (96, 200] pixel signs is improved by 20 and 9 percentage points respectively, and the overall average is improved by 28 percentage points. This fully demonstrates the effectiveness of the three attention mechanisms of the present invention (channel-spatial attention, context attention and the attention loss function).
Then, to further verify the influence of the three attention mechanisms on the detection and recognition accuracy of the FL-CNN model, Table 1 also compares the accuracy after removing different attention mechanisms, where "-" indicates that the corresponding mechanism is removed from the model. The results show that removing any of the attention mechanisms affects the detection and recognition accuracy, with the channel-spatial attention having the largest influence. In addition, after removing all three mechanisms, the FL-CNN model degenerates into the Faster R-CNN model.
Table 1. Comparison of detection and recognition accuracy of each model (%)
Finally, Fig. 5 gives a statistical comparison of the recognition accuracy of different target detection and recognition frameworks on the 44 kinds of traffic signs. It can be seen that the detection and recognition accuracy of our FL-CNN on every kind of traffic sign is higher than that of Fast R-CNN and Faster R-CNN.

Claims (3)

1. a kind of remote road traffic sign detection recognition methods suitable for onboard system, it is characterised in that:
Step 1: being pre-processed to Traffic Sign Images sample set;
Step 2: building light-type convolutional neural networks, complete the convolution feature extraction of traffic sign;
Both (1) be separated into the Joint Mapping in the channel of original VGG-16 Plays convolution and space using depth separation convolution Independent mapping mode, reduce model number of parameters and hard drive space storage, light-type convolutional neural networks include altogether 5 volume Lamination, wherein each convolutional layer includes depth convolution sum point convolution two parts, and using ReLU as activation primitive;
(2) In the lightweight convolutional neural network, the computation of depthwise separable convolution compares with that of the original standard convolution as follows:
Let the convolution kernel be (D_K, D_K, C), where D_K is the width and height of the kernel and C is the number of kernel channels. During the convolution computation, depthwise separable convolution converts the original N cross-channel standard convolutions of size (D_K, D_K, M) into M depthwise convolutions of size (D_K, D_K, 1) and N cross-channel pointwise convolutions of size (1, 1, M), where the depthwise convolution is computed per channel and the pointwise convolution is computed across channels. Let the input feature map be {D_F, D_F, M} and the output feature map be {D_F, D_F, N}, where D_F denotes the width and height of the feature map. The computation of each convolution is then as follows:
1. Standard convolution: Count_s = D_K × D_K × M × N × D_F × D_F;
2. Depthwise convolution: Count_d = D_K × D_K × M × D_F × D_F;
3. Pointwise convolution: Count_p = M × N × D_F × D_F;
Therefore, the ratio of the computation of depthwise separable convolution to that of standard convolution is (Count_d + Count_p) / Count_s = 1/N + 1/D_K²; that is, each depthwise separable convolution reduces the computation to 1/N + 1/D_K² of the original standard convolution;
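As an illustrative sketch (not part of the claim), the following PyTorch code shows a depthwise separable convolution block of the kind described above and verifies the computation ratio 1/N + 1/D_K²; the layer sizes and example values are assumptions chosen only for the check.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution: per-channel depthwise conv followed by a 1x1 pointwise conv."""
    def __init__(self, in_channels, out_channels, kernel_size=3, padding=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size,
                                   padding=padding, groups=in_channels, bias=False)
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.pointwise(self.relu(self.depthwise(x))))

# Operation-count check with example values D_K=3, M=64, N=128, D_F=56:
D_K, M, N, D_F = 3, 64, 128, 56
count_standard  = D_K * D_K * M * N * D_F * D_F
count_separable = D_K * D_K * M * D_F * D_F + M * N * D_F * D_F
print(count_separable / count_standard)   # ~0.119
print(1 / N + 1 / (D_K * D_K))            # same ratio, i.e. 1/N + 1/D_K^2
```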
Step 3: constructing the attention feature map by embedding a channel-spatial attention module into the lightweight convolutional neural network;
The attention feature map imitates the human attention mechanism: the convolution features of small traffic signs in the scene are enhanced while irrelevant background information is suppressed, which saves computing resources and improves detection accuracy. The channel-spatial attention module is embedded into the depthwise separable convolutional layers and applies attention in two dimensions, channel and space, to the output feature map of each depthwise separable convolutional layer. Channel attention exploits the correlation and importance between channels and focuses on "what" is most salient in an image; spatial attention focuses on the positional features of the target in the image, making detection and recognition of "where" more efficient;
Let the output feature map of a depthwise separable convolutional layer be U = [u_1, u_2, …, u_C], u_i ∈ R^(H×W), where C is the number of channels, R^(H×W) denotes the set of real H × W matrices, and H and W are the height and width of the feature map. After the channel-spatial attention module, the constructed attention feature map is Y = [y_1, y_2, …, y_C], y_i ∈ R^(H×W). The specific computation is as follows:
Step 3.1 Channel attention
First, along the channel direction, the spatial dimension of each channel of the feature map is compressed, and the spatial information is aggregated using three global pooling modes: max, average, and stochastic pooling. Max pooling and average pooling retain the texture and background features of the image respectively, while stochastic pooling lies in between;
Global max pooling first compresses each u_i into the channel attention mask component s_i^max of S_max = [s_1^max, s_2^max, …, s_C^max], defined as s_i^max = max over (h, w) of u_i(h, w);
Global average pooling and global stochastic pooling then compress each u_i into the channel attention mask components of S_mean and S_sto respectively, defined as s_i^mean = (1 / (H × W)) Σ_(h,w) u_i(h, w) and s_i^sto = Σ_(h,w) p_(h,w) u_i(h, w), where p_(h,w) = u_i(h, w) / Σ_(h',w') u_i(h', w') is the sampling probability used by stochastic pooling;
Secondly, the channel attention mask components constructed by the three pooling compressions are used as the inputs of a multilayer perceptron; the aggregation is completed through point-wise multiplication of the weight parameters with the mask components, accumulation, and the activation functions, which further increases the non-linearity. The channel attention mask S = [s_1, s_2, …, s_C] of the feature map U is defined as:
S = σ(W_1 δ(W_0 S_max) + W_1 δ(W_0 S_mean) + W_1 δ(W_0 S_sto))
where σ is the sigmoid function, δ is the ReLU function, and W_0 and W_1 are the weights of the multilayer perceptron, shared across the three channel attention mask components;
Finally, the channel attention mask is expanded onto the original input feature map U, and the weight of each channel in U is calibrated according to the mask. The new feature map after the channel attention mapping is X = [x_1, x_2, …, x_C], defined as x_i = s_i · u_i, i = 1, 2, …, C;
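A minimal PyTorch sketch of the channel attention of Step 3.1, assuming a shared two-layer perceptron with a reduction ratio of 16 and approximating the stochastic-pooling branch by its expected value; these choices are illustrative, not taken from the claim.

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention built from max, average and (expected-value) stochastic global pooling."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Shared two-layer perceptron: W0 (reduce) -> ReLU -> W1 (restore)
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, u):                                   # u: (B, C, H, W)
        b, c, _, _ = u.shape
        flat = u.view(b, c, -1)
        s_max = flat.max(dim=2).values                      # global max pooling
        s_mean = flat.mean(dim=2)                           # global average pooling
        p = flat / flat.sum(dim=2, keepdim=True).clamp(min=1e-6)
        s_sto = (p * flat).sum(dim=2)                       # expected value of stochastic pooling
        s = self.sigmoid(self.mlp(s_max) + self.mlp(s_mean) + self.mlp(s_sto))
        return u * s.view(b, c, 1, 1)                       # calibrate each channel of U
```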
Step 3.2 Spatial attention
To compute the spatial attention mapping and construct the feature relations between pixels or regions in the feature map, the number of channels of the feature map is first compressed: a group of pointwise convolutions aggregates, across channels, the feature map originally input to the depthwise separable convolutional layer, and the aggregated feature map is denoted M;
Secondly, since the characteristics of feature maps at different layers differ greatly (shallow feature maps have a higher resolution, while deep feature maps, in contrast, contain more abstract semantic features), the spatial attention mask is computed per pixel on shallow feature maps and per region on deep feature maps in order to reduce the number of parameters and the computation when constructing the spatial attention. The spatial attention mask N is defined as follows:
N = Softmax(Conv(M, o, k, s, p))
where Conv(·) denotes the convolution operation, the number of output channels is o = 1, the convolution kernel size is k = 1 for shallow convolutional layers and k = 3 for deep convolutional layers, and s = 1 and p = 0 are the stride and padding of the convolution respectively. In addition, to eliminate the influence of the different scales of different feature maps, the spatial attention mask is normalized with the Softmax function;
Step 3.3 Attention feature map
On the basis of the feature map X = [x_1, x_2, …, x_C] obtained after the channel attention mapping, the spatial attention mask N is then expanded: spatial attention is applied to each channel of the feature map X according to the spatial attention mask, and the final output feature map of the depthwise separable convolutional layer, Y = [y_1, y_2, …, y_C], is generated as the input of the next depthwise separable convolutional layer, defined as y_i = N ⊙ x_i, i = 1, 2, …, C, where ⊙ denotes point-wise multiplication;
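A companion sketch of the spatial attention and the combined attention feature map of Steps 3.2-3.3, under the same assumptions; the kernel size is passed in as 1 (shallow layers) or 3 (deep layers), and resizing the mask back to the feature-map size is an implementation assumption.

```python
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttention(nn.Module):
    """Spatial attention: pointwise channel aggregation, a small conv, and a softmax over positions."""
    def __init__(self, channels, kernel_size=1):
        super().__init__()
        self.aggregate = nn.Conv2d(channels, 1, kernel_size=1)          # compress C channels to 1 (map M)
        self.conv = nn.Conv2d(1, 1, kernel_size, stride=1, padding=0)   # k=1 (shallow) or k=3 (deep)

    def forward(self, x):                              # x: channel-attended feature map X, (B, C, H, W)
        m = self.aggregate(x)                          # M: (B, 1, H, W)
        n = self.conv(m)                               # unnormalized spatial mask
        b, _, h, w = n.shape
        n = n.view(b, 1, -1).softmax(dim=2).view(b, 1, h, w)   # Softmax over spatial positions
        if n.shape[2:] != x.shape[2:]:                 # with k=3, p=0 the mask shrinks; resize back
            n = F.interpolate(n, size=x.shape[2:], mode='nearest')
        return x * n                                   # y_i = N ⊙ x_i for every channel i
```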
Step 4: generating the candidate regions of targets with the region proposal network RPN on the basis of the attention feature map;
The regions in the traffic scene where traffic signs are likely to appear are located, and FL-CNN then classifies the signs further according to these regions;
Step 5: introducing context region information into the target candidate regions generated by the RPN to enhance the classification features of the signs;
Since a target candidate region provided by Step 4 contains only part of the features of a traffic sign, the spatially adjacent features of the target candidate region are introduced to enhance the classification features of the sign. The specific steps are as follows:
(1) For convenience of description, a target candidate region is denoted p = (p_x, p_y, p_w, p_h), where (p_x, p_y) is the center of the region and (p_w, p_h) are the width and height of the region. On the attention feature map output by the last depthwise separable convolution, two scale factors are used to create context regions whose center coordinates are the same as those of the corresponding target candidate region and whose width and height are the width and height of the candidate region multiplied by the respective scale factor, where i is the index of the context region;
(2) For each target candidate region and its context regions, RoI-Pooling divides each region into 7 parts in the horizontal and vertical directions and applies max-pooling down-sampling to each part, so that regions of different sizes still produce outputs of consistent dimensions, generating 3 fixed-dimension feature vectors of size 7 × 7 × 512;
(3) These are concatenated serially along the spatial dimension to form a 3 × 7 × 7 × 512 feature vector;
(4) A 1 × 1 convolution compresses the feature vector formed in (3) to 7 × 7 × 512, so that the dimension of the feature vector after introducing the context regions meets the node requirements of the fully connected layer. The 1 × 1 convolution learns the non-linear relationship between background and target: when an introduced context region contains complex background, the convolution parameters suppress these backgrounds; conversely, if local features of the target are introduced, the convolution parameters enhance these features;
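An illustrative sketch of the context-region enhancement of Step 5 using torchvision's RoI pooling; the scale factors 1.5 and 2.0 and the use of feature-map coordinates are assumptions, since the claim's actual factors are not reproduced here.

```python
import torch
import torch.nn as nn
import torchvision.ops as ops

def context_enhanced_feature(feature_map, box, fuse_conv, scales=(1.5, 2.0)):
    """Pool a candidate box plus two concentric context boxes, concatenate, and compress with 1x1 conv.

    feature_map: (1, 512, H, W) attention feature map; box: tensor (x1, y1, x2, y2) in feature-map
    coordinates; fuse_conv: a learned nn.Conv2d(3 * 512, 512, 1) shared across proposals.
    The scale factors are illustrative assumptions only.
    """
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    w, h = box[2] - box[0], box[3] - box[1]
    regions = [box]
    for s in scales:                                  # context regions share the candidate's center
        regions.append(torch.stack([cx - s * w / 2, cy - s * h / 2,
                                    cx + s * w / 2, cy + s * h / 2]))
    rois = torch.stack([torch.cat([torch.zeros(1), r]) for r in regions])   # (3, 5): batch idx + box
    pooled = ops.roi_pool(feature_map, rois, output_size=(7, 7))            # (3, 512, 7, 7)
    stacked = pooled.reshape(1, 3 * 512, 7, 7)        # serial concatenation of the three regions
    return fuse_conv(stacked)                         # compressed back to (1, 512, 7, 7)
```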
Step 6: feeding the feature vector into the fully connected layers used for classification and regression, and outputting the class and position of the traffic sign;
Two fully connected (FC) networks are used. The first FC network classifies the traffic sign: its hidden layer has 4096 nodes and it outputs 44 nodes, each output node representing one class of traffic sign with a value in (0, 1); during classification, the class of the maximum output node is taken as the traffic sign class. The second FC network regresses the position of the traffic sign: its hidden layer has 4096 nodes and it outputs 4 nodes, which represent the center-point coordinates and the width and height of the traffic sign;
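A sketch of the two fully connected heads of Step 6; the 7 × 7 × 512 input dimension follows from Step 5, while everything else (layer names, the softmax on the classification output) is an illustrative assumption.

```python
import torch.nn as nn

class DetectionHead(nn.Module):
    """Classification head (44 sign classes) and box-regression head (center, width, height)."""
    def __init__(self, in_features=7 * 7 * 512, hidden=4096, num_classes=44):
        super().__init__()
        self.cls_fc = nn.Sequential(nn.Linear(in_features, hidden), nn.ReLU(inplace=True),
                                    nn.Linear(hidden, num_classes))
        self.reg_fc = nn.Sequential(nn.Linear(in_features, hidden), nn.ReLU(inplace=True),
                                    nn.Linear(hidden, 4))

    def forward(self, feat):                        # feat: (B, 512, 7, 7) context-enhanced feature
        flat = feat.flatten(start_dim=1)
        scores = self.cls_fc(flat).softmax(dim=1)   # per-class probabilities in (0, 1)
        boxes = self.reg_fc(flat)                   # center coordinates, width and height
        return scores, boxes
```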
Step 7: establishing the attention loss function and training the FL-CNN model;
To ensure that the model is fully trained and to improve its generalization ability, an attention loss function is established to effectively distinguish easy and hard samples, suppressing the loss of easy samples and enhancing the loss of hard samples. The training of the FL-CNN model comprises two parts, the RPN network and the fully connected network, where the loss of the RPN network includes a binary classification loss and a regression loss, and the loss of the fully connected network includes a multi-class classification loss and a regression loss;
(1) The attention loss function of the RPN network is as follows:
L_RPN = (1 / N_cls) Σ_i L_cls^att(p_i, p_i*) + λ (1 / N_reg) Σ_i p_i* L_reg(t_i, t_i*)
where p_i denotes the predicted probability that the i-th anchor is a target object, p_i* denotes the ground-truth label of the target, t_i is a vector containing the center coordinates, width and height of the predicted box, t_i* denotes the information vector of the ground-truth box, N_cls denotes the total number of anchors, N_reg denotes the size of the feature map, λ is an adjustment coefficient, L_reg denotes the regression loss over all bounding boxes, and L_cls^att is the attention binary classification loss, defined as L_cls^att(x) = -σ(-Kx) log σ(x) for foreground samples and L_cls^att(x) = -σ(Kx) log σ(-x) for background samples, where x is the classification score;
where σ is the sigmoid function, -log σ(x) is the prediction loss of a foreground sample, -log σ(-x) is the prediction loss of a background sample, and K is a constant. The loss has the following property: if a sample is an easy sample, -log σ(x) → 0 or -log σ(-x) → 0, i.e. σ(x) → 1 or σ(-x) → 1, so when K is taken as a larger value, the loss adjustment factor σ(-Kx) in the foreground-sample loss tends to 0 and the loss adjustment factor σ(Kx) in the background-sample loss tends to 0; if a sample is a hard sample, the loss adjustment factors of the foreground and background samples tend to σ(-Kx) → 1 and σ(Kx) → 1 respectively. Therefore, the attention loss function effectively distinguishes easy and hard samples: by suppressing the loss of easy samples, it makes the RPN focus its learning and training on hard samples, ensuring that the RPN network is fully trained;
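A sketch of the attention binary classification loss as described above; the value of the constant K and the mean reduction are assumptions for illustration.

```python
import torch

def attention_binary_loss(logits, labels, k=2.0):
    """Attention two-class loss: easy samples are down-weighted by sigmoid(-k*x) / sigmoid(k*x).

    logits: raw RPN objectness scores x; labels: 1 for foreground anchors, 0 for background.
    k is an illustrative constant; a larger k suppresses easy samples more strongly.
    """
    sig = torch.sigmoid
    fg = -sig(-k * logits) * torch.log(sig(logits).clamp(min=1e-8))    # foreground term
    bg = -sig(k * logits) * torch.log(sig(-logits).clamp(min=1e-8))    # background term
    loss = torch.where(labels == 1, fg, bg)
    return loss.mean()
```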
(2) The attention loss function of the fully connected network has the same form as that of the RPN, consisting of a multi-class classification loss and a regression loss, where δ is the softmax function and the attention multi-class classification loss is L_cls^att(x_k) = -δ(-Kx_k) log δ(x_k), x_k being the score of the true class k. Its behaviour is consistent with the attention binary classification loss: when the prediction probability δ(x_k) of a sample tends to 1 (its loss -log δ(x_k) → 0), the weight δ(-Kx_k) → 0; conversely, when the prediction probability of the sample tends to 0, the weight δ(-Kx_k) → 1;
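A sketch of one possible reading of the attention multi-class loss, interpreting δ(-Kx_k) as the softmax of the negated, scaled scores evaluated at the true class; K and the mean reduction are again assumptions.

```python
import torch

def attention_multiclass_loss(logits, targets, k=2.0):
    """Multi-class attention loss: softmax cross-entropy weighted by softmax(-k*x) of the true class."""
    probs = logits.softmax(dim=1)                          # δ(x): per-class probabilities
    weights = (-k * logits).softmax(dim=1)                 # δ(-Kx): small for confidently correct classes
    p_true = probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    w_true = weights.gather(1, targets.unsqueeze(1)).squeeze(1)
    return (-w_true * torch.log(p_true.clamp(min=1e-8))).mean()
```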
Step 8: repeating Step 2 to Step 7 to complete the sample training of the FL-CNN model;
Step 9: starting the color camera and photographing the actual traffic scene; the scene is preprocessed before being input into the model by setting the resolution to 2048 × 2048, and is then input into the FL-CNN model; Step 2 to Step 6 are repeated to complete the traffic sign detection and recognition of the actual scene.
2. The remote traffic sign detection and recognition method based on F-RCNN according to claim 1, characterized in that Step 1 specifically comprises:
(1) The Tsinghua-Tencent 100K data set jointly released by Tsinghua University and Tencent is used, and 44 classes of common traffic signs are selected as the objects of remote detection and recognition;
(2) The Tsinghua-Tencent 100K data set is divided into a training set and a test set at a ratio of 1:2;
(3) To ensure sample balance when training the FL-CNN model, every traffic sign class has 100 or more scene instances in the training set; if the number of scene instances of a certain sign class is lower than 100, it is padded by repeated sampling.
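An illustrative sketch of the repeated-sampling balancing in (3); the data structure (a dictionary of per-class sample lists) is an assumption.

```python
import random

def balance_classes(samples_by_class, min_count=100, seed=0):
    """Pad any class with fewer than min_count scene instances by repeatedly sampling its instances."""
    rng = random.Random(seed)
    balanced = {}
    for cls, samples in samples_by_class.items():
        padded = list(samples)
        while len(padded) < min_count and samples:       # resample existing instances of the class
            padded.append(rng.choice(samples))
        balanced[cls] = padded
    return balanced
```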
3. The remote traffic sign detection and recognition method based on F-RCNN according to claim 2, characterized in that Step 4 specifically comprises:
(1) On the attention feature map output by the last depthwise separable convolution, with each pixel as an anchor point, 9 anchor boxes are generated in the original traffic scene using the 3 aspect ratios 1:1, 1:2 and 2:1 and 3 sizes, the 3 sizes being 4, 8 and 16;
(2) Anchor boxes that exceed the boundary of the original input image are removed;
(3) Redundant anchor boxes are removed by non-maximum suppression with a threshold of 0.7;
(4) Positive and negative samples are determined according to the intersection over union (IoU) between the anchor boxes and the real targets in the sample: anchor boxes with IoU > 0.7 are positive samples, anchor boxes with IoU < 0.3 are negative samples, and anchor boxes with IoU between 0.3 and 0.7 are removed. The IoU is calculated as the area of the intersection of the anchor box and the ground-truth box divided by the area of their union;
(5) According to translation invariance, each anchor box corresponds to a region proposal box on the fused feature map;
(6) All region proposal boxes pass through the fully connected layer of the region proposal network RPN to obtain the target candidate regions.
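An illustrative sketch of the anchor generation and IoU-based sample assignment of claim 3; the feature-map stride of 16 and the interpretation of the sizes 4, 8, 16 as multiples of the stride are assumptions.

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride=16, sizes=(4, 8, 16), ratios=(1.0, 0.5, 2.0)):
    """9 anchor boxes (3 sizes x 3 aspect ratios) centered on every feature-map pixel."""
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride   # anchor point in the original image
            for s in sizes:
                for r in ratios:
                    w, h = s * stride * np.sqrt(r), s * stride / np.sqrt(r)
                    anchors.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(anchors)

def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

# Sample assignment: IoU > 0.7 -> positive anchor, IoU < 0.3 -> negative, otherwise discarded.
```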
CN201910474059.7A 2019-06-02 2019-06-02 Remote traffic sign detection and identification method suitable for vehicle-mounted system Active CN110188705B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910474059.7A CN110188705B (en) 2019-06-02 2019-06-02 Remote traffic sign detection and identification method suitable for vehicle-mounted system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910474059.7A CN110188705B (en) 2019-06-02 2019-06-02 Remote traffic sign detection and identification method suitable for vehicle-mounted system

Publications (2)

Publication Number Publication Date
CN110188705A true CN110188705A (en) 2019-08-30
CN110188705B CN110188705B (en) 2022-05-06

Family

ID=67719645

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910474059.7A Active CN110188705B (en) 2019-06-02 2019-06-02 Remote traffic sign detection and identification method suitable for vehicle-mounted system

Country Status (1)

Country Link
CN (1) CN110188705B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104537393A (en) * 2015-01-04 2015-04-22 大连理工大学 Traffic sign recognizing method based on multi-resolution convolution neural networks
CN107316001A (en) * 2017-05-31 2017-11-03 天津大学 Small and intensive method for traffic sign detection in a kind of automatic Pilot scene
CN107301383A (en) * 2017-06-07 2017-10-27 华南理工大学 A kind of pavement marking recognition methods based on Fast R CNN
CN108446625A (en) * 2018-03-16 2018-08-24 中山大学 The important pedestrian detection method of picture based on graph model
CN108710826A (en) * 2018-04-13 2018-10-26 燕山大学 A kind of traffic sign deep learning mode identification method
CN108985145A (en) * 2018-05-29 2018-12-11 同济大学 The Opposite direction connection deep neural network model method of small size road traffic sign detection identification
CN108960308A (en) * 2018-06-25 2018-12-07 中国科学院自动化研究所 Traffic sign recognition method, device, car-mounted terminal and vehicle
US20190065871A1 (en) * 2018-10-25 2019-02-28 Intel Corporation Computer-assisted or autonomous driving traffic sign recognition method and apparatus

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
LIU ZHIGANG 等: "Traffic Sign Recognition Using an Attentive Context Region-Based Detection Framework", 《CHINESE JOURNAL OF ELECTRONICS》 *
ZHIGANG LIU 等: "ADCM:attention dropout convolutional module", 《NEUROCOMPUTING》 *
ZHIGANG LIU 等: "MR-CNN: A Multi-Scale Region-Based Convolutional Neural Network for Small Traffic Sign Recognition", 《IEEE ACCESS》 *
余进程: "车载辅助系统中禁令交通标志的识别研究", 《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》 *
程越 等: "基于轻量型卷积神经网络的交通标志识别方法", 《计算机系统应用》 *

Cited By (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111027670B (en) * 2019-11-04 2022-07-22 重庆特斯联智慧科技股份有限公司 Feature map processing method and device, electronic equipment and storage medium
CN111027670A (en) * 2019-11-04 2020-04-17 重庆特斯联智慧科技股份有限公司 Feature map processing method and device, electronic equipment and storage medium
CN110956115A (en) * 2019-11-26 2020-04-03 证通股份有限公司 Scene recognition method and device
CN110956115B (en) * 2019-11-26 2023-09-29 证通股份有限公司 Scene recognition method and device
CN111178153A (en) * 2019-12-09 2020-05-19 武汉光庭信息技术股份有限公司 Traffic sign detection method and system
CN110843794A (en) * 2020-01-15 2020-02-28 北京三快在线科技有限公司 Driving scene understanding method and device and trajectory planning method and device
CN110843794B (en) * 2020-01-15 2020-05-05 北京三快在线科技有限公司 Driving scene understanding method and device and trajectory planning method and device
CN111291887A (en) * 2020-03-06 2020-06-16 北京迈格威科技有限公司 Neural network training method, image recognition method, device and electronic equipment
CN111291887B (en) * 2020-03-06 2023-11-10 北京迈格威科技有限公司 Neural network training method, image recognition device and electronic equipment
CN111539524A (en) * 2020-03-23 2020-08-14 字节跳动有限公司 Lightweight self-attention module, neural network model and search method of neural network framework
CN111539524B (en) * 2020-03-23 2023-11-28 字节跳动有限公司 Lightweight self-attention module and searching method of neural network framework
CN111652308A (en) * 2020-05-13 2020-09-11 三峡大学 Flower identification method based on ultra-lightweight full-convolution neural network
CN111652308B (en) * 2020-05-13 2024-02-23 三峡大学 Flower identification method based on ultra-lightweight full convolutional neural network
CN112001385A (en) * 2020-08-20 2020-11-27 长安大学 Target cross-domain detection and understanding method, system, equipment and storage medium
CN112001385B (en) * 2020-08-20 2024-02-06 长安大学 Target cross-domain detection and understanding method, system, equipment and storage medium
CN112016467A (en) * 2020-08-28 2020-12-01 展讯通信(上海)有限公司 Traffic sign recognition model training method, recognition method, system, device and medium
CN112016467B (en) * 2020-08-28 2022-09-20 展讯通信(上海)有限公司 Traffic sign recognition model training method, recognition method, system, device and medium
CN112464959B (en) * 2020-12-12 2023-12-19 中南民族大学 Plant phenotype detection system and method based on attention and multiple knowledge migration
CN112464959A (en) * 2020-12-12 2021-03-09 中南民族大学 Plant phenotype detection system and method based on attention and multiple knowledge migration
US11875576B2 (en) 2021-03-29 2024-01-16 Quanzhou equipment manufacturing research institute Traffic sign recognition method based on lightweight neural network
CN113537138A (en) * 2021-03-29 2021-10-22 泉州装备制造研究所 Traffic sign identification method based on lightweight neural network
CN112926274A (en) * 2021-04-15 2021-06-08 成都四方伟业软件股份有限公司 Method and device for simulating urban traffic system by using convolutional neural network
CN113033482A (en) * 2021-04-20 2021-06-25 上海应用技术大学 Traffic sign detection method based on regional attention
CN113033482B (en) * 2021-04-20 2024-01-30 上海应用技术大学 Traffic sign detection method based on regional attention
CN113536943B (en) * 2021-06-21 2024-04-12 上海赫千电子科技有限公司 Road traffic sign recognition method based on image enhancement
CN113536943A (en) * 2021-06-21 2021-10-22 上海赫千电子科技有限公司 Road traffic sign identification method based on image enhancement
CN113536942B (en) * 2021-06-21 2024-04-12 上海赫千电子科技有限公司 Road traffic sign recognition method based on neural network
CN113536942A (en) * 2021-06-21 2021-10-22 上海赫千电子科技有限公司 Road traffic sign recognition method based on neural network
CN113591931A (en) * 2021-07-06 2021-11-02 厦门路桥信息股份有限公司 Weak supervision target positioning method, device, equipment and medium
CN113963060B (en) * 2021-09-22 2022-03-18 腾讯科技(深圳)有限公司 Vehicle information image processing method and device based on artificial intelligence and electronic equipment
CN113963060A (en) * 2021-09-22 2022-01-21 腾讯科技(深圳)有限公司 Vehicle information image processing method and device based on artificial intelligence and electronic equipment
CN113673541A (en) * 2021-10-21 2021-11-19 广州微林软件有限公司 Image sample generation method for target detection and application
WO2023083231A1 (en) * 2021-11-12 2023-05-19 Huawei Technologies Co., Ltd. System and methods for multiple instance segmentation and tracking
CN114821519A (en) * 2022-03-21 2022-07-29 上海应用技术大学 Traffic sign identification method and system based on coordinate attention
CN116664918A (en) * 2023-05-12 2023-08-29 杭州像素元科技有限公司 Method for detecting traffic state of each lane of toll station based on deep learning
CN116954264B (en) * 2023-09-08 2024-03-15 杭州牧星科技有限公司 Distributed high subsonic unmanned aerial vehicle cluster control system and method thereof
CN116954264A (en) * 2023-09-08 2023-10-27 杭州牧星科技有限公司 Distributed high subsonic unmanned aerial vehicle cluster control system and method thereof
CN117726958A (en) * 2024-02-07 2024-03-19 国网湖北省电力有限公司 Intelligent detection and hidden danger identification method for inspection image target of unmanned aerial vehicle of distribution line
CN117726958B (en) * 2024-02-07 2024-05-10 国网湖北省电力有限公司 Intelligent detection and hidden danger identification method for inspection image target of unmanned aerial vehicle of distribution line

Also Published As

Publication number Publication date
CN110188705B (en) 2022-05-06

Similar Documents

Publication Publication Date Title
CN110188705A (en) A kind of remote road traffic sign detection recognition methods suitable for onboard system
CN111598030B (en) Method and system for detecting and segmenting vehicle in aerial image
US20230184927A1 (en) Contextual visual-based sar target detection method and apparatus, and storage medium
CN108710875B (en) A kind of take photo by plane road vehicle method of counting and device based on deep learning
Cui et al. Semantic segmentation of remote sensing images using transfer learning and deep convolutional neural network with dense connection
CN110163187A (en) Remote road traffic sign detection recognition methods based on F-RCNN
CN110287849A (en) A kind of lightweight depth network image object detection method suitable for raspberry pie
CN108009518A (en) A kind of stratification traffic mark recognition methods based on quick two points of convolutional neural networks
CN108334847A (en) A kind of face identification method based on deep learning under real scene
CN108121991A (en) A kind of deep learning Ship Target Detection method based on the extraction of edge candidate region
CN109034152A (en) License plate locating method and device based on LSTM-CNN built-up pattern
CN109948593A (en) Based on the MCNN people counting method for combining global density feature
CN111079604A (en) Method for quickly detecting tiny target facing large-scale remote sensing image
CN109753959A (en) Road traffic sign detection method based on self-adaptive multi-scale feature fusion
CN111985367A (en) Pedestrian re-recognition feature extraction method based on multi-scale feature fusion
CN112163498A (en) Foreground guiding and texture focusing pedestrian re-identification model establishing method and application thereof
Liu et al. CAFFNet: channel attention and feature fusion network for multi-target traffic sign detection
Li et al. An aerial image segmentation approach based on enhanced multi-scale convolutional neural network
Sun et al. IRDCLNet: Instance segmentation of ship images based on interference reduction and dynamic contour learning in foggy scenes
CN115223017A (en) Multi-scale feature fusion bridge detection method based on depth separable convolution
Ju et al. An improved YOLO V3 for small vehicles detection in aerial images
Guan et al. RoadCapsFPN: Capsule feature pyramid network for road extraction from VHR optical remote sensing imagery
Wu et al. Vehicle detection based on adaptive multi-modal feature fusion and cross-modal vehicle index using RGB-T images
Xing et al. A small object detection solution by using super-resolution recovery
Zhang et al. Research on camouflaged human target detection based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant