CN108960198A - Traffic sign detection and recognition method based on a residual SSD model

Info

Publication number: CN108960198A
Application number: CN201810850416.0A
Authority: CN
Inventors: 张淑芳; 朱彤
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2018-07-28
Filing date: 2018-07-28
Publication date: 2018-12-07

Abstract

The invention discloses a traffic sign detection and recognition method based on a residual SSD model. Step 1: partition the image into multi-scale blocks. Step 2: construct the residual SSD model, using the residual network ResNet101 as the base network of SSD. Step 3: train the network. Step 4: perform detection and recognition with generalization ability. The invention aims to improve the accuracy of the existing SSD network on small targets and to achieve effective detection and recognition of multiple classes of signs of different sizes in real Chinese traffic scenes.

Description

A traffic sign detection and recognition method based on a residual SSD model

Technical field

The present invention relates to the fields of image detection and recognition and of applied deep learning, and in particular to a traffic sign image detection and recognition method.

Background art

Intelligent traffic sign detection and recognition is an important technology in advanced driver assistance systems (ADAS) and contributes greatly to road-condition warning and collision avoidance. Traditional machine learning methods generally segment regions of interest, extract image features, and identify targets with one or several classifiers. The features relied on fall into three main kinds: shallow image information such as color and shape; feature maps of visual saliency information that fuse multiple cues such as color, brightness, and orientation; and local invariant features such as histograms of gradients. Although such classifiers detect quickly, their feature representations are one-sided and narrow and do not generalize; when the background is complex, distractors are numerous, or the sign itself is distorted or damaged, the false detection and miss rates are high.

The emergence of deep learning, especially deep convolutional neural networks (DCNN), broke the limitations on image semantic expression and greatly improved recognition accuracy through rich feature representation together with matching and evaluation strategies. R-CNN (Region-based Convolutional Neural Network) localizes and segments targets bottom-up using region proposals and achieves accurate detection and recognition with a CNN classifier and regressor, which began the combination of region proposals with convolutional neural networks (CNN). Fast R-CNN builds on R-CNN by adding an ROI pooling layer after the last convolutional layer and moving bounding-box regression into the CNN for training, which speeds up detection. Faster R-CNN replaces Selective Search with a fully convolutional region proposal network (RPN) to generate proposal windows, and the RPN shares weights with the networks before and after it, forming an end-to-end, high-performance network framework. R-FCN (Region-based Fully Convolutional Network) introduces a position-sensitive ROI pooling layer, achieving high accuracy in both classification and localization. Although networks based on region proposals are accurate, their speed is unsatisfactory. For this reason, network structures that do not depend on region proposals have appeared one after another. The YOLO network combines the two steps of box extraction and recognition: it takes the whole image as input and directly regresses box positions and categories at the output layer. The SSD network improves on YOLO's direct grid division by performing recognition on feature maps at different levels, realizing multi-scale detection. In addition, pixel-level segmentation and recognition networks such as SegNet and DeepLab have also been proposed, but they are not yet widely used because their segmentation accuracy is not high.

On the other hand, traffic sign datasets from countries such as Germany and Belgium have been published one after another, driving rapid progress in traffic sign detection and recognition. In particular, on the German traffic sign detection dataset GTSDB and recognition dataset GTSRB, researchers at home and abroad have achieved 100% detection accuracy and 99.67% classification accuracy. However, signs occupy too large a proportion of these images and background information is severely lacking, and traffic in China is often more complex than abroad, so these studies cannot meet the practical detection needs of Chinese traffic scenes.

Summary of the invention

Applying deep learning to traffic sign detection, the present invention proposes a traffic sign detection and recognition method based on a residual SSD model, which builds on the SSD network's detection of small targets to achieve effective detection and recognition of multiple classes of signs of different sizes in real Chinese traffic scenes.

The traffic sign detection and recognition method based on a residual SSD model of the present invention comprises the following steps:

Step 1: Partition the image into multi-scale blocks, i.e., divide the image to be detected into several groups of image blocks using a sliding-window method. Correspondingly, the label information of the image (the four coordinate values of each sign's bounding box and the sign's category) is also decomposed to the image-block level. Each image block is searched: when 50% or more of the area of some sign's bounding box falls on the block, the block is considered to contain that sign and is retained; otherwise it is discarded. All image blocks that contain signs are fed into the network, and a randomly selected portion of the image blocks that contain no sign is also used for training, so that the network learns to recognize background regions;

Step 2: Construct the residual SSD model, i.e., use the residual network ResNet101 as the base network of SSD. The constructed residual SSD model consists of the convolutional layer conv1, the residual structure layers Res2x~Res5x, and the newly added auxiliary convolutional layers Conv2~Conv5. Its specific features are as follows:

(1) Four groups of convolutional layers Conv2~Conv5 are added in sequence after the top layer Res5c of the residual network ResNet101 to form richer feature maps; each group contains 256 1 × 1 convolution filters and 512 3 × 3 convolution filters, enabling detection on multi-scale feature maps;

(2) Default boxes whose overlap with an object's bounding box in the original image is higher than 0.5 are taken as the network's predicted target regions, and for each such region the network produces the Softmax score of the target belonging to each class and its offset relative to the default box. The Softmax calculation is shown in formula (1), where c_i denotes the predicted value for the i-th class of sign (the sign classes and their names are shown in Fig. 1):

$$\mathrm{Softmax}(c_i) = \frac{e^{c_i}}{\sum_{j} e^{c_j}}$$

After Softmax, the predicted values of all sign classes are distributed in (0, 1) and serve as the score of each class;

(3) The default boxes are associated with each feature-map cell of the upper network layers (Res3b3, Res5c, Pool6, and the newly added Conv2, Conv3, Conv4, and Conv5). The default boxes tile the feature map in a convolutional manner, so that the position of each box instance relative to its corresponding cell is fixed. Using default boxes with different aspect ratios effectively discretizes the space of possible output box shapes, improving the likelihood and speed of matching;

Step 3: Network training, i.e., first initialize the weights of the residual network with Gaussian random values, and then train the network using the following learning strategies:

(1) Prediction combines the feature maps of the 7 layers of different scales described in Step 2, realizing multi-scale detection and recognition. The default box scale on each feature map from Res5c onward is calculated as shown in formula (1):

$$s_k = s_{min} + \frac{s_{max} - s_{min}}{m - 1}(k - 1), \quad k \in [1, m]$$

where m is the number of feature maps from Res5c to Pool6, and s_min and s_max denote the scales of the default boxes in Res5c and in the topmost layer Pool6, set to 0.04 and 0.49 respectively; the scales of the intermediate layers are spaced at equal intervals within this range. In particular, the scale of the default boxes in layer Res3b3 is chosen as 0.01. In addition, the aspect ratios a of the 7 scale layers are all set to a ∈ {1, 2, 3, 1/2, 1/3}, and the width w and height h of the corresponding default box are calculated as shown in formula (2):

$$w = s_k\sqrt{a}, \quad h = \frac{s_k}{\sqrt{a}}$$

where w and h denote the width and height of the default box, respectively;

Except for the last layer, each layer also adds a group of default boxes with a = 1 and scale $s'_k = \sqrt{s_k s_{k+1}}$. The center of each default box is set to $\left(\frac{i + 0.5}{L_k}, \frac{j + 0.5}{L_k}\right)$, where L_k is the size of the square feature map of layer k and i, j ∈ [0, L_k). It follows that the last-layer feature map contains 5 default boxes, the number of default boxes on each of the other feature maps is 6L_k², and the network generates 32765 default boxes in total. For a 512 × 512 image block, these default boxes can detect signs with sizes in the range (5, 273), which is consistent with the size range of the signs in our dataset.
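The default-box arithmetic in item (1) can be checked with a short sketch. The feature-map side lengths (64, 32, 16, 8, 4, 2, 1) and the aspect-ratio set below are assumptions that the text does not state explicitly; they are chosen because they are the usual SSD settings and because they reproduce the stated total of 32765 boxes. All function names are hypothetical.

```python
import math

# Assumed feature-map side lengths for a 512 x 512 input, ordered
# Res3b3, Res5c, Conv2, Conv3, Conv4, Conv5, Pool6 (not stated in the text).
FEATURE_MAP_SIZES = [64, 32, 16, 8, 4, 2, 1]
# Assumed aspect ratios (the usual SSD set).
ASPECT_RATIOS = [1.0, 2.0, 3.0, 1.0 / 2.0, 1.0 / 3.0]


def layer_scales(s_min=0.04, s_max=0.49, first_scale=0.01, m=6):
    """Scale 0.01 for Res3b3, then m evenly spaced scales from s_min to s_max."""
    step = (s_max - s_min) / (m - 1)
    return [first_scale] + [s_min + step * k for k in range(m)]


def box_shapes(scale, next_scale=None):
    """(width, height) pairs of the default boxes at one feature-map cell."""
    shapes = [(scale * math.sqrt(a), scale / math.sqrt(a)) for a in ASPECT_RATIOS]
    if next_scale is not None:
        # Extra square box with scale sqrt(s_k * s_{k+1}); skipped on the last layer.
        side = math.sqrt(scale * next_scale)
        shapes.append((side, side))
    return shapes


def count_default_boxes():
    """Total number of default boxes over all seven feature maps."""
    scales = layer_scales()
    total = 0
    for k, size in enumerate(FEATURE_MAP_SIZES):
        next_scale = scales[k + 1] if k + 1 < len(scales) else None
        total += size * size * len(box_shapes(scales[k], next_scale))
    return total


print(count_default_boxes())  # 32765
```

Under these assumptions the count is 6 × (64² + 32² + 16² + 8² + 4² + 2²) + 5 = 32765, which matches the total given in the text.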

(2) Ground-truth boxes and default boxes are matched using the Jaccard overlap strategy. First, for each ground-truth box, the default box with the largest Jaccard overlap with it is found and taken as the best match. Then, to simplify learning, the Jaccard overlap between each default box and the ground-truth boxes is computed, and whenever it exceeds a threshold the default box is considered to match the ground-truth box. The Jaccard overlap is calculated as shown in formula (3):

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

where A and B denote the sets of all pixels inside the default box and the ground-truth box, respectively. In the present invention, the Jaccard overlap threshold is set to 0.5.
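The matching strategy in item (2) can be sketched as follows. This is a minimal illustration, assuming axis-aligned boxes given as (xmin, ymin, xmax, ymax) tuples and assuming that a default box exceeding the threshold for several ground-truth boxes is assigned to the one with the largest overlap; the function names are hypothetical and the text does not specify these details.

```python
def jaccard_overlap(a, b):
    """Intersection over union of two boxes (xmin, ymin, xmax, ymax)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(iw, 0.0) * max(ih, 0.0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_boxes(default_boxes, target_boxes, threshold=0.5):
    """Return a dict mapping default-box index to ground-truth-box index."""
    matches = {}
    # Threshold matching: a default box matches the ground-truth box it
    # overlaps most, provided that overlap exceeds the threshold.
    for i, d in enumerate(default_boxes):
        best_j, best = None, threshold
        for j, t in enumerate(target_boxes):
            overlap = jaccard_overlap(d, t)
            if overlap > best:
                best_j, best = j, overlap
        if best_j is not None:
            matches[i] = best_j
    # Best match: every ground-truth box keeps the default box with the
    # largest overlap, even when that overlap is below the threshold.
    for j, t in enumerate(target_boxes):
        overlaps = [jaccard_overlap(d, t) for d in default_boxes]
        if overlaps and max(overlaps) > 0:
            matches[overlaps.index(max(overlaps))] = j
    return matches


defaults = [(0, 0, 10, 10), (5, 5, 15, 15), (100, 100, 110, 110)]
targets = [(0, 0, 10, 10), (102, 102, 120, 120)]
print(match_boxes(defaults, targets))  # {0: 0, 2: 1}
```

In the usage example, the third default box overlaps the second ground-truth box by less than 0.5 but is still matched to it because it is that box's best match.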

(3) The objective function of model training is expressed as the total prediction error, which is the weighted sum of the classification error and the detection error, calculated as shown in formula (4):

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$

where L denotes the total error, the weight α is set to 1, and L_conf and L_loc denote the classification error and the detection error, given by formulas (5) and (6) respectively;

where x denotes the match flag between each prediction box and the ground-truth boxes (x = 1 means matched, x = 0 means unmatched); for example, $x_{ij}^p = 1$ indicates that the i-th default box matches the j-th ground-truth box of category p. c denotes the class prediction score (confidence), l the prediction box, g the ground-truth box, and N the number of default boxes. If N = 0, the total error L is taken to be 0.
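Formulas (5) and (6) are not reproduced in this text. For orientation only, the standard SSD formulation of the two error terms, which uses the same symbols x, c, l, and g, is given below; whether the invention uses exactly this form is an assumption. Here Pos and Neg denote the sets of positive and negative default boxes, $\hat{c}_i^p$ is the Softmax score of box i for category p (category 0 being background), and $\hat{g}_j^m$ is the ground-truth box encoded relative to the default box.

```latex
% Standard SSD error terms (assumed form; not confirmed by the source text)
L_{conf}(x, c) = -\sum_{i \in Pos} x_{ij}^{p} \log\left(\hat{c}_i^{p}\right)
                 - \sum_{i \in Neg} \log\left(\hat{c}_i^{0}\right)

L_{loc}(x, l, g) = \sum_{i \in Pos} \; \sum_{m \in \{cx,\, cy,\, w,\, h\}}
                   x_{ij}^{p} \, \mathrm{smooth}_{L1}\left(l_i^{m} - \hat{g}_j^{m}\right)
```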

(4) Data augmentation

To increase the number of samples and train a more robust model that adapts to inputs of various sizes and shapes, a new image block is generated from each original image block using one of the following methods:

A. Use the original image block;

B. Sample the original image block so that the Jaccard overlap (also called intersection over union) between the sampled block and the original block is 0.1, 0.3, 0.5, 0.7, or 0.9;

C. Randomly sample the original image block.

The sampled image block is 0.1 to 1 times the size of the original image block, with an aspect ratio between 0.5 and 2. Finally, all image blocks are resized to 512 × 512, and horizontal flipping and distortion are applied;
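The geometry of the random sampling in method C can be sketched as below. This is an illustration under stated assumptions: "size" is interpreted as a relative side scale in [0.1, 1], the patch is expressed in coordinates normalized to the original block, and candidates that would not fit inside the block are rejected and redrawn. The function name and the rejection scheme are hypothetical.

```python
import math
import random


def sample_patch(rng, min_scale=0.1, max_scale=1.0, min_ratio=0.5, max_ratio=2.0):
    """Draw a patch (xmin, ymin, xmax, ymax) in normalized block coordinates."""
    while True:
        scale = rng.uniform(min_scale, max_scale)
        ratio = rng.uniform(min_ratio, max_ratio)  # width / height
        width = scale * math.sqrt(ratio)
        height = scale / math.sqrt(ratio)
        if width <= 1.0 and height <= 1.0:
            # Place the patch uniformly at random inside the unit square.
            x = rng.uniform(0.0, 1.0 - width)
            y = rng.uniform(0.0, 1.0 - height)
            return (x, y, x + width, y + height)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
patch = sample_patch(rng)
```

The returned patch would then be cropped from the original block and resized to 512 × 512.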

(5) Hard negative mining

When the center of a ground-truth box falls inside the sampled image block, the part of the ground-truth box lying outside the block is cropped off, and the cropped ground-truth box is marked as a positive sample in that block; otherwise it is marked as a negative sample. After each iteration, the L_conf values of all default boxes are sorted, and the default boxes with the largest values are marked as negative samples so that the ratio of positive to negative samples is kept at 1:3;
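The selection of negatives by sorted L_conf value can be sketched as follows. This is a minimal illustration, assuming per-box classification losses and positive flags are already available as plain lists; the function name is hypothetical.

```python
def select_hard_negatives(conf_losses, is_positive, neg_pos_ratio=3):
    """Indices of the unmatched default boxes with the largest classification loss.

    At most neg_pos_ratio negatives are kept per positive, giving the 1:3
    positive-to-negative ratio described in the text.
    """
    num_positive = sum(1 for flag in is_positive if flag)
    candidates = [i for i, flag in enumerate(is_positive) if not flag]
    # Highest loss first: these are the "hardest" background boxes.
    candidates.sort(key=lambda i: conf_losses[i], reverse=True)
    return sorted(candidates[: neg_pos_ratio * num_positive])


losses = [0.1, 2.0, 0.5, 0.9, 0.05, 1.5, 0.3]
positive = [True, False, False, False, False, False, False]
print(select_hard_negatives(losses, positive))  # [1, 3, 5]
```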

Step 4: Perform detection and recognition with generalization ability. When the sliding window reaches the boundary of the image in the width or height direction, the remaining image block may be smaller than the window, and test images generally contain no signs at their edges; in that case the window slides no further and the incomplete block is discarded. In addition, a coarse-to-fine detection strategy is adopted: first, preliminary residual SSD detection is run on all image blocks at the medium scale; the results with confidence higher than 0.3 are then mapped back to the original image; for image blocks at the other four scales, a block is selected for residual SSD detection if it overlaps a preliminary prediction box (Jaccard overlap > 0) and is discarded otherwise; finally, the results of the two stages are merged and filtered on the original image using non-maximum suppression, completing detection and recognition.

The present invention aims to improve the accuracy of the existing SSD network in detecting small targets, and achieves effective detection and recognition of multiple classes of signs of different sizes in real Chinese traffic scenes.
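The block-selection step of the coarse-to-fine strategy can be sketched as below. This is an illustration only: boxes and blocks are assumed to be (xmin, ymin, xmax, ymax) tuples in original-image coordinates, preliminary results are assumed to be (box, confidence) pairs, and the function names are hypothetical. A Jaccard overlap greater than 0 is equivalent to a positive intersection area, which is what the helper tests.

```python
def boxes_overlap(a, b):
    """True when two boxes share a positive area, i.e. Jaccard overlap > 0."""
    return min(a[2], b[2]) > max(a[0], b[0]) and min(a[3], b[3]) > max(a[1], b[1])


def select_blocks_for_refinement(preliminary, candidate_blocks, min_confidence=0.3):
    """Keep the blocks at the other scales that overlap a confident preliminary box."""
    confident = [box for box, score in preliminary if score > min_confidence]
    return [
        block
        for block in candidate_blocks
        if any(boxes_overlap(block, box) for box in confident)
    ]


preliminary_results = [((100, 100, 130, 130), 0.8), ((900, 900, 950, 950), 0.2)]
blocks = [(0, 0, 256, 256), (512, 512, 768, 768), (768, 768, 1024, 1024)]
print(select_blocks_for_refinement(preliminary_results, blocks))  # [(0, 0, 256, 256)]
```

In the usage example the second preliminary box is ignored because its confidence is below 0.3, so the block that covers it is not selected.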

Brief description of the drawings

Fig. 1 shows the 45 classes of sign images used in the present invention and their category names;

Fig. 2 is the overall flow chart of the traffic sign detection and recognition method based on a residual SSD model of the present invention;

Fig. 3 is a schematic diagram of an embodiment of the multi-scale blocking of the present invention;

Fig. 4 is a diagram of the residual SSD model;

Fig. 5 is the flow chart of coarse-to-fine detection;

Fig. 6 compares the detection performance of the present invention and other methods on signs of different sizes;

Fig. 7 shows the detection results of the model of the present invention on low-resolution images: (7a) small sign detection results; (7b) medium sign detection results; (7c) large sign detection results.

Detailed description

Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

The embodiment of the present invention takes Tsinghua-Tencent 100K, the traffic sign dataset recently released by the Tsinghua University and Tencent joint laboratory, as its research object. In the Tsinghua-Tencent 100K dataset, the image resolution is 2048 × 2048, each sign occupies only a very small part of the image (e.g. 1%), and a single image may contain several signs of different sizes. Given limited GPU memory, the proposed method first decomposes the large images into multi-scale image blocks for detection and then maps the block-level detection results back to the original-image level. A residual SSD structure is constructed and a network model is trained using one part of the image data. The model is tested on the other part of the images to verify its generalization ability, and the detection results are optimized by combining the coarse-to-fine strategy with non-maximum suppression. The specific implementation steps are as follows:

Step 1: Partition the image into multi-scale blocks;

In the embodiment shown schematically in Fig. 3, the sliding-window method divides the image into five groups of image blocks of sizes 256 × 256, 320 × 320, 384 × 384, 448 × 448, and 512 × 512, with sliding strides of 64, 96, 128, 160, and 256 respectively. Because small signs occupy a larger proportion of the area of a small image window, smaller strides are used for the small windows in order to obtain more small-sign instances. Correspondingly, the label information of the image (the four coordinate values of each sign's bounding box and the sign's category) is also decomposed to the image-block level. The label decomposition works as follows: all image blocks are resized to 512 × 512, the 4 coordinate values of each sign's bounding box within a block are mapped into [0, 512], and sign categories correspond one-to-one with bounding boxes. Unifying the block size also facilitates subsequent batch training and speeds up network convergence. The sliding-window method generates a very large number of image blocks, most of which contain only background, so they are not all suitable as training input. Since the residual SSD network applies data augmentation to its input images and uses an effective hard negative mining method, the present invention feeds all image blocks that contain signs into the network and randomly selects a portion of the image blocks that contain no sign for training, so that the network learns to recognize background regions (the ratio between the two is about 1:2). Whether an image block contains a sign is determined as follows: each image block is searched, and when 50% or more of the area of some sign's bounding box falls on the block, the block is considered to contain that sign and is retained; otherwise it is discarded.
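The blocking and label decomposition described above can be sketched as follows. This is a minimal illustration, assuming square images and blocks, boxes given as (xmin, ymin, xmax, ymax), and clipping of a sign's box to the block before mapping it into [0, 512]; the clipping, the function names, and the category string are assumptions for illustration only.

```python
# (block size, stride) pairs taken from the embodiment.
BLOCK_SCALES = [(256, 64), (320, 96), (384, 128), (448, 160), (512, 256)]


def sliding_windows(image_size, block_size, stride):
    """All complete blocks (xmin, ymin, xmax, ymax); incomplete border blocks are dropped."""
    positions = range(0, image_size - block_size + 1, stride)
    return [(x, y, x + block_size, y + block_size) for y in positions for x in positions]


def intersection_area(a, b):
    """Area shared by two boxes, or 0 when they do not overlap."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)


def block_labels(block, signs, target_size=512, min_fraction=0.5):
    """Labels of the signs a block contains, mapped into [0, target_size].

    signs is a list of (xmin, ymin, xmax, ymax, category). A sign belongs to
    the block when at least min_fraction of its box area lies inside it.
    """
    scale = target_size / (block[2] - block[0])
    labels = []
    for xmin, ymin, xmax, ymax, category in signs:
        area = (xmax - xmin) * (ymax - ymin)
        if area <= 0:
            continue
        if intersection_area(block, (xmin, ymin, xmax, ymax)) >= min_fraction * area:
            # Clip to the block (an assumption), shift to block coordinates, rescale.
            labels.append((
                (max(xmin, block[0]) - block[0]) * scale,
                (max(ymin, block[1]) - block[1]) * scale,
                (min(xmax, block[2]) - block[0]) * scale,
                (min(ymax, block[3]) - block[1]) * scale,
                category,
            ))
    return labels


print(len(sliding_windows(2048, 512, 256)))  # 49
print(block_labels((0, 0, 256, 256), [(100, 100, 120, 120, "sign_a")]))
# [(200.0, 200.0, 240.0, 240.0, 'sign_a')]
```

A block with an empty label list would be treated as a background block, of which only a random portion is kept for training.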

Step 2: Construct the residual SSD model

The present invention uses the residual network ResNet101 in place of the original VGGNet as the base network of SSD, forming the residual SSD model. The residual SSD model is a feed-forward convolutional neural network made up of a base feature-extraction network supplemented by several auxiliary convolutional layers. It uses a set of discrete default boxes of fixed sizes to match objects on different feature maps, computes the score of each matched target belonging to each category, and continually adjusts the boxes to better approximate the objects' bounding boxes; predictions combine the outputs of the last few layers of the base feature-extraction network and of the auxiliary layers. As shown in Fig. 4, the residual SSD model consists of the convolutional layer conv1, the residual structure layers Res2x~Res5x, and the newly added auxiliary convolutional layers Conv2~Conv5. Its specific features are as follows:

1) Partition the image into multi-scale blocks, i.e., divide the image to be detected into several groups of image blocks using a sliding-window method. Correspondingly, the label information of the image (the four coordinate values of each sign's bounding box and the sign's category) is also decomposed to the image-block level. Each image block is searched: when 50% or more of the area of some sign's bounding box falls on the block, the block is considered to contain that sign and is retained; otherwise it is discarded. All image blocks that contain signs are fed into the network, and a randomly selected portion of the image blocks that contain no sign is also used for training, so that the network learns to recognize background regions;

2) Default boxes whose overlap with an object's bounding box in the original image is higher than 0.5 are taken as the network's predicted target regions, and for each such region the network produces the Softmax score of the target belonging to each class and its offset relative to the default box. The Softmax calculation is shown in formula (1), where c_i denotes the predicted value for the i-th class of sign (the sign classes and their names are shown in Fig. 1):

$$\mathrm{Softmax}(c_i) = \frac{e^{c_i}}{\sum_{j} e^{c_j}}$$

As can be seen, after Softmax the predicted values of all sign classes are distributed in (0, 1) and can be regarded as the score of each class.
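The Softmax scoring can be illustrated with a few lines of code. This sketch subtracts the maximum value before exponentiating, a common numerical-stability step that does not change the result and is not mentioned in the text; the function name is hypothetical.

```python
import math


def softmax(values):
    """Map raw class predictions to scores in (0, 1) that sum to 1."""
    largest = max(values)  # subtracted for numerical stability only
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


scores = softmax([2.0, 1.0, 0.1])
```

The class with the largest raw prediction keeps the largest score, so the ranking of classes is unchanged.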

3) The default boxes are associated with each feature-map cell of the upper network layers (Res3b3, Res5c, Pool6, and the newly added Conv2, Conv3, Conv4, and Conv5). The default boxes tile the feature map in a convolutional manner, so that the position of each box instance relative to its corresponding cell is fixed. For each feature-map cell, the offsets relative to each box instance's position and the scores of all categories are computed. Specifically, for each feature-map cell, c category scores and 4 offsets (Δ(cx, cy, w, h)) are computed for each of the k default boxes, so each feature map requires (c + 4) × k filters, and an m × n feature map produces (c + 4) × k × m × n outputs. Using default boxes with different aspect ratios effectively discretizes the space of possible output box shapes, improving the likelihood and speed of matching.
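The output count (c + 4) × k × m × n can be checked with a small worked example. The numbers used below (45 categories, 6 default boxes per cell, a 32 × 32 feature map) are illustrative assumptions, not values the text assigns to any particular layer; the function name is hypothetical.

```python
def num_outputs(num_classes, boxes_per_cell, rows, cols):
    """Number of values predicted on one feature map: (c + 4) * k * m * n."""
    filters = (num_classes + 4) * boxes_per_cell  # c scores and 4 offsets per box
    return filters * rows * cols


print(num_outputs(45, 6, 32, 32))  # 301056
```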

1) Prediction combines the feature maps of the 7 layers of different scales described in Step 2, realizing multi-scale detection and recognition. The default box scale on each feature map from Res5c onward is calculated as shown in formula (2):

$$s_k = s_{min} + \frac{s_{max} - s_{min}}{m - 1}(k - 1), \quad k \in [1, m]$$

where m is the number of feature maps from Res5c to Pool6, and s_min and s_max denote the scales of the default boxes in Res5c and in the topmost layer Pool6, set to 0.04 and 0.49 respectively; the scales of the intermediate layers are spaced at equal intervals within this range. In particular, the scale of the default boxes in layer Res3b3 is chosen as 0.01. In addition, so that the shapes of the default boxes fit the ground-truth boxes better, different aspect ratios a are also needed; in the present invention the aspect ratios a of the 7 scale layers are all set to a ∈ {1, 2, 3, 1/2, 1/3}, and the width w and height h of the corresponding default box are calculated as shown in formula (2):

$$w = s_k\sqrt{a}, \quad h = \frac{s_k}{\sqrt{a}}$$

where w and h denote the width and height of the default box, respectively;

Except for the last layer, each layer also adds a group of default boxes with a = 1 and scale $s'_k = \sqrt{s_k s_{k+1}}$. The center of each default box is set to $\left(\frac{i + 0.5}{L_k}, \frac{j + 0.5}{L_k}\right)$, where L_k is the size of the square feature map of layer k and i, j ∈ [0, L_k). It follows that the last-layer feature map contains 5 default boxes, the number of default boxes on each of the other feature maps is 6L_k², and the network generates 32765 default boxes in total. For a 512 × 512 image block, these default boxes can detect signs with sizes in the range (5, 273), which is consistent with the size range of the signs in our dataset.

2) Ground-truth boxes and default boxes are matched using the Jaccard overlap strategy. First, for each ground-truth box, the default box with the largest Jaccard overlap with it is found and taken as the best match. Then, to simplify learning, the Jaccard overlap between each default box and the ground-truth boxes is computed, and whenever it exceeds a threshold the default box is considered to match the ground-truth box. The Jaccard overlap is calculated as shown in formula (3), where A and B denote the sets of all pixels inside the default box and the ground-truth box:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

3) Objective function

The objective function of model training is expressed as the total prediction error, and training is optimized by reducing this error. The total prediction error is the weighted sum of the classification error L_conf and the detection error L_loc, calculated as shown in formula (4), where N is the number of default boxes and α is the weight:

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$

4) Data augmentation

A. Use the original image block;

B. Sample the original image block so that the Jaccard overlap (also called intersection over union) between the sampled block and the original block is 0.1, 0.3, 0.5, 0.7, or 0.9;

C. Randomly sample the original image block.

The sampled image block is 0.1 to 1 times the size of the original image block, with an aspect ratio between 0.5 and 2. Finally, all image blocks are resized to 512 × 512, and horizontal flipping and distortion are applied.

5) Hard negative mining

Balancing the numbers of positive and negative samples is essential for keeping the model stable. For this purpose, after each iteration the L_conf values of all default boxes are sorted, and the default boxes with the largest values are marked as negative samples so that the ratio of positive to negative samples is kept at 1:3.

Step 4: Perform detection and recognition with generalization ability. When the sliding window reaches the boundary of the image in the width or height direction, the remaining image block may be smaller than the window, and test images generally contain no signs at their edges; in that case the window slides no further and the incomplete block is discarded. In addition, a coarse-to-fine detection strategy is adopted: first, preliminary residual SSD detection is run on all image blocks at the medium scale; the results with confidence higher than 0.3 are then mapped back to the original image; for image blocks at the other four scales, a block is selected for residual SSD detection if it overlaps a preliminary prediction box (Jaccard overlap > 0) and is discarded otherwise; finally, the results of the two stages are merged and filtered on the original image using non-maximum suppression, completing detection and recognition. The non-maximum suppression procedure is as follows: all prediction boxes are sorted by score from high to low; the Jaccard overlap between the highest-scoring prediction box and each other prediction box is computed, and any other box whose overlap exceeds a threshold (0.5 in the present invention) is discarded; next, the Jaccard overlap between the second-highest-scoring prediction box and the remaining boxes is computed, and those with overlap greater than 0.5 are discarded; the same is then done for the third-highest-scoring box, and so on, until all prediction boxes have been traversed.
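The non-maximum suppression procedure described above can be sketched as follows. This is a minimal illustration, assuming boxes as (xmin, ymin, xmax, ymax) tuples with one score each and ignoring sign categories, which the text does not discuss for this step; the function names are hypothetical.

```python
def jaccard(a, b):
    """Intersection over union of two boxes (xmin, ymin, xmax, ymax)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(iw, 0.0) * max(ih, 0.0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_maximum_suppression(boxes, scores, threshold=0.5):
    """Indices of the boxes that survive, in descending score order."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # A box survives only if no higher-scoring kept box overlaps it too much.
        if all(jaccard(boxes[i], boxes[k]) <= threshold for k in kept):
            kept.append(i)
    return kept


predictions = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
confidences = [0.9, 0.8, 0.7]
print(non_maximum_suppression(predictions, confidences))  # [0, 2]
```

Visiting boxes in score order and keeping a box only when it does not overlap an already kept box by more than the threshold gives the same result as repeatedly discarding the boxes that overlap the current highest-scoring one.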

The embodiment of the present invention is described as follows:

1. Data preprocessing

The ratio of training to test images in the Tsinghua-Tencent 100K dataset is roughly 2:1. The 45 sign classes that appear more than 100 times are chosen as the targets for detection and recognition, and the bounding-box coordinates and categories are used as the labels for detection and recognition.

2. Experimental environment

Model training and testing are both carried out on a Linux PC with 32 GB of memory, an Intel i7-7700K CPU, and two NVIDIA GeForce GTX 1080Ti GPUs with 11 GB of video memory each.

3. Learning parameter configuration

The initial learning rate is set to 0.001 and is reduced to 0.0001 after 40000 iterations; training then continues at a learning rate of 0.0001 for another 40000 iterations before stopping. A momentum of 0.9 and a weight decay of 0.0005 are used. Experiments are run on the Caffe framework, with one test performed every 20000 training iterations.
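The learning-rate schedule above can be written as a small step function. This is a framework-independent sketch, not the Caffe solver configuration used in the experiments; the function name, parameter names, and the use of a zero-based iteration index are assumptions.

```python
def learning_rate(iteration, initial_lr=0.001, reduced_lr=0.0001, drop_at=40000):
    """Step schedule: initial_lr for the first drop_at iterations, reduced_lr afterwards."""
    return initial_lr if iteration < drop_at else reduced_lr


TOTAL_ITERATIONS = 80000  # 40000 iterations at each learning rate
TEST_INTERVAL = 20000     # one test every 20000 training iterations

# Iterations after which a test is run, counting from 1.
test_points = list(range(TEST_INTERVAL, TOTAL_ITERATIONS + 1, TEST_INTERVAL))
print(test_points)  # [20000, 40000, 60000, 80000]
```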

4. Test results and method comparison

Signs are divided by size into three groups: small (area ≤ 32²), medium (32² < area ≤ 96²), and large (96² < area ≤ 400²), and coarse-to-fine detection is performed on the test set. Using the Microsoft COCO evaluation metrics, all results with confidence greater than or equal to 0.01 are collected, evaluated, and compared with the results of Zhu and Meng et al., as shown in Fig. 6, where (6a), (6b), and (6c) show the recall-accuracy curves of the present method and the other two methods for small, medium, and large signs respectively. It can be seen that the present method achieves better detection results on signs of all sizes. Collecting all results with confidence greater than or equal to 0.5 gives the comparison of overall accuracy and recall between the present method and the other two methods shown in Table 1.

Table 1. Comparison of overall accuracy and recall

Method	Overall accuracy	Overall recall
Zhu	88%	91%
Meng	90%	93%
The present invention	94%	95%

Table 1 shows that the overall performance of the present method is also clearly better than that of the other two methods. Compared with the region proposal network used by Zhu, the preliminary detection of the present invention uses the same network as the subsequent fine detection, which avoids extra overhead and extracts regions of interest more effectively. Compared with the image down-sampling and blocking scheme used by Meng, the present method partitions the original image at several scales and reduces the search stride, so that signs occupy a larger proportion of each image block, and richer features are extracted in combination with the ResNet network. Compared with the models of Zhu and Meng, which detect and localize on a single network layer, the residual SSD model used by the present method performs feature matching on multiple network layers, adapts to detection targets of various sizes, and is more robust.

In addition, the 512 × 512 pixel image blocks obtained by resizing in the present method reflect the characteristics of most real traffic scenes well, so the network model of the present invention can also be used for detection and recognition in low-resolution traffic scene images. To demonstrate this, a supplementary experiment was carried out on all image blocks in the test images that contain traffic signs. The test image blocks are divided by their size before resizing into five kinds (256 × 256, 320 × 320, 384 × 384, 448 × 448, and 512 × 512), and signs in three further groups, small (area ≤ 50²), medium (50² < area ≤ 100²), and large (100² < area ≤ 512²), are detected separately; the results are shown in Fig. 7. Note that the area here is the size of the sign in the 512 × 512 image block rather than in the original image.

As can be seen from Fig. 7, the residual SSD model used by the present method is also effective for detection in low-resolution traffic images and achieves good accuracy and recall for traffic signs of different sizes. Compared with the 200 × 200 image blocks used by Meng, the multi-scale blocking of the present method is more widely applicable.

Claims

1. a kind of road traffic sign detection and recognition methods based on residual error SSD model, which is characterized in that this method specifically include with Lower step:

Step 1: carrying out multiple dimensioned piecemeal to image, i.e., multiple series of images block is marked off to image to be detected using sliding window method；Accordingly Ground also decomposes the label information including wherein mark surrounds the classification of four coordinate values of frame, mark of image To image block level；Each image block is scanned for, when there are some to indicate that the surface area for surrounding 50% or more frame is fallen in When on this image block, it is believed that this image block includes the mark, retains this image block；Otherwise give up；By all figures comprising mark As block whole investment network, and random screening a part does not include the image block indicated and is trained to realize network to background area The identification in domain；

Step 2: Build the residual SSD model, i.e. use the residual network ResNet101 as the base network of SSD. The residual SSD model so constructed consists of the original convolutional layer conv1, the residual structure layers Res2x~Res5x, and the newly added auxiliary convolutional layers Conv2~Conv5. Its specific features are as follows:

(1) Four groups of convolutional layers, Conv2~Conv5, are added in sequence after Res5c, the top layer of the residual network ResNet101, to form richer feature maps. Each group of convolutional layers contains 256 1 × 1 convolution filters and 512 3 × 3 convolution filters, enabling detection on multi-scale feature maps;

(2) Default boxes whose overlap with an object's bounding box in the original image is higher than 0.5 are taken as the network's predicted target regions, and for each predicted target region the network generates the Softmax score of belonging to each class and the offset relative to the default box. The Softmax calculation is shown in formula (1), where c_i denotes the predicted value for the i-th class of sign (the sign classes and their names are shown in Fig. 1):

Softmax(c_i) = exp(c_i) / Σ_j exp(c_j)    (1)
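As a small illustration of the class score in item (2), the sketch below turns a list of raw class predictions into Softmax scores. The function name is hypothetical, and subtracting the maximum before exponentiating is a common numerical-stability measure added here, not something stated in the claim.

```python
import math


def softmax(predictions):
    """Softmax scores for one predicted region; `predictions` holds one value c_i per class."""
    shift = max(predictions)  # stability shift; does not change the result
    exps = [math.exp(c - shift) for c in predictions]
    total = sum(exps)
    return [e / total for e in exps]
```

The scores sum to 1, and the class with the largest predicted value receives the largest score.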

Step 3: Network training, i.e. first apply Gaussian random initialization to the weights of the residual network, then train the network using the following learning strategies:

(1) Prediction combines the feature maps of the 7 layers of different scales described in Step 2, achieving multi-scale detection and recognition. The default-box scale on each feature map from Res5c onward is calculated as shown in formula (1):

s_k = s_min + (s_max - s_min) / (m - 1) × (k - 1), k ∈ [1, m]    (1)

where s_min and s_max denote the default-box scales at Res5c and at the top layer Pool6, set to 0.04 and 0.49 respectively; m is the number of feature layers from Res5c to Pool6, and the scales of the intermediate layers are equally spaced within this range. In particular, the scale of the default boxes at layer Res3b3 is chosen as 0.01. In addition, the aspect ratio a of the 7 scale layers is set to a ∈ {1, 2, 3, 1/2, 1/3}; the width w and height h of the corresponding default box are then calculated as shown in formula (2):

w = s_k √a, h = s_k / √a    (2)

where w and h denote the width and height of the default box, respectively;

In addition, on every layer except the last, one further group of default boxes is added with a = 1 and scale s'_k = √(s_k s_(k+1)). The center of each default box is set to ((i + 0.5)/L_k, (j + 0.5)/L_k), where L_k is the size of the square feature map of the k-th layer and i, j ∈ [0, L_k). It follows that the last feature map contains 5 default boxes, each of the other feature maps contains 6L_k² default boxes, and the network generates 32765 default boxes in total. For a 512 × 512 image block, these default boxes can detect signs whose size lies in the range (5, 273), which is consistent with the size range of the signs in our data set.
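The default-box layout of item (1) can be checked with a short sketch. It rests on several assumptions: the scale rule is the standard SSD linear interpolation between s_min and s_max, the aspect ratios are {1, 2, 3, 1/2, 1/3}, and the seven feature maps have sides 64, 32, 16, 8, 4, 2 and 1. These sizes are not stated in the claim; they are chosen because they reproduce the stated total of 32765 default boxes. All names are hypothetical.

```python
import math

ASPECT_RATIOS = (1.0, 2.0, 3.0, 1 / 2, 1 / 3)   # assumed, as in standard SSD
FEATURE_MAP_SIZES = (64, 32, 16, 8, 4, 2, 1)    # assumed sides L_k, Res3b3 to Pool6


def layer_scales(s_first=0.01, s_min=0.04, s_max=0.49, m=6):
    """Scale 0.01 for Res3b3, then m equally spaced scales from s_min to s_max."""
    step = (s_max - s_min) / (m - 1)
    return [s_first] + [s_min + step * k for k in range(m)]


def default_boxes(sizes=FEATURE_MAP_SIZES):
    """All default boxes as (cx, cy, w, h), relative to the image block side."""
    scales = layer_scales(m=len(sizes) - 1)
    boxes = []
    for k, side in enumerate(sizes):
        shapes = [(scales[k] * math.sqrt(a), scales[k] / math.sqrt(a))
                  for a in ASPECT_RATIOS]
        if k < len(sizes) - 1:
            # Extra box with a = 1 on every layer except the last.
            extra = math.sqrt(scales[k] * scales[k + 1])
            shapes.append((extra, extra))
        for i in range(side):
            for j in range(side):
                cx, cy = (i + 0.5) / side, (j + 0.5) / side
                boxes.extend((cx, cy, w, h) for w, h in shapes)
    return boxes
```

Under these assumptions `len(default_boxes())` equals 6 × (64² + 32² + 16² + 8² + 4² + 2²) + 5 = 32765, and the smallest scale, 0.01, corresponds to about 5 pixels on a 512-pixel block.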

(2) Ground-truth boxes and default boxes are matched using the Jaccard overlap strategy. First, for each ground-truth box, the default box with the largest Jaccard overlap is found and taken as its best match. Then, to simplify learning, the Jaccard overlap between every default box and the ground-truth boxes is calculated, and any default box whose overlap exceeds a threshold is considered to match that ground-truth box. The Jaccard overlap is calculated as shown in formula (3):

J(A, B) = |A ∩ B| / |A ∪ B|    (3)

where A and B denote the sets of all pixels in the default box and in the ground-truth box, respectively. In the present invention, the Jaccard overlap threshold is set to 0.5.
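The two-stage matching in item (2) can be sketched as below. Boxes are assumed to be axis-aligned rectangles given as (x_min, y_min, x_max, y_max), the function names are hypothetical, and assigning a default box that exceeds the threshold for several ground-truth boxes to the one with the highest overlap is an assumption not spelled out in the claim.

```python
def jaccard(a, b):
    """Jaccard overlap (intersection over union) of two rectangles."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def match_boxes(defaults, targets, threshold=0.5):
    """Return {default_index: target_index} for all matched default boxes."""
    matches = {}
    # Stage 1: every ground-truth box gets the default box that overlaps it most.
    for t, target in enumerate(targets):
        best = max(range(len(defaults)), key=lambda d: jaccard(defaults[d], target))
        matches[best] = t
    # Stage 2: any remaining default box above the threshold is matched as well.
    for d, default in enumerate(defaults):
        if d in matches:
            continue
        overlaps = [jaccard(default, target) for target in targets]
        best_t = max(range(len(targets)), key=lambda t: overlaps[t])
        if overlaps[best_t] > threshold:
            matches[d] = best_t
    return matches
```

Stage 1 guarantees that each ground-truth box has at least one positive default box even when no overlap reaches the threshold.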

(3) The objective function for model training is the total prediction error, obtained as the weighted sum of the classification error and the detection error, as shown in formula (4):

L(x, c, l, g) = (1/N) (L_conf(x, c) + α L_loc(x, l, g))    (4)

where L denotes the total error, the weight α is set to 1, and L_conf and L_loc denote the classification error and the detection error respectively, as given by formula (5) and formula (6),

where x denotes the match flag between each prediction box and the ground-truth boxes (x = 1 indicates a match, x = 0 no match); for example, x_ij^p = 1 indicates that the i-th default box matches the j-th ground-truth box of category p. c denotes the class prediction score (confidence), l denotes the prediction box, g denotes the ground-truth box, and N denotes the number of matched default boxes. If N = 0, the total error L is taken to be 0.

(4) Data augmentation

To increase the number of samples and to train a more robust model that adapts to inputs of various sizes and shapes, new image-block data is generated from each original image block using one of the following methods:

a. Use the original image block;

b. Sample the original image block so that the Jaccard overlap (i.e. intersection over union) between the sampled image block and the original image block is 0.1, 0.3, 0.5, 0.7 or 0.9;

c. Randomly sample the original image block.

The sampled image block is 0.1 to 1 times the size of the original image block, with an aspect ratio between 0.5 and 2. Finally, all image blocks are resized uniformly to 512 × 512, and horizontal flipping and distortion are applied;

(5) Negative sample mining

When the center of a ground-truth box falls inside the sampled image block, the part of the ground-truth box lying outside the image block is cropped off, and the cropped ground-truth box is marked as a positive sample in that image block; otherwise it is marked as a negative sample. After each iteration, the L_conf values of all default boxes are sorted, and the default boxes with the largest values are marked as negative samples, so that the ratio of positive to negative samples is kept at 1:3;
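The two rules of item (5) can be sketched as follows. This is an illustration under assumptions: boxes are (x_min, y_min, x_max, y_max), the per-box L_conf values are supplied as a plain list, and both function names are hypothetical.

```python
def crop_to_block(box, block):
    """Crop a ground-truth box to the sampled block if its center lies inside.

    Returns the cropped box, or None when the center falls outside the block.
    """
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    if not (block[0] <= cx <= block[2] and block[1] <= cy <= block[3]):
        return None
    return (max(box[0], block[0]), max(box[1], block[1]),
            min(box[2], block[2]), min(box[3], block[3]))


def select_negatives(conf_losses, is_positive, neg_per_pos=3):
    """Indices of the non-matched default boxes with the largest L_conf values.

    At most `neg_per_pos` negatives are kept per positive (ratio 1:3 by default).
    """
    num_pos = sum(1 for flag in is_positive if flag)
    candidates = [i for i, flag in enumerate(is_positive) if not flag]
    candidates.sort(key=lambda i: conf_losses[i], reverse=True)
    return candidates[:neg_per_pos * num_pos]
```

Keeping only the highest-loss negatives focuses training on the background regions the network currently confuses with signs, instead of on the many easy background boxes.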

Step 4: Complete detection and recognition with generalization ability. When the sliding window reaches the boundary of the image in the width or height direction, the remaining image block may be smaller than the window, and test images generally contain no signs at the edges; sliding therefore stops at that point and the incomplete block is discarded. In addition, a coarse-to-fine detection strategy is adopted: first, preliminary residual SSD detection is performed on all image blocks at the medium scale; then the results with confidence higher than 0.3 are mapped back to the original image; an image block at any of the other four scales is fed into residual SSD detection if it overlaps a preliminary prediction box (Jaccard overlap > 0), and is otherwise discarded; finally, the results of the two stages are combined and filtered on the original image using non-maximum suppression, completing detection and recognition.
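The block selection in the coarse-to-fine strategy can be sketched as below, assuming the preliminary predictions have already been mapped back to original-image coordinates as ((x_min, y_min, x_max, y_max), confidence) pairs. The function names are hypothetical.

```python
def shares_area(a, b):
    """True when two rectangles share positive area, i.e. Jaccard overlap > 0."""
    return (min(a[2], b[2]) > max(a[0], b[0])
            and min(a[3], b[3]) > max(a[1], b[1]))


def blocks_to_refine(preliminary, candidate_blocks, min_confidence=0.3):
    """Blocks at the other scales that overlap a confident preliminary prediction."""
    confident = [box for box, conf in preliminary if conf > min_confidence]
    return [block for block in candidate_blocks
            if any(shares_area(block, box) for box in confident)]
```

Only the returned blocks would be passed to the detector in the second stage, which avoids running the network on regions where the first pass found nothing.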

2. The traffic sign detection and recognition method based on a residual SSD model according to claim 1, characterized in that the non-maximum suppression proceeds as follows: all prediction boxes are sorted by score from high to low; the Jaccard overlap between the highest-scoring prediction box and each of the other prediction boxes is evaluated, and any other prediction box whose overlap exceeds the threshold is discarded; then the Jaccard overlap between the second-highest-scoring prediction box and the remaining prediction boxes is evaluated, and those exceeding the threshold are discarded; the third-highest-scoring prediction box is then processed in the same way, and so on until all prediction boxes have been traversed.
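The procedure of claim 2 corresponds to greedy non-maximum suppression, which can be sketched as follows. The claim does not state the threshold value, so it is left as a required argument; the function names are hypothetical and boxes are assumed to be (x_min, y_min, x_max, y_max).

```python
def jaccard(a, b):
    """Jaccard overlap (intersection over union) of two rectangles."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, threshold):
    """Indices of the prediction boxes that survive, in descending score order."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    while order:
        best = order.pop(0)          # highest-scoring box still in play
        kept.append(best)
        # Discard every remaining box that overlaps it by more than the threshold.
        order = [i for i in order
                 if jaccard(boxes[best], boxes[i]) <= threshold]
    return kept
```

For example, two boxes with overlap 81/119 ≈ 0.68 collapse to the higher-scoring one at a threshold of 0.5 but are both kept at a threshold of 0.7.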