CN108960198A - Traffic sign detection and recognition method based on a residual SSD model

Publication number
CN108960198A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810850416.0A
Other languages
Chinese (zh)
Inventor
张淑芳
朱彤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN201810850416.0A
Publication of CN108960198A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/582 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of traffic signs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

The invention discloses a traffic sign detection and recognition method based on a residual SSD model. Step 1: partition the image into blocks at multiple scales. Step 2: build a residual SSD model that uses the residual network ResNet101 as the base network of SSD. Step 3: train the network. Step 4: carry out detection and recognition with generalization ability. The invention aims to improve the accuracy of the existing SSD network on small objects and to detect and recognize multiple classes of signs of different sizes in real Chinese traffic scenes.

Description

Traffic sign detection and recognition method based on a residual SSD model
Technical field
The invention relates to image detection and recognition and to applied deep learning, and in particular to a method for detecting and recognizing traffic signs in images.
Background
Traffic sign detection and recognition is a key technology of advanced driver assistance systems (ADAS) and contributes substantially to road-condition warning and collision avoidance. Traditional machine learning methods generally segment regions of interest, extract image features, and recognize targets with one or several classifiers. The features they rely on fall into three groups: shallow image information such as color and shape; visual saliency information such as feature maps that fuse color, brightness, and orientation; and local invariant features such as histograms of gradients. Although such classifiers are fast, their feature representation is one-sided and narrow and does not generalize. When the background is complex, distractors are numerous, or the sign itself is distorted or damaged, both the false detection rate and the miss rate are high.
The emergence of deep learning, and of deep convolutional neural networks (DCNNs) in particular, removed the limits on semantic image representation and greatly improved recognition accuracy through rich feature representations and matching and evaluation strategies. R-CNN (region-based convolutional neural network) uses region proposals to localize and segment objects bottom-up and combines them with CNN classification and regression to achieve accurate detection and recognition; it was the first to pair region proposals with convolutional neural networks (CNNs). Fast R-CNN adds an ROI pooling layer after the last convolutional layer of R-CNN and moves bounding-box regression into CNN training, which speeds up detection. Faster R-CNN replaces Selective Search with a fully convolutional region proposal network (RPN) to generate proposal windows; the RPN shares weights with the networks before and after it, forming a high-performance end-to-end framework. R-FCN (region-based fully convolutional network) introduces a position-sensitive ROI pooling layer and attains high accuracy in both classification and localization. Proposal-based networks are accurate, but their speed is unsatisfactory. Network structures that do not depend on region proposals have therefore appeared. YOLO merges box extraction and recognition into one step: it takes the whole image as input and regresses box positions and classes directly at the output layer. SSD improves on YOLO's direct grid division by recognizing objects on feature maps at different levels, which yields multi-scale detection. Pixel-level segmentation and recognition networks such as SegNet and DeepLab have also been proposed, but they are not yet widely used because their segmentation accuracy is not high.
In parallel, traffic sign datasets from countries such as Germany and Belgium have been released, which has driven rapid progress in traffic sign detection and recognition. On the German Traffic Sign Detection Benchmark (GTSDB) and the German Traffic Sign Recognition Benchmark (GTSRB) in particular, published work has reached 100% detection accuracy and 99.67% classification accuracy. However, traffic signs occupy too large a share of these images, so much background information is lost, and traffic in China is often more complex than abroad. These studies therefore do not meet the practical detection needs of Chinese traffic scenes.
Summary of the invention
Applying deep learning to traffic sign detection, the invention proposes a traffic sign detection and recognition method based on a residual SSD model. It adapts the SSD network to small-object detection in order to detect and recognize multiple classes of signs of different sizes in real Chinese traffic scenes.
The traffic sign detection and recognition method based on a residual SSD model of the invention comprises the following steps:
Step 1: partition the image into blocks at multiple scales, that is, divide the image to be detected into several groups of image blocks with a sliding window. The label information of the image (the four coordinates of each sign's bounding box and the sign class) is decomposed to the block level accordingly. Each image block is searched: if 50% or more of the area of some sign's bounding box falls on the block, the block is considered to contain that sign and is kept; otherwise it is discarded. All blocks that contain signs are fed to the network, together with a randomly selected subset of blocks without signs, so that the network learns to recognize background regions;
Step 2: build the residual SSD model, that is, use the residual network ResNet101 as the base network of SSD. The resulting model consists of the original convolutional layer conv1, the residual stages Res2x to Res5x, and the newly added auxiliary convolutional layers Conv2 to Conv5. Its features are as follows:
(1) Four groups of convolutional layers, Conv2 to Conv5, are appended after Res5c, the top layer of ResNet101, to provide richer feature maps. Each group contains 256 1 × 1 convolution filters and 512 3 × 3 convolution filters, which enables detection on multi-scale feature maps;
(2) Default boxes whose overlap with an object bounding box in the original image exceeds 0.5 are used as the network's predicted target regions. For each such region the network produces a Softmax score for every class and an offset relative to the default box. The Softmax is computed as in formula (1), where $c_i$ is the predicted value for the i-th sign class (the classes and their names are shown in Fig. 1) and the sum runs over all classes:

$$\mathrm{Softmax}(c_i) = \frac{e^{c_i}}{\sum_{j} e^{c_j}} \tag{1}$$

After the Softmax, the predicted values of all classes lie in (0, 1) and serve as the per-class scores;
(3) The default boxes are associated with each feature-map cell of the upper network layers (Res3b3, Res5c, Pool6, and the newly added Conv2, Conv3, Conv4, and Conv5). The default boxes tile each feature map in a convolutional manner, so the position of each box instance relative to its cell is fixed. Default boxes with different aspect ratios discretize the space of output box shapes effectively and improve both the likelihood and the speed of matching;
Step 3: train the network. The weights of the residual network are first initialized from a Gaussian distribution, and the network is then trained with the following learning strategy:
(1) Predictions are made jointly on the feature maps of the 7 scale layers described in Step 2, which realizes multi-scale detection and recognition. The default-box scale on each feature map from Res5c upward is given by formula (2):

$$s_k = s_{\min} + \frac{s_{\max} - s_{\min}}{M - 1}(k - 1), \quad k \in [1, M] \tag{2}$$

where $s_{\min}$ and $s_{\max}$ are the default-box scales at Res5c and at the top layer Pool6, set to 0.04 and 0.49, $M$ is the number of layers from Res5c to Pool6, and the scales of the intermediate layers are equally spaced within this range. The scale of the Res3b3 default boxes is set separately to 0.01. The aspect ratios of all 7 scale layers are set to $a \in \{1, 2, 3, \tfrac{1}{2}, \tfrac{1}{3}\}$, and the width and height of the corresponding default box follow from the scale as:

$$w = s_k \sqrt{a}, \qquad h = \frac{s_k}{\sqrt{a}}$$

where w and h are the width and height of the default box.
On every layer except the last, one more default box with a = 1 and scale $s'_k = \sqrt{s_k s_{k+1}}$ is added. The center of each default box is set to $\left(\frac{i + 0.5}{L_k}, \frac{j + 0.5}{L_k}\right)$, where $L_k$ is the side length of the square feature map at layer k and $i, j \in [0, L_k)$. The last feature map therefore carries 5 default boxes and every other feature map carries $6L_k^2$, for a total of 32765 default boxes. For a 512 × 512 image block these default boxes can detect signs whose size lies in the range (5, 273), which matches the size range of the signs in our dataset.
(2) Target boxes and default boxes are matched with the Jaccard overlap strategy. First, for each target box, the default box with the largest Jaccard overlap is taken as its best match. Then, to simplify learning, the Jaccard overlap between every default box and the target boxes is computed, and any default box whose overlap exceeds a threshold is also considered matched to the target box. The Jaccard overlap is computed as in formula (3):

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} \tag{3}$$

where A and B are the sets of pixels inside the default box and the target box. In the invention the Jaccard overlap threshold is set to 0.5.
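(3) The objective function of training is the total prediction error, which is a weighted sum of the classification error and the detection (localization) error, as in formula (4):

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right) \tag{4}$$

where L is the total error, the weight α is set to 1, and $L_{conf}$ and $L_{loc}$ are the classification error and the detection error, given by formulas (5) and (6):

$$L_{conf}(x, c) = -\sum_{i \in Pos} x_{ij}^{p} \log \hat{c}_i^{p} - \sum_{i \in Neg} \log \hat{c}_i^{0}, \qquad \hat{c}_i^{p} = \frac{e^{c_i^{p}}}{\sum_{q} e^{c_i^{q}}} \tag{5}$$

$$L_{loc}(x, l, g) = \sum_{i \in Pos} \sum_{t \in \{cx, cy, w, h\}} x_{ij}^{p}\, \mathrm{smooth}_{L1}\left(l_i^{t} - \hat{g}_j^{t}\right) \tag{6}$$

where x is the match flag between each prediction box and a target box (x = 1 means matched, x = 0 means not matched); for example, $x_{ij}^{p} = 1$ means that the i-th default box is matched to the j-th target box of class p. c is the class prediction score (confidence), l is the prediction box, g is the target box ($\hat{g}$ is the target box expressed as offsets relative to the default box), Pos and Neg are the sets of positive and negative default boxes, class 0 is the background, and N is the number of matched default boxes. If N = 0, the total error L is taken to be 0.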
(4) Data augmentation
To increase the number of samples and to train a more robust model that adapts to inputs of various sizes and shapes, new image block data is generated from each original image block by one of the following methods:
A. use the original image block;
B. sample the original image block so that the Jaccard overlap (intersection over union) between the sampled block and the original block is 0.1, 0.3, 0.5, 0.7, or 0.9;
C. sample the original image block at random.
Each sampled block is 0.1 to 1 times the size of the original block, with an aspect ratio between 0.5 and 2. Finally, all image blocks are resized to 512 × 512, and horizontal flipping and distortion are applied;
(5) Hard negative mining
When the center of a target box falls inside the sampled image block, the part of the target box outside the block is cropped away and the cropped target box is marked as a positive sample in that block; otherwise it is marked as a negative sample. After each iteration, the $L_{conf}$ values of all default boxes are sorted and the default boxes with the largest values are marked as negative samples, so that the ratio of positive to negative samples stays at 1:3;
Step 4: carry out detection and recognition with generalization ability. When the sliding window reaches the boundary of the image in the width or height direction, the remaining block may be smaller than the window, and test images generally contain no signs at the edges. The window therefore stops sliding and the incomplete block is discarded. In addition, a coarse-to-fine detection strategy is used. First, all image blocks at the medium scale go through a preliminary residual SSD detection. The results with confidence above 0.3 are then mapped back to the original image. An image block at any of the other four scales is fed to the residual SSD detector only if it overlaps a preliminary prediction box (Jaccard overlap > 0); otherwise it is discarded. The results of these two passes are merged and filtered on the original image with non-maximum suppression, which completes detection and recognition.
The invention aims to improve the accuracy of the existing SSD network on small objects and to detect and recognize multiple classes of signs of different sizes in real Chinese traffic scenes.
Brief description of the drawings
Fig. 1 shows the 45 sign classes of the invention and their class names;
Fig. 2 is the overall flowchart of the traffic sign detection and recognition method based on a residual SSD model;
Fig. 3 is a schematic diagram of the multi-scale blocking embodiment;
Fig. 4 is a diagram of the residual SSD model;
Fig. 5 is the flowchart of coarse-to-fine detection;
Fig. 6 compares the detection performance of the invention and of other methods on signs of different sizes;
Fig. 7 shows detection results of the model on low-resolution images: (7a) small signs; (7b) medium signs; (7c) large signs.
Detailed description of the embodiments
Embodiments of the invention are described in further detail below with reference to the drawings.
The embodiment uses Tsinghua-Tencent 100K, the traffic sign dataset recently released by the Tsinghua University and Tencent joint laboratory, as its subject. In this dataset the image resolution is 2048 × 2048, each sign occupies only a very small part of the image (for example 1%), and one image may contain several signs of different sizes. Because GPU memory is limited, the proposed method first decomposes each large image into multi-scale image blocks for detection and then maps the block-level detections back to the original image. A residual SSD structure is built and a network model is trained on one part of the image data. The model is tested on the other part to verify its generalization, and the results are refined with the coarse-to-fine strategy and non-maximum suppression. The implementation steps are as follows:
Step 1: partition the image into blocks at multiple scales
As shown in the embodiment of Fig. 3, a sliding window produces five groups of image blocks of sizes 256 × 256, 320 × 320, 384 × 384, 448 × 448, and 512 × 512, with strides of 64, 96, 128, 160, and 256 respectively. A small sign occupies a larger share of a small window, so small windows slide with smaller strides to collect more small-sign instances. The label information of the image (the four coordinates of each sign's bounding box and the sign class) is decomposed to the block level accordingly: all image blocks are resized to 512 × 512, the four bounding-box coordinates of each sign in a block are mapped into [0, 512], and each class label stays paired with its bounding box. A uniform block size also suits later batch training and speeds up convergence. The sliding window generates a very large number of blocks, most of which contain only background, so they are not all suitable as training input. Because the residual SSD network already applies data augmentation to its input and uses an effective hard negative mining method, the invention feeds all blocks that contain signs to the network and randomly selects a subset of blocks without signs (the ratio of sign blocks to background blocks is about 1:2) so that the network learns to recognize background regions. Whether a block contains a sign is decided as follows: each block is searched, and if 50% or more of the area of some sign's bounding box falls on the block, the block is considered to contain that sign and is kept; otherwise it is discarded.
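The blocking rule of Step 1 can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, boxes are assumed to be `(x_min, y_min, x_max, y_max)` tuples in pixels, and clipping the sign box to the block before rescaling is an assumption.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

# (window size, stride) pairs as listed in the text.
WINDOWS = [(256, 64), (320, 96), (384, 128), (448, 160), (512, 256)]


def sliding_blocks(image_side: int, window: int, stride: int) -> List[Box]:
    """All complete square blocks; incomplete blocks at the border are skipped."""
    starts = range(0, image_side - window + 1, stride)
    return [(x, y, x + window, y + window) for y in starts for x in starts]


def covered_fraction(sign: Box, block: Box) -> float:
    """Fraction of the sign's bounding-box area that falls on the block."""
    w = min(sign[2], block[2]) - max(sign[0], block[0])
    h = min(sign[3], block[3]) - max(sign[1], block[1])
    if w <= 0 or h <= 0:
        return 0.0
    return (w * h) / ((sign[2] - sign[0]) * (sign[3] - sign[1]))


def contains_sign(sign: Box, block: Box) -> bool:
    """The 50% rule: keep the block if at least half of the sign lies on it."""
    return covered_fraction(sign, block) >= 0.5


def to_block_coords(sign: Box, block: Box, out_side: int = 512) -> Box:
    """Clip the sign box to the block (assumed) and rescale into [0, out_side]."""
    scale = out_side / (block[2] - block[0])
    x0 = (max(sign[0], block[0]) - block[0]) * scale
    y0 = (max(sign[1], block[1]) - block[1]) * scale
    x1 = (min(sign[2], block[2]) - block[0]) * scale
    y1 = (min(sign[3], block[3]) - block[1]) * scale
    return (x0, y0, x1, y1)
```

For a 2048 × 2048 image, `sliding_blocks(2048, 512, 256)` yields a 7 × 7 grid of blocks, while the 256-pixel window with stride 64 yields a much denser 29 × 29 grid, which is how the smaller windows collect more small-sign instances.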
Step 2: build the residual SSD model
The invention replaces the original VGGNet with the residual network ResNet101 as the base network of SSD, which gives the residual SSD model. The residual SSD model is a feed-forward convolutional neural network made of a base feature-extraction network followed by several auxiliary convolutional layers. It matches objects on different feature maps with a discrete set of default boxes of fixed sizes, computes the score of each matched target for every class, and keeps adjusting the boxes to approximate the object bounding boxes more closely. Predictions combine the outputs of the last few layers of the base network and of the auxiliary layers. As shown in Fig. 4, the residual SSD model consists of the original convolutional layer conv1, the residual stages Res2x to Res5x, and the newly added auxiliary convolutional layers Conv2 to Conv5. Its features are as follows:
1) The image is partitioned into blocks at multiple scales, that is, the image to be detected is divided into several groups of image blocks with a sliding window. The label information of the image (the four coordinates of each sign's bounding box and the sign class) is decomposed to the block level accordingly. Each image block is searched: if 50% or more of the area of some sign's bounding box falls on the block, the block is considered to contain that sign and is kept; otherwise it is discarded. All blocks that contain signs are fed to the network, together with a randomly selected subset of blocks without signs, so that the network learns to recognize background regions;
2) Default boxes whose overlap with an object bounding box in the original image exceeds 0.5 are used as the network's predicted target regions. For each such region the network produces a Softmax score for every class and an offset relative to the default box. The Softmax is computed as in formula (1), where $c_i$ is the predicted value for the i-th sign class (the classes and their names are shown in Fig. 1) and the sum runs over all classes:

$$\mathrm{Softmax}(c_i) = \frac{e^{c_i}}{\sum_{j} e^{c_j}} \tag{1}$$

After the Softmax, the predicted values of all classes lie in (0, 1) and can be regarded as the per-class scores.
3) The default boxes are associated with each feature-map cell of the upper network layers (Res3b3, Res5c, Pool6, and the newly added Conv2, Conv3, Conv4, and Conv5). The default boxes tile each feature map in a convolutional manner, so the position of each box instance relative to its cell is fixed. For every feature-map cell, the network computes the offsets relative to each box instance and the scores of all classes. Specifically, each cell yields c class scores and 4 offsets (Δ(cx, cy, w, h)) for each of its k default boxes, so each feature map needs (c + 4) × k filters and an m × n feature map produces (c + 4) × k × m × n outputs. Default boxes with different aspect ratios discretize the space of output box shapes effectively and improve both the likelihood and the speed of matching.
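As a rough illustration of the two quantities in items 2) and 3), the sketch below computes Softmax scores for one default box and counts the outputs of one prediction layer. The function names and the example numbers are hypothetical and are not taken from the patent.

```python
import math
from typing import List


def softmax(values: List[float]) -> List[float]:
    """Map raw class predictions to scores that lie in (0, 1) and sum to 1."""
    shift = max(values)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def outputs_per_feature_map(c: int, k: int, m: int, n: int) -> int:
    """(c + 4) * k filters, each applied at every one of the m * n cells."""
    return (c + 4) * k * m * n
```

For example, a hypothetical layer with c = 46 scores per box, k = 6 default boxes per cell, and a 2 × 2 feature map produces (46 + 4) × 6 × 2 × 2 = 1200 outputs.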
Step 3: train the network. The weights of the residual network are first initialized from a Gaussian distribution, and the network is then trained with the following learning strategy:
1) Predictions are made jointly on the feature maps of the 7 scale layers described in Step 2, which realizes multi-scale detection and recognition. The default-box scale on each feature map from Res5c upward is given by formula (2):

$$s_k = s_{\min} + \frac{s_{\max} - s_{\min}}{M - 1}(k - 1), \quad k \in [1, M] \tag{2}$$

where $s_{\min}$ and $s_{\max}$ are the default-box scales at Res5c and at the top layer Pool6, set to 0.04 and 0.49, $M$ is the number of layers from Res5c to Pool6, and the scales of the intermediate layers are equally spaced within this range. The scale of the Res3b3 default boxes is set separately to 0.01. In addition, so that the shapes of the default boxes fit the target boxes better, different aspect ratios a are needed. The aspect ratios of all 7 scale layers are set to $a \in \{1, 2, 3, \tfrac{1}{2}, \tfrac{1}{3}\}$, and the width and height of the corresponding default box follow from the scale as:

$$w = s_k \sqrt{a}, \qquad h = \frac{s_k}{\sqrt{a}}$$

where w and h are the width and height of the default box.
On every layer except the last, one more default box with a = 1 and scale $s'_k = \sqrt{s_k s_{k+1}}$ is added. The center of each default box is set to $\left(\frac{i + 0.5}{L_k}, \frac{j + 0.5}{L_k}\right)$, where $L_k$ is the side length of the square feature map at layer k and $i, j \in [0, L_k)$. The last feature map therefore carries 5 default boxes and every other feature map carries $6L_k^2$, for a total of 32765 default boxes. For a 512 × 512 image block these default boxes can detect signs whose size lies in the range (5, 273), which matches the size range of the signs in our dataset.
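The default-box bookkeeping can be checked with a short Python sketch. Two things here are assumptions rather than statements from the text: the side lengths of the seven square feature maps (64 down to 1), chosen because they reproduce the stated total, and the SSD-style aspect-ratio set {1, 2, 3, 1/2, 1/3}. All names are hypothetical.

```python
import math
from typing import List, Optional, Tuple

# Assumptions (not stated explicitly in the text): square feature maps with
# these side lengths, and the usual SSD aspect-ratio set.
FEATURE_MAP_SIDES = [64, 32, 16, 8, 4, 2, 1]  # Res3b3 ... Pool6
ASPECT_RATIOS = [1.0, 2.0, 3.0, 1 / 2, 1 / 3]


def layer_scales(s_min: float = 0.04, s_max: float = 0.49,
                 layers: int = 6, first: float = 0.01) -> List[float]:
    """Scale 0.01 for Res3b3, then equally spaced scales from Res5c to Pool6."""
    step = (s_max - s_min) / (layers - 1)
    return [first] + [s_min + step * k for k in range(layers)]


def box_shapes(s_k: float, s_next: Optional[float]) -> List[Tuple[float, float]]:
    """Relative (width, height) of the default boxes attached to one cell."""
    shapes = [(s_k * math.sqrt(a), s_k / math.sqrt(a)) for a in ASPECT_RATIOS]
    if s_next is not None:  # extra a = 1 box on every layer except the last
        extra = math.sqrt(s_k * s_next)
        shapes.append((extra, extra))
    return shapes


def total_default_boxes(sides: List[int]) -> int:
    """Sum of (cells per layer) * (default boxes per cell) over all layers."""
    scales = layer_scales()
    total = 0
    for idx, side in enumerate(sides):
        s_next = scales[idx + 1] if idx + 1 < len(scales) else None
        total += side * side * len(box_shapes(scales[idx], s_next))
    return total
```

With the assumed sizes, `total_default_boxes(FEATURE_MAP_SIDES)` evaluates to 6 × (64² + 32² + 16² + 8² + 4² + 2²) + 5 = 32765, the total given in the text, and the smallest scale corresponds to about 0.01 × 512 ≈ 5 pixels.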
2) Target boxes and default boxes are matched with the Jaccard overlap strategy. First, for each target box, the default box with the largest Jaccard overlap is taken as its best match. Then, to simplify learning, the Jaccard overlap between every default box and the target boxes is computed, and any default box whose overlap exceeds a threshold is also considered matched to the target box. The Jaccard overlap is computed as in formula (3):

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} \tag{3}$$

where A and B are the sets of pixels inside the default box and the target box. In the invention the Jaccard overlap threshold is set to 0.5.
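The two-stage matching described above might look as follows in Python. This is a sketch under stated assumptions: boxes are axis-aligned `(x_min, y_min, x_max, y_max)` tuples, the function names are hypothetical, and letting the best-match stage override the threshold stage is a design choice of this sketch.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def jaccard(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def match(defaults: List[Box], targets: List[Box],
          threshold: float = 0.5) -> Dict[int, int]:
    """Return {default index: target index} for all matched default boxes."""
    matches: Dict[int, int] = {}
    if not defaults or not targets:
        return matches
    # Threshold stage: a default box matches its best target if overlap > threshold.
    for d, dbox in enumerate(defaults):
        best_t = max(range(len(targets)), key=lambda t: jaccard(dbox, targets[t]))
        if jaccard(dbox, targets[best_t]) > threshold:
            matches[d] = best_t
    # Best-match stage: every target keeps the default box that overlaps it most,
    # even if that overlap is below the threshold.
    for t, tbox in enumerate(targets):
        best_d = max(range(len(defaults)), key=lambda d: jaccard(defaults[d], tbox))
        matches[best_d] = t
    return matches
```

The best-match stage guarantees that every target box has at least one positive default box, while the threshold stage lets several default boxes share one target.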
3) Objective function
The objective function of training is the total prediction error; training is optimized by reducing it. The total prediction error is a weighted sum of the classification error and the detection (localization) error, as in formula (4):

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right) \tag{4}$$

where L is the total error, the weight α is set to 1, and $L_{conf}$ and $L_{loc}$ are the classification error and the detection error, given by formulas (5) and (6):

$$L_{conf}(x, c) = -\sum_{i \in Pos} x_{ij}^{p} \log \hat{c}_i^{p} - \sum_{i \in Neg} \log \hat{c}_i^{0}, \qquad \hat{c}_i^{p} = \frac{e^{c_i^{p}}}{\sum_{q} e^{c_i^{q}}} \tag{5}$$

$$L_{loc}(x, l, g) = \sum_{i \in Pos} \sum_{t \in \{cx, cy, w, h\}} x_{ij}^{p}\, \mathrm{smooth}_{L1}\left(l_i^{t} - \hat{g}_j^{t}\right) \tag{6}$$

where x is the match flag between each prediction box and a target box (x = 1 means matched, x = 0 means not matched); for example, $x_{ij}^{p} = 1$ means that the i-th default box is matched to the j-th target box of class p. c is the class prediction score (confidence), l is the prediction box, g is the target box ($\hat{g}$ is the target box expressed as offsets relative to the default box), Pos and Neg are the sets of positive and negative default boxes, class 0 is the background, and N is the number of matched default boxes. If N = 0, the total error L is taken to be 0.
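To make the weighted sum concrete, here is a small Python sketch of the total error for one image block. It assumes the SSD-style form of the two terms, that is, Softmax cross-entropy for classification and smooth L1 for the box offsets; the data layout and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Offsets = Sequence[float]  # (cx, cy, w, h) offsets relative to a default box


def smooth_l1(x: float) -> float:
    """Quadratic near zero, linear elsewhere."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5


def total_error(positives: List[Tuple[float, Offsets, Offsets]],
                negative_bg_probs: List[float], alpha: float = 1.0) -> float:
    """positives: (Softmax score of the true class, predicted offsets, target
    offsets) for each matched default box. negative_bg_probs: Softmax score of
    the background class for each selected negative default box."""
    n = len(positives)
    if n == 0:  # no matched default box: the total error is defined as 0
        return 0.0
    l_conf = -sum(math.log(p) for p, _, _ in positives)
    l_conf -= sum(math.log(p) for p in negative_bg_probs)
    l_loc = sum(smooth_l1(l - g)
                for _, pred, target in positives
                for l, g in zip(pred, target))
    return (l_conf + alpha * l_loc) / n
```

A perfectly classified, perfectly localized positive contributes nothing, and the error grows as the true-class score drops or the predicted offsets drift from the target offsets.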
4) Data augmentation
To increase the number of samples and to train a more robust model that adapts to inputs of various sizes and shapes, new image block data is generated from each original image block by one of the following methods:
A. use the original image block;
B. sample the original image block so that the Jaccard overlap (intersection over union) between the sampled block and the original block is 0.1, 0.3, 0.5, 0.7, or 0.9;
C. sample the original image block at random.
Each sampled block is 0.1 to 1 times the size of the original block, with an aspect ratio between 0.5 and 2. Finally, all image blocks are resized to 512 × 512, and horizontal flipping and distortion are applied.
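One possible reading of sampling methods B and C is sketched below in Python. Several points are assumptions: the listed overlap values are treated as lower bounds, "size" is read as side length, the patch is expressed in unit coordinates of the original block, and the reference box is supplied by the caller. The function names are hypothetical.

```python
import math
import random
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
MIN_OVERLAPS = [0.1, 0.3, 0.5, 0.7, 0.9]  # values listed for method B


def jaccard(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def random_patch(rng: random.Random) -> Box:
    """Method C: size factor 0.1 to 1 and aspect ratio 0.5 to 2, inside the block."""
    while True:
        scale = rng.uniform(0.1, 1.0)
        ratio = rng.uniform(0.5, 2.0)
        w, h = scale * math.sqrt(ratio), scale / math.sqrt(ratio)
        if w <= 1.0 and h <= 1.0:  # resample until the patch fits the block
            break
    x0 = rng.uniform(0.0, 1.0 - w)
    y0 = rng.uniform(0.0, 1.0 - h)
    return (x0, y0, x0 + w, y0 + h)


def sample_patch(reference: Box, min_overlap: float, rng: random.Random,
                 max_tries: int = 50) -> Optional[Box]:
    """Method B: a random patch whose overlap with `reference` is at least
    `min_overlap`; None if no such patch is found within max_tries."""
    for _ in range(max_tries):
        patch = random_patch(rng)
        if jaccard(patch, reference) >= min_overlap:
            return patch
    return None  # the caller may fall back to the original block (method A)
```

Returning `None` after a bounded number of tries keeps the sampler from looping forever when a high overlap such as 0.9 is hard to reach.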
5) Hard negative mining
The balance between positive and negative samples is essential to model stability. After each iteration, the $L_{conf}$ values of all default boxes are therefore sorted, and the default boxes with the largest values are marked as negative samples so that the ratio of positive to negative samples stays at 1:3.
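The selection rule can be written in a few lines of Python. This is a minimal sketch with hypothetical names; it assumes one confidence-loss value and one positive/negative flag per default box.

```python
from typing import List


def select_hard_negatives(conf_losses: List[float], is_positive: List[bool],
                          neg_per_pos: int = 3) -> List[int]:
    """Indices of the unmatched default boxes with the largest confidence loss,
    capped at neg_per_pos negatives for every positive box."""
    num_pos = sum(is_positive)
    negatives = [i for i, pos in enumerate(is_positive) if not pos]
    negatives.sort(key=lambda i: conf_losses[i], reverse=True)
    return negatives[:neg_per_pos * num_pos]
```

With one positive box, only the three unmatched boxes with the highest loss are kept as negatives; the remaining background boxes are ignored for that iteration.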
Step 4: carry out detection and recognition with generalization ability. When the sliding window reaches the boundary of the image in the width or height direction, the remaining block may be smaller than the window, and test images generally contain no signs at the edges. The window therefore stops sliding and the incomplete block is discarded. In addition, a coarse-to-fine detection strategy is used. First, all image blocks at the medium scale go through a preliminary residual SSD detection. The results with confidence above 0.3 are then mapped back to the original image. An image block at any of the other four scales is fed to the residual SSD detector only if it overlaps a preliminary prediction box (Jaccard overlap > 0); otherwise it is discarded. The results of these two passes are merged and filtered on the original image with non-maximum suppression, which completes detection and recognition. Non-maximum suppression proceeds as follows: all prediction boxes are sorted by score from high to low; the Jaccard overlap between the highest-scoring box and every other box is computed, and any box whose overlap exceeds a threshold (0.5 in the invention) is discarded; the same is then done for the box with the second-highest score, then the third-highest, and so on until all prediction boxes have been traversed.
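The non-maximum suppression procedure described above can be sketched in Python as follows. The function names are hypothetical, and boxes are assumed to be `(x_min, y_min, x_max, y_max)` tuples in original-image coordinates.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def jaccard(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def non_max_suppression(boxes: List[Box], scores: List[float],
                        threshold: float = 0.5) -> List[int]:
    """Indices of the boxes that survive, in order of decreasing score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)  # highest-scoring box that is still alive
        keep.append(best)
        # Discard every remaining box that overlaps the kept box too much.
        order = [i for i in order
                 if jaccard(boxes[best], boxes[i]) <= threshold]
    return keep
```

Because the merged results come from blocks at several scales, the same sign is often predicted more than once; this step keeps only the highest-scoring box of each overlapping group.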
The embodiment is described as follows:
1. Data preprocessing
The ratio of training to test images in the Tsinghua-Tencent 100K dataset is roughly 2:1. The 45 sign classes that occur more than 100 times are chosen as the targets of detection and recognition, with the bounding-box coordinates and the class as the labels.
2. Experimental environment
Training and testing both run on a Linux PC with an Intel i7-7700K CPU, 32 GB of memory, and two NVIDIA GeForce GTX 1080 Ti GPUs with 11 GB of video memory each.
3. Learning parameters
The initial learning rate is 0.001 and drops to 0.0001 after 40000 iterations; training then continues at 0.0001 for another 40000 iterations and stops. The momentum is 0.9 and the weight decay is 0.0005. The experiments run on the Caffe framework, and the model is tested once every 20000 training iterations.
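The schedule above is a simple two-stage step function. The Python sketch below restates it with hypothetical names; it is not the Caffe solver configuration used in the experiments.

```python
TOTAL_ITERATIONS = 80000  # 40000 at the initial rate plus 40000 at the reduced rate
TEST_INTERVAL = 20000     # the model is tested once every 20000 iterations


def learning_rate(iteration: int) -> float:
    """Learning rate used at a given zero-based training iteration."""
    if not 0 <= iteration < TOTAL_ITERATIONS:
        raise ValueError("iteration outside the training schedule")
    return 0.001 if iteration < 40000 else 0.0001


def is_test_iteration(iteration: int) -> bool:
    """True after every TEST_INTERVAL completed training iterations."""
    return (iteration + 1) % TEST_INTERVAL == 0
```

Under this reading the model is evaluated four times during training, after 20000, 40000, 60000, and 80000 iterations.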
4. Test results and comparison
Signs are divided by size into three groups: small (area ≤ 32²), medium (32² < area ≤ 96²), and large (96² < area ≤ 400²). Coarse-to-fine detection is run on the test set. Using the Microsoft COCO evaluation metrics, all results with confidence of at least 0.01 are collected and compared with the results of Zhu et al. and Meng et al., as shown in Fig. 6, where (6a), (6b), and (6c) are the recall-precision curves of the proposed method and the two other methods for small, medium, and large signs. The proposed method achieves better detection results for signs of every size. Collecting all results with confidence of at least 0.5 gives the overall accuracy and recall of the proposed method and the two other methods, which are compared in Table 1.
Table 1. Comparison of overall accuracy and recall rate
Method | Overall accuracy | Overall recall rate
Zhu | 88% | 91%
Meng | 90% | 93%
The present invention | 94% | 95%
Table 1 shows that the overall performance of the present method is also clearly better than that of the other two methods. Compared with the region proposal network used by Zhu, the preliminary detection of the present invention uses the same network as the subsequent fine detection, which avoids additional overhead and extracts regions of interest more effectively. Compared with the image down-sampling and blocking approach used by Meng, the present method partitions the original image at several scales and reduces the search stride, so that signs occupy a larger share of each image block; combined with the ResNet network, this yields richer features. Compared with the models of Zhu and Meng, which detect and localize on a single network layer, the residual SSD model used by the present method matches features on multiple network layers, adapts to detection targets of various sizes, and is more robust.
In addition, the 512 × 512 pixel image blocks obtained by resizing in the present method reflect the characteristics of most real traffic scenes well, so the network model of the present invention can also be used for detection and recognition in low-resolution traffic scene images. To demonstrate this, a supplementary experiment is run on all image blocks in the test images that contain traffic signs. The test image blocks are divided according to their size before resizing into five kinds: 256 × 256, 320 × 320, 384 × 384, 448 × 448, and 512 × 512. For each kind, signs in three groups, small (area ≤ 50²), medium (50² < area ≤ 100²), and large (100² < area ≤ 512²), are detected, with the results shown in Figure 7. Note that the area here is the size of the sign in the 512 × 512 image block, not in the original image.
As Figure 7 shows, the residual SSD model used by the present method is equally effective on low-resolution traffic images and achieves good accuracy and recall for traffic signs of various sizes. Compared with Meng's use of 200 × 200 image blocks, the multi-scale partitioning of the present method is more widely applicable.

Claims (2)

1. A road traffic sign detection and recognition method based on a residual SSD model, characterized in that the method specifically comprises the following steps:
Step 1: Partition the image into blocks at multiple scales, i.e. use a sliding window to divide the image to be detected into multiple groups of image blocks. Correspondingly, the label information of the image, including the four coordinate values of each sign's bounding box and the class of the sign, is also decomposed to the image-block level. Each image block is examined: when 50% or more of the area of a sign's bounding box falls on the image block, the block is considered to contain that sign and is retained; otherwise it is discarded. All image blocks containing signs are fed to the network, and a randomly selected portion of the image blocks containing no signs is also used in training so that the network learns to recognize background regions;
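The blocking and retention rule of Step 1 can be sketched as follows. This is an illustrative sketch under stated assumptions: the window size and stride are hypothetical parameters (the claim does not fix them), boxes are assumed to be `(x1, y1, x2, y2)` tuples in pixel coordinates, and incomplete windows at the image border are skipped as described in Step 4.

```python
def sliding_windows(width, height, window, stride):
    """Top-left corners of all complete windows inside the image.

    Windows that would extend past the right or bottom edge are skipped.
    """
    return [(x, y)
            for y in range(0, height - window + 1, stride)
            for x in range(0, width - window + 1, stride)]


def fraction_inside(box, block):
    """Fraction of the box area that lies inside the block."""
    iw = min(box[2], block[2]) - max(box[0], block[0])
    ih = min(box[3], block[3]) - max(box[1], block[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    area = (box[2] - box[0]) * (box[3] - box[1])
    return (iw * ih) / area


def block_contains_sign(box, block, min_fraction=0.5):
    """A block contains a sign when at least min_fraction of the sign's
    bounding-box area falls on the block (50% in Step 1)."""
    return fraction_inside(box, block) >= min_fraction
```

With a hypothetical 512-pixel window and 256-pixel stride on a 2048 × 2048 image, this yields 7 × 7 = 49 blocks at that scale.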
Step 2: Build the residual SSD model, i.e. use the residual network ResNet101 as the base network of SSD. The constructed residual SSD model consists of the convolutional layer conv1, the residual structure layers Res2x~Res5x, and the newly added auxiliary convolutional layers Conv2~Conv5. Its specific features are as follows:
(1) Four groups of convolutional layers, Conv2~Conv5, are added in sequence after the top layer Res5c of the residual network ResNet101 to produce richer feature maps; each group of convolutional layers contains 256 1 × 1 convolution filters and 512 3 × 3 convolution filters, realizing detection on multi-scale feature maps;
(2) Default boxes whose overlap with an object's bounding box in the original image is higher than 0.5 are taken as the predicted target regions of the network, and for each such region the network produces a Softmax score for every class and the offsets relative to the default box. The Softmax calculation is given by formula (1), where c_i denotes the predicted value of the i-th sign class (the sign classes and their names are shown in Fig. 1).
After Softmax, the predicted values of all sign classes are distributed in (0, 1) and serve as the score of each class;
(3) Default boxes are associated with each feature map cell of the top layers of the network (including Res3b3, Res5c, Pool6 and the newly added Conv2, Conv3, Conv4, Conv5); the default boxes tile the feature map in a convolutional manner, so that the position of each box instance relative to its cell is fixed; using default boxes with different aspect ratios effectively discretizes the possible shapes of the output boxes and improves speed;
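The class scores in item (2) of Step 2 can be illustrated with a short sketch, assuming that formula (1) is the standard softmax, which maps the predicted values c_i to exp(c_i) divided by the sum of exp(c_j) over all classes. The function name and the max-subtraction trick for numerical stability are choices made for this example and are not taken from the text.

```python
import math


def softmax(values):
    """Standard softmax: scores lie in (0, 1) and sum to 1."""
    # Subtracting the maximum does not change the result but avoids overflow.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]
```

The ordering of the inputs is preserved, so the class with the largest predicted value also receives the largest score.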
Step 3: Train the network, i.e. first initialize the weights of the residual network with Gaussian random values, then train the network with the following learning strategy:
(1) Prediction combines the feature maps of the 7 layers of different scales described in Step 2, realizing multi-scale detection and recognition; the default box scale on each feature map from Res5c onward is calculated by formula (1):
where s_min and s_max denote the scales of the default boxes at Res5c and at the top layer Pool6 and are set to 0.04 and 0.49 respectively, with the scales of the intermediate layers equally spaced within this range. In particular, the scale of the Res3b3 default boxes is chosen as 0.01. In addition, an aspect ratio a is set for the 7 scale layers, and the width w and height h of the corresponding default box are then calculated by formula (2):
where w and h denote the width and height of the default box;
In addition, every layer except the last adds one further group of default boxes with a = 1 and its own scale. The default box centers are placed on the cells of the feature map, where L_k is the side length of the square feature map of the k-th layer and i, j ∈ [0, L_k). It follows that the last feature map contains 5 default boxes and every other feature map contains 6L_k² default boxes, so the network generates 32765 default boxes in total. For a 512 × 512 image block, these default boxes can detect signs whose size lies in the range (5, 273), which is consistent with the size range of the signs in our dataset.
(2) Target boxes and default boxes are matched using the Jaccard overlap strategy. First, for each target box, the default box with the largest Jaccard overlap is found and taken as the best match. Then, to simplify learning, the Jaccard overlap of each default box with the target boxes is computed, and any default box whose overlap exceeds a threshold is considered to match the target box. The Jaccard overlap is calculated by formula (3):
where A and B denote the regions formed by all pixels inside the default box and the target box respectively. In the present invention, the Jaccard overlap threshold is set to 0.5.
(3) The objective function of model training is the total prediction error, obtained as a weighted sum of the classification error and the detection error, as given by formula (4).
where L denotes the total error, the weight α is set to 1, and L_conf and L_loc denote the classification error and the detection error, given by formula (5) and formula (6) respectively,
where x is the match indicator between each prediction box and target box (x = 1 means matched, x = 0 means unmatched); for example, x = 1 for given i, j, and p indicates that the i-th default box matches the j-th target box of class p; c denotes the class prediction score (confidence), l denotes the prediction box, g denotes the target box, and N denotes the number of default boxes. If N = 0, the total error L is taken to be 0.
(4) Data augmentation
To increase the number of samples and train a more robust model that adapts to inputs of various sizes and shapes, a new image block is generated from each original image block by one of the following methods:
a. use the original image block;
b. sample the original image block so that the Jaccard overlap (that is, the intersection over union) between the sampled image block and the original image block is 0.1, 0.3, 0.5, 0.7, or 0.9;
c. randomly sample the original image block.
The sampled image block is 0.1 to 1 times the size of the original image block, with an aspect ratio between 0.5 and 2. Finally, all image blocks are resized to 512 × 512, and horizontal flipping and distortion are applied;
(5) Negative sample mining
When the center of a target box falls inside the sampled image block, the part of the target box lying outside the image block is cropped away, and the cropped target box is labeled as a positive sample in that image block; otherwise it is labeled as a negative sample. After each iteration, the L_conf values of all default boxes are sorted, and the default boxes with the largest values are labeled as negative samples so that the ratio of positive to negative samples is kept at 1:3;
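The default-box arithmetic in item (1) of Step 3 can be checked with a short sketch. Two things here are assumptions rather than statements from the text: the scales of the intermediate layers are taken as a linear interpolation between s_min and s_max, and the feature map side lengths 64, 32, 16, 8, 4, 2, 1 are assumed for a 512 × 512 input. Under those assumptions the total reproduces the figure of 32765 default boxes stated in the claim.

```python
def default_box_scales(s_min=0.04, s_max=0.49, num_layers=6, first_scale=0.01):
    """Scale of the Res3b3 layer followed by num_layers equally spaced
    scales from s_min (Res5c) to s_max (Pool6)."""
    step = (s_max - s_min) / (num_layers - 1)
    return [first_scale] + [s_min + step * k for k in range(num_layers)]


def count_default_boxes(feature_map_sizes, boxes_per_cell=6, boxes_last=5):
    """6 boxes per cell on every feature map except the last, which has 5."""
    *inner, last = feature_map_sizes
    return (sum(boxes_per_cell * n * n for n in inner)
            + boxes_last * last * last)


# Assumed side lengths L_k of the 7 square feature maps (hypothetical).
ASSUMED_SIZES = [64, 32, 16, 8, 4, 2, 1]
TOTAL_BOXES = count_default_boxes(ASSUMED_SIZES)  # 6 * 5460 + 5 = 32765
```

The smallest scale, 0.01, corresponds to about 5 pixels on a 512 × 512 image block, which agrees with the lower end of the stated detection range.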
Step 4: Complete detection and recognition with generalization ability. When the sliding window reaches the boundary of the image in the width or height direction, the area of the remaining image block may be smaller than the window area, and test images generally contain no signs at their edges; in this case the window slides no further and the incomplete block is discarded. In addition, a coarse-to-fine detection strategy is adopted, which comprises: first, performing a preliminary residual SSD detection on all image blocks at the medium scale; then mapping the results with confidence higher than 0.3 back to the original image; for each image block at the other four scales, submitting it to residual SSD detection if it overlaps a preliminary prediction box (Jaccard overlap > 0) and discarding it otherwise; and combining the results of these two stages and filtering them on the original image with non-maximum suppression to complete detection and recognition.
2. The road traffic sign detection and recognition method based on a residual SSD model according to claim 1, characterized in that the non-maximum suppression proceeds as follows: all prediction boxes are sorted by score from high to low; the Jaccard overlap between the highest-scoring prediction box and every other prediction box is computed, and any box whose overlap exceeds the threshold is discarded; the same test is then applied to the prediction box with the second-highest score, discarding the boxes whose overlap with it exceeds the threshold; then to the box with the third-highest score, and so on until all prediction boxes have been traversed.
CN201810850416.0A 2018-07-28 2018-07-28 A kind of road traffic sign detection and recognition methods based on residual error SSD model Pending CN108960198A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810850416.0A CN108960198A (en) 2018-07-28 2018-07-28 A kind of road traffic sign detection and recognition methods based on residual error SSD model


Publications (1)

Publication Number Publication Date
CN108960198A true CN108960198A (en) 2018-12-07

Family

ID=64466056

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810850416.0A Pending CN108960198A (en) 2018-07-28 2018-07-28 A kind of road traffic sign detection and recognition methods based on residual error SSD model

Country Status (1)

Country Link
CN (1) CN108960198A (en)



Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107909021A (en) * 2017-11-07 2018-04-13 浙江师范大学 A kind of guideboard detection method based on single deep layer convolutional neural networks
CN108009526A (en) * 2017-12-25 2018-05-08 西北工业大学 A kind of vehicle identification and detection method based on convolutional neural networks

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
HEE SEOK LEE等: "Simultaneous Traffic Sign Detection and Boundary Estimation Using Convolutional Neural Network", 《IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS》 *
WEI LIU等: "SSD: Single Shot MultiBox Detector", 《EUROPEAN CONFERENCE ON COMPUTER VISION》 *
刘浏: "基于深度学习的摄像机网络行人识别系统研究与实现", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
宋少雷: "基于单目相机的运动轨迹与目标检测技术的研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
李珊珊: "基于深度学习的交通场景多目标检测", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
许晏铭: "视觉主导的无人机航拍目标快速检测技术研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109613006A (en) * 2018-12-22 2019-04-12 中原工学院 A kind of fabric defect detection method based on end-to-end neural network
CN109858349A (en) * 2018-12-25 2019-06-07 五邑大学 A kind of traffic sign recognition method and its device based on improvement YOLO model
CN109858349B (en) * 2018-12-25 2022-11-15 五邑大学 Traffic sign identification method and device based on improved YOLO model
CN110889425A (en) * 2018-12-29 2020-03-17 研祥智能科技股份有限公司 Target detection method based on deep learning
CN109815906A (en) * 2019-01-25 2019-05-28 华中科技大学 Method for traffic sign detection and system based on substep deep learning
CN109815906B (en) * 2019-01-25 2021-04-06 华中科技大学 Traffic sign detection method and system based on step-by-step deep learning
CN109840498A (en) * 2019-01-31 2019-06-04 华南理工大学 A kind of real-time pedestrian detection method and neural network, target detection layer
CN109840498B (en) * 2019-01-31 2020-12-15 华南理工大学 Real-time pedestrian detection method, neural network and target detection layer
CN111723614A (en) * 2019-03-20 2020-09-29 北京四维图新科技股份有限公司 Traffic signal lamp identification method and device
CN110135307A (en) * 2019-04-30 2019-08-16 北京邮电大学 Method for traffic sign detection and device based on attention mechanism
CN110110684A (en) * 2019-05-14 2019-08-09 深圳供电局有限公司 Exotic recognition methods, device and the computer equipment of power transmission line equipment
CN110287806A (en) * 2019-05-30 2019-09-27 华南师范大学 A kind of traffic sign recognition method based on improvement SSD network
CN110188780A (en) * 2019-06-03 2019-08-30 电子科技大学中山学院 Method and device for constructing deep learning model for positioning multi-target feature points
CN110188780B (en) * 2019-06-03 2021-10-08 电子科技大学中山学院 Method and device for constructing deep learning model for positioning multi-target feature points
CN110414417A (en) * 2019-07-25 2019-11-05 电子科技大学 A kind of traffic mark board recognition methods based on multi-level Fusion multi-scale prediction
CN110414417B (en) * 2019-07-25 2022-08-12 电子科技大学 Traffic sign board identification method based on multi-level fusion multi-scale prediction
CN110543879A (en) * 2019-08-20 2019-12-06 高新兴科技集团股份有限公司 SSD target detection method based on SE module and computer storage medium
CN110580505A (en) * 2019-08-29 2019-12-17 杭州火小二科技有限公司 Intelligent cash registering method based on service plate identification
CN110826514A (en) * 2019-11-13 2020-02-21 国网青海省电力公司海东供电公司 Construction site violation intelligent identification method based on deep learning
CN110853019A (en) * 2019-11-13 2020-02-28 西安工程大学 Method for detecting and identifying controlled cutter through security check
CN111259808A (en) * 2020-01-17 2020-06-09 北京工业大学 Detection and identification method of traffic identification based on improved SSD algorithm
CN111652836A (en) * 2020-03-19 2020-09-11 天津大学 Multi-scale target detection method based on clustering algorithm and neural network
CN112022065A (en) * 2020-09-24 2020-12-04 电子科技大学 Method and system for quickly positioning time point of capsule entering duodenum
CN112233071A (en) * 2020-09-28 2021-01-15 国网浙江省电力有限公司杭州供电公司 Multi-granularity hidden danger detection method and system based on power transmission network picture in complex environment
CN112836571A (en) * 2020-12-18 2021-05-25 华中科技大学 Ship target detection and identification method, system and terminal in remote sensing SAR image
CN112288044A (en) * 2020-12-24 2021-01-29 成都索贝数码科技股份有限公司 News picture attribute identification method of multi-scale residual error network based on tree structure
CN112966558A (en) * 2021-02-03 2021-06-15 华设设计集团股份有限公司 Port automatic identification method and system based on optimized SSD target detection model


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20181207