CN108256498A

CN108256498A - A kind of non power driven vehicle object detection method based on EdgeBoxes and FastR-CNN

Info

Publication number: CN108256498A
Application number: CN201810103573.5A
Authority: CN
Inventors: 路雪; 刘坤
Original assignee: Shanghai Maritime University
Current assignee: Shanghai Maritime University
Priority date: 2018-02-01
Filing date: 2018-02-01
Publication date: 2018-07-06

Abstract

The present invention proposes a kind of method that can solve non power driven vehicle target detection problems, by the way that the mininet structure C affnet in EdgeBoxes algorithm fusion Fast R CNN is obtained new network structure, with reference to the non power driven vehicle data sample of VOC forms, training is iterated to new network structure and obtains non power driven vehicle target detection model, the classification problem that the target detection problems realification in road traffic scene is bicycle bicycle and electric vehicle evbike.This method is from target suggestion, effectively overcome the changeable difficulty brought of non power driven vehicle appearance using deep learning thought, and can training data driving under adaptively construction feature describe, it is capable of the non power driven vehicle target detected in road traffic scene of efficiently and accurately so that non power driven vehicle target detection more fast facilitates.

Description

A kind of non power driven vehicle object detection method based on EdgeBoxes and FastR-CNN

Technical field

The present invention relates to a kind of object detection method, specifically a kind of non-machine based on EdgeBoxes and Fast R-CNN Motor-car object detection method.

Background technology

The object detection and recognition of road traffic image hands over rule for improving friendship method, ensures that safety etc. has important meaning Justice.Especially in recent years, bicycle, the fast development for taking out industry are shared, the non power driven vehicles such as bicycle increase sharply on road, road Traffic safety problem is also increasingly severeer, since this kind of non power driven vehicle target is small, mobile flexible, in addition currently to such non-machine The research of motor-car target detection is relatively fewer, and traffic monitoring faces huge challenge.Therefore, based on non power driven vehicle target signature Information detects in actual traffic image and positions non power driven vehicle target, perfect to traffic accident fix duty, traffic video etc. to have Important booster action.

Non power driven vehicle target in road traffic image can generate due to illumination, visual angle and driver are blocked when variation How variation, ensure good detection result, traditional object detection method has done basis under the premise of existing for many variations Property attempt, but still there are detection efficiency is low and the problems such as poor reliability.Such as HOG, SIFT traditional machine learning method leads to Extraction target signature is crossed, and the feature extracted is inputted into the graders such as support vector machines, iterator AdaBoost and is divided Class identifies：But entirely characteristic extraction procedure is sufficiently complex, and substantially these features for being characterized in hand-designed, needs to be directed to Different image scenes selects suitable feature, is unfavorable for practical engineering application, generalization ability is poor.It is emerging with deep learning It rises, the method for some deep learnings is applied to object detection field quickly, wherein the most prominent with depth convolutional neural networks CNN Go out.Pioneers of the R-CNN in 2013 as deep learning target detection application field carries out conventional machines study and deep learning Innovative combination, then occurs as the optimization of the target detections network such as SPP-NET, Fast R-CNN are incorporated within 2015 The advantages of R-CNN and SPP-Net so that under changeable road traffic image, quickly and accurately detect non power driven vehicle Target is possibly realized.But the detection result of Fast R-CNN, it is largely dependent upon sample image target and suggests Object The extraction quantity of Proposals, the extraction of a large amount of OP take time and effort, and aggravate the burden of model training, to network and hardware It is more demanding.

Invention content

To solve the above-mentioned problems, the present invention proposes a kind of target merged based on EdgeBoxes and Fast R-CNN Detection method, and applied in the target detection of non power driven vehicle.For the target proposed extraction stage, EdgeBoxes algorithms It can ensure when extracting lesser amt, there is relatively high average recall rate, by controlling the extraction quantity of OP, reduce detection The computation complexity of process.Core CNN in deep learning algorithm Fast R-CNN has geometric transformation, deformation, illumination etc. Invariance to a certain degree effectively overcomes the changeable difficulty brought of non power driven vehicle appearance, and can be under training data driving certainly Adaptively construction feature describes.

By the mininet structure C affnet in EdgeBoxes algorithm fusion Fast R-CNN, with reference to the non-of VOC forms Motor vehicles data sample asks classification of the target detection problems realification in road traffic scene for bicycle and electric vehicle Topic.Suitable area-of-interest is built using the target suggestion of EdgeBoxes algorithms extraction sample and sample inputs network together Training is iterated, obtains non power driven vehicle target detection model, new samples test is carried out to model and analyzes test effect, with Standby subsequent model generalization process uses.The fusion of EdgeBoxes and Fast R-CNN is so that the target detection of non power driven vehicle With higher flexibility and generalization ability.Specifically include following steps：

Step 1 defines network structure, selectes the mininet structure C affenet networks in Fast R-CNN algorithms：

Step 1-1, the setting Caffenet network cores number of plies is 11 layers, specifically includes 1 11 × 11 convolutional layer, 15 × 5 convolutional layers, 33 × 3 convolutional layers, 1 area-of-interest pond layer, 2 full articulamentums, 3 pond layers；

Step 1-2, Caffenet networks output layer is multitask loss function, and specific there are two output layers, and first defeated Go out layer and probability Ps of each RoI in 2 classes, i.e. target classification are calculated as a result, second defeated by the recurrence of softmax algorithms Go out the coordinate value that layer is responsible for calculating 2 class target detection blocks, i.e. detection block coordinate result；

Step 2 defines goal task, acquires target sample figure and data set makes：

Step 2-1, it is non power driven vehicle, i.e. bicycle and electric vehicle to define goal task；

Step 2-2, data set makes, and prepares the sample image of appropriate goal task, to ensure feature adequacy, sample graph As the order of magnitude is set as 10³-10⁴, any numerical value M in the range of this can be as sample size, bicycle b and electric vehicle e distribution Ratio is b:E, and b ≈ e, M=b+e；

Step 2-3, the sample image for obtaining step 2-2 carries out VOC formattings, obtains VOC data sets；

Step 2-4, the sample graph in the VOC data sets for obtaining step 2-3 carries out target mark, then uses EdgeBoxes algorithms carry out the sample image after mark target and suggest OP extractions, using the extraction strategy of sliding window, by with Lower parameter controls：α command deployment step-lengths, β control IoU threshold values, δ command deployment accuracy, minScore control minimum score The maximum number magnitude of extraction OP is controlled with maxBoxes, sample OP information is obtained with this, carries out obtaining sample after centainly calculating Area-of-interest RoIs；

Step 2-5, the VOC data sets for obtaining step 2-3, according to training>Test>The ratio of verification, and each ratio Division principle of the sum of the item less than or equal to 10 is divided, and obtains training set, test set and verification collection；

Step 3, model training set pre-training parameter：

Step 3-1, initiation parameter setting is carried out to the Caffnet network structures defined in step 1, utilizes Caffenet The layer parameter before the layer of training parameter initialization pre-training network RoI ponds under ImageNet, setting learning rate are 0.001, trained batchsize are 2, iterations I, order of magnitude 10^4, carry out network training；

Step 3-2, goal task is bicycle and electric vehicle, i.e. K=2, then adds a background classes, then input data layer class Not Shuo num_class=2+1=3, similary RoI ponds layer data classification number num_class=3, branch of classifying to obtain in output layer Divided data classification number num_class=3, bounding box return partial data classification number num_class=3 × 4=12；Step 4, mould Type training, pre-training network：

Step 4-1, it will be instructed on training dataset that the Caffnet network structures defined in step 1 prepare in step 2-5 Practice, sample includes dynamic bicycle and electric vehicle, and training parameter is set by step 3-1；

Step 4-2, the output process of training a kind of type of multitask loss function to the RoI of each label and detection Frame coordinate carries out recurrence calculating, and computational methods are：

L(p,u,t^u, v) and=L_cls(p,u)+λ[u≥1]L_loc(t^u,v) (1)

L_cls(p, u)=logP_u (2)

Step 4-3, to hyper parameter λ=1 in step 4-2 setting formula (1), and bicycle class u=1, electric vehicle class u are marked =2, L_clsIt is type probability P_uLogarithm loss, wherein P_uIt is calculated with softmax, the output vector for enabling this layer is a_i= (a₁,a₂,......,a_n), according to softmax computational methods

Then

P_u=maxP_i (4)

Step 4-4, another task loss L of output_loc, it is detection block loss, passes through true u classes detection block coordinate Value v=(v_x,v_y,v_w,v_h) and u classes prediction coordinate valueIt obtains, since background classes are without former label Frame can ignore the L of background classes_loc, detection block is returned and is lost, uses loss function

Wherein, when regressive object non-boundary, traditional loss function susceptibility in training is very high, needs very carefully Ground regularized learning algorithm rate with prevent explosion gradient, andEliminate this sensibility：

Step 5, repetitive exercise obtain target detection model：

Step 5-1, repetitive exercise process carries out backpropagation, x by RoI ponds layer_i(x_i∈ R) it is i-th of activation, it is defeated Enter to RoI ponds layer, enable y_rjIt is output, i.e., the jth layer of r-th RoI exports, and RoI ponds layer calculates y_rj=x_i*(r,j), wherein i* (r, j)=argmax_i'∈R(i,j)x_i', i.e., so that x_i'Maximum i, R (i, j) is output unit y_rjIn child window where maximum pondization The indexed set of input, single x_iSeveral different output y can be assigned to_rj, RoI ponds layer is by calculating relative to each defeated Enter variable x_iLoss function partial derivative carry out backpropagation：

Step 5-2, repetitive exercise process introduces fine tuning strategy, and the training parameter of continuous trim step 3-1 settings is so handed over For network training is carried out, until network convergence, obtains target detection model at this time；

Step 6, model measurement, the target detection model that the test set image input step 5-2 that step 2-5 is obtained is obtained It is tested, is broadly divided into individual test and integrated testability two parts：

Step 6-1, the sample image income single image detection test in selected test set, and shown on image pattern Detection block, to ensure the practical application and accuracy rate of detection, setting detection type probability P range is 0.5≤P≤1 during test Numerical value；

Step 6-2, that step 2-5 is obtained the target detection model that whole test data set input step 5-2 obtain is whole Target detection model is verified in test；

Preferably, sample values M is 4096 in step 2-2, and bicycle b is 2262, and electric vehicle e is 1834.

Preferably, according to training in step 2-5：Test：Verification=5：2：1 ratio cut partition sample M, obtains training set 2560, verification collection 1024,512 Zhang San part of test set.Preferably, iterations I is set as 40000 times in step 3-1 Carry out network repetitive exercise.Preferably, detection probability P is set as 0.7 in step 6-1, i.e., a display target probability is more than or equal to 0.7 target.

The present invention merges EdgeBoxes algorithms and deep learning method Fast R-CNN, obtains new network mould Type structure, and it is applied to the target detection process of non power driven vehicle, it has obtained melting based on EdgeBoxes and Fast R-CNN The non power driven vehicle object detection method of conjunction.Compared with conventional target detection method, the present invention had both maintained deep learning calculation The advantage of method possesses higher Detection accuracy；Meanwhile using the extraction quantity of EdgeBoxes algorithms control OP, reduce net The complexity of network training improves the detection efficiency of target detection；And the method after fusion is applied to non power driven vehicle Blank of the traffic monitoring in terms of non power driven vehicle has been filled up in field, is improved for road traffic supervision video and is provided new side Method.

Description of the drawings

Fig. 1 is the schematic network structure that Caffenet of the present invention merges EdgeBoxes algorithms, wherein the network core number of plies It is 11 layers, specifically includes 1 11 × 11 convolutional layer, 15 × 5 convolutional layer, 33 × 3 convolutional layers, 1 area-of-interest pond Layer, 2 full articulamentums, 3 pond layers carry out target proposed extraction using EdgeBoxes algorithms to the sample image after mark, It is fused to the area-of-interest pond layer of Caffenet networks later, network output layer is multitask loss function, specifically there is two A output layer is each responsible for output target classification result and target detection frame coordinate result；Fig. 2 be the present invention is based on The non power driven vehicle object detection method flow chart of EdgeBoxes and Fast R-CNN fusions, detailed process step is by having Body embodiment provides.

Specific embodiment

In order to deepen the understanding of the present invention, the present invention is described in further detail with reference to example.But this Invention can be realized with different goal tasks, however it is not limited to example described herein.

The present invention is a kind of object detection method, as shown in Figure 2 based on the non-of EdgeBoxes and Fast R-CNN fusions Motor vehicles object detection method flow chart, mainly includes the following steps that：Step 1 defines network structure, selectes Fast R-CNN Mininet structure C affenet networks in algorithm：

Step 2 defines goal task, acquires target sample figure and data set makes：

Step 3, model training set pre-training parameter：

L(p,u,t^u, v) and=L_cls(p,u)+λ[u≥1]L_loc(t^u,v) (1)

L_cls(p, u)=logP_u (2)

Then

P_u=maxP_i (4)

Step 5, repetitive exercise obtain target detection model：

Target suggestion is extracted with EdgeBoxes algorithms in step 2-4, and from the aspect of calculation amount, suitable control target It is recommended that quantity, data set manufacturing process, the sample graph in data set carries out EdgeBoxes pretreatments, obtains OP information, first Sample image after being marked using EdgeBoxes algorithms to target carries out OP extractions, using the extraction strategy of sliding window, by with Lower parameter carrys out command deployment as a result, α command deployment step-lengths, β control IoU threshold values, δ command deployment accuracy, minScore controls The maximum number magnitude of minimum score and maxBoxes control extractions OP.

The training process EdgeBoxes algorithms of step 4 and Fast R-CNN algorithm fusions, and Caffnet is selected to obtain newly Network structure, the appropriate OP that will be extracted obtains RoIs after centainly calculating, then by sample image input network extraction Convolution characteristic pattern and RoIs are then inputted the RoI ponds layer of network structure by convolution characteristic pattern together.

Algorithm after fusion is applied to non power driven vehicle object detection field by entire model training and test process, from Row acquisition target sample, makes data set and is trained, tests, and introduces fine tuning strategy, constantly fine tuning instruction in training process Practice parameter, so alternately network training, until network convergence, obtains being easy to extensive non power driven vehicle target detection mould Type.

The present invention can be with the non power driven vehicle target detection problems in solving road traffic scene, by acquiring non-motor vehicle Sample image from control sample image target suggested quantity, changes to target image using the algorithm of deep learning Generation training, obtains non power driven vehicle target detection model, can solve the non power driven vehicle target under Ordinary Rd traffic scene Test problems.This method, which is based on target suggestion, can preferably control validity feature data volume, effective using deep learning algorithm Overcome the changeable difficulty brought of non power driven vehicle appearance, and can adaptively construction feature describes under training data driving, Accurate and quickly classification can be carried out to target signature under new samples test.And this method can be appointed by defining target The detection model of business training plurality of target, generalization ability is strong, is more suitable for actual requirement of engineering.Proposed by the present invention is a kind of base In the non power driven vehicle object detection method of EdgeBoxes and Fast R-CNN, this method, which can be applied to, many to be needed to carry out The field of target detection.

Claims

1. a kind of non power driven vehicle object detection method based on EdgeBoxes and Fast R-CNN, it is characterised in that：Including with Lower step：

Step 1-1, the setting Caffenet network cores number of plies is 11 layers, specifically includes 1 11 × 11 convolutional layer, 1 volume 5 × 5 Lamination, 33 × 3 convolutional layers, 1 area-of-interest pond layer, 2 full articulamentums, 3 pond layers；

Step 1-2, Caffenet networks output layer is multitask loss function, and specific there are two output layer, first output layers Probability Ps of each RoI in 2 classes, i.e. target classification are calculated as a result, second output layer by the recurrence of softmax algorithms It is responsible for calculating the coordinate value of 2 class target detection blocks, i.e. detection block coordinate result；

Step 2 defines goal task, acquires target sample figure and data set makes：

Step 2-2, data set makes, and prepares the sample image of appropriate goal task, to ensure feature adequacy, sample image number Magnitude is set as 10³-10⁴, should in the range of any numerical value M can all be used as sample size, bicycle b and electric vehicle e allocation proportions For b:E, and b ≈ e, M=b+e；

Step 2-4, the sample graph in the VOC data sets for obtaining step 2-3 carries out target mark, then using EdgeBoxes Algorithm carries out the sample image after mark target and suggests OP extractions, using the extraction strategy of sliding window, by following parameter Lai Control：α command deployment step-lengths, β control IoU threshold values, δ command deployment accuracy, minScore control minimum score and The maximum number magnitude of maxBoxes control extractions OP, sample OP information is obtained with this, carries out obtaining the sense of sample after centainly calculating Interest region RoIs；

Step 2-5, the VOC data sets for obtaining step 2-3, according to training>Test>The ratio of verification, and each proportional it It is divided with the division principle less than or equal to 10, obtains training set, test set and verification collection；

Step 3, model training set pre-training parameter：

Step 3-1, initiation parameter setting is carried out to the Caffnet network structures defined in step 1, is existed using Caffenet The layer parameter before the layer of training parameter initialization pre-training network RoI ponds under ImageNet, setting learning rate is 0.001, Trained batchsize is 2, iterations I, order of magnitude 10^4, carries out network training；

Step 3-2, goal task is bicycle and electric vehicle, i.e. K=2, then adds a background classes, then input data layer classification number Num_class=2+1=3, similary RoI ponds layer data classification number num_class=3, score portion number of classifying in output layer According to classification number num_class=3, bounding box returns partial data classification number num_class=3 × 4=12；

Step 4, model training, pre-training network：

Step 4-1, training, sample on the training dataset for preparing the Caffnet network structures defined in step 1 in step 2-5 This includes dynamic bicycle and electric vehicle, and training parameter is set by step 3-1；

Step 4-2, the output process of training sits the type and detection block of the RoI of each label with a kind of multitask loss function Mark carries out recurrence calculating, and computational methods are：

L(p,u,t^u, v) and=L_cls(p,u)+λ[u≥1]L_loc(t^u,v) (1)

L_cls(p, u)=logP_u (2)

Step 4-3, to hyper parameter λ=1 in step 4-2 setting formula (1), and bicycle class u=1, electric vehicle class u=2 are marked, L_clsIt is type probability P_uLogarithm loss, wherein P_uIt is calculated with softmax, the output vector for enabling this layer is a_i=(a₁, a₂,......,a_n), according to softmax computational methods

Then

P_u=maxP_i (4)

Step 4-4, another task loss L of output_loc, it is detection block loss, passes through true u classes detection block coordinate value v= (v_x,v_y,v_w,v_h) and u classes prediction coordinate valueIt obtains, it, can since background classes are without former indicia framing To ignore the L of background classes_loc, detection block is returned and is lost, uses loss function

Wherein, when regressive object non-boundary, traditional loss function susceptibility in training is very high, needs carefully to adjust very much Whole learning rate with prevent explosion gradient, andEliminate this sensibility：

Step 5, repetitive exercise obtain target detection model：

Step 5-1, repetitive exercise process carries out backpropagation, x by RoI ponds layer_i(x_i∈ R) it is i-th of activation, it is input to RoI ponds layer, enables y_rjIt is output, i.e., the jth layer of r-th RoI exports, and RoI ponds layer calculatesWherein i* (r, J)=argmax_i'∈R(i,j)x_i', i.e., so that x_i'Maximum i, R (i, j) is output unit y_rjIt is defeated in child window where maximum pondization The indexed set entered, single x_iSeveral different output y can be assigned to_rj, RoI ponds layer is by calculating relative to each input Variable x_iLoss function partial derivative carry out backpropagation：

Step 5-2, repetitive exercise process introduces fine tuning strategy, the training parameter of continuous trim step 3-1 setting, so alternately into Row network training, until network convergence, obtains target detection model at this time；

Step 6, model measurement, the target detection model that the test set image input step 5-2 that step 2-5 is obtained is obtained carry out Test is broadly divided into individual test and integrated testability two parts：

Step 6-1, the sample image income single image detection test in selected test set, and detection is shown on image pattern Frame, to ensure the practical application and accuracy rate of detection, setting detection type probability P range is the numerical value of 0.5≤P≤1 during test；

Step 6-2, step 2-5 is obtained into the target detection model integrated testability that whole test data set input step 5-2 obtain, Verify target detection model.

2. a kind of non power driven vehicle object detection method based on EdgeBoxes and Fast R-CNN as described in claim 1, It is characterized in that：Sample values M is 4096 in step 2-2, and bicycle b is 2262.

3. a kind of non power driven vehicle object detection method based on EdgeBoxes and Fast R-CNN as described in claim 1, It is characterized in that：According to training in step 2-5：Test：Verification=5：2：1 ratio cut partition sample number M, obtains training set 2560 It opens, verification collection 1024,512 Zhang San part of test set.

4. a kind of non power driven vehicle object detection method based on EdgeBoxes and Fast R-CNN as described in claim 1, It is characterized in that：Iterations I is set as 40000 in step 3-1.

5. a kind of non power driven vehicle object detection method based on EdgeBoxes and Fast R-CNN as described in claim 1, It is characterized in that：Detection probability P is set as 0.7 in step 6-1.