CN106250812B - Vehicle model recognition method based on a Faster R-CNN deep neural network - Google Patents

Vehicle model recognition method based on a Faster R-CNN deep neural network

Info

Publication number
CN106250812B
CN106250812B (granted publication) · CN201610563184.1A (application) · CN106250812A (application publication)
Authority
CN
China
Prior art keywords
network
layer
vehicle
training
region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610563184.1A
Other languages
Chinese (zh)
Other versions
CN106250812A (en)
Inventor
汤一平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Yixun Technology Service Co ltd
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201610563184.1A
Publication of CN106250812A
Application granted
Publication of CN106250812B
Legal status: Active
Anticipated expiration


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 - Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/584 - Recognition of vehicle lights or traffic lights
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present invention discloses a vehicle model recognition method based on a Faster R-CNN deep neural network, mainly comprising unsupervised deep learning, a multi-layer CNN (convolutional neural network), a region proposal network, network sharing, and a softmax classifier. The method realizes a truly end-to-end vehicle detection and recognition framework built on a single Faster R-CNN network, and achieves fast, high-precision and robust vehicle sub-class recognition in environments with diverse vehicle shapes, illumination variations and backgrounds.

Description

Vehicle model recognition method based on a Faster R-CNN deep neural network
Technical field
The present invention relates to the application of computer technology, pattern recognition, artificial intelligence, applied mathematics and biological vision in the field of intelligent transportation, and more particularly to a vehicle model recognition method based on a Faster R-CNN deep neural network.
Background art
A core function of an intelligent transportation system is the accurate detection of road vehicles and the correct recognition of vehicle models. Current research on vehicle detection and classification involves two important technologies: automatic vehicle identification and automatic vehicle recognition.
Automatic vehicle identification relies on mutual recognition between on-board units and roadside base-station equipment. It is mainly used in toll collection systems and is widely deployed in some technologically developed countries, for example the AE-PASS system in the United States, the ETC system in Japan, and global satellite GPS positioning.
Automatic vehicle recognition classifies vehicles actively by detecting intrinsic vehicle parameters and applying appropriate classification and recognition algorithms under a given vehicle classification standard. Its application range is wide and many mature systems are already in real-world use. Such technologies can automatically identify vehicle information by means of radio-frequency microwave, infrared, laser, surface acoustic wave and similar sensors, or can identify information such as the license plate and vehicle model through video image processing.
Relatively mature automatic vehicle recognition technologies include inductive-loop detection, infrared detection, ultrasonic/microwave detection and geomagnetic detection. Each of these methods has strengths and weaknesses: their recognition accuracy is relatively high, but their drawbacks are also obvious. The main disadvantages are that construction and installation are quite complex and disturb normal traffic, maintenance is difficult, and the equipment is easily damaged and costly.
In recent years video detection has become the most important information collection means in the intelligent transportation field. On balance, applying video detection to highways and urban roads has great practical value. A video-based vehicle model recognition system can comprehensively improve the level of information collection and safety management on urban roads and plays an increasingly important role in intelligent transportation systems.
Many domestic and foreign scholars have studied the visual recognition of vehicles. The paper "Robert T. Collins, Alan J. Lipton, Hironobu Fujiyoshi, and Takeo Kanade. Algorithms for cooperative multisensor surveillance. In Proceedings of the IEEE" discloses a system for detecting, tracking and recognizing moving targets on roads; a trained neural network identifies whether a moving target is a person, a crowd, a vehicle or clutter, and the input features of the network include a dispersedness measure of the target, the apparent target size, and its size relative to the camera monitoring area. Vehicles are further subdivided by type and color. The paper "Tieniu N. Tan and Keith D. Baker. Efficient image gradient based vehicle localization. IEEE Transactions on Image Processing, 2000, 9(8): 1343-1356" describes a vehicle localization and recognition method that works on image gradients within a small window; the vehicle pose is obtained using ground-plane constraints and the fact that most vehicle contours are bounded by two straight lines. The paper "George Fung, Nelson Yung, and Grantham Pang. Vehicle shape approximation from motion for visual traffic surveillance. In Proc. IEEE Conf. Intelligent Transport Systems, 2001, 608-613" observes vehicle motion with a high-precision camera to estimate vehicle shape, obtaining the vehicle contour by estimating feature points; the basic idea is that high feature points move faster than low feature points because they are closer to the camera, and the resulting contour can be used for vehicle recognition. The paper "Ferryman, A. Worral, G. Sullivan, K. Baker. A generic deformable model for vehicle recognition. Proceedings of British Machine Vision Conference, 1995, 127-136" proposes a parameterized deformable three-dimensional template which, after evolution, is claimed to be applicable to the recognition of various vehicles. The paper "Larry Davis, V. Philomin and R. Duraiswami. Tracking humans from a moving platform. In Proc. International Conference on Pattern Recognition, 2000"
studies vehicle recognition with deformable templates: deformable templates of the side view and front view of the vehicle front part are first established; the RGB histograms of the vehicle are compared by histogram intersection, and the edge point set of a suitable vehicle template is compared with other vehicle models via the Hausdorff distance between point sets. The above techniques essentially still require manual feature extraction, and the larger problems are: 1) they are strongly affected by the specific application environment, and the conditions required by the various detection algorithms are too harsh; 2) vehicle classes are numerous but their differences are small, without obvious distinguishing features; 3) they are strongly affected by viewpoint changes, and vehicle appearance differs greatly when photographed from different angles; 4) they are strongly affected by the natural environment, especially illumination; strong reflections blur vehicle contours and shift colors so much that recognition becomes difficult; 5) vehicle styling is updated too quickly and features change too fast, so the algorithms adapt poorly. Domestically, most vehicle model recognition work remains at the research stage, for example results from the Chinese Academy of Sciences, Xi'an Highway Institute, Shanghai Jiao Tong University, Xi'an Jiaotong University and Sichuan University. The key problem is that human knowledge of the vehicle recognition process is itself limited.
Features used for vehicle classification need to be robust to scale, rotation, certain viewpoint changes and illumination variation. Computer vision before the deep learning era typically required experts to spend a great deal of time and effort designing suitable features by hand. To let computers select suitable features automatically, artificial neural networks emerged; as early as 1999 foreign researchers were classifying objects with neural networks, including fuzzy neural networks and BP networks. However, their performance suffered from problems such as the difficulty of resolving the contradiction between complexity and generalization in pattern recognition; and although neural networks have strong modeling capability, their huge parameter space makes it difficult to find good initial values when classifying large-scale images such as vehicles. They were therefore neglected for a long time, until the advent of deep learning made them a research hotspot again.
For classifying large numbers of images there are mainly two approaches. One extracts local features from each photo, clusters and encodes the extracted features into a high-dimensional vector, and then classifies it with a classifier; the encoding methods include visual bag-of-words coding, sparse coding and Fisher vector coding, and according to current research Fisher vector coding performs better than the other coding schemes. The other widely applied image classification method is the deep neural network. Deep learning is a new hotspot in neural network research; its purpose is to provide good initial parameter values for the neural network through unsupervised pre-training, and by greedy layer-by-layer training it achieves very good results in classifying large-scale images.
The concept of deep learning began to attract wide attention around 2006. The paper "Hinton, G. E. and R. R. Salakhutdinov, Reducing the dimensionality of data with neural networks. Science, 2006, 313(5786): 504-507" proposes a multi-layer feed-forward neural network that can be efficiently pre-trained layer by layer, each layer being trained as an unsupervised restricted Boltzmann machine and finally fine-tuned with supervised back-propagation. It provides a new method and way of thinking for resolving the contradiction between complexity and generalization in pattern recognition, and thus opened the curtain on computer vision in the deep learning era.
The convolutional neural network (CNN) is one kind of deep learning algorithm, a pattern recognition approach specialized for the image domain, and currently the most successful algorithm in image feature analysis. Its advantage is that no handcrafted features are needed when training the model; the algorithm automatically discovers the features implicit in the images.
Chinese patent application No. 201610019285.2 discloses a vehicle model recognition method and system, including a machine-training process that generates classifiers and a process for judging an image to be recognized. During classifier generation, the required image range in the training-set pictures is determined based on the license plate, the determined image range is divided into regions, characteristic information is selected in each region, and all the characteristic information selected in each region is fed into machine training to generate a classifier for that region. The generated classifiers perform single-region judgments on the image to be recognized, and a multi-region confidence fusion decision over the single-region results yields the vehicle model recognition result. That invention performs vehicle model recognition with a random-forest classifier; its biggest problem is the lack of an unsupervised learning process.
Chinese patent application No. 201510639752.7 discloses a vehicle model recognition method comprising: obtaining a picture to be detected; detecting the picture with a first preset classifier; if the picture contains a target vehicle, extracting the target vehicle from the picture; aligning the target vehicle so that the angle between its heading direction and the vertical direction of the target area is smaller than a preset threshold; extracting M features (M an integer greater than 1) from the aligned target vehicle; classifying the M features with a second preset classifier; and determining the model of the target vehicle from the classification result. In a certain sense this method still belongs to shallow neural networks and is architecturally unable to realize an unsupervised learning process.
Chinese patent application No. 201510071919.4 proposes a vehicle model recognition method based on convolutional neural networks, built on a feature-extraction module and a vehicle recognition module, comprising the following steps: constructing the vehicle recognition neural network by designing convolution and pooling layers, fully connected layers and a classifier, where the convolution and pooling layers and the fully connected layers extract vehicle features and the classifier performs vehicle classification; training the neural network with a database containing features of different vehicle models, the training being supervised learning on labeled data, with the weight matrices and biases adjusted by stochastic gradient descent; and assigning the trained weight matrices and biases to the corresponding layers of the network, after which the network can extract vehicle features and recognize vehicles. That invention lacks detailed implementation and only conceptually proposes a CNN-based vehicle model recognition method; in particular its training is supervised learning on labeled data. Vehicle appearance images constitute massive data, and labeling them is extremely difficult; moreover vehicle styling changes quickly and features change quickly, so some of the algorithms in that invention are hard to apply in practice. In addition, vehicle images actually collected on the road are complicated, with cluttered backgrounds and occlusion between vehicles; if the actually obtained image is not segmented, the final recognition result will be seriously affected.
Chinese patent applications No. 201510738852.5 and 201510738843.6 propose a vehicle model recognition method based on a deep Fisher network. First the 0th layer of the Fisher network is built: for a database with K kinds of vehicle images, the SIFT features of each kind of vehicle image are extracted. Then the 1st layer of the Fisher network is built: the extracted SIFT features are Fisher-vector encoded, the encoded vectors are stacked spatially, and L2 normalization and PCA dimensionality reduction are applied. The feature representation obtained by the 1st layer is Fisher-vector encoded again and, after signed square-root and L2 normalization, forms the 2nd layer of the Fisher network. Finally the global feature representations obtained from images of different vehicle models are used to train a linear SVM, yielding a recognition system for K vehicle classes; a vehicle to be recognized is passed through the Fisher network to obtain a test feature vector, which is fed to the recognition system to identify its model. That invention also has two deficiencies: it lacks a deep learning process, and it lacks a vehicle segmentation step on the image.
Chinese patent application No. 201510738540.4 proposes a vehicle model recognition method based on locally aggregated feature descriptors. First the SIFT features of the vehicle images in the model database are extracted; then K-means clustering is applied to the SIFT features of all vehicle images, forming K cluster centers and a dictionary of K visual words; then, for each vehicle image, each SIFT feature is assigned to the nearest visual word, and the residual accumulation between the SIFT feature vectors around each visual word and the current visual word is computed, giving the locally aggregated descriptor of the current vehicle picture. Finally the locally aggregated descriptors of the n vehicle images in the training module are quantized and encoded into an indexable coded image library of n vehicle classes; for a test vehicle image, its locally aggregated descriptor is likewise extracted and used as a query vector to index the image library, matching by approximate nearest-neighbor search to identify the model of the test vehicle. This invention likewise has two defects: it lacks deep learning, and it lacks a vehicle segmentation step on the image.
Convolutional neural networks are relatively successful at recognizing trucks, vans, buses, minibuses, SUVs and cars, but for sub-class classification, such as recognizing the specific model of a vehicle, the accuracy is far below that of coarse classification. The difficulty of sub-class image recognition generally lies in two points:
(1) Obtaining sub-class image annotation data is very difficult; it usually requires experts in the relevant field to do the labeling.
(2) In sub-class image recognition the intra-class differences are large, for example the different viewing angles of the same vehicle in vehicle model recognition, while the inter-class differences are small.
In summary, applying convolutional neural networks and other deep neural network techniques to vehicle model recognition currently still faces several stubborn problems: 1) how to accurately segment the complete image of the tested vehicle from a complicated background; 2) how to obtain the characteristics of the vehicle model accurately while using as little labeled image data as possible; 3) how, beyond recognizing the vehicle's major class, to also recognize which specific model, which color and which year it is; 4) how to obtain vehicle model features automatically through deep learning; 5) how to balance recognition accuracy and detection efficiency while reducing training and learning time as much as possible; 6) how to design a classifier that meets the classification requirements of vehicle model sub-classes and does not need to retrain the whole network after vehicle styling is updated; 7) how to design a truly end-to-end vehicle detection and recognition framework using a single CNN network; 8) how to reduce the influence of weather conditions and increase the adaptability of the system.
Summary of the invention
In order to overcome the deficiencies of existing visual recognition technology for vehicle models, namely low automation and intelligence, lack of deep learning, difficulty in adapting to ambient weather changes, difficulty in accurately extracting the complete vehicle image for recognition, difficulty in visually classifying vehicle model sub-classes, and difficulty in balancing recognition accuracy with time and detection efficiency, the present invention provides a vehicle model recognition method based on a Faster R-CNN deep neural network. It can effectively improve the automation and intelligence of vehicle visual recognition, adapt well to ambient weather changes with broad adaptivity, provide real-time detection and recognition while maintaining good detection and recognition accuracy, greatly reduce the dependence on labeled vehicle data through automatic learning and extraction of vehicle model features, better resolve the contradiction between complexity and generalization in vehicle model recognition, and offer good universality.
To realize the above, several key problems must be solved: (1) designing a fast visual segmentation algorithm for vehicle objects; (2) developing a deep learning method that realizes unsupervised vehicle feature extraction; (3) designing a classifier suitable for thousands of vehicle model sub-classes, with scalability; (4) designing a truly end-to-end vehicle detection and recognition framework using a single Faster R-CNN network.
The technical solution adopted by the present invention to solve the technical problems is:
A vehicle model recognition method based on a Faster R-CNN deep neural network includes a VGG network for deep learning and training recognition, a region proposal network for extracting regions of interest, and a Softmax classifier for vehicle classification;
The VGG network includes 8 convolutional layers and 3 fully connected layers, 11 layers in total; the 8 convolutional layers are organized into 5 groups; two of the fully connected layers extract image features and one classifies the features; the 3 fully connected layers are connected as classification layer 6, classification layer 7 and classification layer 8 respectively;
The region proposal network includes 1 classification layer, 1 window regression layer, 1 module computing the classification loss and 1 module computing the window regression loss, and outputs p proposal boxes of interest;
The Softmax classifier compares the extracted input data features with the feature-library data obtained by learning and training, computes the probability of each classification result, and then outputs the result with the highest probability;
The Faster R-CNN deep neural network attaches the region proposal network at the end of the 5th layer of the VGG network, so that the region proposal network shares the low-level feature extraction process and results of the first 5 layers of the VGG network;
The 6th and 7th layers of the VGG network perform convolution and ReLU processing on the image features inside the p proposal boxes output by the region proposal network, obtaining p feature maps each containing a 4096-dimensional vector, which are then passed to the classification layer and the window regression layer respectively, realizing the segmentation of the vehicle image; on the other hand, the Softmax classifier performs classification and recognition on the p feature maps containing 4096-dimensional vectors, obtaining the classification result of the vehicle model.
For the Softmax classifier, during learning and training the learning result of the Faster R-CNN is used as the input data of the softmax classifier. Softmax regression is Logistic regression oriented toward multi-class classification problems; it is the general form of Logistic regression and is suitable for the case where classes are mutually exclusive. Suppose the training set is {(x^(1), y^(1)), ..., (x^(m), y^(m))} with y^(i) ∈ {1, 2, ..., k}. For a given sample input x, a k-dimensional vector is output to represent the probability p(y = i | x) of each classification result, and the hypothesis function h(x) takes the standard softmax-regression form
h_θ(x) = (1 / Σ_{j=1}^{k} exp(θ_j^T x)) · [exp(θ_1^T x), exp(θ_2^T x), ..., exp(θ_k^T x)]^T
θ_1, θ_2, ..., θ_k are the parameters of the model, and all the probabilities sum to 1. The cost function after adding the regularization term is
J(θ) = -(1/m) [ Σ_{i=1}^{m} Σ_{j=1}^{k} 1{y^(i)=j} log( exp(θ_j^T x^(i)) / Σ_{l=1}^{k} exp(θ_l^T x^(i)) ) ] + (λ/2) Σ_{j=1}^{k} Σ_{s=0}^{n} θ_{js}^2
The partial derivative of the cost function with respect to the parameters θ_j of the j-th class is
∇_{θ_j} J(θ) = -(1/m) Σ_{i=1}^{m} [ x^(i) ( 1{y^(i)=j} − p(y^(i)=j | x^(i); θ) ) ] + λ θ_j
Finally, the softmax classification regression is realized by minimizing J(θ), and the classification regression result is saved in the feature library;
When recognizing and classifying, the extracted input data features are compared with the feature-library data obtained by learning and training, the probability of each classification result is computed, and the result with the highest probability is output.
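As a concrete illustration of the classification step above (not part of the original patent disclosure), the following minimal NumPy sketch computes the softmax probabilities and returns the highest-probability class; the feature dimension, class count and variable names are illustrative only.

```python
import numpy as np

def softmax_probs(theta, x):
    """Compute p(y = i | x) for each class i given parameter matrix theta (k x d)."""
    scores = theta @ x                      # k raw class scores theta_i^T x
    scores -= scores.max()                  # subtract the max for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()    # probabilities sum to 1

def classify(theta, x):
    """Return the index of the most probable class, as in the recognition step."""
    probs = softmax_probs(theta, x)
    return int(np.argmax(probs)), probs

# Toy example: k = 3 vehicle sub-classes, d = 4096-dim feature vector (as from fc7).
rng = np.random.default_rng(0)
theta = rng.normal(size=(3, 4096)) * 0.01
x = rng.normal(size=4096)
label, probs = classify(theta, x)
print(label, probs)
```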
The region proposal network generates region proposal boxes. It is attached at the end of the 5th layer of the VGG network: a small network slides over the convolutional feature map output by the 5th convolutional layer, and this small network is fully connected to an n × n spatial window of the input convolutional feature map. Each sliding window is mapped to a low-dimensional vector (256-d, one value per feature map at each sliding-window position). This vector is fed to two sibling fully connected layers: a window regression layer and a classification layer. At each position there are 9 kinds of recommended regions (anchors); since the corresponding windows must have translation and scaling invariance, the window regression layer outputs 4 translation and scaling parameters from the 256-dimensional feature, giving 4k outputs, i.e. the encoded coordinates of k proposal boxes. The classification layer outputs, from the 256-dimensional feature, the probabilities of belonging to foreground and background, i.e. 2k proposal scores, estimating for each proposal box the probability that it is a vehicle target or not.
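A minimal sketch of such a proposal-network head is shown below (assuming PyTorch; apart from the 256-d intermediate vector and k = 9 anchors, the channel counts and names are illustrative, not taken from the patent).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RPNHead(nn.Module):
    """Slides an n x n (here 3 x 3) window over the conv5 feature map, maps it to a
    256-d vector, then applies two sibling 1 x 1 conv layers: 2k foreground/background
    scores and 4k box regression parameters."""
    def __init__(self, in_channels=512, mid_channels=256, num_anchors=9):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, mid_channels, kernel_size=3, padding=1)
        self.cls_layer = nn.Conv2d(mid_channels, 2 * num_anchors, kernel_size=1)
        self.reg_layer = nn.Conv2d(mid_channels, 4 * num_anchors, kernel_size=1)

    def forward(self, feature_map):
        h = F.relu(self.conv(feature_map))
        scores = self.cls_layer(h)   # (N, 2k, H, W): vehicle / non-vehicle per anchor
        deltas = self.reg_layer(h)   # (N, 4k, H, W): translation/scaling per anchor
        return scores, deltas

# Toy forward pass on a conv5-sized feature map.
rpn = RPNHead()
scores, deltas = rpn(torch.zeros(1, 512, 14, 14))
print(scores.shape, deltas.shape)   # torch.Size([1, 18, 14, 14]) torch.Size([1, 36, 14, 14])
```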
Training of the region proposal network: a binary label is assigned to each candidate region indicating whether it is a vehicle object. A positive label is assigned to two kinds of candidate regions: (i) the candidate region having the highest intersection-over-union (IoU) overlap with some ground-truth (GT) box; (ii) candidate regions whose IoU overlap with any GT box is greater than 0.7. A negative label is assigned to candidate regions whose IoU with every GT box is below 0.3. Candidate regions that are neither positive nor negative are discarded. The specific algorithm is as follows (a code sketch follows these steps):
STEP31: read each image in the training set in sequence;
STEP32: for each calibrated ground-truth region, mark the candidate region with the largest overlap ratio as a foreground sample;
STEP33: for the candidate regions remaining after STEP32, if the IoU overlap with some ground truth is greater than 0.7, mark it as a foreground sample; if its overlap with every ground truth is less than 0.3, mark it as a background sample;
STEP34: discard the candidate regions remaining after STEP32 and STEP33;
STEP35: discard candidate regions that cross the image boundary.
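A minimal sketch of this labeling rule (NumPy; the box format and helper names are illustrative, the thresholds follow the text above):

```python
import numpy as np

def iou(box, gts):
    """IoU between one box and an array of ground-truth boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], gts[:, 0]); y1 = np.maximum(box[1], gts[:, 1])
    x2 = np.minimum(box[2], gts[:, 2]); y2 = np.minimum(box[3], gts[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_gts = (gts[:, 2] - gts[:, 0]) * (gts[:, 3] - gts[:, 1])
    return inter / (area_box + area_gts - inter)

def assign_labels(candidates, gts, hi=0.7, lo=0.3):
    """Return +1 (foreground), 0 (background) or -1 (discarded) per candidate region."""
    overlaps = np.stack([iou(c, gts) for c in candidates])     # (num_cand, num_gt)
    labels = -np.ones(len(candidates), dtype=int)
    labels[overlaps.max(axis=1) < lo] = 0                      # background (STEP33)
    labels[overlaps.max(axis=1) > hi] = 1                      # clear foreground (STEP33)
    labels[overlaps.argmax(axis=0)] = 1                        # best candidate per GT (STEP32)
    return labels

cands = np.array([[0, 0, 10, 10], [1, 1, 9, 9], [50, 50, 60, 60]], float)
gts = np.array([[0, 0, 10, 10]], float)
print(assign_labels(cands, gts))    # [ 1 -1  0]
```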
In order to perform candidate-region screening and position refinement automatically, an objective function is minimized. The cost function for one image is expressed by formula (14),
L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*)   (14)
In the formula, i is the index of a candidate region within one mini-batch, N_cls is the normalization coefficient of the classification layer, N_reg is the normalization coefficient of the window regression layer, λ is a balance weight, p_i is the predicted probability of a vehicle target, p_i* is the GT label (p_i* = 1 if the candidate region is positive, p_i* = 0 if it is negative), t_i is a vector representing the 4 parameterized coordinates of the predicted bounding box, t_i* is the coordinate vector of the GT box corresponding to a positive candidate region, L_cls is the classification logarithmic cost, L_reg is the regression cost, and L({p_i}, {t_i}) is the total cost;
The classification logarithmic cost L_cls is computed by formula (15),
L_cls(p_i, p_i*) = -log[ p_i* p_i + (1 − p_i*)(1 − p_i) ]   (15)
The window regression cost L_reg is computed by formula (16),
L_reg(t_i, t_i*) = R(t_i − t_i*)   (16)
In the formula, R is the robust cost function, a Smooth L1 error insensitive to outliers, computed with formula (17),
R(x) = 0.5 x² if |x| < 1, and R(x) = |x| − 0.5 otherwise   (17)
The term p_i* L_reg in formula (14) means that only positive candidate regions (p_i* = 1) contribute a regression cost; in the other case (p_i* = 0) there is no regression cost. The outputs of the classification layer and the window regression layer consist of {p_i} and {t_i} respectively; the two terms are normalized by N_cls and N_reg and balanced by the weight λ. Here λ = 10, N_cls = 256 and N_reg = 2400 are chosen, so that the classification term and the window regression term have roughly equal weight;
For position refinement, 4 values are used: center coordinates, width and height, computed as follows,
t_x = (x − x_a)/w_a,  t_y = (y − y_a)/h_a,  t_w = log(w/w_a),  t_h = log(h/h_a)
t_x* = (x* − x_a)/w_a,  t_y* = (y* − y_a)/h_a,  t_w* = log(w*/w_a),  t_h* = log(h*/h_a)   (18)
In the formula, x, y, w, h denote the center coordinates, width and height of the predicted bounding box, x_a, y_a, w_a, h_a denote the center coordinates, width and height of the candidate region (anchor), and x*, y*, w*, h* denote the center coordinates, width and height of the ground-truth box; position refinement is carried out with the result of formula (18). In fact no candidate window is ever explicitly extracted: the region proposal network itself completes the judgment and position refinement.
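A minimal NumPy sketch of formulas (14)-(17) under the conventions above (variable names and the toy batch are illustrative, not from the patent):

```python
import numpy as np

def smooth_l1(x):
    """Robust cost R of formula (17), applied element-wise."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)

def rpn_loss(p, p_star, t, t_star, lam=10.0, n_cls=256, n_reg=2400):
    """Total cost of formula (14): binary log loss over all candidates plus
    a smooth-L1 regression cost over positive candidates only."""
    cls_cost = -np.log(p_star * p + (1 - p_star) * (1 - p))            # formula (15)
    reg_cost = smooth_l1(t - t_star).sum(axis=1)                       # formula (16)
    return cls_cost.sum() / n_cls + lam * (p_star * reg_cost).sum() / n_reg

p      = np.array([0.9, 0.2])          # predicted vehicle probabilities
p_star = np.array([1.0, 0.0])          # GT labels (positive / negative)
t      = np.array([[0.1, 0.0, 0.2, 0.1], [0.0, 0.0, 0.0, 0.0]])
t_star = np.array([[0.0, 0.0, 0.1, 0.1], [0.0, 0.0, 0.0, 0.0]])
print(rpn_loss(p, p_star, t, t_star))
```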
For the VGG network, the method of building the multi-layer neural network on unlabeled vehicle image data is divided into two steps: first, training one layer of the network at a time; second, tuning, so that the high-level representation r generated upward from the original representation X and the X' generated downward from the high-level representation r are as consistent as possible;
In the forward propagation process of the convolutional neural network, the output of the previous layer is the input of the current layer and is passed on layer by layer through activation functions, so the actual computed output of the whole network is expressed by formula (4),
O_p = F_n(...(F_2(F_1(X W_1) W_2)...) W_n)   (4)
In the formula, X denotes the original input, F_l denotes the activation function of layer l, W_l denotes the mapping weight matrix of layer l, and O_p denotes the actual computed output of the whole network;
The output of the current layer is expressed by formula (5),
X_l = f_l(W_l X_{l-1} + b_l)   (5)
In the formula, l denotes the layer index, X_l denotes the output of the current layer, X_{l-1} denotes the output of the previous layer, i.e. the input of the current layer, W_l denotes the trained mapping weight matrix of the current network layer, b_l is the additive bias of the current network layer, and f_l is the activation function of the current network layer. The activation function f_l used is the rectified linear unit (ReLU), expressed by formula (6),
f_l(x) = max(0, x)   (6)
In the formula, l denotes the layer index, W_l denotes the trained mapping weight matrix of the current network layer, and f_l is the activation function of the current network layer; its effect is to set the result to 0 if the convolution result is less than 0, and otherwise to keep the value unchanged.
The first 5 layers of the VGG network form a typical deep convolutional neural network whose training is a back-propagation process: the error function is back-propagated and the convolution parameters and biases are optimized with stochastic gradient descent until the network converges or the maximum number of iterations is reached;
Back-propagation compares against labeled training samples using a squared-error cost function. For multi-class recognition with c classes and N training samples, the final output error of the network is computed with formula (7),
E_N = (1/2) Σ_{n=1}^{N} Σ_{k=1}^{c} (t_k^n − y_k^n)²   (7)
In the formula, E_N is the squared-error cost function, t_k^n is the k-th dimension of the label corresponding to the n-th sample, and y_k^n is the k-th output of the network prediction for the n-th sample;
When back-propagating the error function, a calculation similar to the traditional BP algorithm is used, as shown in formula (8),
δ^l = (W^{l+1})^T δ^{l+1} ∘ f'(u^l)   (8)
In the formula, δ^l denotes the error term of the current layer, δ^{l+1} denotes the error term of the next (higher) layer, W^{l+1} is the mapping matrix of the next layer, f' denotes the derivative of the activation function, u^l denotes the output of the current layer before the activation function, x^{l-1} denotes the input from the layer below, and W^l is the mapping weight matrix of this layer;
After error back-propagation, the error term δ^l of each network layer is obtained, the network weights W^l are then modified with stochastic gradient descent, and the next iteration is carried out until the network reaches the convergence condition. When propagating the error, the up-sampling in formula (8) is first applied so that the two adjacent layers have the same size, and then the error is propagated;
The algorithm idea is: 1) first build the neurons of a single layer, so that each time only one single-layer network is trained; 2) after all layers have been trained, carry out tuning with the wake-sleep algorithm.
The deep learning training process is as follows:
STEP21: use bottom-up unsupervised learning, i.e. train layer by layer from the bottom toward the top, to learn vehicle image features. First train the first layer with unlabeled vehicle image data, learning the parameters of the first layer; because of the limits of model capacity and sparsity constraints, the obtained model learns the structure of the data itself and thus obtains features with more representational power than the input. After layer l-1 has been learned, the output of layer l-1 is used as the input of layer l and layer l is trained, thereby obtaining the parameters of every layer; the specific computation is as in formulas (5) and (6);
STEP22: top-down supervised learning, i.e. training with labeled vehicle image data, with the error transmitted from the top downward to fine-tune the network; the specific computation is as in formulas (7) and (8);
Based on the layer parameters obtained in STEP21, this step further fine-tunes the parameters of the whole multi-layer model and is a supervised training process. STEP21 is analogous to the random initialization of a neural network, except that in deep learning STEP21 is not random initialization but is obtained by learning the structure of the input data; this initial value is therefore closer to the global optimum, and better results can be obtained.
Initialization of the first 5 layers of the VGG network is divided into 5 steps: data preparation, computing the image mean, network definition, training, and restoring data (step 2 is illustrated by the sketch after these steps);
1) Data preparation: image data of all kinds of vehicles is collected with crawler software, giving roughly labeled vehicle image data used as training image data; another kind of data is the vehicle image data obtained from checkpoint (bayonet) cameras;
2) Computing the image mean: the model needs to subtract the mean from every picture;
3) Network definition: mainly defining the xml label path, the picture path, and the paths where the train.txt, val.txt, test.txt and trainval.txt files are stored;
4) Training: run the training module;
5) Restoring data: delete the layers before relu5, and change the bottom of roi_pool5 to data and rois;
Through the above processing, the model initialization of vehicle model pre-training is completed.
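As an illustration of step 2), the sketch below computes the mean image over a set of normalized 224 × 224 training images (NumPy; the in-memory image list is a stand-in for reading the actual training files, which the patent does not detail):

```python
import numpy as np

def compute_mean_image(images):
    """Average a list of HxWx3 arrays into one mean image,
    which is later subtracted from every training picture."""
    acc = np.zeros(images[0].shape, dtype=np.float64)
    for img in images:
        acc += img
    return acc / len(images)

# Stand-in for the crawled / checkpoint-camera training set.
rng = np.random.default_rng(0)
train_images = [rng.integers(0, 256, size=(224, 224, 3)) for _ in range(10)]
mean_image = compute_mean_image(train_images)
centered = train_images[0] - mean_image          # what the network actually sees
print(mean_image.shape, centered.mean())
```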
The region proposal network reuses the low-level feature extraction results of the first 5 layers of the VGG network, i.e. the two networks share the first 5 layers of low-level features of the VGG network, and the shared features must be optimized by alternating optimization. The specific algorithm is as follows (the freeze/fine-tune pattern is sketched after these steps):
STEP41: train the region proposal network; the model is initialized with the data preparation, image-mean computation, network definition, training and data-restoration steps described above, and fine-tuned end-to-end for the region proposal task;
STEP42: using the proposal boxes generated by the STEP41 region proposal network, train a separate Fast R-CNN detection network; this detection network is likewise initialized with the vehicle-model pre-trained model, and at this point the two networks do not yet share convolutional layers;
STEP43: use the detection network to initialize training of the region proposal network, fix the shared convolutional layers, and fine-tune only the layers exclusive to the region proposal network; at this point the two networks share convolutional layers;
STEP44: keep the shared convolutional layers fixed and fine-tune the classification layers of the Fast R-CNN; in this way the two networks share the same convolutional layers and finally form a unified network.
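The sketch below illustrates only the freeze/fine-tune pattern of STEP41-STEP44 (PyTorch; the tiny stand-in modules and optimizer setup are illustrative placeholders, not the patent's actual networks or training loop):

```python
import torch.nn as nn
from torch.optim import SGD

# Stand-in modules: a shared backbone plus RPN and detector heads.
shared_conv = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU())
rpn_head = nn.Conv2d(128, 6 * 9, 1)        # 2k scores + 4k deltas, k = 9 (combined here)
detector_head = nn.Linear(128, 4096)

def trainable(*modules):
    return [p for m in modules for p in m.parameters() if p.requires_grad]

def freeze(module, frozen=True):
    for p in module.parameters():
        p.requires_grad = not frozen

# STEP41: train the RPN (shared conv + RPN head) from the pre-trained model.
opt1 = SGD(trainable(shared_conv, rpn_head), lr=1e-3)
# STEP42: train a separate detection network with the proposals from STEP41.
opt2 = SGD(trainable(shared_conv, detector_head), lr=1e-3)
# STEP43: fix the shared conv layers, fine-tune only the RPN-specific layers.
freeze(shared_conv)
opt3 = SGD(trainable(rpn_head), lr=1e-3)
# STEP44: shared conv stays fixed, fine-tune only the detector's classification layers.
opt4 = SGD(trainable(detector_head), lr=1e-3)
print(len(trainable(rpn_head)), len(trainable(detector_head)))
```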
In the present invention, the main flow of vehicle model visual recognition is as follows (STEP54-STEP56 are illustrated by the sketch after these steps):
STEP51: read the image to be recognized;
STEP52: normalize the image to be recognized to 224 × 224 image data in the three RGB color channels;
STEP53: input the normalized 224 × 224 three-channel RGB image data to the three-channel CNN, and obtain vehicle model feature image data through 5 layers of convolution processing;
STEP54: from the proposal boxes generated on the vehicle feature image data by the region proposal network, choose the proposal box with the highest score, i.e. one region of interest (RoI); this RoI is processed by the max pooling after layer 5 to obtain a 6 × 6 × 256 RoI grid map;
STEP55: pass the RoI grid map through the two fully connected layers to obtain a 4096-dimensional feature vector, used as the input data of the softmax classifier;
STEP56: apply softmax classification regression analysis to the 4096-dimensional feature vector to obtain the vehicle model recognition result, i.e. which model the vehicle in the image under test belongs to.
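A compact sketch of STEP54-STEP56 (NumPy; the proposal scores, pooled feature and classifier weights are random stand-ins for the real network outputs):

```python
import numpy as np

rng = np.random.default_rng(0)

# STEP54: pick the highest-scoring proposal box as the region of interest.
proposal_scores = rng.random(300)                 # scores from the region proposal network
proposal_boxes = rng.random((300, 4))
roi = proposal_boxes[np.argmax(proposal_scores)]

# STEP55: the RoI is pooled and passed through the fully connected layers,
# yielding a 4096-d feature vector (random stand-in here).
fc7_feature = rng.normal(size=4096)

# STEP56: softmax over the class scores gives the vehicle model.
W_cls = rng.normal(size=(1000, 4096)) * 0.01      # 1000 vehicle sub-classes (illustrative)
scores = W_cls @ fc7_feature
probs = np.exp(scores - scores.max()); probs /= probs.sum()
print("RoI:", roi, "predicted sub-class:", int(np.argmax(probs)))
```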
The beneficial effects of the present invention are mainly:
1) a vehicle model recognition method based on a Faster R-CNN deep neural network is provided;
2) a deep learning method is developed that realizes unsupervised vehicle feature extraction;
3) a classifier suitable for thousands of vehicle model sub-classes is designed, with scalability;
4) a truly end-to-end vehicle detection and recognition framework using a single Faster R-CNN network is realized, capable of fast, high-precision and robust vehicle sub-class recognition in environments with diverse vehicle shapes, illumination variations and backgrounds.
Description of the drawings
Fig. 1 is the flow of the edge-information candidate-box (Edge Boxes) detection algorithm;
Fig. 2 is the processing content of the region proposal network;
Fig. 3 is a schematic diagram of sliding windows with 3 scales and 3 aspect ratios;
Fig. 4 is an overview diagram of the region proposal network;
Fig. 5 is an explanatory diagram of region proposal box generation;
Fig. 6 is an explanatory diagram of the shared network within the Faster R-CNN network;
Fig. 7 shows candidate regions obtained for vehicles on a real road after processing by the region proposal network;
Fig. 8 is a schematic diagram of the region proposal network itself completing judgment and position refinement with the cost function;
Fig. 9 is a flow diagram of CNN-based vehicle model recognition;
Fig. 10 is a descriptive diagram of the wake-sleep algorithm;
Fig. 11 is a flow diagram of vehicle model recognition with the VGG-model CNN;
Fig. 12 is the training process diagram of end-to-end vehicle detection and recognition with the Faster R-CNN network;
Fig. 13 is an overview of the end-to-end vehicle detection and recognition process with the Faster R-CNN network;
Fig. 14 is the flow chart of end-to-end vehicle detection and recognition with the Faster R-CNN network.
Specific embodiment
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings.
Embodiment 1
Referring to Figs. 1-14, the technical solution adopted by the present invention to solve its technical problems is:
(1) Designing a fast visual segmentation algorithm for vehicle objects;
First, a fast visual segmentation algorithm for vehicle objects is designed, i.e. region selection and localization of the vehicle object;
The position of the vehicle target must be located. Since the vehicle target may appear at any position in the image, and its size and aspect ratio are also unknown, the original approach was a sliding-window strategy that traverses the whole image and requires different scales and aspect ratios. Although this exhaustive strategy covers all positions where the target may appear, its disadvantages are obvious: the time complexity is too high and too many redundant windows are produced, which also seriously affects the speed and performance of subsequent feature extraction and classification;
To address the problems of sliding windows, the invention adopts a candidate-region solution: the positions where vehicle targets are likely to appear are found in advance. Because candidate regions exploit information such as texture, edges and color in the image, a high recall rate can be kept while selecting fewer windows. This effectively reduces the time complexity of subsequent operations, and the candidate windows obtained are of higher quality than sliding windows. Available algorithms include Selective Search and edge-information candidate boxes (Edge Boxes). The core of these algorithms is to exploit the human visual habit of taking in the whole picture at a glance and directly finding the rough position of the vehicle target in the whole image. Since the Selective Search algorithm is time-consuming, it is not suitable for online vehicle recognition and detection; the present invention uses the Edge Boxes detection algorithm.
The idea of the Edge Boxes detection algorithm is: using edge information, determine the number of contours inside a candidate box and the number of contours overlapping the border of the candidate box, score the candidate box on this basis, and then determine the candidate-region information consisting of size, aspect ratio and position according to the ranking of the scores. The detection flow of the Edge Boxes algorithm is shown in Fig. 1; the algorithm steps are as follows (formulas (1) and (3) are illustrated by the sketch after these steps):
STEP11: process the original image with the structured-forest edge detection algorithm to obtain an edge image, then further process the edge image with a non-maximum suppression algorithm to obtain a relatively sparse edge image;
STEP12: group edge points in the relatively sparse edge image that lie almost on a straight line into edge groups; the specific way is to keep collecting 8-connected edge points until the sum of the orientation-angle differences between pairs of edge points exceeds π/2, thereby obtaining N edge groups s_i ∈ S;
STEP13: compute the similarity between two edge groups with formula (1),
a(s_i, s_j) = |cos(θ_i − θ_ij) cos(θ_j − θ_ij)|^γ   (1)
In the formula, θ_i and θ_j are the mean orientations of the two edge groups, s_i and s_j denote the two edge groups, θ_ij is the angle between the mean positions x_i and x_j of the two edge groups, γ is the similarity sensitivity coefficient, set here to γ = 2, and a(s_i, s_j) is the similarity between the two edge groups. To improve computational efficiency, only edge-group pairs whose similarity a(s_i, s_j) exceeds the threshold T_s ≥ 0.05 are stored, and the rest are set to zero;
STEP14: assign a weight to each edge group; the weight is computed by formula (2),
w_b(s_i) = 1 − max_T ∏_{j=1}^{|T|−1} a(t_j, t_{j+1})   (2)
In the formula, T is a path (an ordered set of edge groups) starting from an edge group on the border of the candidate box and reaching s_i, w_b(s_i) is the weight of edge group s_i, and t_j is the j-th edge group on the path T; if no path is found, w_b(s_i) is set to 1;
STEP15: compute the score of the candidate box with formula (3),
h_b = ( Σ_i w_b(s_i) m_i ) / ( 2 (b_w + b_h)^k )   (3)
In the formula, m_i is the sum of the magnitudes m_p of all edge points p in edge group s_i, w_b(s_i) is the weight of edge group s_i, b_w and b_h are the width and height of the candidate box, and k is the size coefficient, here defined as k = 1.5. The score is computed from the edges inside the window, the scores are finally ranked, and low-scoring candidate boxes are filtered out. Since the present invention is mainly applied to checkpoint (bayonet) vehicle extraction, the highest-scoring candidate box is selected as the foreground image of the vehicle object under test;
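A minimal NumPy sketch of formulas (1) and (3) (the edge-group data is synthetic; the structured-forest edge detection and the path search of STEP14 are outside the scope of this sketch, so the weights w_b are taken as given):

```python
import numpy as np

def affinity(theta_i, theta_j, pos_i, pos_j, gamma=2.0):
    """Similarity between two edge groups, formula (1)."""
    theta_ij = np.arctan2(pos_j[1] - pos_i[1], pos_j[0] - pos_i[0])
    return abs(np.cos(theta_i - theta_ij) * np.cos(theta_j - theta_ij)) ** gamma

def box_score(weights, magnitudes, box_w, box_h, kappa=1.5):
    """Candidate-box score, formula (3): weighted edge magnitude inside the box,
    normalized by the box perimeter raised to the size coefficient."""
    return np.sum(weights * magnitudes) / (2.0 * (box_w + box_h) ** kappa)

# Synthetic edge groups: orientations, mean positions, summed magnitudes, weights.
thetas = np.array([0.1, 0.2, 1.4])
positions = np.array([[10.0, 10.0], [20.0, 12.0], [15.0, 40.0]])
m = np.array([3.0, 2.5, 4.0])
w = np.array([1.0, 0.8, 0.1])          # as produced by STEP14 in the full algorithm

print(affinity(thetas[0], thetas[1], positions[0], positions[1]))
print(box_score(w, m, box_w=64, box_h=48))
```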
(2) Developing a deep learning method that realizes unsupervised vehicle feature extraction;
Because of the morphological diversity of vehicle targets, the diversity of illumination variation and the diversity of backgrounds, designing a robust feature is not easy, yet the quality of the extracted features directly affects the accuracy of classification;
Vehicle feature extraction that is robust and satisfies the above three kinds of diversity must be realized by unsupervised deep learning, and layer-by-layer initialization is a very useful solution. The essence of deep learning is to learn more useful features by building machine learning models with many hidden layers and massive training data, thereby ultimately improving classification or prediction accuracy. Therefore deep learning is realized in the present invention through two points: 1) the depth of the model structure, with 5 to 10 hidden layers; 2) layer-by-layer feature transformation, transforming the sample's feature representation in the original space into a new feature space, which makes classification or prediction easier;
The present invention proposes a method of building a multi-layer neural network on unsupervised data, i.e. unlabeled vehicle image data. Briefly it is divided into two steps: first, training one layer of the network at a time; second, tuning, so that the high-level representation r generated upward from the original representation X and the X' generated downward from the high-level representation r are as consistent as possible;
In the forward propagation process of the convolutional neural network, the output of the previous layer is the input of the current layer and is passed on layer by layer through activation functions, so the actual computed output of the whole network is expressed by formula (4),
O_p = F_n(...(F_2(F_1(X W_1) W_2)...) W_n)   (4)
In the formula, X denotes the original input, F_l denotes the activation function of layer l, W_l denotes the mapping weight matrix of layer l, l = 1, 2, 3, ... denotes the layer index, and O_p denotes the actual computed output of the whole network;
The output of the current layer is expressed by formula (5),
X_l = f_l(W_l X_{l-1} + b_l)   (5)
In the formula, l denotes the layer index, X_l denotes the output of the current layer, X_{l-1} denotes the output of the previous layer, i.e. the input of the current layer, W_l denotes the trained mapping weight matrix of the current network layer, b_l is the additive bias of the current network layer, and f_l is the activation function of the current network layer. The activation function f_l used in the present invention is the rectified linear unit (ReLU), expressed by formula (6),
f_l(x) = max(0, x)   (6)
In the formula, l denotes the layer index, W_l denotes the trained mapping weight matrix of the current network layer, and f_l is the activation function of the current network layer; its effect is to set the result to 0 if the convolution result is less than 0, and otherwise to keep the value unchanged;
The CNN further reduces the number of network parameters using local receptive fields and weight sharing. A local receptive field means that each kind of convolution kernel is connected only to a specific region of the image, i.e. each kernel convolves one part of the image, and these local convolution features are then linked together in later layers; this both respects the spatial correlation of image pixels and reduces the number of convolution parameters. Weight sharing makes the weights of each kind of convolution kernel identical, and multi-faceted regional features of the image are extracted by increasing the number of kernel types; in order to provide more detailed features for the classification of vehicle model sub-classes, the number of convolution-kernel types is appropriately increased in the present invention;
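A naive NumPy sketch of formulas (5) and (6) for one convolutional layer, also showing how a single kernel's weights are shared over every position of the input (kernel values and input size are illustrative):

```python
import numpy as np

def relu(x):
    """Formula (6): set negative convolution results to 0, keep the rest unchanged."""
    return np.maximum(0.0, x)

def conv_layer_forward(x_prev, kernels, biases):
    """Formula (5): X_l = f_l(W_l * X_{l-1} + b_l) with a stride-1, 'valid' convolution.
    Each kernel is shared over every position of the input (weight sharing)."""
    h, w = x_prev.shape
    kh, kw = kernels.shape[1:]
    out = np.zeros((len(kernels), h - kh + 1, w - kw + 1))
    for c, (k, b) in enumerate(zip(kernels, biases)):      # one feature map per kernel type
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):                  # local receptive field at (i, j)
                out[c, i, j] = np.sum(x_prev[i:i + kh, j:j + kw] * k) + b
    return relu(out)

x = np.random.default_rng(0).normal(size=(8, 8))
kernels = np.stack([np.ones((3, 3)) / 9.0, np.eye(3) / 3.0])   # two kernel types
print(conv_layer_forward(x, kernels, biases=np.array([0.0, 0.1])).shape)  # (2, 6, 6)
```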
Training the convolutional neural network is a back-propagation process similar to the traditional BP algorithm: the error function is back-propagated and the convolution parameters and biases are optimized with stochastic gradient descent until the network converges or the maximum number of iterations is reached.
Back-propagation computes the error by comparison with labeled training samples. For example, using a squared-error cost function for multi-class recognition with c classes and N training samples, the final output error of the network is expressed by formula (7),
E_N = (1/2) Σ_{n=1}^{N} Σ_{k=1}^{c} (t_k^n − y_k^n)²   (7)
In the formula, E_N is the squared-error cost function, t_k^n is the k-th dimension of the label corresponding to the n-th sample, and y_k^n is the k-th output of the network prediction for the n-th sample;
When back-propagating the error function, a calculation similar to the traditional BP algorithm is used, as shown in formula (8),
δ^l = (W^{l+1})^T δ^{l+1} ∘ f'(u^l)   (8)
In the formula, δ^l denotes the error term of the current layer, δ^{l+1} denotes the error term of the next (higher) layer, W^{l+1} is the mapping matrix of the next layer, f' denotes the derivative of the activation function, u^l denotes the output of the current layer before the activation function, x^{l-1} denotes the input from the layer below, and W^l is the mapping weight matrix of this layer;
After error back-propagation, the error term δ^l of each network layer is obtained, the network weights W^l are then modified with stochastic gradient descent, and the next iteration is carried out until the network reaches the convergence condition. It should be noted that, because the sizes of adjacent layers differ, the up-sampling in formula (8) must first be applied so that the two adjacent layers have the same size before the error is propagated;
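A small NumPy sketch of the squared-error cost of formula (7) and the stochastic-gradient-descent weight update, for a single linear layer (a simplification of the full convolutional back-propagation; sizes and learning rate are illustrative):

```python
import numpy as np

def squared_error_cost(targets, outputs):
    """Formula (7): E_N = 1/2 * sum over samples and classes of (t - y)^2."""
    return 0.5 * np.sum((targets - outputs) ** 2)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5)) * 0.1           # weights of one layer, c = 3 classes
x = rng.normal(size=5)                      # input from the previous layer
t = np.array([1.0, 0.0, 0.0])               # one-hot label

for _ in range(100):                        # stochastic gradient descent iterations
    y = W @ x
    delta = y - t                           # error term of the output layer
    W -= 0.05 * np.outer(delta, x)          # gradient of E w.r.t. W is delta * x^T

print(squared_error_cost(t, W @ x))         # cost shrinks toward 0
```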
The algorithm idea is: 1) first build the neurons of a single layer, so that each time only one single-layer network is trained; 2) after all layers have been trained, carry out tuning with the wake-sleep algorithm.
The deep learning training process is as follows:
STEP21: use bottom-up unsupervised learning, i.e. train layer by layer from the bottom toward the top, to learn vehicle image features. First train the first layer with unlabeled vehicle image data, learning the parameters of the first layer; because of the limits of model capacity and sparsity constraints, the obtained model learns the structure of the data itself and thus obtains features with more representational power than the input. After layer l-1 has been learned, the output of layer l-1 is used as the input of layer l and layer l is trained, thereby obtaining the parameters of every layer; the specific computation is as in formulas (5) and (6);
STEP22: top-down supervised learning, i.e. training with labeled vehicle image data, with the error transmitted from the top downward to fine-tune the network; the specific computation is as in formulas (7) and (8);
Based on the layer parameters obtained in STEP21, this step further fine-tunes the parameters of the whole multi-layer model and is a supervised training process. STEP21 is analogous to the random initialization of a neural network, except that in deep learning STEP21 is not random initialization but is obtained by learning the structure of the input data; this initial value is therefore closer to the global optimum, and better results can be obtained. The effectiveness of deep learning is thus largely attributable to the feature learning process of STEP21;
For the labeled vehicle image data set, the present invention collects vehicle images of many models using web crawler technology; the collected vehicle images are manually confirmed and used as the labeled vehicle image data;
About wake-sleep algorithm, see that attached drawing 8, main thought are to pass through in the wake stage to given generation weight Study obtains cognition weight;In the sleep stage to given cognition weight, obtain generating weight by study;
Wake stage, l layers of generation weight gl, it is updated with formula (9),
Δgl=ε sl+1(sl-pl) (9)
In the formula, Δg_l is the update value of the generative weight g_l of layer l, ε is the learning rate, s_{l+1} is the neuron activity of layer l+1, s_l is the neuron activity of layer l, and p_l is the activation probability of the neurons of layer l when driven by the current state;
In the sleep stage, the recognition weight w_l of layer l is updated with formula (10),
Δw_l = ε s_{l-1}(s_l - q_l) (10)
In the formula, Δw_l is the update value of the recognition weight w_l of layer l, ε is the learning rate, s_{l-1} is the neuron activity of layer l-1, s_l is the neuron activity of layer l, and q_l is the activation probability of the neurons of layer l when driven by the current state of the preceding layer through the current recognition weights;
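A minimal numerical sketch of the two update rules (9) and (10) is given below; the learning rate value and the use of a logistic activation are illustrative assumptions, not values specified by the patent.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def wake_sleep_step(s, g, w, lr=0.01):
        """One schematic wake-sleep update following formulas (9) and (10).
        s[l]: activity vector of layer l (l = 0..L); g[l]: generative weights mapping
        layer l+1 down to layer l; w[l]: recognition weights mapping layer l up to layer l+1."""
        L = len(s) - 1
        # wake stage, formula (9): update the generative weights, dg_l = eps * s_{l+1} (s_l - p_l)
        for l in range(L):
            p = sigmoid(g[l] @ s[l + 1])                  # top-down prediction of layer l
            g[l] += lr * np.outer(s[l] - p, s[l + 1])
        # sleep stage, formula (10): update the recognition weights, dw_l = eps * s_{l-1} (s_l - q_l)
        for l in range(1, L + 1):
            q = sigmoid(w[l - 1] @ s[l - 1])              # bottom-up prediction of layer l
            w[l - 1] += lr * np.outer(s[l] - q, s[l - 1])
        return g, w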
(3) Design a classifier suitable for thousands of vehicle model subclasses, with scalability;
The present invention classifies vehicles with a Softmax classifier; when the features are robust, the Softmax classifier gives good classification results, and at the same time this classifier is scalable, so that after a new vehicle model appears the originally trained network features do not need to be retrained, which increases the practicability of the system; the principle of Softmax is to compare the extracted input data features with the feature database, calculate the probability of each classification result, and then output the result with the highest probability;
The learning result of the CNN is used as the input data of the softmax classifier; softmax regression is Logistic regression oriented towards multi-class classification problems, is the generalized form of Logistic regression, and is suitable for mutually exclusive classes. Suppose that for the training set {(x(1), y(1)), ..., (x(m), y(m))} we have y(i) ∈ {1, 2, ..., k}; for a given sample input x, a k-dimensional vector is output whose components give the probability p(y = i | x) of each classification result, and the hypothesis function h(x) is as follows:
θ1, θ2, ..., θk are the parameters of the model, and all the probabilities sum to 1. The cost function after the regularization term is added is as follows:
The partial derivative of the cost function with respect to the l-th parameter of the j-th class is as follows:
Finally, the softmax regression classification is realized by minimizing J(θ).
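The patent's formula images for the hypothesis function, the regularized cost and its gradient are not reproduced in this text; the standard softmax regression forms consistent with the description above are, as a sketch:

    h_\theta(x) = \frac{1}{\sum_{j=1}^{k} e^{\theta_j^{\top} x}}
      \left[ e^{\theta_1^{\top} x}, \; e^{\theta_2^{\top} x}, \; \dots, \; e^{\theta_k^{\top} x} \right]^{\top}

    J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k}
      \mathbf{1}\{y^{(i)} = j\} \log \frac{e^{\theta_j^{\top} x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^{\top} x^{(i)}}}
      + \frac{\lambda}{2} \sum_{i=1}^{k} \sum_{j=0}^{n} \theta_{ij}^{2}

    \nabla_{\theta_j} J(\theta) = -\frac{1}{m} \sum_{i=1}^{m}
      x^{(i)} \left( \mathbf{1}\{y^{(i)} = j\} - p(y^{(i)} = j \mid x^{(i)}; \theta) \right) + \lambda \theta_j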
(4) Design a truly end-to-end vehicle detection and recognition framework implemented with one quick R-CNN network;
The present invention focuses on solving the following three problems:
1) how to design the region suggestion network;
2) how to train the region suggestion network;
3) how to let the region suggestion network and the quick R-CNN network share the feature extraction network;
The region suggestion network content mainly includes: the network structure, feature extraction, the design and training approach of the region suggestion network, candidate regions, window classification and position refinement;
Region suggestion network structure, feature extraction and design of the region suggestion network:
The region suggestion network structure, feature extraction and the design of the region suggestion network are shown in Fig. 11; in order to compute on multiple GPUs, the computation within a layer is divided into several groups, each group being completed by its corresponding GPU, which increases the computation speed; Figure 12 shows a VGG network, i.e. a Visual Geometry Group network, with 8 convolutional layers and 3 fully connected layers, 11 layers in total; the 8 convolutional layers comprise 5 groups of convolutional layers, 2 classification layers that extract image features and 1 classification layer that classifies features; the 3 fully connected layers connect classification layer 6, classification layer 7 and classification layer 8 respectively;
In Figure 11, the normalized 224 × 224 image is fed directly into the network; the first five stages take the basic convolution + ReLU + pooling form, and at the end of the fifth stage p candidate regions are additionally input, each candidate region carrying 1 image index and 4 items of geometric position information; the RoI pooling layer at the end of the fifth stage divides each candidate region uniformly into M × N blocks and performs a max pooling operation on each block; candidate regions of different sizes on the feature map are thereby converted into data of uniform size and fed into the next layer for training and recognition; the techniques of the above first five stages are all proven, mature techniques of convolutional neural networks; the key is how to let these candidate regions reuse the network features of the first five stages of the image;
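For illustration, a minimal RoI max pooling sketch is given below; the 6 × 6 output grid matches the 6 × 6 × 256 RoI grid mentioned later, and the assumption that the RoI is already expressed in integer feature-map coordinates is made only for simplicity.

    import numpy as np

    def roi_max_pool(feature_map, roi, out_h=6, out_w=6):
        """Max-pool one RoI (x1, y1, x2, y2 in feature-map coordinates) to a fixed out_h x out_w grid."""
        c, H, W = feature_map.shape
        x1, y1, x2, y2 = roi
        region = feature_map[:, y1:y2 + 1, x1:x2 + 1]
        rh, rw = region.shape[1], region.shape[2]
        out = np.zeros((c, out_h, out_w), dtype=feature_map.dtype)
        for i in range(out_h):
            for j in range(out_w):
                ys, ye = int(np.floor(i * rh / out_h)), int(np.ceil((i + 1) * rh / out_h))
                xs, xe = int(np.floor(j * rw / out_w)), int(np.ceil((j + 1) * rw / out_w))
                out[:, i, j] = region[:, ys:max(ye, ys + 1), xs:max(xe, xs + 1)].max(axis=(1, 2))
        return out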
The region suggestion network and the quick R-CNN network share the feature extraction network:
Attached drawing 6 describes how the region suggestion network and quick R-CNN share the convolutional layer output; the first five stages in Figure 11 belong to one primitive feature extraction network, after which the region suggestion network generates region suggestion boxes and quick R-CNN detects the image features inside those suggestion boxes; the result is then output to two fully connected layers at the same level, i.e. classification layer 6+ReLU6 and classification layer 7+ReLU7 in Figure 11, obtaining p feature maps containing 4096-dimensional vectors, which are then handed to the classification layer and the window regression layer respectively for processing;
Generating region suggestion boxes:
To generate region suggestion boxes, the present invention slides a small network over the convolutional feature map output by the last shared convolutional layer; this network is fully connected to an n × n spatial window of the input convolutional feature map, as shown in Fig. 5; each sliding window is mapped onto a low-dimensional vector, which in attached drawing 5 is 256-d, and each sliding window of a feature map corresponds to one value; this vector is output to two fully connected layers at the same level, a window regression layer and a classification layer; at each position, the window regression layer outputs the 9 kinds of recommended regions, whose corresponding windows need to be translation and scale invariant, and regresses 4 translation and scaling parameters from the 256-dimensional feature; the classification layer outputs, from the 256-dimensional feature, the probabilities of belonging to the foreground and the background;
For example, at each sliding-window position k region suggestions are predicted simultaneously, so the window regression layer has 4k outputs, i.e. the encoded coordinates of the k suggestion boxes; the classification layer outputs 2k suggestion box scores, i.e. the estimated probability of each suggestion box being a vehicle target or a non-vehicle target;
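The head just described can be sketched as follows; PyTorch is used here purely for illustration, and the input channel count of 512 for the last shared convolutional layer is an assumption rather than a value stated by the patent.

    import torch
    import torch.nn as nn

    class RegionSuggestionHead(nn.Module):
        """n x n (here 3 x 3) sliding window over the shared feature map -> 256-d vector per position,
        then two sibling 1 x 1 convolutions: 2k objectness scores and 4k box regression outputs."""
        def __init__(self, in_channels=512, mid_channels=256, k=9):
            super().__init__()
            self.conv = nn.Conv2d(in_channels, mid_channels, kernel_size=3, padding=1)
            self.cls = nn.Conv2d(mid_channels, 2 * k, kernel_size=1)   # foreground/background per candidate region
            self.reg = nn.Conv2d(mid_channels, 4 * k, kernel_size=1)   # 4 box parameters per candidate region

        def forward(self, feature_map):
            h = torch.relu(self.conv(feature_map))
            return self.cls(h), self.reg(h)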
Candidate regions: the k suggestion boxes are parameterized by k corresponding boxes called candidate regions, i.e. each candidate region is centred on the centre of the current sliding window and corresponds to one scale and one aspect ratio; the present invention uses 3 scales and 3 aspect ratios, as shown in attached drawing 3; each sliding position thus has k = 9 kinds of candidate regions; for a convolutional feature map of size W × H there are W × H × k candidate regions in total;
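A minimal sketch of this candidate-region enumeration is given below; the concrete scale and aspect-ratio values, the feature stride, and the height/width convention are illustrative assumptions, since the patent only states that 3 scales and 3 aspect ratios are used.

    import numpy as np

    def generate_candidate_regions(feat_h, feat_w, stride=16,
                                   scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
        """Enumerate k = len(scales) * len(ratios) candidate boxes (x1, y1, x2, y2) at every position."""
        base = []
        for s in scales:
            for r in ratios:                 # r is taken as height/width here
                h = s * np.sqrt(r)
                w = s / np.sqrt(r)
                base.append([-w / 2, -h / 2, w / 2, h / 2])
        base = np.array(base)                                    # (k, 4) boxes centred at the origin
        ys, xs = np.mgrid[0:feat_h, 0:feat_w]
        centres = np.stack([xs, ys, xs, ys], axis=-1).reshape(-1, 1, 4) * stride
        return (centres + base).reshape(-1, 4)                   # (feat_h * feat_w * k, 4)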
In the whole quick R-CNN algorithm there are three kinds of image scales: 1) the original image scale: the size of the original input image, which is unrestricted and does not affect performance; 2) the normalized scale: the input size of the feature extraction network, set at test time; candidate regions are set on this scale, and this parameter together with the relative size of the candidate regions determines the range of targets to be detected; 3) the network input scale: the input size of the feature detection network, set during training, which is 224 × 224.
In summary, the region suggestion network takes an image as input and outputs a set of rectangular target suggestion boxes, each box having a vehicle object score, as shown in Fig. 7;
Training the region suggestion network mainly involves: the training samples, the cost function and the hyperparameters;
The training of region suggestion network:
To train the region suggestion network, the present invention assigns each candidate region a binary label, i.e. whether or not it is a vehicle object; a positive label is assigned to two classes of candidate regions: (i) the candidate region with the highest intersection-over-union (IoU, Intersection-over-Union) overlap with the enclosing region of some GT, the ground truth; (ii) any candidate region whose IoU overlap with a GT enclosing region is greater than 0.7; at the same time, a negative label is assigned to the candidate regions whose IoU ratio with all GT enclosing regions is below 0.3; candidate regions that are neither positive nor negative are discarded;
Training sample algorithm (a code sketch of the labeling rule follows the steps below):
STEP31: read each image of the training set in sequence;
STEP32: for each calibrated ground-truth region, the candidate region with the largest overlap ratio is marked as a foreground sample;
STEP33: for the candidate regions remaining after STEP32, a region is marked as a foreground sample if it overlaps some calibrated region with an IoU ratio greater than 0.7; it is marked as a background sample if its overlap ratio with every calibrated region is less than 0.3;
STEP34: the candidate regions remaining after STEP32 and STEP33 are discarded;
STEP35: the candidate region across image boundary is discarded.
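The labeling rule of STEP31 to STEP35 can be sketched as follows; the thresholds 0.7 and 0.3 come from the text above, while everything else in the sketch is illustrative.

    import numpy as np

    def iou(box, gts):
        """IoU between one box and an array of ground-truth boxes, all given as (x1, y1, x2, y2)."""
        ix1 = np.maximum(box[0], gts[:, 0]); iy1 = np.maximum(box[1], gts[:, 1])
        ix2 = np.minimum(box[2], gts[:, 2]); iy2 = np.minimum(box[3], gts[:, 3])
        inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
        area_b = (box[2] - box[0]) * (box[3] - box[1])
        area_g = (gts[:, 2] - gts[:, 0]) * (gts[:, 3] - gts[:, 1])
        return inter / (area_b + area_g - inter)

    def label_candidate_regions(boxes, gts, hi=0.7, lo=0.3):
        """1 = foreground, 0 = background, -1 = discarded (neither positive nor negative)."""
        labels = np.full(len(boxes), -1, dtype=int)
        ious = np.array([iou(b, gts) for b in boxes])    # shape (num_boxes, num_gts)
        max_iou = ious.max(axis=1)
        labels[max_iou > hi] = 1                         # STEP33, first clause
        labels[max_iou < lo] = 0                         # STEP33, second clause
        labels[ious.argmax(axis=0)] = 1                  # STEP32: best candidate region for each GT
        return labels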
Cost function:
Following these definitions, the multi-task cost is adopted here and the objective function is minimized; the cost function of an image is expressed by formula (14),
In the formula, i is the index of a candidate region within a batch, N_cls is the normalization coefficient of the classification layer, N_reg is the normalization coefficient of the regression layer, λ is the balance weight, p_i is the predicted probability of a vehicle target, p_i* is the GT label, equal to 1 if the candidate region is positive and 0 if the candidate region is negative, t_i is a vector representing the 4 parameterized coordinates of the predicted bounding box, t_i* is the coordinate vector of the GT bounding box corresponding to a positive candidate region, L_cls is the logarithmic cost of classification, L_reg is the regression logarithmic cost, and L({p_i},{t_i}) is the total logarithmic cost;
The classification logarithmic cost L_cls is calculated by formula (15),
The window regression logarithmic cost L_reg is calculated by formula (16),
In the formula, R is the robust cost function defined as the Smooth L1 error, which is insensitive to outliers and is calculated with formula (17),
In formula (14) the term p_i* L_reg means that only positive candidate regions, i.e. those with p_i* = 1, have a regression cost; in the other cases, since p_i* = 0, there is no regression cost; the outputs of the classification layer and the window regression layer consist of {p_i} and {t_i} respectively; these two terms are normalized by N_cls and N_reg and a balance weight λ; here λ = 10, N_cls = 256 and N_reg = 2400 are chosen, so that the classification term and the window regression term have approximately equal weight;
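The images of formulas (14) to (17) are not reproduced in this text; the standard multi-task loss forms consistent with the definitions above are, as a sketch:

    L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*)
      + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)                     (14)

    L_{cls}(p_i, p_i^*) = -\log\bigl(p_i^* p_i + (1 - p_i^*)(1 - p_i)\bigr)            (15)

    L_{reg}(t_i, t_i^*) = R(t_i - t_i^*)                                               (16)

    R(x) = \begin{cases} 0.5\,x^2 & |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases} (17)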
For position refinement, 4 values are used here: the centre coordinates, the width and the height; the calculation method is as follows,
In the formula, x, y, w, h denote the centre coordinates, width and height of the bounding box, x_a, y_a, w_a, h_a denote the centre coordinates, width and height of the candidate region, and x*, y*, w*, h* denote the centre coordinates, width and height of the predicted bounding box; position refinement is carried out with the result calculated by formula (18); in fact, no candidate window is ever explicitly extracted, and the region suggestion network itself completes the judgement and the position refinement;
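The image of formula (18) is not reproduced in this text; the standard bounding-box parameterization consistent with the variables defined above is, as a sketch:

    t_x = (x - x_a)/w_a, \quad t_y = (y - y_a)/h_a, \quad t_w = \log(w/w_a), \quad t_h = \log(h/h_a)
    t_x^* = (x^* - x_a)/w_a, \quad t_y^* = (y^* - y_a)/h_a, \quad t_w^* = \log(w^*/w_a), \quad t_h^* = \log(h^*/h_a)   (18)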
The optimization of region suggestion network:
The region suggestion network can naturally be implemented as a fully convolutional network and trained end to end by back-propagation and stochastic gradient descent; here the network is trained with an image-centric sampling strategy, in which each batch consists of a single image containing many positive and negative samples; 256 candidate regions are randomly sampled from one image and the cost function of the batch is calculated with formula (14), with the sampled positive and negative candidate regions in a ratio of 1:1; if an image contains fewer than 128 positive samples, the remaining candidate-region slots of the batch are filled with negative samples; the batch size here is set to 256;
All new layers are randomly initialized with weights drawn from a zero-mean Gaussian distribution with standard deviation 0.01; the so-called new layers are the layers that follow the region suggestion network, such as classification layer 6+ReLU6 and classification layer 7+ReLU7 in attached drawing 11; every other layer, i.e. the shared convolutional layers such as the first five layers in attached drawing 11, is initialized by a model pre-trained on vehicle model classification samples; in the present invention the learning rate on the vehicle model data set is 0.001 for the first 60k batches and 0.0001 for the next 20k batches; the momentum is 0.9 and the weight decay is 0.0005;
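The batch sampling described above can be sketched as follows; the 256 batch size, the 1:1 ratio and the negative-sample padding come from the text, while the rest is illustrative.

    import numpy as np

    def sample_batch(labels, batch_size=256, positive_fraction=0.5, rng=np.random):
        """Pick up to batch_size/2 positive candidate regions and fill the rest with negatives."""
        fg = np.flatnonzero(labels == 1)
        bg = np.flatnonzero(labels == 0)
        n_fg = min(int(batch_size * positive_fraction), len(fg))
        n_bg = min(batch_size - n_fg, len(bg))               # pad with negatives when positives < 128
        keep_fg = rng.choice(fg, n_fg, replace=False)
        keep_bg = rng.choice(bg, n_bg, replace=False)
        return np.concatenate([keep_fg, keep_bg])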
The model initialization of vehicle model pre-training is broadly divided into 5 steps: data preparation, calculating the image mean, network definition, training, and restoring data;
1) data preparation; create a new folder myself under data; we have collected image data of all kinds of vehicles with crawler software, and since the keyword-based searches essentially return vehicle image data that already carry labels, we use them as training data; the other class of data is the vehicle image data obtained from bayonet cameras;
The training and test inputs are described with train.txt and val.txt, which list all files and their labels; the class names are an ASCII sequence, i.e. 0-999, and the corresponding model names are mapped to these numbers in synset_words.txt; val.txt cannot carry real labels, so they are all set to 0; the image size is then unified to 256 × 256; next, create the myself folder in caffe-master/examples, copy create_imagenet.sh from caffe-master/examples/imagenet into this folder, rename it create_animal.sh, modify the training and test path settings and run the script; this finally produces myself_train_lmdb and myself_val_lmdb;
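As an illustration of the listing and resizing step only, a small helper is sketched below; the directory layout and the label_map argument are assumptions, not part of the patent.

    import os
    from PIL import Image

    def prepare_listing(image_dir, label_map, out_txt, size=(256, 256)):
        """Write 'filename label' lines (as in train.txt / val.txt) and resize the images to 256 x 256.
        label_map: hypothetical dict mapping a subfolder (one vehicle model) to its class id 0..999."""
        with open(out_txt, "w") as listing:
            for folder, label in sorted(label_map.items()):
                for name in sorted(os.listdir(os.path.join(image_dir, folder))):
                    path = os.path.join(image_dir, folder, name)
                    Image.open(path).convert("RGB").resize(size).save(path)
                    listing.write("%s/%s %d\n" % (folder, name, label))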
2) calculate the image mean; the model requires the mean to be subtracted from every picture, so the training mean must be obtained; it is computed with tools/compute_image_mean.cpp; similarly, copy ./make_imagenet_mean from caffe-master/examples/imagenet into examples/myself, rename it make_car_mean.sh and modify the paths;
3) network definition; copy all files in caffe-master/models/bvlc_reference_caffenet into the caffe-master/examples/myself folder and modify train_val.prototxt, taking care to modify the paths of the data layers;
During training we use a softmax-loss layer to compute the loss function and initialize back-propagation, and during validation we use an accuracy layer to measure the precision; there is also a run protocol, solver.prototxt; copy it and change the net path in the first line to our path: net: "examples/myself/train_val.prototxt",
test_iter: 1000 is the number of test batches; test_interval: 1000 means testing once every 1000 iterations; base_lr: 0.01 is the base learning rate; lr_policy: "step" sets the learning-rate policy; gamma: 0.1 is the learning-rate change ratio; stepsize: 100000 reduces the learning rate every 100000 iterations; display: 20 prints the state every 20 iterations; max_iter: 450000 is the maximum number of iterations; momentum: 0.9 and weight_decay: 0.0005 are learning parameters; snapshot: 10000 saves the state every 10000 iterations; a final line solver_mode: GPU is added, meaning the GPU is used for computation;
4) training; copy train_caffenet.sh from caffe-master/examples/imagenet, rename it train_myself.sh, modify the paths inside and run it;
5) restore data; copy resume_training.sh from caffe-master/examples/imagenet and run it;
The model initialization of vehicle model pre-training is completed by the above processing; further, the present invention proposes a 4-step training algorithm that learns the shared features by alternating optimization; a schematic sketch follows the four steps below;
STEP41: train and optimize the region suggestion network; model initialization is completed with the 5 steps described above (data preparation, calculating the image mean, network definition, training, and restoring data), and the network is fine-tuned end to end for the region suggestion task;
STEP42: using the suggestion boxes generated by the region suggestion network of STEP41, train a separate detection network with quick R-CNN; this detection network is likewise initialized by the vehicle model pre-trained model, and at this point the two networks do not yet share convolutional layers;
STEP43: initialize the region suggestion network training with the detection network, fix the shared convolutional layers and fine-tune only the layers exclusive to the region suggestion network; at this point the two networks share convolutional layers;
STEP44: keep the shared convolutional layers fixed and fine-tune the classification layers of quick R-CNN; in this way the two networks share the same convolutional layers and finally constitute a unified network;
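A schematic sketch of the four alternating steps is given below; every helper function named here is a hypothetical placeholder standing in for the training procedures described above, not code from the patent.

    # 4-step alternating training (STEP41 to STEP44); the helpers are hypothetical placeholders.
    def alternating_training(pretrained_weights, dataset):
        rpn = init_region_suggestion_network(pretrained_weights)     # STEP41: train the suggestion network end to end
        train_region_suggestion_network(rpn, dataset)
        proposals = rpn.generate_suggestion_boxes(dataset)
        detector = init_quick_rcnn(pretrained_weights)               # STEP42: separate detector, no sharing yet
        train_quick_rcnn(detector, proposals, dataset)
        rpn.shared_conv_layers = detector.shared_conv_layers         # STEP43: share and freeze the conv layers,
        train_region_suggestion_network(rpn, dataset,                #         fine-tune only the exclusive layers
                                        freeze_shared=True)
        train_quick_rcnn(detector, rpn.generate_suggestion_boxes(dataset),
                         dataset, freeze_shared=True)                # STEP44: fine-tune the detector's classification layers
        return rpn, detector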
Visual recognition of the vehicle model:
The main flow of vehicle model visual recognition is given below, and the whole processing flow is shown in Fig. 13; a schematic sketch follows the steps below;
STEP51: read the image to be recognized;
STEP52: normalize the image to be recognized, obtaining normalized 224 × 224 image data in the three RGB colour channels;
STEP53: input the normalized 224 × 224 RGB image data into the three-channel CNN, and obtain the vehicle model feature image data after 5 layers of convolution processing;
STEP54: apply the suggestion boxes generated by the region suggestion network to the vehicle model feature image data and choose the suggestion box with the highest score, thereby obtaining one region of interest, RoI; this RoI is processed by the max pooling of layer 5 to obtain a 6 × 6 × 256 RoI grid;
STEP55: pass the RoI grid through the two fully connected layers at the same level to obtain a 4096-dimensional feature vector, which is the input data of the softmax classifier;
STEP56: apply the softmax classification regression analysis to the 4096-dimensional feature vector to obtain the vehicle model recognition result; in the present invention the vehicle models are classified into 1000 types, so that the model to which the vehicle in the image under detection belongs is identified.
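The recognition flow of STEP51 to STEP56 can be sketched as follows; all helper names are hypothetical placeholders standing in for the trained networks described above.

    # Schematic sketch of the recognition flow; the helpers are hypothetical placeholders.
    def recognize_vehicle_model(image_path):
        img = read_image(image_path)                           # STEP51
        img = normalize_to_224x224_rgb(img)                    # STEP52: three colour channels
        feature_map = shared_convolutional_layers(img)         # STEP53: 5 stages of convolution
        boxes, scores = region_suggestion_network(feature_map) # STEP54: suggestion boxes with scores
        roi = boxes[scores.argmax()]
        pooled = roi_max_pool(feature_map, roi)                # 6 x 6 x 256 RoI grid
        feature_4096 = fully_connected_layers(pooled)          # STEP55
        return softmax_classifier(feature_4096)                # STEP56: one of 1000 vehicle models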
Embodiment 2
The visual recognition technology of the invention is universal and is suitable for subclass recognition of other objects: as long as the training data are run and learned in the system developed by the present invention, the subclass recognition task for that class of objects can be accomplished once the features of that class have been obtained.
Embodiment 3
The visual recognition technology of the invention is scalable: after a new subclass appears, the originally trained network features do not need to be retrained; it is sufficient to train and learn the new subclass and to extend the classification data of the softmax classifier in the system.
The foregoing is merely a preferred embodiment of the invention and is not intended to limit the invention; any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall be included in the protection scope of the present invention.

Claims (9)

1. A model recognition method based on a quick R-CNN deep neural network, characterized by comprising: a VGG network for deep learning and training recognition, a region suggestion network for extracting regions of interest, and a Softmax classifier for vehicle classification;
the VGG network comprises 8 convolutional layers and 3 fully connected layers, 11 layers in total; the 8 convolutional layers comprise 5 groups of convolutional layers, 2 classification layers that extract image features and 1 classification layer that classifies features; the 3 fully connected layers connect classification layer 6, classification layer 7 and classification layer 8 respectively;
the region suggestion network comprises 1 classification layer, 1 window regression layer, 1 module for calculating the classification loss and 1 module for calculating the window regression loss, and outputs p suggestion boxes of interest;
the Softmax classifier compares the extracted input data features with the feature database data obtained by learning and training, calculates the probability of each classification result, and then outputs the result with the highest probability;
the quick R-CNN deep neural network connects the region suggestion network at the end of the 5th layer of the VGG network, so that the region suggestion network shares the low-level feature extraction process and result of the first 5 layers of the VGG network;
the 6th and 7th layers of the VGG network perform convolution and ReLU processing on the image features inside the p suggestion boxes of interest output by the region suggestion network, obtaining p feature maps containing 4096-dimensional vectors, which are then handed to the classification layer and the window regression layer respectively for processing, realizing the segmentation of the vehicle image; the Softmax classifier performs classification and recognition on the p feature maps containing 4096-dimensional vectors, obtaining the classification result of the vehicle model.
2. The model recognition method according to claim 1, characterized in that: the region suggestion network, in order to generate region suggestion boxes, is connected at the end of the 5th layer of the VGG network, i.e. a small network is slid over the convolutional feature map output by the 5th convolutional layer, this network being fully connected to an n × n spatial window of the input convolutional feature map; each sliding window is mapped onto a low-dimensional vector, the low-dimensional vector being 256-d, with each sliding window of a feature map corresponding to one value; this vector is output to two fully connected layers at the same level, a window regression layer and a classification layer; at each position the window regression layer outputs the 9 kinds of recommended regions, whose corresponding windows are translation and scale invariant, and regresses 4 translation and scaling parameters from the 256-dimensional feature, giving 4k outputs, i.e. the encoded coordinates of the k suggestion boxes; the classification layer outputs, from the 256-dimensional feature, the probability of belonging to the foreground and the background, giving 2k suggestion box scores, i.e. the estimated probability of each suggestion box being a vehicle target or a non-vehicle target.
3. The model recognition method according to claim 1 or 2, characterized in that: in training the region suggestion network, each candidate region is assigned a binary label, i.e. whether or not it is a vehicle object; here a positive label is assigned to two classes of candidate regions:
(i) the candidate region with the highest intersection-over-union (IoU) overlap with the enclosing region of some ground truth GT;
(ii) any candidate region whose intersection-over-union IoU with a ground truth GT enclosing region is greater than 0.7; at the same time, a negative label is assigned to the candidate regions whose IoU ratio with all GT enclosing regions is below 0.3;
candidate regions that are neither positive nor negative are discarded; the specific algorithm is as follows:
STEP31: read each image of the training set in sequence;
STEP32: for each calibrated ground-truth region, the candidate region with the largest overlap ratio is marked as a foreground sample;
STEP33: for the candidate regions remaining after STEP32, a region is marked as a foreground sample if it overlaps some calibrated region with an IoU ratio greater than 0.7; it is marked as a background sample if its overlap ratio with every calibrated region is less than 0.3;
STEP34: the candidate regions remaining after STEP32 and STEP33 are discarded;
STEP35: the candidate region across image boundary is discarded.
4. The model recognition method according to claim 3, characterized in that: in order to automatically screen the candidate regions and refine their positions, the objective function is minimized here; the cost function of one image is expressed by formula (14),
in the formula, i is the index of a candidate region within a batch, N_cls is the normalization coefficient of the classification layer, N_reg is the normalization coefficient of the window regression layer, λ is the balance weight, p_i is the predicted probability of a vehicle target, p_i* is the GT label, equal to 1 if the candidate region is positive and 0 if the candidate region is negative, t_i is a vector representing the 4 parameterized coordinates of the predicted bounding box, t_i* is the coordinate vector of the GT bounding box corresponding to a positive candidate region, L_cls is the logarithmic cost of classification, L_reg is the regression logarithmic cost, and L({p_i},{t_i}) is the total logarithmic cost;
the classification logarithmic cost L_cls is calculated by formula (15),
the window regression logarithmic cost L_reg is calculated by formula (16),
in the formula, R is the robust cost function defined as the Smooth L1 error, which is insensitive to outliers and is calculated with formula (17),
in formula (14) the term p_i* L_reg means that only positive candidate regions, i.e. those with p_i* = 1, have a regression cost; in the other cases, since p_i* = 0, there is no regression cost; the outputs of the classification layer and the window regression layer consist of {p_i} and {t_i} respectively; these two terms are normalized by N_cls and N_reg and a balance weight λ; here λ = 10, N_cls = 256 and N_reg = 2400 are chosen, so that the classification term and the window regression term have equal weight;
for position refinement, 4 values are used here: the centre coordinates, the width and the height; the calculation method is as follows,
in the formula, x, y, w, h denote the centre coordinates, width and height of the bounding box, x_a, y_a, w_a, h_a denote the centre coordinates, width and height of the candidate region, x*, y*, w*, h* denote the centre coordinates, width and height of the predicted bounding box, t_x, t_y, t_w, t_h denote the centre coordinates, width and height of the bounding box after position refinement, and t_x*, t_y*, t_w*, t_h* denote the centre coordinates, width and height of the predicted bounding box after position refinement; the region suggestion network carries out position refinement with the result calculated by formula (18).
5. The model recognition method according to claim 1, characterized in that: for the VGG network, the method of building a multilayer neural network on the labeled vehicle image data is divided into two steps: first, training one layer of the network at a time; second, tuning, so that the high-level representation r generated upwards from the original representation X is consistent with the X' generated downwards from the high-level representation r;
in the forward propagation of the convolutional neural network, the output of the previous layer is the input of the current layer and is transmitted layer by layer through the activation functions, so the actual computed output of the whole network is expressed by formula (4),
Op=Fn(…(F2(F1(XW1)W2)…)Wn) (4)
in the formula, X denotes the original input, F_n denotes the activation function of layer n, W_n denotes the mapping weight matrix of layer n, and O_p denotes the actual computed output of the whole network;
the output of the current layer is expressed by formula (5),
Xl=fl(WlXl-1+bl) (5)
in the formula, l denotes the layer index, X_l denotes the output of the current layer, X_{l-1} denotes the output of the previous layer, i.e. the input of the current layer, W_l denotes the trained mapping weight matrix of the current network layer, b_l is the additive bias of the current network layer, and f_l is the activation function of the current network layer; the activation function f_l used is the rectified linear unit, i.e. ReLU, expressed by formula (6),
in the formula, l denotes the layer index, W_l denotes the trained mapping weight matrix of the current network layer, and f_l is the activation function of the current network layer; its effect is to set the value to 0 if the convolution result is less than 0, and otherwise to keep the value unchanged.
6. The model recognition method according to claim 5, characterized in that: the first 5 layers of the VGG network are a deep convolutional neural network; the training of this neural network is a back-propagation process in which the error function is back-propagated and the stochastic gradient descent method is used to optimize and adjust the convolution parameters and biases, stopping when the network converges or the maximum number of iterations is reached;
back-propagation requires comparison against the labeled training samples and uses the squared-error cost function; for multi-class recognition with c classes and N training samples, the error of the final network output is calculated with the error function in formula (7),
in the formula, E_N is the squared-error cost function, t_k^n is the k-th dimension of the label corresponding to the n-th sample, and y_k^n is the k-th output of the network prediction corresponding to the n-th sample;
when the error function is back-propagated, the calculation is as shown in formula (8),
in the formula, δ_l denotes the error function of the current layer, δ_{l+1} denotes the error function of the layer above, W_{l+1} is the mapping matrix of the layer above, f' denotes the inverse of the activation function, i.e. the up-sampling, u_l denotes the output of the layer above before the activation function, x_{l-1} denotes the input of the next layer, and W_l is the mapping weight matrix of this layer;
after error back-propagation, the error function δ_l of each network layer is obtained; the mapping weight matrix W_l of this layer is then modified with the stochastic gradient descent method and the next iteration is carried out, until the network reaches the convergence condition; when the error is propagated, the up-sampling in formula (8) must first be applied so that the two adjacent layers have the same size, after which the error is propagated;
the idea of the algorithm is: 1) first build the single-layer neurons layer by layer, so that one single-layer network is trained at a time; 2) after all layers have been trained, carry out tuning with the wake-sleep algorithm;
Deep learning training process is specific as follows:
STEP21: use bottom-up unsupervised learning, i.e. train layer by layer from the bottom towards the top, to learn the vehicle image features: first train the first layer with unlabeled vehicle image data, learning the parameters of the first layer; because of the limitation on model capacity and the sparsity constraint, the resulting model learns the structure of the data itself and thus obtains features that are more expressive than the input; after layer l-1 has been learned, the output of layer l-1 is used as the input of layer l to train layer l, so that the parameters of each layer are obtained in turn;
STEP22: top-down supervised learning trains with the labeled vehicle image data; the error is transmitted from top to bottom and the network is fine-tuned.
7. The model recognition method according to claim 1, characterized in that: the model initialization of the first 5 layers of the VGG network is divided into 5 steps: data preparation, calculating the image mean, network definition, training, and restoring data;
1) data preparation; collect image data of all kinds of vehicles, including labeled vehicle image data and vehicle image data obtained by bayonet cameras, and use the labeled vehicle image data as the training image data;
2) calculate the image mean; the mean is subtracted from every picture;
3) network definition; define the xml tag path, the picture path, and the paths where the train.txt, val.txt, test.txt and trainval.txt files are stored;
4) training; run the training module;
5) restore data; delete the layers before ReLU5 and change the bottom of roi_pool5 to data and rois;
the model initialization of vehicle model pre-training is completed by the above processing.
8. The model recognition method according to claim 7, characterized in that: the region suggestion network uses the low-level feature extraction result of the first 5 layers of the VGG network, i.e. the two networks share the first 5 layers of low-level features of the VGG network, and the shared features need to be learned and optimized by alternating optimization; the specific algorithm is as follows:
STEP41: train and optimize the region suggestion network; model initialization is completed with the 5 steps of data preparation, calculating the image mean, network definition, training and restoring data, and the region suggestion task is fine-tuned end to end;
STEP42: using the suggestion boxes generated by the region suggestion network of STEP41, train a separate detection network with quick R-CNN; this detection network is likewise initialized by the vehicle model pre-trained model, and at this point the two networks do not yet share convolutional layers;
STEP43: initialize the region suggestion network training with the detection network, fix the shared convolutional layers and fine-tune only the layers exclusive to the region suggestion network; at this point the two networks share convolutional layers;
STEP44: keep the shared convolutional layers fixed and fine-tune the classification layers of quick R-CNN; in this way the two networks share the same convolutional layers and finally constitute a unified network.
9. The model recognition method according to claim 1, characterized in that: the vehicle model visual recognition flow is as follows;
STEP51: read the image to be recognized;
STEP52: normalize the image to be recognized, obtaining normalized image data in the three RGB colour channels;
STEP53: input the normalized RGB image data of the three colour channels into the three-channel CNN, and obtain the vehicle model feature image data after 5 layers of convolution processing;
STEP54: apply the suggestion boxes generated by the region suggestion network to the vehicle model feature image data and choose the suggestion box with the highest score, thereby obtaining one region of interest, RoI; this RoI is processed by the max pooling of layer 5 to obtain an RoI grid;
STEP55: pass the RoI grid through the two fully connected layers at the same level to obtain a 4096-dimensional feature vector, which is the input data of the softmax classifier;
STEP56: apply the softmax classification regression analysis to the 4096-dimensional feature vector to obtain the vehicle model recognition result, identifying the model to which the vehicle in the image under detection belongs.
CN201610563184.1A 2016-07-15 2016-07-15 A kind of model recognizing method based on quick R-CNN deep neural network Active CN106250812B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610563184.1A CN106250812B (en) 2016-07-15 2016-07-15 A kind of model recognizing method based on quick R-CNN deep neural network


Publications (2)

Publication Number Publication Date
CN106250812A CN106250812A (en) 2016-12-21
CN106250812B true CN106250812B (en) 2019-08-20

Family

ID=57613871

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610563184.1A Active CN106250812B (en) 2016-07-15 2016-07-15 A kind of model recognizing method based on quick R-CNN deep neural network

Country Status (1)

Country Link
CN (1) CN106250812B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111540203A (en) * 2020-04-30 2020-08-14 东华大学 Method for adjusting green light passing time based on fast-RCNN

Families Citing this family (196)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106647758A (en) * 2016-12-27 2017-05-10 深圳市盛世智能装备有限公司 Target object detection method and device and automatic guiding vehicle following method
CN106780558B (en) * 2016-12-27 2020-05-12 成都通甲优博科技有限责任公司 Method for generating unmanned aerial vehicle target initial tracking frame based on computer vision point
CN106682697B (en) * 2016-12-29 2020-04-14 华中科技大学 End-to-end object detection method based on convolutional neural network
CN106682696B (en) * 2016-12-29 2019-10-08 华中科技大学 The more example detection networks and its training method refined based on online example classification device
CN106599939A (en) * 2016-12-30 2017-04-26 深圳市唯特视科技有限公司 Real-time target detection method based on region convolutional neural network
CN107636727A (en) * 2016-12-30 2018-01-26 深圳前海达闼云端智能科技有限公司 Target detection method and device
CN106919978B (en) * 2017-01-18 2020-05-15 西南交通大学 Method for identifying and detecting parts of high-speed rail contact net supporting device
US10198655B2 (en) * 2017-01-24 2019-02-05 Ford Global Technologies, Llc Object detection using recurrent neural network and concatenated feature map
CN106845430A (en) * 2017-02-06 2017-06-13 东华大学 Pedestrian detection and tracking based on acceleration region convolutional neural networks
CN108446694B (en) * 2017-02-16 2020-11-27 杭州海康威视数字技术股份有限公司 Target detection method and device
CN106909924B (en) * 2017-02-18 2020-08-28 北京工业大学 Remote sensing image rapid retrieval method based on depth significance
CN106980858B (en) * 2017-02-28 2020-08-18 中国科学院信息工程研究所 Language text detection and positioning system and language text detection and positioning method using same
CN106910176B (en) * 2017-03-02 2019-09-13 中科视拓(北京)科技有限公司 A kind of facial image based on deep learning removes occlusion method
CN106846813A (en) * 2017-03-17 2017-06-13 西安电子科技大学 The method for building urban road vehicle image data base
CN107871136A (en) * 2017-03-22 2018-04-03 中山大学 The image-recognizing method of convolutional neural networks based on openness random pool
CN107016357B (en) * 2017-03-23 2020-06-16 北京工业大学 Video pedestrian detection method based on time domain convolutional neural network
CN107491720A (en) * 2017-04-01 2017-12-19 江苏移动信息系统集成有限公司 A kind of model recognizing method based on modified convolutional neural networks
CN107133616B (en) * 2017-04-02 2020-08-28 南京汇川图像视觉技术有限公司 Segmentation-free character positioning and identifying method based on deep learning
CN107067005A (en) * 2017-04-10 2017-08-18 深圳爱拼信息科技有限公司 A kind of method and device of Sino-British mixing OCR Character segmentations
CN107403424B (en) 2017-04-11 2020-09-18 阿里巴巴集团控股有限公司 Vehicle loss assessment method and device based on image and electronic equipment
CN107392218B (en) * 2017-04-11 2020-08-04 创新先进技术有限公司 Vehicle loss assessment method and device based on image and electronic equipment
CN106971187B (en) * 2017-04-12 2019-07-09 华中科技大学 A kind of vehicle part detection method and system based on vehicle characteristics point
CN107229929A (en) * 2017-04-12 2017-10-03 西安电子科技大学 A kind of license plate locating method based on R CNN
CN107239731B (en) * 2017-04-17 2020-10-30 浙江工业大学 Gesture detection and recognition method based on Faster R-CNN
CN107169421B (en) * 2017-04-20 2020-04-28 华南理工大学 Automobile driving scene target detection method based on deep convolutional neural network
CN107146237B (en) * 2017-04-24 2020-02-18 西南交通大学 Target tracking method based on online state learning and estimation
CN106971174B (en) * 2017-04-24 2020-05-22 华南理工大学 CNN model, CNN training method and CNN-based vein identification method
CN107424184B (en) * 2017-04-27 2019-10-11 厦门美图之家科技有限公司 A kind of image processing method based on convolutional neural networks, device and mobile terminal
US10572773B2 (en) * 2017-05-05 2020-02-25 Intel Corporation On the fly deep learning in machine learning for autonomous machines
CN107045642A (en) * 2017-05-05 2017-08-15 广东工业大学 A kind of logo image-recognizing method and device
CN107665351B (en) * 2017-05-06 2022-07-26 北京航空航天大学 Airport detection method based on difficult sample mining
JP7317717B2 (en) 2017-05-09 2023-07-31 ニューララ インコーポレイテッド Systems and methods that enable memory-bound continuous learning in artificial intelligence and deep learning, operating applications continuously across network computing edges
CN107610087B (en) * 2017-05-15 2020-04-28 华南理工大学 Tongue coating automatic segmentation method based on deep learning
CN107274451A (en) * 2017-05-17 2017-10-20 北京工业大学 Isolator detecting method and device based on shared convolutional neural networks
CN108960015A (en) * 2017-05-24 2018-12-07 优信拍(北京)信息科技有限公司 A kind of vehicle system automatic identifying method and device based on deep learning
CN107103308A (en) * 2017-05-24 2017-08-29 武汉大学 A kind of pedestrian's recognition methods again learnt based on depth dimension from coarse to fine
CN108229524A (en) * 2017-05-25 2018-06-29 北京航空航天大学 A kind of chimney and condensing tower detection method based on remote sensing images
CN107301376B (en) * 2017-05-26 2021-04-13 浙江大学 Pedestrian detection method based on deep learning multi-layer stimulation
CN107330446B (en) * 2017-06-05 2020-08-04 浙江工业大学 Image classification-oriented deep convolutional neural network optimization method
CN107247967B (en) * 2017-06-07 2020-09-18 浙江捷尚视觉科技股份有限公司 Vehicle window annual inspection mark detection method based on R-CNN
CN107730881A (en) * 2017-06-13 2018-02-23 银江股份有限公司 Traffic congestion vision detection system based on depth convolutional neural networks
CN107730903A (en) * 2017-06-13 2018-02-23 银江股份有限公司 Parking offense and the car vision detection system that casts anchor based on depth convolutional neural networks
CN107729799A (en) * 2017-06-13 2018-02-23 银江股份有限公司 Crowd's abnormal behaviour vision-based detection and analyzing and alarming system based on depth convolutional neural networks
CN107730905A (en) * 2017-06-13 2018-02-23 银江股份有限公司 Multitask fake license plate vehicle vision detection system and method based on depth convolutional neural networks
CN107730904A (en) * 2017-06-13 2018-02-23 银江股份有限公司 Multitask vehicle driving in reverse vision detection system based on depth convolutional neural networks
CN107316058A (en) * 2017-06-15 2017-11-03 国家新闻出版广电总局广播科学研究院 Improve the method for target detection performance by improving target classification and positional accuracy
CN107368845B (en) * 2017-06-15 2020-09-22 华南理工大学 Optimized candidate region-based Faster R-CNN target detection method
CN107247954A (en) * 2017-06-16 2017-10-13 山东省计算中心(国家超级计算济南中心) A kind of image outlier detection method based on deep neural network
CN107273502B (en) * 2017-06-19 2020-05-12 重庆邮电大学 Image geographic labeling method based on spatial cognitive learning
CN107301417A (en) * 2017-06-28 2017-10-27 广东工业大学 A kind of method and device of the vehicle brand identification of unsupervised multilayer neural network
CN107451602A (en) * 2017-07-06 2017-12-08 浙江工业大学 A kind of fruits and vegetables detection method based on deep learning
CN107341611A (en) * 2017-07-06 2017-11-10 浙江大学 A kind of operation flow based on convolutional neural networks recommends method
CN107292306A (en) * 2017-07-07 2017-10-24 北京小米移动软件有限公司 Object detection method and device
CN107688773A (en) * 2017-07-07 2018-02-13 北京联合大学 A kind of gesture identification method based on deep learning
CN107730906A (en) * 2017-07-11 2018-02-23 银江股份有限公司 Zebra stripes vehicle does not give precedence to the vision detection system of pedestrian behavior
CN107273872B (en) * 2017-07-13 2020-05-05 北京大学深圳研究生院 Depth discrimination network model method for re-identification of pedestrians in image or video
CN107369154B (en) * 2017-07-19 2020-05-05 电子科技大学 Image detection device
CN107239803A (en) * 2017-07-21 2017-10-10 国家海洋局第海洋研究所 Utilize the sediment automatic classification method of deep learning neutral net
CN110019896B (en) * 2017-07-28 2021-08-13 杭州海康威视数字技术股份有限公司 Image retrieval method and device and electronic equipment
CN107590178B (en) * 2017-07-31 2020-10-16 杭州大搜车汽车服务有限公司 Vehicle type matching method based on VIN code, electronic device and storage medium
CN107527068B (en) * 2017-08-07 2020-12-25 南京信息工程大学 Vehicle type identification method based on CNN and domain adaptive learning
CN107609483B (en) * 2017-08-15 2020-06-16 中国科学院自动化研究所 Dangerous target detection method and device for driving assistance system
CN107491764A (en) * 2017-08-25 2017-12-19 电子科技大学 A kind of violation based on depth convolutional neural networks drives detection method
CN107798335B (en) * 2017-08-28 2020-02-18 浙江工业大学 Vehicle logo identification method fusing sliding window and Faster R-CNN convolutional neural network
CN107679078B (en) * 2017-08-29 2020-01-10 银江股份有限公司 Bayonet image vehicle rapid retrieval method and system based on deep learning
CN107610113A (en) * 2017-09-13 2018-01-19 北京邮电大学 The detection method and device of Small object based on deep learning in a kind of image
CN107609522B (en) * 2017-09-19 2021-04-13 东华大学 Information fusion vehicle detection system based on laser radar and machine vision
CN107885764B (en) * 2017-09-21 2020-12-18 银江股份有限公司 Rapid Hash vehicle retrieval method based on multitask deep learning
CN107563454A (en) * 2017-09-25 2018-01-09 重庆邮电大学 A kind of related cascade of yardstick based on the analysis of 2D/3D automobiles suppresses sorting algorithm
CN107610224B (en) * 2017-09-25 2020-11-13 重庆邮电大学 3D automobile object class representation algorithm based on weak supervision and definite block modeling
CN107665355B (en) * 2017-09-27 2020-09-29 重庆邮电大学 Agricultural pest detection method based on regional convolutional neural network
CN108205580B (en) * 2017-09-27 2021-08-31 深圳市商汤科技有限公司 Image retrieval method and device and computer readable storage medium
CN107590489A (en) * 2017-09-28 2018-01-16 国家新闻出版广电总局广播科学研究院 Object detection method based on concatenated convolutional neutral net
CN107972662B (en) * 2017-10-16 2019-12-10 华南理工大学 Vehicle forward collision early warning method based on deep learning
CN107845116B (en) * 2017-10-16 2021-05-25 北京京东尚科信息技术有限公司 Method and apparatus for generating compression encoding of flat image
CN107909005A (en) * 2017-10-26 2018-04-13 西安电子科技大学 Personage's gesture recognition method under monitoring scene based on deep learning
CN107985189B (en) * 2017-10-26 2021-03-12 西安科技大学 Early warning method for lane changing depth of driver in high-speed driving environment
CN107680113A (en) * 2017-10-27 2018-02-09 武汉大学 The image partition method of multi-layer segmentation network based on Bayesian frame edge prior
CN107909082B (en) * 2017-10-30 2020-07-31 东南大学 Sonar image target identification method based on deep learning technology
CN107808138B (en) * 2017-10-31 2021-03-30 电子科技大学 Communication signal identification method based on FasterR-CNN
CN107844769B (en) * 2017-11-01 2021-06-01 浪潮集团有限公司 Vehicle detection method and system under complex scene
CN107679250B (en) * 2017-11-01 2020-12-01 浙江工业大学 Multi-task layered image retrieval method based on deep self-coding convolutional neural network
CN108052861A (en) * 2017-11-08 2018-05-18 北京卓视智通科技有限责任公司 A kind of nerve network system and the model recognizing method based on the nerve network system
CN107895367B (en) * 2017-11-14 2021-11-30 中国科学院深圳先进技术研究院 Bone age identification method and system and electronic equipment
CN109784131B (en) * 2017-11-15 2023-08-22 深圳光启合众科技有限公司 Object detection method, device, storage medium and processor
CN108038423B (en) * 2017-11-22 2022-03-04 广东数相智能科技有限公司 Automobile type identification method and device based on image identification
CN107871126A (en) * 2017-11-22 2018-04-03 西安翔迅科技有限责任公司 Model recognizing method and system based on deep-neural-network
CN108052881A (en) * 2017-11-30 2018-05-18 华中科技大学 The method and apparatus of multiclass entity object in a kind of real-time detection construction site image
CN108171112B (en) * 2017-12-01 2021-06-01 西安电子科技大学 Vehicle identification and tracking method based on convolutional neural network
CN108564097B (en) * 2017-12-05 2020-09-22 华南理工大学 Multi-scale target detection method based on deep convolutional neural network
CN108154149B (en) * 2017-12-08 2021-12-10 济南中维世纪科技有限公司 License plate recognition method based on deep learning network sharing
CN108052899A (en) * 2017-12-12 2018-05-18 成都睿码科技有限责任公司 A kind of method that electric bicycle and motorcycle are distinguished by video
CN108133186A (en) * 2017-12-21 2018-06-08 东北林业大学 A kind of plant leaf identification method based on deep learning
CN108171246B (en) * 2017-12-21 2022-02-08 北京科技大学 Clothing salient region detection method
CN108154504A (en) * 2017-12-25 2018-06-12 浙江工业大学 Method for detecting surface defects of steel plate based on convolutional neural network
CN108009526A (en) * 2017-12-25 2018-05-08 西北工业大学 A kind of vehicle identification and detection method based on convolutional neural networks
CN108021914B (en) * 2017-12-27 2020-07-28 清华大学 Method for extracting character area of printed matter based on convolutional neural network
CN108121986B (en) * 2017-12-29 2019-12-17 深圳云天励飞技术有限公司 Object detection method and device, computer device and computer readable storage medium
CN108364262A (en) * 2018-01-11 2018-08-03 深圳大学 A kind of restored method of blurred picture, device, equipment and storage medium
CN108460328A (en) * 2018-01-15 2018-08-28 浙江工业大学 A kind of fake-licensed car detection method based on multitask convolutional neural networks
CN108171203B (en) * 2018-01-17 2020-04-17 百度在线网络技术(北京)有限公司 Method and device for identifying vehicle
CN108171707A (en) * 2018-01-23 2018-06-15 武汉精测电子集团股份有限公司 A kind of Mura defects level evaluation method and device based on deep learning
CN108346145B (en) * 2018-01-31 2020-08-04 浙江大学 Identification method of unconventional cells in pathological section
CN108256498A (en) * 2018-02-01 2018-07-06 上海海事大学 A kind of non power driven vehicle object detection method based on EdgeBoxes and FastR-CNN
CN108509949B (en) * 2018-02-05 2020-05-15 杭州电子科技大学 Target detection method based on attention map
CN108460758A (en) * 2018-02-09 2018-08-28 河南工业大学 The construction method of Lung neoplasm detection model
JP7032536B2 (en) 2018-02-09 2022-03-08 ベイジン センスタイム テクノロジー デベロップメント カンパニー, リミテッド Instance segmentation methods and equipment, electronics, programs and media
CN108335305B (en) * 2018-02-09 2020-10-30 北京市商汤科技开发有限公司 Image segmentation method and apparatus, electronic device, program, and medium
CN108280490A (en) * 2018-02-28 2018-07-13 北京邮电大学 A kind of fine granularity model recognizing method based on convolutional neural networks
CN110210472A (en) * 2018-02-28 2019-09-06 佛山科学技术学院 A kind of method for checking object based on depth network
CN108334955A (en) * 2018-03-01 2018-07-27 福州大学 Copy of ID Card detection method based on Faster-RCNN
CN108549901A (en) * 2018-03-12 2018-09-18 佛山市顺德区中山大学研究院 A kind of iteratively faster object detection method based on deep learning
WO2019176877A1 (en) * 2018-03-15 2019-09-19 株式会社小糸製作所 Object identification system, automobile, vehicular lamp fitting, and method for identifying type of object
CN108320510B (en) * 2018-04-03 2020-12-04 深圳市智绘科技有限公司 Traffic information statistical method and system based on aerial video shot by unmanned aerial vehicle
CN108537732B (en) * 2018-04-10 2021-11-02 福州大学 PCA-SIFT-based rapid image splicing method
CN108921850B (en) * 2018-04-16 2022-05-17 博云视觉(北京)科技有限公司 Image local feature extraction method based on image segmentation technology
CN108537286B (en) * 2018-04-18 2020-11-24 北京航空航天大学 Complex target accurate identification method based on key area detection
CN108876849B (en) * 2018-04-24 2021-11-23 哈尔滨工程大学 Deep learning target identification and positioning method based on auxiliary identification
CN108830188B (en) * 2018-05-30 2022-03-04 西安理工大学 Vehicle detection method based on deep learning
CN109684906B (en) * 2018-05-31 2021-04-30 北京林业大学 Method for detecting red fat bark beetles based on deep learning
CN108776787B (en) * 2018-06-04 2020-09-29 京东数字科技控股有限公司 Image processing method and device, electronic device and storage medium
CN108871760B (en) * 2018-06-07 2020-07-17 广东石油化工学院 Efficient gear fault mode identification method
CN108830213A (en) * 2018-06-12 2018-11-16 北京理工大学 Car plate detection and recognition methods and device based on deep learning
CN108960079A (en) * 2018-06-14 2018-12-07 多伦科技股份有限公司 A kind of image-recognizing method and device
CN108830224B (en) * 2018-06-19 2021-04-02 武汉大学 High-resolution remote sensing image ship target detection method based on deep learning
CN110633717A (en) * 2018-06-21 2019-12-31 北京京东尚科信息技术有限公司 Training method and device for target detection model
CN108830254B (en) * 2018-06-27 2021-10-29 福州大学 Fine-grained vehicle type detection and identification method based on data balance strategy and intensive attention network
CN109166094B (en) * 2018-07-11 2022-03-25 华南理工大学 Insulator fault positioning and identifying method based on deep learning
CN109034245B (en) * 2018-07-27 2021-02-05 燕山大学 Target detection method using feature map fusion
CN109165582B (en) * 2018-08-09 2021-09-24 河海大学 Urban street garbage detection and cleanliness assessment method
CN109101934A (en) * 2018-08-20 2018-12-28 广东数相智能科技有限公司 Model recognizing method, device and computer readable storage medium
CN109131843B (en) * 2018-08-22 2022-04-26 王桥生 Long-term visual tracking active separation type undercarriage
CN109214441A (en) * 2018-08-23 2019-01-15 桂林电子科技大学 A kind of fine granularity model recognition system and method
CN109214505B (en) * 2018-08-29 2022-07-01 中山大学 Full convolution target detection method of densely connected convolution neural network
CN109376756B (en) * 2018-09-04 2020-08-18 青岛大学附属医院 System, computer device and storage medium for automatically identifying lymph node transferred from upper abdomen based on deep learning
CN109242516A (en) * 2018-09-06 2019-01-18 北京京东尚科信息技术有限公司 The single method and apparatus of processing service
CN109344825A (en) * 2018-09-14 2019-02-15 广州麦仑信息科技有限公司 A kind of licence plate recognition method based on convolutional neural networks
CN109543505B (en) * 2018-09-29 2023-03-21 江苏濠汉智能设备有限公司 Target detection system and method based on video image
CN109409518B (en) * 2018-10-11 2021-05-04 北京旷视科技有限公司 Neural network model processing method and device and terminal
CN109492761A (en) * 2018-10-30 2019-03-19 深圳灵图慧视科技有限公司 Realize FPGA accelerator, the method and system of neural network
CN109523015B (en) * 2018-11-09 2021-10-22 上海海事大学 Image processing method in neural network
CN109492586B (en) * 2018-11-12 2021-08-17 长讯通信服务有限公司 Mobile communication maintenance object detection method based on artificial intelligence and unmanned aerial vehicle
CN109614990A (en) * 2018-11-20 2019-04-12 成都通甲优博科技有限责任公司 A kind of object detecting device
CN109558902A (en) * 2018-11-20 2019-04-02 成都通甲优博科技有限责任公司 A kind of fast target detection method
CN109753581A (en) * 2018-11-30 2019-05-14 北京拓尔思信息技术股份有限公司 Image processing method, device, electronic equipment and storage medium
CN111260955B (en) * 2018-12-03 2021-12-28 魔门塔(苏州)科技有限公司 Parking space detection system and method adopting parking space frame lines and end points
CN109800778B (en) * 2018-12-03 2020-10-09 浙江工业大学 Faster RCNN target detection method based on difficultly-divided sample mining
CN109670501B (en) * 2018-12-10 2020-08-25 中国科学院自动化研究所 Object identification and grasping position detection method based on deep convolutional neural network
WO2020118616A1 (en) * 2018-12-13 2020-06-18 深圳先进技术研究院 Head and neck imaging method and device based on deep prior learning
CN109684956A (en) * 2018-12-14 2019-04-26 深源恒际科技有限公司 A kind of vehicle damage detection method and system based on deep neural network
CN109766775A (en) * 2018-12-18 2019-05-17 四川大学 A kind of vehicle detecting system based on depth convolutional neural networks
TWI706378B (en) * 2018-12-29 2020-10-01 鴻海精密工業股份有限公司 Cloud device, terminal device, and image classification method
CN109720275A (en) * 2018-12-29 2019-05-07 重庆集诚汽车电子有限责任公司 Multi-sensor Fusion vehicle environmental sensory perceptual system neural network based
CN111385598A (en) 2018-12-29 2020-07-07 富泰华工业(深圳)有限公司 Cloud device, terminal device and image classification method
CN109754071B (en) * 2018-12-29 2020-05-05 中科寒武纪科技股份有限公司 Activation operation method and device, electronic equipment and readable storage medium
CN109741318B (en) * 2018-12-30 2022-03-29 北京工业大学 Real-time detection method of single-stage multi-scale specific target based on effective receptive field
CN109934088A (en) * 2019-01-10 2019-06-25 海南大学 Sea ship discrimination method based on deep learning
CN109829491B (en) * 2019-01-22 2021-09-28 开易(北京)科技有限公司 Information processing method, apparatus and storage medium for image detection
CN109889525A (en) * 2019-02-26 2019-06-14 北京智芯微电子科技有限公司 Multi-communication protocol Intellisense method
CN110097534A (en) * 2019-03-04 2019-08-06 华北电力大学 A kind of nuclear fuel rod open defect detection method based on deep learning
CN110120047B (en) * 2019-04-04 2023-08-08 平安科技(深圳)有限公司 Image segmentation model training method, image segmentation method, device, equipment and medium
CN110110722A (en) * 2019-04-30 2019-08-09 广州华工邦元信息技术有限公司 A kind of region detection modification method based on deep learning model recognition result
CN110222593A (en) * 2019-05-18 2019-09-10 四川弘和通讯有限公司 A kind of vehicle real-time detection method based on small-scale neural network
CN110413825B (en) * 2019-06-21 2023-12-01 东华大学 Street-snap recommendation system for fashion e-commerce
CN110348355A (en) * 2019-07-02 2019-10-18 南京信息工程大学 Model recognizing method based on intensified learning
CN110399816B (en) * 2019-07-15 2023-04-07 广西大学 High-speed train bottom foreign matter detection method based on Faster R-CNN
CN110397080A (en) * 2019-07-17 2019-11-01 深圳万海建筑工程科技有限公司 A kind of monitoring and warning system for pipe gallery
CN110414413A (en) * 2019-07-25 2019-11-05 北京麒麟智能科技有限公司 A kind of logistics trolley pedestrian detection method based on artificial intelligence
CN110532904B (en) * 2019-08-13 2022-08-05 桂林电子科技大学 Vehicle identification method
CN110472633A (en) * 2019-08-15 2019-11-19 南京拓控信息科技股份有限公司 A kind of detection of train license number and recognition methods based on deep learning
CN110570469B (en) * 2019-08-16 2020-08-25 广州威尔森信息科技有限公司 Intelligent identification method for angle position of automobile picture
CN110610210B (en) * 2019-09-18 2022-03-25 电子科技大学 Multi-target detection method
CN110807452A (en) * 2019-10-11 2020-02-18 上海上湖信息技术有限公司 Prediction model construction method, device and system and bank card number identification method
CN110942401B (en) * 2019-11-21 2023-12-19 黑龙江电力调度实业有限公司 Intelligent communication method for electric power Internet of things
CN111104942B (en) * 2019-12-09 2023-11-03 熵智科技(深圳)有限公司 Template matching network training method, recognition method and device
CN111145365A (en) * 2019-12-17 2020-05-12 北京明略软件系统有限公司 Method, device, computer storage medium and terminal for realizing classification processing
CN111368682B (en) * 2020-02-27 2023-12-12 上海电力大学 Method and system for detecting and identifying station caption based on master RCNN
CN111460909A (en) * 2020-03-09 2020-07-28 兰剑智能科技股份有限公司 Vision-based goods location management method and device
CN111524095A (en) * 2020-03-24 2020-08-11 西安交通大学 Target detection method for rotating object
CN111461128A (en) * 2020-03-31 2020-07-28 北京爱笔科技有限公司 License plate recognition method and device
CN111523579B (en) * 2020-04-14 2022-05-03 燕山大学 Vehicle type recognition method and system based on improved deep learning
WO2021218140A1 (en) * 2020-04-27 2021-11-04 平安科技(深圳)有限公司 Deformable convolution-based image recognition method and apparatus, and computer device
CN111652285A (en) * 2020-05-09 2020-09-11 济南浪潮高新科技投资发展有限公司 Tea cake category identification method, equipment and medium
CN111968127B (en) * 2020-07-06 2021-08-27 中国科学院计算技术研究所 Cancer focus area identification method and system based on full-section pathological image
CN112132222B (en) * 2020-09-27 2023-02-10 上海高德威智能交通系统有限公司 License plate category identification method and device and storage medium
CN112507247B (en) * 2020-12-15 2022-09-23 重庆邮电大学 Cross-social network user alignment method fusing user state information
US11971953B2 (en) 2021-02-02 2024-04-30 Inait Sa Machine annotation of photographic images
US11544914B2 (en) 2021-02-18 2023-01-03 Inait Sa Annotation of 3D models with signs of use visible in 2D images
WO2022175044A1 (en) 2021-02-18 2022-08-25 Inait Sa Annotation of 3d models with signs of use visible in 2d images
CN113076837A (en) * 2021-03-25 2021-07-06 高新兴科技集团股份有限公司 Convolutional neural network training method based on network image
CN112905213B (en) * 2021-03-26 2023-08-08 中国重汽集团济南动力有限公司 Method and system for realizing ECU (electronic control Unit) refreshing parameter optimization based on convolutional neural network
CN112949614B (en) * 2021-04-29 2021-09-10 成都市威虎科技有限公司 Face detection method and device for automatically allocating candidate areas and electronic equipment
CN113469190B (en) * 2021-06-10 2023-09-15 电子科技大学 Single-stage target detection algorithm based on domain adaptation
CN113392911B (en) * 2021-06-18 2023-04-18 电子科技大学 DW-ReSuMe algorithm-based image classification method
TWI830230B (en) * 2022-05-18 2024-01-21 逢甲大學 Object automatic tracking system and identification method thereof
CN117392179B (en) * 2023-12-11 2024-02-27 四川迪晟新达类脑智能技术有限公司 Target tracking method based on correlation filter and edge frame

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9418319B2 (en) * 2014-11-21 2016-08-16 Adobe Systems Incorporated Object detection using cascaded convolutional neural networks

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104657748A (en) * 2015-02-06 2015-05-27 中国石油大学(华东) Vehicle type recognition method based on convolutional neural network
CN105184271A (en) * 2015-09-18 2015-12-23 苏州派瑞雷尔智能科技有限公司 Automatic vehicle detection method based on deep learning
CN105404858A (en) * 2015-11-03 2016-03-16 电子科技大学 Vehicle type recognition method based on deep Fisher network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research on vehicle type recognition based on deep convolutional neural networks; 邓柳 et al.; 《计算机应用研究》 (Application Research of Computers); 2016-03-31; Vol. 33, No. 3; full text
Vehicle model recognition based on deep neural networks; 王茜 et al.; 《图形图像》; 2015-12-31; full text

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111540203A (en) * 2020-04-30 2020-08-14 东华大学 Method for adjusting green light passing time based on fast-RCNN

Also Published As

Publication number Publication date
CN106250812A (en) 2016-12-21

Similar Documents

Publication Publication Date Title
CN106250812B (en) A kind of model recognizing method based on quick R-CNN deep neural network
CN108108657B (en) Method for correcting locality sensitive Hash vehicle retrieval based on multitask deep learning
Adarsh et al. YOLO v3-Tiny: Object Detection and Recognition using one stage improved model
CN107609601B (en) Ship target identification method based on multilayer convolutional neural network
CN109614985B (en) Target detection method based on densely connected feature pyramid network
CN111368896B (en) Hyperspectral remote sensing image classification method based on dense residual three-dimensional convolutional neural network
CN107885764B (en) Rapid Hash vehicle retrieval method based on multitask deep learning
CN111783831B (en) Complex image accurate classification method based on multi-source multi-label shared subspace learning
CN108830188A (en) Vehicle checking method based on deep learning
CN110889318B (en) Lane detection method and device using CNN
CN107066559A (en) A kind of method for searching three-dimension model based on deep learning
JP6980289B2 (en) Learning method and learning device that can detect lanes using a lane model, and test method and test device using this
EP3938806A1 (en) Radar data collection and labeling for machine-learning
CN115861619A (en) Airborne LiDAR (light detection and ranging) urban point cloud semantic segmentation method and system of recursive residual double-attention kernel point convolution network
CN114241226A (en) Three-dimensional point cloud semantic segmentation method based on multi-neighborhood characteristics of hybrid model
Zhang et al. Lidar-based place recognition for autonomous driving: A survey
Hellert et al. Using algorithm selection for adaptive vehicle perception aboard UAV
Hwang et al. Object Detection for Cargo Unloading System Based on Fuzzy C Means.
CN108960005A (en) The foundation and display methods, system of subjects visual label in a kind of intelligent vision Internet of Things
Taha et al. Assessment of Approaches for the Extraction of Building Footprints from Pléiades Images
CN114297237A (en) Three-dimensional point cloud data retrieval method and device based on category fusion and computer equipment
Harras et al. Enhanced vehicle classification using transfer learning and a novel duplication-based data augmentation technique
Kamaleswari et al. An Assessment of Object Detection in Thermal (Infrared) Image Processing
Pavlove et al. Efficient Deep Learning Methods for Automated Visibility Estimation at Airports
Saravanakumar et al. An Enriched Model of Neutrosophical Fuzzy and Grasshopper Convolutional Neural Network Based Moving Object Detection and Classification to Improve Video Surveillance

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230925

Address after: Room 701, 7th Floor, Building 10, Jingshun Platinum Yuecheng, Xihu District, Hangzhou City, Zhejiang Province, 310023

Patentee after: Hangzhou Yixun Technology Service Co.,Ltd.

Address before: 71-3-501, Chaohui Sixth District, No. 64 Xinshi Street, Xiacheng District, Hangzhou City, Zhejiang Province, 310014

Patentee before: Tang Yiping
