CN109886147A

CN109886147A - Vehicle multi-attribute detection method based on single-network multi-task learning

Info

Publication number: CN109886147A
Application number: CN201910086525.4A
Authority: CN
Inventors: 候少麒; 殷光强; 石方炎; 向凯; 杨晓宇
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2019-01-29
Filing date: 2019-01-29
Publication date: 2019-06-14

Abstract

The invention discloses a vehicle multi-attribute detection method based on single-network multi-task learning. The method comprises six steps: picture collection and screening; dataset production; network design, in which the network structure is designed and the network model built on the Darknet deep learning framework in an end-to-end, one-stage, non-cascaded manner according to the multi-attribute characteristics of vehicles; model training, in which model parameters are set and adjusted, the designed network is trained on the vehicle multi-attribute dataset, and data augmentation and multi-scale training are performed during training; model testing; and model evaluation. The network model is designed and built on the Darknet deep learning framework as an end-to-end, one-stage, non-cascaded structure. By using data augmentation, convolution kernel separation, multi-scale feature fusion, and related techniques, the network improves vehicle multi-attribute detection, achieving high detection precision and recall while maintaining good real-time performance.

Description

Vehicle multi-attribute detection method based on single-network multi-task learning

Technical field

The present invention relates to the field of object detection in computer vision, and in particular to a vehicle multi-attribute detection method based on single-network multi-task learning.

Background art

With continuous economic development, the automobile has become people's most important means of transport. While it brings convenience, it also aggravates problems such as road traffic congestion and vehicle supervision. Intelligent transportation systems and vehicle monitoring systems are widely recognized as part of the smart city and are mainly applied to road traffic control, police criminal investigation, parking lot monitoring, intelligent residential community management, and the like. In the information age, efficiently achieving real-time vehicle detection (i.e., locating and identifying vehicles) and accurate person-vehicle matching is an urgent problem for intelligent vehicle management.

Traditional vehicle identification methods are mainly based on license plate detection, but plate wear, occlusion, ease of replacement, and lighting conditions all hinder effective detection. Moreover, in criminal investigation, relying solely on the single license plate attribute is not enough to establish a vehicle's true identity. In such cases vehicle multi-attribute recognition becomes especially important: it compensates for the shortcomings of single-attribute recognition such as license plates and thereby further improves the reliability of intelligent transportation and vehicle security systems. Existing vehicle attribute detection techniques are mainly based on traditional image processing algorithms, which suffer from low accuracy, a high miss rate, and poor real-time performance. In recent years, with the rapid development of deep learning, more and more techniques recognize vehicle attributes with neural networks, but research on multi-attribute recognition remains scarce, and the accuracy, recall, and real-time performance of existing "vehicle multi-attribute recognition based on multi-task learning" techniques are still unsatisfactory, so vehicle attributes cannot be detected accurately.

Application No. CN201610067290.0 (a vehicle multi-attribute joint analysis method based on deep learning) and Application No. CN201711107713.8 (a fine-grained vehicle multi-attribute recognition method based on convolutional neural networks) both introduce the internal supervision mechanism and weight-sharing strategy of multi-task learning into deep convolutional neural networks to achieve joint analysis of multiple vehicle attributes. However: first, the backbone of both is a simple plain (straight-through) network, and neither considers the network's adaptability to attributes of different scales, such as vehicle body type and license plate; second, both networks use large convolution kernels, which leads to too many parameters and a tendency to overfit; third, neither applies data augmentation to the training pictures according to actual scenes during training, which leads to poor robustness and weak generalization.

In conclusion the existing more detection of attribute technologies of vehicle have following defects that

(1) The different scales of different vehicle attributes are not taken into account.

(2) The convolution kernels used in the network are too large, which produces too many training parameters and more computation and makes the network prone to overfitting.

(3) They do not account for the fact that vehicle photos in actual scenes are easily affected by resolution, rotation angle, saturation, exposure, hue, and similar factors.

(4) The three defects above make vehicle multi-attribute detection highly complex, with low accuracy, a high miss rate, and poor real-time performance.

Summary of the invention

The object of the present invention is to provide a vehicle multi-attribute detection method based on single-network multi-task learning. The network model is designed and built on the Darknet deep learning framework and adopts an end-to-end, one-stage, non-cascaded structure. Using data augmentation, convolution kernel separation, and multi-scale feature fusion, the network improves vehicle multi-attribute detection, achieving high detection precision and recall with good real-time performance.

The present invention is achieved through the following technical solutions:

A vehicle multi-attribute detection method based on single-network multi-task learning, the method comprising:

Step 1: picture collection and screening;

Step 2: dataset production: a vehicle multi-attribute dataset is produced in the VOC standard dataset format;

Step 3: network design: based on the Darknet deep learning framework and the multi-attribute characteristics of vehicles, the network structure is designed in an end-to-end, one-stage, non-cascaded manner and the network model is built;

Step 4: model training: model parameters are set and adjusted, the designed network model is trained on the vehicle multi-attribute dataset, and data augmentation and multi-scale training are performed during training;

Step 5: model testing: the trained network model is used to test vehicle multi-attribute detection;

Step 6: model evaluation.

To better implement the present invention, in Step 1 vehicle photos are obtained with surveillance cameras. As a preferred embodiment, residential-community surveillance cameras are used in order to obtain vehicle photos from actual scenes. After acquisition and screening, the vehicle photos cover 15 common vehicle brands and several vehicle types, such as sedans, SUVs, and MPVs.

To better implement the present invention, the acquired vehicle photos are manually pre-screened to remove photos in which the background occupies too large an area or the vehicle attributes are severely blurred.

To better implement the present invention, Step 2 is implemented as follows:

The LabelImg tool is used to produce the vehicle multi-attribute dataset in the standard VOC format for deep learning, and the dataset is divided into a training set and a test set at a ratio of 10:1.

To better implement the present invention, the vehicle dataset is produced as follows:

First, three folders are created: Annotation, ImageSets, and JPEGImages, with a Main folder inside ImageSets. The logo picture directory and the .xml label file directory are set, and the vehicle attribute tag names are defined. The vehicle photos obtained and screened in Step 1 are stored in the JPEGImages folder. LabelImg is then used to annotate multiple attributes on each vehicle photo. The sample picture names from the generated .xml files are written to trainval.txt and test.txt at a ratio of 10:1, and both files are stored in the Main folder. The .xml files are stored in the Annotation folder.
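The 10:1 split of sample names into trainval.txt and test.txt can be sketched as follows. This is a minimal illustration, not the authors' tooling: the shuffled random assignment, the fixed seed, and the helper names `split_trainval_test` and `write_image_set` are assumptions; only the 10:1 ratio and the one-name-per-line VOC ImageSets/Main layout come from the text.

```python
import random

def split_trainval_test(sample_names, ratio=10, seed=0):
    """Split sample names into (trainval, test) at ratio:1 (hypothetical helper)."""
    names = sorted(sample_names)          # fixed starting order for reproducibility
    random.Random(seed).shuffle(names)    # random assignment: an assumption, not stated in the text
    n_test = len(names) // (ratio + 1)    # e.g. 3300 // 11 == 300
    return names[n_test:], names[:n_test]

def write_image_set(path, names):
    """Write one sample name per line, as in VOC ImageSets/Main/*.txt (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(names) + "\n")

if __name__ == "__main__":
    names = [f"{i:06d}" for i in range(3300)]   # hypothetical 6-digit VOC-style names
    trainval, test = split_trainval_test(names)
    print(len(trainval), len(test))             # 3000 300
```

With the 3300 photos of the embodiment this yields the 3000/300 split reported there.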

To better implement the present invention, Step 3 is implemented as follows:

Using the Darknet deep learning framework as the platform, the backbone network is designed end to end according to the multi-attribute characteristics of vehicles. Preferably, the backbone contains 16 convolutional layers (each followed by a Batch Normalization layer and a corresponding activation layer) and 3 max pooling layers. The backbone consists of four blocks containing 1, 3, 5, and 7 convolutional layers respectively, and adjacent blocks are connected by one max pooling layer;

To simulate the complexity of vehicle photos in actual scenes and improve the model's generalization, a sample data augmentation module is placed in front of the backbone (after the training samples are input). It augments the sample data in three respects: color and illumination, rotation angle, and noise interference;

To reduce parameters and computation, convolution kernel separation is applied in every convolutional layer: a large convolution kernel is split into a cascade of two or more small kernels. Preferably, all convolutional layers of the invention alternate 1*1 and 3*3 kernels, which replace any kernel larger than 3*3;
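The parameter saving behind kernel separation can be checked with simple arithmetic. A cascade of stride-1 small kernels covers the same receptive field as one large kernel with fewer weights, which is the inequality N*N > n₁*n₁ + n₂*n₂ stated in the embodiment (for example 5*5 = 25 > 3*3 + 3*3 = 18). The sketch below assumes every layer keeps the same channel count and omits biases (a Batch Normalization layer follows each convolution); the 256-channel figure is hypothetical. Matching receptive fields does not make the two structures identical functions, since an activation sits between the small kernels.

```python
def conv_weights(k, c_in, c_out):
    """Weights of one k x k convolution (no bias: a Batch Normalization layer follows)."""
    return k * k * c_in * c_out

def cascade_weights(kernels, channels):
    """Weights of a cascade of stride-1 convolutions that all keep `channels` channels."""
    return sum(conv_weights(k, channels, channels) for k in kernels)

def receptive_field(kernels):
    """Receptive field of stacked stride-1 convolutions: 1 + sum(k - 1)."""
    return 1 + sum(k - 1 for k in kernels)

if __name__ == "__main__":
    c = 256  # hypothetical channel count
    print(receptive_field([5]), conv_weights(5, c, c))          # 5 1638400
    print(receptive_field([3, 3]), cascade_weights([3, 3], c))  # 5 1179648
```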

The input picture is resized to a fixed 416*416*3. To suit the characteristics of the different vehicle attributes (logo, license plate, vehicle type), multi-scale feature fusion is used: three branches, namely the feature layers 13*13*1024 (No. 19 in Fig. 4), 13*13*256 (No. 21 in Fig. 4), and 13*13*256 (No. 23 in Fig. 4), are fused into a 13*13*1536 feature layer (No. 25 in Fig. 4). The last convolutional layer transforms the fused 13*13*1536 feature layer into an output of the corresponding detection dimension, 13*13*N, containing the softmax classification and localization results (N depends on the number of sample classes and related factors; in the present invention N is 135, No. 25 in Fig. 4);
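The shape arithmetic of the fusion and detection layers can be written out explicitly. This sketch assumes fusion is channel-wise concatenation of equally sized 13*13 maps (consistent with 1024 + 256 + 256 = 1536) and takes B = 5 anchors and C = 22 classes from the embodiment; the stride of 32 is the downsampling factor given for multi-scale training. The helper names are illustrative only.

```python
def fused_depth(*branch_depths):
    """Channel depth after concatenating equally sized feature maps along the channel axis."""
    return sum(branch_depths)

def output_shape(input_size=416, stride=32, num_anchors=5, num_classes=22):
    """Grid size S and per-cell depth B * (5 + C) of the detection layer."""
    s = input_size // stride               # 416 // 32 == 13
    return s, s, num_anchors * (5 + num_classes)

if __name__ == "__main__":
    print(fused_depth(1024, 256, 256))     # 1536
    print(output_shape())                  # (13, 13, 135)
```

Because the per-cell depth does not depend on the input size, the same head also serves the larger inputs used in multi-scale training and testing (for example 608 gives a 19*19 grid).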

To reduce model complexity while improving accuracy, the invention adopts a one-stage, non-cascaded design and uses anchor boxes to predict class and coordinates simultaneously. The final feature map is divided into an S*S grid of cells (preferably 13*13 in the present invention), and each grid cell predicts B bounding boxes and C class probabilities, giving a final output vector of dimension S*S*[B*(5+C)]. This S*S*[B*(5+C)] corresponds to the network output 13*13*N. The 5 denotes the 4 coordinates and 1 confidence of each box; the confidence is the IOU of the box when the grid cell contains a target: if the ground-truth box is A and the predicted box is B, then IOU = area(A ∩ B) / area(A ∪ B). For each bounding box, the class probability of its grid cell is multiplied by the box confidence to obtain the class-specific confidence score. Boxes with low confidence scores are filtered out first, and NMS (non-maximum suppression) is then applied to the remaining boxes to obtain the final detection result.
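The post-processing described above (class-specific score = class probability × box confidence, score filtering, then NMS) can be sketched as follows. Assumptions not taken from the text: boxes are given as corner coordinates (x1, y1, x2, y2), each box keeps only its highest-scoring class, NMS is greedy and per class, and the thresholds 0.25 and 0.45 are placeholders rather than the patent's values.

```python
def iou(a, b):
    """IOU = area(A ∩ B) / area(A ∪ B); boxes are (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def postprocess(predictions, score_thresh=0.25, nms_thresh=0.45):
    """predictions: list of (box, box_confidence, class_probs).
    Returns kept detections as (box, class_index, score), highest score first."""
    candidates = []
    for box, conf, probs in predictions:
        cls = max(range(len(probs)), key=lambda c: probs[c])
        score = conf * probs[cls]            # class-specific confidence score
        if score >= score_thresh:            # 1) drop low-scoring boxes first
            candidates.append((box, cls, score))
    candidates.sort(key=lambda d: d[2], reverse=True)
    kept = []
    for det in candidates:                   # 2) greedy NMS within each class
        if all(k[1] != det[1] or iou(k[0], det[0]) < nms_thresh for k in kept):
            kept.append(det)
    return kept
```

For example, two heavily overlapping boxes of the same class keep only the higher-scoring one, while an overlapping box of a different class (say a logo inside a vehicle box) survives.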

To better implement the present invention, Step 4 is implemented as follows:

(1) First, the parameters are set:

The values of batch, subdivisions, momentum, decay, and the initial learning rate are set. batch is the batch size, subdivisions is the number of sub-batches, momentum is the momentum (weight update) coefficient, and decay is the weight decay parameter. The number of samples actually fed per pass during training is batch/subdivisions: the parameters are updated once per batch, and each batch is split into subdivisions sub-batches, which effectively relieves GPU computing pressure and prevents memory overflow. As a preferred scheme, batch=32 and subdivisions=8, so batch/subdivisions = 4 samples are fed per pass; momentum=0.9 and decay=0.0005, which regulate the influence of model complexity on the loss function and prevent overfitting. The initial learning rate is set to 0.001; when training reaches the 100th and the 130th epoch (one epoch is one pass over all training samples), the learning rate is changed to 0.1 times and 0.01 times its original value respectively, to speed up convergence toward the global optimum. Training stops after 140 epochs in total.
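The mini-batch size and the stepped learning-rate schedule above can be expressed as two small functions. This is a sketch under stated assumptions: epochs are counted as completed passes starting from 0, and the two factor-of-0.1 steps compound to give 0.1× and then 0.01× the base rate. In Darknet's own .cfg files the `steps` of the `steps` policy are counted in iterations (batches), so these epoch boundaries would have to be converted using the training-set size and batch.

```python
def minibatch(batch=32, subdivisions=8):
    """Samples fed per forward/backward pass; weights are updated once per full batch."""
    return batch // subdivisions

def learning_rate(epoch, base_lr=0.001, steps=(100, 130), scales=(0.1, 0.1)):
    """Step policy: multiply by each scale once `epoch` reaches the matching step."""
    lr = base_lr
    for step, scale in zip(steps, scales):
        if epoch >= step:
            lr *= scale
    return lr

if __name__ == "__main__":
    print(minibatch())                                 # 4
    print([learning_rate(e) for e in (0, 100, 130)])   # 0.001, then about 1e-4 and 1e-5
```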

(2) After the parameters are set, network training starts. The input training samples pass through the data augmentation module added at the front of the network, which applies color and illumination changes, angle rotation, and added noise interference to them, specifically:

(a) Color and illumination: the saturation, exposure, and hue of the sample pictures are adjusted, and new training samples are generated according to the set values. Besides enlarging the training set, this markedly improves the model's detection of vehicle photos with different saturation, exposure, and hue, and enhances the model's robustness;

(b) Angle rotation: the rotation angle of the sample pictures in the horizontal or vertical direction is set, and new training samples are generated according to the set values, so that the model adapts to detecting targets at multiple angles and better simulates the real state of vehicle photos in actual scenes;

(c) Noise interference: random jitter noise is added to the sample pictures, and new training samples are generated according to the set values, so that the model copes better with interference from the external environment, preventing overfitting while enhancing generalization.
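Operations (a) to (c) can be illustrated at the level of single pixels and coordinates. This is a pure-standard-library sketch, not the patent's module: the hue/saturation/exposure ranges (0.1, 1.5, 1.5) are values typical of Darknet YOLO configuration files and are assumed here, `rand_scale` is modeled on how Darknet samples saturation and exposure factors, the uniform noise amplitude 0.02 is hypothetical, and rotation is shown only as the transform that would move box corners along with a rotated picture.

```python
import colorsys
import math
import random

def rand_scale(rng, s):
    """Random factor in [1/s, s], equally likely to shrink or enlarge."""
    scale = rng.uniform(1.0, s)
    return scale if rng.random() < 0.5 else 1.0 / scale

def jitter_color(rgb, hue_shift, sat_scale, exp_scale):
    """(a) Shift hue and scale saturation/exposure of one RGB pixel with channels in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    h = (h + hue_shift) % 1.0
    s = min(1.0, s * sat_scale)
    v = min(1.0, v * exp_scale)
    return colorsys.hsv_to_rgb(h, s, v)

def rotate_point(x, y, cx, cy, degrees):
    """(b) Rotate (x, y) about (cx, cy), e.g. to move box corners with a rotated picture."""
    t = math.radians(degrees)
    dx, dy = x - cx, y - cy
    return cx + dx * math.cos(t) - dy * math.sin(t), cy + dx * math.sin(t) + dy * math.cos(t)

def augment_pixels(pixels, rng, hue=0.1, saturation=1.5, exposure=1.5, noise=0.02):
    """Apply one random color/illumination change to all pixels, then (c) per-pixel noise."""
    dh = rng.uniform(-hue, hue)
    ds, dv = rand_scale(rng, saturation), rand_scale(rng, exposure)
    out = []
    for p in pixels:
        jittered = jitter_color(p, dh, ds, dv)
        # additive random jitter noise, clamped back into [0, 1]
        out.append(tuple(min(1.0, max(0.0, c + rng.uniform(-noise, noise))) for c in jittered))
    return out

if __name__ == "__main__":
    rng = random.Random(0)
    print(augment_pixels([(0.2, 0.4, 0.6), (1.0, 0.0, 0.0)], rng))
```

Drawing one color change per picture, rather than per pixel, keeps each generated sample a plausible photo under different lighting, while the noise term varies per pixel.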

(3) During iterative training, multi-scale training is performed on the model:

Because the network uses only convolutional and pooling layers (which adapt to the input size), the size of the sample pictures can be adjusted dynamically, giving the network model stronger generalization and robustness. Specifically, a new picture size is randomly selected every 10 training batches (10 batches). The network's downsampling factor is 32, so picture sizes are multiples of 32, from a minimum of 320*320 to a maximum of 608*608. The network is resized to the chosen dimensions and training continues. This mechanism lets the network predict well on pictures of different sizes, so the same network can perform detection at different resolutions.
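The size-selection rule (a new multiple of 32 between 320 and 608 every 10 batches) can be sketched as a schedule generator. Starting at the nominal 416 input size of Step 3 and drawing sizes uniformly are assumptions; the function name is hypothetical.

```python
import random

def multiscale_sizes(num_batches, rng, every=10, stride=32,
                     min_size=320, max_size=608, start=416):
    """Return the square input size used for each training batch."""
    choices = list(range(min_size, max_size + 1, stride))  # 320, 352, ..., 608
    sizes, size = [], start
    for b in range(num_batches):
        if b > 0 and b % every == 0:        # pick a new size every `every` batches
            size = rng.choice(choices)
        sizes.append(size)
    return sizes

if __name__ == "__main__":
    print(multiscale_sizes(30, random.Random(0)))
```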

(4) The loss function is used to judge model training. It consists of two main parts, classification error and localization error, with different weight coefficients set according to the balance of the sample set and the magnitude of each part's influence. The loss function is:
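The formula itself did not survive in this copy of the text. The term-by-term description that follows (a background confidence term for boxes whose Max_IOU is below the threshold, a prior-box coordinate term active only for the first 12800 iterations, and a coordinate/confidence/class term for boxes matched to a ground truth) matches the YOLOv2 loss as it is commonly written out from the Darknet source, so the following is a hedged reconstruction under that assumption. The symbols b^o, b^r, b^c (predicted confidence, coordinates, class scores), t (iteration count), 𝟙^truth_k (box k is matched to a ground truth), and the λ subscripts are notational choices, not taken from the patent. Per the text, this loss is computed for each attribute and the results are summed.

```latex
\begin{aligned}
\mathcal{L} = \sum_{i=1}^{W}\sum_{j=1}^{H}\sum_{k=1}^{A}\Big[\;
 & \mathbb{1}_{\mathrm{Max\_IOU}<\mathrm{Thresh}}\;\lambda_{\mathrm{noobj}}\big(0-b^{o}_{ijk}\big)^{2} \\
 &+ \mathbb{1}_{t<12800}\;\lambda_{\mathrm{prior}}\sum_{r\in\{x,y,w,h\}}\big(\mathrm{prior}^{r}_{k}-b^{r}_{ijk}\big)^{2} \\
 &+ \mathbb{1}^{\mathrm{truth}}_{k}\Big(\lambda_{\mathrm{coord}}\sum_{r\in\{x,y,w,h\}}\big(\mathrm{truth}^{r}-b^{r}_{ijk}\big)^{2}
   +\lambda_{\mathrm{obj}}\big(\mathrm{IOU}^{k}_{\mathrm{truth}}-b^{o}_{ijk}\big)^{2}
   +\lambda_{\mathrm{class}}\sum_{c=1}^{C}\big(\mathrm{truth}^{c}-b^{c}_{ijk}\big)^{2}\Big)\Big]
\end{aligned}
```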

Here W and H are the width and height of the feature map, A is the number of prior (anchor) boxes (preferably A=5), and λ denotes the weight coefficients. The first term is the confidence error of the background: the IOU between each predicted box and every ground-truth box is computed and the maximum, Max_IOU, is taken; if it is below a threshold (0.5 in the preferred scheme), i.e. Max_IOU < 0.5, the predicted box is labeled as background and its noobj confidence error is computed. The second term is the coordinate error between the prior boxes and the predicted boxes; it is computed only during the first 12800 iterations, so that early in training the predicted boxes quickly learn the shapes of the prior boxes. The third term computes each part of the loss for a predicted box matched to a ground-truth box, including the coordinate error, confidence error, and classification error. If the ground-truth box is A and the predicted box is B, then IOU = area(A ∩ B) / area(A ∪ B). The loss is computed for each attribute as above and summed into the total loss, by which the model's performance is judged.
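The background test of the first term and the iteration gate of the second term can be sketched directly. Boxes are assumed to be corner coordinates (x1, y1, x2, y2), and `lambda_noobj=1.0` is a placeholder weight, not a value from the text; the 0.5 threshold and the 12800-iteration cutoff are the document's.

```python
def iou(a, b):
    """IOU = area(A ∩ B) / area(A ∪ B); boxes are (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def background_mask(pred_boxes, truth_boxes, thresh=0.5):
    """True where a predicted box's Max_IOU over all ground truths is below thresh."""
    return [max((iou(p, t) for t in truth_boxes), default=0.0) < thresh for p in pred_boxes]

def noobj_error(confidences, mask, lambda_noobj=1.0):
    """First loss term: push the confidence of background boxes toward 0."""
    return sum(lambda_noobj * (0.0 - c) ** 2 for c, is_bg in zip(confidences, mask) if is_bg)

def prior_term_active(iteration, cutoff=12800):
    """Second loss term (prior-box coordinates) is used only in early iterations."""
    return iteration < cutoff
```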

(5) Stopping training: with the SGD gradient update strategy and backpropagation that minimizes the loss function, the model is trained on a server. When the loss value has dropped to the hundredths place (second decimal place) and essentially stops changing, training is stopped, indicating that the model has reached its optimum.
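A generic SGD step with momentum and L2 weight decay, using the momentum and decay values from item (1), is sketched below together with a simple plateau test for the stopping rule. This is a textbook formulation, not necessarily the exact update order inside Darknet, and `has_plateaued` with its window and tolerance is a hypothetical way to express "essentially stops changing".

```python
def sgd_momentum_step(weights, grads, velocity, lr=0.001, momentum=0.9, decay=0.0005):
    """One SGD update with momentum and L2 weight decay; returns (weights, velocity)."""
    new_w, new_v = [], []
    for w, g, v in zip(weights, grads, velocity):
        v = momentum * v - lr * (g + decay * w)   # decay adds a pull toward zero
        new_v.append(v)
        new_w.append(w + v)
    return new_w, new_v

def has_plateaued(losses, window=5, tol=1e-3):
    """True when the last `window` loss values vary by less than `tol`."""
    if len(losses) < window:
        return False
    recent = losses[-window:]
    return max(recent) - min(recent) < tol
```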

To better implement the present invention, Step 5 is implemented as follows:

Multi-scale testing is performed on the vehicle photos in the test set. Preferably, sizes from 416*416 to 1024*1024 in steps of 32 are used: all vehicle photos in the test set are resized to a randomly chosen size in turn, and each pass over the resized photos forms one test group; after each group a new picture size is randomly selected, and testing is repeated in this way to obtain the best detection result and prevent missed and false detections. Finally, the size with the best result, i.e. the group with the highest recall (Recall) and mean average precision (mean Average Precision), is selected, and the test size, metrics, and results are recorded.
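The size sweep can be sketched as trying every size once and keeping the best one. `evaluate` is a hypothetical callback returning (recall, mAP) for a given test size; ranking by mAP with recall as the tie-break is an assumption, since the text only says the chosen size maximizes both. Visiting the sizes in random order, as the text describes, does not change which size wins an exhaustive sweep.

```python
def pick_test_size(evaluate, lo=416, hi=1024, step=32):
    """evaluate(size) -> (recall, mAP). Returns (best_size, results_by_size)."""
    results = {size: evaluate(size) for size in range(lo, hi + 1, step)}
    # Assumption: rank by mAP, breaking ties by recall.
    best = max(results, key=lambda s: (results[s][1], results[s][0]))
    return best, results
```

With a real detector, `evaluate` would resize every test photo to `size`, run detection, and compute the metrics of Step 6.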

To better implement the present invention, Step 6 is implemented as follows:

Based on the test results, recall (Recall), average precision (Average Precision), and mean average precision (mean Average Precision) are examined to evaluate the model's prediction performance.
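The three metrics can be computed as follows for one class at a time. The text does not say which AP variant is used; this sketch assumes VOC-style all-point interpolation of the precision-recall curve (the older VOC2007 protocol used 11 sample points instead), and assumes each detection has already been marked as a true or false positive by IOU matching against the ground truth.

```python
def recall(num_true_positives, num_ground_truth):
    """Fraction of ground-truth objects that were detected."""
    return num_true_positives / num_ground_truth

def average_precision(scores, is_tp, num_gt):
    """AP for one class by all-point interpolation of the precision-recall curve."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    rec, prec = [], []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        rec.append(tp / num_gt)
        prec.append(tp / (tp + fp))
    mrec = [0.0] + rec + [1.0]
    mpre = [0.0] + prec + [0.0]
    for i in range(len(mpre) - 2, -1, -1):      # make precision non-increasing
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))

def mean_average_precision(per_class_ap):
    """mAP: the mean of the per-class AP values (22 classes in the embodiment)."""
    return sum(per_class_ap) / len(per_class_ap)
```

For instance, three detections scored 0.9, 0.8, 0.7 that are correct, wrong, correct against two ground-truth objects give AP = 0.5 × 1 + 0.5 × 2/3 = 5/6.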

Compared with the prior art, the present invention has the following beneficial effects:

(1) The invention adopts an end-to-end, one-stage, non-cascaded structure. The end-to-end design removes all preprocessing steps before network training, reducing model complexity. The one-stage, non-cascaded design uses anchor boxes to predict class and coordinates simultaneously, without generating candidate boxes by a sliding window, which effectively reduces computation. Together, the two directly improve detection speed and indirectly improve recall and precision.

(2) method that the present invention uses multi-scale feature fusion, devises four different convolution blocks (convolution Block), According to the difference and its adaptability detected to different size objectives of different characteristic layer receptive field size, by Analysis On Multi-scale Features into Row fusion, makes network have more robustness to the detection performance of different size objectives.

(3) The invention uses convolution kernel separation, splitting a large kernel into a cascade of two or more small kernels without changing the output dimension. On the one hand this moderately deepens the network, improving its learning capacity and effect; on the other hand it avoids overfitting while reducing parameters and computation.

(4) Using data augmentation, the invention automatically and randomly adjusts the horizontal or vertical rotation angle, saturation, exposure, hue, and noise interference of training samples during training. The generated samples both enlarge the training set and closely simulate real scenes, enhancing the model's robustness and stability.

(5) The invention divides each batch into subdivisions sub-batches, which effectively relieves GPU computing pressure and prevents memory overflow.

(6) By setting the momentum and weight decay (decay), the invention regulates the influence of model complexity on the loss function, which speeds convergence toward the global optimum while preventing overfitting.

(7) With a stepped learning-rate policy, the invention adjusts the learning rate at set epoch numbers, accelerating the global convergence of the network.

(8) The invention uses multi-scale training and multi-scale testing. Because its network uses only convolutional and pooling layers (which adapt to the input size), the size of the detection pictures can be adjusted freely: a new picture size is randomly selected every n batches of training, and the network is resized accordingly before training continues. This mechanism lets the network better predict logo pictures of different sizes, reducing the miss and false detection rates. A similar idea is used in testing to find the input resize size that gives the best results, achieving better detection and preventing missed and false detections.

(9) The model of the invention is implemented in C and CUDA, so on the same hardware platform and detection task it runs faster and more stably.

Description of the drawings

Fig. 1 is a flowchart of the present invention.

Fig. 2 is a conceptual diagram of the network design of the present invention.

Fig. 3 is a schematic diagram of convolution kernel separation in the present invention.

Fig. 4 is a structural diagram of the network design of the present invention.

Fig. 5 shows the model test results of the present invention.

Fig. 6 shows the detection result of the present invention for a single vehicle.

Fig. 7 shows the detection result of the present invention for multiple vehicles.

Detailed description of the embodiments

The present invention is described in further detail below with reference to an embodiment, but embodiments of the present invention are not limited thereto.

Embodiment:

As shown in Figs. 1-7, to overcome the drawbacks of the prior art, the present invention designs and builds the network model on the Darknet deep learning framework with an end-to-end, one-stage, non-cascaded structure. Through data augmentation, convolution kernel separation, multi-scale feature fusion, and related techniques, the network improves vehicle multi-attribute detection, achieving high detection precision and recall with good real-time performance.

Step 1: picture collection and screening;

Step 3: network design: based on the Darknet deep learning framework and the multi-attribute characteristics of vehicles, the network structure is designed in an end-to-end, one-stage, non-cascaded manner and the network model is built;

Step 6: model evaluation: the model's performance is assessed according to the test results.

To better implement the present invention, in Step 1 vehicle photos are obtained with surveillance cameras. As a preferred embodiment, residential-community surveillance cameras are used in order to obtain vehicle photos from actual scenes. After acquisition and screening, the vehicle photos cover 15 common vehicle brands and several vehicle types, such as sedans, SUVs, and MPVs: 3300 photos in total, about 220 per brand.

The LabelImg tool is used to produce the vehicle multi-attribute dataset in the standard VOC format for deep learning, and the dataset is divided into a training set and a test set at a ratio of 10:1, i.e., the training set contains 3000 vehicle multi-attribute samples and the test set contains 300.

First, three folders are created: Annotation, ImageSets, and JPEGImages, with a Main folder inside ImageSets. The logo picture directory and the .xml label file directory are set (directory names are in English), and the vehicle attribute tag names are defined (22 tags in total, all with English names: 15 logo tags, 2 license plate tags, and 5 vehicle type tags) and stored in predefined_classes.txt in the data folder of LabelImg-master. The vehicle photos obtained and screened in Step 1 are stored in the JPEGImages folder. LabelImg is then used to annotate multiple attributes on the vehicle photos. Of the sample picture names in the generated .xml files, one part is written to trainval.txt for training and the other part to test.txt for testing, and both files are stored in the Main folder. The ratio of picture names in trainval.txt to those in test.txt is 10:1: trainval.txt contains 3000 sample picture names and test.txt contains 300. The .xml files are stored in the Annotation folder.

Using the Darknet deep learning framework as the platform, the backbone network (used for feature extraction) is designed end to end according to the multi-attribute characteristics of vehicles. Preferably, the backbone contains 16 convolutional layers (each followed by a Batch Normalization layer and a corresponding activation layer) and 3 max pooling layers. The backbone consists of four blocks containing 1, 3, 5, and 7 convolutional layers respectively, and adjacent blocks are connected by one max pooling layer;

To reduce parameters and computation, convolution kernel separation is applied in every convolutional layer: a large kernel is split into a cascade of two or more small kernels, i.e., a kernel with Kernel_size=N*N is equivalently transformed into kernels with Kernel_size=n₁*n₁ and Kernel_size=n₂*n₂, where N*N > n₁*n₁ + n₂*n₂. Preferably, all convolutional layers of the invention alternate 1*1 and 3*3 kernels, which replace any kernel larger than 3*3;

The input picture is resized to a fixed 416*416*3. To suit the characteristics of the different vehicle attributes (logo, license plate, vehicle type), multi-scale feature fusion is used: three branches, namely the feature layers 13*13*1024 (No. 19 in Fig. 4), 13*13*256 (No. 21 in Fig. 4), and 13*13*256 (No. 23 in Fig. 4), are fused into a 13*13*1536 feature layer (No. 25 in Fig. 4). The last convolutional layer transforms the fused 13*13*1536 feature layer into an output of the corresponding detection dimension, 13*13*N, containing the softmax classification and localization results (N depends on the number of sample classes and related factors; in the present invention the output is 13*13*135, No. 25 in Fig. 4).

To reduce model complexity while improving accuracy, the invention adopts a one-stage, non-cascaded design and uses anchor boxes to predict class and coordinates simultaneously. The final feature map is divided into an S*S grid of cells (preferably 13*13 in the present invention), and each grid cell predicts B bounding boxes (5 in the present invention) and C class probabilities (22 in the present invention), giving a final output vector of dimension S*S*[B*(5+C)]. This S*S*[B*(5+C)] corresponds to the network output 13*13*135. The 5 denotes the 4 coordinates and 1 confidence of each box; the confidence is the IOU of the box when the grid cell contains a target: if the ground-truth box is A and the predicted box is B, then IOU = area(A ∩ B) / area(A ∪ B). For each bounding box, the class probability of its grid cell is multiplied by the box confidence to obtain the class-specific confidence score. Boxes with low confidence scores are filtered out first, and NMS (non-maximum suppression) is then applied to the remaining boxes to obtain the final detection result.

(1) First, the parameters (batch, subdivisions, momentum, decay, and the initial learning rate) are set as described above.

(2) After the parameters are set, network training starts. The input training samples pass through the data augmentation module at the front of the network, which applies color and illumination changes, angle rotation, and added noise interference to them. Besides enlarging the pool of training samples, this greatly improves the model's generalization and stability, better simulates the real state of vehicle photos in various actual scenes, and strengthens the model's resistance to environmental interference. The specific augmentation methods are operations (a) to (c) described above.

(3) Multi-scale training: because the network uses only convolutional and pooling layers (which adapt to the input size), the size of the sample pictures can be adjusted dynamically, giving the network model stronger generalization and robustness. Specifically, a new picture size is randomly selected every 10 training batches (10 batches). The network's downsampling factor is 32, so picture sizes are multiples of 32, from a minimum of 320*320 to a maximum of 608*608. The network is resized to the chosen dimensions and training continues. This mechanism lets the network predict well on pictures of different sizes, so the same network can perform detection at different resolutions.

(4) Loss function: in the loss function described above, W and H are the width and height of the feature map, A is the number of prior (anchor) boxes (preferably A=5), and λ denotes the weight coefficients. The first term is the confidence error of the background: the IOU between each predicted box and every ground-truth box is computed and the maximum, Max_IOU, is taken; if it is below the set threshold (0.5 in the preferred scheme), i.e. Max_IOU < 0.5, the predicted box is labeled as background and its noobj confidence error is computed. The second term is the coordinate error between the prior boxes and the predicted boxes; it is computed only during the first 12800 iterations, so that early in training the predicted boxes quickly learn the shapes of the prior boxes. The third term computes each part of the loss for a predicted box matched to a ground-truth box, including the coordinate error, confidence error, and classification error. If the ground-truth box is A and the predicted box is B, then IOU = area(A ∩ B) / area(A ∪ B). The loss is computed for each attribute as above and summed into the total loss, and the model's detection performance is judged from the loss value.

(5) training stops: utilizing SGD gradient updating strategy, based on the backpropagation principle for allowing loss function to minimize, allows Model is trained on the server, and when 140 epoch of iteration (iteration 11250 times), the loss value of loss function drops to Percentile after decimal point, and no longer change substantially, deconditioning, i.e., model at this time have been optimal models at this time.

Multi-scale testing is performed on the vehicle photos in the test set. Preferably, sizes are taken from the range 416*416 to 1024*1024 in steps of 32: all vehicle photos in the test set are randomly resized to one size from this range, and the resized photos are tested as one group. After each group is tested, a new picture size is randomly selected, and testing is repeated in this way to obtain the best detection effect and reduce missed and false detections. Finally, the size value of the group with the best test results, i.e., the group with the largest recall (Recall) and mean average precision (mAP), is selected, and the test size, metrics and results are recorded.
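The size search described above can be sketched as follows. The `evaluate` callback is hypothetical (it stands for resizing every test photo, running the detector and scoring the results), sampling sizes without repetition is an assumption, and so is ranking by mAP first and recall second, since the text asks for both to be largest without saying how to break a conflict.

```python
import random
from typing import Callable, Dict, Tuple

Metrics = Tuple[float, float]  # (recall, mAP)


def test_sizes(lo=416, hi=1024, step=32):
    """Square test sizes from 416*416 to 1024*1024 in steps of 32."""
    return list(range(lo, hi + 1, step))


def pick_best_size(evaluate: Callable[[int], Metrics], rounds: int, seed: int = 0):
    """Evaluate the whole test set at `rounds` randomly chosen sizes and return
    the best size together with all results.

    `evaluate(size)` is a hypothetical callback that resizes every test photo to
    size*size, runs detection and returns (recall, mAP)."""
    rng = random.Random(seed)
    sizes = test_sizes()
    chosen = rng.sample(sizes, k=min(rounds, len(sizes)))  # no size tested twice
    results: Dict[int, Metrics] = {size: evaluate(size) for size in chosen}
    # Assumption: rank by mAP first, then by recall.
    best = max(results, key=lambda s: (results[s][1], results[s][0]))
    return best, results
```

With these bounds there are 20 candidate sizes, and 640*640 (416 + 7 * 32) is one of them.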

According to the test results, the best photo size is 640*640, at which the recall (Recall) and the mean average precision (mAP) are both largest. The test set contains 300 vehicle photos in total (the attribute classes are evenly represented, and the photos are numbered from 0). The test results are shown in Figure 5: Recall = 96.10% and mAP = 90.4%.

The above are only preferred embodiments of the present invention and do not limit the invention in any form. Any simple modification, equivalent variation or alteration made to the above embodiments in accordance with the technical essence of the invention falls within the protection scope of the invention.

Claims

1. A vehicle multi-attribute detection method based on single-network multi-task learning, characterized in that the method comprises:

Step 1: picture collection and screening;

Step 2: data set production;

Step 3: network design: based on the Darknet deep learning framework and according to the multi-attribute characteristics of vehicles, the network structure is designed in an end-to-end, one-stage, non-cascaded manner, and the network model is built;

Step 4: model training: the model parameters are set and adjusted, the designed network model is trained on the vehicle multi-attribute data set, and data augmentation and multi-scale training are performed during training;

Step 5: model testing;

Step 6: model evaluation.

2. The vehicle multi-attribute detection method based on single-network multi-task learning according to claim 1, characterized in that in step 1, surveillance cameras are used to acquire vehicle photos in real-world scenes.

3. The vehicle multi-attribute detection method based on single-network multi-task learning according to claim 1 or 2, characterized in that the acquired vehicle photos are manually pre-screened, and photos in which the background region around the vehicle is large or the vehicle attributes are severely blurred are removed.

4. The vehicle multi-attribute detection method based on single-network multi-task learning according to claim 1, characterized in that step 2 is implemented as follows:

The LabelImg tool is used to build the vehicle multi-attribute data set in the standard VOC data set format for deep learning, and the data set is divided into a training set and a test set in a 10:1 ratio.
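A 10:1 split like the one in this claim can be sketched as follows. Shuffling with a fixed seed and rounding the test share to the nearest whole photo are assumptions, not details given in the text, and the sample IDs in the usage line are hypothetical.

```python
import random


def split_10_to_1(names, seed=0):
    """Shuffle sample IDs and split them into (train, test) at a 10:1 ratio."""
    names = list(names)
    random.Random(seed).shuffle(names)
    n_test = round(len(names) / 11)  # 1 test part out of 10 + 1 parts
    return names[n_test:], names[:n_test]


train, test = split_10_to_1(f"{i:06d}" for i in range(1100))
print(len(train), len(test))  # 1000 100
```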

5. The vehicle multi-attribute detection method based on single-network multi-task learning according to claim 4, characterized in that the vehicle data set is produced as follows:

Create the Annotations, ImageSets and JPEGImages folders, with a Main folder inside the ImageSets folder; set the directory of pictures to be annotated and the directory of the .xml label files, and set the vehicle attribute label names; store the vehicle photos obtained and screened in step 1 in the JPEGImages folder; open the LabelImg tool and annotate the vehicle photos with multiple attributes; write the sample picture names from the generated .xml files into the trainval.txt and test.txt files, and store these two files in the Main folder.
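The folder layout in this claim can be prepared with a short script such as the one below. It is a sketch under stated assumptions: the three helper names are hypothetical, the .xml files are assumed to have been written into Annotations by LabelImg in Pascal VOC format, and the sample name is taken to be the .xml file name without its extension, as is usual for VOC image sets.

```python
from pathlib import Path


def make_voc_layout(root):
    """Create Annotations/, ImageSets/Main/ and JPEGImages/ under `root`."""
    root = Path(root)
    for sub in ("Annotations", "ImageSets/Main", "JPEGImages"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root


def annotated_ids(root):
    """Sample names taken from the .xml label files in Annotations/."""
    return sorted(p.stem for p in (Path(root) / "Annotations").glob("*.xml"))


def write_image_sets(root, trainval, test):
    """Write one sample name per line into ImageSets/Main/{trainval,test}.txt."""
    main = Path(root) / "ImageSets" / "Main"
    (main / "trainval.txt").write_text("".join(f"{n}\n" for n in trainval))
    (main / "test.txt").write_text("".join(f"{n}\n" for n in test))
```

The IDs returned by annotated_ids can be passed through any 10:1 split before write_image_sets is called.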

6. The vehicle multi-attribute detection method based on single-network multi-task learning according to claim 1, characterized in that step 3 is implemented as follows:

Using the Darknet deep learning framework as the platform and according to the multi-attribute characteristics of vehicles, an end-to-end backbone network is designed, and a Batch Normalization layer and a corresponding activation layer are added after every convolutional layer of the backbone. A convolution kernel separation technique is then used to split large convolution kernels into cascades of two or more small kernels. Following a one-stage, non-cascaded design, anchor boxes are used to predict classes and coordinates simultaneously, and the final network model is built, where an anchor box denotes a prediction box.
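The effect of splitting a large kernel into a cascade of small ones can be checked with simple arithmetic. The text does not say which factorization is used, so the sketch below compares two common forms (5*5 into two 3*3, and 3*3 into 3*1 followed by 1*3) for a hypothetical 64-channel layer. Biases are omitted because a Batch Normalization layer follows each convolution and supplies its own shift.

```python
def conv_weights(kh, kw, c_in, c_out):
    """Weight count of one convolution (no bias: BatchNorm follows it)."""
    return kh * kw * c_in * c_out


def cascade_weights(kernels, channels):
    """Weight count of a cascade of convolutions that keep `channels` channels."""
    return sum(conv_weights(kh, kw, channels, channels) for kh, kw in kernels)


def receptive_field(kernels):
    """Receptive field (height, width) of stacked stride-1 convolutions."""
    return (1 + sum(kh - 1 for kh, _ in kernels),
            1 + sum(kw - 1 for _, kw in kernels))


C = 64  # hypothetical channel count
for name, ks in [("5x5", [(5, 5)]), ("3x3 + 3x3", [(3, 3), (3, 3)]),
                 ("3x3", [(3, 3)]), ("3x1 + 1x3", [(3, 1), (1, 3)])]:
    print(f"{name:10s} weights={cascade_weights(ks, C):6d} rf={receptive_field(ks)}")
```

For 64 channels, one 5*5 layer has 102,400 weights while two stacked 3*3 layers have 73,728 with the same 5*5 receptive field; a 3*1 plus 1*3 pair has 24,576 weights versus 36,864 for a single 3*3. The cascade also inserts an extra Batch Normalization and activation layer between the small kernels.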

7. The vehicle multi-attribute detection method based on single-network multi-task learning according to claim 1, characterized in that step 4 is implemented as follows:

(1) Parameter setting:

The values of batch, subdivisions, momentum, decay and the initial learning rate are set, where batch denotes the batch size, subdivisions the number of sub-batches each batch is split into, momentum the weight-update (momentum) coefficient, and decay the weight-decay parameter; the number of samples actually fed to the network at each step during training is batch/subdivisions;

(2) After the parameters are set, a data augmentation module is added at the front of the network, and color and illumination changes, rotation and noise are applied to the training samples input to the network, specifically:

(a) Color and illumination: the saturation, exposure and hue of the sample pictures are adjusted, and new training samples are generated according to the set values;

(b) Rotation: the rotation angle of the sample pictures in the horizontal or vertical direction is set, and new training samples are generated according to the set values;

(c) Noise: random jitter noise is added to the sample pictures, and new training samples are generated according to the set values;

(3) Multi-scale training: every n training batches (i.e., n*batches), a new picture size is randomly selected, and training continues after the network is adjusted to the corresponding dimensions;

(4) The loss function is used to judge model training; it consists of two main modules, the classification error and the localization error, and takes the form:

where W and H denote the width and height of the feature map, A denotes the number of prior boxes, and λ denotes a weight coefficient;

(5) training stops: utilizing SGD gradient updating strategy, based on the backpropagation principle for allowing loss function to minimize, allows model It is trained, the percentile after loss value of loss function drops to decimal point, and no longer changes substantially on the server, at this time Deconditioning.

8. The vehicle multi-attribute detection method based on single-network multi-task learning according to claim 1, characterized in that step 5 is implemented as follows:

Multi-scale testing is performed on the vehicle photos in the test set: all vehicle photos in the test set are randomly resized, the resized photos are tested as one group, testing is repeated over multiple groups, the group size value with the best test results is selected, and the test metrics and results are recorded.

9. The vehicle multi-attribute detection method based on single-network multi-task learning according to claim 1, characterized in that step 6 is implemented as follows:

According to the test results, the recall, average precision and mean average precision are examined to assess the prediction performance of the model.
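The metrics named in this claim can be computed per attribute class as below. Matching a detection to a ground-truth box at IOU >= 0.5 and the all-point interpolated AP (as used in later PASCAL VOC evaluations) are assumptions, since the text does not state the matching rule or interpolation. Boxes are taken as (x1, y1, x2, y2), and all function names are hypothetical.

```python
import numpy as np


def iou(a, b):
    """IOU = intersection area / union area for boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def recall_and_ap(detections, ground_truth, iou_thresh=0.5):
    """Recall and average precision for one class.

    detections:   list of (image_id, score, box)
    ground_truth: dict image_id -> list of boxes
    """
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    tp = []
    for img, _, box in sorted(detections, key=lambda d: -d[1]):  # highest score first
        overlaps = [iou(box, g) for g in ground_truth.get(img, [])]
        j = int(np.argmax(overlaps)) if overlaps else -1
        hit = j >= 0 and overlaps[j] >= iou_thresh and not matched[img][j]
        if hit:
            matched[img][j] = True  # each ground-truth box can be matched once
        tp.append(1.0 if hit else 0.0)
    tp = np.array(tp)
    ctp, cfp = np.cumsum(tp), np.cumsum(1.0 - tp)
    rec = ctp / max(n_gt, 1)
    prec = ctp / np.maximum(ctp + cfp, np.finfo(float).eps)
    # All-point interpolation: make precision monotonically non-increasing.
    mrec = np.concatenate(([0.0], rec, [1.0]))
    mpre = np.concatenate(([0.0], prec, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    ap = float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))
    recall = float(rec[-1]) if len(rec) else 0.0
    return recall, ap


def mean_ap(aps):
    """mAP: the mean of the per-class average precisions."""
    return float(np.mean(aps))
```

Running recall_and_ap once per attribute class and passing the resulting AP values to mean_ap gives the mAP figure; duplicate detections of an already matched box count as false positives.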