CN109886147A - A vehicle multi-attribute detection method based on single-network multi-task learning - Google Patents
- Publication number
- CN109886147A (application number CN201910086525.4A)
- Authority
- CN
- China
- Prior art keywords
- vehicle
- network
- training
- model
- learning
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Image Analysis (AREA)
Abstract
The invention discloses a vehicle multi-attribute detection method based on single-network multi-task learning. The method comprises six steps: picture collection and screening; dataset production; network design, in which the network structure is designed on the Darknet deep learning framework in an end-to-end, one-stage, non-cascaded manner according to the characteristics of the vehicle's multiple attributes, and the network model is built; model training, in which model parameters are set and adjusted, the vehicle multi-attribute dataset is trained with the designed network model, and data augmentation and multi-scale training are applied during training; model testing; and model evaluation. The invention designs and builds the network model on the Darknet deep learning framework as an end-to-end, one-stage, non-cascaded structure. By using techniques such as data augmentation, convolution kernel separation, and multi-scale feature fusion, the network improves vehicle multi-attribute detection and offers good real-time performance while achieving relatively high detection precision and recall.
Description
Technical field
The present invention relates to the field of object detection in computer vision, and in particular to a vehicle multi-attribute detection method based on single-network multi-task learning.
Background
With continued economic development, the automobile has become people's most important means of transport. While it brings convenience, the resulting problems of road congestion and vehicle supervision are becoming increasingly serious. Intelligent transportation systems and vehicle monitoring systems are widely accepted by the public as part of the smart city, and are mainly applied to road traffic control, police criminal investigation, parking lot monitoring, intelligent management of residential communities, and so on. With the arrival of the information age, how to efficiently achieve real-time vehicle detection (i.e. locating and identifying vehicles) and accurate matching of people to vehicles is a problem that intelligent vehicle management urgently needs to solve.
Traditional vehicle identification methods rely mainly on license plate detection, but plates are subject to wear, occlusion, easy replacement, and lighting conditions, all of which hinder effective detection. Moreover, in criminal investigation, detecting the license plate alone is not enough to establish a vehicle's true identity. In this situation vehicle multi-attribute recognition becomes especially important: it can make up for the shortcomings of single-attribute recognition such as license plate recognition, and thereby further improve the reliability of intelligent transportation systems and vehicle management systems. Existing vehicle attribute detection techniques are mainly based on traditional image processing algorithms, which have low accuracy, a high miss rate, and poor real-time performance. In recent years, with the rapid development of deep learning, more and more techniques recognize vehicle attributes with neural networks, but research on multi-attribute recognition is still scarce. The accuracy, recall, and real-time performance of existing "vehicle multi-attribute recognition based on multi-task learning" techniques also remain unsatisfactory, so vehicle attributes cannot be detected accurately.
Application No. CN201610067290.0, "A vehicle multi-attribute joint analysis method based on deep learning", and Application No. CN201711107713.8, "A fine-grained vehicle multi-attribute recognition method based on convolutional neural networks", both introduce the internal supervision mechanism and weight-sharing strategy of multi-task learning into deep convolutional neural networks to achieve joint analysis of multiple vehicle attributes. However, first, the base network of both is a simple directly connected network, and neither considers the network's adaptability to the different scales of different attributes such as vehicle type and license plate. Second, both networks use relatively large convolution kernels, so the networks have too many parameters and overfit easily. Third, during training the pictures are not augmented (data augmentation) according to the actual scene, so the networks have poor robustness and weak generalization ability.
In summary, existing vehicle multi-attribute detection techniques have the following defects:
(1) They do not consider that different vehicle attributes appear at different scales.
(2) The convolution kernels used in the network are too large, so the network has too many parameters to train; computation increases and overfitting occurs easily.
(3) They do not consider that, in real scenes, vehicle photos are easily affected by factors such as resolution, rotation angle, saturation, exposure, and hue.
(4) The three defects above make vehicle multi-attribute detection complex, with low accuracy, a high miss rate, and poor real-time performance.
Summary of the invention
The purpose of the present invention is to provide a vehicle multi-attribute detection method based on single-network multi-task learning. The method designs and builds the network model on the Darknet deep learning framework, using an end-to-end, one-stage, non-cascaded structure. By using data augmentation, convolution kernel separation, and multi-scale feature fusion, the network improves vehicle multi-attribute detection and offers good real-time performance while achieving relatively high detection precision and recall.
The present invention is achieved through the following technical solution:
A vehicle multi-attribute detection method based on single-network multi-task learning, the method comprising:
Step 1: picture collection and screening;
Step 2: dataset production, in which the vehicle multi-attribute dataset is made according to the standard VOC dataset format;
Step 3: network design, in which, on the Darknet deep learning framework, the network structure is designed in an end-to-end, one-stage, non-cascaded manner according to the characteristics of the vehicle's multiple attributes, and the network model is built;
Step 4: model training, in which model parameters are set and adjusted, the vehicle multi-attribute dataset is trained with the designed network model, and data augmentation and multi-scale training are applied during training;
Step 5: model testing, in which the trained network model is used to test vehicle multi-attribute detection;
Step 6: model evaluation.
Further, to better implement the invention, Step 1 uses surveillance cameras to obtain vehicle photos. As a preferred scheme, surveillance cameras in residential communities are used, so that the vehicle photos come from real scenes. The photos obtained and screened cover 15 common brands across multiple vehicle types such as sedans, SUVs, and MPVs.
Further, to better implement the invention, the acquired vehicle photos undergo a manual preliminary screening that removes photos in which the background area is large or the vehicle attributes are seriously blurred.
Further, to better implement the invention, Step 2 is carried out as follows:
Using the LabelImg tool, the vehicle multi-attribute dataset is made according to the standard VOC dataset format used in deep learning, and is divided into a training set and a test set at a ratio of 10:1.
Further, to better implement the invention, the vehicle dataset is made as follows:
First create three folders, Annotation, ImageSets, and JPEGImages; the ImageSets folder contains a Main folder. Set the logo picture directory and the .xml label file directory, set the vehicle attribute label names, and store the vehicle photos obtained and screened in Step 1 in the JPEGImages folder. Open the LabelImg tool and annotate the multiple attributes on each vehicle photo. Store the sample picture names from the generated .xml files in trainval.txt and test.txt at a ratio of 10:1, then store trainval.txt and test.txt in the Main folder. The .xml files are stored in the Annotation folder.
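The folder layout and the 10:1 split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the fixed random seed, and the choice to shuffle before splitting are hypothetical, since the text specifies only the folder names, the two list files, and the ratio.

```python
import random
from pathlib import Path


def build_voc_layout(root):
    """Create the Annotation, ImageSets/Main and JPEGImages folders under root."""
    root = Path(root)
    for sub in ("Annotation", "ImageSets/Main", "JPEGImages"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root


def split_names(names, ratio=10, seed=0):
    """Split sample names trainval:test = ratio:1.

    Shuffling with a fixed seed is an assumption made for this sketch;
    the source only states the 10:1 ratio.
    """
    names = sorted(names)
    random.Random(seed).shuffle(names)
    n_test = len(names) // (ratio + 1)
    return names[n_test:], names[:n_test]


def write_split(root, trainval, test):
    """Write trainval.txt and test.txt into ImageSets/Main, one name per line."""
    main = Path(root) / "ImageSets" / "Main"
    (main / "trainval.txt").write_text("\n".join(trainval) + "\n")
    (main / "test.txt").write_text("\n".join(test) + "\n")
```

With 3300 sample names, `split_names` yields 3000 names for trainval.txt and 300 for test.txt, which matches the counts given in the embodiment.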
Further, to better implement the invention, Step 3 is carried out as follows:
Using the Darknet deep learning framework as the platform, the backbone network is designed in an end-to-end manner according to the characteristics of the vehicle's multiple attributes. As a preferred scheme, the backbone comprises 16 different convolutional layers (each followed by a Batch Normalization layer and a corresponding activation layer) and 3 different max-pooling layers. The backbone consists of four Blocks containing 1, 3, 5, and 7 different convolutional layers respectively, and each pair of adjacent Blocks is connected by one max-pooling layer;
To simulate the complexity of vehicle photos in real scenes and improve the model's generalization ability, a sample data augmentation module is placed before the backbone (after the training samples are input). It augments the sample data in three respects: color and illumination, rotation angle, and noise;
To reduce the number of parameters and the amount of computation, convolution kernel separation is applied in every convolutional layer: a large convolution kernel is split into a cascade of two or more small kernels. As a preferred scheme, the convolutional layers of the invention all use alternating 1*1 and 3*3 kernels in place of kernels larger than 3*3;
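The parameter saving from kernel separation can be checked with simple arithmetic. The sketch below counts the weights of a single k*k convolution (bias ignored) and compares one 5*5 layer with two stacked 3*3 layers, which cover the same 5*5 receptive field at stride 1, and one 3*3 layer with a 1*1 followed by a 3*3 layer. The channel count of 256 and the halving of channels in the 1*1 layer are assumptions chosen for illustration; they are not taken from the patent.

```python
def conv_params(kernel, c_in, c_out):
    """Number of weights in one kernel*kernel convolution layer (bias ignored)."""
    return kernel * kernel * c_in * c_out


channels = 256  # illustrative channel count, not from the source

# One large 5*5 kernel versus a cascade of two 3*3 kernels.
single_5x5 = conv_params(5, channels, channels)
two_3x3 = 2 * conv_params(3, channels, channels)

# One 3*3 layer versus a 1*1 layer (halving the channels) followed by a 3*3 layer.
single_3x3 = conv_params(3, channels, channels)
one_then_three = (conv_params(1, channels, channels // 2)
                  + conv_params(3, channels // 2, channels))

print(single_5x5, two_3x3)         # 1638400 1179648
print(single_3x3, one_then_three)  # 589824 327680
```

Per input-output channel pair this is the inequality 5*5 = 25 > 3*3 + 3*3 = 18, the same form as the condition N*N > n1*n1 + n2*n2 stated in the embodiment.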
The input picture is resized to a fixed 416*416*3. To suit the characteristics of the vehicle's different attributes (i.e. logo, license plate, vehicle type), multi-scale feature fusion is used: the three branches with feature layers 13*13*1024 (No. 19 in Fig. 4), 13*13*256 (No. 21 in Fig. 4), and 13*13*256 (No. 23 in Fig. 4) are fused into a 13*13*1536 feature layer (No. 25 in Fig. 4). The fused 13*13*1536 feature layer is transformed by the last convolutional layer into the corresponding detection output (containing the softmax classification and localization results) of dimension 13*13*N (N depends on the number of sample classes, among other things; in this invention N is 135, as at No. 25 in Fig. 4);
To effectively reduce model complexity while improving accuracy, the invention uses a one-stage non-cascaded design and predicts class and coordinates simultaneously with prediction boxes (anchor boxes). The final feature map is divided into an S*S grid of cells (grid cells), preferably 13*13. Each grid cell predicts B bounding boxes and C class attributes, and the final output is a vector of dimension S*S*[B*(5+C)]. This S*S*[B*(5+C)] corresponds to the network output 13*13*N. The 5 stands for the 4 coordinates and 1 confidence of each box, where the confidence is the IOU when the cell contains a target: if the ground-truth box is A and the prediction box (anchor box) is B, then IOU = (A ∩ B)/(A ∪ B). For each bounding box, the class probability of the corresponding grid cell is multiplied by the box confidence to give the confidence score for that class. Boxes with low confidence scores are filtered out first, then NMS (non-maximum suppression) is applied to the remaining boxes to obtain the final detection result.
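The output dimension and the IOU definition above can be verified with a short sketch. It uses the values B = 5 and C = 22 given in the embodiment; the corner-coordinate box format (x_min, y_min, x_max, y_max) and the function names are assumptions made for illustration.

```python
def output_channels(num_boxes, num_classes):
    """Channels per grid cell: B * (4 coordinates + 1 confidence + C classes)."""
    return num_boxes * (5 + num_classes)


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(output_channels(5, 22))         # 135
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.1429
```

With B = 5 and C = 22, each of the 13*13 cells carries 5 * (5 + 22) = 135 values, which agrees with N = 135.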
Further, to better implement the invention, Step 4 is carried out as follows:
(1) First set the parameters:
Set the values of batch, subdivisions, momentum, decay, and the initial learning rate. Here batch is the batch size, subdivisions is the number of sub-batches, momentum is the weight update coefficient, and decay is the weight decay parameter. Parameters are updated once per batch of samples, but the batch is divided into subdivisions sub-batches, so the number of samples actually fed in at one time during training is batch/subdivisions. This effectively relieves GPU computation pressure and prevents memory overflow. As a preferred scheme, set batch=32 and subdivisions=8, so that the number of samples fed in at one time is batch/subdivisions=4. Set the weight update coefficient momentum=0.9 and the weight decay parameter decay=0.0005 to adjust the influence of model complexity on the loss function and prevent overfitting. Set the initial learning rate to 0.001. When training reaches the 100th and the 130th epoch (one epoch is one pass over all training samples), the learning rate is changed to 0.1 times and 0.01 times its original value respectively, to speed up convergence toward the global optimum. Training stops after 140 epochs in total.
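The stepped learning rate and the sub-batch size follow directly from the values above and can be written as a small sketch. Two details are assumptions: epochs are counted from 1, and each new rate takes effect from the stated epoch onward. The function names are hypothetical.

```python
def learning_rate(epoch, base_lr=0.001):
    """Stepped schedule: base rate, then x0.1 from epoch 100, x0.01 from epoch 130.

    Epochs are counted from 1 (an assumption of this sketch).
    """
    if epoch >= 130:
        return base_lr * 0.01
    if epoch >= 100:
        return base_lr * 0.1
    return base_lr


def samples_per_pass(batch=32, subdivisions=8):
    """Samples fed to the GPU at one time; weights update once per full batch."""
    return batch // subdivisions


for epoch in (1, 99, 100, 129, 130, 140):
    print(epoch, learning_rate(epoch))
print(samples_per_pass())  # 4
```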
(2) After the parameters are set, network training begins. Input training samples enter the data augmentation module added at the front of the network, which applies operations such as color and illumination changes, angle rotation, and added noise to the training samples fed to the network. Specifically:
(a) Color and illumination: adjust the saturation, exposure, and hue of the sample pictures and generate new training samples according to the set values. This enlarges the training set and markedly improves the model's detection of vehicle photos with different saturation, exposure, and hue, strengthening the model's robustness;
(b) Angle rotation: set the rotation angle of the sample pictures in the horizontal or vertical direction and generate new training samples according to the set values, so that the model adapts to detecting targets at multiple angles and better simulates the real state of vehicle photos in real scenes;
(c) Noise: add random jitter noise to the sample pictures and generate new training samples according to the set values, so that the model copes better with interference from the external environment, preventing overfitting while strengthening the model's generalization ability.
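As a rough illustration of item (a), the sketch below jitters the hue, saturation, and exposure of a single RGB pixel using the standard-library `colorsys` module. It is not the patent's augmentation module: the per-pixel formulation, the function names, and the limit values (hue 0.1, saturation 1.5, exposure 1.5) are all assumptions chosen for the example. Rotation and noise are omitted.

```python
import colorsys
import random


def jitter_pixel(rgb, hue_shift, sat_scale, exp_scale):
    """Shift hue and scale saturation/exposure of one RGB pixel with values in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    h = (h + hue_shift) % 1.0
    s = min(1.0, s * sat_scale)
    v = min(1.0, v * exp_scale)
    return colorsys.hsv_to_rgb(h, s, v)


def random_jitter_params(rng, hue=0.1, saturation=1.5, exposure=1.5):
    """Draw one (hue_shift, sat_scale, exp_scale) triple.

    The limits are illustrative assumptions, not values from the source.
    Scales are drawn in [1/limit, limit].
    """
    def scale(limit):
        s = rng.uniform(1.0, limit)
        return s if rng.random() < 0.5 else 1.0 / s

    return rng.uniform(-hue, hue), scale(saturation), scale(exposure)


rng = random.Random(0)
params = random_jitter_params(rng)
new_pixel = jitter_pixel((0.8, 0.2, 0.2), *params)
```

Applying the same parameter triple to every pixel of a picture yields one new training sample; drawing a fresh triple per picture gives the variation in saturation, exposure, and hue that the text describes.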
(3) During iterative training, the model is trained at multiple scales:
Because the network uses only convolutional and pooling layers (which adapt to the input size), the size of the sample pictures can be adjusted dynamically, which gives the network model stronger generalization ability and robustness. The procedure is as follows: after every 10 batches of training, a new picture size is selected at random. The network's downsampling factor is 32, so picture sizes are multiples of 32, with a minimum of 320*320 and a maximum of 608*608. The network is adjusted to the corresponding dimensions and training continues. This mechanism lets the network predict pictures of different sizes better, and the same network can carry out detection tasks at different resolutions.
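The size selection for multi-scale training can be sketched as a generator that yields the input size for each batch. The starting size of 416 (the fixed input size mentioned for the network) and the function names are assumptions; the range 320 to 608, the step of 32, and the interval of 10 batches come from the text.

```python
import random


def candidate_sizes(min_size=320, max_size=608, stride=32):
    """All square input sizes that are multiples of the downsampling factor."""
    return list(range(min_size, max_size + 1, stride))


def size_schedule(num_batches, rng, every=10, start=416):
    """Yield the input size for each batch, re-drawn at random every `every` batches.

    Starting at 416 is an assumption of this sketch.
    """
    sizes = candidate_sizes()
    size = start
    for batch_idx in range(num_batches):
        if batch_idx > 0 and batch_idx % every == 0:
            size = rng.choice(sizes)
        yield size


print(candidate_sizes())
# [320, 352, 384, 416, 448, 480, 512, 544, 576, 608]
schedule = list(size_schedule(30, random.Random(0)))
```

There are ten candidate sizes, and the final feature map ranges from 10*10 (for 320) to 19*19 (for 608) because each size is divided by 32.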
(4) The loss function is used to judge how training is progressing. It comprises two main parts, classification error and localization error, with different weight coefficients set according to the balance of the sample set and the size of each part's influence. The loss function used has three terms. Here W and H denote the width and height of the feature map, A denotes the number of prior boxes (preferably A=5), and λ denotes the weight coefficients. The first term computes the confidence error of the background: the IOU between each prediction box (anchor box) and all ground-truth boxes is computed and the maximum, Max_IOU, is taken; if this value is below a threshold (preferably 0.5), the prediction box is marked as background and the noobj confidence error is computed for it. The second term computes the coordinate error between the prior boxes and the prediction boxes, but only during the first 12800 iterations, so that early in training the prediction boxes quickly learn the shape of the prior boxes. The third term computes the loss components of the prediction box matched to a given ground-truth box, including coordinate error, confidence error, and classification error. If the ground-truth box is A and the prediction box (anchor box) is B, then IOU = (A ∩ B)/(A ∪ B). Each attribute is computed according to the above loss, and the sum is the total loss, which is used to judge the model's performance.
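The equation itself is not present in this text. A loss with the three terms described above, written in the notation commonly used for YOLOv2-style detectors, might look like the following. This is a reconstruction from the prose and not the patent's own formula: the symbols b, prior, and truth, the indicator notation, and the individual subscripts on λ are assumptions.

```latex
\begin{aligned}
\mathrm{loss}_t = \sum_{i=0}^{W}\sum_{j=0}^{H}\sum_{k=0}^{A}\Big[\;
 & \mathbb{1}_{\mathrm{Max\_IOU} < \mathrm{Thresh}}\;\lambda_{\mathrm{noobj}}\,\big(-b^{o}_{ijk}\big)^{2} \\
 +\; & \mathbb{1}_{t < 12800}\;\lambda_{\mathrm{prior}} \sum_{r \in (x,y,w,h)} \big(\mathrm{prior}^{r}_{k} - b^{r}_{ijk}\big)^{2} \\
 +\; & \mathbb{1}^{\mathrm{truth}}_{k}\Big(
      \lambda_{\mathrm{coord}} \sum_{r \in (x,y,w,h)} \big(\mathrm{truth}^{r} - b^{r}_{ijk}\big)^{2}
      + \lambda_{\mathrm{obj}}\,\big(\mathrm{IOU}^{k}_{\mathrm{truth}} - b^{o}_{ijk}\big)^{2}
      + \lambda_{\mathrm{class}} \sum_{c=1}^{C} \big(\mathrm{truth}^{c} - b^{c}_{ijk}\big)^{2}
   \Big)\Big]
\end{aligned}
```

In this sketch b^o, b^r, and b^c stand for the predicted confidence, coordinates, and class scores of box k in cell (i, j), t is the iteration count, and Thresh is the 0.5 threshold. The first line corresponds to the background confidence term, the second to the prior-box term active during the first 12800 iterations, and the third to the coordinate, confidence, and classification errors of the matched box.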
(5) Stopping training: using the SGD gradient update strategy and back-propagation to minimize the loss function, the model is trained on a server. When the loss value has fallen to the order of hundredths (the second decimal place) and essentially no longer changes, training is stopped; at this point the model is considered to have reached its optimum.
Further, to better implement the invention, Step 5 is carried out as follows:
The vehicle photos in the test set are tested at multiple scales. As a preferred scheme, within the range 416*416 to 1024*1024 and in steps of 32, all vehicle photos in the test set are resized to a randomly selected size, and the photos from each resize are tested as one group. After each group is tested, a new picture size is selected at random. Testing is repeated in this way to reach the best detection performance and to prevent missed and false detections. Finally, the size that gives the best test results is chosen, i.e. the group with the highest recall (Recall) and mean average precision (mean Average Precision), and the test size, metrics, and results are recorded.
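The selection of the best test size can be sketched as below. The list of candidate sizes follows from the stated range and step. How recall and mean average precision are combined when they disagree is not specified in the text, so ranking by mAP first and recall second is an assumption, as are the function names.

```python
def candidate_test_sizes(min_size=416, max_size=1024, stride=32):
    """Square test sizes from 416 to 1024 in steps of 32."""
    return list(range(min_size, max_size + 1, stride))


def pick_best_size(results):
    """Pick the size with the best metrics.

    results maps size -> (recall, mean_average_precision).
    Ranking by mAP first and recall second is an assumption of this sketch.
    """
    return max(results, key=lambda size: (results[size][1], results[size][0]))


print(len(candidate_test_sizes()))  # 20
```

In use, each randomly chosen size would be evaluated on the whole test set and its metrics stored in `results` before calling `pick_best_size`.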
Further, to better implement the invention, Step 6 is carried out as follows:
Based on the test results, recall (Recall), average precision (Average Precision), and mean average precision (mean Average Precision) are examined to assess the model's prediction performance.
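The three metrics can be computed from ranked detections as in the sketch below. The text does not say which definition of average precision is used; this example assumes the area under the precision-recall curve with a monotonically non-increasing precision envelope, and it assumes each detection has already been marked as a true or false positive. The function names are hypothetical.

```python
def average_precision(scored_hits, num_gt):
    """Area under the precision-recall curve for one class.

    scored_hits: list of (score, is_true_positive) for every detection.
    num_gt: number of ground-truth objects of this class.
    The all-point interpolated AP used here is an assumption of this sketch.
    """
    hits = sorted(scored_hits, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in hits:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for recall, precision in zip(recalls, precisions):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def mean_average_precision(ap_per_class):
    """Mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)
```

Recall for a class is the final value of tp / num_gt after all detections have been counted; mean average precision averages the AP over all label classes.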
Compared with the prior art, the present invention has the following beneficial effects:
(1) The design of the invention is an end-to-end, one-stage, non-cascaded structure. The end-to-end design removes all preprocessing stages before network training and reduces model complexity. The one-stage non-cascaded design uses anchor boxes directly to predict class and coordinates simultaneously, without generating candidate boxes through a sliding window, which effectively reduces the model's computation. Together the two directly improve the real-time performance of detection and indirectly improve recall and precision.
(2) The invention uses multi-scale feature fusion and designs four different convolution blocks (Blocks). Based on the different receptive field sizes of different feature layers and their suitability for detecting targets of different sizes, multi-scale features are fused so that the network's detection performance is more robust across target sizes.
(3) The invention uses convolution kernel separation, splitting a large convolution kernel into a cascade of two or more small kernels. Without changing the output dimensions, this technique on the one hand moderately deepens the network, improving the model's learning capacity and results, and on the other hand reduces parameters and computation while avoiding overfitting.
(4) The invention uses data augmentation. During network training, the horizontal or vertical rotation angle, saturation, exposure, hue, and noise of training samples can be adjusted automatically and at random. The new samples both enlarge the training set and simulate real scenes more fully, which strengthens the model's robustness and stability.
(5) The invention divides each batch into subdivisions sub-batches, which effectively relieves GPU computation pressure and prevents memory overflow.
(6) By setting the weight update coefficient (momentum) and weight decay (decay), the invention adjusts the influence of model complexity on the loss function, which prevents overfitting while driving the model to converge faster toward the global optimum.
(7) By setting a stepped learning rate policy, the invention adjusts the learning rate when training reaches different numbers of epochs, which speeds up the network's global convergence.
(8) The invention uses multi-scale training and multi-scale testing. Because the network uses only convolutional and pooling layers (which adapt to the input size), the size of the input picture can be adjusted freely. After every n batches of training a new picture size is selected at random, the network is adjusted to the corresponding dimensions, and training continues. This mechanism lets the network predict logo pictures of different sizes better and reduces the miss rate and the false detection rate. Model testing follows a similar idea: the input size to which test photos are resized that gives the best test results can be found, which achieves better detection and prevents missed and false detections.
(9) The model of the invention is implemented in C and CUDA; on the same hardware platform and detection task it executes faster and more stably.
Brief description of the drawings
Fig. 1 is the flow chart of the invention.
Fig. 2 is a conceptual diagram of the network design of the invention.
Fig. 3 is a schematic diagram of convolution kernel separation in the invention.
Fig. 4 is the structure diagram of the network design of the invention.
Fig. 5 shows the model test results of the invention.
Fig. 6 shows the detection result of the invention for a single vehicle.
Fig. 7 shows the detection result of the invention for multiple vehicles.
Detailed description of the embodiments
The present invention is described in further detail below with reference to an embodiment, but embodiments of the present invention are not limited to it.
Embodiment:
As shown in Figs. 1-7, to overcome the defects of the prior art, the present invention designs and builds the network model on the Darknet deep learning framework, using an end-to-end, one-stage, non-cascaded structure. By using techniques such as data augmentation, convolution kernel separation, and multi-scale feature fusion, the network improves vehicle multi-attribute detection and offers good real-time performance while achieving relatively high detection precision and recall.
A vehicle multi-attribute detection method based on single-network multi-task learning, the method comprising:
Step 1: picture collection and screening;
Step 2: dataset production, in which the vehicle multi-attribute dataset is made according to the standard VOC dataset format;
Step 3: network design, in which, on the Darknet deep learning framework, the network structure is designed in an end-to-end, one-stage, non-cascaded manner according to the characteristics of the vehicle's multiple attributes, and the network model is built;
Step 4: model training, in which model parameters are set and adjusted, the vehicle multi-attribute dataset is trained with the designed network model, and data augmentation and multi-scale training are applied during training;
Step 5: model testing, in which the trained network model is used to test vehicle multi-attribute detection;
Step 6: model evaluation, in which the model's performance is assessed from the test results.
Further, to better implement the invention, Step 1 uses surveillance cameras to obtain vehicle photos. As a preferred scheme, surveillance cameras in residential communities are used, so that the vehicle photos come from real scenes. The photos obtained and screened cover 15 common brands across multiple vehicle types such as sedans, SUVs, and MPVs, 3300 photos in total, about 220 of each kind.
Further, to better implement the invention, the acquired vehicle photos undergo a manual preliminary screening that removes photos in which the background area is large or the vehicle attributes are seriously blurred.
Further, to better implement the invention, Step 2 is carried out as follows:
Using the LabelImg tool, the vehicle multi-attribute dataset is made according to the standard VOC dataset format used in deep learning, and is divided into a training set and a test set at a ratio of 10:1; that is, the training set contains 3000 vehicle multi-attribute samples and the test set contains 300.
Further, to better implement the invention, the vehicle dataset is made as follows:
First create three folders, Annotation, ImageSets, and JPEGImages; the ImageSets folder contains a Main folder. Set the logo picture directory and the .xml label file directory (the directories have English names). Set the vehicle attribute label names (22 label names in total, all in English: 15 logo labels, 2 license plate labels, and 5 vehicle type labels) and store them in the predefined_classes.txt file under the data folder of LabelImg-master. Store the vehicle photos obtained and screened in Step 1 in the JPEGImages folder. Open the LabelImg tool and annotate the multiple attributes on each vehicle photo. Of the sample picture names in the generated .xml files, one part is stored in trainval.txt for training and the other part in test.txt for testing; trainval.txt and test.txt are stored in the Main folder. The ratio of the number of picture names in trainval.txt to the number in test.txt is 10:1, i.e. trainval.txt holds 3000 sample picture names and test.txt holds 300. The .xml files are stored in the Annotation folder.
Further, to better implement the invention, Step 3 is carried out as follows:
Using the Darknet deep learning framework as the platform, the backbone network (used for feature extraction) is designed with an end-to-end structure according to the characteristics of the vehicle's multiple attributes. As a preferred scheme, the backbone comprises 16 different convolutional layers (each followed by a Batch Normalization layer and a corresponding activation layer) and 3 different max-pooling layers. The backbone consists of four Blocks containing 1, 3, 5, and 7 different convolutional layers respectively, and each pair of adjacent Blocks is connected by one max-pooling layer;
To simulate the complexity of vehicle photos in real scenes and improve the model's generalization ability, a sample data augmentation module is placed before the backbone (after the training samples are input). It augments the sample data in three respects: color and illumination, rotation angle, and noise;
To reduce the number of parameters and the amount of computation, convolution kernel separation is applied in every convolutional layer: a large convolution kernel is split into a cascade of two or more small kernels (that is, Kernel_size=N*N is equivalently replaced by Kernel_size=n1*n1 and Kernel_size=n2*n2, where N*N > n1*n1+n2*n2). As a preferred scheme, the convolutional layers of the invention all use alternating 1*1 and 3*3 kernels in place of kernels larger than 3*3;
The input picture is resized to a fixed 416*416*3. To suit the characteristics of the vehicle's different attributes (i.e. logo, license plate, vehicle type), multi-scale feature fusion is used: the three branches with feature layers 13*13*1024 (No. 19 in Fig. 4), 13*13*256 (No. 21 in Fig. 4), and 13*13*256 (No. 23 in Fig. 4) are fused into a 13*13*1536 feature layer (No. 25 in Fig. 4). The fused 13*13*1536 feature layer is transformed by the last convolutional layer into the corresponding detection output (containing the softmax classification and localization results) of dimension 13*13*N (N depends on the number of sample classes, among other things; in this invention the output is 13*13*135, as at No. 25 in Fig. 4).
To reduce model complexity while improving accuracy, the present invention adopts a one-stage, non-cascaded design that predicts class and coordinates simultaneously using prediction boxes (anchor boxes). The final feature map is divided into an S*S grid of cells (grid cells), preferably 13*13 in the present invention. Each grid cell predicts B bounding boxes (B=5 in the present invention) and C class attributes (C=22 in the present invention), so the final output is an S*S*[B*(5+C)]-dimensional vector, which corresponds to the network output 13*13*135.
Here 5 denotes the 4 coordinates and 1 confidence of each box; the confidence is the IOU of the grid cell given that it contains a target. If the ground-truth box is A and the prediction box (anchor box) is B, then IOU = (A ∩ B)/(A ∪ B).
For each bounding box, the class probability of the corresponding grid cell is multiplied by the box confidence to obtain the class confidence score. Boxes with low confidence scores are filtered out first, and NMS (non-maximum suppression) is then applied to the remaining boxes to obtain the final detection result.
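The post-processing described above (IOU, class confidence score, score filtering, then NMS) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patent's implementation: boxes are taken as corner coordinates (x_min, y_min, x_max, y_max), NMS is the usual greedy variant applied to one class at a time, and the thresholds 0.3 and 0.5 are illustrative values that the description does not specify.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed layout


def output_depth(num_boxes: int, num_classes: int) -> int:
    """Values predicted per grid cell: B * (5 + C)."""
    return num_boxes * (5 + num_classes)


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float],
        score_thresh: float = 0.3, iou_thresh: float = 0.5) -> List[int]:
    """Filter low scores, then greedily keep the best box and drop its overlaps."""
    order = sorted((i for i, s in enumerate(scores) if s >= score_thresh),
                   key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in kept):
            kept.append(i)
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (0, 0, 5, 5)]
class_prob = [0.9, 0.8, 0.7, 0.2]
box_conf = [0.9, 0.9, 0.8, 0.5]
# Class confidence score = class probability * box confidence.
scores = [p * c for p, c in zip(class_prob, box_conf)]
kept = nms(boxes, scores)  # indices of the boxes that survive
```

In this example the second box overlaps the first with an IOU of 81/119 and is suppressed, and the fourth box falls below the score threshold, so indices 0 and 2 remain.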
Further, to better implement the present invention, step 4 is specifically implemented as follows:
(1) First, set the parameters:
Set the values of batch, subdivisions, momentum, decay and the initial learning rate. Here batch denotes the batch size, subdivisions denotes the number of sub-batches, momentum denotes the weight update coefficient, and decay denotes the weight decay parameter. The parameters are updated once per batch of samples, but each batch is divided into subdivisions sub-batches, so the number of samples actually fed in at a time during training is batch/subdivisions; this reduces the computational load on the GPU and prevents memory overflow.
Preferably, batch=32 and subdivisions=8, i.e. the number of samples fed in at a time during actual training is batch/subdivisions=4. The weight update coefficient is set to momentum=0.9 and the weight decay parameter to decay=0.0005, which adjusts the influence of model complexity on the loss function and prevents over-fitting. The initial learning rate is set to 0.001. When training reaches the 100th and the 130th epoch (one epoch being one pass over all training samples), the learning rate is changed to 0.1 times and 0.01 times its original value respectively, to speed up convergence of the network toward the global optimum. Training stops after 140 epochs in total.
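The preferred settings can be written out as a small sketch. The function names are hypothetical and the code does not call Darknet; it only reproduces the arithmetic stated above. Counting epochs from 0, so that the first reduction applies from epoch index 100 onward, is an assumption.

```python
from typing import Sequence


def samples_per_pass(batch: int, subdivisions: int) -> int:
    """Samples fed to the network at a time: batch / subdivisions."""
    return batch // subdivisions


def learning_rate(epoch: int, base_lr: float = 0.001,
                  milestones: Sequence[int] = (100, 130),
                  factor: float = 0.1) -> float:
    """Step schedule: multiply by `factor` once for every milestone reached."""
    reached = sum(1 for m in milestones if epoch >= m)
    return base_lr * factor ** reached


config = {"batch": 32, "subdivisions": 8, "momentum": 0.9, "decay": 0.0005}
per_pass = samples_per_pass(config["batch"], config["subdivisions"])  # 4
rates = [learning_rate(e) for e in (0, 99, 100, 129, 130, 139)]
```

With these values the rate is 0.001 up to epoch 99, 0.1 times that from epoch 100, and 0.01 times the original from epoch 130 until training stops at 140 epochs.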
(2) After the parameters are set, network training begins. Input training samples first enter the data enhancement processing module at the front end of the network. This module applies color and illumination changes, angle rotation, added noise and similar operations to the training samples fed into the network. This increases the number of training samples and at the same time improves the generalization ability and stability of the model, so that the real state of vehicle photos in various actual scenes is better simulated and the model's resistance to environmental interference is strengthened. The specific data enhancement methods are:
(a) Color and illumination: adjust the saturation, exposure and hue of the sample pictures and generate new training samples according to the set values. This enlarges the training set and improves the model's detection performance on vehicle photos of differing saturation, exposure and hue, enhancing the robustness of the model;
(b) Angle rotation: set the rotation angle of the sample pictures in the horizontal or vertical direction and generate new training samples according to the set values, so that the model adapts to detecting sample targets at multiple angles and better simulates the real state of vehicle photos in actual scenes;
(c) Noise interference: add random jitter noise to the sample pictures and generate new training samples according to the set values, so that the model copes better with interference from the external environment, which both prevents over-fitting and enhances the generalization ability of the model.
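Two of the three enhancement operations, (a) and (c), can be illustrated on a handful of pixels with the Python standard library. This is a sketch under assumptions rather than the module described above: the function names, the HSV-based jitter, the Gaussian noise model and the numeric settings are all illustrative choices, and rotation (b) is omitted because it needs a full image grid.

```python
import colorsys
import random
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # RGB, each channel in [0, 1]


def jitter_color(pixels: List[Pixel], hue_shift: float,
                 sat_scale: float, exp_scale: float) -> List[Pixel]:
    """Shift hue and scale saturation and exposure (HSV value) of each pixel."""
    out: List[Pixel] = []
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        h = (h + hue_shift) % 1.0
        s = min(1.0, s * sat_scale)
        v = min(1.0, v * exp_scale)
        out.append(colorsys.hsv_to_rgb(h, s, v))
    return out


def add_noise(pixels: List[Pixel], sigma: float, rng: random.Random) -> List[Pixel]:
    """Add zero-mean Gaussian noise to every channel and clip to [0, 1]."""
    noisy: List[Pixel] = []
    for r, g, b in pixels:
        nr, ng, nb = (min(1.0, max(0.0, c + rng.gauss(0.0, sigma))) for c in (r, g, b))
        noisy.append((nr, ng, nb))
    return noisy


rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = [(1.0, 0.0, 0.0), (0.2, 0.4, 0.6)]
darker = jitter_color(sample, hue_shift=0.0, sat_scale=1.0, exp_scale=0.5)
noisy = add_noise(sample, sigma=0.05, rng=rng)
```

Each call returns a new list and leaves the input untouched, which mirrors the idea of generating additional training samples from an original picture.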
(3) During iterative training, multi-scale training is applied to the model:
Because this network uses only convolutional layers and pooling layers (whose output size follows the input size), the size of the sample pictures can be adjusted dynamically, which gives the network model stronger generalization ability and robustness. The specific operation is: after every 10 training batches (i.e. 10 batches), a new picture size is selected at random. The downsampling factor of the network is 32, so picture sizes are multiples of 32, with a minimum of 320*320 and a maximum of 608*608. The network is adjusted to the corresponding dimensions and training continues. This mechanism lets the network predict better on pictures of different sizes, so the same network can perform detection tasks at different resolutions.
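The size selection rule can be sketched as follows. The function names are hypothetical, and starting from 416 for the first 10 batches is an assumption based on the fixed input size mentioned earlier in the description; the range 320 to 608, the stride of 32 and the interval of 10 batches come from the text.

```python
import random
from typing import List, Optional


def candidate_sizes(min_size: int = 320, max_size: int = 608,
                    stride: int = 32) -> List[int]:
    """All square input sizes that are multiples of the downsampling factor."""
    return list(range(min_size, max_size + 1, stride))


def size_schedule(num_batches: int, every: int = 10, start: int = 416,
                  rng: Optional[random.Random] = None) -> List[int]:
    """Input size used for each batch; a new size is drawn every `every` batches."""
    rng = rng or random.Random()
    sizes = candidate_sizes()
    current = start
    schedule: List[int] = []
    for batch_index in range(num_batches):
        if batch_index > 0 and batch_index % every == 0:
            current = rng.choice(sizes)
        schedule.append(current)
    return schedule


sizes = candidate_sizes()                           # 320, 352, ..., 608
schedule = size_schedule(30, rng=random.Random(0))  # 3 groups of 10 batches
```

Because 32 divides every candidate size, the final feature map always has an integer side length, from 10*10 at 320 up to 19*19 at 608.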
(4) The loss function is used to judge the training of the model. The loss function comprises two major modules, classification error and localization error, and different weight coefficients are set according to the balance of the sample set and the size of each influence. The loss function used is:
[formula image not reproduced]
where W and H denote the width and height of the feature map respectively, A denotes the number of prior boxes (preferably A=5), and λ denotes a weight coefficient.
The first term computes the confidence error of the background. It first requires computing the IOU of each prediction box (anchor box) with all ground-truth boxes and taking the maximum, Max_IOU. If this value is below the set threshold (preferably 0.5), i.e. if Max_IOU is less than 0.5, the prediction box is marked as background and its noobj confidence error is computed. The second term computes the coordinate error between the prior boxes and the prediction boxes, but only during the first 12800 iterations; its purpose is to let the prediction boxes quickly learn the shape of the prior boxes early in training. The third term computes the loss components of the prediction box matched to a ground-truth box, including coordinate error, confidence error and classification error. If the ground-truth box is A and the prediction box (anchor box) is B, then IOU = (A ∩ B)/(A ∪ B).
The loss of each attribute is computed in this way and the results are summed to give the total loss. The detection performance of the model is judged from the value of the loss function.
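The formula itself did not survive in this text. As a hedged reconstruction only, the three terms described above match the loss commonly written for YOLOv2-style detectors, which can be stated as follows. W, H, A, λ, Max_IOU, the threshold and the limit of 12800 iterations come from the description; the symbols b (network prediction), prior, truth, the indicator functions 1, the subscripts on λ and the iteration counter t are assumptions introduced here and may differ from the original figure.

```latex
\begin{aligned}
\mathrm{loss}_t = \sum_{i=0}^{W}\sum_{j=0}^{H}\sum_{k=0}^{A}\;
 & \mathbf{1}_{\mathrm{Max\_IOU} < \mathrm{Thresh}}\;\lambda_{noobj}\,\bigl(-b^{o}_{ijk}\bigr)^{2} \\
 & + \mathbf{1}_{t < 12800}\;\lambda_{prior}\sum_{r\in\{x,y,w,h\}}\bigl(prior^{r}_{k}-b^{r}_{ijk}\bigr)^{2} \\
 & + \mathbf{1}^{truth}_{k}\Bigl(\lambda_{coord}\sum_{r\in\{x,y,w,h\}}\bigl(truth^{r}-b^{r}_{ijk}\bigr)^{2}
   + \lambda_{obj}\bigl(\mathrm{IOU}^{k}_{truth}-b^{o}_{ijk}\bigr)^{2}
   + \lambda_{class}\sum_{c=1}^{C}\bigl(truth^{c}-b^{c}_{ijk}\bigr)^{2}\Bigr)
\end{aligned}
```

Read against the description: the first line is the background (noobj) confidence error, the second is the prior-box coordinate error applied only in early iterations, and the third groups the coordinate, confidence and classification errors of the prediction box matched to a ground-truth box.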
(5) Stopping training: using the SGD gradient update strategy and back-propagation to minimize the loss function, the model is trained on a server. After 140 epochs (11250 iterations), the value of the loss function has fallen to the second decimal place and no longer changes appreciably; training is stopped at this point, and the model at this point is taken as the optimal model.
Further, to better implement the present invention, step 5 is specifically implemented as follows:
Multi-scale testing is performed on the vehicle photos in the test set. Preferably, within the range 416*416 to 1024*1024 and with a step of 32, all vehicle photos in the test set are resized to a randomly selected size, and each set of resized photos is tested as one group. Each time a group has been tested, a new picture size is selected at random. Testing is repeated in this way to find the best detection effect and to prevent missed detections and false detections. Finally the size that gives the best test result is chosen, i.e. the group with the highest recall (Recall) and mean average precision (mAP), and the test size, metrics and results are recorded.
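Choosing the best test size can be sketched as a search over the candidate sizes. Several things here are assumptions: the callback evaluate(size), which stands in for running the trained model on the resized test set and returning (recall, mAP); the exhaustive sweep, used instead of random draws so the example is repeatable; and ranking by mAP first and recall second, since the description does not say how to trade the two metrics off.

```python
from typing import Callable, Dict, List, Tuple

Metrics = Tuple[float, float]  # (recall, mAP)


def evaluation_sizes(min_size: int = 416, max_size: int = 1024,
                     stride: int = 32) -> List[int]:
    """Square test sizes from 416 to 1024 in steps of 32."""
    return list(range(min_size, max_size + 1, stride))


def pick_best_size(sizes: List[int],
                   evaluate: Callable[[int], Metrics]) -> Tuple[int, Dict[int, Metrics]]:
    """Test every size once and return the best one with all recorded metrics."""
    results = {size: evaluate(size) for size in sizes}
    best = max(results, key=lambda size: (results[size][1], results[size][0]))
    return best, results


def fake_evaluate(size: int) -> Metrics:
    """Placeholder metrics that peak at 640; not measured data."""
    score = 1.0 - abs(size - 640) / 1000.0
    return (score, score)


best_size, all_results = pick_best_size(evaluation_sizes(), fake_evaluate)
```

Keeping the full results dictionary alongside the winner corresponds to recording the test size, metrics and results for every group.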
Further, to better implement the present invention, step 6 is specifically implemented as follows:
According to the test results, the photo size giving the best result is 640*640, for which the recall (Recall) and mean average precision (mAP) are highest. The test set contains 300 vehicle photos in total (the vehicle attribute classes are evenly represented, and photo numbering starts from 0). The test results are shown in Fig. 5: recall Recall=96.10% and mean average precision mAP=90.4%.
The above is only a preferred embodiment of the present invention and does not limit the present invention in any form. Any simple modification or equivalent variation made to the above embodiment in accordance with the technical essence of the present invention falls within the protection scope of the present invention.
Claims (9)
1. A vehicle multi-attribute detection method based on single-network multi-task learning, characterized in that the method comprises:
Step 1: picture collection and screening;
Step 2: data set production: produce the vehicle multi-attribute data set in the VOC standard data set format;
Step 3: network design: based on the Darknet deep learning framework and according to the multi-attribute characteristics of vehicles, design the network structure in an end-to-end, one-stage, non-cascaded manner and build the network model;
Step 4: model training: set and adjust the model parameters, train the designed network model on the vehicle multi-attribute data set, and perform data enhancement and multi-scale training during training;
Step 5: model testing: perform vehicle multi-attribute testing using the trained network model;
Step 6: model evaluation.
2. The vehicle multi-attribute detection method based on single-network multi-task learning according to claim 1, characterized in that: in step 1, vehicle photos in actual scenes are obtained using surveillance cameras.
3. The vehicle multi-attribute detection method based on single-network multi-task learning according to claim 1 or 2, characterized in that: the obtained vehicle photos are manually pre-screened, and photos in which the background region around the vehicle is large or the vehicle attributes are severely blurred are screened out.
4. The vehicle multi-attribute detection method based on single-network multi-task learning according to claim 1, characterized in that step 2 is specifically implemented as follows:
Using the LabelImg tool, the vehicle multi-attribute data set is produced in the standard VOC data set format used in deep learning, and is divided into a training set and a test set in a ratio of 10:1.
5. The vehicle multi-attribute detection method based on single-network multi-task learning according to claim 4, characterized in that the vehicle data set is produced as follows:
Create the Annotation, ImageSets and JPEGImages folders, with a Main folder inside the ImageSets folder; set the logo picture directory and the .xml label file directory, and set the vehicle attribute label names; store the vehicle photos obtained and screened in step 1 in the JPEGImages folder; open the LabelImg tool and annotate the vehicle photos with multiple attributes; write the sample picture names from the generated .xml files into the trainval.txt and test.txt files; and store trainval.txt and test.txt in the Main folder.
6. The vehicle multi-attribute detection method based on single-network multi-task learning according to claim 1, characterized in that step 3 is specifically implemented as follows:
With the Darknet deep learning framework as the platform and according to the multi-attribute characteristics of vehicles, the backbone network is designed with an end-to-end structure, and a BatchNormalization layer and a corresponding activation layer are added after each convolutional layer of the backbone network; convolution kernel separation is then used to split large convolution kernels into a cascade of two or more small convolution kernels; and a one-stage, non-cascaded design is adopted in which anchorbox is used to predict class and coordinates simultaneously, to build the final network model, where anchorbox denotes a prediction box.
7. The vehicle multi-attribute detection method based on single-network multi-task learning according to claim 1, characterized in that step 4 is specifically implemented as follows:
(1) Parameter setting:
Set the values of batch, subdivisions, momentum, decay and the initial learning rate, where batch denotes the batch size, subdivisions denotes the number of sub-batches, momentum denotes the weight update coefficient, and decay denotes the weight decay parameter; the number of samples fed in at a time during actual training is batch/subdivisions;
(2) After the parameters are set, a data enhancement processing module is added at the front of the network, which applies color and illumination changes, angle rotation and added noise to the training samples fed into the network, specifically:
(a) Color and illumination: adjust the saturation, exposure and hue of the sample pictures and generate new training samples according to the set values;
(b) Angle rotation: set the rotation angle of the sample pictures in the horizontal or vertical direction and generate new training samples according to the set values;
(c) Noise interference: add random jitter noise to the sample pictures and generate new training samples according to the set values;
(3) During iterative training, multi-scale training is applied to the model:
After every n training batches (i.e. n batches), a new picture size is selected at random, and training continues after the network is adjusted to the corresponding dimensions;
(4) The loss function is used to judge the training of the model; the loss function comprises two major modules, classification error and localization error, and the loss function used is:
[formula image not reproduced]
where W and H denote the width and height of the feature map respectively, A denotes the number of prior boxes, and λ denotes a weight coefficient;
(5) Stopping training: using the SGD gradient update strategy and back-propagation to minimize the loss function, the model is trained on a server; when the value of the loss function has fallen to the second decimal place and no longer changes appreciably, training is stopped.
8. The vehicle multi-attribute detection method based on single-network multi-task learning according to claim 1, characterized in that step 5 is specifically implemented as follows:
Multi-scale testing is performed on the vehicle photos in the test set, namely: all vehicle photos in the test set are resized to a randomly selected size, the resized photos are tested as one group, and this is repeated several times; the size giving the best test result is chosen, and the test metrics and results are recorded.
9. The vehicle multi-attribute detection method based on single-network multi-task learning according to claim 1, characterized in that step 6 is specifically implemented as follows:
According to the test results, the recall, average precision and mean average precision are examined to assess the prediction performance of the model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910086525.4A CN109886147A (en) | 2019-01-29 | 2019-01-29 | A kind of more attribute detection methods of vehicle based on the study of single network multiple-task |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109886147A (en) | 2019-06-14 |
Family
ID=66927176
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910086525.4A Pending CN109886147A (en) | 2019-01-29 | 2019-01-29 | A kind of more attribute detection methods of vehicle based on the study of single network multiple-task |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109886147A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105740906A (en) * | 2016-01-29 | 2016-07-06 | 中国科学院重庆绿色智能技术研究院 | Depth learning based vehicle multi-attribute federation analysis method |
US9760806B1 (en) * | 2016-05-11 | 2017-09-12 | TCL Research America Inc. | Method and system for vision-centric deep-learning-based road situation analysis |
CN107886073A (en) * | 2017-11-10 | 2018-04-06 | 重庆邮电大学 | A kind of more attribute recognition approaches of fine granularity vehicle based on convolutional neural networks |
US20180211117A1 (en) * | 2016-12-20 | 2018-07-26 | Jayant Ratti | On-demand artificial intelligence and roadway stewardship system |
CN108460403A (en) * | 2018-01-23 | 2018-08-28 | 上海交通大学 | The object detection method and system of multi-scale feature fusion in a kind of image |
CN108537117A (en) * | 2018-03-06 | 2018-09-14 | 哈尔滨思派科技有限公司 | A kind of occupant detection method and system based on deep learning |
CN108875595A (en) * | 2018-05-29 | 2018-11-23 | 重庆大学 | A kind of Driving Scene object detection method merged based on deep learning and multilayer feature |
CN109063630A (en) * | 2018-07-27 | 2018-12-21 | 北京以萨技术股份有限公司 | A kind of fast vehicle detection method based on separable convolution technique and frame difference compensation policy |
Non-Patent Citations (2)
Title |
---|
SHUO YANG 等: "Fast vehicle logo detection in complex scenes", 《OPTICS AND LASER TECHNOLOGY》 * |
叶虎: "目标检测|YOLOv2原理与实现(附YOLOv3)", 《机器学习算法工程师公众号》 * |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110865628A (en) * | 2019-10-25 | 2020-03-06 | 清华大学深圳国际研究生院 | New energy automobile electric control system fault prediction method based on working condition data |
CN111209858A (en) * | 2020-01-06 | 2020-05-29 | 电子科技大学 | Real-time license plate detection method based on deep convolutional neural network |
CN111310862A (en) * | 2020-03-27 | 2020-06-19 | 西安电子科技大学 | Deep neural network license plate positioning method based on image enhancement in complex environment |
CN111310862B (en) * | 2020-03-27 | 2024-02-09 | 西安电子科技大学 | Image enhancement-based deep neural network license plate positioning method in complex environment |
CN111415533A (en) * | 2020-04-22 | 2020-07-14 | 湖北民族大学 | Bend safety early warning monitoring method, device and system |
CN111415533B (en) * | 2020-04-22 | 2021-09-21 | 湖北民族大学 | Bend safety early warning monitoring method, device and system |
CN111582339B (en) * | 2020-04-28 | 2023-07-25 | 江西理工大学 | Vehicle detection and recognition method based on deep learning |
CN111582339A (en) * | 2020-04-28 | 2020-08-25 | 江西理工大学 | Vehicle detection and identification method based on deep learning |
CN111627020A (en) * | 2020-06-03 | 2020-09-04 | 山东贝特建筑项目管理咨询有限公司 | Detection method and system for anchor bolt in heat insulation board and computer storage medium |
CN112270252A (en) * | 2020-10-26 | 2021-01-26 | 西安工程大学 | Multi-vehicle target identification method for improving YOLOv2 model |
CN113111859B (en) * | 2021-05-12 | 2022-04-19 | 吉林大学 | License plate deblurring detection method based on deep learning |
CN113111859A (en) * | 2021-05-12 | 2021-07-13 | 吉林大学 | License plate deblurring detection method based on deep learning |
CN113888754A (en) * | 2021-08-20 | 2022-01-04 | 北京工业大学 | Vehicle multi-attribute identification method based on radar vision fusion |
CN113888754B (en) * | 2021-08-20 | 2024-04-26 | 北京工业大学 | Vehicle multi-attribute identification method based on radar vision fusion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20190614 |