CN110321874A - A lightweight convolutional neural network pedestrian recognition method - Google Patents
- Publication number
- CN110321874A (application CN201910629693.3A)
- Authority
- CN
- China
- Prior art keywords
- filter
- length
- pedestrian
- filled
- convolutional layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
Abstract
The present invention discloses a lightweight convolutional neural network pedestrian recognition method, whose steps are: obtain raw pedestrian data and pedestrian images with annotation information, and construct a pedestrian data set; design the input size of the pedestrian model network, cluster the data set, and select suitable candidate boxes; preprocess the images and expand the data; construct a convolutional neural network, feed the preprocessed images into the network for training, and obtain a network model with pedestrian recognition capability; feed the annotated images into the network in batches for training; check the loss and accuracy on the training set, and if they are not satisfactory, adjust the learning rate or increase the number of iterations and train again; if the results are satisfactory, test the trained network on the validation set and adjust the network again according to the validation results. This recognition method improves recognition accuracy while ensuring the real-time performance of the target detection model, and allows the network model to run on lower-specification hardware platforms.
Description
Technical field
The invention belongs to the field of pattern recognition technology, and in particular relates to a lightweight convolutional neural network pedestrian recognition method.
Background art
In recent years, with the rise of the AI industry and the continuous development of industries such as transportation, pedestrian detection technology has received extensive attention in fields such as intelligent surveillance and intelligent driving. Traditional pedestrian detection methods extract pedestrian features from training samples with hand-designed feature extractors such as Haar, HOG, and LBP, and then train classifiers such as SVM on the extracted features to perform the pedestrian detection task. For example, HOG+LBP features have been used to handle pedestrian occlusion and improve detection accuracy, and ICF, ACF, gradient magnitude features, and LUV color features have given pedestrian features better representations and achieved good results on pedestrian detection tasks. However, manually designed pedestrian features generalize poorly and have difficulty adapting to changes in pedestrian appearance, which limits their practical application. In recent years, with the development of deep learning, and especially the wide application of convolutional neural networks in computer vision, pedestrian recognition methods based on convolutional neural networks have attracted much attention and seen many applications.
Patent No. 201611146030.9, "Pedestrian detection method under specific scenes", selects a suitable training set according to the test set; determines the supervision information required by the detection framework by annotating the training set, completing the pedestrian labeling on the training set with an adversarial network; converts the training set to VOC format and feeds it to an R-FCN for training; and finally detects pedestrians in the specific scene with the trained R-FCN model. This method has insufficient real-time performance, places high demands on hardware, and is difficult to implement on embedded devices.
Patent No. 201810457285.X, "A real-time railway-crossing multi-camera pedestrian detection method based on deep learning", collects video through multiple cameras whose fields of view overlap on the key monitoring area; selects the camera region to be detected and obtains the camera video stream for that region; processes the video images on a GPU processor cluster, first cutting the images and then detecting the cut images with a trained pedestrian detection model; monitors GPU usage in the processor cluster in real time and schedules work according to a scheduling strategy; and computes pedestrian region coordinates, attaches the current timestamp, and sends the result to the user client. This method combines video processing with deep learning and uses overlapping camera coverage to improve the accuracy of railway-crossing pedestrian detection, but it requires building a GPU processor cluster and is difficult to deploy.
Patent No. 201810103970.2, "A method for pedestrian recognition", uses video processing and deep learning: a convolutional neural network is trained on a pedestrian data set to obtain a pedestrian classification model, which extracts features from the pedestrian to be identified; a candidate pedestrian list is then obtained by feature metrics and re-ranking between the pedestrian and the database, realizing person identification. This method uses a residual network and therefore suffers from a large number of network parameters and high storage requirements.
The paper "An improved convolutional neural network pedestrian recognition method" proposes a real-time pedestrian detection method. It changes the input size of the network model on the basis of the YOLOv3-tiny network; uses a clustering method to cluster the target boxes of the data set and chooses candidate box sizes and counts suitable for pedestrian detection; redesigns the feature extraction and target detection network by adding a number of convolutional layers; and finally trains a detection model on a mixed data set. Although this method improves pedestrian detection accuracy, the amount of network computation is still large, making deployment on mobile terminals difficult.
In summary, although pedestrian detection based on convolutional neural networks has been widely studied at home and abroad, most existing methods suffer from excessive network parameters and complex network structures, and place high demands on hardware and running memory.
Summary of the invention
The purpose of the present invention is to provide a lightweight convolutional neural network pedestrian recognition method that improves recognition accuracy while ensuring the real-time performance of the target detection model, and allows the network model to run on lower-specification hardware platforms.
In order to achieve the above objectives, the solution of the invention is:
A lightweight convolutional neural network pedestrian recognition method, comprising the following steps:
Step 1, obtain raw pedestrian data and pedestrian images with annotation information, and construct a pedestrian data set;
Step 2, design the input size of the pedestrian model network, cluster the data set according to the designed size, and select suitable candidate boxes;
Step 3, preprocess the images and expand the data with data augmentation methods;
Step 4, construct a convolutional neural network, feed the images preprocessed and expanded in step 3, together with their annotation information, into the network in batches, and train it to obtain a network model with pedestrian recognition capability;
Step 5, after training completes, check the loss and accuracy on the training set; if they are not satisfactory, adjust the learning rate or increase the number of iterations and train again; if the results are satisfactory, training is complete, go to step 6;
Step 6, test the trained network on the validation set, and adjust the network again according to the validation results.
In above step 1, the pedestrian data set is constructed as follows: the INRIA, VOC, and MS COCO data sets are screened to obtain pedestrian images of different ages, genders, and body shapes, and the coordinates of the pedestrians in the images are used as annotation information to construct the pedestrian data set.
In above step 2, the pedestrian images in the data set are annotated: pedestrians are labeled as class 0, and the coordinates of the annotation boxes are recorded.
In above step 3, the data augmentation methods specifically include three operations on the images: random exposure, adjusting image saturation, and adjusting image hue.
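As an illustration of the three augmentation operations, the following sketch jitters exposure, saturation, and hue of an RGB image with NumPy alone; the function name and jitter ranges are illustrative choices, not specified by the patent (hue is rotated about the gray axis, one standard way to realize a hue shift in RGB):

```python
import numpy as np

def augment_image(img, exposure=1.5, saturation=1.5, hue_shift=0.05, rng=None):
    """Random exposure, saturation, and hue jitter for a float RGB image in [0, 1].

    Exposure and saturation are multiplicative jitters drawn from [1/x, x];
    hue is a small rotation of the RGB cube about the gray (1,1,1) axis.
    """
    rng = np.random.default_rng() if rng is None else rng
    img = img.astype(np.float64)

    # Exposure: scale all channels by one random gain.
    img = img * rng.uniform(1.0 / exposure, exposure)

    # Saturation: interpolate between per-pixel gray and the original colors.
    gray = img.mean(axis=-1, keepdims=True)
    img = gray + (img - gray) * rng.uniform(1.0 / saturation, saturation)

    # Hue: rotate RGB about the gray axis by a small random angle.
    theta = rng.uniform(-hue_shift, hue_shift) * 2.0 * np.pi
    c, s = np.cos(theta), np.sin(theta)
    sq = np.sqrt(1.0 / 3.0)
    rot = np.full((3, 3), (1.0 - c) / 3.0)          # (1-cos)*k k^T term
    rot += np.eye(3) * c                            # cos*I term
    rot += np.array([[0, -sq, sq],
                     [sq, 0, -sq],
                     [-sq, sq, 0]]) * s             # sin*[k]_x term
    img = img @ rot.T

    return np.clip(img, 0.0, 1.0)
```

With all jitter ranges set to their neutral values (`exposure=1.0`, `saturation=1.0`, `hue_shift=0.0`) the function returns the image unchanged, which is a convenient sanity check.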
In above step 4, the constructed convolutional neural network comprises convolutional layers, pooling layers, and detection layers, in which:
Convolutional layer structure: the first convolutional layer uses 16 filters of size 3×3, stride 1, padding 1; the second (depthwise separable) convolutional layer uses 16 filters of size 3×3, stride 1, padding 1; the third convolutional layer uses 32 filters of size 1×1, stride 1, padding 0; the first pooling layer has a 3×3 filter, stride 2, for down-sampling; the fourth (depthwise separable) convolutional layer uses 32 filters of size 3×3, stride 1, padding 1; the fifth convolutional layer uses 64 filters of size 1×1, stride 1, padding 0; the second pooling layer has a 2×2 filter, stride 2, for down-sampling; the sixth (depthwise separable) convolutional layer uses 64 filters of size 3×3, stride 1, padding 1; the seventh convolutional layer uses 128 filters of size 1×1, stride 1, padding 0; the third down-sampling layer has a 2×2 filter, stride 2, for down-sampling; the eighth (depthwise separable) convolutional layer uses 128 filters of size 3×3, stride 1, padding 1; the ninth convolutional layer uses 256 filters of size 1×1, stride 1, padding 0; the fourth pooling layer has a 2×2 filter, stride 2, for down-sampling; the tenth (depthwise separable) convolutional layer uses 256 filters of size 3×3, stride 1, padding 1; the eleventh convolutional layer uses 512 filters of size 1×1, stride 1, padding 0; the fifth pooling layer has a 2×2 filter, stride 2, for down-sampling; the twelfth (depthwise separable) convolutional layer uses 512 filters of size 3×3, stride 1, padding 1; the thirteenth convolutional layer uses 1024 filters of size 1×1, stride 1, padding 0; the sixth pooling layer has a 2×2 filter, stride 1; the fourteenth (depthwise separable) convolutional layer uses 1024 filters of size 3×3, padding 1;
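The depthwise separable layers above pair one 3×3 spatial filter per input channel with a following 1×1 channel-mixing convolution. A minimal NumPy sketch of such a pair (a naive loop written for clarity rather than speed; the function name and argument layout are illustrative):

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_kernels, stride=1, pad=1):
    """Depthwise k x k convolution followed by a pointwise 1x1 convolution.

    x:          (C_in, H, W) input feature map.
    dw_kernels: (C_in, k, k) one spatial filter per input channel.
    pw_kernels: (C_out, C_in) 1x1 channel-mixing weights.
    """
    c_in, h, w = x.shape
    k = dw_kernels.shape[-1]
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h_out = (h + 2 * pad - k) // stride + 1
    w_out = (w + 2 * pad - k) // stride + 1

    # Depthwise stage: each channel is convolved only with its own filter.
    dw = np.empty((c_in, h_out, w_out))
    for c in range(c_in):
        for i in range(h_out):
            for j in range(w_out):
                patch = xp[c, i*stride:i*stride+k, j*stride:j*stride+k]
                dw[c, i, j] = np.sum(patch * dw_kernels[c])

    # Pointwise stage: a 1x1 convolution mixes channels at every location.
    return np.einsum('oc,chw->ohw', pw_kernels, dw)
```

Per spatial position this factorization costs k²·C_in + C_in·C_out multiply-accumulates instead of the k²·C_in·C_out of a standard convolution, which is the source of the computation savings claimed below.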
Detection layers: these are divided into two scales. Scale 1: three convolutional layers are added after the base network for prediction; among the added layers, the fifteenth convolutional layer uses 256 filters of size 3×3, stride 1, padding 1; the sixteenth convolutional layer uses 512 filters of size 3×3, stride 1, padding 1; the seventeenth convolutional layer uses 18 filters of size 3×3, stride 1, padding 1;
Scale 2: a 2× up-sampling is performed from the sixteenth convolutional layer, the result is added to the 38×18 feature map output by the eleventh convolutional layer, and prediction follows three further convolutions: the eighteenth convolutional layer uses 128 filters of size 3×3, stride 1, padding 1; the nineteenth convolutional layer uses 256 filters of size 3×3, stride 1, padding 1; the twentieth convolutional layer uses 18 filters of size 3×3, stride 1, padding 1.
In the above detection layers, the feature map of scale 1 has size 19×9 and outputs the class and location information of targets; the feature map of scale 2 has size 38×18 and outputs the class and location information of targets.
In above step 5, during training, the loss and accuracy on the training set are computed, and backpropagation is performed with the stochastic gradient descent algorithm to update the parameters.
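The training bookkeeping of step 5 (forward pass, loss and accuracy on the training set, backpropagation, parameter update) can be sketched on a stand-in one-layer logistic model; the patent's full detection network and loss are not reproduced here, only the shape of the gradient-descent loop:

```python
import numpy as np

def sgd_train(x, y, lr=0.1, epochs=200, seed=0):
    """Illustrative gradient-descent loop on a logistic classifier.

    x: (N, D) features; y: (N,) binary labels in {0, 1}.
    Returns weights, bias, and the last computed loss and accuracy.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=x.shape[1])
    b = 0.0
    for _ in range(epochs):
        z = x @ w + b
        p = 1.0 / (1.0 + np.exp(-z))                          # forward pass
        loss = -np.mean(y * np.log(p + 1e-9)
                        + (1 - y) * np.log(1 - p + 1e-9))     # training loss
        acc = np.mean((p > 0.5) == y)                         # training accuracy
        grad_z = (p - y) / len(y)                             # backpropagation
        w -= lr * (x.T @ grad_z)                              # parameter update
        b -= lr * grad_z.sum()
    return w, b, loss, acc
```

Here the whole training set is used as a single batch; in the patent's setting each update is computed on a batch of images and the same loss/accuracy checks are made per pass over the training set.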
In above step 5, the loss and accuracy on the training set are checked; the expected loss is not higher than 20%, the expected accuracy is not lower than 90%, the recall is not lower than 85%, and the miss rate is not higher than 15%.
The specific content of above step 6 is: if the network performs well on the training set but poorly on the validation set, reduce the number of iterations or change the decay value, train the adjusted network on the training set again, and return to step 5.
With the above scheme, compared with the prior art, the present invention has the following advantages:
(1) On the basis of Tiny-YOLO, the aspect ratio of the network model input is changed so that the network extracts more horizontal features; a rectangular image of 608×288 resolution is chosen as the network input, increasing the extraction of horizontal features, and K-means clustering is applied to the target boxes in the pedestrian detection data set, which helps improve the network's pedestrian recognition ability.
(2) An INRIA+VOC+MS COCO mixed pedestrian detection data set is constructed, and the network model is trained on this mixed data set, making the network model generalize better and detect better.
(3) The ordinary 3×3 convolutional layers are replaced with 3×3 depthwise separable convolutional layers to reduce the computation of the network model while deepening the network. Thanks to the 3×3 depthwise separable convolutional layers, the computation is reduced to one third of that of the ordinary convolutional model, lowering the storage-capacity and running-memory requirements of the pedestrian detection model and guaranteeing real-time operation while increasing pedestrian recognition ability.
The invention proposes a lightweight convolutional neural network pedestrian recognition method. On the basis of Tiny-YOLO, the input size of the network model is changed; combining the size characteristics of pedestrians in images, a clustering method is used to cluster the target boxes of the data set and choose candidate box sizes and counts suitable for pedestrian detection; ordinary convolutional layers are replaced with depthwise separable convolutional layers, and the feature extraction and target detection network is redesigned; training on a mixed data set enhances the model's generalization.
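The target-box clustering above is commonly done YOLO-style, taking 1 − IoU of the (width, height) pairs as the distance; the sketch below assumes that variant (the patent names K-means but not the distance metric), with illustrative parameter defaults:

```python
import numpy as np

def kmeans_anchors(boxes, k=6, iters=50, seed=0):
    """Cluster annotated box sizes into k anchor (candidate box) sizes.

    boxes: (N, 2) array of box widths and heights.  Boxes are assigned to
    the anchor with the highest IoU (centers aligned), and each anchor is
    recentered on the mean size of its members.
    """
    boxes = np.asarray(boxes, dtype=float)
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)].copy()
    for _ in range(iters):
        # Pairwise IoU between every box and every anchor.
        inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0])
                 * np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
        union = (boxes[:, 0] * boxes[:, 1])[:, None] \
                + (anchors[:, 0] * anchors[:, 1])[None, :] - inter
        assign = np.argmax(inter / union, axis=1)     # nearest anchor per box
        for j in range(k):
            members = boxes[assign == j]
            if len(members):
                anchors[j] = members.mean(axis=0)     # recenter the anchor
    return anchors
```

The resulting anchor sizes (and their count k) play the role of the "candidate boxes suitable for pedestrian detection" selected in step 2.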
Description of the drawings
Fig. 1 is the flow chart of the invention;
Fig. 2 is the network model structure of the invention, including the network layers and each layer's output;
wherein Input: input image; Conv: ordinary convolutional layer; Depthwise Conv: depthwise separable convolutional layer; Ped-Detection: detection layer.
Detailed description of embodiments
The technical solution and beneficial effects of the present invention are described in detail below with reference to the drawings.
As shown in Fig. 1, the present invention provides a lightweight convolutional neural network pedestrian recognition method, comprising the following steps:
Step 1, obtain raw pedestrian data: screen the INRIA, VOC, and MS COCO data sets to obtain pedestrian images of different ages, genders, and body shapes, and use the coordinates of the pedestrians in the images as annotation information to construct a pedestrian data set;
Step 2, design the input size of the pedestrian model network, cluster the data set according to the designed size, and select suitable candidate boxes. In the present embodiment, since increasing the expression of horizontal features helps pedestrian recognition, the aspect ratio of the pedestrian model network input is changed and a rectangular input is used to extract more horizontal features; at the same time, to avoid the influence of input image resolution on network speed, an image resolution of 608×288 is selected as the network input. The pedestrian images in the data set are annotated: pedestrians are labeled as class 0, and the coordinates of the annotation boxes are recorded;
Step 3, preprocess the images and expand the data with data augmentation methods. The data augmentation methods specifically include three operations on the images: random exposure, adjusting image saturation, and adjusting image hue. Supplementing the sample data through data augmentation gives the finally trained network better robustness for pedestrian detection under different environments and also effectively prevents overfitting of the network;
Step 4, construct a convolutional neural network, and feed the preprocessed images into the network for training to obtain a network model with pedestrian recognition capability;
As shown in Fig. 2, the constructed convolutional neural network comprises convolutional layers, pooling layers, and detection layers, in which:
Convolutional layer structure: the first convolutional layer uses 16 filters of size 3×3, stride 1, padding 1; the second (depthwise separable) convolutional layer uses 16 filters of size 3×3, stride 1, padding 1; the third convolutional layer uses 32 filters of size 1×1, stride 1, padding 0; the first pooling layer has a 3×3 filter, stride 2, for down-sampling; the fourth (depthwise separable) convolutional layer uses 32 filters of size 3×3, stride 1, padding 1; the fifth convolutional layer uses 64 filters of size 1×1, stride 1, padding 0; the second pooling layer has a 2×2 filter, stride 2, for down-sampling; the sixth (depthwise separable) convolutional layer uses 64 filters of size 3×3, stride 1, padding 1; the seventh convolutional layer uses 128 filters of size 1×1, stride 1, padding 0; the third down-sampling layer has a 2×2 filter, stride 2, for down-sampling; the eighth (depthwise separable) convolutional layer uses 128 filters of size 3×3, stride 1, padding 1; the ninth convolutional layer uses 256 filters of size 1×1, stride 1, padding 0; the fourth pooling layer has a 2×2 filter, stride 2, for down-sampling; the tenth (depthwise separable) convolutional layer uses 256 filters of size 3×3, stride 1, padding 1; the eleventh convolutional layer uses 512 filters of size 1×1, stride 1, padding 0; the fifth pooling layer has a 2×2 filter, stride 2, for down-sampling; the twelfth (depthwise separable) convolutional layer uses 512 filters of size 3×3, stride 1, padding 1; the thirteenth convolutional layer uses 1024 filters of size 1×1, stride 1, padding 0; the sixth pooling layer has a 2×2 filter, stride 1; the fourteenth (depthwise separable) convolutional layer uses 1024 filters of size 3×3, padding 1.
Detection layers: these are divided into two scales. Scale 1: three convolutional layers are added after the base network for prediction. Among the added layers, the fifteenth convolutional layer uses 256 filters of size 3×3, stride 1, padding 1; the sixteenth convolutional layer uses 512 filters of size 3×3, stride 1, padding 1; the seventeenth convolutional layer uses 18 filters of size 3×3, stride 1, padding 1. The feature map of scale 1 has size 19×9 and outputs the class and location information of targets, with good recognition ability for larger pedestrians;
Scale 2: a 2× up-sampling is performed from the sixteenth convolutional layer, the result is added to the 38×18 feature map output by the eleventh convolutional layer, and prediction follows three further convolutions: the eighteenth convolutional layer uses 128 filters of size 3×3, stride 1, padding 1; the nineteenth convolutional layer uses 256 filters of size 3×3, stride 1, padding 1; the twentieth convolutional layer uses 18 filters of size 3×3, stride 1, padding 1. The feature map of scale 2 has size 38×18 and outputs the class and location information of targets, with good recognition ability for smaller pedestrians.
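The scale-2 fusion, a 2× up-sampling of the deeper 19×9 map followed by element-wise addition with the 38×18 map, can be sketched as follows (nearest-neighbour up-sampling and a 512-channel map are assumptions for illustration; the patent does not specify the interpolation):

```python
import numpy as np

def upsample2x(feat):
    """2x nearest-neighbour up-sampling of a (C, H, W) feature map."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def merge_scales(deep_feat, shallow_feat):
    """Scale-2 fusion: up-sample the deeper map and add the shallower map."""
    up = upsample2x(deep_feat)
    assert up.shape == shallow_feat.shape, "spatial and channel sizes must match"
    return up + shallow_feat
```

For a 608×288 input, 32× total down-sampling yields the 19×9 map, so one 2× up-sampling recovers exactly the 38×18 resolution of the eleventh layer's output, which is what makes the element-wise addition possible.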
In step 4, an appropriate number of iterations and learning rate are set, where the number of iterations should be large enough for the network to converge, i.e. the accuracy and loss of the network on the training set level off.
Step 5, feed the annotated images into the network in batches for training. The number of images per batch is set according to the hardware; here a GTX Titan Xp GPU with 12 GB of video memory is used, and 64 images are fed per batch. During training, the loss and accuracy on the training set are computed, and backpropagation is performed with the stochastic gradient descent algorithm to update the parameters;
Step 6, after training completes, check the loss and accuracy on the training set; the expected loss is not higher than 20%, the expected accuracy is not lower than 90%, the recall is not lower than 85%, and the miss rate is not higher than 15%. If the results are not satisfactory, adjust the learning rate or increase the number of iterations and train again; if the results are satisfactory, training is complete.
Step 7, test the trained network on the validation set, and adjust the network again according to the validation results. If the network performs well on the training set but poorly on the validation set, reduce the number of iterations or try changing the decay value, then train the adjusted network on the training set again, i.e. return to step 5;
Step 8, run the trained network model in batch tests on the validation set. The final network model achieves a loss of 15%, an mAP of 92%, a recall of 93%, and a miss rate of 7%, with good detection results for pedestrians of different sizes and partially occluded pedestrians.
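The reported recall (93%) and miss rate (7%) are complementary quantities, miss rate = FN / (TP + FN) = 1 − recall; the relation can be checked from detection counts (the counts below are illustrative, chosen to match the reported figures, not taken from the patent's experiments):

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, and miss rate from detection counts.

    tp: true positives, fp: false positives, fn: missed pedestrians.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    miss_rate = fn / (tp + fn)
    return precision, recall, miss_rate
```

For example, 93 detected pedestrians and 7 missed ones give recall 0.93 and miss rate 0.07, consistent with the figures above.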
The above examples only illustrate the technical idea of the present invention and do not limit its scope of protection; any change made on the basis of the technical scheme according to the technical idea provided by the invention falls within the scope of protection of the present invention.
Claims (9)
1. A lightweight convolutional neural network pedestrian recognition method, characterized by comprising the following steps:
Step 1, obtain raw pedestrian data and pedestrian images with annotation information, and construct a pedestrian data set;
Step 2, design the input size of the pedestrian model network, cluster the pedestrian data set according to the designed size, and select suitable candidate boxes;
Step 3, preprocess the images and expand the data with data augmentation methods;
Step 4, construct a convolutional neural network, feed the images preprocessed and expanded in step 3, together with their annotation information, into the network in batches, and train it to obtain a network model with pedestrian recognition capability;
Step 5, after training completes, check the loss and accuracy on the training set; if they are not satisfactory, adjust the learning rate or increase the number of iterations and train again; if the results are satisfactory, training is complete, go to step 6;
Step 6, test the trained network on the validation set, and adjust the network again according to the validation results.
2. The lightweight convolutional neural network pedestrian recognition method as claimed in claim 1, characterized in that: in said step 1, the pedestrian data set is constructed as follows: the INRIA, VOC, and MS COCO data sets are screened to obtain pedestrian images of different ages, genders, and body shapes, and the coordinates of the pedestrians in the images are used as annotation information to construct the pedestrian data set.
3. The lightweight convolutional neural network pedestrian recognition method as claimed in claim 1, characterized in that: in said step 2, the pedestrian images in the data set are annotated: pedestrians are labeled as class 0, and the coordinates of the annotation boxes are recorded.
4. The lightweight convolutional neural network pedestrian recognition method as claimed in claim 1, characterized in that: in said step 3, the data augmentation methods specifically include three operations on the images: random exposure, adjusting image saturation, and adjusting image hue.
5. The lightweight convolutional neural network pedestrian recognition method as described in claim 1, characterized in that: in said step 4, the constructed convolutional neural network comprises convolutional layers, pooling layers and detection layers, in which:
Convolutional layer structure: the first convolutional layer uses 16 filters of size 3 × 3, stride 1, padding 1; the second depthwise separable convolutional layer uses 16 filters of size 3 × 3, stride 1, padding 1; the third convolutional layer uses 32 filters of size 1 × 1, stride 1, padding 0; the first pooling layer uses a 3 × 3 filter with stride 2 for down-sampling; the fourth depthwise separable convolutional layer uses 32 filters of size 3 × 3, stride 1, padding 1; the fifth convolutional layer uses 64 filters of size 1 × 1, stride 1, padding 0; the second pooling layer uses a 2 × 2 filter with stride 2 for down-sampling; the sixth depthwise separable convolutional layer uses 64 filters of size 3 × 3, stride 1, padding 1; the seventh convolutional layer uses 128 filters of size 1 × 1, stride 1, padding 0; the third pooling layer uses a 2 × 2 filter with stride 2 for down-sampling; the eighth depthwise separable convolutional layer uses 128 filters of size 3 × 3, stride 1, padding 1; the ninth convolutional layer uses 256 filters of size 1 × 1, stride 1, padding 0; the fourth pooling layer uses a 2 × 2 filter with stride 2 for down-sampling; the tenth depthwise separable convolutional layer uses 256 filters of size 3 × 3, stride 1, padding 1; the eleventh convolutional layer uses 512 filters of size 1 × 1, stride 1, padding 0; the fifth pooling layer uses a 2 × 2 filter with stride 2 for down-sampling; the twelfth depthwise separable convolutional layer uses 512 filters of size 3 × 3, stride 1, padding 1; the thirteenth convolutional layer uses 1024 filters of size 1 × 1, stride 1, padding 0; the sixth pooling layer uses a 2 × 2 filter with stride 1; the fourteenth depthwise separable convolutional layer uses 1024 filters of size 3 × 3, padding 1;
Detection layers: divided into two scales. Scale 1: three convolutional layers are added after the base network for prediction; among the added layers, the fifteenth convolutional layer uses 256 filters of size 3 × 3, stride 1, padding 1; the sixteenth convolutional layer uses 512 filters of size 3 × 3, stride 1, padding 1; the seventeenth convolutional layer uses 18 filters of size 3 × 3, stride 1, padding 1;
Scale 2: the output of the sixteenth convolutional layer is up-sampled by a factor of 2 and added to the 38 × 18 feature map output by the eleventh convolutional layer; prediction is then made after three further convolutions: the eighteenth convolutional layer uses 128 filters of size 3 × 3, stride 1, padding 1; the nineteenth convolutional layer uses 256 filters of size 3 × 3, stride 1, padding 1; the twentieth convolutional layer uses 18 filters of size 3 × 3, stride 1, padding 1.
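The spatial dimensions recited in this claim can be checked mechanically, since only kernel size, stride, and padding affect feature-map size. The sketch below traces an input through the layer stack and reproduces the claimed 38 × 18 and 19 × 9 detection feature maps. Three details are assumptions inferred from those claimed sizes rather than stated in the claim: a 608 × 288 input, padding 1 on the first 3 × 3 stride-2 pooling layer so the map halves exactly, and asymmetric "same"-style padding (1 total) on the stride-1 sixth pooling layer so the 19 × 9 size is preserved.

```python
# Feature-map size trace for the backbone of claim 5. Depthwise-
# separable and ordinary convolutions are treated identically here,
# since both preserve or change spatial size by the same formula.

def out_hw(size, kernel, stride, pad_total):
    """Output (H, W) of a conv/pool layer; pad_total is the padding
    summed over both sides of each dimension."""
    h, w = size
    return ((h + pad_total - kernel) // stride + 1,
            (w + pad_total - kernel) // stride + 1)

# (name, kernel, stride, total padding)
LAYERS = [
    ("conv1", 3, 1, 2), ("conv2_dw", 3, 1, 2), ("conv3", 1, 1, 0),
    ("pool1", 3, 2, 2),   # padding assumed so the map halves exactly
    ("conv4_dw", 3, 1, 2), ("conv5", 1, 1, 0),
    ("pool2", 2, 2, 0),
    ("conv6_dw", 3, 1, 2), ("conv7", 1, 1, 0),
    ("pool3", 2, 2, 0),
    ("conv8_dw", 3, 1, 2), ("conv9", 1, 1, 0),
    ("pool4", 2, 2, 0),
    ("conv10_dw", 3, 1, 2), ("conv11", 1, 1, 0),  # -> scale-2 source
    ("pool5", 2, 2, 0),
    ("conv12_dw", 3, 1, 2), ("conv13", 1, 1, 0),
    ("pool6", 2, 1, 1),   # stride-1 pool, "same"-style padding assumed
    ("conv14_dw", 3, 1, 2),                       # -> scale-1 map
]

def trace(input_hw=(608, 288)):
    """Return the feature-map size after every layer."""
    sizes, hw = {}, input_hw
    for name, k, s, p in LAYERS:
        hw = out_hw(hw, k, s, p)
        sizes[name] = hw
    return sizes
```

Under these assumptions `trace()` yields 38 × 18 at the eleventh convolutional layer (the map added to the up-sampled scale-2 branch) and 19 × 9 after the fourteenth depthwise separable layer, matching the sizes stated in claims 5 and 6.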
6. The lightweight convolutional neural network pedestrian recognition method as claimed in claim 5, characterized in that: in said detection layers, the scale-1 feature map has a size of 19 × 9 and outputs the class and location information of the target; the scale-2 feature map has a size of 38 × 18 and outputs the class and location information of the target.
7. The lightweight convolutional neural network pedestrian recognition method as described in claim 1, characterized in that: in said step 5, during the training process, the loss value and accuracy on the training set are computed, and back-propagation is performed using the stochastic gradient descent algorithm to update the parameters.
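The training step of this claim (compute loss and accuracy on the training set, back-propagate, update parameters by stochastic gradient descent) can be illustrated on a toy model. The real model is the CNN of claim 5; the logistic-regression stand-in below only demonstrates the update rule.

```python
import numpy as np

def sgd_epoch(w, b, X, y, lr=0.1):
    """One SGD pass over the training set (X, y) for a logistic
    regression model, returning the updated parameters plus the loss
    and accuracy measured afterwards. The learning rate is an
    illustrative choice, not a value from the patent."""
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + np.exp(-(xi @ w + b)))  # forward pass
        grad = p - yi                # dLoss/dlogit for cross-entropy
        w -= lr * grad * xi          # back-propagated SGD update
        b -= lr * grad
    # Loss value and accuracy on the training set, as in claim 7.
    p_all = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    eps = 1e-9
    loss = -np.mean(y * np.log(p_all + eps) + (1 - y) * np.log(1 - p_all + eps))
    acc = np.mean((p_all > 0.5) == y)
    return w, b, loss, acc
```

Repeating `sgd_epoch` over many epochs drives the training loss down, which is the behaviour monitored against the thresholds of claim 8.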
8. The lightweight convolutional neural network pedestrian recognition method as described in claim 1, characterized in that: in said step 5, the loss value and accuracy on the training set are checked; the loss value is expected to be no higher than 20%, the accuracy no lower than 90%, the recall no lower than 85%, and the miss rate no higher than 15%.
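The recall and miss-rate thresholds above are two views of one quantity, since miss rate = 1 − recall; a minimal sketch of the metric computation from detection counts (the count-based definitions are standard assumptions, not spelled out in the claim):

```python
def detection_metrics(tp, fp, fn):
    """Compute precision, recall, and miss rate from the numbers of
    true-positive, false-positive, and missed (false-negative)
    pedestrian detections."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    miss_rate = 1.0 - recall  # complement of recall
    return precision, recall, miss_rate
```

For example, 90 correct detections with 10 missed pedestrians give a recall of 90% and a miss rate of 10%, satisfying both thresholds of the claim simultaneously.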
9. The lightweight convolutional neural network pedestrian recognition method as described in claim 1, characterized in that: the specific content of said step 6 is: if the network performs well on the training set but badly on the verification set, reduce the number of iterations or change the decay value, retrain the adjusted network with the training set, and return to step 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910629693.3A CN110321874A (en) | 2019-07-12 | 2019-07-12 | A kind of light-weighted convolutional neural networks pedestrian recognition method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110321874A true CN110321874A (en) | 2019-10-11 |
Family
ID=68122106
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910629693.3A Pending CN110321874A (en) | 2019-07-12 | 2019-07-12 | A kind of light-weighted convolutional neural networks pedestrian recognition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110321874A (en) |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110942144A (en) * | 2019-12-05 | 2020-03-31 | 深圳牛图科技有限公司 | Neural network construction method integrating automatic training, checking and reconstructing |
CN111126310A (en) * | 2019-12-26 | 2020-05-08 | 华侨大学 | Pedestrian gender identification method based on scene migration |
CN111144272A (en) * | 2019-12-24 | 2020-05-12 | 珠海格力电器股份有限公司 | Multi-scale gait recognition method and system and intelligent household appliance |
CN111401241A (en) * | 2020-03-16 | 2020-07-10 | 中国民用航空飞行学院 | Marine civil aircraft searching method and device based on convolutional neural network |
CN111401215A (en) * | 2020-03-12 | 2020-07-10 | 杭州涂鸦信息技术有限公司 | Method and system for detecting multi-class targets |
CN111507271A (en) * | 2020-04-20 | 2020-08-07 | 北京理工大学 | Airborne photoelectric video target intelligent detection and identification method |
CN111582091A (en) * | 2020-04-27 | 2020-08-25 | 西安交通大学 | Pedestrian identification method based on multi-branch convolutional neural network |
CN111898699A (en) * | 2020-08-11 | 2020-11-06 | 海之韵(苏州)科技有限公司 | Automatic detection and identification method for hull target |
CN111931615A (en) * | 2020-07-28 | 2020-11-13 | 五邑大学 | Robot target identification method, system, device and storage medium |
CN112365740A (en) * | 2020-11-30 | 2021-02-12 | 北京停简单信息技术有限公司 | Alarm display method and device |
CN112487911A (en) * | 2020-11-24 | 2021-03-12 | 中国信息通信科技集团有限公司 | Real-time pedestrian detection method and device based on improved yolov3 in intelligent monitoring environment |
CN112784756A (en) * | 2021-01-25 | 2021-05-11 | 南京邮电大学 | Human body identification tracking method |
CN112836619A (en) * | 2021-01-28 | 2021-05-25 | 合肥英睿系统技术有限公司 | Embedded vehicle-mounted far infrared pedestrian detection method, system, equipment and storage medium |
CN112836657A (en) * | 2021-02-08 | 2021-05-25 | 中国电子科技集团公司第三十八研究所 | Pedestrian detection method and system based on lightweight YOLOv3 |
CN113343949A (en) * | 2021-08-03 | 2021-09-03 | 中国航空油料集团有限公司 | Pedestrian detection model training method for universal embedded platform |
CN113408423A (en) * | 2021-06-21 | 2021-09-17 | 西安工业大学 | Aquatic product target real-time detection method suitable for TX2 embedded platform |
CN113468992A (en) * | 2021-06-21 | 2021-10-01 | 四川轻化工大学 | Construction site safety helmet wearing detection method based on lightweight convolutional neural network |
CN113723526A (en) * | 2021-08-31 | 2021-11-30 | 电子科技大学 | Method for identifying different types of craters |
CN114623727A (en) * | 2022-03-14 | 2022-06-14 | 北京理工大学 | Laser imaging short-range detection target identification method |
CN112487911B (en) * | 2020-11-24 | 2024-05-24 | 中国信息通信科技集团有限公司 | Real-time pedestrian detection method and device based on improvement yolov under intelligent monitoring environment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107644221A (en) * | 2017-10-31 | 2018-01-30 | 南京航空航天大学 | Convolutional neural networks traffic sign recognition method based on compression of parameters |
CN108288047A (en) * | 2018-02-07 | 2018-07-17 | 成都新舟锐视科技有限公司 | A kind of pedestrian/vehicle checking method |
CN109919045A (en) * | 2019-02-18 | 2019-06-21 | 北京联合大学 | Small scale pedestrian detection recognition methods based on concatenated convolutional network |
Non-Patent Citations (1)
Title |
---|
CHEN CONG et al. (陈聪等): "An Improved Convolutional Neural Network Pedestrian Recognition Method" (一种改进的卷积神经网络行人识别方法), 《HTTP://KNS.CNKI.NET/KCMS/DETAIL/23.1191.U.20181017.1655.002.HTM1》 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110321874A (en) | A kind of light-weighted convolutional neural networks pedestrian recognition method | |
CN108256544B (en) | Picture classification method and device, robot | |
CN109344736B (en) | Static image crowd counting method based on joint learning | |
CN108399380A (en) | A kind of video actions detection method based on Three dimensional convolution and Faster RCNN | |
CN108108751B (en) | Scene recognition method based on convolution multi-feature and deep random forest | |
CN110175613A (en) | Street view image semantic segmentation method based on Analysis On Multi-scale Features and codec models | |
CN110287960A (en) | The detection recognition method of curve text in natural scene image | |
CN107871101A (en) | A kind of method for detecting human face and device | |
CN110197182A (en) | Remote sensing image semantic segmentation method based on contextual information and attention mechanism | |
CN103258214B (en) | Based on the Classifying Method in Remote Sensing Image of image block Active Learning | |
CN106599869A (en) | Vehicle attribute identification method based on multi-task convolutional neural network | |
CN107742107A (en) | Facial image sorting technique, device and server | |
CN110059741A (en) | Image-recognizing method based on semantic capsule converged network | |
CN106599925A (en) | Plant leaf identification system and method based on deep learning | |
CN110689012A (en) | End-to-end natural scene text recognition method and system | |
CN111652273B (en) | Deep learning-based RGB-D image classification method | |
CN110197152A (en) | A kind of road target recognition methods for automated driving system | |
CN110175249A (en) | A kind of search method and system of similar pictures | |
CN109241858A (en) | A kind of passenger flow density detection method and device based on rail transit train | |
CN108549794A (en) | A kind of secondary protein structure prediction method | |
CN108509833A (en) | A kind of face identification method, device and equipment based on structured analysis dictionary | |
CN106156798A (en) | Scene image classification method based on annular space pyramid and Multiple Kernel Learning | |
CN106874913A (en) | A kind of vegetable detection method | |
CN107766828A (en) | UAV Landing Geomorphological Classification method based on wavelet convolution neutral net | |
CN109978074A (en) | Image aesthetic feeling and emotion joint classification method and system based on depth multi-task learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20191011 |