CN108229390A

CN108229390A - Rapid pedestrian detection method based on deep learning

Info

Publication number: CN108229390A
Application number: CN201810001077.9A
Authority: CN
Inventors: 孙君凤; 许宏吉; 刘爱玲; 房桦; 刘琛
Original assignee: JOVISION TECHNOLOGY Co Ltd; Shandong University
Current assignee: JOVISION TECHNOLOGY Co Ltd; Shandong University
Priority date: 2018-01-02
Filing date: 2018-01-02
Publication date: 2018-06-29

Abstract

The rapid pedestrian detection method based on deep learning of the present invention comprises the following steps: (1) constructing a pedestrian data set with annotation information: surveillance video images are collected under different scenes, a standard pedestrian detection data set containing the images and their annotation information is built, and the data set is expanded; (2) pre-training a network model with the constructed pedestrian data set to generate a model suitable for pedestrian detection: a pre-trained model is fine-tuned on the pedestrian data set to obtain a pedestrian detection model suited to the constructed data set; (3) testing the trained model with rapid multi-channel video pedestrian detection: based on a multi-channel video stitching method, the images to be tested are preprocessed and input into the trained model for detection, yielding the location information of the targets. The method of the present invention reduces the influence of illumination, complex backgrounds and similar factors on pedestrian detection, detects pedestrians quickly, analyzes surveillance video automatically, and discovers abnormal events in time.

Description

Rapid pedestrian detection method based on deep learning

Technical field

The present invention relates to the technical field of pedestrian detection as applied to image processing, video surveillance and security, and specifically to a rapid pedestrian detection method based on deep learning.

Background art

Pedestrian detection means that, for a given image or video, a computer determines whether pedestrians are present and, if so, gives their specific locations. Pedestrian detection is the basis and prerequisite for research on pedestrian tracking, behavior analysis, gait analysis, pedestrian identification and the like, and a good pedestrian detection algorithm provides strong support and assurance for these tasks. Pedestrian detection has a wide range of real-world application scenarios. In recent years, surveillance cameras have been installed at many entrances, offices, shopping malls, banks and other public places, but most monitoring is still done by human operators. Manual monitoring not only consumes a great deal of manpower but also tires the observer easily, so that important information may be missed, which can cause substantial economic losses. With pedestrian detection technology, a computer can automatically detect each pedestrian in the view of a surveillance camera. On this basis it is also possible to track the trajectory of a specified pedestrian, identify pedestrians, analyze pedestrian behavior, and raise an alarm automatically and promptly when an abnormal situation is found. This greatly reduces manpower, improves monitoring accuracy, and helps prevent incidents before they occur, which is the original purpose of video surveillance.

The human body has both flexible and rigid characteristics, so pedestrian detection is easily constrained by complicating factors such as clothing, illumination, occlusion, pedestrian posture, scale and shooting angle, and is therefore unstable.

Currently used pedestrian detection methods include background subtraction, frame differencing, optical flow, template matching, and methods based on machine learning. The first four are human detection methods based on image processing techniques; their accuracy and robustness are poor when faced with variations in clothing, body shape, posture, occlusion, illumination and complex backgrounds. Methods based on machine learning learn the regularities of the human body from training samples to obtain a model, which is then evaluated on a test set. If the data and features are chosen reasonably and a suitable algorithm is used for training, problems such as human diversity, illumination and background diversity can be overcome more effectively, and such methods have become a mainstream approach to pedestrian detection.

In the security industry, the volume of surveillance video keeps growing, which provides a large amount of data for the development of deep learning. For deep learning, however, computing capability is a major limiting factor, so saving detection time is very important.

Summary of the invention

To remedy the shortcomings of the prior art, the present invention provides a rapid pedestrian detection method based on deep learning that is simple in structure and easy to use. It reduces the influence of illumination, complex backgrounds and similar factors on pedestrian detection, detects pedestrians quickly, analyzes surveillance video automatically, discovers abnormal events in time, and improves the efficiency and robustness of pedestrian detection.

The present invention is achieved through the following technical solutions:

The rapid pedestrian detection method based on deep learning of the present invention is characterized by comprising the following steps:

(1) Constructing a pedestrian data set with annotation information:

Surveillance video images are collected under different scenes, a standard pedestrian detection data set containing the images and their annotation information is built, and the data set is expanded;

(2) Pre-training the network model with the constructed pedestrian data set to generate a model suitable for pedestrian detection:

The pre-trained model is fine-tuned using the data set prepared in step (1) to obtain a pedestrian detection model suited to the constructed data set;

(3) Testing the trained model with rapid multi-channel video pedestrian detection:

Based on the multi-channel video stitching method, the images to be tested are preprocessed and input into the trained model for detection, yielding the location information of the targets.

In step (1), constructing the standard pedestrian data set with annotation information mainly comprises the following three parts:

1-1. Collecting pedestrian images under different scenes, covering a variety of pedestrian postures, scenes and illumination conditions as well as video images from different periods of the day, so that the constructed data set effectively covers the diversity of scenes;

1-2. Effectively expanding the collected data set by image mirroring, angle rotation, size scaling and adding random noise;

1-3. Generating the corresponding annotation and label information for the expanded data set images. The annotation information is the location of each target in a sample; the label information is the category of each target in a sample, where a person is labeled 1 and a non-person is labeled -1.
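The expansion operations named in 1-2 can be sketched with NumPy as follows. This is a minimal illustration, not the implementation used in the invention: the function names, the `(x_min, y_min, x_max, y_max)` box format in pixel-edge coordinates and the Gaussian noise model are all assumptions. Arbitrary-angle rotation is left out because it needs interpolation and re-fitting of the boxes.

```python
import numpy as np

def mirror(image, boxes):
    """Flip an image left-right and mirror its boxes (x_min, y_min, x_max, y_max)."""
    w = image.shape[1]
    flipped = image[:, ::-1].copy()
    new_boxes = [(w - x2, y1, w - x1, y2) for (x1, y1, x2, y2) in boxes]
    return flipped, new_boxes

def scale(image, boxes, factor):
    """Nearest-neighbour resize by `factor`; boxes are scaled by the same factor."""
    h, w = image.shape[:2]
    new_h = max(1, int(round(h * factor)))
    new_w = max(1, int(round(w * factor)))
    rows = np.minimum((np.arange(new_h) / factor).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / factor).astype(int), w - 1)
    resized = image[rows][:, cols]
    new_boxes = [tuple(v * factor for v in box) for box in boxes]
    return resized, new_boxes

def add_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise (assumed noise model) and clip to 8 bits."""
    noisy = image.astype(np.float64) + rng.normal(0.0, sigma, image.shape)
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)
```

Mirroring and scaling change the target locations, so the annotation boxes are transformed together with the image; adding noise leaves them unchanged.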

In step (2), the constructed pedestrian data set is used to train a model based on Faster R-CNN:

Using the annotated data set constructed above, the pre-trained model is fine-tuned with the Faster R-CNN network structure to train a model suitable for pedestrian detection. The Faster R-CNN network consists of an RPN convolutional neural network for generating candidate regions and a Fast R-CNN convolutional neural network for target discrimination.

In step (3), testing the trained model with rapid multi-channel video pedestrian detection mainly comprises the following three parts:

3-1. Generating the image to be tested by multi-channel video frame stitching

The surveillance video images of multiple channels are stitched into one large image, which is then input into the trained model for detection. This is equivalent to detecting the video frames of multiple channels simultaneously and effectively saves detection time for the videos to be detected;

3-2. Reducing the influence of the surrounding environment by video frame preprocessing

During detection, the main purpose of image preprocessing is to eliminate redundant information in the image, filter out interference and noise, and enhance the detectability of relevant information, thereby improving the reliability of subsequent feature extraction and detection;

Available image processing methods mainly include histogram equalization, normalization and Gamma correction;

3-3. Inputting the preprocessed stitched frame into the trained model for pedestrian detection, and outputting the pedestrian location information and the corresponding confidence.
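The frame stitching of 3-1 can be sketched as tiling the channel frames into one mosaic. The sketch below assumes that all frames have the same size and that the number of channels is a perfect square, so that the mosaic keeps the aspect ratio of a single frame; `stitch_frames` is a hypothetical helper name.

```python
import math
import numpy as np

def stitch_frames(frames):
    """Tile N equally sized frames into a sqrt(N) x sqrt(N) mosaic, row by row."""
    n = len(frames)
    side = math.isqrt(n)
    if side * side != n:
        raise ValueError("number of channels must be a perfect square")
    # Join each group of `side` frames horizontally, then stack the rows.
    rows = [np.hstack(frames[r * side:(r + 1) * side]) for r in range(side)]
    return np.vstack(rows)
```

With four channels, channel 0 lands in the top-left tile, channel 1 in the top-right, channel 2 in the bottom-left and channel 3 in the bottom-right.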

In step (2), an existing ImageNet model is fine-tuned on the constructed data set to obtain a new classification model suitable for pedestrian detection. The main steps are as follows:

2-1. Pre-training uses the ZF-Net network of the ImageNet model. The fully connected layer of the network model is first modified so that it outputs two classes, namely pedestrian and non-pedestrian;

2-2. The model is pre-trained on the constructed sample set, which includes fine-tuning the RPN network and the Fast R-CNN network. Faster R-CNN comprises the RPN network for extracting candidate regions and the Fast R-CNN network for detecting targets. The RPN network processes the input images containing pedestrians and generates coarse pedestrian candidate regions; the Fast R-CNN network further discriminates these candidate regions and outputs the final pedestrian locations. The RPN network and the Fast R-CNN network can share features, which reduces the time needed to extract candidate regions and shortens the target detection time;

Pre-training is used mainly to obtain a good initialization of the network, which helps avoid falling into a local minimum during subsequent training and also accelerates convergence. The network is subsequently fine-tuned on the constructed sample set so that the parameters better fit the current sample set, improving the detection rate on the data set.

In step (3):

In 3-2, fast Gamma correction is used to standardize each color channel of the image in the RGB color space. Gamma correction adjusts the contrast of the image, reduces the effects of local shadows and illumination changes, and mitigates the influence of scene and illumination on feature extraction.

In step (3), the Gamma correction method of 3-2 assumes that the pixel values of all images are integers between 0 and M, and performs the following operations on a pixel whose value is i:

3-2-1. Normalization: the pixel value is converted into a real number j between 0 and 1;

3-2-2. Pre-compensation: the pre-compensation value is computed, where Gamma denotes the chosen gamma value. In practical applications, the gamma value should be adjusted within a certain range according to the actual situation to obtain the best effect;

3-2-3. Denormalization: the pre-compensated real value is inversely transformed into an integer value between 0 and M. Performing the above operations on each pixel in the image yields the image values after the Gamma transformation. A simple approach is to compute the Gamma-transformed value for every integer between 0 and M, store the results in a pre-established Gamma correction look-up table, and use this table to Gamma-correct any image whose pixel values lie between 0 and M.
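The three operations can be written compactly. The form below is one common convention that is consistent with the description; the exact normalization, the use of 1/γ as the exponent, and the symbols k and i' are assumptions made here for illustration.

```latex
\begin{aligned}
j  &= \frac{i}{M}                      && \text{(normalization, } 0 \le j \le 1\text{)} \\
k  &= j^{\,1/\gamma}                   && \text{(pre-compensation)} \\
i' &= \operatorname{round}(M \cdot k)  && \text{(denormalization, } 0 \le i' \le M\text{)}
\end{aligned}
```

Under this convention, γ > 1 brightens mid-tones and γ < 1 darkens them, while the end points 0 and M are left unchanged.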

The present invention has the following advantages:

1. On the basis of existing standard pedestrian data sets, the present invention adds data from the current environment by collecting video images under a variety of environments, and effectively expands the data set with several data augmentation methods. This not only increases the amount of data but also covers more conditions, effectively prevents network overfitting, and improves the accuracy of pedestrian detection.

2. By effectively stitching multi-channel video frames, the present invention effectively increases the speed of multi-channel video pedestrian detection. In the same amount of time, the proposed detection method can process multiple video channels simultaneously, which effectively saves detection time compared with conventional single-channel detection.

3. By using image preprocessing, the present invention effectively adjusts the quality of the images to be detected under different environments, reduces the effects of local shadows and illumination changes, mitigates the influence of scene and illumination on pedestrian feature extraction, and improves the robustness of pedestrian detection.

4. By using a deep learning network model, the present invention overcomes the shortcomings of manually extracted image features and improves the accuracy of pedestrian detection.

Description of the drawings

Fig. 1 is a schematic flow chart of the overall three steps of the present invention. Fig. 2 is a schematic flow chart of step (1) of the present invention. Fig. 3 is a schematic flow chart of step (3) of the present invention. Fig. 4 is a detailed overall flow chart of the present invention.

Detailed description of the embodiments

The drawings show a specific embodiment of the present invention.

The rapid pedestrian detection method based on deep learning disclosed by the present invention mainly comprises three stages: building the sample set, training, and rapid detection. The key content of the invention is the rapid detection stage, in which the images to be tested are stitched and then processed to reduce the effects of local shadows and illumination changes, improving the speed and robustness of pedestrian detection.

The steps are as follows:

(1) Constructing a pedestrian data set with annotation information;

Surveillance video images are collected under different scenes, a standard pedestrian detection data set containing the images and their annotation information is built, and the data set is expanded.

(2) Pre-training the network model with the constructed pedestrian data set to generate a model suitable for pedestrian detection;

The pre-trained model is fine-tuned using the data set prepared in step (1) to obtain a pedestrian detection model suited to the constructed data set.

(3) Testing the trained model with rapid multi-channel video pedestrian detection.

Based on the multi-channel video stitching method, the images to be tested are preprocessed and input into the trained model for detection, yielding the location information of the targets.

1-1. Collecting pedestrian images under different scenes;

1-2. Effectively expanding the collected data set, for example by mirroring, rotation, scaling and adding random noise;

1-3. Generating the annotation information for the data set images.

Using the annotated data set constructed above, the pre-trained model is fine-tuned with the Faster R-CNN network structure to train a model suitable for pedestrian detection. The Faster R-CNN network consists of an RPN convolutional neural network for generating candidate regions and a Fast R-CNN convolutional neural network for target discrimination.

Step (3): testing the trained model with rapid multi-channel video pedestrian detection mainly comprises the following three parts:

3-1. Generating the image to be tested by multi-channel video frame stitching

In the security industry, the volume of surveillance video keeps growing, which provides a large amount of data for the development of deep learning. For deep learning, however, computing capability is a major limiting factor, so saving detection time is very important.

The video images of multiple channels are stitched into one large image, which is then input into the trained model for detection, effectively saving detection time for the videos to be detected. Assuming that the video frames of N channels are stitched into one frame and input into the model, this is equivalent to detecting the video frames of N channels simultaneously, which effectively saves detection time.

3-2. Reducing the influence of the surrounding environment by video frame preprocessing

During detection, the quality of the input images may be degraded by various constraints and random interference, so using them directly may not achieve the desired effect. The main purpose of image preprocessing is to eliminate redundant information in the image, filter out interference and noise, and enhance the detectability of relevant information, thereby improving the reliability of subsequent feature extraction and detection.

The present invention uses fast Gamma correction to standardize each color channel of the image in the RGB color space. Gamma correction adjusts the contrast of the image, reduces the effects of local shadows and illumination changes, and mitigates the influence of scene and illumination on feature extraction.

3-3. Inputting the preprocessed stitched frame into the trained model for pedestrian detection, and outputting the pedestrian location information and the corresponding confidence.
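Besides Gamma correction, histogram equalization is named among the available preprocessing methods for 3-2. A minimal NumPy sketch of the standard technique for a single 8-bit channel is given below; the function name is hypothetical, and applying it per channel or on a luminance channel only is a design choice the text leaves open.

```python
import numpy as np

def equalize_histogram(gray):
    """Histogram-equalise an 8-bit single-channel image (dtype uint8)."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # cumulative count at the darkest level present
    total = gray.size
    if total == cdf_min:               # constant image: nothing to spread out
        return gray.copy()
    # Map each grey level so that the cumulative distribution becomes roughly linear.
    lut = np.rint((cdf - cdf_min) / (total - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]
```

The darkest level present maps to 0 and the brightest to 255, which stretches the contrast of under-exposed or over-exposed frames.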

In step (1), a pedestrian data set with annotation information is constructed.

1-1. Surveillance video images are collected under different scenes, covering as far as possible a variety of pedestrian postures, scenes and illumination conditions as well as video images from different periods of the day, so that the constructed data set effectively covers the diversity of scenes.

1-2. For deep learning, a large amount of data is an effective guarantee of its performance, so the collected sample data set is effectively expanded by different methods.

Common data set expansion methods include but are not limited to image mirroring, angle rotation, size scaling and adding random noise.

1-3. The corresponding annotation and label information is generated for the expanded data set. The annotation information is the location of each target in a sample; the label information is the category of each target in a sample. In the present invention, a person is labeled 1 and a non-person is labeled -1.
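One way to hold the annotation and label information of 1-3 is a small record per target. The sketch below is illustrative only: the class name, the field names and the `(x_min, y_min, x_max, y_max)` box format are assumptions, while the label values 1 and -1 follow the text.

```python
from dataclasses import dataclass

PEDESTRIAN = 1        # label for a person, as stated in 1-3
NON_PEDESTRIAN = -1   # label for a non-person, as stated in 1-3

@dataclass(frozen=True)
class Annotation:
    image_id: str     # hypothetical identifier of the source frame
    box: tuple        # target location: (x_min, y_min, x_max, y_max) in pixels
    label: int        # target category: PEDESTRIAN or NON_PEDESTRIAN

    def __post_init__(self):
        x1, y1, x2, y2 = self.box
        if not (x1 < x2 and y1 < y2):
            raise ValueError("box must have positive width and height")
        if self.label not in (PEDESTRIAN, NON_PEDESTRIAN):
            raise ValueError("label must be 1 (pedestrian) or -1 (non-pedestrian)")
```

Validating the record at construction time catches malformed boxes and stray label values before they reach training.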

In step (2), the constructed pedestrian data set is used to pre-train the model based on the Faster R-CNN algorithm and generate the final pedestrian detection model.

An existing ImageNet model is fine-tuned on the constructed data set to obtain a new classification model suitable for pedestrian detection. The main steps are as follows:

2-1. In the present invention, pre-training uses the ZF-Net network of the ImageNet model. The fully connected layer of the network model is first modified so that it outputs two classes, namely pedestrian and non-pedestrian;

2-2. The model is pre-trained on the constructed sample set, which includes fine-tuning the RPN network and the Fast R-CNN network. The RPN network processes the input images containing pedestrians and generates coarse pedestrian candidate regions; the Fast R-CNN network further discriminates these candidate regions and outputs the final pedestrian locations.

Pre-training is used mainly to obtain a good initialization of the network, which helps avoid falling into a local minimum during subsequent training and also accelerates convergence. The network is subsequently fine-tuned on the constructed sample set so that the parameters better fit the current sample set, improving the detection rate on the data set.

The model used in step (2) is implemented on the Faster R-CNN framework. Faster R-CNN comprises the RPN network for extracting candidate regions and the Fast R-CNN network for detecting targets. The RPN network and the Fast R-CNN network can share features, which greatly reduces the time needed to extract candidate regions and shortens the target detection time.

In step (3), the trained pedestrian detection model is tested with rapid multi-channel video pedestrian detection.

3-1. Generating the image to be tested by stitching the frames of the multi-channel video

With the rapid development of high-definition, intelligent and integrated security surveillance, surveillance video resources keep growing, and surveillance big data provides fertile ground for the development of deep learning. Besides big data, another major factor affecting deep learning is the limitation of computing capability: deep learning based on big data requires strong computing power, so saving detection time is extremely important.

For pedestrian detection, the multi-channel surveillance video is effectively stitched into one image to be tested, and the stitched image is input into the trained model for detection, which effectively saves the time of multi-channel video pedestrian detection.

Assuming that the video frames of N channels are stitched into one frame, input into the model and subjected to pedestrian detection, this is equivalent to detecting the videos of N channels simultaneously: the time originally spent detecting one channel now suffices to detect N channels, which effectively saves detection time for multi-channel video. In the present embodiment, the choice of N needs to take into account the influence of the stitched image on pedestrian detection as well as factors such as the deformation of the original frames caused by stitching; it is recommended that N be the square of a natural number. The suggested value in the present embodiment, without limitation, is N = 4.
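As a rough estimate of the saving, let T be the detection time for one frame, T' the detection time for one stitched frame and T_s the stitching overhead. These symbols, and the assumption that T' is close to T, are introduced here for illustration and are not stated in the text.

```latex
T_{\text{sequential}} = N \cdot T, \qquad
T_{\text{stitched}} = T_s + T', \qquad
\text{speed-up} = \frac{N \cdot T}{T_s + T'}
```

With N = 4, T' ≈ T and negligible T_s, the speed-up approaches 4. The assumption T' ≈ T holds only if the stitched frame costs about as much to process as a single frame, for example when it is resized to the detector's input resolution; each pedestrian then occupies fewer pixels, which is one reason the choice of N must weigh the effect on detection.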

3-2. Considering that the stitched image is captured under different illumination and in different environments, the video frames are preprocessed to reduce the influence of the surrounding environment.

During detection, the quality of the input images may be degraded by various constraints and random interference, so using them directly may not achieve the desired effect. The main purpose of image preprocessing is to eliminate redundant information in the image, filter out interference and noise, and enhance the detectability of relevant information, thereby improving the reliability of subsequent feature extraction and detection.

Available image processing methods include but are not limited to histogram equalization, normalization and Gamma correction. The present embodiment takes Gamma correction as an example and introduces the main steps:

The fast Gamma correction method standardizes each color channel of the image in the RGB color space. Gamma correction adjusts the contrast of the image, reduces the effects of local shadows and illumination changes, and mitigates the influence of scene and illumination on feature extraction.

Assuming that the pixel values of all images are integers between 0 and M, the following operations are performed on a pixel whose value is i:

3-2-1. Normalization: the pixel value is converted into a real number j between 0 and 1;

3-2-2. Pre-compensation: the pre-compensation value is computed, where Gamma denotes the chosen gamma value. In practical applications, the gamma value should be adjusted within a certain range according to the actual situation to obtain the best effect;

3-2-3. Denormalization: the pre-compensated real value is inversely transformed into an integer value between 0 and M. Performing the above operations on each pixel in the image yields the image values after the Gamma transformation. A simple approach is to compute the Gamma-transformed value for every integer between 0 and M, store the results in a pre-established Gamma correction look-up table, and use this table to Gamma-correct any image whose pixel values lie between 0 and M.
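The look-up-table approach can be sketched as follows. Since the formulas themselves are not given in the text, the sketch assumes the common convention j = i/M, pre-compensation j^(1/gamma) and rounding back to 0..M; the function names and this choice of exponent are assumptions.

```python
import numpy as np

def build_gamma_lut(gamma, max_value=255):
    """Precompute the corrected value for every integer pixel value 0..max_value."""
    levels = np.arange(max_value + 1, dtype=np.float64)
    normalized = levels / max_value                # 3-2-1: map to [0, 1] (assumed form)
    compensated = normalized ** (1.0 / gamma)      # 3-2-2: assumed exponent 1/gamma
    restored = np.rint(compensated * max_value)    # 3-2-3: back to integers in 0..max_value
    return restored.astype(np.uint8 if max_value <= 255 else np.uint16)

def gamma_correct(image, lut):
    """Apply the table to an integer image; every color channel uses the same table."""
    return lut[image]
```

Building the table costs M + 1 power operations once; correcting a frame afterwards is a single indexing operation per pixel, which is what makes the correction fast enough for video.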

3-3. Finally, the stitched image that has undergone image processing is input into the network model for pedestrian detection, and the locations and confidence scores of the pedestrians are output.
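The detector reports pedestrian locations in the coordinates of the stitched image. The text does not describe how these are attributed to individual channels; one plausible approach, sketched below under the assumption of a square row-by-row grid of equally sized frames, assigns each box to the tile containing its center. The function name and the `(x_min, y_min, x_max, y_max, score)` detection format are hypothetical.

```python
def assign_to_channels(detections, frame_w, frame_h, side):
    """Map mosaic-space boxes to (channel, local box, score) via the box centre."""
    results = []
    for (x1, y1, x2, y2, score) in detections:
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        col = min(int(cx // frame_w), side - 1)
        row = min(int(cy // frame_h), side - 1)
        ox, oy = col * frame_w, row * frame_h      # top-left corner of the tile
        # Shift into tile coordinates and clip boxes that spill over a tile edge.
        local = (max(x1 - ox, 0), max(y1 - oy, 0),
                 min(x2 - ox, frame_w), min(y2 - oy, frame_h))
        results.append((row * side + col, local, score))
    return results
```

Boxes that straddle a tile boundary are clipped to the tile of their center, so a pedestrian near the edge of one channel is not reported in its neighbor.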

Claims

1. A rapid pedestrian detection method based on deep learning, characterized by comprising the following steps:

(1) Constructing a pedestrian data set with annotation information:

2. The rapid pedestrian detection method based on deep learning according to claim 1, characterized in that:

3. The rapid pedestrian detection method based on deep learning according to claim 1, characterized in that:

4. The rapid pedestrian detection method based on deep learning according to claim 1, characterized in that:

In step (3), testing the trained model with rapid multi-channel video pedestrian detection mainly comprises the following three parts:

5. The rapid pedestrian detection method based on deep learning according to claim 3, characterized in that:

In step (2):

An existing ImageNet model is fine-tuned on the constructed data set to obtain a new classification model suitable for pedestrian detection. The main steps are as follows:

2-2. The model is pre-trained on the constructed sample set, which includes fine-tuning the RPN network and the Fast R-CNN network. Faster R-CNN comprises the RPN network for extracting candidate regions and the Fast R-CNN network for detecting targets. The RPN network processes the input images containing pedestrians and generates coarse pedestrian candidate regions; the Fast R-CNN network further discriminates these candidate regions and outputs the final pedestrian locations. The RPN network and the Fast R-CNN network can share features, which reduces the time needed to extract candidate regions and shortens the target detection time;

6. The rapid pedestrian detection method based on deep learning according to claim 4, characterized in that:

In step (3):

7. The rapid pedestrian detection method based on deep learning according to claim 6, characterized in that:

In step (3), the Gamma correction method of 3-2 assumes that the pixel values of all images are integers between 0 and M, and performs the following operations on a pixel whose value is i:

3-2-2. Pre-compensation: the pre-compensation value is computed, where Gamma denotes the chosen gamma value. In practical applications, the gamma value should be adjusted within a certain range according to the actual situation to obtain the best effect;