CN109753946A

CN109753946A - Real-scene small pedestrian target detection network and detection method based on body key point supervision

Info

Publication number: CN109753946A
Application number: CN201910063682.3A
Authority: CN
Inventors: 张永强; 丁明理; 李贤�; 杨光磊; 董娜; 朱月熠; 王莉娜; 白延成
Original assignee: Harbin Institute of Technology
Current assignee: Harbin Institute of Technology
Priority date: 2019-01-23
Filing date: 2019-01-23
Publication date: 2019-05-14

Abstract

The invention provides a real-scene small pedestrian target detection network and detection method based on body key point supervision, belonging to the technical field of pedestrian detection in computer vision. The detection network comprises a super-resolution network, a key point detection network and a pedestrian classification network. The detection method first prepares training samples and then generates candidate region images with a baseline pedestrian detector. The super-resolution network generates the high-resolution image corresponding to each low-resolution image, and its training is supervised by pedestrian body key points. Finally, the pedestrian classification network determines whether an input image is a real high-resolution image or a super-resolution image generated by the super-resolution network, thereby realizing adversarial training with the super-resolution network, and also determines whether the input image is a pedestrian image or a background image, thereby completing the detection of small pedestrian targets in real scenes. The method enables the detection of small pedestrians in real scenes.

Description

Real-scene small pedestrian target detection network and detection method based on body key point supervision

Technical field

The present invention relates to a real-scene small pedestrian target detection network and detection method based on body key point supervision, and belongs to the technical field of pedestrian detection in computer vision.

Background art

With China's economic development, urbanization has caused urban populations to grow rapidly. Large crowds gather in many public places such as stations, subways and shopping malls, where safety accidents can easily occur. Timely detection of passenger flow and the detection and analysis of pedestrians are therefore particularly important. In recent years, pedestrian detection based on deep learning, and in particular small pedestrian target detection in real scenes, has emerged and developed rapidly against this background as a solution for identifying and locating small pedestrian targets. As a branch of image processing and pattern recognition, pedestrian detection has long been an important research topic in computer vision. It also serves as a key technology in practical applications such as pedestrian retrieval, crowd counting, autonomous driving and intelligent search systems. In-depth research on small pedestrian target detection in real scenes therefore has broad application demand and prospects, and is also a valuable reference for solving other problems in computer vision.

In recent years, and especially since the rise of deep learning, research institutions, universities and enterprises at home and abroad have invested substantial human and material resources in pedestrian detection, achieved results in both theoretical research and practical application, and proposed several intelligent monitoring systems for pedestrian detection. Although the key technologies of pedestrian detection have developed rapidly and scholars have produced rich results in many related fields, most current research is carried out on posed photographs or on ideal laboratory images. Such images have the following characteristics: first, the pedestrians are large and located in the center of the image; second, the background is relatively clean. In images of real scenes, however, pedestrians are usually very small and the background is complex, and the images are further affected by factors such as pose and illumination. How to detect small pedestrian targets accurately in real scenes under these influences has become a pressing problem.

Pedestrian detection technology has so far gone through two stages: traditional pedestrian detection methods and pedestrian detection methods based on deep learning. The latter far exceed the former in both detection accuracy and detection efficiency. Deep-learning-based pedestrian detection methods fall into two classes: two-stage and single-stage pedestrian detection frameworks. In a two-stage framework, the first stage generates pedestrian candidate region images for the image under test, typically several thousand regions most likely to contain a pedestrian; common methods include Selective Search, Edge Boxes and RPN. The second stage further classifies the generated candidate region images (classification) and regresses their positions (regression). These methods are mainly derived from the general object detection methods Fast-RCNN and Faster-RCNN. In a single-stage framework, classification and position regression are predicted directly on anchors; representative methods are the pedestrian detectors based on YOLO and SSD. Although these methods have made some progress, their performance on small pedestrian targets in real scenes is far from satisfactory. The main reason is that, for very small pedestrians such as those shown in Fig. 1, the convolution operations of a convolutional neural network lose most or even all of the detail information, so the extracted features carry too little useful information to detect small pedestrian targets effectively. For example, a pedestrian 50-75 pixels high is reduced to only a few pixels in the last feature layer after 4-5 convolution operations of a convolutional neural network, and such features are far from sufficient for effectively training a small pedestrian target detection network.
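The arithmetic behind this example can be checked with a short sketch. It assumes that each of the 4-5 operations halves the spatial size (a stride-2 downsampling stage); the text does not state the stride, and the function name `remaining_height` is purely illustrative.

```python
def remaining_height(pixel_height: int, num_downsamplings: int, stride: int = 2) -> int:
    """Height in feature-map cells after repeated downsampling.

    Assumption: every stage divides the spatial size by `stride`
    (floor division), as a stride-2 convolution or pooling layer would.
    """
    height = pixel_height
    for _ in range(num_downsamplings):
        height //= stride
    return height


# Pedestrians 50-75 pixels high after 4 or 5 halving stages.
for pedestrian_height in (50, 75):
    for stages in (4, 5):
        cells = remaining_height(pedestrian_height, stages)
        print(f"{pedestrian_height}px, {stages} stages -> {cells} cells")
```

Under this assumption a 50-75 pixel pedestrian occupies between 1 and 4 cells of the last feature layer, which matches the "only a few pixels" described above.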

Research on small pedestrian target detection in real scenes is still at the stage of designing specific methods for specific problems. Such methods adapt poorly to the complex and changeable environments of real scenes, and usually work well only when the background is relatively simple or under the conditions of the particular occasion for which they were designed. When the pedestrian targets are small and the image background is complex, existing pedestrian detection techniques usually suffer from a low detection rate. Some methods do address small pedestrian target detection in real scenes. For example, some works first train an ACF (Aggregated Channel Features) detector to generate candidate regions and then train two pedestrian detectors to detect larger and smaller pedestrians respectively; other works detect small pedestrian targets by fusing the shallow features of a convolutional neural network. In general, however, all of these works are inefficient and cannot fundamentally solve the problem of small pedestrian target detection in real scenes.

Summary of the invention

The present invention addresses the shortcomings of existing deep-learning-based pedestrian detection methods, namely that they are not suited to small pedestrian target detection and have low accuracy when identifying small pedestrian targets in real scenes, and provides a real-scene small pedestrian target detection network and detection method based on body key point supervision. The method enables the detection of small pedestrians in real scenes: the detection objects are not limited to larger pedestrian targets in real scenes, still less to posed pictures taken under ideal laboratory conditions, and in particular the small pedestrian targets that arise when pedestrians are far from the image capture device can be detected.

A real-scene small pedestrian target detection network based on body key point supervision adopts the following technical solution:

The real-scene small pedestrian target detection network based on body key point supervision comprises a super-resolution network, a key point detection network and a pedestrian classification network:

Super-resolution network: used to generate the high-resolution image corresponding to a low-resolution image;

Key point detection network: used to introduce the body key point loss function into the loss function of the super-resolution network, so that the training of the super-resolution network is supervised by pedestrian body key points;

Pedestrian classification network: used to determine whether an input image is a real high-resolution image or a super-resolution image generated by the super-resolution network, thereby realizing adversarial training with the super-resolution network, and to determine whether the input image is a pedestrian image or a background image, thereby completing the detection of small pedestrian targets in real scenes.

Further, the super-resolution network is a deep learning network comprising convolutional layer one, convolutional layer two, convolutional layer three, convolution block one, convolution block two, convolution block three, convolution block four, convolution block five, deconvolution layer one and deconvolution layer two. The data output of convolutional layer one is connected to the data input of convolution block one; convolution blocks one to five are connected in sequence, the data output of each being connected to the data input of the next; the data output of convolution block five is connected to the data input of convolutional layer two; the data output of convolutional layer two is connected to the data input of deconvolution layer one; the data output of deconvolution layer one is connected to the data input of deconvolution layer two; and the data output of deconvolution layer two is connected to the data input of convolutional layer three. The data interaction end of convolutional layer three is connected to the supervised-training data interaction end of the key point detection network. Each deconvolution layer performs 2x upsampling, so the resolution of the output image of the whole super-resolution network is 4 times that of the input image.
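The connection order described above can be written down as a simple layer list to confirm the overall 4x factor. This is only a bookkeeping sketch: kernel sizes and channel counts are not given in this text, so each layer records nothing but its spatial scale factor, and the names `Layer`, `SR_NETWORK` and `output_size` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    scale: int = 1  # spatial scale factor applied by this layer (assumed 1 unless stated)


# Order follows the description: conv 1 -> conv blocks 1..5 -> conv 2
# -> deconv 1 -> deconv 2 -> conv 3. Only the deconvolution layers upsample.
SR_NETWORK = (
    [Layer("conv1")]
    + [Layer(f"conv_block{i}") for i in range(1, 6)]
    + [Layer("conv2"), Layer("deconv1", scale=2), Layer("deconv2", scale=2), Layer("conv3")]
)


def output_size(height: int, width: int, layers=SR_NETWORK) -> tuple:
    """Propagate an input resolution through the layer list."""
    for layer in layers:
        height *= layer.scale
        width *= layer.scale
    return height, width


print(output_size(16, 8))  # two 2x deconvolutions give 4x overall
```

Keeping the convolutional layers and blocks at scale 1 is an assumption; it is what makes the two 2x deconvolution layers yield exactly the stated 4x output resolution.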

Further, the pedestrian classification network comprises a ResNet50 backbone network and two parallel fully connected layers.

A real-scene small pedestrian target detection method based on the above detection network adopts the following technical solution:

The real-scene small pedestrian target detection method comprises:

Step 1: train a baseline pedestrian detector using the training samples in a pedestrian detection database; then use the baseline pedestrian detector to crop candidate region images from the sample images in the pedestrian detection database; for each generated candidate region image, compute its overlap with the manually annotated ground truth, and thereby obtain positive and negative samples, where a positive sample is a pedestrian candidate image and a negative sample is a background candidate image;

Step 2: take the candidate region images cropped by the baseline pedestrian detector as high-resolution images, and obtain the corresponding low-resolution images by downsampling the high-resolution images by a factor of 4 with bilinear interpolation;

Step 3: use the key point detection network to supervise the training of the super-resolution network by introducing the body key point loss function into the loss function of the super-resolution network, obtaining the trained super-resolution network;

Step 4: input the low-resolution images into the trained super-resolution network, which generates the high-resolution image corresponding to each low-resolution image, i.e. the super-resolution image;

Step 5: input the real high-resolution images from the pedestrian detection database and the super-resolution images into the pedestrian classification network together; the pedestrian classification network is used for adversarial training of the super-resolution network and, at the same time, judges from the super-resolution image whether the input low-resolution image is a pedestrian image or a background image, thereby completing the recognition and detection of small pedestrian targets in real scenes.
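The 4x bilinear downsampling of step 2 can be illustrated with a minimal pure-Python sketch on a grayscale image stored as a list of rows. The half-pixel coordinate convention and the function names are assumptions made for the illustration; the text only specifies bilinear interpolation and the factor 4.

```python
def bilinear_resize(img, out_h, out_w):
    """Resize a 2-D grayscale image (list of rows) with bilinear interpolation.

    Assumption: output pixel centers are mapped to input coordinates with
    the half-pixel convention and clamped to the image border.
    """
    in_h, in_w = len(img), len(img[0])
    out = []
    for y in range(out_h):
        sy = min(max((y + 0.5) * in_h / out_h - 0.5, 0.0), in_h - 1.0)
        y0 = int(sy)
        y1 = min(y0 + 1, in_h - 1)
        wy = sy - y0
        row = []
        for x in range(out_w):
            sx = min(max((x + 0.5) * in_w / out_w - 0.5, 0.0), in_w - 1.0)
            x0 = int(sx)
            x1 = min(x0 + 1, in_w - 1)
            wx = sx - x0
            # Interpolate along x on the two neighboring rows, then along y.
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


def downsample_4x(img):
    """Produce the low-resolution counterpart of a high-resolution crop."""
    return bilinear_resize(img, len(img) // 4, len(img[0]) // 4)


high_res = [[float(x) for x in range(8)] for _ in range(8)]  # horizontal ramp
low_res = downsample_4x(high_res)
print(len(low_res), len(low_res[0]))
```

Each (high-resolution crop, `downsample_4x` output) pair then forms one training pair for the super-resolution network in steps 3 and 4.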

Further, the detailed process of obtaining the candidate region images in step 1 comprises:

First step: train a baseline pedestrian detector using the training samples in the pedestrian detection database, where the pedestrian detection database contains training sample images and test sample images;

Second step: use the baseline pedestrian detector to predict pedestrian positions in each training sample image, then crop and save the 100 regions of each image that are most likely to contain a pedestrian;

Third step: use the baseline pedestrian detector to predict pedestrian positions in each test sample image, then crop and save the 100 regions of each image that are most likely to contain a pedestrian; the regions obtained in the second and third steps are the candidate region images.
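The selection of the 100 most likely regions per image can be sketched as follows. The detection format, a list of `(score, (x1, y1, x2, y2))` tuples, and the helper names are assumptions for illustration; the actual output format of the baseline detector is not described here.

```python
def top_candidates(detections, k=100):
    """Keep the k highest-scoring detections.

    Each detection is assumed to be (score, (x1, y1, x2, y2)) with
    integer pixel coordinates and an exclusive lower-right corner.
    """
    return sorted(detections, key=lambda d: d[0], reverse=True)[:k]


def crop(img, box):
    """Cut the box region out of an image stored as a list of rows."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in img[y1:y2]]


# Toy example: 150 detections on one image, keep and crop the best 100.
image = [[r * 10 + c for c in range(4)] for r in range(3)]
detections = [(i / 200, (1, 0, 3, 2)) for i in range(150)]
candidates = [crop(image, box) for _, box in top_candidates(detections)]
print(len(candidates))
```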

Further, the pedestrian detection database in step 1 is the CityPersons data set. According to pedestrian size and degree of occlusion, the CityPersons data set divides all targets under test into four classes: normal, small, heavily occluded and all. Small targets are pedestrian targets 50-75 pixels high; "all" refers to every pedestrian target, including normal, small, occluded and unoccluded ones.

Further, the detailed process of obtaining the trained super-resolution network in step 3 comprises:

Step 1: use the key point detection network to predict the body key point information in each candidate region image, and take the predicted body key points as the body key point ground truth;

Step 2: compute the body key point loss function from the body key point ground truth;

Step 3: supervise the training of the super-resolution network with the body key point loss function, obtaining the trained super-resolution network.

Further, the body key point loss function are as follows:

Wherein, W_iIt (p) is a binary mask (mask), W_iAnd W (p)=0_i(p)=1 it is illustrated respectively in i-th of instruction The true value for practicing p-th of key point in image exists or is not present；Indicate the true of body key point confidence characteristic figure Value, K_i(p) the body key point confidence characteristic figure detected is indicated.
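The formula itself is not reproduced in this text, so the following is only a sketch of one loss that is consistent with the symbols defined above: a masked squared error between the ground-truth and detected confidence maps, summed over images i and key points p. Both the squared-error form and the convention that a mask value of 1 means "ground truth available" are assumptions, and the function name is hypothetical.

```python
def keypoint_loss(masks, true_maps, pred_maps):
    """Masked squared error over key point confidence maps (assumed form).

    masks[i][p]     : W_i(p), 0 or 1
    true_maps[i][p] : ground-truth confidence map, flattened to a list
    pred_maps[i][p] : detected confidence map K_i(p), flattened to a list
    """
    total = 0.0
    for w_i, t_i, k_i in zip(masks, true_maps, pred_maps):
        for w, t_map, k_map in zip(w_i, t_i, k_i):
            if w == 0:
                continue  # assumed convention: 0 excludes this key point
            total += sum((t - k) ** 2 for t, k in zip(t_map, k_map))
    return total


# One image, two key points; the second one is masked out.
masks = [[1, 0]]
true_maps = [[[1.0, 0.0], [1.0, 1.0]]]
pred_maps = [[[0.5, 0.0], [0.0, 0.0]]]
print(keypoint_loss(masks, true_maps, pred_maps))
```

The mask keeps key points without a usable ground truth from contributing to the supervision signal of the super-resolution network.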

Further, the body key points comprise the nose, left and right eyes, left and right ears, left and right shoulders, left and right elbows, left and right wrists, left and right hips, left and right knees and left and right ankles.

Further, the criterion for dividing positive and negative samples in step 1 is: a candidate whose overlap is greater than 0.5 is labeled as a positive sample, and a candidate whose overlap is less than 0.35 is labeled as a negative sample;

In the test phase of step 5, the criterion for distinguishing pedestrian images from background images is: a candidate region whose score after passing through the pedestrian classification network is greater than the threshold 0.5 is judged to be a positive sample, i.e. a pedestrian image, and one whose score is less than the threshold 0.5 is judged to be a negative sample, i.e. a background image.
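The two thresholding rules above can be sketched as follows. The sketch assumes that "overlap" means intersection over union (IoU) with the best-matching annotated box, which the text does not state explicitly, and all function names are illustrative. Candidates whose overlap falls between 0.35 and 0.5 receive no label, since the text assigns them to neither class.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def label_candidate(box, gt_boxes, pos_thr=0.5, neg_thr=0.35):
    """Training label for a candidate region (step 1)."""
    best = max((iou(box, gt) for gt in gt_boxes), default=0.0)
    if best > pos_thr:
        return "positive"
    if best < neg_thr:
        return "negative"
    return None  # between the thresholds: no label is specified


def classify(score, thr=0.5):
    """Test-time decision (step 5). A score exactly equal to the threshold
    is not covered by the text; it is treated as background here."""
    return "pedestrian" if score > thr else "background"


print(label_candidate((0, 0, 10, 10), [(0, 0, 10, 12)]), classify(0.9))
```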

Beneficial effects of the invention:

To address the above problems of small pedestrian target detection in real scenes, the invention proposes a real-scene small pedestrian target detection network and detection method based on body key point supervision. Pedestrian body key point detection is introduced into small pedestrian target detection: the real-scene small pedestrian target detection network is trained under the supervision of pedestrian body key points, which in turn enables the detection of small pedestrian targets in real scenes.

The proposed real-scene small pedestrian target detection network based on body key point supervision is trained from the super-resolution network, the body key point detection network, the pedestrian classification network structure and the positive and negative training samples already constructed. To increase the stability of network training, the invention trains the proposed network with the optimization strategy of a generative adversarial network (GAN): the super-resolution network and the pedestrian classification network play against each other and are optimized alternately. The super-resolution network takes random samples from the low-resolution samples as input, and its output should imitate the real samples in the high-resolution sample set as closely as possible. The input of the pedestrian classification network consists of real high-resolution pedestrian image samples and the high-resolution pedestrian image samples generated by the super-resolution network; its purpose is to distinguish the output of the super-resolution network from the real samples as well as possible while also determining the class of the high-resolution image, whereas the super-resolution network tries to deceive the pedestrian classification network as much as possible. The two networks compete and continually adjust their parameters. The final aim is that the pedestrian classification network can no longer tell whether the output of the super-resolution network is real, so that the super-resolution network produces clear high-resolution images, while the pedestrian classification network accurately classifies the input image as a pedestrian image or a background image.
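The opposing objectives of the two networks can be made concrete with a small sketch. The loss forms are not given in this text; the sketch assumes the standard binary cross-entropy GAN losses plus a cross-entropy classification term, and every function name is hypothetical.

```python
import math


def bce(prob, target, eps=1e-12):
    """Binary cross-entropy for one predicted probability and a 0/1 target."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))


def classifier_loss(p_real_on_real, p_real_on_fake, p_pedestrian, is_pedestrian):
    """Loss minimized by the pedestrian classification network (assumed form).

    It should call real images real, generated images fake, and also
    predict the pedestrian/background class correctly.
    """
    adversarial = bce(p_real_on_real, 1) + bce(p_real_on_fake, 0)
    classification = bce(p_pedestrian, 1 if is_pedestrian else 0)
    return adversarial + classification


def generator_adversarial_loss(p_real_on_fake):
    """Loss minimized by the super-resolution network (assumed form):
    it is small when the classifier believes the generated image is real."""
    return bce(p_real_on_fake, 1)


# The same classifier output on a generated image pulls the two losses
# in opposite directions, which is the game described above.
print(classifier_loss(0.9, 0.1, 0.8, True), generator_adversarial_loss(0.1))
```

In alternating optimization, one update step would lower `classifier_loss` with the super-resolution network fixed, and the next would lower `generator_adversarial_loss` (together with the key point loss) with the classifier fixed.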

In the proposed real-scene small pedestrian target detection method based on body key point supervision, the super-resolution network generates a high-resolution image corresponding to each small pedestrian target image, and the supervision information of the body key points further drives the super-resolution network to generate images that are more realistic and contain more high-frequency detail. This solves the problem that existing deep-learning-based pedestrian detection methods are not suited to small pedestrian target detection and improves the accuracy of small pedestrian target detection in real scenes. It thereby promotes the application of deep-learning-based pedestrian detection in real scenes and the development of small pedestrian target detection, and helps move pedestrian detection technology from the laboratory to practical application.

Brief description of the drawings

Fig. 1 is a schematic diagram of small pedestrian targets in real scenes;

Fig. 2 is a schematic diagram of the structure of the real-scene small pedestrian target detection network based on body key point supervision;

Fig. 3 is a schematic diagram of pedestrian body key point detection;

Fig. 4 shows the detected body key points and the heat map of each key point on the feature map;

Fig. 5 is experimental result one;

Fig. 6 is experimental result two;

Fig. 7 is experimental result three;

Fig. 8 is experimental result four.

Detailed description of the embodiments

The present invention is further described below with reference to specific embodiments, but the invention is not limited to these embodiments.

Embodiment 1:

Super-resolution network: used to generate the high-resolution image corresponding to a low-resolution image. The structure of the super-resolution network achieves 4x super-resolution; features extracted by a convolutional neural network from the generated super-resolution image contain rich detail information, which benefits the effective detection of small pedestrian targets.

Pedestrian classification network: used to determine whether an input image is a real high-resolution image or a super-resolution image generated by the super-resolution network, thereby realizing adversarial training with the super-resolution network, and to determine whether the input image is a pedestrian image or a background image, thereby completing the detection of small pedestrian targets in real scenes.

The super-resolution network is a deep learning network comprising convolutional layer one, convolutional layer two, convolutional layer three, convolution block one, convolution block two, convolution block three, convolution block four, convolution block five, deconvolution layer one and deconvolution layer two. The data output of convolutional layer one is connected to the data input of convolution block one; convolution blocks one to five are connected in sequence, the data output of each being connected to the data input of the next; the data output of convolution block five is connected to the data input of convolutional layer two; the data output of convolutional layer two is connected to the data input of deconvolution layer one; the data output of deconvolution layer one is connected to the data input of deconvolution layer two; and the data output of deconvolution layer two is connected to the data input of convolutional layer three. The data interaction end of convolutional layer three is connected to the supervised-training data interaction end of the key point detection network. Each deconvolution layer performs 2x upsampling, so the resolution of the output image of the whole super-resolution network is 4 times that of the input image.

The pedestrian classification network comprises a ResNet50 backbone network and two parallel fully connected layers.

This embodiment proposes a real-scene small pedestrian target detection network based on body key point supervision. The framework contains three network modules, as shown in Fig. 2. The first is the super-resolution network, whose main function is to generate the high-resolution image corresponding to a small pedestrian target. The input of the super-resolution network is pedestrian candidate region images, which comprise high-resolution pedestrian images and low-resolution pedestrian images; the low-resolution pedestrian images are obtained by downsampling the high-resolution pedestrian images by a factor of 4 with bilinear interpolation. As shown in Table 1, the super-resolution network is a deep convolutional neural network comprising three convolutional layers, five convolution blocks and two deconvolution layers. Each deconvolution layer performs 2x upsampling of its input. After training, the output of the super-resolution network is a clear super-resolution image upsampled by a factor of 4, which addresses the lack of detail in the features that a convolutional neural network extracts from small pedestrian targets. A clear super-resolution image containing abundant high-frequency detail also makes it easier for the subsequent pedestrian classification network to determine whether the input image is a pedestrian image or a background image, and thus to detect small pedestrian targets in real scenes.

The second network is the key point detection network, whose main function is to detect the key point information of the small pedestrian target's body. The invention uses 17 detected pedestrian body key points for pedestrian detection: the nose and the left and right eyes, ears, shoulders, elbows, wrists, hips, knees and ankles, as shown in Fig. 3. The loss between the detected key points and the annotated key points is then used to supervise the training of the super-resolution network, so that the generated super-resolution image is clearer and contains more high-frequency detail. The third network is the pedestrian classification network, whose main function is to determine whether the clear super-resolution image generated by the super-resolution network is a pedestrian image or a background image. In this embodiment, a body key point detection network detects the key points of the clear super-resolution images generated by the super-resolution network, as shown in Fig. 4. To make the generated images more realistic and richer in high-frequency detail, the key point loss function is introduced into the loss function of the super-resolution network, so that the super-resolution network is trained under the supervision of pedestrian body key points.

In the third network, the pedestrian classification network, the input is the real high-resolution pedestrian candidate region images and the super-resolution images generated by the super-resolution network. As shown in Table 1, this embodiment uses the ResNet50 network structure as the backbone of the pedestrian classification network. So that the pedestrian classification network can simultaneously discriminate whether the input image is a real high-resolution image or a super-resolution image generated by the super-resolution network (real/fake) and determine whether the input image is a pedestrian image or a background image, this embodiment adds two parallel fully connected layers to the ResNet50 structure: one discriminates whether the input image is real or fake, and the other determines whether the input image is a pedestrian image or a background image. The outputs are, respectively, the probability that the input image is a real high-resolution image and the probability that the input image is a pedestrian image.
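The two parallel fully connected layers can be pictured as two independent linear heads that read the same backbone feature vector. The sketch below is a toy illustration in pure Python: the sigmoid activation, the feature size and all names are assumptions, since Table 1 with the actual layer details is not reproduced in this text.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear(features, weights, bias):
    """One fully connected unit: weighted sum of the features plus a bias."""
    return sum(f * w for f, w in zip(features, weights)) + bias


def two_head_output(features, real_head, pedestrian_head):
    """Apply both heads to the same backbone features.

    Returns (probability the input is a real high-resolution image,
             probability the input is a pedestrian image).
    """
    w_real, b_real = real_head
    w_ped, b_ped = pedestrian_head
    p_real = sigmoid(linear(features, w_real, b_real))
    p_pedestrian = sigmoid(linear(features, w_ped, b_ped))
    return p_real, p_pedestrian


# Toy 3-dimensional feature vector standing in for the backbone output.
features = [0.2, -0.4, 1.0]
print(two_head_output(features, ([1.0, 0.0, 2.0], 0.0), ([0.5, 0.5, -1.0], 0.1)))
```

Sharing the backbone while keeping the heads separate lets the real/fake decision drive the adversarial training and the pedestrian/background decision drive detection, without either output constraining the other.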

Table 1: Detailed network structure of the super-resolution network and the pedestrian classification network

This embodiment takes images or video frames of real scenes as its research object, and users can construct their own pedestrian detection database according to practical needs. For ease of comparison with other methods, this embodiment uses the published CityPersons database with annotation information. All images in the CityPersons database were captured in real scenes; they were recorded in 18 cities in several European countries across 3 seasons and contain a large number of small pedestrian targets (pedestrian height of 50-75 pixels). After the training database is established, the baseline pedestrian detector is used to predict the pedestrian positions in each picture of the training sample set, and pedestrian candidate image and background candidate image samples are cropped according to the predicted positions. These cropped candidate images are then used to train the super-resolution network and the pedestrian classification network, with pedestrian candidate images as positive samples and background candidate images as negative samples.

In summary, this embodiment proposes a novel, general, end-to-end real-scene small pedestrian target detection framework based on body key point supervision, which introduces pedestrian body key point detection into small pedestrian target detection, i.e. trains the small pedestrian target detection network under the supervision of pedestrian body key points. The super-resolution network generates a high-resolution image corresponding to each small pedestrian target image, and the supervision information of the body key points further drives the super-resolution network to generate images that are more realistic and contain more high-frequency detail. This solves the problem that existing deep-learning-based pedestrian detection methods are not suited to small pedestrian target detection, promotes the development of small pedestrian target detection, and helps move pedestrian detection technology from the laboratory to practical application.

Embodiment 2

The real scene pedestrian small target detecting method includes:

Step 1: train a baseline pedestrian detector using the training samples of the CityPersons data set; then use the baseline pedestrian detector to crop candidate region images from the sample images in the CityPersons data set. According to pedestrian size and degree of occlusion, the CityPersons data set divides all targets under test into four classes: normal, small, heavily occluded and all, where small targets are pedestrian targets 50-75 pixels high. For each generated candidate region image, compute its overlap with the manually annotated ground truth, and thereby obtain positive and negative samples, where a positive sample is a pedestrian candidate image and a negative sample is a background candidate image. The criterion for dividing positive and negative samples is: a candidate whose overlap is greater than 0.5 is labeled as a positive sample, and a candidate whose overlap is less than 0.35 is labeled as a negative sample;

Step 3: use the key point detection network to supervise the training of the super-resolution network by introducing the body key point loss function into the loss function of the super-resolution network, obtaining the trained super-resolution network. The body key points include the nose, left and right eyes, left and right ears, left and right shoulders, left and right elbow joints, left and right wrist joints, left and right hip joints, left and right knee joints, and left and right ankle joints.

Step 5: input the true high-resolution images in the CityPersons data set and the super-resolution images into the pedestrian classification network at the same time, and use the pedestrian classification network to carry out adversarial training with the super-resolution network. At the same time, the pedestrian classification network judges, from the high-resolution image generated by the trained super-resolution network (i.e. the super-resolution image), whether the input low-resolution image is a pedestrian image or a background image, thereby completing the detection of small pedestrian targets in real scenes. The judgment proceeds as follows: if the score of the input candidate region after the pedestrian classification network is greater than a threshold (0.5), the candidate is judged to be a pedestrian image; if the score is less than 0.5, it is judged to be a background image.
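The labeling rule of Step 1 and the decision rule of Step 5 can be illustrated with a minimal sketch. The box layout `(x1, y1, x2, y2)` and the function names are hypothetical, and two details that the text leaves open are assumptions here: candidates whose overlap falls between 0.35 and 0.5 are left unlabeled, and a score of exactly 0.5 is treated as background.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical box layout: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_candidate(candidate: Box, truths: Sequence[Box],
                    pos_thr: float = 0.5, neg_thr: float = 0.35) -> Optional[int]:
    """Return 1 (pedestrian), 0 (background), or None (unlabeled)."""
    best = max((iou(candidate, t) for t in truths), default=0.0)
    if best > pos_thr:
        return 1
    if best < neg_thr:
        return 0
    return None  # assumption: the text does not say how this band is handled


def classify(score: float, threshold: float = 0.5) -> str:
    """Decision rule of Step 5; a score equal to the threshold is an assumed tie-break."""
    return "pedestrian" if score > threshold else "background"
```

For example, a candidate that coincides with an annotated box is labeled 1, and a candidate that overlaps no annotated box is labeled 0.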

In the method proposed in this embodiment, training samples are first prepared according to actual needs. To ease comparison with existing methods, this embodiment uses the public, annotated CityPersons database. Next, a benchmark pedestrian detector is trained on the prepared training samples and used to generate candidate region images, which serve as training samples for the super-resolution network and the pedestrian classification network of the present invention. To demonstrate the generality of the proposed detection method based on the detection network, this embodiment uses the existing ALFNet pedestrian detector for this purpose. The trained benchmark pedestrian detector then generates candidate region images: it predicts the pedestrian positions in each image sample of the training data set, and pedestrian candidate images and background candidate images are cropped according to the predicted positions. These pedestrian and background candidate images are the training samples for the real scene small pedestrian target detection network based on body key point supervision. In addition, because the CityPersons data set has no body key point annotations, this embodiment uses the method that ranked first in the ECCV18 body key point detection competition to predict the body key points in each pedestrian candidate region image, takes the predicted key points as the body key point true values, and uses them to compute the body key point loss function that supervises the training of the super-resolution network. Finally, the cropped pedestrian and background candidate images and the obtained pedestrian body key point true values are used as input to train the proposed detection network, which specifically comprises the super-resolution network and the pedestrian classification network: the super-resolution network learns to generate a clear high-resolution pedestrian image from the input low-resolution image, and the pedestrian classification network gives a more accurate pedestrian detection result based on the generated high-resolution image.

Each part is described in detail below:

Preparation of training samples: training sample images may be collected independently according to actual needs to build a corresponding real scene pedestrian detection database, or an existing public pedestrian detection database, such as Caltech or CityPersons, may be selected. To ease comparison with other existing methods, this embodiment uses the images in the CityPersons data set as the training samples and test samples of the proposed method. CityPersons is a recently published and widely used real scene pedestrian detection database. Its pictures were captured in real scenes: they record real street scenes in 18 cities across several European countries over 3 seasons and contain a large number of small pedestrian targets (pedestrian height between 50 and 75 pixels). These pedestrians are also affected by factors such as occlusion and illumination. Small pedestrian targets in such real scenes are very difficult for existing pedestrian detection methods to identify accurately and pose a major challenge to them. The CityPersons data set contains 5000 images with about 35000 annotated pedestrian targets, and this embodiment trains and tests the proposed real scene small pedestrian target detection network based on body key point supervision on the standard training set and test set. In addition, the CityPersons data set divides all detection targets, according to pedestrian size and degree of occlusion, into four classes: normal, small target, heavily occluded target, and all, where a small target is a pedestrian target whose height is between 50 and 75 pixels. Choosing the CityPersons data set can fully demonstrate that the proposed detection method based on the detection network overcomes the difficulty of small pedestrian target detection in real scenes and thus effectively improves its accuracy.

Training of the benchmark pedestrian detector and cropping of candidate region images: a benchmark pedestrian detector is trained on the training samples prepared above. Its role is to generate samples, i.e. cropped candidate region images, for the proposed real scene small pedestrian target detection network based on body key point supervision. The quality of the benchmark pedestrian detector directly affects the quality of the training samples for that network. The benchmark pedestrian detector may be trained independently or may be any existing pedestrian detector. The present invention treats this detector as a baseline and further improves the accuracy of small pedestrian target detection on top of it. This embodiment therefore uses the ALFNet pedestrian detector, whose backbone network is ResNet-50. In this embodiment, the detailed process by which the pedestrian detector crops candidate region images for the real scene small pedestrian target detection network based on body key point supervision is:

First step: train a benchmark pedestrian detector using the training samples in the CityPersons data set, where the CityPersons data set includes training sample images and test sample images;

Finally, the saved images are passed through the super-resolution network, which learns to generate the corresponding high-resolution (4x up-sampled) images; the pedestrian classification network then judges whether these high-resolution images are pedestrian images or background images, thereby realizing small pedestrian target detection in real scenes.

In addition, because the CityPersons data set has no pedestrian body key point annotations, this embodiment uses the body key point detection network to supervise the training of the super-resolution network. The specific process is:

The body key point loss function is:

where W_i(p) is a binary mask taking the value 0 or 1 according to whether the true value of the p-th key point in the i-th training image is present; the remaining term denotes the true value of the body key point confidence map, and K_i(p) denotes the detected body key point confidence map. Under the supervision of the body key points, the images generated by the super-resolution network not only contain a large amount of high-frequency detail but also make it easier for the pedestrian classification network to determine whether the input image is a pedestrian image or a background image. The inputs of the super-resolution network are the high-resolution pedestrian candidate image, the corresponding low-resolution image, and the generated body key point true values; the output is the learned high-resolution image.
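The formula itself is not reproduced in this text, so the following is only a sketch of one loss that is consistent with the definitions above, not the formula of the invention. It assumes a squared-error distance between the true and detected confidence maps, an average over the training images, and the convention that a mask value of 1 keeps a key point's term while 0 drops it; all names are hypothetical.

```python
from typing import List, Sequence

# A confidence map stored as rows of floats (hypothetical layout).
Heatmap = List[List[float]]


def keypoint_loss(true_maps: Sequence[Sequence[Heatmap]],
                  pred_maps: Sequence[Sequence[Heatmap]],
                  masks: Sequence[Sequence[int]]) -> float:
    """Masked squared-error loss between confidence maps.

    true_maps[i][p] and pred_maps[i][p] are the true and detected maps of
    key point p in image i; masks[i][p] plays the role of W_i(p).
    The squared error and the averaging over images are assumptions.
    """
    total = 0.0
    for t_img, p_img, m_img in zip(true_maps, pred_maps, masks):
        for t_map, p_map, m in zip(t_img, p_img, m_img):
            if m == 0:  # assumed convention: 0 removes the term
                continue
            total += sum((tv - pv) ** 2
                         for t_row, p_row in zip(t_map, p_map)
                         for tv, pv in zip(t_row, p_row))
    return total / max(len(true_maps), 1)
```

With the 17 key points listed in this embodiment (nose plus eight left and right pairs), each image would contribute up to 17 masked terms.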

The training process of the pedestrian classification network, using the key point detection network, is as follows:

For the saved cropped candidate region images most likely to contain pedestrians, the intersection over union (IoU) between each generated candidate region image and the manually annotated pedestrian position true values (ground-truth bounding boxes) is computed. A candidate is labeled as a positive sample (pedestrian) if its IoU is greater than 0.5 and as a negative sample (background) if its IoU is less than 0.35. Because the super-resolution network in the real scene small pedestrian target detection network based on body key point supervision generates images with 4x up-sampling, corresponding low-resolution and high-resolution images are needed as training samples. In this embodiment, the images cropped by the selected benchmark pedestrian detector are used as the high-resolution images, and each high-resolution image is converted into the corresponding low-resolution image by 4x down-sampling with bilinear interpolation. Finally, the super-resolution images output by the super-resolution network and the true high-resolution images are input to the pedestrian classification network to train it.
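The 4x bilinear down-sampling that produces the low-resolution training images can be sketched as follows. This is an illustration on a single-channel image stored as nested lists; the function name and the half-pixel-center sampling convention are assumptions, since the text names only the interpolation method and the factor.

```python
import math
from typing import List, Tuple

# Single-channel image as rows of floats (hypothetical layout).
Image = List[List[float]]


def bilinear_downsample(img: Image, factor: int = 4) -> Image:
    """Down-sample by an integer factor with bilinear interpolation."""
    h, w = len(img), len(img[0])
    out_h, out_w = h // factor, w // factor

    def sample_axis(o: int, size: int) -> Tuple[int, int, float]:
        # Map the output pixel center back to source coordinates
        # (assumed half-pixel-center convention), clamped to the image.
        s = min(max((o + 0.5) * factor - 0.5, 0.0), size - 1.0)
        i0 = int(math.floor(s))
        i1 = min(i0 + 1, size - 1)
        return i0, i1, s - i0

    out: Image = []
    for oy in range(out_h):
        y0, y1, wy = sample_axis(oy, h)
        row = []
        for ox in range(out_w):
            x0, x1, wx = sample_axis(ox, w)
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out
```

Each output pixel is interpolated from the four source pixels nearest its center, so an 8 x 8 crop becomes a 2 x 2 image; a color image would be handled one channel at a time.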

Working process and experimental results of the real scene small pedestrian target detection method based on body key point supervision described in this embodiment:

The real scene small pedestrian target detection method based on the detection network realizes small pedestrian target detection in real scenes by training the real scene small pedestrian target detection network based on body key point supervision. Its working process is:

A real scene small pedestrian target detection network based on body key point supervision is trained using the super-resolution network, the body key point detection network, the pedestrian classification network, and the positive and negative training samples already constructed. To increase the stability of training, the optimization strategy of generative adversarial networks (Generative Adversarial Network, GAN) is borrowed to train the proposed network: the super-resolution network and the pedestrian classification network play against each other and are optimized alternately. The super-resolution network takes random samples from the low-resolution sample set as input, and its output should imitate the true samples in the high-resolution sample set as closely as possible. The input of the pedestrian classification network consists of true high-resolution pedestrian image samples and the high-resolution pedestrian image samples generated by the super-resolution network; its purpose is to distinguish the output of the super-resolution network from the true samples as well as possible while also determining the class of the high-resolution image, whereas the super-resolution network tries to deceive the pedestrian classification network as far as possible. The two networks compete and continually adjust their parameters, with the final aim that the pedestrian classification network cannot tell whether the output of the super-resolution network is real. The super-resolution network can then produce clear high-resolution images, and the pedestrian classification network can accurately classify the input image as a pedestrian image or a background image. In this embodiment, the network parameters of the super-resolution network and the pedestrian classification network are trained starting from their initial parameters: convolution kernel parameters (weights) are initialized from a Gaussian distribution with standard deviation 0.02, and biases are initialized to 0. The network parameters of the pedestrian classification network are initialized from a model pre-trained on the ImageNet data set; the two newly added fully connected layers are initialized from a Gaussian distribution with standard deviation 0.1, with biases initialized to 0. In addition, the pedestrian body key point loss function is introduced into the objective function of the super-resolution network so that its output images are clearer, which in turn makes it easier for the pedestrian classification network to determine the class of the input image. When training the whole network, each mini-batch contains 64 images with a 1:1 ratio of positive to negative samples; training runs for 20 rounds in total, with a learning rate of 0.01 for the first 12 rounds and 0.001 for the last 8 rounds.
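The training schedule described above (64 images per mini-batch at a 1:1 ratio, 20 rounds, learning rate 0.01 then 0.001) can be sketched as follows. The function names are hypothetical, the round index is assumed to start at 0, and sampling without replacement within a batch is an assumption; the adversarial update itself is not shown.

```python
import random
from typing import List, Optional, Sequence, Tuple


def learning_rate(round_index: int, total_rounds: int = 20,
                  switch_round: int = 12) -> float:
    """Learning rate for a 0-based round: 0.01 for the first 12, then 0.001."""
    if not 0 <= round_index < total_rounds:
        raise ValueError("round_index outside the training schedule")
    return 0.01 if round_index < switch_round else 0.001


def balanced_batch(positives: Sequence, negatives: Sequence,
                   batch_size: int = 64,
                   rng: Optional[random.Random] = None) -> List[Tuple[object, int]]:
    """Draw a mini-batch with a 1:1 ratio of positive to negative samples."""
    if rng is None:
        rng = random.Random()
    half = batch_size // 2
    # Assumption: samples are drawn without replacement within one batch.
    batch = [(s, 1) for s in rng.sample(list(positives), half)]
    batch += [(s, 0) for s in rng.sample(list(negatives), half)]
    rng.shuffle(batch)
    return batch
```

A training loop would call `learning_rate(r)` once per round and `balanced_batch(...)` once per iteration, alternating updates of the two networks as described above.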

Experimental results: the real scene small pedestrian target detection network based on body key point supervision trained through the above steps is not subject to the limitation of existing pedestrian detection methods, whose detection accuracy is low because they are easily affected by scale, and it can accomplish small pedestrian target detection in real scenes. Experiments show that the "real scene small pedestrian target detection method based on body key point supervision" localizes pedestrians accurately and detects efficiently. Table 2 gives the comparative experimental data, where the trained network is evaluated with the standard evaluation index of the CityPersons data set, FPPI (False Positive Per Image). The comparison shows that the proposed method improves on the accuracy of the current state-of-the-art pedestrian detector ALFNet by 0.77%/0.99%/0.59% on the Reasonable/Small/All image sets respectively. Compared with other recent pedestrian detectors, the detection results of the present invention on the Reasonable/Small/All image sets are also better, reaching the best current results of 11.24/41.07/38.11. In particular, the accuracy of detecting small pedestrian targets with height between 50 and 75 pixels improves by about 1%, which demonstrates the effectiveness of the proposed method for small pedestrian target detection. Fig. 5, Fig. 6, Fig. 7, and Fig. 8 show experimental results: the green boxes are the manually annotated pedestrian true positions (ground-truth bounding boxes), and the red boxes are the detections of the "real scene small pedestrian target detection method based on body key point supervision". The figures show that even when pedestrians are very small, the background is complex, and illumination and other factors interfere, the proposed method finds almost all of the pedestrian targets to be detected.

Table 2: Comparison of experimental results

Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the invention. Any person skilled in the art may make various changes and modifications without departing from the spirit and scope of the present invention; the scope of protection of the present invention shall therefore be as defined by the claims.

Claims

1. A real scene small pedestrian target detection network based on body key point supervision, characterized in that the real scene small pedestrian target detection network based on body key point supervision comprises a super-resolution network, a key point detection network, and a pedestrian classification network:

Key point detection network: for introducing the body key point loss function into the loss function of the super-resolution network, so that the super-resolution network is trained under the supervision of pedestrian body key points;

Pedestrian classification network: for determining whether the input image is a true high-resolution image or a super-resolution image generated by the super-resolution network, thereby realizing adversarial training with the super-resolution network, and for determining whether the input image is a pedestrian image or a background image, thereby completing the detection of small pedestrian targets in real scenes.

2. The real scene small pedestrian target detection network according to claim 1, characterized in that the super-resolution network is a deep learning network comprising convolutional layer one, convolutional layer two, convolutional layer three, convolution block one, convolution block two, convolution block three, convolution block four, convolution block five, deconvolution layer one, and deconvolution layer two; the data output of convolutional layer one is connected to the data input of convolution block one; convolution block one, convolution block two, convolution block three, convolution block four, and convolution block five are connected in sequence, the data output of each to the data input of the next; the data output of convolution block five is connected to the data input of convolutional layer two; the data output of convolutional layer two is connected to the data input of deconvolution layer one; the data output of deconvolution layer one is connected to the data input of deconvolution layer two; the data output of deconvolution layer two is connected to the data input of convolutional layer three; and the data interaction end of convolutional layer three is connected to the supervised training data interaction end of the key point detection network.

3. The real scene small pedestrian target detection network according to claim 1, characterized in that the pedestrian classification network comprises a ResNet-50 backbone network structure and two parallel fully connected layers.

4. A real scene small pedestrian target detection method based on the detection network of claim 1, characterized in that the real scene small pedestrian target detection method includes:

Step 1: train a benchmark pedestrian detector using the training samples in a pedestrian detection database; then use the benchmark pedestrian detector to crop candidate region images from the sample images in the pedestrian detection database; for each generated candidate region image, compute its overlap with the manually annotated true value, and thereby obtain positive samples and negative samples, where a positive sample is a pedestrian candidate image and a negative sample is a background candidate image;

Step 2: take the candidate region images cropped by the benchmark pedestrian detector as high-resolution images, and down-sample each high-resolution image by a factor of 4 using bilinear interpolation to obtain the corresponding low-resolution image;

Step 3: use the key point detection network to supervise the training of the super-resolution network by introducing the body key point loss function into the loss function of the super-resolution network, obtaining the trained super-resolution network;

Step 4: input the low-resolution image into the trained super-resolution network, which generates the high-resolution image corresponding to the low-resolution image, i.e. the super-resolution image;

Step 5: input the true high-resolution images in the pedestrian detection database and the super-resolution images into the pedestrian classification network at the same time, and use the pedestrian classification network to carry out adversarial training with the super-resolution network; at the same time, the pedestrian classification network judges from the super-resolution image whether the input low-resolution image is a pedestrian image or a background image, thereby completing the detection of small pedestrian targets in real scenes.

5. The real scene small pedestrian target detection method according to claim 4, characterized in that the detailed process of obtaining the candidate region images in Step 1 includes:

First step: train a benchmark pedestrian detector using the training samples in the pedestrian detection database, where the pedestrian detection database includes training sample images and test sample images;

Second step: use the benchmark pedestrian detector to predict pedestrian position information for each image in the training sample images, and crop and save from each image the 100 regions most likely to contain pedestrians;

Third step: use the benchmark pedestrian detector to predict pedestrian position information for each image in the test sample images, and crop and save from each image the 100 regions most likely to contain pedestrians; where the regions most likely to contain pedestrians obtained in the second step and the third step are the candidate region images.

6. The real scene small pedestrian target detection method according to claim 4, characterized in that the pedestrian detection database in Step 1 is the CityPersons data set; the CityPersons data set divides all detection targets, according to pedestrian size and degree of occlusion, into four classes: normal, small target, heavily occluded target, and all, where a small target is a pedestrian target whose height is between 50 and 75 pixels.

7. The real scene small pedestrian target detection method according to claim 4, characterized in that the classification standard for the positive samples and negative samples in Step 1 is: a candidate whose overlap is greater than 0.5 is labeled as a positive sample, and a candidate whose overlap is less than 0.35 is labeled as a negative sample.

8. The real scene small pedestrian target detection method according to claim 4, characterized in that the detailed process of obtaining the trained super-resolution network in Step 3 includes:

Step 1: use the key point detection network to predict the body key point information in each candidate region image, and take the predicted body key points as the body key point true values;

Step 3: supervise the training of the super-resolution network with the body key point loss function, obtaining the trained super-resolution network.

9. The real scene small pedestrian target detection method according to claim 8, characterized in that the body key point loss function is:

where W_i(p) is a binary mask taking the value 0 or 1 according to whether the true value of the p-th key point in the i-th training image is present; the remaining term denotes the true value of the body key point confidence map, and K_i(p) denotes the detected body key point confidence map.

10. The real scene small pedestrian target detection method according to claim 8, characterized in that the body key points include the nose, left and right eyes, left and right ears, left and right shoulders, left and right elbow joints, left and right wrist joints, left and right hip joints, left and right knee joints, and left and right ankle joints.