CN108460336A - A pedestrian detection method based on deep learning

Info

Publication number: CN108460336A
Application number: CN201810082310.0A
Authority: CN
Inventors: 孙炜程; 朱松豪; 荆晓远; 代心灵
Original assignee: Nanjing University of Posts and Telecommunications
Current assignee: Nanjing University of Posts and Telecommunications
Priority date: 2018-01-29
Filing date: 2018-01-29
Publication date: 2018-08-28

Abstract

The invention relates to a pedestrian detection method based on deep learning, comprising the following steps: first, the video image to be detected is fed into a feature extraction network to generate feature maps; next, the feature maps generated by the extraction network are fed into a region proposal network, which uses a region proposal method to detect the regions most likely to contain pedestrians and generates pedestrian candidates with their corresponding scores; finally, a trained decision tree algorithm determines whether each pedestrian candidate is a real pedestrian. The advantages of the invention are that the computation is simple and fast and that the accuracy of pedestrian detection is significantly improved.

Description

A pedestrian detection method based on deep learning

Technical field

The invention relates to a pedestrian detection method, in particular a pedestrian detection method based on deep learning, and belongs to the technical field of image processing.

Background art

Pedestrian detection is an important research topic in computer vision. Its purpose is to accurately identify and locate pedestrians in images or video sequences. At present, pedestrian detection is widely used in driver assistance systems, intelligent video surveillance, and intelligent transportation.

Traditional pedestrian detection methods, also called hand-designed models, represent pedestrians with low-level features such as HOG, Haar, LBP, LUV, ICF, Squares ChnFtrs, and LDCF features. Most of them use a support vector machine or a decision tree as the classifier. However, traditional methods require complex hand-engineered features and a great deal of domain expertise, and they have certain limitations in robustness.

With the development of deep learning, pedestrian detection methods based on deep learning have achieved great success when computational cost is not a concern. In general, these methods fall into two classes: methods based on region proposals, such as R-CNN, SPP-Net, Faster R-CNN, and R-FCN; and methods not based on region proposals, such as YOLO and SSD. Although the latter have some advantage in computing speed, they cannot reach very high precision. Therefore, most deep-learning-based pedestrian detection methods generate pedestrian candidates with a region proposal strategy.

In addition, with the wide application of deep learning in pedestrian detection, convolutional neural networks such as AlexNet, VGG, ZF, R-CNN, Fast R-CNN, Faster R-CNN, MS-CNN, and R-FCN have been widely used for the task. In the R-CNN, Fast R-CNN, and Faster R-CNN series of methods, the region proposal strategy is used to improve detection accuracy and computing speed. MS-CNN uses a multi-scale region proposal network to improve the accuracy of detecting small objects. R-FCN combines a fully convolutional network with a region proposal network to perform pedestrian detection. Compared with Faster R-CNN, R-FCN greatly increases computing speed and slightly improves the accuracy of pedestrian detection. Although object detection based on deep learning is developing rapidly, pedestrian detection methods still have considerable room for improvement in both accuracy and speed.

Summary of the invention

The object of the invention is to overcome the defects of the prior art by proposing a new pedestrian detection method based on deep learning that improves the accuracy and speed of pedestrian detection.

To achieve the above object, the invention provides a pedestrian detection method based on deep learning, comprising the following steps:

Step 1: feed the video image to be detected into a feature extraction network to generate feature maps;

Step 2: feed the feature maps generated by the extraction network into a region proposal network, then use a region proposal method to detect the regions most likely to contain pedestrians, generating pedestrian candidates and their corresponding scores;

Step 3: use a trained decision tree algorithm to determine whether each pedestrian candidate is a real pedestrian.

The invention first feeds the video image into the designed pedestrian detection model, then generates feature maps with the deep-learning-based PVANet network, next uses the region proposal network to generate pedestrian candidates and their corresponding scores, and finally classifies the generated candidates with a trained decision tree algorithm to find the real pedestrians.
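The three stages above can be sketched as a simple composition of functions. This is only an illustrative skeleton: `extract_features`, `propose_regions`, and `is_pedestrian` are hypothetical stand-ins for the PVANet extractor, the region proposal network, and the trained decision tree classifier, none of which is implemented here.

```python
from typing import Callable, List, Tuple

# A candidate box: (center_x, center_y, width, height)
Box = Tuple[float, float, float, float]


def detect_pedestrians(
    frame,
    extract_features: Callable,  # hypothetical stand-in for the PVANet extractor
    propose_regions: Callable,   # hypothetical stand-in for the region proposal network
    is_pedestrian: Callable,     # hypothetical stand-in for the trained decision tree classifier
) -> List[Box]:
    """Run the three stages in order and keep the accepted candidates."""
    feature_maps = extract_features(frame)       # step 1: feature maps
    candidates = propose_regions(feature_maps)   # step 2: list of (box, score) pairs
    # step 3: the classifier decides which candidates are real pedestrians
    return [box for box, score in candidates if is_pedestrian(box, score)]
```

Passing the stages in as callables keeps the sketch independent of any particular network or classifier implementation.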

Preferably, the extraction network is a PVANet network with 14 layers: the first three layers are convolutional layers, the middle consists of two groups of Inception layers, each group containing four structurally identical Inception layers, and the last three layers are fully connected layers. The output of the fully connected layers is the input of the region proposal network and the decision tree classifier.

Further preferably, all Inception layers have the same structure. A single Inception layer consists of a first, a second, and a third branch: the first branch consists of one 1 × 1 convolutional layer, the second branch consists of one 1 × 1 convolutional layer and one 3 × 3 convolutional layer, and the third branch consists of one 1 × 1 convolutional layer and two 3 × 3 convolutional layers.

Still further preferably, a single Inception layer generates feature maps as follows: the feature maps generated by the previous layer are passed separately into the first, second, and third branches of the Inception layer; the feature maps output by the three branches are then passed to a concatenation layer and finally enter the next layer as its input feature maps. In this way the Inception layers capture small-scale targets more precisely. The previous layer is either a preceding convolutional layer or another Inception layer, and the next layer is either another Inception layer or a subsequent fully connected layer.

Preferably, in the region proposal network, a sliding window of size m*m is applied to each input feature map generated by the PVANet network to produce multiple fully connected features. Each fully connected feature has two branches: one is the scs layer and the other is the cds layer. A single sliding window can simultaneously predict region proposals of different scales and different aspect ratios. For example, when a sliding window predicts region proposals at four scales and four aspect ratios, it produces 4*4 region proposals; that is, the sliding window generates 4*4*4 outputs at the cds layer and 2*4*4 outputs at the scs layer. The cds layer generates the pedestrian candidates (predicted targets), namely the coordinates of each candidate's center point together with its width and height. The scs layer generates the score of each pedestrian candidate, namely the estimated probability that the predicted target is a pedestrian and the probability that it is not. The pedestrian candidates generated by the cds layer and the corresponding scores generated by the scs layer are passed to the decision tree classifier for training and detection.
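The output counts in the example above follow from simple arithmetic: each region proposal needs four numbers at the cds layer (center x, center y, width, height) and two at the scs layer (pedestrian and non-pedestrian probability). The sketch below reproduces that count and also shows one common way to enumerate proposal shapes; the scale and ratio values and both function names are illustrative assumptions, not values from the text.

```python
import math


def rpn_output_sizes(num_scales: int, num_ratios: int) -> dict:
    """Count proposals and per-window outputs of the cds and scs layers."""
    proposals = num_scales * num_ratios
    return {
        "proposals": proposals,
        "cds_outputs": 4 * proposals,  # (cx, cy, w, h) per proposal
        "scs_outputs": 2 * proposals,  # (pedestrian, not pedestrian) per proposal
    }


def anchor_shapes(scales, ratios):
    """Return (width, height) pairs; ratio is height / width, area is scale**2."""
    return [(s / math.sqrt(r), s * math.sqrt(r)) for s in scales for r in ratios]


print(rpn_output_sizes(4, 4))
# {'proposals': 16, 'cds_outputs': 64, 'scs_outputs': 32}

# Hypothetical scales (pixels) and aspect ratios, chosen only for illustration.
shapes = anchor_shapes([32, 64, 128, 256], [1.0, 2.0, 3.0, 4.0])
print(len(shapes))  # 16
```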

To reduce computational cost without affecting pedestrian detection accuracy, during training the convolutional features generated by the PVANet network are used as the input of both the region proposal network and the detection network.

The decision tree of the invention uses a tree structure in which each non-leaf node represents a test on a feature attribute, each branch represents an outcome of that test, and each leaf node represents a class. To make a decision, the feature attributes of the object to be classified are tested starting at the root node, the branch corresponding to the test result is followed, and the process repeats until a leaf node is reached. The class of that leaf node is the predicted class of the object.
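The root-to-leaf procedure just described can be sketched in a few lines. The `Node` layout, the threshold tests, and the toy tree below are illustrative assumptions; the text does not specify which attributes or thresholds the trained tree uses.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    label: Optional[str] = None    # set only on leaf nodes
    feature: int = 0               # index of the attribute tested at this node
    threshold: float = 0.0         # test: x[feature] <= threshold
    left: Optional["Node"] = None  # branch taken when the test is true
    right: Optional["Node"] = None # branch taken when the test is false


def predict(node: Node, x: Sequence[float]) -> str:
    """Walk from the root to a leaf and return the class of that leaf."""
    while node.label is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


# Hypothetical two-level tree over two made-up attributes.
tree = Node(
    feature=0, threshold=0.5,
    left=Node(label="background"),
    right=Node(feature=1, threshold=3.0,
               left=Node(label="pedestrian"),
               right=Node(label="background")),
)
print(predict(tree, [0.9, 2.4]))  # pedestrian
```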

Although decision tree algorithms have many variants, such as ID3, C5.0, and CART, their basic idea is the same, and classification with decision tree algorithms is highly accurate. The basic idea of the decision tree classifier is to train multiple weak classifiers on the same training set and then combine them into a final strong classifier. Each weak classifier has a weight parameter β, namely the proportion of samples that the classifier classifies correctly. A threshold therefore needs to be set to decide whether a sample has been correctly classified. The decision tree is trained over multiple iterations; if a weak classifier has low classification accuracy in an iteration, which means that it performs poorly, its weight is reduced.

Specifically, the decision tree is trained with the RealBoost algorithm as follows:

1. Given a training set,

(x₁, y₁), ..., (x_i, y_i), ..., (x_N, y_N)

where y_i is the feature vector and i = 1, ..., N;

2. In the initial stage, number the weak classifiers with index j and determine the weight of each weak classifier according to formula (1),

（1）

where W_j is the weight of the weak classifier and H is the number of weak classifiers;

3. Train the weak classifiers N times to obtain training data, number the training rounds with index n, and then obtain a probability estimate according to formula (2),

（2）

where P_n(y) is the probability estimate of the weak classifier and N is the number of training rounds of the weak classifier;

4. Compute the real-valued output of the weak classifier according to formula (3),

（3）

where f_n(y) is the real-valued output of the weak classifier and R is the set of real numbers;

5. During training, obtain the weight of the weak classifier according to formula (4),

（4）

6. After each iteration, renormalize the weights of all weak classifiers so that their total equals 1; the final strong classifier is obtained according to formula (5),

（5）

where N is the number of training rounds of the weak classifier.
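As a point of reference, the textbook Real AdaBoost procedure follows the same sequence of steps. The sketch below is that standard formulation, not necessarily formulas (1) to (5) themselves. It assumes that x_i ∈ {−1, +1} is the class label paired with the feature vector y_i, that the weights w_i attach to the N training samples (the steps above describe weights on weak classifiers), and that M, a symbol introduced here, is the number of boosting rounds.

```latex
\begin{align*}
w_i^{(1)} &= \frac{1}{N}, \qquad i = 1, \dots, N
  && \text{(uniform initial weights)} \\
P_n(y) &= \hat{P}_{w^{(n)}}\left(x = +1 \mid y\right) \in [0, 1]
  && \text{(weighted class-probability estimate)} \\
f_n(y) &= \frac{1}{2} \ln \frac{P_n(y)}{1 - P_n(y)} \in \mathbb{R}
  && \text{(real-valued weak output)} \\
w_i^{(n+1)} &= \frac{w_i^{(n)} \exp\left(-x_i f_n(y_i)\right)}
                    {\sum_{k=1}^{N} w_k^{(n)} \exp\left(-x_k f_n(y_k)\right)}
  && \text{(update, normalized to sum to 1)} \\
F(y) &= \operatorname{sign}\left(\sum_{n=1}^{M} f_n(y)\right)
  && \text{(strong classifier)}
\end{align*}
```

In this formulation a sample that a weak classifier gets wrong (x_i f_n(y_i) < 0) gains weight, so later rounds concentrate on the hard samples.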

Real pedestrians are determined as follows: the generated pedestrian candidates are first classified with the trained decision tree algorithm, and a threshold is preset for the pedestrian candidate class; when the probability that a pedestrian candidate in the feature map is a pedestrian is less than the preset threshold, the candidate is classified as a real pedestrian; otherwise it is not classified as a real pedestrian.

The advantages of the invention are that the computation is simple and fast and that the accuracy of pedestrian detection is significantly improved.

Description of the drawings

The invention is further described below with reference to the drawings.

Fig. 1 is the network model diagram of the invention.

Fig. 2 is the structural diagram of the PVANet network in the invention.

Fig. 3 is the structural diagram of a PVANet Inception layer in the invention.

Fig. 4 is the structural diagram of the region proposal network in the invention.

Fig. 5(a) shows example training samples from the Caltech pedestrian detection data set.

Fig. 5(b) shows example test samples from the Caltech pedestrian detection data set.

Fig. 6(a) shows example training samples from the INRIA pedestrian detection data set.

Fig. 6(b) shows example test samples from the INRIA pedestrian detection data set.

Fig. 7(a) shows example samples on which the pedestrian detection model of the invention is tested on the Caltech pedestrian detection data set.

Fig. 7(b) shows example results of the pedestrian detection model of the invention on the Caltech pedestrian detection data set.

Fig. 8(a) shows example samples on which the pedestrian detection model of the invention is tested on the INRIA pedestrian detection data set.

Fig. 8(b) shows example results of the pedestrian detection model of the invention on the INRIA pedestrian detection data set.

Detailed description of the embodiments

Embodiment 1

This embodiment provides a pedestrian detection method based on deep learning. The design idea is as follows: the video image is fed into the designed pedestrian detection model (see Fig. 1 for the model structure); the deep-learning-based PVANet network generates feature maps; the feature maps are fed into the region proposal network, which uses a region proposal method to generate pedestrian candidates and their corresponding scores; finally, a trained decision tree algorithm classifies the generated candidates to find the real pedestrians.

The PVANet network has 14 layers in total: the first three are convolutional layers, the middle consists of two groups of Inception layers with four structurally identical Inception layers per group, and the last three are fully connected layers (see Table 1). As shown in Fig. 2, after a video image enters the PVANet network, it passes in turn through the three convolutional layers, the eight Inception layers, and the three fully connected layers to produce the output feature maps. The fully connected layers pass the feature maps on to the region proposal network and the decision tree classifier, so the output of the fully connected layers is the input of both. All Inception layers have the same structure.

Table 1. Structure of the optimized PVANet network

Layer	Parameters
Convolutional layer	4*4_32
Convolutional layer	3*3_32
Convolutional layer	3*3_32
First group of Inception layers	1*1_96 - 1*1_16_3*3_64 - 1*1_16_3*3_32_3*3_32
First group of Inception layers	1*1_96 - 1*1_16_3*3_64 - 1*1_16_3*3_32_3*3_32
First group of Inception layers	1*1_96 - 1*1_16_3*3_64 - 1*1_16_3*3_32_3*3_32
First group of Inception layers	1*1_96 - 1*1_16_3*3_64 - 1*1_16_3*3_32_3*3_32
Second group of Inception layers	1*1_128 - 1*1_32_3*3_96 - 1*1_16_3*3_32_3*3_32
Second group of Inception layers	1*1_128 - 1*1_32_3*3_96 - 1*1_16_3*3_32_3*3_32
Second group of Inception layers	1*1_128 - 1*1_32_3*3_96 - 1*1_16_3*3_32_3*3_32
Second group of Inception layers	1*1_128 - 1*1_32_3*3_96 - 1*1_16_3*3_32_3*3_32
Fully connected layer	4096
Fully connected layer	4096
Fully connected layer	1000

Table 1 lists the structure of the optimized PVANet network. Throughout the network, l*l_M means that the layer has an l*l convolution kernel and outputs M feature maps. Within an Inception layer, "-" separates the different branches. The parameter of each fully connected layer is the number of neurons it contains.

As shown in Fig. 3, a single Inception layer consists of a first, a second, and a third branch: the first branch consists of one 1 × 1 convolutional layer, the second branch consists of one 1 × 1 convolutional layer and one 3 × 3 convolutional layer, and the third branch consists of one 1 × 1 convolutional layer and two 3 × 3 convolutional layers. The feature maps generated by the previous layer are passed separately into the three branches of the Inception layer; the feature maps output by the three branches are then passed to a concatenation layer, whose output finally enters the next layer as its input feature maps. The previous layer can be another Inception layer or a preceding convolutional layer, and the next layer can be another Inception layer or a subsequent fully connected layer. The convolutional features generated by the PVANet network are used as the input of the region proposal network and the detection network.
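To make the concatenation step concrete, the sketch below counts how many feature maps one Inception layer outputs, using the branch parameters listed in Table 1. It assumes that the concatenation layer stacks the branch outputs along the channel axis, so the output count is the sum of each branch's final layer; `GROUP_1`, `GROUP_2`, and `concat_output_maps` are names introduced for illustration.

```python
# Each branch is a list of (kernel_size, output_maps) pairs read from Table 1.
GROUP_1 = [
    [(1, 96)],                    # branch 1: 1*1_96
    [(1, 16), (3, 64)],           # branch 2: 1*1_16_3*3_64
    [(1, 16), (3, 32), (3, 32)],  # branch 3: 1*1_16_3*3_32_3*3_32
]
GROUP_2 = [
    [(1, 128)],                   # branch 1: 1*1_128
    [(1, 32), (3, 96)],           # branch 2: 1*1_32_3*3_96
    [(1, 16), (3, 32), (3, 32)],  # branch 3: 1*1_16_3*3_32_3*3_32
]


def concat_output_maps(branches):
    """Sum the output maps of the last convolution in every branch."""
    return sum(branch[-1][1] for branch in branches)


print(concat_output_maps(GROUP_1))  # 192
print(concat_output_maps(GROUP_2))  # 256
```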

As shown in Fig. 4, in the region proposal network, a sliding window of size m*m is applied to each input feature map generated by the PVANet network to produce multiple fully connected features. Each fully connected feature has two branches: one is the scs layer and the other is the cds layer. A single sliding window can simultaneously predict region proposals of different scales and different aspect ratios. The cds layer generates the coordinates of the center point of each predicted target together with its width and height. The scs layer generates the score of each predicted target, namely the estimated probability that the target is a pedestrian and the probability that it is not. The outputs of the cds layer and the scs layer are finally passed to the decision tree classifier for training and detection.

In the decision tree classifier, the decision tree is trained as follows. A training set (x₁, y₁), ..., (x_i, y_i), ..., (x_N, y_N) is given, where y_i is the feature vector and i = 1, ..., N. In the initial stage, the weak classifiers are numbered with index j and the weight of each weak classifier is determined according to formula (1), where W_j is the weight of the weak classifier and H is the number of weak classifiers. During training, the weak classifiers are first trained N times to obtain training data, the training rounds having been numbered with index n before training begins. A probability estimate is then obtained according to formula (2), where P_n(y) is the probability estimate of the weak classifier and N is the number of training rounds of the weak classifier. Next, the real-valued output of the weak classifier is computed according to formula (3), where f_n(y) is the real-valued output of the weak classifier and R is the set of real numbers. Finally, the weight of the weak classifier is obtained according to formula (4). After each iteration, the weights of all weak classifiers are renormalized so that their total equals 1, and the strong classifier is finally obtained according to formula (5).

After the decision tree has been trained, the generated pedestrian candidates are classified with the trained decision tree algorithm, and a threshold is preset for the pedestrian candidate class; when the probability that a pedestrian candidate in the feature map is a pedestrian is less than the preset threshold, the candidate is classified as a real pedestrian; otherwise it is not classified as a real pedestrian.

The deep-learning-based pedestrian detection model of this embodiment was tested to evaluate its performance. The evaluation proceeds as follows:

Step A: introduce the data sets used to evaluate the algorithm.

The experiments use two pedestrian detection data sets: the Caltech pedestrian detection data set and the INRIA pedestrian detection data set. The Caltech data set is currently one of the larger pedestrian data sets. It comes from about ten hours of video shot with a vehicle-mounted camera, in which 250000 frames are annotated; the whole video contains 350000 bounding boxes and 2300 different pedestrians, and the occlusion relationships between the bounding boxes are also annotated. The data set comprises 11 short videos taken from the whole video, each about 1 GB in size. The first six videos are annotated; they contain 192000 pedestrians, 6100 positive samples, and 61000 negative samples in total and are used to train the pedestrian detection network. The last five videos have no corresponding annotation; they contain 155000 pedestrians, 56000 positive samples, and 65000 negative samples in total and are used to test the pedestrian detection method. Fig. 5(a) and Fig. 5(b) give example pictures from the Caltech data set: Fig. 5(a) shows some of its training images and Fig. 5(b) shows some of its test images. As both figures show, the video frames in the Caltech data set are very blurry, so running experiments on this data set is a challenging task.

The INRIA pedestrian detection data set is the most commonly used static pedestrian detection database and contains the original pictures with their corresponding annotations. It provides two kinds of training and test samples: one in which the pictures have different resolutions and one in which they have the same resolution. The experiments of this embodiment use the samples with different resolutions; the training set contains 614 positive samples and 1218 negative samples, and the test set contains 288 positive samples and 453 negative samples. Fig. 6(a) and Fig. 6(b) give example pictures from the INRIA data set: Fig. 6(a) shows some of its training pictures, comprising 6 positive and 6 negative samples, and Fig. 6(b) shows some of its test pictures, also comprising 6 positive and 6 negative samples. As both figures show, the images in the INRIA data set are relatively sharp.

Step B: present the deep-learning-based pedestrian detection method used by the pedestrian detection model of this embodiment.

First, the optimized PVANet network (whose structure is shown in Table 1) extracts feature maps of moving pedestrians. A video image passed through the PVANet network yields 512 feature maps, of which the first 128 are used by the region proposal network to generate pedestrian candidates and their corresponding scores.

Second, the region proposal network generates pedestrian candidates and their corresponding scores. Using five scales and five aspect ratios, the region proposal network generates 25 region proposals for each sliding window. For every frame, only the 200 highest-scoring region proposals are fed into the decision tree classifier to train the decision tree.
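Keeping only the highest-scoring proposals per frame is a plain top-k selection. A minimal sketch follows, assuming each proposal is a `(box, score)` pair; the function name `top_proposals` and the sample data are hypothetical.

```python
import heapq


def top_proposals(proposals, k=200):
    """Return the k highest-scoring (box, score) pairs, best first."""
    return heapq.nlargest(k, proposals, key=lambda p: p[1])


# Toy frame with five proposals; keep the best two for illustration.
frame_proposals = [
    ((10, 10, 4, 8), 0.30),
    ((20, 12, 5, 9), 0.95),
    ((31, 40, 3, 7), 0.10),
    ((44, 18, 6, 12), 0.80),
    ((52, 25, 4, 10), 0.55),
]
print([score for _, score in top_proposals(frame_proposals, k=2)])  # [0.95, 0.8]
```

`heapq.nlargest` avoids sorting the full list, which matters little for a few thousand proposals but keeps the intent explicit.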

Finally, the decision tree classifier is trained, and the trained decision tree algorithm classifies the generated pedestrian candidates to find the real pedestrians. The purpose of training is to obtain a compact decision tree that classifies pedestrians well. At the very start of training, the training set consists of all positive samples, randomly sampled negative samples equal in number to the positive samples, and a certain proportion of hard-to-classify samples. The whole training is divided into six stages, and the threshold of the decision tree classifier is set to 0.7. The first stage uses 64 trees, and the number of trees doubles in each later stage, while a certain proportion of hard-to-classify negative samples is added to the old training set to form a new training set. The result is a decision tree classifier with 2048 trees, which is the strong classifier finally used.
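The stage schedule can be checked with a one-line computation: starting from 64 trees and doubling at each of the six stages ends at 2048 trees. The function name `trees_per_stage` is introduced here for illustration.

```python
def trees_per_stage(first_stage=64, num_stages=6):
    """Number of trees used in each training stage, doubling every stage."""
    return [first_stage * 2 ** stage for stage in range(num_stages)]


print(trees_per_stage())  # [64, 128, 256, 512, 1024, 2048]
```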

Step C: examine the performance of the algorithm proposed in this embodiment through comparative experiments.

In the experiments, the threshold of the decision tree classifier is set to 0.7 and the pedestrian detection model of this embodiment is tested. Its results are then compared, in terms of time performance and miss rate, with several state-of-the-art methods: CompACT-Deep, CCF, and LDCF. The comparison is shown in Table 2. It shows that the method of this embodiment both processes a single frame faster than these state-of-the-art methods and has a lower miss rate.

Table 2. Comparison of time performance and miss rate

Method	Time per picture (s)	Miss rate (%)
PVANet+RPN+BF [method of this embodiment]	0.48	9
CompACT-Deep	0.5	12
LDCF	0.6	25
CCF	13	17

In addition, Fig. 7(a) and Fig. 7(b) show example pictures of pedestrian detection on the Caltech pedestrian detection data set. Fig. 7(a) shows original images from the Caltech data set, and Fig. 7(b) shows the corresponding detection results of the pedestrian detection model of this embodiment. As the two figures show, although the pictures in the Caltech data set are very blurry, the model of this embodiment still obtains good pedestrian detection results.

The pedestrian detection model proposed in this embodiment was trained on the Caltech pedestrian detection data set and tested on the INRIA data set to verify its validity. Fig. 8(a) and Fig. 8(b) show example pictures of pedestrian detection on the INRIA pedestrian detection data set. Fig. 8(a) shows original images from the INRIA data set, and Fig. 8(b) shows the corresponding detection results of the pedestrian detection model of this embodiment. As the two figures show, although the occlusion problem is not solved, the model of this embodiment obtains good pedestrian detection results on the INRIA data set.

In short, the experiments on the INRIA and Caltech pedestrian detection data sets show that the scheme proposed in this embodiment is effective and significantly improves the accuracy and speed of pedestrian detection.

Besides the above embodiment, the invention can have other embodiments. All technical solutions formed by equivalent substitution or equivalent transformation fall within the scope of protection claimed by the invention.

Claims

1. A pedestrian detection method based on deep learning, characterized by comprising the following steps:

2. The pedestrian detection method based on deep learning according to claim 1, characterized in that: the extraction network is a PVANet network with 14 layers, in which the first three layers are convolutional layers, the middle consists of two groups of Inception layers, each group containing four structurally identical Inception layers, and the last three layers are fully connected layers; the output of the fully connected layers is the input of the region proposal network and the decision tree classifier.

3. The pedestrian detection method based on deep learning according to claim 2, characterized in that: a single Inception layer consists of a first, a second, and a third branch; the first branch consists of one 1 × 1 convolutional layer, the second branch consists of one 1 × 1 convolutional layer and one 3 × 3 convolutional layer, and the third branch consists of one 1 × 1 convolutional layer and two 3 × 3 convolutional layers.

4. The pedestrian detection method based on deep learning according to claim 3, characterized in that a single Inception layer generates feature maps as follows: the feature maps generated by the previous layer are passed separately into the three branches of the Inception layer; the feature maps output by the three branches are then passed to a concatenation layer and finally enter the next layer as its input feature maps.

5. The pedestrian detection method based on deep learning according to claim 4, characterized in that: in the region proposal network, a sliding window is applied to each input feature map generated by the PVANet network to produce multiple fully connected features; each fully connected feature has two branches, one being the scs layer and the other the cds layer; the cds layer generates the pedestrian candidates, namely the coordinates of each candidate's center point together with its width and height; the scs layer generates the corresponding score of each pedestrian candidate; the pedestrian candidates generated by the cds layer and the corresponding scores generated by the scs layer are passed to the decision tree classifier for training and detection.

6. The pedestrian detection method based on deep learning according to claim 5, characterized in that: a single sliding window can simultaneously predict region proposals of different scales and different aspect ratios.

7. The pedestrian detection method based on deep learning according to claim 6, characterized in that: when the sliding window predicts region proposals at four scales and four aspect ratios, it produces 4*4 region proposals.

8. The pedestrian detection method based on deep learning according to claim 7, characterized in that: the sliding window generates 4*4*4 outputs at the cds layer and 2*4*4 outputs at the scs layer.

9. The pedestrian detection method based on deep learning according to claim 8, characterized in that the decision tree is trained with the RealBoost algorithm as follows:

1. Given a training set,

(x₁, y₁), ..., (x_i, y_i), ..., (x_N, y_N)

where y_i is the feature vector and i = 1, ..., N;

（1）

（2）

（3）

（4）

（5）

where N is the number of training rounds of the weak classifier.

10. The pedestrian detection method based on deep learning according to claim 9, characterized in that real pedestrians are determined as follows: the generated pedestrian candidates are first classified with the trained decision tree algorithm, and a threshold is preset for the pedestrian candidate class; when the probability that a pedestrian candidate in the feature map is a pedestrian is less than the preset threshold, the candidate is classified as a pedestrian; otherwise it is not classified as a pedestrian.