CN108038409A - Pedestrian detection method - Google Patents

Pedestrian detection method

Info

Publication number
CN108038409A
CN108038409A
Authority
CN
China
Prior art keywords
image
scale
pedestrian
feature map
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711030102.8A
Other languages
Chinese (zh)
Other versions
CN108038409B (en)
Inventor
章东平
胡葵
王都洋
张香伟
杨力
肖刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi Gao Chuan Security Service Technology Co Ltd
Original Assignee
Jiangxi Gao Chuan Security Service Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi Gao Chuan Security Service Technology Co Ltd filed Critical Jiangxi Gao Chuan Security Service Technology Co Ltd
Priority to CN201711030102.8A priority Critical patent/CN108038409B/en
Publication of CN108038409A publication Critical patent/CN108038409A/en
Application granted granted Critical
Publication of CN108038409B publication Critical patent/CN108038409B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Abstract

The invention discloses a pedestrian detection method based on a convolutional neural network. Multiple convolution and pooling operations are applied to the input image to extract the features of the original image and obtain its feature map. The feature maps corresponding to scaled versions of the original image are then approximated by the image feature pyramid rule. Each feature map passes through a region proposal network (RPN) to generate candidate windows, which are further selected and pooled according to the distribution of pedestrian sizes. Using labeled training data, a classifier network is trained so that pedestrian targets of different scales carry corresponding weights on images of different scales. The confidence that each pooled candidate window obtains from the classifier is compared with a set threshold to make the final pedestrian detection decision. The application of the image feature pyramid avoids the heavy computation of rescaling the image and recomputing the feature maps, and detecting with different weights on different feature maps effectively avoids the false positives and missed detections of single-feature-map detection.

Description

Pedestrian detection method
Technical Field
The invention relates to a pedestrian detection method, and belongs to the field of target detection.
Background
In recent years, pedestrian detection technology has been widely applied in fields such as intelligent surveillance, autonomous driving, and robot vision. In practical applications, pedestrian detection is very challenging because pedestrians captured in video vary in size, clothing, and posture. There are two main approaches to pedestrian detection: the traditional approach based on sliding windows, and the approach based on deep learning and feature extraction. Traditional methods are computationally heavy and, without GPU resources, limited in detection speed. As computer performance has continuously improved and GPU computing has become available, most deep learning methods based on learned features now exceed traditional methods in detection speed, but the multi-scale pedestrian problem remains difficult to solve.
Disclosure of Invention
In order to solve the problems that speed and detection accuracy are difficult to balance in pedestrian detection and that pedestrians appear at multiple scales, the invention provides a pedestrian detection method, which comprises the following steps:
Step (1) determining a current frame image: taking a picture in the test set, or a frame to be processed in a video sequence, as the current frame image;
Step (2) obtaining a feature map: passing the current frame image through a number of convolution layers and pooling layers, and obtaining a feature map from the last convolution layer;
Step (3) feature map expansion: calculating the feature maps corresponding to adjacent scales of the image through the image feature pyramid rule, expanding in turn N small-scale and N large-scale feature maps (the number of expansions N and the expansion factors are not limited), so that 2N + 1 feature maps are obtained;
Step (4) proposal window allocation: generating candidate windows from each feature map through a region proposal network (RPN), and further selecting the candidate windows according to the pedestrian size distribution;
Step (5) classification network training: training a deep neural network using the distribution of pedestrians of various scales across the different feature maps;
Step (6) pedestrian detection and labeling: pooling, in proportion, the proposal windows from the obtained multi-scale feature maps, classifying them with the classifier trained in step (5), and framing the pedestrians after non-maximum suppression.
Further, the step (1) is specifically as follows: a picture in the test set, or a frame to be processed in a video sequence, is taken as the current frame image and denoted $I_1$.
Further, the step (2) is specifically as follows: the current frame image is passed through multiple convolution layers and pooling layers, which may be interleaved in any order; the last convolution layer yields a feature map, denoted $f_1$.
Further, the step (3) is specifically as follows: the feature maps corresponding to scales adjacent to image $I_1$ are calculated through the image power-law rule and the image feature pyramid rule. Ordinarily one would use $f_M = C_p(S(I_1, M))$, where $I_1$ is the original image, $M$ is the zoom factor, $S$ denotes scaling the image, and $C_p$ denotes computing features by the convolution-pooling operation. To reduce convolution operations and increase running speed, the following formula is used instead:

$$f_{m'} = S(f_m, m'/m) \cdot (m'/m)^{-\alpha} \qquad (2.1)$$
where the parameter $m$ denotes the current scale, $m'$ the target scale, $S$ rescaling the feature map by a factor of $m'/m$, and $f$ the feature map; the constant coefficient $\alpha$ can be measured experimentally on the training set. The formula states that only the original image $I_m$ has its features computed by the convolution-pooling operation, while the image features at nearby zoom scales are obtained from the known feature map. The feature maps corresponding to, for example, $1/2 \times I_1$ and $2 \times I_1$ are calculated (the scales of the expanded pictures and the number of expansions are not limited; these two scales are chosen as a convenient trade-off between detection speed and expressiveness). Because the pyramid rule computes each adjacent scale by multiplying by a fixed factor (e.g. $2^{-1/4}$), iterating the calculation four times yields the corresponding feature map $f_{1/2}$. Because image upsampling incurs no high-frequency loss, the information content of an upsampled picture is similar to that of the low-resolution original, and the feature calculation formula is:
$$f_\sigma = \sigma \cdot S(f_1, \sigma) \qquad (2.2)$$
where $f_1$ is the feature map of the original image, $S$ denotes magnifying the feature map $f_1$ by a factor of $\sigma$, and $f_\sigma$ is the up-sampled feature map.
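As a concrete illustration of formulas (2.1) and (2.2), the following is a minimal Python sketch of the feature-map approximation, assuming a feature map already computed by the convolution-pooling layers; the function names, the value of α (the patent measures it experimentally on the training set), and the per-step factor $2^{1/4}$ are illustrative assumptions rather than the patent's reference implementation.

```python
import numpy as np
from scipy.ndimage import zoom  # order-1 spline resampling stands in for S

ALPHA = 0.1  # assumed value; the patent measures alpha on the training set

def approx_feature_map(f_m: np.ndarray, m: float, m_prime: float,
                       alpha: float = ALPHA) -> np.ndarray:
    """Approximate the feature map at scale m' from the known map at scale m:
    f_{m'} = S(f_m, m'/m) * (m'/m)**(-alpha)   -- formula (2.1).
    f_m has shape (channels, height, width)."""
    ratio = m_prime / m
    resized = zoom(f_m, (1.0, ratio, ratio), order=1)  # S: rescale by m'/m
    return resized * ratio ** (-alpha)

def build_feature_pyramid(f_1: np.ndarray, n_steps: int = 4) -> dict:
    """Iterate the adjacent-scale rule with factor 2**(1/4), four times in
    each direction, to reach f_{1/2} and f_2 without re-running convolutions."""
    pyramid = {1.0: f_1}
    scale, f = 1.0, f_1
    for _ in range(n_steps):               # shrink toward f_{1/2}
        new_scale = scale * 2 ** (-0.25)
        f = approx_feature_map(f, scale, new_scale)
        scale = new_scale
    pyramid[0.5] = f
    scale, f = 1.0, f_1
    for _ in range(n_steps):               # enlarge toward f_2
        new_scale = scale * 2 ** 0.25
        f = approx_feature_map(f, scale, new_scale)
        scale = new_scale
    pyramid[2.0] = f
    return pyramid
```

Under these assumptions, `build_feature_pyramid` would be called once per frame on the feature map from the last convolution layer, yielding the 2N + 1 maps of step (3) for N = 1.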
Further, the step (4) is specifically as follows: because the RPN has a single receptive field (the region of the input image to which a given node on the output feature map responds), large targets tend to be detected on the feature map of a reduced-scale image, and small targets on the feature map of an enlarged-scale image. The pedestrian targets in the image are therefore divided into scales, and an experiment is carried out on the KITTI dataset, which contains pedestrians at multiple scales. Pedestrians in the dataset are binned by height as height $< H_1$, $H_1 \le$ height $< H_2$, ..., $H_{n-1} \le$ height $< H_n$, height $\ge H_n$, where $H_1$ to $H_n$ are pixel heights ordered from small to large and the numbers of pedestrians at the different scales are $A_1, A_2, \ldots, A_n$. Then, on each feature map, the pedestrian candidate frames of each scale are selected according to the proportional distribution of the candidate frames in that feature map; the numbers $T_{uv}$ selected in turn are:

$$T_{uv} = Z_u \cdot \frac{A_{uv}}{\sum_{v'=1}^{n} A_{uv'}} \qquad (2.3)$$

where $T_{uv}$ is the number of candidate windows of the $v$-th pedestrian scale to be finally extracted on the $u$-th feature map, $Z_u$ ($1 \le u \le 2N+1$) is the total number of candidate windows to be extracted on the $u$-th feature map (chosen according to the dataset; the same or different totals may be used for each feature map), and $A_{uv}$ is the number of pedestrians of the $v$-th scale on the $u$-th feature map. Because the proposal-window network has a single receptive field, extracting candidate windows in proportion to the targets of different scales helps the network exploit its detection advantages on the different feature maps.
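The allocation of formula (2.3) can be sketched as follows, assuming the per-scale pedestrian counts $A_{uv}$ and the per-map totals $Z_u$ are already known; the rounding policy is an illustrative assumption.

```python
def allocate_proposals(A: list[list[int]], Z: list[int]) -> list[list[int]]:
    """T[u][v] = Z[u] * A[u][v] / sum_v A[u][v]   -- formula (2.3).
    A[u][v]: number of pedestrians of scale v on feature map u;
    Z[u]: total candidate windows to extract on feature map u."""
    T = []
    for counts, z_u in zip(A, Z):
        total = sum(counts)
        row = ([round(z_u * a_uv / total) for a_uv in counts]
               if total else [0] * len(counts))
        T.append(row)
    return T

# Example with three feature maps (reduced, original, enlarged), three scales:
# allocate_proposals([[10, 60, 30], [20, 50, 30], [70, 20, 10]], [300, 300, 300])
# -> [[30, 180, 90], [60, 150, 90], [210, 60, 30]]
```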
Further, the step (5) is specifically as follows:
1) A KITTI dataset with various pedestrian scales is selected for the experiment, and pedestrians in the training set are divided into X scales by height (the number of scales is not limited);
2) A joint network of a region proposal network (RPN) and a softmax classifier is trained with shared convolution-layer features, using alternating training: the RPN region proposal network is trained first, the candidate windows are then used to train the region-based classifier network, and the classifier network is then used to retrain the RPN. The loss layer is the end point of the convolutional neural network (CNN) and accepts two values as input: the prediction of the CNN and the true label. From the predictions and labels the loss layer computes the loss function of the current network, generally written L(W), where W is the vector space formed by the current network weights. The purpose of training is to find the weights W(opt) that minimize the loss function L(W) over the weight space; W(opt) can be approximated by stochastic gradient descent (SGD). The network has two loss functions, a classification loss and a regression loss;
3) Because the structure in step (3) is changed, the loss function is adapted accordingly. The parameter to be trained and optimized is W, and the training set is $\{(M_i, y_i, B_i)\}_{i=1}^{N}$, where $M_i$ is an image block of interest sampled for training, $N$ is the total number of training samples, $y_i \in \{0, 1\}$ is the class label of $M_i$, and $B_i = (m'/m) \cdot (b_i^x, b_i^y, b_i^w, b_i^h)$ are the bounding-box coordinates corresponding to the feature map, where $b_i^x, b_i^y, b_i^w, b_i^h$ are the coordinates of the image block on the original image and $(m'/m)$ is the zoom factor explained in step (3);
4) The multitask loss function is therefore:

$$L(W) = \sum_{x=1}^{n} \frac{A_x}{\sum_{j=1}^{n} A_j} \cdot \frac{1}{|E_x|} \sum_{M_i \in E_x} l\big(M_i, (y_i, B_i) \mid W\big) \qquad (2.4)$$

where $n$ is the number of scale steps of the target size, $E_x$ is the set of data samples at each scale, $M_i$ is an image block of interest sampled from the training set, $A_1, A_2, \ldots, A_n$ are the numbers of pedestrians at the $n$ scales, and $l$ is the combined classification-and-regression loss, defined as:
$$l(M, (y, B) \mid W) = L_{cls}(p(M), y) + \beta\,[y \ge 1]\, L_{loc}(T^y, B) \qquad (2.5)$$
where $\beta$ is a trade-off coefficient, $T^y$ is the predicted box position for class $y$, $[y \ge 1]$ indicates that the regression loss applies only to positive samples, and $L_{cls}$ and $L_{loc}$ are the cross-entropy loss and the bounding-box regression loss respectively, defined as:

$$L_{cls}(p(M), y) = -\log p_y(M), \qquad L_{loc}(T^y, B) = \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}\big(t_i^y - b_i\big)$$
where $p(M) = (p_0(M), p_1(M))$ is the predicted class distribution over background and pedestrian, $y \in \{0, 1\}$ is the class label of $M$, $T^y = (t^x, t^y, t^w, t^h)$ is the predicted bounding-box position, and $B = (m'/m) \cdot (b^x, b^y, b^w, b^h)$ are the bounding-box coordinates corresponding to the feature map.
5) Because the prediction probability p and the predicted box T in 4) are obtained by multiplying the proposal's feature vector with the respective weight vectors, the joint parameters of the classification and regression branches can be continuously adjusted from the predicted values and the labels so as to minimize the loss function L(W), yielding the jointly optimal parameters $W(w_{cls}, w_{loc})$, i.e. $W(w_{cls}, w_{loc}) = \arg\min_W L(W) + \phi\|W\|$, where L(W) is the multitask loss function and $\phi$ is the regularization parameter.
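A minimal PyTorch sketch of the combined loss (2.5) and the scale-weighted multitask loss (2.4) follows, under the assumption that $L_{cls}$ and $L_{loc}$ are the usual cross-entropy and smooth-L1 terms; the batch layout, tensor shapes, and the default value of β are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def combined_loss(p, y, t, b, beta: float = 1.0) -> torch.Tensor:
    """l(M,(y,B)|W) = L_cls(p(M), y) + beta*[y >= 1]*L_loc(T^y, B) -- (2.5).
    p: (num_proposals, 2) class scores; y: (num_proposals,) labels in {0, 1};
    t, b: (num_proposals, 4) predicted and ground-truth box offsets."""
    l_cls = F.cross_entropy(p, y, reduction="none")           # -log p_y(M)
    l_loc = F.smooth_l1_loss(t, b, reduction="none").sum(-1)  # box regression
    return l_cls + beta * (y >= 1).float() * l_loc

def multitask_loss(per_scale_batches, A, beta: float = 1.0) -> torch.Tensor:
    """L(W) = sum_x (A_x / sum_j A_j) * mean over E_x of l(...)  -- (2.4).
    per_scale_batches: list of (p, y, t, b) tuples, one per pedestrian scale;
    A: list of pedestrian counts A_1..A_n per scale."""
    total_A = float(sum(A))
    loss = torch.zeros(())
    for (p, y, t, b), a_x in zip(per_scale_batches, A):
        loss = loss + (a_x / total_A) * combined_loss(p, y, t, b, beta).mean()
    return loss
```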
Further, the step (6) is specifically as follows: the proposal windows from step (4) are pooled into J windows; fixed-size feature maps are produced by the region-of-interest pooling layer and passed through the fully connected layers; the windows are classified by the classifier trained in step (5); and non-maximum suppression removes the windows that overlap the maximum-confidence window by more than 65%.
The invention discloses a pedestrian detection method based on a convolutional neural network in which the feature maps corresponding to scales adjacent to the original image are calculated through the image feature pyramid rule. This avoids the heavy computation of obtaining feature maps by rescaling the image, and detecting with different weights on the different feature maps effectively avoids the false positives and missed detections of single-feature-map detection. An effective trade-off between pedestrian detection speed and accuracy is achieved.
Drawings
FIG. 1 is a flow chart of a convolutional neural network-based pedestrian detection method of the present invention;
FIG. 2 is a diagram of the candidate window (proposal) optimization and selection algorithm of the pedestrian detection method of the present invention;
FIG. 3 is a schematic diagram of a non-maxima suppression implementation;
FIG. 4 is an effect diagram of the pedestrian detection method of the present application on a KITTI dataset picture.
Detailed Description
The invention will be further described with reference to the accompanying drawings.
As shown in FIG. 1, the pedestrian detection method based on the convolutional neural network of the present invention comprises the following steps:
Step (1) determining a current frame image: taking a picture in the test set, or a frame to be processed in a video sequence, as the current frame image;
Step (2) obtaining a feature map: passing the current frame image through a number of convolution layers and pooling layers, and obtaining a feature map from the last convolution layer;
Step (3) feature map expansion: calculating the feature maps corresponding to adjacent scales of the image through the image power-law rule and the image feature pyramid rule, the scales and number of expanded pictures being unlimited;
Step (4) proposal window allocation: selecting a suitable pedestrian dataset, or producing one with variable scales, dividing the targets in the picture into small, medium, and large scales (the number of scales is determined by the pedestrian scales of the dataset), and allocating the number of proposal windows according to the proportion of targets of each scale in the picture;
Step (5) classification network training: training a deep neural network using the distribution of pedestrians of various scales across the different feature maps;
Step (6) pedestrian detection and labeling: pooling, in proportion, the candidate windows from the obtained multi-scale feature maps, classifying them with the classifier trained in step (5), and framing the pedestrians after non-maximum suppression.
The step (1) of determining the current frame image is as follows: a picture in the test set, or a frame to be processed in a video sequence, is taken as the current frame image and denoted $I_1$.
The step (2) of obtaining the feature map is as follows: the current frame image is passed through multiple convolution layers and pooling layers, which may be interleaved in any order; the last convolution layer yields a feature map, denoted $f_1$.
The step (3) of expanding the feature maps is as follows: the feature maps corresponding to scales adjacent to image $I_1$ are calculated through the image power-law rule and the image feature pyramid rule. Ordinarily one would use $f_M = C_p(S(I_1, M))$, where $I_1$ is the original image, $M$ is the zoom factor, $S$ denotes scaling the image, and $C_p$ denotes computing features by the convolution-pooling operation. To reduce convolution operations and increase running speed, the following formula is used instead:

$$f_{m'} = S(f_m, m'/m) \cdot (m'/m)^{-\alpha} \qquad (3.1)$$

where the parameter $m$ denotes the current scale, $m'$ the target scale, $S$ rescaling the feature map by a factor of $m'/m$, and $f$ the feature map; the constant coefficient $\alpha$ can be measured experimentally on the training set. The formula states that only the original image $I_m$ has its features computed by the convolution-pooling operation, and the image features at nearby scales are approximated from the known feature map; for example, $f_{1/2}$ can be calculated for $1/2 \times I_1$. Because image upsampling incurs no high-frequency loss, the information content of an upsampled picture is similar to that of the low-resolution original, and the feature calculation formula is:

$$f_\sigma = \sigma \cdot S(f_1, \sigma) \qquad (3.2)$$

where $f_1$ is the feature map of the original image, $S$ denotes magnifying the feature map $f_1$ by a factor of $\sigma$, and $f_\sigma$ is the up-sampled feature map.
The step (4) of allocating the proposal windows is as follows: because the RPN has a single receptive field, large targets tend to be detected on the feature map of a reduced-scale image and small targets on the feature map of an enlarged-scale image, so the targets in the image are divided into three scales. An experiment is carried out on the KITTI dataset, which contains pedestrians at multiple scales. Pedestrians in the dataset are binned by height into the three scales height $< H_1$, $H_1 \le$ height $< H_2$, and height $\ge H_2$, where $H_1$ and $H_2$ are 50 and 200 pixels respectively, and the numbers of pedestrians at the three scales are $A_1$, $A_2$, $A_3$. Then, on each feature map, the top $Z_K \cdot A_1/(A_1 + A_2 + A_3)$ candidate windows with pedestrian height below 50 pixels are selected by confidence, the top $Z_K \cdot A_2/(A_1 + A_2 + A_3)$ candidate windows with pedestrian height between 50 and 200 pixels, and the top $Z_K \cdot A_3/(A_1 + A_2 + A_3)$ candidate windows with pedestrian height above 200 pixels, where $A_1$, $A_2$, $A_3$ are the numbers of pedestrian candidate windows extracted at the three scales, $K = 1, 2, 3$ indexes the reduced, original, and enlarged feature maps respectively, and $Z_K$ is the number of candidate windows to be extracted from each feature map. As shown in FIG. 2, $f_1$, $f_{1/2}$, and $f_2$ respectively denote the feature map of image $I_1$ obtained from the last convolution layer and the feature maps of $I_1$ obtained by the expansion calculation, and the numbers of candidate windows selected on them are $Z_1$, $Z_2$, $Z_3$ respectively. Because the different feature maps are biased toward detecting pedestrians of different sizes, allocating the candidate windows in proportion to the different target scales helps the network exploit its detection advantages on the different feature maps; a sketch of this selection follows.
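The confidence-based selection just described can be sketched as follows, using the 50 and 200 pixel height thresholds of the KITTI experiment; the candidate-window representation and the rounding are illustrative assumptions.

```python
H1, H2 = 50, 200  # pixel-height thresholds used in the KITTI experiment

def select_candidates(candidates, Z_K: int, A: tuple) -> list:
    """candidates: list of (height_px, confidence, box) from the RPN on one
    feature map; Z_K: total windows to keep on this feature map;
    A = (A_1, A_2, A_3): pedestrian counts at the three scales. Keeps the
    top candidates of each height bucket by confidence, in proportion
    Z_K * A_v / (A_1 + A_2 + A_3)."""
    buckets = {0: [], 1: [], 2: []}
    for h, conf, box in candidates:
        v = 0 if h < H1 else (1 if h < H2 else 2)
        buckets[v].append((conf, box))
    kept, total = [], sum(A)
    for v, bucket in buckets.items():
        quota = round(Z_K * A[v] / total)
        bucket.sort(key=lambda cb: cb[0], reverse=True)  # highest confidence first
        kept.extend(box for _, box in bucket[:quota])
    return kept
```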
The step of training the classification network in the step (5) comprises the following steps:
1) A KITTI dataset with various pedestrian scales is selected for the experiment, and pedestrians in the training set are divided into X scales by height (the number of scales is not limited);
2) A joint network of a region proposal network (RPN) and a softmax classifier is trained with shared convolution-layer features, using alternating training: the region proposal network is trained first, the proposals are then used to train the region-based classifier network, and the classifier network is then used to retrain the region proposal network. The loss layer is the end point of the convolutional neural network (CNN) and accepts two values as input: the prediction of the CNN and the true label. From these two inputs the loss layer computes the loss function of the current network, generally written L(W), where W is the vector space formed by the current network weights. The purpose of training is to find the weights W(opt) that minimize the loss function L(W) over the weight space; W(opt) can be approximated by stochastic gradient descent (SGD). The network has two loss functions, a classification loss and a regression loss;
3) Because the structure in step (3) is changed, the loss function is adapted accordingly. The parameter to be trained and optimized is W, and the training set is $\{(M_i, y_i, B_i)\}_{i=1}^{N}$, where $M_i$ is an image block of interest sampled for training, $N$ is the total number of training samples, $y_i \in \{0, 1\}$ is the class label of $M_i$, and $B_i = (m'/m) \cdot (b_i^x, b_i^y, b_i^w, b_i^h)$ are the bounding-box coordinates corresponding to the feature map, where $b_i^x, b_i^y, b_i^w, b_i^h$ are the coordinates of the image block on the original image and $(m'/m)$ is the zoom factor explained in step (3);
4) The multitask loss function is therefore:

$$L(W) = \sum_{x=1}^{n} \frac{A_x}{\sum_{j=1}^{n} A_j} \cdot \frac{1}{|E_x|} \sum_{M_i \in E_x} l\big(M_i, (y_i, B_i) \mid W\big) \qquad (3.3)$$

where $n$ is the number of scale steps of the target size, $E_x$ is the set of data samples at each scale, $M_i$ is an image block of interest sampled from the training set, $A_1, A_2, \ldots, A_n$ are the numbers of pedestrians at the $n$ scales, and $l$ is the combined classification-and-regression loss, defined as:
$$l(M, (y, B) \mid W) = L_{cls}(p(M), y) + \beta\,[y \ge 1]\, L_{loc}(T^y, B) \qquad (3.4)$$
where $\beta$ is a trade-off coefficient, $T^y$ is the predicted box position for class $y$, $[y \ge 1]$ indicates that the regression loss applies only to positive samples, and $L_{cls}$ and $L_{loc}$ are the cross-entropy loss and the bounding-box regression loss respectively, defined as:

$$L_{cls}(p(M), y) = -\log p_y(M), \qquad L_{loc}(T^y, B) = \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}\big(t_i^y - b_i\big)$$
where $p(M) = (p_0(M), p_1(M))$ is the predicted class distribution over background and pedestrian, $y \in \{0, 1\}$ is the class label of $M$, $T^y = (t^x, t^y, t^w, t^h)$ is the predicted bounding-box position, and $B = (m'/m) \cdot (b^x, b^y, b^w, b^h)$ are the bounding-box coordinates corresponding to the feature map.
5) Because the prediction probability p and the predicted box T in 4) are obtained by multiplying the proposal's feature vector with the respective weight vectors, the joint parameters of the classification and regression branches can be continuously adjusted from the predicted values and the labels so as to minimize the loss function L(W), yielding the jointly optimal parameters $W(w_{cls}, w_{loc})$, i.e. $W(w_{cls}, w_{loc}) = \arg\min_W L(W) + \phi\|W\|$, where L(W) is the multitask loss function and $\phi$ is the regularization parameter.
The step (6) of detecting and labeling pedestrians is as follows: the J proposal windows from step (4) are pooled; fixed-size feature maps are produced through the region-of-interest pooling layer and the fully connected layers; the trained classifier yields the confidence of each pedestrian candidate; and the confidence of each scale's candidates is multiplied by the corresponding weight $l_x$ obtained from the training in step (5). Non-maximum suppression then removes the windows that overlap the maximum-confidence window by more than 65%. As shown in FIG. 3, $S_1$ and $S_3$ denote the areas of two detection boxes and $S_2$ denotes their overlap area; the intersection-over-union is $S_2/(S_1 + S_3 - S_2)$, and if it is greater than the threshold 0.65, the box with the lower confidence is discarded. FIG. 4 shows the effect of the pedestrian detection method of the present application on a KITTI dataset picture; many pedestrians with a height of less than 50 pixels are detected, demonstrating the feasibility and detection advantages of the pedestrian detection method.
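A sketch of the non-maximum suppression of FIG. 3 follows, computing the intersection-over-union $S_2/(S_1 + S_3 - S_2)$ and applying the 0.65 overlap threshold together with the 0.75 confidence threshold of claim 5; the (x1, y1, x2, y2) box format and the greedy ordering are conventional assumptions.

```python
def iou(a, b) -> float:
    """Intersection over union of boxes (x1, y1, x2, y2):
    S_2 / (S_1 + S_3 - S_2) in the notation of FIG. 3."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)   # S_2
    area_a = (a[2] - a[0]) * (a[3] - a[1])              # S_1
    area_b = (b[2] - b[0]) * (b[3] - b[1])              # S_3
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(detections, iou_thresh: float = 0.65, conf_thresh: float = 0.75) -> list:
    """detections: list of (confidence, box). Keep boxes above the confidence
    threshold, then greedily drop any box overlapping a kept, higher-confidence
    box by more than the IoU threshold."""
    kept = []
    for conf, box in sorted(detections, reverse=True, key=lambda d: d[0]):
        if conf < conf_thresh:
            break  # remaining detections have even lower confidence
        if all(iou(box, k) <= iou_thresh for k in kept):
            kept.append(box)
    return kept
```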

Claims (5)

1. A pedestrian detection method, characterized by comprising the steps of:
Step (1) determining a current frame image: taking a picture in the test set, or a frame to be processed in a video sequence, as the current frame image, denoted $I_1$;
Step (2) calculating a feature map: passing the current frame image through multiple convolution layers and pooling layers, which may be interleaved in any order, and obtaining a feature map from the last convolution layer, denoted $f_1$;
Step (3) feature map expansion: calculating the feature maps corresponding to adjacent scales of the image through the image feature pyramid rule, expanding in turn N small-scale and N large-scale feature maps (the number of expansions N and the expansion factors are not limited), so that 2N + 1 feature maps are obtained;
Step (4) extracting candidate windows: generating candidate windows from each feature map through a region proposal network (RPN), and further selecting the candidate windows according to the pedestrian size distribution;
Step (5) training a classifier: training a deep neural network using the distribution of pedestrians of various scales across the different feature maps;
Step (6) pedestrian detection output: pooling the candidate windows obtained from the multi-scale feature maps, classifying them with the trained classifier, and framing the pedestrians after non-maximum suppression.
2. The pedestrian detection method according to claim 1, characterized in that the step (3) is specifically as follows: the feature maps of image $I_1$ corresponding to adjacent scales are computed. Ordinarily one would use $f_M = C_p(S(I_1, M))$, where $I_1$ is the original image, $M$ is the zoom factor, $S$ denotes scaling the image, and $C_p$ denotes computing features by the convolution-pooling operation; to reduce convolution operations and improve computation speed, the feature maps corresponding to adjacent-scale images are calculated according to the image feature pyramid rule by the formula:

$$f_{m'} = S(f_m, m'/m) \cdot (m'/m)^{-\alpha} \qquad (1.1)$$

where the parameter $m$ denotes the current scale, $m'$ the target scale, $S$ rescaling the feature map by a factor of $m'/m$, and $f$ the feature map; the constant coefficient $\alpha$ can be measured experimentally on the training set. The formula states that only the original image $I_m$ has its features computed by the convolution-pooling operation, and the image features at nearby scales are approximated from the known feature map; for example, $f_{1/2}$ can be calculated for $1/2 \times I_1$. Because image upsampling incurs no high-frequency loss, the information content of an upsampled picture is similar to that of the low-resolution original, and the feature calculation formula is:

$$f_\sigma = \sigma \cdot S(f_1, \sigma) \qquad (1.2)$$

where $f_1$ is the feature map of the original image, $S$ denotes magnifying the feature map $f_1$ by a factor of $\sigma$, and $f_\sigma$ is the up-sampled feature map.
3. The pedestrian detection method according to claim 1, characterized in that the step (4) is specifically as follows: candidate proposal windows are generated from each feature map through the RPN network, and the pedestrians in the candidate windows are binned by height into the scales height $< H_1$, $H_1 \le$ height $< H_2$, ..., $H_{n-1} \le$ height $< H_n$, height $\ge H_n$, where $H_1$ to $H_n$ are pixel heights ordered from small to large and the numbers of pedestrians at the different scales are $A_1, A_2, \ldots, A_n$; then, on each feature map, the pedestrian candidate frames of each scale are selected according to the proportional distribution of the candidate frames in that feature map, the number $T_{uv}$ of candidate windows selected in turn being:

$$T_{uv} = Z_u \cdot \frac{A_{uv}}{\sum_{v'=1}^{n} A_{uv'}} \qquad (1.3)$$

where $T_{uv}$ is the number of candidate windows of the $v$-th pedestrian scale to be finally extracted on the $u$-th feature map, $Z_u$ ($1 \le u \le 2N+1$) is the total number of candidate windows to be extracted on the $u$-th feature map (chosen according to the dataset; the same or different totals may be used for each feature map), and $A_{uv}$ is the number of pedestrians of the $v$-th scale on the $u$-th feature map.
4. The pedestrian detection method according to claim 1, characterized in that: the step (5) is specifically as follows:
1) A KITTI dataset with various pedestrian scales is selected for the experiment, and pedestrians in the training set are divided into n scales by height;
2) The deep neural network is trained with the training set of the KITTI dataset. The loss layer of the convolutional neural network (CNN) accepts two values as input: the prediction of the CNN and the true label. From the predicted value and the label value the loss layer computes the loss function of the current network, generally written L(W), where W is the vector space formed by the current network weights;
3) The loss function is adapted accordingly; the parameter to be trained and optimized is W, and the training set is $\{(M_i, y_i, B_i)\}_{i=1}^{N}$, where $M_i$ is an image block of interest sampled from the training set, $N$ is the total number of training samples, $y_i \in \{0, 1\}$ is the class label of $M_i$, and $B_i = (m'/m) \cdot (b_i^x, b_i^y, b_i^w, b_i^h)$ are the bounding-box coordinates corresponding to the feature map, where $b_i^x, b_i^y, b_i^w, b_i^h$ are the coordinates of the image block on the original image, $(m'/m)$ is the zoom factor, $m$ denotes the current scale, and $m'$ the scaled scale;
4) The multitask loss function is:

$$L(W) = \sum_{x=1}^{n} \frac{A_x}{\sum_{j=1}^{n} A_j} \cdot \frac{1}{|E_x|} \sum_{M_i \in E_x} l\big(M_i, (y_i, B_i) \mid W\big) \qquad (1.4)$$

where $n$ is the number of scale steps of the target size, $E_x$ is the set of data samples at each scale, $M_i$ is an image block of interest sampled from the training set, $A_1, A_2, \ldots, A_n$ are the numbers of pedestrians at the $n$ scales, and $l$ is the combined classification-and-regression loss, defined as:
$$l(M, (y, B) \mid W) = L_{cls}(p(M), y) + \beta\,[y \ge 1]\, L_{loc}(T^y, B) \qquad (1.5)$$
where $\beta$ is a trade-off coefficient, $T^y$ is the predicted box position for class $y$, $[y \ge 1]$ indicates that the regression loss applies only to positive samples, and $L_{cls}$ and $L_{loc}$ are the cross-entropy loss and the bounding-box regression loss respectively, defined as:

$$L_{cls}(p(M), y) = -\log p_y(M), \qquad L_{loc}(T^y, B) = \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}\big(t_i^y - b_i\big)$$
where $p(M) = (p_0(M), p_1(M))$ is the predicted class distribution, $y \in \{0, 1\}$ is the class label of $M$, $T^y = (t^x, t^y, t^w, t^h)$ is the predicted bounding-box position, and $B = (m'/m) \cdot (b^x, b^y, b^w, b^h)$ are the bounding-box coordinates corresponding to the feature map;
5) Because the prediction probability p and the predicted box T in 4) are obtained by multiplying the feature vector with the respective weight vectors, the joint parameters of the classification and regression processes can be continuously adjusted from the predicted values and the labels so as to minimize the loss function L(W), yielding the jointly optimal parameters $W(w_{cls}, w_{loc})$, i.e. $W(w_{cls}, w_{loc}) = \arg\min_W L(W) + \phi\|W\|$, where L(W) is the multitask loss function and $\phi$ is the regularization parameter.
5. The pedestrian detection method according to claim 1, characterized in that the step (6) is specifically as follows: the J proposal windows from step (4) are pooled; the feature maps are fixed in size and passed through the fully connected layer; the trained classifier yields the confidence of each pedestrian candidate, and a result greater than 0.75 is judged to be a pedestrian; the framed pedestrians then pass through non-maximum suppression, which removes the windows that overlap the maximum-confidence window by more than 65%.
CN201711030102.8A 2017-10-27 2017-10-27 Pedestrian detection method Active CN108038409B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711030102.8A CN108038409B (en) 2017-10-27 2017-10-27 Pedestrian detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711030102.8A CN108038409B (en) 2017-10-27 2017-10-27 Pedestrian detection method

Publications (2)

Publication Number Publication Date
CN108038409A (en) 2018-05-15
CN108038409B (en) 2021-12-28

Family

ID=62093419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711030102.8A Active CN108038409B (en) 2017-10-27 2017-10-27 Pedestrian detection method

Country Status (1)

Country Link
CN (1) CN108038409B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750708A (en) * 2012-05-11 2012-10-24 天津大学 Affine motion target tracing algorithm based on fast robust feature matching
CN103247059A (en) * 2013-05-27 2013-08-14 北京师范大学 Remote sensing image region of interest detection method based on integer wavelets and visual features
US20160104056A1 (en) * 2014-10-09 2016-04-14 Microsoft Technology Licensing, Llc Spatial pyramid pooling networks for image processing
CN104850844A (en) * 2015-05-27 2015-08-19 成都新舟锐视科技有限公司 Pedestrian detection method based on rapid construction of image characteristic pyramid
CN105678231A (en) * 2015-12-30 2016-06-15 中通服公众信息产业股份有限公司 Pedestrian image detection method based on sparse coding and neural network

Cited By (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110647897A (en) * 2018-06-26 2020-01-03 广东工业大学 Zero sample image classification and identification method based on multi-part attention mechanism
CN110647897B (en) * 2018-06-26 2023-04-18 广东工业大学 Zero sample image classification and identification method based on multi-part attention mechanism
CN110659658B (en) * 2018-06-29 2022-07-29 杭州海康威视数字技术股份有限公司 Target detection method and device
CN109117717A (en) * 2018-06-29 2019-01-01 广州烽火众智数字技术有限公司 A kind of city pedestrian detection method
CN110659658A (en) * 2018-06-29 2020-01-07 杭州海康威视数字技术股份有限公司 Target detection method and device
CN109003223A (en) * 2018-07-13 2018-12-14 北京字节跳动网络技术有限公司 Image processing method and device
CN109003223B (en) * 2018-07-13 2020-02-28 北京字节跳动网络技术有限公司 Picture processing method and device
CN109284670A (en) * 2018-08-01 2019-01-29 清华大学 A kind of pedestrian detection method and device based on multiple dimensioned attention mechanism
CN109284669A (en) * 2018-08-01 2019-01-29 辽宁工业大学 Pedestrian detection method based on Mask RCNN
CN109101915A (en) * 2018-08-01 2018-12-28 中国计量大学 Face and pedestrian and Attribute Recognition network structure design method based on deep learning
CN109255352B (en) * 2018-09-07 2021-06-22 北京旷视科技有限公司 Target detection method, device and system
CN109255352A (en) * 2018-09-07 2019-01-22 北京旷视科技有限公司 Object detection method, apparatus and system
CN109242801A (en) * 2018-09-26 2019-01-18 北京字节跳动网络技术有限公司 Image processing method and device
CN109492596B (en) * 2018-11-19 2022-03-29 南京信息工程大学 Pedestrian detection method and system based on K-means clustering and regional recommendation network
CN109492596A (en) * 2018-11-19 2019-03-19 南京信息工程大学 A kind of pedestrian detection method and system based on K-means cluster and region recommendation network
CN109658412A (en) * 2018-11-30 2019-04-19 湖南视比特机器人有限公司 It is a kind of towards de-stacking sorting packing case quickly identify dividing method
CN109800637A (en) * 2018-12-14 2019-05-24 中国科学院深圳先进技术研究院 A kind of remote sensing image small target detecting method
CN109829421A (en) * 2019-01-29 2019-05-31 西安邮电大学 The method, apparatus and computer readable storage medium of vehicle detection
CN109858451A (en) * 2019-02-14 2019-06-07 清华大学深圳研究生院 A kind of non-cooperation hand detection method
CN109858451B (en) * 2019-02-14 2020-10-23 清华大学深圳研究生院 Non-matching hand detection method
CN110059544A (en) * 2019-03-07 2019-07-26 华中科技大学 A kind of pedestrian detection method and system based on road scene
CN110097050B (en) * 2019-04-03 2024-03-08 平安科技(深圳)有限公司 Pedestrian detection method, device, computer equipment and storage medium
CN110097050A (en) * 2019-04-03 2019-08-06 平安科技(深圳)有限公司 Pedestrian detection method, device, computer equipment and storage medium
CN110136097A (en) * 2019-04-10 2019-08-16 南方电网科学研究院有限责任公司 One kind being based on the pyramidal insulator breakdown recognition methods of feature and device
CN110211097A (en) * 2019-05-14 2019-09-06 河海大学 A kind of crack image detecting method based on the migration of Faster R-CNN parameter
CN110263712A (en) * 2019-06-20 2019-09-20 江南大学 A kind of coarse-fine pedestrian detection method based on region candidate
CN110490058A (en) * 2019-07-09 2019-11-22 北京迈格威科技有限公司 Training method, device, system and the computer-readable medium of pedestrian detection model
WO2021018106A1 (en) * 2019-07-30 2021-02-04 华为技术有限公司 Pedestrian detection method, apparatus, computer-readable storage medium and chip
CN110443366A (en) * 2019-07-30 2019-11-12 上海商汤智能科技有限公司 Optimization method and device, object detection method and the device of neural network
CN110648322A (en) * 2019-09-25 2020-01-03 杭州智团信息技术有限公司 Method and system for detecting abnormal cervical cells
CN110648322B (en) * 2019-09-25 2023-08-15 杭州智团信息技术有限公司 Cervical abnormal cell detection method and system
CN111339967A (en) * 2020-02-28 2020-06-26 长安大学 Pedestrian detection method based on multi-view graph convolution network
CN111339967B (en) * 2020-02-28 2023-04-07 长安大学 Pedestrian detection method based on multi-view graph convolution network
CN111523494A (en) * 2020-04-27 2020-08-11 天津中科智能识别产业技术研究院有限公司 Human body image detection method
CN111832383A (en) * 2020-05-08 2020-10-27 北京嘀嘀无限科技发展有限公司 Training method of gesture key point recognition model, gesture recognition method and device
CN111832383B (en) * 2020-05-08 2023-12-08 北京嘀嘀无限科技发展有限公司 Training method of gesture key point recognition model, gesture recognition method and device
CN111681243A (en) * 2020-08-17 2020-09-18 广东利元亨智能装备股份有限公司 Welding image processing method and device and electronic equipment

Also Published As

Publication number Publication date
CN108038409B (en) 2021-12-28


Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant