CN108038409B - Pedestrian detection method - Google Patents

Pedestrian detection method

Info

Publication number
CN108038409B
CN108038409B
Authority
CN
China
Prior art keywords
image
scale
feature map
candidate
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711030102.8A
Other languages
Chinese (zh)
Other versions
CN108038409A (en)
Inventor
章东平
胡葵
王都洋
张香伟
杨力
肖刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi Gosun Guard Security Service Technology Co ltd
Original Assignee
Jiangxi Gosun Guard Security Service Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi Gosun Guard Security Service Technology Co ltd
Priority to CN201711030102.8A
Publication of CN108038409A
Application granted
Publication of CN108038409B
Active legal-status Current
Anticipated expiration legal-status

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a pedestrian detection method based on a convolutional neural network. An input image is passed through multiple convolution and pooling operations to extract the features of the original image and obtain the corresponding feature map; the feature maps corresponding to scaled versions of the original image are then approximated through the image feature pyramid rule. Candidate windows are generated by a region proposal network (RPN) and further selected according to the size distribution of pedestrians in the candidate windows, after which the selected candidate windows are aggregated. Labeled training data are used to train the weights corresponding to pedestrian targets of different scales on images of different scales and to train a classifier network. The confidence obtained after the aggregated candidate windows pass through the classifier is compared with a set threshold to make the final pedestrian detection decision. Applying the image feature pyramid avoids the heavy computation of obtaining feature maps by rescaling the image, and detecting on different feature maps with different weights effectively reduces the misjudgment and missed detection of single-feature-map detection.

Description

Pedestrian detection method
Technical Field
The invention relates to a pedestrian detection method, and belongs to the field of target detection.
Background
In recent years, pedestrian detection technology has been widely applied in fields such as intelligent surveillance, autonomous driving, and robot vision. In practical applications, pedestrian detection remains challenging because the pedestrians captured in video vary in size, clothing, and posture. There are two main approaches to pedestrian detection: traditional sliding-window methods and deep-learning methods based on learned features. Traditional methods involve a large amount of computation and, without GPU resources, their detection speed is limited. As computer performance keeps improving and GPU computing power is exploited, deep-learning methods based on learned features are usually faster than traditional methods, but the multi-scale problem of pedestrians remains difficult to solve.
Disclosure of Invention
In order to solve the problems that speed and detection accuracy are difficult to balance in pedestrian detection and that pedestrians appear at multiple scales, the invention provides a pedestrian detection method comprising the following steps:
step (1), determining the current frame image: taking a picture from the test set as the current frame image, or taking a frame to be processed from a video sequence as the current frame image;
step (2), obtaining the feature map: passing the current frame image through several convolution layers and pooling layers, and obtaining a feature map from the last convolution layer;
step (3), feature map expansion: calculating the feature maps corresponding to adjacent scales of the image through the image feature pyramid rule, expanding in turn N small-scale feature maps and N large-scale feature maps, where the number of expansions N and the expansion factors are not limited, so that 2N+1 feature maps are obtained;
step (4), proposal window allocation: generating candidate windows from the feature maps through a region proposal network (RPN), and further selecting the candidate windows according to the pedestrian size distribution;
step (5), training the classification network: training a deep neural network using the distribution of pedestrians of various scales across the different feature maps;
step (6), pedestrian detection and labeling: aggregating, in proportion, the numbers of proposal windows from the obtained feature maps of the three scales, classifying them with the classifier trained in step (5), and framing the pedestrians after non-maximum suppression.
Further, the step (1) is specifically as follows: taking a picture from the test set as the current frame image, or taking a frame to be processed from a video sequence as the current frame image, and recording the current frame image as I_1.
Further, the step (2) is specifically as follows: the current frame image is passed through a plurality of convolution layers and pooling layers, where convolution and pooling layers alternate and the number of layers is not limited; a feature map is obtained from the last convolution layer and is denoted f_1.
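As an illustration of step (2), the following is a minimal sketch of such a feature-extraction backbone. The patent does not fix the number of layers or the channel widths, so the PyTorch layers below (a VGG-like stack of alternating convolutions and poolings) are assumptions chosen only for demonstration.

```python
# Illustrative sketch only: layer counts and channel widths are assumptions,
# not values fixed by the patent.
import torch
import torch.nn as nn

class Backbone(nn.Module):
    """Alternating convolution and pooling layers; the last conv output is f1."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                                        # /2
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                                        # /4
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                                        # /8
            nn.Conv2d(256, 512, 3, padding=1), nn.ReLU(inplace=True),  # last conv -> f1
        )

    def forward(self, image):
        return self.features(image)

# Usage: I1 is the current frame, f1 the feature map from the last conv layer.
I1 = torch.randn(1, 3, 384, 1280)   # e.g. a KITTI-sized frame
f1 = Backbone()(I1)                  # shape: (1, 512, 48, 160)
```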
Further, the step (3) is specifically as follows: the feature maps corresponding to scales close to image I_1 are calculated through the image power-law rule and the image feature pyramid rule. In general one would use f_m = C_p(S(I_1, M)), where I_1 denotes the original image, M the zoom size, S the scaling of the image, and C_p the feature computation by the convolution and pooling operations. To reduce convolution operations and increase running speed, the following formula is used instead:

$f_{m'} = S(f_m,\, m'/m)\cdot(m'/m)^{-\alpha}$   (2.1)
where the parameter m represents the current scale, m' the scaled scale, S denotes scaling the feature map by m'/m times, f denotes the feature, and the constant coefficient α can be measured experimentally on the training set. The formula shows that the original image I_m has its features computed by convolution and pooling, while the features of nearby zoomed images are obtained from the known feature map, so the feature maps corresponding to scaled versions of the original image, such as 1/2·I_1 and 2·I_1, can be calculated (the scales of the expanded pictures and the number of expansions are not limited; these two scales are chosen in view of detection speed and presentation). Because the pyramid rule approximates each step with a scaling factor of 2^{-1/4}, the feature map f_{1/2} is obtained after iterating the calculation four times. Because image upsampling incurs no high-frequency loss, the information content of the upsampled picture is similar to that of the low-resolution image, and the feature calculation formula is:

$f_\sigma = \sigma \cdot S(f_1, \sigma)$   (2.2)

where f_1 denotes the feature map of the original image, S denotes magnifying the feature map f_1 by a factor of σ, and f_σ is the feature map of the up-sampled image.
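The following is a minimal sketch of the feature-map expansion in step (3), implementing formula (2.1) for reduced scales and formula (2.2) for enlarged scales. The value of the coefficient α is a placeholder (the patent states it is measured experimentally on the training set), and bilinear resampling for the scaling operator S is also an assumption.

```python
# Sketch only: alpha = 0.15 is a placeholder; bilinear resampling is assumed for S.
import torch
import torch.nn.functional as F

def shrink_feature_map(f_m, m, m_prime, alpha=0.15):
    """Down-scale approximation (2.1): f_{m'} ~= S(f_m, m'/m) * (m'/m)^(-alpha)."""
    ratio = m_prime / m
    resized = F.interpolate(f_m, scale_factor=ratio, mode="bilinear",
                            align_corners=False)
    return resized * (ratio ** -alpha)

def enlarge_feature_map(f_1, sigma):
    """Up-scale approximation (2.2): f_sigma = sigma * S(f_1, sigma)."""
    resized = F.interpolate(f_1, scale_factor=sigma, mode="bilinear",
                            align_corners=False)
    return resized * sigma

# Expand f1 into 2N+1 feature maps (here N = 1, matching the example scales
# 1/2 * I1 and 2 * I1): one reduced-scale and one enlarged-scale map.
f1 = torch.randn(1, 512, 48, 160)
pyramid = [shrink_feature_map(f1, m=1.0, m_prime=0.5), f1,
           enlarge_feature_map(f1, sigma=2.0)]
```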
Further, the step (4) is specifically as follows: because the RPN has a single receptive field, large targets tend to be detected on the feature map corresponding to the reduced-scale image, and small targets tend to be detected on the feature map corresponding to the enlarged-scale image. The pedestrian targets in the image are divided into several scales, and experiments are carried out on the KITTI data set, which contains pedestrians at multiple scales. According to their heights, the pedestrians in the data set are grouped as height < H_1, H_1 ≤ height < H_2, ..., H_{n-1} ≤ height < H_n, height ≥ H_n, where H_1 to H_n are pixel heights from small to large and the numbers of pedestrians at the different scales are A_1, A_2, ..., A_n respectively. Then the pedestrian candidate boxes of each scale on each feature map are selected according to the proportional distribution of candidate boxes in the feature map, taking T_uv of them in turn:
$T_{uv} = Z_u \cdot \dfrac{A_{uv}}{\sum_{v=1}^{n} A_{uv}}$
where T_uv is the number of scale-v pedestrian candidates to be finally extracted on the u-th feature map, Z_u (1 ≤ u ≤ 2N+1) is the total number of candidate windows to be finally extracted on the u-th feature map, determined according to the data set (the same number may be chosen for every feature map, or different numbers may be extracted), and A_uv is the number of scale-v pedestrians on the u-th feature map. Because the proposal network has a single receptive field (the area of the input image that a node on the output feature map responds to), large targets tend to be detected on the feature map of the reduced-scale image and small targets on the feature map of the enlarged-scale image; extracting candidate windows in proportion to targets of different scales therefore helps to exploit the network's detection advantages on the different feature maps.
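A minimal sketch of the proposal quota in step (4) follows: on feature map u, the number of candidate windows kept for pedestrian scale v is proportional to that scale's share of pedestrians on the map. The counts used in the example are made up for illustration.

```python
# Sketch of the per-scale proposal quota T_uv = Z_u * A_uv / sum_v(A_uv).
# The pedestrian counts and window budget below are hypothetical examples.
def allocate_proposals(A_u, Z_u):
    """A_u[v] = number of scale-v pedestrians on feature map u,
    Z_u = total candidate windows to keep on this map.
    Returns T_u[v] = Z_u * A_u[v] / sum(A_u)."""
    total = sum(A_u)
    return [round(Z_u * a / total) for a in A_u]

# Example: three pedestrian scales on the enlarged-image feature map,
# which mostly sees small targets.
A_enlarged = [620, 240, 40]            # hypothetical per-scale pedestrian counts
T_enlarged = allocate_proposals(A_enlarged, Z_u=300)
print(T_enlarged)                      # -> [207, 80, 13]
```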
Further, the step (5) is specifically as follows:
1) A KITTI data set containing pedestrians of various scales is selected for the experiment, and on the training data set the pedestrians are divided into X scales according to height (the number of scale levels is not limited);
2) The RPN (Region Proposal Network) and the softmax classifier network are trained jointly with shared convolutional features, using an alternating training scheme: the RPN is trained first, then the region-based classifier network is trained with the candidate windows, and then the RPN is trained again using the classifier network. The loss layer is the end point of the convolutional neural network (CNN); it accepts two values as input, one being the CNN prediction and the other the ground-truth label. From the predicted value and the label value the loss layer computes the loss function of the current network, generally denoted L(W), where W denotes the current network weights. The purpose of training is to find, in the weight space, the weights W(opt) that minimize the loss function L(W); W(opt) can be approximated by stochastic gradient descent. The network has two loss functions, a classification loss and a regression loss;
3) Because the structure in step (3) is changed, the loss function is adapted accordingly; the parameter to be trained and optimized is W, and the training set is defined as

$\{(M_i, y_i, B_i)\}_{i=1}^{N}$

where M_i is an image block of interest sampled from the training set, N is the total number of training samples, y_i ∈ {0, 1} is the class label of M_i, and B_i = (m'/m)·(b_i^x, b_i^y, b_i^w, b_i^h) are the bounding-box coordinates mapped to the feature map, where b_i^x, b_i^y, b_i^w, b_i^h are the coordinates of the image block on the original image and (m'/m) is the scaling factor explained in step (3);
4) The multitask loss function is thus:

$L(W) = \sum_{x=1}^{n} \frac{A_x}{\sum_{j=1}^{n} A_j}\; \mathbb{E}_{i \in E_x}\big[\, l(M_i, (y_i, B_i) \mid W)\,\big]$

where n is the number of scale levels of the target size, E_x denotes the data samples of scale x, M_i is an image block of interest sampled from the training set, A_1, A_2, ..., A_n are the numbers of pedestrians at the n scales, and l is the joint loss function of classification and regression, defined as:
$l(M, (y, B) \mid W) = L_{cls}(p(M), y) + \beta[y \ge 1]\, L_{loc}(T^y, B)$   (2.5)

where β is a trade-off coefficient, T^y is the predicted box position for class y, [y ≥ 1] indicates that the regression loss applies only to positive samples, and L_cls and L_loc are the cross-entropy loss and the bounding-box regression loss respectively, defined as:
$L_{cls}(p(M), y) = -\log p_y(M), \qquad L_{loc}(T^y, B) = \sum_{j \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}\!\big(T_j^y - B_j\big)$

where p_y(M) is the predicted probability of class y, with p(M) = (p_0(M), p_1(M)); y ∈ {0, 1} is the class label of M; T_i^y = (t_i^x, t_i^y, t_i^w, t_i^h) is the predicted box position; and B_i = (m'/m)·(b_i^x, b_i^y, b_i^w, b_i^h) are the bounding-box coordinates corresponding to the feature map.
5) In 4), the prediction probability p and the predicted box T are obtained by multiplying the feature vector of each proposal with its respective weight vector, so the joint parameters of the classification and regression processes can be continuously adjusted according to the predicted values and the labels through the above formula until the loss function L(W) is minimized and the jointly optimal parameters W(w_cls, w_loc) are obtained, i.e.

$W(w_{cls}, w_{loc}) = \arg\min_W \, L(W) + \phi\,\|W\|$

where L(W) is the multitask loss function and φ is the regularization parameter.
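The following is a sketch of the training loss in step (5). It assumes the standard Fast R-CNN forms for the two terms, cross-entropy for classification and smooth-L1 for box regression, and weights the per-scale terms by the pedestrian counts A_1, ..., A_n; both of these choices are our reading of the description above rather than details fixed by the patent.

```python
# Sketch only: the smooth-L1 regression term and the proportional per-scale
# weighting are assumptions consistent with, but not confirmed by, the text.
import torch
import torch.nn.functional as F

def joint_loss(cls_logits, labels, box_pred, box_target, beta=1.0):
    """l = L_cls(p(M), y) + beta * [y >= 1] * L_loc(T^y, B)."""
    cls_loss = F.cross_entropy(cls_logits, labels)
    positive = labels >= 1                      # regression loss only for positives
    if positive.any():
        loc_loss = F.smooth_l1_loss(box_pred[positive], box_target[positive])
    else:
        loc_loss = box_pred.sum() * 0.0
    return cls_loss + beta * loc_loss

def multiscale_loss(per_scale_batches, A):
    """L(W) = sum_x  A_x / sum(A) * (average joint loss over the samples of scale x)."""
    total_A = sum(A)
    loss = 0.0
    for (cls_logits, labels, box_pred, box_target), A_x in zip(per_scale_batches, A):
        loss = loss + (A_x / total_A) * joint_loss(cls_logits, labels,
                                                   box_pred, box_target)
    return loss
```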
Further, the step (6) is specifically as follows: the J proposal windows from step (4) are aggregated, passed through the region-of-interest pooling layer and the fully connected layers to obtain fixed-size feature maps, classified by the classifier trained in step (5), and the windows that overlap the maximum-confidence window by more than 65% are removed by non-maximum suppression.
The invention discloses a pedestrian detection method based on a convolutional neural network in which the feature maps corresponding to scales adjacent to the original image are calculated through the image feature pyramid rule, avoiding the heavy computation of obtaining feature maps by rescaling the image, while detecting on different feature maps with different weights effectively reduces the misjudgment and missed detection of single-feature-map detection. An effective trade-off between pedestrian detection speed and accuracy is achieved.
Drawings
FIG. 1 is a flow chart of a convolutional neural network-based pedestrian detection method of the present invention;
FIG. 2 is a diagram of the candidate window (proposal) optimization selection algorithm of the pedestrian detection method of the present invention;
FIG. 3 is a schematic diagram of a non-maxima suppression implementation;
FIG. 4 is an effect diagram of the pedestrian detection method of the present application on a KITTI data set picture.
Detailed Description
The invention will be further explained with reference to the drawings.
As shown in FIG. 1, the pedestrian detection method based on a convolutional neural network of the present invention includes the following steps:
step (1), determining the current frame image: taking a picture from the test set as the current frame image, or taking a frame to be processed from a video sequence as the current frame image;
step (2), obtaining the feature map: passing the current frame image through several convolution layers and pooling layers, and obtaining a feature map from the last convolution layer;
step (3), feature map expansion: calculating the feature maps corresponding to scales close to the image through the image power-law rule and the image feature pyramid rule, where the scales of the expanded pictures and the number of expansions are not limited;
step (4), proposal window allocation: selecting a suitable pedestrian data set, or building a pedestrian data set with variable scales, dividing the targets in the picture into small, medium and large scales (the number of scale levels is determined by the pedestrian scales of the data set), and allocating the number of proposal windows according to the proportion of targets of each scale in the picture;
step (5), training the classification network: training a deep neural network using the distribution of pedestrians of various scales across the different feature maps;
step (6), pedestrian detection and labeling: aggregating, in proportion, the numbers of candidate windows from the obtained feature maps of the three scales, classifying them with the classifier trained in step (5), and framing the pedestrians after non-maximum suppression.
The step (1) of determining the current frame image is as follows: taking a picture from the test set as the current frame image, or taking a frame to be processed from a video sequence as the current frame image, and recording the current frame image as I_1.
The step (2) of obtaining the feature map is as follows: passing the current frame image through multiple convolution layers and pooling layers, where convolution and pooling layers alternate without limit on their number, obtaining a feature map from the last convolution layer, and denoting it f_1.
The step (3) of expanding the feature map is as follows: the feature maps corresponding to scales close to image I_1 are calculated through the image power-law rule and the image feature pyramid rule. In general one would use f_m = C_p(S(I_1, M)), where I_1 denotes the original image, M the zoom size, S the scaling of the image, and C_p the feature computation by the convolution and pooling operations. To reduce convolution operations and increase running speed, the following formula is used instead:

$f_{m'} = S(f_m,\, m'/m)\cdot(m'/m)^{-\alpha}$   (3.1)
where the parameter m represents the current scale, m' the scaled scale, S denotes scaling the feature map by m'/m times, f denotes the feature, and the constant coefficient α can be measured experimentally on the training set. The formula shows that the original image I_m has its features computed by convolution and pooling, while the features of nearby zoomed images are approximated from the known feature map; for example, f_{1/2} can be calculated for 1/2·I_1. Because image upsampling incurs no high-frequency loss, the information content of the upsampled picture is similar to that of the low-resolution image, and the feature calculation formula is:

$f_\sigma = \sigma \cdot S(f_1, \sigma)$   (3.2)

where f_1 denotes the feature map of the original image, S denotes magnifying the feature map f_1 by a factor of σ, and f_σ is the feature map of the up-sampled image.
The step (4) of allocating the proposal windows is as follows: because the RPN has a single receptive field, large targets tend to be detected on the feature map corresponding to the reduced-scale image and small targets on the feature map corresponding to the enlarged-scale image. The targets in the picture are divided into three scales, and experiments are carried out on the KITTI data set, which contains pedestrians at multiple scales. According to their heights, the pedestrians in the data set are grouped into the three scales height < H_1, H_1 ≤ height < H_2, height ≥ H_2, where H_1 and H_2 are 50 and 200 pixels respectively, and the numbers of pedestrians at the different scales are A_1, A_2, A_3. Then, on each feature map, Z_K·A_1/(A_1+A_2+A_3) candidate windows with pedestrian height below 50 pixels are selected by confidence, Z_K·A_2/(A_1+A_2+A_3) candidate windows with pedestrian height between 50 and 200 pixels are selected by confidence, and Z_K·A_3/(A_1+A_2+A_3) candidate windows with pedestrian height above 200 pixels are selected by confidence, where A_1, A_2, A_3 denote the numbers of pedestrian candidate windows extracted at the three scales, K = 1, 2, 3 denote the reduced feature map, the original feature map and the enlarged feature map respectively, and Z_K is the number of candidate windows to be extracted from each feature map. As shown in FIG. 2, f_1, f_{1/2}, f_2 denote the feature map of image I_1 obtained from the last convolution layer and the feature maps obtained by the expansion calculation, and the numbers of candidate windows selected from them are Z_1, Z_2, Z_3 respectively. Because pedestrian detection on the different feature maps is biased toward different scales, allocating the candidate windows in proportion to the different target scales helps to exploit the network's detection advantages on the different feature maps.
The step of training the classification network in the step (5) comprises the following steps:
1) A KITTI data set containing pedestrians of various scales is selected for the experiment, and on the training data set the pedestrians are divided into X scales according to height (the number of scale levels is not limited);
2) The RPN (Region Proposal Network) and the softmax classifier network are trained jointly with shared convolutional features, using an alternating training scheme: the region proposal network is trained first, then the region-based classifier network is trained with the proposals, and finally the region proposal network is trained again using the classifier network. The loss layer is the end point of the convolutional neural network (CNN); it accepts two values as input, one being the CNN prediction and the other the ground-truth label. From these two inputs the loss layer computes the loss function of the current network, generally denoted L(W), where W denotes the current network weights. The purpose of training is to find, in the weight space, the weights W(opt) that minimize the loss function L(W); W(opt) can be approximated by stochastic gradient descent. The network has two loss functions, a classification loss and a regression loss;
3) Because the structure in step (3) is changed, the loss function is adapted accordingly; the parameter to be trained and optimized is W, and the training set is defined as

$\{(M_i, y_i, B_i)\}_{i=1}^{N}$

where M_i is an image block of interest sampled from the training set, N is the total number of training samples, y_i ∈ {0, 1} is the class label of M_i, and B_i = (m'/m)·(b_i^x, b_i^y, b_i^w, b_i^h) are the bounding-box coordinates mapped to the feature map, where b_i^x, b_i^y, b_i^w, b_i^h are the coordinates of the image block on the original image and (m'/m) is the scaling factor explained in step (3);
4) The multitask loss function is thus:

$L(W) = \sum_{x=1}^{n} \frac{A_x}{\sum_{j=1}^{n} A_j}\; \mathbb{E}_{i \in E_x}\big[\, l(M_i, (y_i, B_i) \mid W)\,\big]$

where n is the number of scale levels of the target size, E_x denotes the data samples of scale x, M_i is an image block of interest sampled from the training set, A_1, A_2, ..., A_n are the numbers of pedestrians at the n scales, and l is the joint loss function of classification and regression, defined as:
$l(M, (y, B) \mid W) = L_{cls}(p(M), y) + \beta[y \ge 1]\, L_{loc}(T^y, B)$   (3.4)

where β is a trade-off coefficient, T^y is the predicted box position for class y, [y ≥ 1] indicates that the regression loss applies only to positive samples, and L_cls and L_loc are the cross-entropy loss and the bounding-box regression loss respectively, defined as:
$L_{cls}(p(M), y) = -\log p_y(M), \qquad L_{loc}(T^y, B) = \sum_{j \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}\!\big(T_j^y - B_j\big)$

where p_y(M) is the predicted probability of class y, with p(M) = (p_0(M), p_1(M)); y ∈ {0, 1} is the class label of M; T_i^y = (t_i^x, t_i^y, t_i^w, t_i^h) is the predicted box position; and B_i = (m'/m)·(b_i^x, b_i^y, b_i^w, b_i^h) are the bounding-box coordinates corresponding to the feature map.
5) In step 4), the prediction probability p and the predicted box T are obtained by multiplying the feature vector of each proposal with its respective weight vector, so the joint parameters of the classification and regression processes can be continuously adjusted according to the predicted values and the labels through the above formula until the loss function L(W) is minimized and the jointly optimal parameters W(w_cls, w_loc) are obtained, i.e. W(w_cls, w_loc) = argmin_W L(W) + φ‖W‖, where L(W) is the multitask loss function and φ is the regularization parameter.
The step (6) of pedestrian detection and labeling is as follows: the J proposal windows from step (4) are aggregated, passed through the region-of-interest pooling layer and the fully connected layers so that the input feature map has a fixed size, and classified by the trained classifier to obtain the confidence of each pedestrian candidate; the candidates of each scale are multiplied by the corresponding weight l_x obtained in the training of step (5). Non-maximum suppression then removes windows that overlap the maximum-confidence window by more than 65%. As shown in FIG. 3, S_1 and S_3 denote the areas of two detection boxes and S_2 their overlap area; the intersection-over-union ratio is S_2/(S_1+S_3-S_2), and if it is greater than the threshold 0.65 the box with the lower confidence is discarded. FIG. 4 shows the effect of the pedestrian detection method of the present application on a picture from the KITTI data set; it can be seen that many pedestrians less than 50 pixels tall are detected, which demonstrates the feasibility and detection advantages of the pedestrian detection method.
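A minimal sketch of the non-maximum suppression described above follows, using the overlap ratio S_2/(S_1+S_3-S_2) of FIG. 3 and the 0.65 threshold. The box coordinates in the example are hypothetical.

```python
# Sketch of greedy non-maximum suppression: boxes are processed in descending
# confidence order, and any box whose overlap ratio S2 / (S1 + S3 - S2) with a
# kept box exceeds 0.65 is discarded.
def iou(a, b):
    """Boxes as (x1, y1, x2, y2); returns S2 / (S1 + S3 - S2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    s2 = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)          # overlap area
    s1 = (a[2] - a[0]) * (a[3] - a[1])
    s3 = (b[2] - b[0]) * (b[3] - b[1])
    return s2 / (s1 + s3 - s2)

def nms(boxes, scores, iou_thresh=0.65):
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep

# Example: two overlapping detections of the same pedestrian plus one separate box.
boxes  = [(100, 80, 150, 220), (105, 85, 155, 225), (300, 90, 340, 200)]
scores = [0.92, 0.88, 0.81]
print(nms(boxes, scores))   # -> [0, 2]; the second box overlaps box 0 by > 65%
```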

Claims (3)

1. A pedestrian detection method, characterized by comprising the steps of:
step (1), determining the current frame image: taking a picture from the test set as the current frame image, or taking a frame to be processed from a video sequence as the current frame image, and recording the current frame image as I_1;
step (2), calculating the feature map: passing the current frame image through a plurality of convolution layers and pooling layers, where convolution and pooling layers alternate and the number of layers is not limited, and obtaining a feature map from the last convolution layer, denoted f_1;
step (3), feature map expansion: calculating the feature maps corresponding to adjacent scales of the image through the image feature pyramid rule, expanding in turn N small-scale feature maps and N large-scale feature maps, where the number of expansions N and the expansion factors are not limited, so that 2N+1 feature maps are obtained;
step (4), candidate window extraction: generating candidate windows from the feature maps through a region proposal network (RPN), and further selecting the candidate windows according to the pedestrian size distribution;
step (5), training the classifier: training a deep neural network using the distribution of pedestrians of various scales across the different feature maps;
step (6), pedestrian detection output: aggregating the candidate windows obtained from the multi-scale feature maps, classifying them with the trained classifier, and framing the pedestrians after non-maximum suppression;
the step (3) is specifically as follows: computing an image I1Feature maps corresponding to close-in scales, using fm=Cp(S(I1,M)),
In the formula I1Representing the original image, M the zoom size, S the zoom of the image, CpRepresenting the calculation characteristics of convolution pooling operation, improving the calculation speed for reducing the convolution operation, calculating a characteristic graph corresponding to the proximity gauge image according to the image characteristic pyramid rule, wherein the calculation formula is as follows:
Figure FDA0003247919020000011
where the parameter m represents the current scale, m' the scaled scale, S denotes scaling the feature map by m'/m times, f denotes the feature, and the constant coefficient α can be measured experimentally on the training set; the formula above shows that the original image I_m has its features computed by convolution and pooling, while the features of images at nearby scales are approximated from the known feature map, so f_{1/2} can be calculated for 1/2·I_1; because image upsampling incurs no high-frequency loss, the information content of the upsampled picture is similar to that of the low-resolution image, and the feature calculation formula is:

$f_\sigma = \sigma \cdot S(f_1, \sigma)$   (1.2)

where f_1 denotes the feature map of the original image, S denotes magnifying the feature map f_1 by a factor of σ, and f_σ is the feature map of the up-sampled image;
the step (4) is specifically as follows: respectively generating candidate proposing windows by the characteristic graph through an RPN network, and setting the pedestrian scale as height according to the height of the pedestrian in the candidate window<H1,H1≤height<H2,...,Hn-1≤height<Hn,height≥HnHere H1To HnThe number of the pixel points from small to large is A corresponding to the number of pedestrians with different scales respectively1,A2,...,An(ii) a Then, selecting T for the pedestrian candidate frame of each scale on each feature map according to the proportion distribution of the candidate frames in the feature mapuvSequentially selecting the number TuvThe candidate window is:
Figure FDA0003247919020000021
in the formula TuvIs the number of the pedestrians with the v scale on the u characteristic diagram which needs to be extracted finally, ZuIs the sum of the candidate windows to be extracted finally on the u-th feature map, N is more than or equal to 1 and less than or equal to 2N +1, and the same number of candidate windows or different numbers of candidate windows can be selected from each feature map according to the data set condition, wherein A isuvThe number of the pedestrians in the v scale on the u feature map is shown.
2. The pedestrian detection method according to claim 1, characterized in that: the step (5) is specifically as follows:
1) selecting a KITTI data set with various pedestrian scales for experiment, and dividing the pedestrians into n scales of pedestrians according to the height on a training data set;
2) training a deep neural network using the training set of the KITTI data set, wherein the loss layer of the convolutional neural network (CNN) receives two values as input, one being the predicted value of the CNN and the other the real label; the loss layer performs a series of operations on the predicted value and the label value to obtain the loss function of the current network, denoted L(W), where W denotes the current network weights;
3) the loss function is adapted accordingly; the parameter to be trained and optimized is W, and the training set is defined as

$\{(M_i, y_i, B_i)\}_{i=1}^{N}$

where M_i is the image block of interest sampled from the training set, N is the total number of training samples, y_i ∈ {0, 1} is the class label of M_i, and B_i = (m'/m)·(b_i^x, b_i^y, b_i^w, b_i^h) are the bounding-box coordinates corresponding to the feature map, where b_i^x, b_i^y, b_i^w, b_i^h are the coordinates of the image block on the original image, (m'/m) is the scaling factor, m is the current scale and m' the scaled scale;
4) the multitask loss function is:

$L(W) = \sum_{x=1}^{n} \frac{A_x}{\sum_{j=1}^{n} A_j}\; \mathbb{E}_{i \in E_x}\big[\, l_x(M_i, (y_i, B_i) \mid W)\,\big]$

where n is the number of scale levels of the target size, E_x denotes the data samples of scale x, M_i is an image block of interest sampled from the training set, A_1, A_2, ..., A_n are the numbers of pedestrians at the n scales, and l_x is the joint loss function of classification and regression corresponding to each scale, defined as:
$l(M, (y, B) \mid W) = L_{cls}(p(M), y) + \beta[y \ge 1]\, L_{loc}(T^y, B)$   (1.5)

where β is a trade-off coefficient, T^y is the predicted box position for class y, [y ≥ 1] indicates that the regression loss applies only to positive samples, and L_cls and L_loc are the cross-entropy loss and the bounding-box regression loss respectively, defined as:
$L_{cls}(p(M), y) = -\log p_y(M), \qquad L_{loc}(T^y, B) = \sum_{j \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}\!\big(T_j^y - B_j\big)$

where p_y(M) is the predicted probability of class y, with p(M) = (p_0(M), p_1(M)); y ∈ {0, 1} is the class label of M; T_i^y = (t_i^x, t_i^y, t_i^w, t_i^h) is the predicted box position; and B_i = (m'/m)·(b_i^x, b_i^y, b_i^w, b_i^h) are the bounding-box coordinates corresponding to the feature map;
5) because the prediction probability p and the predicted box T in step 4) are each obtained by multiplying the feature vector of a proposal with its respective weight vector, the joint parameters of the classification and regression processes can be continuously adjusted according to the predicted values and the labels through the above formula, so that the loss function L(W) is minimized and the jointly optimal parameters W(w_cls, w_loc) are obtained, i.e. W(w_cls, w_loc) = argmin_W L(W) + φ‖W‖, where L(W) is the multitask loss function and φ is the regularization parameter.
3. The pedestrian detection method according to claim 1, characterized in that the step (6) is specifically as follows: the J proposal windows from step (4) are aggregated, the input feature map is brought to a fixed size through the fully connected layer, and the trained classifier is used for classification to obtain the confidence of each pedestrian candidate; a candidate is judged to be a pedestrian if the result is greater than 0.75, and the framed pedestrians then pass through non-maximum suppression, which removes windows overlapping the maximum-confidence window by more than 65%.
CN201711030102.8A 2017-10-27 2017-10-27 Pedestrian detection method Active CN108038409B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711030102.8A CN108038409B (en) 2017-10-27 2017-10-27 Pedestrian detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711030102.8A CN108038409B (en) 2017-10-27 2017-10-27 Pedestrian detection method

Publications (2)

Publication Number Publication Date
CN108038409A CN108038409A (en) 2018-05-15
CN108038409B true CN108038409B (en) 2021-12-28

Family

ID=62093419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711030102.8A Active CN108038409B (en) 2017-10-27 2017-10-27 Pedestrian detection method

Country Status (1)

Country Link
CN (1) CN108038409B (en)

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110647897B (en) * 2018-06-26 2023-04-18 广东工业大学 Zero sample image classification and identification method based on multi-part attention mechanism
CN109117717A (en) * 2018-06-29 2019-01-01 广州烽火众智数字技术有限公司 A kind of city pedestrian detection method
CN110659658B (en) * 2018-06-29 2022-07-29 杭州海康威视数字技术股份有限公司 Target detection method and device
CN109003223B (en) * 2018-07-13 2020-02-28 北京字节跳动网络技术有限公司 Picture processing method and device
CN109284670B (en) * 2018-08-01 2020-09-25 清华大学 Pedestrian detection method and device based on multi-scale attention mechanism
CN109101915B (en) * 2018-08-01 2021-04-27 中国计量大学 Face, pedestrian and attribute recognition network structure design method based on deep learning
CN109284669A (en) * 2018-08-01 2019-01-29 辽宁工业大学 Pedestrian detection method based on Mask RCNN
CN109255352B (en) * 2018-09-07 2021-06-22 北京旷视科技有限公司 Target detection method, device and system
CN109242801B (en) * 2018-09-26 2021-07-02 北京字节跳动网络技术有限公司 Image processing method and device
CN109492596B (en) * 2018-11-19 2022-03-29 南京信息工程大学 Pedestrian detection method and system based on K-means clustering and regional recommendation network
CN109658412B (en) * 2018-11-30 2021-03-30 湖南视比特机器人有限公司 Rapid packaging box identification and segmentation method for unstacking and sorting
CN109800637A (en) * 2018-12-14 2019-05-24 中国科学院深圳先进技术研究院 A kind of remote sensing image small target detecting method
CN109829421B (en) * 2019-01-29 2020-09-08 西安邮电大学 Method and device for vehicle detection and computer readable storage medium
CN109858451B (en) * 2019-02-14 2020-10-23 清华大学深圳研究生院 Non-matching hand detection method
CN110059544B (en) * 2019-03-07 2021-03-26 华中科技大学 Pedestrian detection method and system based on road scene
CN110097050B (en) * 2019-04-03 2024-03-08 平安科技(深圳)有限公司 Pedestrian detection method, device, computer equipment and storage medium
CN110136097A (en) * 2019-04-10 2019-08-16 南方电网科学研究院有限责任公司 One kind being based on the pyramidal insulator breakdown recognition methods of feature and device
CN110211097B (en) * 2019-05-14 2021-06-08 河海大学 Crack image detection method based on fast R-CNN parameter migration
CN110263712B (en) * 2019-06-20 2021-02-23 江南大学 Coarse and fine pedestrian detection method based on region candidates
CN110490058B (en) * 2019-07-09 2022-07-26 北京迈格威科技有限公司 Training method, device and system of pedestrian detection model and computer readable medium
CN112307826A (en) * 2019-07-30 2021-02-02 华为技术有限公司 Pedestrian detection method, device, computer-readable storage medium and chip
CN110443366B (en) * 2019-07-30 2022-08-30 上海商汤智能科技有限公司 Neural network optimization method and device, and target detection method and device
CN110648322B (en) * 2019-09-25 2023-08-15 杭州智团信息技术有限公司 Cervical abnormal cell detection method and system
CN111339967B (en) * 2020-02-28 2023-04-07 长安大学 Pedestrian detection method based on multi-view graph convolution network
CN111523494A (en) * 2020-04-27 2020-08-11 天津中科智能识别产业技术研究院有限公司 Human body image detection method
CN111832383B (en) * 2020-05-08 2023-12-08 北京嘀嘀无限科技发展有限公司 Training method of gesture key point recognition model, gesture recognition method and device
CN111681243B (en) * 2020-08-17 2021-02-26 广东利元亨智能装备股份有限公司 Welding image processing method and device and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750708A (en) * 2012-05-11 2012-10-24 天津大学 Affine motion target tracing algorithm based on fast robust feature matching
CN103247059A (en) * 2013-05-27 2013-08-14 北京师范大学 Remote sensing image region of interest detection method based on integer wavelets and visual features
CN104850844A (en) * 2015-05-27 2015-08-19 成都新舟锐视科技有限公司 Pedestrian detection method based on rapid construction of image characteristic pyramid
CN105678231A (en) * 2015-12-30 2016-06-15 中通服公众信息产业股份有限公司 Pedestrian image detection method based on sparse coding and neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016054779A1 (en) * 2014-10-09 2016-04-14 Microsoft Technology Licensing, Llc Spatial pyramid pooling networks for image processing

Also Published As

Publication number Publication date
CN108038409A (en) 2018-05-15

Similar Documents

Publication Publication Date Title
CN108038409B (en) Pedestrian detection method
CN111209810B (en) Boundary frame segmentation supervision deep neural network architecture for accurately detecting pedestrians in real time through visible light and infrared images
WO2019144575A1 (en) Fast pedestrian detection method and device
CN109284670B (en) Pedestrian detection method and device based on multi-scale attention mechanism
CN111027493B (en) Pedestrian detection method based on deep learning multi-network soft fusion
CN112733822B (en) End-to-end text detection and identification method
CN108062525B (en) Deep learning hand detection method based on hand region prediction
CN111179217A (en) Attention mechanism-based remote sensing image multi-scale target detection method
CN107545263B (en) Object detection method and device
CN106951830B (en) Image scene multi-object marking method based on prior condition constraint
WO2021218786A1 (en) Data processing system, object detection method and apparatus thereof
CN111860439A (en) Unmanned aerial vehicle inspection image defect detection method, system and equipment
CN109377511B (en) Moving target tracking method based on sample combination and depth detection network
US10755146B2 (en) Network architecture for generating a labeled overhead image
CN110263877B (en) Scene character detection method
CN112991269A (en) Identification and classification method for lung CT image
CN113139543A (en) Training method of target object detection model, target object detection method and device
Zhao et al. Multiscale object detection in high-resolution remote sensing images via rotation invariant deep features driven by channel attention
CN107767416A (en) Method for recognizing pedestrian's orientation in a low-resolution image
WO2023116632A1 (en) Video instance segmentation method and apparatus based on spatio-temporal memory information
CN106407978B (en) Method for detecting salient object in unconstrained video by combining similarity degree
CN115620081B (en) Training method of target detection model and target detection method and device
CN115147418B (en) Compression training method and device for defect detection model
CN115240119A (en) Pedestrian small target detection method in video monitoring based on deep learning
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant