CN108898047B - Pedestrian detection method and system based on blocking and shielding perception - Google Patents
- Publication number: CN108898047B (application CN201810393658.1A)
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- frame
- preset
- anchor
- anchor point
- Prior art date
- Legal status: Active (as assumed by Google; not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
Abstract
The invention belongs to the technical field of pattern recognition, and particularly relates to a pedestrian detection method and system based on block occlusion perception, aiming to solve the technical problem of low pedestrian detection accuracy caused by pedestrians being occluded. To this end, the pedestrian detection method of the invention includes: acquiring the image features corresponding to each preset human body detection region from a to-be-detected pedestrian image, based on a pre-constructed pedestrian detection model; performing feature fusion on the acquired image features to obtain the overall features of the corresponding pedestrians; acquiring a plurality of detection result frames for the to-be-detected pedestrian image according to the overall features; and selecting, from the obtained detection result frames, those that meet preset screening conditions. Based on these steps, occluded pedestrians in the image to be detected can be effectively detected. The pedestrian detection system of the invention can likewise execute and realize this method.
Description
Technical Field
The invention belongs to the technical field of pattern recognition, and particularly relates to a pedestrian detection method and system based on block occlusion perception.
Background
The pedestrian detection technology is a technology for automatically searching the position and size of a pedestrian in an arbitrary input image, and is widely applied to the fields of computer vision, pattern recognition and the like, such as automatic driving, video monitoring, biometric recognition and the like.
In the complex environments of real life, pedestrian occlusion is one of the biggest challenges currently facing pedestrian detection; especially in crowded scenes, how to perform efficient and accurate pedestrian detection is a hot and difficult research topic. To address this problem, most current pedestrian detection methods use a part-based model: a series of part detectors is learned, and the results of each detector are combined for final pedestrian localization. These methods, however, simply require each predicted detection window to be as close as possible to a pedestrian labeling box, without taking the inherent links between the windows into account. The performance of these pedestrian detectors is therefore very sensitive to the non-maximum suppression (NMS) threshold, which has a large impact on detector performance, especially in large-scale crowded scenes.
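The NMS step referred to above can be sketched as follows. This is a generic greedy non-maximum suppression routine (not the patent's own code), illustrating the threshold sensitivity the text describes: a low overlap threshold suppresses heavily overlapping true pedestrians in a crowd, while a high one keeps duplicate windows.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression. boxes: (N, 4) arrays of [x1, y1, x2, y2]."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]            # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # IoU of the top-scoring box with every remaining box
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]  # drop boxes that overlap too much
    return keep
```

With two pedestrian boxes overlapping at IoU ≈ 0.82, both survive only when the threshold exceeds that value; below it, the lower-scoring (possibly correct) detection is suppressed.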
Disclosure of Invention
In order to solve the above problems in the prior art, that is, to solve the technical problem of low pedestrian detection accuracy caused by pedestrian occlusion, one aspect of the present invention provides a pedestrian detection method based on block occlusion perception, including:
acquiring image characteristics corresponding to each preset human body detection area based on a pre-constructed pedestrian detection model and according to a to-be-detected pedestrian image;
performing feature fusion on the acquired image features to obtain the overall features of the corresponding pedestrians;
acquiring a plurality of detection result frames of the to-be-detected pedestrian image according to the overall characteristics;
selecting a detection result frame which meets a preset screening condition from the obtained detection result frames;
the pedestrian detection model is a model constructed based on a Faster R-CNN neural network, and anchor boxes are associated with a high-level convolutional layer of the Faster R-CNN neural network.
Further, before "based on a pre-constructed pedestrian detection model, and according to a to-be-detected pedestrian image, acquiring image features corresponding to each preset human body detection area", the method further includes:
performing data augmentation processing on the preset training image to obtain training samples;
matching the anchor point frame with a pedestrian marking frame in the training sample, and dividing the anchor point frame into a positive sample and a negative sample according to a matching result; the positive sample is an anchor point frame matched with the pedestrian marking frame, and the negative sample is an anchor point frame not matched with the pedestrian marking frame;
selecting a preset first number of negative samples by adopting a difficult negative sample mining method;
calculating a loss function value according to the positive samples and the selected negative samples, and updating the Faster R-CNN neural network according to the loss function value; the updated Faster R-CNN neural network is then trained again until it meets the preset convergence condition.
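The difficult (hard) negative mining step above can be sketched as follows. This is a minimal illustration under the common definition of hard negatives; the function name and the log-loss scoring are assumptions, not the patent's code.

```python
import numpy as np

def mine_hard_negatives(neg_scores, num_neg):
    """Keep the negative anchor boxes the current network most confidently
    mistakes for pedestrians, i.e. those with the highest classification
    (log) loss for the background label."""
    losses = -np.log(np.clip(1.0 - neg_scores, 1e-12, 1.0))  # log loss of label 0
    hardest_first = np.argsort(losses)[::-1]
    return hardest_first[:num_neg]
```

The selected negatives then enter the loss computation together with all positives, after which the network is updated (e.g. by a gradient descent step) and training repeats until convergence.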
Further, the Faster R-CNN neural network comprises an RPN module; before "based on a pre-constructed pedestrian detection model, and according to a to-be-detected pedestrian image, acquiring image features corresponding to each preset human body detection area", the method further includes:
based on a preset training image, network training is performed on the RPN module according to the loss function shown in the following formula:

$$\mathcal{L}_{RPN}(\{p_i\},\{t_i\}) = \mathcal{L}_{cls}(\{p_i,p_i^*\}) + \alpha_1\,\mathcal{L}_{agg}(\{p_i,t_i,t_i^*\})$$

wherein $\mathcal{L}_{cls}$ is the pedestrian classification loss function and $\mathcal{L}_{agg}$ is the aggregation loss function; $i$ denotes the anchor box index; $p_i$ and $t_i$ respectively denote the predicted probability that the $i$-th anchor box is a pedestrian and the corresponding predicted pedestrian coordinates; $p_i^*$ and $t_i^*$ respectively denote the object class label associated with the $i$-th anchor box and the corresponding calibrated coordinates; $\alpha_1$ is a first hyperparameter.

The pedestrian classification loss function is:

$$\mathcal{L}_{cls}(\{p_i,p_i^*\}) = \frac{1}{N_{cls}}\sum_i \mathcal{L}_{log}(p_i,p_i^*)$$

wherein $N_{cls}$ is the total number of anchor boxes in the classification stage of the RPN module and $\mathcal{L}_{log}$ is the log loss.

The aggregation loss function is:

$$\mathcal{L}_{agg}(\{p_i,t_i,t_i^*\}) = \mathcal{L}_{reg}(\{p_i,t_i,t_i^*\}) + \beta\,\mathcal{L}_{com}(\{t_i\})$$

wherein $\mathcal{L}_{reg}$ is the regression loss function, $\mathcal{L}_{com}$ is the compactness loss function, and $\beta$ is a second hyperparameter.

The regression loss function is:

$$\mathcal{L}_{reg}(\{p_i,t_i,t_i^*\}) = \frac{1}{N_{reg}}\sum_i p_i^*\,\mathrm{smooth}_{L_1}(t_i - t_i^*)$$

wherein $N_{reg}$ is the total number of anchor boxes in the regression stage and $\mathrm{smooth}_{L_1}$ is the smooth $L_1$ loss value of the predicted detection window $t_i$.

The compactness loss function is:

$$\mathcal{L}_{com}(\{t_i\}) = \frac{1}{N_{com}}\sum_{p=1}^{N_{com}} \mathrm{smooth}_{L_1}\Big(t_p^* - \frac{1}{|\Phi_p|}\sum_{j\in\Phi_p} t_j\Big)$$

wherein $N_{com}$ is the total number of calibrated pedestrians that intersect anchor boxes; $p$ is the index of a calibrated pedestrian window; $\Phi_p$ is the set of anchor boxes associated with the $p$-th calibrated pedestrian window, and $|\Phi_p|$ is the total number of anchors associated with that pedestrian; $j$ is the anchor box index and $t_j$ is the predicted pedestrian coordinates of the $j$-th anchor box.
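The compactness term of the aggregation loss can be sketched in NumPy as follows: it penalizes, for each labeled pedestrian, the deviation of that pedestrian's calibrated box from the mean prediction of its associated anchors. The function names and array layout are illustrative assumptions, not the patent's code.

```python
import numpy as np

def smooth_l1(x):
    """Element-wise smooth L1 loss."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x * x, ax - 0.5)

def compactness_loss(gt_boxes, pred_boxes, assoc):
    """Compactness term of the aggregation loss (sketch).
    gt_boxes: (G, 4) calibrated pedestrian boxes.
    pred_boxes: (A, 4) predicted boxes of all anchors.
    assoc[i]: list of anchor indices (Phi_i) associated with pedestrian i."""
    n_com = len(assoc)
    total = 0.0
    for i, phi in enumerate(assoc):
        mean_pred = pred_boxes[phi].mean(axis=0)   # (1/|Phi_i|) * sum_j t_j
        total += smooth_l1(gt_boxes[i] - mean_pred).sum()
    return total / n_com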
Further, the Faster R-CNN neural network also comprises a Fast R-CNN module; before "based on a pre-constructed pedestrian detection model, and according to a to-be-detected pedestrian image, acquiring image features corresponding to each preset human body detection area", the method further includes:
based on a preset training image, network training is performed on the Fast R-CNN module according to the loss function shown in the following formula:

$$\mathcal{L}(\{p_i\},\{t_i\}) = \mathcal{L}_{cls}(\{p_i,p_i^*\}) + \alpha_3\,\mathcal{L}_{agg}(\{p_i,t_i,t_i^*\}) + \lambda\,\mathcal{L}_{occ}$$

wherein $\mathcal{L}_{cls}$ is the pedestrian classification loss function, $\mathcal{L}_{agg}$ is the aggregation loss function, and $\mathcal{L}_{occ}$ is the occlusion handling loss function; $i$ denotes the anchor box index; $p_i$ and $t_i$ respectively denote the predicted probability that the $i$-th anchor box is a pedestrian and the corresponding predicted pedestrian coordinates; $p_i^*$ and $t_i^*$ respectively denote the object class label associated with the $i$-th anchor box and the corresponding calibrated coordinates; $\alpha_3$ is a third hyperparameter and $\lambda$ is a fourth hyperparameter.

The pedestrian classification loss function is:

$$\mathcal{L}_{cls}(\{p_i,p_i^*\}) = \frac{1}{N_{cls}}\sum_i \mathcal{L}_{log}(p_i,p_i^*)$$

wherein $N_{cls}$ is the total number of anchor boxes in the classification stage of this module and $\mathcal{L}_{log}$ is the log loss.

The aggregation loss function is:

$$\mathcal{L}_{agg}(\{p_i,t_i,t_i^*\}) = \mathcal{L}_{reg}(\{p_i,t_i,t_i^*\}) + \beta\,\mathcal{L}_{com}(\{t_i\})$$

wherein $\mathcal{L}_{reg}$ is the regression loss function, $\mathcal{L}_{com}$ is the compactness loss function, and $\beta$ is a second hyperparameter.

The regression loss function is:

$$\mathcal{L}_{reg}(\{p_i,t_i,t_i^*\}) = \frac{1}{N_{reg}}\sum_i p_i^*\,\mathrm{smooth}_{L_1}(t_i - t_i^*)$$

wherein $N_{reg}$ is the total number of anchor boxes in the regression stage and $\mathrm{smooth}_{L_1}$ is the smooth $L_1$ loss value of the predicted detection window $t_i$.

The compactness loss function is:

$$\mathcal{L}_{com}(\{t_i\}) = \frac{1}{N_{com}}\sum_{p=1}^{N_{com}} \mathrm{smooth}_{L_1}\Big(t_p^* - \frac{1}{|\Phi_p|}\sum_{j\in\Phi_p} t_j\Big)$$

wherein $N_{com}$ is the total number of calibrated pedestrians that intersect anchor boxes; $p$ is the index of a calibrated pedestrian window; $\Phi_p$ is the set of anchor boxes associated with the $p$-th calibrated pedestrian window, and $|\Phi_p|$ is the total number of anchors associated with that pedestrian; $j$ is the anchor box index and $t_j$ is the predicted pedestrian coordinates of the $j$-th anchor box.
Further, the step of "matching the anchor boxes with the pedestrian labeling frames in the training samples" specifically includes:
calculating the intersection-over-union (IoU) ratio of each anchor box with each pedestrian labeling frame;
selecting, for each pedestrian labeling frame, the anchor box with the largest IoU, and matching each selected anchor box with its corresponding pedestrian labeling frame;
after removing the selected anchor boxes, judging whether the IoU of each remaining anchor box with each pedestrian labeling frame is greater than a preset first threshold: if so, matching them;
acquiring the pedestrian labeling frames whose number of matched anchor boxes is less than a preset second number, and selecting all anchor boxes whose IoU with each such pedestrian labeling frame is greater than a preset second threshold, the preset first threshold being greater than the preset second threshold;
selecting, in descending order of IoU among all the anchor boxes selected in this way, a preset third number of anchor boxes to match with the corresponding pedestrian labeling frames, the value of the preset third number being the average number of matched anchors of the pedestrian labeling frames whose number of matches is greater than or equal to the preset second number.
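The matching steps above can be sketched as follows. The threshold values and the minimum-match count are assumed placeholders, and "IoU" stands for the intersection-and-superposition (intersection-over-union) ratio used throughout the matching.

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def match_anchors(anchors, gts, t1=0.5, t2=0.35, min_matches=2):
    """Three-stage anchor matching sketch; t1 > t2 as in the text."""
    n_a, n_g = len(anchors), len(gts)
    M = [[iou(a, g) for g in gts] for a in anchors]
    matched = [[] for _ in range(n_g)]
    used = set()
    # Stage 1: the highest-IoU anchor of each labeled pedestrian always matches.
    for g in range(n_g):
        a = max(range(n_a), key=lambda a: M[a][g])
        matched[g].append(a); used.add(a)
    # Stage 2: remaining anchors match the box they overlap most if IoU > t1.
    for a in range(n_a):
        if a in used:
            continue
        g = max(range(n_g), key=lambda g: M[a][g])
        if M[a][g] > t1:
            matched[g].append(a); used.add(a)
    # Stage 3 (compensation): for poorly matched pedestrians, take the best
    # unused anchors with IoU > t2; the quota n3 is the average match count
    # of the well-matched pedestrians.
    well = [len(m) for m in matched if len(m) >= min_matches]
    n3 = max(1, round(sum(well) / len(well))) if well else 1
    for g in range(n_g):
        if len(matched[g]) >= min_matches:
            continue
        cands = [a for a in range(n_a) if a not in used and M[a][g] > t2]
        cands.sort(key=lambda a: M[a][g], reverse=True)
        for a in cands[:n3]:
            matched[g].append(a); used.add(a)
    return matched
```

The compensation stage ensures that heavily occluded pedestrians, which rarely exceed the first threshold, still receive some positive anchors.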
In another aspect of the present invention, a pedestrian detection system based on block occlusion perception is further provided, including:
the image characteristic acquisition module is configured to acquire image characteristics corresponding to each preset human body detection area based on a pre-constructed pedestrian detection model and according to a to-be-detected pedestrian image;
the image feature fusion module is configured to perform feature fusion on the image features acquired by the image feature acquisition module to obtain the overall features of the corresponding pedestrians;
the detection result frame acquisition module is configured to acquire a plurality of detection result frames of the to-be-detected pedestrian image according to the overall characteristics obtained by the image characteristic fusion module;
a detection result frame screening module configured to select a detection result frame satisfying a preset screening condition among the plurality of obtained detection result frames;
the pedestrian detection model is a model constructed based on a Faster R-CNN neural network, and anchor boxes are associated with a high-level convolutional layer of the Faster R-CNN neural network.
Further, the pedestrian detection system further comprises a model training module, the model training module comprising:
the training image processing unit is configured to perform data augmentation processing on the preset training image to obtain training samples;
the positive and negative sample dividing unit is configured to match the anchor point frame with a pedestrian marking frame in the training sample and divide the anchor point frame into a positive sample and a negative sample according to a matching result; the positive sample is an anchor point frame matched with the pedestrian marking frame, and the negative sample is an anchor point frame not matched with the pedestrian marking frame;
the negative sample screening unit is configured to select a preset first number of negative samples by adopting a difficult negative sample mining method;
a network updating unit configured to calculate a loss function value according to the positive samples and the selected negative samples, and to update the Faster R-CNN neural network according to the loss function value; the updated Faster R-CNN neural network is then trained again until it meets the preset convergence condition.
Further, the Faster R-CNN neural network comprises an RPN module; in this case, the model training module is further configured to perform the following operations:
based on a preset training image, network training is performed on the RPN module according to the loss function shown in the following formula:

$$\mathcal{L}_{RPN}(\{p_i\},\{t_i\}) = \mathcal{L}_{cls}(\{p_i,p_i^*\}) + \alpha_1\,\mathcal{L}_{agg}(\{p_i,t_i,t_i^*\})$$

wherein $\mathcal{L}_{cls}$ is the pedestrian classification loss function and $\mathcal{L}_{agg}$ is the aggregation loss function; $i$ denotes the anchor box index; $p_i$ and $t_i$ respectively denote the predicted probability that the $i$-th anchor box is a pedestrian and the corresponding predicted pedestrian coordinates; $p_i^*$ and $t_i^*$ respectively denote the object class label associated with the $i$-th anchor box and the corresponding calibrated coordinates; $\alpha_1$ is a first hyperparameter.

The pedestrian classification loss function is:

$$\mathcal{L}_{cls}(\{p_i,p_i^*\}) = \frac{1}{N_{cls}}\sum_i \mathcal{L}_{log}(p_i,p_i^*)$$

wherein $N_{cls}$ is the total number of anchor boxes in the classification stage of the RPN module and $\mathcal{L}_{log}$ is the log loss.

The aggregation loss function is:

$$\mathcal{L}_{agg}(\{p_i,t_i,t_i^*\}) = \mathcal{L}_{reg}(\{p_i,t_i,t_i^*\}) + \beta\,\mathcal{L}_{com}(\{t_i\})$$

wherein $\mathcal{L}_{reg}$ is the regression loss function, $\mathcal{L}_{com}$ is the compactness loss function, and $\beta$ is a second hyperparameter.

The regression loss function is:

$$\mathcal{L}_{reg}(\{p_i,t_i,t_i^*\}) = \frac{1}{N_{reg}}\sum_i p_i^*\,\mathrm{smooth}_{L_1}(t_i - t_i^*)$$

wherein $N_{reg}$ is the total number of anchor boxes in the regression stage and $\mathrm{smooth}_{L_1}$ is the smooth $L_1$ loss value of the predicted detection window $t_i$.

The compactness loss function is:

$$\mathcal{L}_{com}(\{t_i\}) = \frac{1}{N_{com}}\sum_{p=1}^{N_{com}} \mathrm{smooth}_{L_1}\Big(t_p^* - \frac{1}{|\Phi_p|}\sum_{j\in\Phi_p} t_j\Big)$$

wherein $N_{com}$ is the total number of calibrated pedestrians that intersect anchor boxes; $p$ is the index of a calibrated pedestrian window; $\Phi_p$ is the set of anchor boxes associated with the $p$-th calibrated pedestrian window, and $|\Phi_p|$ is the total number of anchors associated with that pedestrian; $j$ is the anchor box index and $t_j$ is the predicted pedestrian coordinates of the $j$-th anchor box.
Further, the Faster R-CNN neural network comprises a Fast R-CNN module; in this case, the model training module is further configured to perform the following operations:
based on a preset training image, network training is performed on the Fast R-CNN module according to the loss function shown in the following formula:

$$\mathcal{L}(\{p_i\},\{t_i\}) = \mathcal{L}_{cls}(\{p_i,p_i^*\}) + \alpha_3\,\mathcal{L}_{agg}(\{p_i,t_i,t_i^*\}) + \lambda\,\mathcal{L}_{occ}$$

wherein $\mathcal{L}_{cls}$ is the pedestrian classification loss function, $\mathcal{L}_{agg}$ is the aggregation loss function, and $\mathcal{L}_{occ}$ is the occlusion handling loss function; $i$ denotes the anchor box index; $p_i$ and $t_i$ respectively denote the predicted probability that the $i$-th anchor box is a pedestrian and the corresponding predicted pedestrian coordinates; $p_i^*$ and $t_i^*$ respectively denote the object class label associated with the $i$-th anchor box and the corresponding calibrated coordinates; $\alpha_3$ is a third hyperparameter and $\lambda$ is a fourth hyperparameter.

The pedestrian classification loss function is:

$$\mathcal{L}_{cls}(\{p_i,p_i^*\}) = \frac{1}{N_{cls}}\sum_i \mathcal{L}_{log}(p_i,p_i^*)$$

wherein $N_{cls}$ is the total number of anchor boxes in the classification stage of this module and $\mathcal{L}_{log}$ is the log loss.

The aggregation loss function is:

$$\mathcal{L}_{agg}(\{p_i,t_i,t_i^*\}) = \mathcal{L}_{reg}(\{p_i,t_i,t_i^*\}) + \beta\,\mathcal{L}_{com}(\{t_i\})$$

wherein $\mathcal{L}_{reg}$ is the regression loss function, $\mathcal{L}_{com}$ is the compactness loss function, and $\beta$ is a second hyperparameter.

The regression loss function is:

$$\mathcal{L}_{reg}(\{p_i,t_i,t_i^*\}) = \frac{1}{N_{reg}}\sum_i p_i^*\,\mathrm{smooth}_{L_1}(t_i - t_i^*)$$

wherein $N_{reg}$ is the total number of anchor boxes in the regression stage and $\mathrm{smooth}_{L_1}$ is the smooth $L_1$ loss value of the predicted detection window $t_i$.

The compactness loss function is:

$$\mathcal{L}_{com}(\{t_i\}) = \frac{1}{N_{com}}\sum_{p=1}^{N_{com}} \mathrm{smooth}_{L_1}\Big(t_p^* - \frac{1}{|\Phi_p|}\sum_{j\in\Phi_p} t_j\Big)$$

wherein $N_{com}$ is the total number of calibrated pedestrians that intersect anchor boxes; $p$ is the index of a calibrated pedestrian window; $\Phi_p$ is the set of anchor boxes associated with the $p$-th calibrated pedestrian window, and $|\Phi_p|$ is the total number of anchors associated with that pedestrian; $j$ is the anchor box index and $t_j$ is the predicted pedestrian coordinates of the $j$-th anchor box.
Further, the positive and negative sample division unit includes:
an IoU calculation subunit configured to calculate the intersection-over-union (IoU) ratio of each anchor box with each pedestrian labeling frame;
a first matching subunit configured to select, for each pedestrian labeling frame, the anchor box with the largest IoU, and to match each selected anchor box with its corresponding pedestrian labeling frame;
a second matching subunit configured to judge, after removing the selected anchor boxes, whether the IoU of each remaining anchor box with each pedestrian labeling frame is greater than a preset first threshold: if so, to match them;
a third matching subunit configured to acquire the pedestrian labeling frames whose number of matched anchor boxes is less than a preset second number, and to select all anchor boxes whose IoU with each such pedestrian labeling frame is greater than a preset second threshold, the preset first threshold being greater than the preset second threshold;
a fourth matching subunit configured to select, in descending order of IoU among all the anchor boxes selected in this way, a preset third number of anchor boxes to match with the corresponding pedestrian labeling frames, the value of the preset third number being the average number of matched anchors of the pedestrian labeling frames whose number of matches is greater than or equal to the preset second number.
Compared with the closest prior art, the technical scheme at least has the following beneficial effects:
1. According to the pedestrian detection method based on block occlusion perception provided by the invention, pedestrian image features are acquired block by block for the preset human body detection regions using the pedestrian detection model built on the Faster R-CNN neural network, and the acquired image features are then fused, so that occluded pedestrians in the image to be detected can be effectively detected.
2. The high-level convolutional layer in the pedestrian detection model provided by the invention is associated with the anchor boxes; since the high-level convolutional layer can extract deeper semantic information, pedestrian detection precision is improved.
3. The pedestrian detection system based on the block shielding perception can realize the pedestrian detection method based on the block shielding perception.
Drawings
FIG. 1 is a schematic diagram of main steps of a pedestrian detection method based on block occlusion perception in an embodiment of the present invention;
FIG. 2 is a schematic diagram of a main structure of a block occlusion aware ROI pooling unit according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a main structure of an occlusion processing unit for sensing blocking in an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a pedestrian detection system based on block occlusion perception in an embodiment of the present invention.
Detailed Description
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.
Pedestrians are very easily occluded in large-scale crowded environments, which makes it difficult to detect them effectively during pedestrian detection. To this end, the pedestrian detection method based on block occlusion perception of the invention can detect pedestrians efficiently and accurately in complex environments, and still obtains a satisfactory detection result when large-scale occlusion exists.
The following describes a pedestrian detection method based on block occlusion perception according to the present invention with reference to the accompanying drawings.
Fig. 1 exemplarily shows an implementation flow of a pedestrian detection method based on block occlusion perception in this embodiment, and as shown in fig. 1, the pedestrian detection method based on block occlusion perception in this embodiment may include the following steps:
step S101: and acquiring image characteristics corresponding to each preset human body detection area based on a pre-constructed pedestrian detection model and according to the image of the pedestrian to be detected.
Step S102: and carrying out feature fusion on the acquired image features to obtain the overall features of the corresponding pedestrians.
Step S103: and acquiring a plurality of detection result frames of the pedestrian image to be detected according to the overall characteristics.
Step S104: selecting a detection result frame which meets a preset screening condition from the obtained multiple detection result frames;
specifically, the pedestrian detection model in the present embodiment is a model constructed based on the Faster R-CNN neural network, and an anchor point frame is associated in the high convolution layer of the Faster R-CNN neural network. The dimensions and associated layers of the anchor block of the pedestrian detection model, as well as the basic network framework of the design, are described in detail below.
Regarding the design of the anchor box sizes and the associated layer: feature maps extracted by different convolutional layers differ in the richness of their semantic and spatial information. Under large-scale occlusion, the feature information of a target pedestrian is difficult to extract because of the occlusion, so more semantic information is needed for support; meanwhile, pedestrian detection has no targets as extremely small as those in face detection, so the demand for spatial information is much lower. The features of the shallow layers of the network carry only shallow semantic information, and their small receptive fields make them insufficient for recognizing larger objects; moreover, because the shallow features lack enough semantic information, feature extraction becomes harder under interference such as occlusion, so the classifier's performance degrades greatly and lacks robustness. The deep layers of the network, by contrast, can extract deeper semantic and global information; although some spatial information is lost, the features of the deep convolutional layers can effectively overcome the insufficient feature extraction caused by occlusion in complex environments, especially under occlusion.
Therefore, in this embodiment, the top convolutional layer (i.e., the high-level convolutional layer) is selected to be associated with the anchor boxes. For example, if the VGG-16 model is selected as the basic architecture and the selected high-level convolutional layer is conv5_3, then for a to-be-detected pedestrian image of size 1000 × 600, the corresponding feature map size is 60 × 40. To detect pedestrians of different sizes in the image, anchor boxes of 11 different sizes are densely tiled at each position of the feature map, with areas of 32², 43², 58², 78², 106², 144², 194², 261², 353², 477², and 643² respectively; the width-to-height ratio of all anchor boxes is 0.41 (the approximate scale of a human body), so as to realize pedestrian detection at different sizes in the image.
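The anchor tiling described above can be sketched as follows. The feature-map stride of 16 (typical for VGG-16 conv5_3, and roughly consistent with the 1000 × 600 → 60 × 40 mapping) is an assumption, as is the function name.

```python
import numpy as np

def make_anchors(feat_w=60, feat_h=40, stride=16, ratio=0.41):
    """Tile the 11 anchor sizes over a conv5_3-sized feature map.
    Each anchor has area s*s and width/height ratio 0.41 (w = ratio * h)."""
    scales = [32, 43, 58, 78, 106, 144, 194, 261, 353, 477, 643]
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride  # anchor center
            for s in scales:
                h = s / np.sqrt(ratio)   # so that w * h = s^2 with w = ratio * h
                w = ratio * h
                anchors.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(anchors)
```

This yields 60 × 40 × 11 = 26,400 candidate boxes, all tall and narrow as befits standing pedestrians.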
To address the false detections and missed detections caused by occlusion between pedestrians, the network framework of the pedestrian detection model in this embodiment replaces the RoI pooling layer of the original Fast R-CNN module with a block occlusion-aware RoI pooling unit. This unit integrates the structural information of different positions of the human body, inputs the integrated information into the Fast R-CNN module, and estimates the occlusion state through a small neural network.
Referring to fig. 2, fig. 2 illustrates the main structure of the block occlusion-aware RoI pooling unit in this embodiment. As shown in fig. 2, the body region is first divided into five parts, and for each part the RoI pooling layer samples the features into a small feature map of fixed size (7 in both width and height). Then, based on the obtained feature maps of the different human body parts, the visibility of each part is estimated using an occlusion processing unit. Referring to fig. 3, fig. 3 is a schematic diagram of the main structure of the block occlusion-aware occlusion processing unit in this embodiment of the present invention; as shown in fig. 3, the occlusion processing unit is composed of three convolutional layers followed by a softmax layer, and its parameters are trained with a log loss function. Specifically, let $c_{i,j}$ denote the $j$-th part of the $i$-th candidate window, $o_{i,j}$ the corresponding predicted visibility score, and $o_{i,j}^*$ the corresponding calibrated (true) visibility score. If more than half of $c_{i,j}$ is visible, then $o_{i,j}^* = 1$, and otherwise 0; mathematically, $o_{i,j}^* = 1$ if the overlap ratio between $c_{i,j}$ and the corresponding calibrated visible region is greater than or equal to 0.5, and 0 otherwise. Formula (1) gives the visibility score of each part used to supervise the occlusion processing unit:

$$o_{i,j}^* = \begin{cases} 1, & \dfrac{\Omega\big(c_{i,j} \cap v_i^*\big)}{\Omega(c_{i,j})} \ge \theta \\[4pt] 0, & \text{otherwise} \end{cases} \qquad (1)$$

wherein $\Omega(\cdot)$ is the area calculation function, $\Omega(c_{i,j})$ is the area of $c_{i,j}$, $v_i^*$ is the corresponding calibrated visible region, and $\theta$ is the set overlap-ratio threshold, here set to 0.5, indicating that when more than half of the part is visible, $o_{i,j}^* = 1$, and otherwise 0. Accordingly, this embodiment defines the loss function of the occlusion processing unit according to formula (2):

$$\mathcal{L}_{occ} = -\sum_i \sum_{j=1}^{5}\Big(o_{i,j}^*\log o_{i,j} + \big(1-o_{i,j}^*\big)\log\big(1-o_{i,j}\big)\Big) \qquad (2)$$

wherein $i$ is the index of the anchor box, $t_i$ is the coordinates corresponding to the predicted pedestrian at the $i$-th anchor box, and $t_i^*$ is the calibrated coordinates of the object associated with the $i$-th anchor box.
Then, a dot product operation is performed between the feature map of each human body part and its corresponding predicted visibility to obtain the final features, with a feature dimension of 512 × 7 × 7. Finally, the feature maps of the five human body parts are added element by element and used for the classification and window regression of the Fast R-CNN module.
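The visibility-weighted fusion described above can be sketched as follows. This is a minimal numpy illustration and not the patent's implementation: the function name, the use of numpy arrays in place of network tensors, and the (5, 512, 7, 7) layout are assumptions made for clarity.

```python
import numpy as np

def occlusion_aware_pool(part_feats, visibility):
    """Fuse per-part RoI features by their predicted visibility.

    part_feats: array of shape (5, 512, 7, 7), one pooled feature map per
                human body part (sizes follow the text above).
    visibility: array of shape (5,), predicted visibility score per part.
    Returns the fused feature map of shape (512, 7, 7).
    """
    # Scale each part's feature map by its scalar visibility prediction ...
    weighted = part_feats * visibility[:, None, None, None]
    # ... then sum the five parts element by element.
    return weighted.sum(axis=0)
```

A fully occluded part (visibility 0) thus contributes nothing to the fused feature passed to the Fast R-CNN head.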
Further, in the pedestrian detection method shown in fig. 1 in this embodiment, the pedestrian detection model may be subjected to network training according to a preset training image, so as to obtain the pedestrian detection model meeting a preset convergence condition.
Specifically, in this embodiment, the network training may be performed on the pedestrian detection model according to the following steps:
step S201: and carrying out data augmentation processing on the preset training image to obtain a training sample.
In this embodiment, the data augmentation processing on the training image may include a color dithering operation, a random cropping operation, a horizontal flipping operation, and a scale transformation operation:
firstly, performing color dithering operation on a training image, specifically: parameters such as brightness, contrast, and saturation of the training image were randomly adjusted with a probability of 0.6, respectively.
Secondly, a random cropping operation is performed on the training image after the color dithering operation. Specifically, 5 square sub-images are randomly cropped, wherein 1 sub-image is the largest square sub-image in the training image, and the side lengths of the other 4 sub-images are 0.4-1.0 times the short side of the training image. Then 1 sub-image is randomly selected from the 5 sub-images as the final training sample.
Thirdly, a horizontal flipping operation is performed on the selected training sample. Specifically, the horizontal flip is applied randomly with a probability of 0.6.
Finally, a scale transformation operation is performed on the training sample after the horizontal flipping operation. Specifically, the training sample is scaled to a 1000 × 600 image.
In this embodiment, the color dithering operation, random cropping operation, horizontal flipping operation, and scale transformation operation are performed on the training image in sequence, which increases the data volume without changing the image category and improves the generalization capability of the model.
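The four augmentation operations above can be sketched in order. This is a hedged numpy-only illustration: it jitters brightness only (the text also adjusts contrast and saturation), uses nearest-neighbour resampling for the scale transformation, and all function names are invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def color_jitter(img, p=0.6):
    # Randomly rescale brightness with probability p (contrast/saturation omitted).
    if rng.random() < p:
        img = np.clip(img * rng.uniform(0.8, 1.2), 0, 255)
    return img

def random_square_crop(img):
    h, w = img.shape[:2]
    short = min(h, w)
    # 1 largest square plus 4 squares with side 0.4-1.0x the short side.
    sides = [short] + [int(short * rng.uniform(0.4, 1.0)) for _ in range(4)]
    s = sides[rng.integers(len(sides))]
    y = rng.integers(0, h - s + 1)
    x = rng.integers(0, w - s + 1)
    return img[y:y + s, x:x + s]

def horizontal_flip(img, p=0.6):
    return img[:, ::-1] if rng.random() < p else img

def resize_nn(img, out_w=1000, out_h=600):
    # Nearest-neighbour rescale to the fixed 1000x600 training size.
    h, w = img.shape[:2]
    ys = np.arange(out_h) * h // out_h
    xs = np.arange(out_w) * w // out_w
    return img[ys][:, xs]

def augment(img):
    return resize_nn(horizontal_flip(random_square_crop(color_jitter(img))))
```

In practice the jitter ranges and resampling method would follow the training framework used; only the order of the four operations is taken from the text.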
Step S202: matching the anchor point frame with a pedestrian marking frame in the training sample, and dividing the anchor point frame into a positive sample and a negative sample according to a matching result; wherein, the positive sample is the anchor point frame matched with the pedestrian mark frame, and the negative sample is the anchor point frame unmatched with the pedestrian mark frame.
Specifically, in order to solve the problem that a part of pedestrians cannot match enough anchor points under the existing matching strategy, the invention adopts a certain compensation strategy for the marking frame. The steps of matching the anchor point frame with the pedestrian marking frame in the training sample are as follows:
firstly, calculating the intersection and superposition ratio of each anchor point frame and each pedestrian marking frame;
secondly, for each pedestrian labeling frame, the anchor point frame with the largest intersection and superposition ratio is selected, and each selected anchor point frame is matched with its corresponding pedestrian labeling frame;
thirdly, after the selected anchor point frames are removed, it is judged whether the intersection and superposition ratio of each remaining anchor point frame with each pedestrian labeling frame is greater than a preset first threshold: if so, they are matched. In this embodiment, the first threshold is 0.4. It should be noted that the average number of anchor point frames matched by all pedestrian labeling frames that are matched with enough anchor point frames is denoted N_p.
Fourthly, the pedestrian labeling frames whose number of matched anchor point frames is smaller than the preset second number are acquired, and all anchor point frames whose intersection and superposition ratio with each such pedestrian labeling frame is greater than a preset second threshold are selected, the preset first threshold being larger than the preset second threshold. In this embodiment, this step is a scale compensation operation for repairing missed matches: the second threshold is set to 0.1, and for the pedestrian labeling frames not matched with enough anchor point frames, all anchor point frames whose intersection and superposition ratio with the pedestrian labeling frame is greater than 0.1 are selected. Equation (3) shows the sequence of all anchor point frames with an intersection and superposition ratio greater than 0.1:
[a_1, a_2, a_3, ..., a_N]   (3)
wherein each a_n includes the position and size of an anchor point frame.
Finally, according to the descending order of the intersection and superposition ratio of all the selected anchor point frames, a preset third number of anchor point frames are selected to match the corresponding pedestrian labeling frame. In this implementation, the anchor point frames are sorted in descending order of their intersection and superposition ratio with the pedestrian labeling frame, as shown in formula (4):
[A_1, A_2, A_3, ..., A_N]   (4)
and the first N_p anchor point frames are selected as the anchor point frames matched with the pedestrian labeling frame, wherein N_p defaults to the average matching number of the pedestrian labeling frames and can be set as an adjustable parameter.
The value of the preset third number is the average anchor point frame matching number of the pedestrian labeling frames whose matching number is greater than or equal to the preset second number.
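The matching strategy with scale compensation can be sketched as follows. This is an illustrative simplification, not the patented procedure itself: in particular, computing N_p as the rounded mean of all current match counts is an assumption, while the thresholds t1 = 0.4 and t2 = 0.1 follow the embodiment above.

```python
import numpy as np

def iou(boxes, gts):
    """Pairwise IoU between anchors (N, 4) and labeled boxes (M, 4), boxes as x1, y1, x2, y2."""
    x1 = np.maximum(boxes[:, None, 0], gts[None, :, 0])
    y1 = np.maximum(boxes[:, None, 1], gts[None, :, 1])
    x2 = np.minimum(boxes[:, None, 2], gts[None, :, 2])
    y2 = np.minimum(boxes[:, None, 3], gts[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    area_g = (gts[:, 2] - gts[:, 0]) * (gts[:, 3] - gts[:, 1])
    return inter / (area_a[:, None] + area_g[None, :] - inter)

def match_anchors(anchors, gts, t1=0.4, t2=0.1):
    """Match with scale compensation; returns {labeled-box index: anchor indices}."""
    overlaps = iou(anchors, gts)
    matches = {g: [] for g in range(len(gts))}
    best = overlaps.argmax(axis=0)                 # highest-IoU anchor per labeled box
    for g, a in enumerate(best):
        matches[g].append(a)
    for a in range(len(anchors)):                  # remaining anchors above first threshold
        g = overlaps[a].argmax()
        if a not in matches[g] and overlaps[a, g] > t1:
            matches[g].append(a)
    counts = [len(v) for v in matches.values()]
    n_p = max(1, int(round(np.mean(counts))))      # average matching number N_p (assumption)
    for g in range(len(gts)):                      # compensation for under-matched boxes
        if len(matches[g]) < n_p:
            cand = np.argsort(-overlaps[:, g])
            matches[g] = [a for a in cand if overlaps[a, g] > t2][:n_p] or matches[g]
    return matches
```

The compensation loop is what lets small-scale pedestrians, which rarely exceed the 0.4 threshold, still receive roughly N_p anchors each.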
Step S203: a preset first number of negative samples are selected by adopting a difficult negative sample mining method.
Specifically, for all negative samples, the error value brought by each negative sample's classification prediction is calculated, the negative samples are sorted in descending order by error value, the batch of negative samples with the largest error values is selected as the negative samples of the training data set, and all remaining negative samples are discarded, so that the quantity ratio of positive to negative samples is 1:3. The relatively balanced quantitative relation between positive and negative samples is beneficial to smooth network training.
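The difficult negative sample selection can be sketched as below; the function name and the use of precomputed per-sample loss values are assumptions made for illustration.

```python
import numpy as np

def mine_hard_negatives(neg_losses, n_pos, ratio=3):
    """Keep the ratio * n_pos negatives with the largest classification loss,
    enforcing the 1:3 positive-to-negative quantity ratio from the text."""
    order = np.argsort(-np.asarray(neg_losses))   # descending by loss value
    return order[: ratio * n_pos]
```

For example, with one positive sample and five negatives, the three negatives the classifier gets most wrong are retained and the rest are discarded.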
Step S204: a loss function value is calculated according to the positive samples and the selected negative samples, and the Faster R-CNN neural network is updated according to the loss function value; network training is then performed again on the updated Faster R-CNN neural network until the updated network meets the preset convergence condition.
In particular, to reduce the false detection problem caused by mutual occlusion between adjacent pedestrians, the candidate window is required to be closer to the position of the pedestrian associated with it as calibrated in the data set. The traditional Faster R-CNN detection framework consists of two parts: a Region Proposal Network (RPN) module and a Fast R-CNN module. The former generates high-quality candidate windows, while the latter performs object classification and regression calculations on these candidate windows to better locate the object.
For the false detection problem caused by occlusion between adjacent pedestrians, this embodiment adjusts and redefines the loss function of the region proposal network (RPN) module, as shown in formula (5):

\mathcal{L}_{RPN} = \mathcal{L}_{cls}(\{p_i, p_i^*\}) + \alpha_1 \mathcal{L}_{agg}(\{t_i, t_i^*\})   (5)

wherein i is the anchor point frame index; p_i and t_i are the predicted probability that the i-th anchor point frame is a pedestrian and the corresponding predicted coordinates; p_i^* and t_i^* are the object class label associated with the i-th anchor point frame and the corresponding calibration coordinates (a binary classification problem here: the pedestrian class is 1 and the background class is 0); \alpha_1 is the first hyperparameter, introduced to weight the two loss functions; \mathcal{L}_{cls} is the pedestrian classification loss function and \mathcal{L}_{agg} is the aggregation loss function.
The classification loss is estimated using a log loss function, defined as formula (6):

\mathcal{L}_{cls} = -\frac{1}{N_{cls}} \sum_i \left[ p_i^* \log p_i + (1 - p_i^*) \log(1 - p_i) \right]   (6)

wherein N_{cls} is the total number of anchor point frames in the classification process.
In order to enable the RPN module to generate correct candidate windows more efficiently, the invention introduces a new loss function, called the aggregation loss function (aggregation loss), into the RPN module. This loss function not only makes the candidate windows locate the annotated positions of their associated pedestrians more accurately, but also reduces the distance between candidate windows associated with the same pedestrian. The aggregation loss function is defined as formula (7):

\mathcal{L}_{agg} = \mathcal{L}_{reg} + \beta \mathcal{L}_{com}   (7)

wherein \mathcal{L}_{reg} is the regression loss function, which constrains each candidate window to be closer to its target calibration window; \mathcal{L}_{com} is the compactness loss function, which constrains the candidate windows to locate the position of the target calibration object more compactly; \beta is the second hyperparameter, used to adjust the weights of the two loss functions.
The invention uses a smooth L_1 loss function to define the regression loss function \mathcal{L}_{reg}, which measures the accuracy of the predicted detection windows, as shown in formula (8):

\mathcal{L}_{reg} = \frac{1}{N_{reg}} \sum_i p_i^* \, \mathrm{smooth}_{L_1}(t_i - t_i^*)   (8)

wherein N_{reg} is the total number of anchor point frames in the regression phase, and \mathrm{smooth}_{L_1}(t_i - t_i^*) is the smooth L_1 loss value with respect to the predicted detection window t_i.
The compactness loss function \mathcal{L}_{com} is used to evaluate the compactness of all candidate windows associated with the same labeled pedestrian. Specifically, assume that \{g_1, ..., g_{N_{com}}\} is the sequence of calibrated pedestrians that have associated anchor point frames, i.e. at least one anchor point frame intersects each calibration window, and that \{\Phi_1, ..., \Phi_{N_{com}}\} is the corresponding sequence of anchor point frame index sets, i.e. an anchor point frame with index j \in \Phi_i is associated with the i-th calibrated pedestrian. Here, the smooth L_1 loss function measures the error between the predicted position information of the anchor point frames and the actually calibrated position information, describing the compactness between the predicted detection windows and the actual calibration window. The specific form of the compactness loss function is shown in formula (9):

\mathcal{L}_{com} = \frac{1}{N_{com}} \sum_{i=1}^{N_{com}} \mathrm{smooth}_{L_1}\!\left( t_i^* - \frac{1}{|\Phi_i|} \sum_{j \in \Phi_i} t_j \right)   (9)

wherein N_{com} is the total number of calibrated pedestrians that intersect anchor point frames, |\Phi_i| is the total number of anchor point frames associated with the i-th calibrated pedestrian, t_j is the predicted coordinates of the pedestrian for the j-th anchor point frame, and t_i^* is the calibration coordinates of the i-th calibrated pedestrian window.
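Under the definitions of formulas (7)-(9), the aggregation loss can be sketched numerically as follows. This numpy sketch is an interpretation rather than the patent's implementation: the per-anchor loss values are summed over the four box coordinates, and beta = 1.0 is an arbitrary default.

```python
import numpy as np

def smooth_l1(x):
    """Elementwise smooth L1: 0.5 x^2 for |x| < 1, |x| - 0.5 otherwise."""
    x = np.abs(x)
    return np.where(x < 1, 0.5 * x * x, x - 0.5)

def regression_loss(t, t_star, labels):
    """Mean smooth-L1 over positive anchors (labels: 1 pedestrian, 0 background)."""
    pos = labels == 1
    n_reg = max(pos.sum(), 1)
    return smooth_l1(t[pos] - t_star[pos]).sum() / n_reg

def compactness_loss(t, gt_coords, assoc):
    """assoc[i] lists the anchor indices associated with the i-th labeled pedestrian.
    Penalises the gap between each pedestrian's calibration coordinates and the
    mean prediction of its associated anchors."""
    losses = [smooth_l1(gt_coords[i] - t[idx].mean(axis=0)).sum()
              for i, idx in enumerate(assoc)]
    return np.mean(losses)

def aggregation_loss(t, t_star, labels, gt_coords, assoc, beta=1.0):
    return regression_loss(t, t_star, labels) + beta * compactness_loss(t, gt_coords, assoc)
```

When every anchor predicts its calibration exactly and the anchors of one pedestrian agree, both terms vanish, which matches the intuition that the loss rewards accurate and mutually compact candidate windows.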
Meanwhile, in order to further improve the accuracy of window regression and strengthen the model's pedestrian detection capability in occluded environments, the invention also introduces the aggregation loss term into the loss function of the Fast R-CNN module, as shown in formula (10):

\mathcal{L}_{FastRCNN} = \mathcal{L}_{cls} + \alpha_3 \mathcal{L}_{agg} + \lambda \mathcal{L}_{occ}   (10)

wherein \alpha_3 is the third hyperparameter and \lambda is the fourth hyperparameter; the classification loss function \mathcal{L}_{cls} and the aggregation loss function \mathcal{L}_{agg} are defined as in the RPN network, and \mathcal{L}_{occ} is the occlusion processing loss function shown in formula (2). By introducing the aggregation loss term into both the RPN module and the Fast R-CNN module of the pedestrian detector, the positioning capability of the detection windows is enhanced and the overall detection performance is improved.
Then, the network parameters are iteratively updated by a stochastic gradient descent method with back-propagated errors until training converges or the set maximum number of training iterations is reached, yielding the final network model parameters.
In the testing stage, the test image is input into the trained network model for pedestrian detection, and detection result frames are output. Since the number of output detection frames is very large, most detection frames are first screened out with a confidence threshold T of 0.05, and the top N_a = 400 detection frames are selected by confidence. Then, a non-maximum suppression method is used to remove repeated detection frames, and the top N_b = 200 detection frames by confidence are taken as the final detection result.
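The test-stage screening pipeline (confidence threshold 0.05, top N_a = 400, non-maximum suppression, top N_b = 200) can be sketched as follows. The greedy NMS with an IoU threshold of 0.5 is a common default and an assumption here, since the embodiment does not state the suppression threshold.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; returns kept indices, highest score first."""
    order = np.argsort(-scores)
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area = lambda b: (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
        iou = inter / (area(boxes[i:i + 1])[0] + area(boxes[order[1:]]) - inter)
        order = order[1:][iou <= iou_thresh]
    return keep

def filter_detections(boxes, scores, t=0.05, n_a=400, n_b=200):
    keep = scores > t                       # confidence threshold T = 0.05
    boxes, scores = boxes[keep], scores[keep]
    top = np.argsort(-scores)[:n_a]         # top N_a = 400 by confidence
    boxes, scores = boxes[top], scores[top]
    kept = nms(boxes, scores)[:n_b]         # NMS, then top N_b = 200
    return boxes[kept], scores[kept]
```

In the example below, a near-duplicate of the best box is suppressed by NMS and a very low-confidence box is removed by the threshold, leaving two final detections.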
Aiming at the pedestrian detection problem in large-scale occlusion environments, the accuracy of pedestrian detection is improved by introducing the occlusion-aware R-CNN model. Specifically, the invention designs a new aggregation loss function to reduce the false detection problem caused by overlapping between adjacent pedestrians, enabling the candidate windows to locate the target pedestrian positions more compactly and accurately. Meanwhile, in order to solve the detection problem caused by occlusion, the invention designs a block occlusion aware RoI pooling unit to replace the RoI pooling layer used in traditional Fast R-CNN; this pooling unit reduces the influence of occlusion on pedestrian detection by integrating the visibility predictions of different parts of the human body. In addition, when training the convolutional neural network, the pedestrian labeling frames need to be matched with anchor point frames, but under the existing matching strategy, pedestrian labeling frames of certain scales cannot be matched with enough anchor point frames; the invention therefore adopts the scale compensation matching strategy described above. Finally, the invention realizes the pedestrian detection method based on block occlusion perception, can efficiently and accurately detect pedestrians in images, and in particular remarkably improves the pedestrian detection capability in large-scale occlusion environments.
The invention further provides a pedestrian detection system based on block occlusion perception, and referring to fig. 4, fig. 4 exemplarily shows a schematic diagram of a pedestrian detection system based on block occlusion perception in the embodiment, and as shown in fig. 4, the system includes:
the image characteristic acquisition module is configured to acquire image characteristics corresponding to each preset human body detection area based on a pre-constructed pedestrian detection model and according to a to-be-detected pedestrian image;
the image feature fusion module is configured to perform feature fusion on the image features acquired by the image feature acquisition module to obtain the overall features of the corresponding pedestrians;
the detection result frame acquisition module is configured to acquire a plurality of detection result frames of the image of the pedestrian to be detected according to the overall characteristics acquired by the image characteristic fusion module;
a detection result frame screening module configured to select a detection result frame satisfying a preset screening condition among the plurality of obtained detection result frames;
the pedestrian detection model is a model constructed based on an Faster R-CNN neural network, and an anchor point frame is associated in a high convolution layer of the Faster R-CNN neural network.
In a preferred embodiment of the above pedestrian detection system based on blocking occlusion perception, the pedestrian detection system further includes a model training module, and the model training module includes:
the training image processing unit is configured to perform data augmentation processing on a preset training image to obtain a training sample;
the positive and negative sample dividing unit is configured to match the anchor point frame with a pedestrian marking frame in the training sample and divide the anchor point frame into a positive sample and a negative sample according to a matching result; the positive sample is an anchor point frame matched with the pedestrian marking frame, and the negative sample is an anchor point frame not matched with the pedestrian marking frame;
the negative sample screening unit is configured to select a preset first number of negative samples by adopting a difficult negative sample mining method;
a network updating unit configured to calculate a loss function value according to the positive samples and the selected negative samples, and to update the Faster R-CNN neural network according to the loss function value; network training is performed again on the updated Faster R-CNN neural network until the updated network meets the preset convergence condition.
In the above preferred embodiment of the pedestrian detection system based on block occlusion perception, the Faster R-CNN neural network comprises an RPN module; in this case, the model training module is further configured to perform the following operations:
based on a preset training image, network training is performed on the RPN module according to the loss function shown in formula (11):

\mathcal{L}_{RPN} = \mathcal{L}_{cls}(\{p_i, p_i^*\}) + \alpha_1 \mathcal{L}_{agg}(\{t_i, t_i^*\})   (11)

wherein \mathcal{L}_{cls} is the pedestrian classification loss function and \mathcal{L}_{agg} is the aggregation loss function; i denotes the anchor point frame index; p_i and t_i respectively represent the predicted probability that the i-th anchor point frame is a pedestrian and the corresponding predicted coordinates; p_i^* and t_i^* respectively represent the object class label associated with the i-th anchor point frame and the corresponding calibration coordinates; \alpha_1 is the first hyperparameter;
the pedestrian classification loss function is shown in formula (12):

\mathcal{L}_{cls} = -\frac{1}{N_{cls}} \sum_i \left[ p_i^* \log p_i + (1 - p_i^*) \log(1 - p_i) \right]   (12)

wherein N_{cls} is the total number of anchor point frames in the RPN module classification process;
the aggregation loss function is shown in formula (13):

\mathcal{L}_{agg} = \mathcal{L}_{reg} + \beta \mathcal{L}_{com}   (13)

wherein \mathcal{L}_{reg} is the regression loss function, \mathcal{L}_{com} is the compactness loss function, and \beta is the second hyperparameter;
the regression loss function is shown in formula (14):

\mathcal{L}_{reg} = \frac{1}{N_{reg}} \sum_i p_i^* \, \mathrm{smooth}_{L_1}(t_i - t_i^*)   (14)

wherein N_{reg} is the total number of anchor point frames in the regression phase, and \mathrm{smooth}_{L_1}(t_i - t_i^*) is the smooth L_1 loss value with respect to the predicted detection window t_i;
the compactness loss function is shown in formula (15):

\mathcal{L}_{com} = \frac{1}{N_{com}} \sum_{i=1}^{N_{com}} \mathrm{smooth}_{L_1}\!\left( t_i^* - \frac{1}{|\Phi_i|} \sum_{j \in \Phi_i} t_j \right)   (15)

wherein N_{com} is the total number of calibrated pedestrians that intersect anchor point frames, |\Phi_i| is the total number of anchor point frames associated with the i-th calibrated pedestrian, j is the anchor point frame index, t_j is the predicted coordinates of the pedestrian for the j-th anchor point frame, and t_i^* is the calibration coordinates of the i-th calibrated pedestrian window.
In the above preferred embodiment of the pedestrian detection system based on block occlusion perception, the Faster R-CNN neural network comprises a Fast R-CNN module; in this case, the model training module is further configured to perform the following operations:
based on a preset training image, network training is performed on the Fast R-CNN module according to the loss function shown in formula (16):

\mathcal{L}_{FastRCNN} = \mathcal{L}_{cls} + \alpha_3 \mathcal{L}_{agg} + \lambda \mathcal{L}_{occ}   (16)

wherein \mathcal{L}_{cls} is the pedestrian classification loss function, \mathcal{L}_{agg} is the aggregation loss function, and \mathcal{L}_{occ} is the occlusion processing loss function; i denotes the anchor point frame index; p_i and t_i respectively represent the predicted probability that the i-th anchor point frame is a pedestrian and the corresponding predicted coordinates; p_i^* and t_i^* respectively represent the object class label associated with the i-th anchor point frame and the corresponding calibration coordinates; \alpha_3 is the third hyperparameter and \lambda is the fourth hyperparameter;
the pedestrian classification loss function is shown in formula (17):

\mathcal{L}_{cls} = -\frac{1}{N_{cls}} \sum_i \left[ p_i^* \log p_i + (1 - p_i^*) \log(1 - p_i) \right]   (17)

wherein N_{cls} is the total number of anchor point frames in the RPN module classification process;
the aggregation loss function is shown in formula (18):

\mathcal{L}_{agg} = \mathcal{L}_{reg} + \beta \mathcal{L}_{com}   (18)

wherein \mathcal{L}_{reg} is the regression loss function, \mathcal{L}_{com} is the compactness loss function, and \beta is the second hyperparameter;
the regression loss function is shown in formula (19):

\mathcal{L}_{reg} = \frac{1}{N_{reg}} \sum_i p_i^* \, \mathrm{smooth}_{L_1}(t_i - t_i^*)   (19)

wherein N_{reg} is the total number of anchor point frames in the regression phase, and \mathrm{smooth}_{L_1}(t_i - t_i^*) is the smooth L_1 loss value with respect to the predicted detection window t_i;
the compactness loss function is shown in formula (20):

\mathcal{L}_{com} = \frac{1}{N_{com}} \sum_{i=1}^{N_{com}} \mathrm{smooth}_{L_1}\!\left( t_i^* - \frac{1}{|\Phi_i|} \sum_{j \in \Phi_i} t_j \right)   (20)

wherein N_{com} is the total number of calibrated pedestrians that intersect anchor point frames, |\Phi_i| is the total number of anchor point frames associated with the i-th calibrated pedestrian, j is the anchor point frame index, t_j is the predicted coordinates of the pedestrian for the j-th anchor point frame, and t_i^* is the calibration coordinates of the i-th calibrated pedestrian window.
In a preferred embodiment of the above pedestrian detection system based on block occlusion perception, the positive and negative sample dividing unit includes:
an intersection and overlap ratio calculation subunit configured to calculate an intersection and overlap ratio of each anchor point frame and each pedestrian labeling frame;
the first matching subunit is configured to select, for each pedestrian labeling frame, the anchor point frame with the largest intersection and superposition ratio, and match each selected anchor point frame with each corresponding pedestrian labeling frame;
a second matching subunit, configured to determine whether, after removing the selected anchor point frame, the intersection and superposition ratio of the remaining anchor point frames to each pedestrian labeling frame is greater than a preset first threshold: if so, matching;
the third matching subunit is configured to acquire the pedestrian labeling frames whose number of matched anchor point frames is less than a preset second number, and to select all the anchor point frames whose intersection and superposition ratio with each such pedestrian labeling frame is greater than a preset second threshold; the preset first threshold is larger than the preset second threshold;
the fourth matching subunit is configured to select, in descending order of the intersection and superposition ratio of all the selected anchor point frames, a preset third number of anchor point frames to match the corresponding pedestrian labeling frames; the value of the preset third number is the average anchor point frame matching number of the pedestrian labeling frames whose matching number is greater than or equal to the preset second number.
Those of skill in the art will appreciate that the various illustrative systems and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate the interchangeability of electronic hardware and software. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.
Claims (8)
1. A pedestrian detection method based on block occlusion perception is characterized by comprising the following steps:
acquiring image characteristics corresponding to each preset human body detection area based on a pre-constructed pedestrian detection model and according to a to-be-detected pedestrian image;
performing feature fusion on the acquired image features to obtain the overall features of the corresponding pedestrians;
acquiring a plurality of detection result frames of the to-be-detected pedestrian image according to the overall characteristics;
selecting a detection result frame which meets a preset screening condition from the obtained detection result frames;
the pedestrian detection model is a model constructed based on a Faster R-CNN neural network, and an anchor point frame is associated in a high convolution layer of the Faster R-CNN neural network;
the Faster R-CNN neural network comprises an RPN module, and the method further comprises performing network training on the RPN module based on a preset training image and according to the loss function shown in the following formula:

\mathcal{L}_{RPN} = \mathcal{L}_{cls}(\{p_i, p_i^*\}) + \alpha_1 \mathcal{L}_{agg}(\{t_i, t_i^*\})

wherein \mathcal{L}_{cls} is the pedestrian classification loss function and \mathcal{L}_{agg} is the aggregation loss function; i denotes the anchor point frame index; p_i and t_i respectively represent the predicted probability that the i-th anchor point frame is a pedestrian and the corresponding predicted coordinates; p_i^* and t_i^* respectively represent the object class label associated with the i-th anchor point frame and the corresponding calibration coordinates; \alpha_1 is the first hyperparameter;
the pedestrian classification loss function is:

\mathcal{L}_{cls} = -\frac{1}{N_{cls}} \sum_i \left[ p_i^* \log p_i + (1 - p_i^*) \log(1 - p_i) \right]

wherein N_{cls} is the total number of anchor point frames in the RPN module classification process;
the aggregation loss function is:

\mathcal{L}_{agg} = \mathcal{L}_{reg} + \beta \mathcal{L}_{com}

wherein \mathcal{L}_{reg} is the regression loss function, \mathcal{L}_{com} is the compactness loss function, and \beta is the second hyperparameter;
the regression loss function is:

\mathcal{L}_{reg} = \frac{1}{N_{reg}} \sum_i p_i^* \, \mathrm{smooth}_{L_1}(t_i - t_i^*)

wherein N_{reg} is the total number of anchor point frames in the regression phase, and \mathrm{smooth}_{L_1}(t_i - t_i^*) is the smooth L_1 loss value with respect to the predicted detection window t_i;
the compactness loss function is:

\mathcal{L}_{com} = \frac{1}{N_{com}} \sum_{i=1}^{N_{com}} \mathrm{smooth}_{L_1}\!\left( t_i^* - \frac{1}{|\Phi_i|} \sum_{j \in \Phi_i} t_j \right)

wherein N_{com} is the total number of calibrated pedestrians that intersect anchor point frames, |\Phi_i| is the total number of anchor point frames associated with the i-th calibrated pedestrian, j is the anchor point frame index, t_j is the predicted coordinates of the pedestrian for the j-th anchor point frame, and t_i^* is the calibration coordinates of the i-th calibrated pedestrian window.
2. The pedestrian detection method based on block occlusion perception according to claim 1, wherein before acquiring image features corresponding to each preset human body detection area based on a pre-constructed pedestrian detection model and according to a to-be-detected pedestrian image, the method further comprises:
performing data augmentation processing on a preset training image to obtain a training sample;
matching the anchor point frame with a pedestrian marking frame in the training sample, and dividing the anchor point frame into a positive sample and a negative sample according to a matching result; the positive sample is an anchor point frame matched with the pedestrian marking frame, and the negative sample is an anchor point frame not matched with the pedestrian marking frame;
selecting a preset first number of negative samples by adopting a difficult negative sample mining method;
calculating a loss function value according to the positive samples and the selected negative samples, and updating the Faster R-CNN neural network according to the loss function value; and performing network training again on the updated Faster R-CNN neural network until the updated Faster R-CNN neural network meets the preset convergence condition.
3. The pedestrian detection method based on block occlusion perception according to claim 2, wherein the Faster R-CNN neural network further comprises a Fast R-CNN module; before "based on a pre-constructed pedestrian detection model, and according to a to-be-detected pedestrian image, acquiring image features corresponding to each preset human body detection area", the method further includes:
based on a preset training image, performing network training on the Fast R-CNN module according to the loss function shown in the following formula:

\mathcal{L}_{FastRCNN} = \mathcal{L}_{cls} + \alpha_3 \mathcal{L}_{agg} + \lambda \mathcal{L}_{occ}

wherein \mathcal{L}_{occ} is the occlusion processing loss function, \alpha_3 is the third hyperparameter, and \lambda is the fourth hyperparameter.
4. The pedestrian detection method based on block occlusion perception according to claim 2 or 3, wherein the step of matching the anchor point frame with the pedestrian labeling frame in the training sample specifically comprises:
calculating the intersection and superposition ratio of each anchor point frame and each pedestrian labeling frame;
selecting, for each pedestrian labeling frame, the anchor point frame with the largest intersection and superposition ratio, and matching each selected anchor point frame with each corresponding pedestrian labeling frame;
judging whether the intersection and superposition ratio of the rest anchor frames and each pedestrian marking frame is greater than a preset first threshold value or not after the selected anchor frames are removed: if so, matching;
acquiring the pedestrian labeling frames whose number of matched anchor point frames is less than a preset second number, and selecting all anchor point frames whose intersection and superposition ratio with each such pedestrian labeling frame is greater than a preset second threshold; the preset first threshold is larger than the preset second threshold;
selecting a preset third number of anchor frames to match with the corresponding pedestrian marking frames according to the sequence of the intersection and the superposition of all the selected anchor frames from large to small; and the value of the preset third quantity is the average matching quantity of the anchor points of the pedestrian labeling frames of which the matching quantity is greater than or equal to the preset second quantity.
5. A pedestrian detection system based on block occlusion perception, characterized by comprising:
the image characteristic acquisition module is configured to acquire image characteristics corresponding to each preset human body detection area based on a pre-constructed pedestrian detection model and according to a to-be-detected pedestrian image;
the image feature fusion module is configured to perform feature fusion on the image features acquired by the image feature acquisition module to obtain the overall features of the corresponding pedestrians;
the detection result frame acquisition module is configured to acquire a plurality of detection result frames of the to-be-detected pedestrian image according to the overall characteristics obtained by the image characteristic fusion module;
a detection result frame screening module configured to select a detection result frame satisfying a preset screening condition among the plurality of obtained detection result frames;
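The claim does not specify the preset screening condition; a common choice for removing redundant detection result frames is greedy non-maximum suppression (NMS), sketched below as an assumption rather than the patent's stated method.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression over (x1, y1, x2, y2) boxes.

    Keeps the highest-scoring box, discards candidates overlapping it
    beyond iou_thresh, and repeats on the remainder.
    """
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # IoU of the current best box against the remaining candidates
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]
    return keep
```

In crowded pedestrian scenes the IoU threshold matters: too low and heavily overlapping true pedestrians suppress each other, which is precisely the occlusion problem this patent targets.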
the pedestrian detection model is a model constructed based on a Faster R-CNN neural network, and anchor point frames are associated with the higher convolutional layers of the Faster R-CNN neural network;
the system also comprises a model training module, wherein the Faster R-CNN neural network comprises an RPN module; in this case, the model training module is configured to perform the following operations:
based on a preset training image, performing network training on the RPN module according to the loss function shown in the following formula:

L({p_i}, {t_i}) = L_cls({p_i, p_i*}) + α_1 · L_agg({t_i, t_i*})

wherein L_cls is the pedestrian classification loss function and L_agg is the aggregate loss function; i denotes the anchor point frame index; p_i and t_i respectively represent the predicted probability that the i-th anchor point frame is a pedestrian and the corresponding predicted coordinates; p_i* and t_i* respectively represent the object class label associated with the i-th anchor point frame and the corresponding calibrated coordinates; α_1 is a first hyperparameter;
the pedestrian classification loss function is:

L_cls({p_i, p_i*}) = (1/N_cls) · Σ_i L_cls(p_i, p_i*)

wherein N_cls is the total number of anchor point frames in the classification stage of the RPN module;
the aggregate loss function is:

L_agg({t_i, t_i*}) = L_reg({t_i, t_i*}) + β · L_com({t_i, t_i*})

wherein L_reg is the regression loss function, L_com is the compactness loss function, and β is a second hyperparameter;
the regression loss function is:

L_reg({t_i, t_i*}) = (1/N_reg) · Σ_i p_i* · L_1(t_i − t_i*)

wherein N_reg is the total number of anchor point frames in the regression stage, and L_1(t_i − t_i*) is the L_1 loss value computed for the predicted detection window t_i;
the compactness loss function is:

L_com({t_i}) = (1/N_com) · Σ_{i=1}^{N_com} L_1( t_i* − (1/|Φ_i|) · Σ_{j∈Φ_i} t_j )

wherein N_com is the total number of calibrated pedestrians that intersect with anchor point frames; |Φ_i| is the total number of anchor point frames associated with the i-th calibrated pedestrian; j is the anchor point frame index; t_j denotes the predicted coordinates corresponding to the j-th anchor point frame; p is the index of an anchor point frame associated with a calibrated pedestrian window; and Φ_p denotes the set of anchor point frames associated with the calibrated pedestrian window.
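The aggregate loss above can be sketched in NumPy. This is an illustrative reading of the claim, not the patent's implementation: the smooth-L1 form and the shape of the anchor-to-pedestrian association `assoc` are assumptions, and the β value is a placeholder for the second hyperparameter.

```python
import numpy as np

def smooth_l1(x):
    """Elementwise smooth-L1 (Huber-style) penalty, as commonly used in Faster R-CNN."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x * x, ax - 0.5)

def compactness_loss(gt_coords, pred_coords, assoc):
    """L_com: penalize the spread of the predictions associated with each pedestrian.

    gt_coords:   (G, 4) calibrated coordinates t_i* of each pedestrian
    pred_coords: (A, 4) predicted coordinates t_j of each anchor point frame
    assoc:       list of index arrays, assoc[i] = Phi_i, anchors tied to pedestrian i
    """
    total = 0.0
    for i, phi in enumerate(assoc):
        if len(phi) == 0:
            continue
        mean_pred = pred_coords[phi].mean(axis=0)       # (1/|Phi_i|) * sum_j t_j
        total += smooth_l1(gt_coords[i] - mean_pred).sum()
    return total / max(1, len(assoc))                   # average over N_com pedestrians

def agg_loss(gt_coords, pred_coords, assoc, reg_loss, beta=0.1):
    """L_agg = L_reg + beta * L_com (beta = 0.1 is an illustrative value)."""
    return reg_loss + beta * compactness_loss(gt_coords, pred_coords, assoc)
```

Pulling the mean of each pedestrian's associated predictions toward the calibrated box makes proposals for the same person compact, so occluded pedestrians in a crowd are less likely to be merged or suppressed downstream.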
6. The pedestrian detection system based on block occlusion perception according to claim 5, wherein the model training module comprises:
the training image processing unit is configured to perform data amplification processing on a preset training image to obtain a training sample;
the positive and negative sample dividing unit is configured to match the anchor point frame with a pedestrian marking frame in the training sample and divide the anchor point frame into a positive sample and a negative sample according to a matching result; the positive sample is an anchor point frame matched with the pedestrian marking frame, and the negative sample is an anchor point frame not matched with the pedestrian marking frame;
the negative sample screening unit is configured to select a preset first number of negative samples by adopting a difficult negative sample mining method;
a network updating unit configured to calculate a loss function value according to the positive samples and the selected negative samples, and to update the Faster R-CNN neural network according to the loss function value; network training is then performed again on the updated Faster R-CNN neural network until the updated Faster R-CNN neural network meets a preset convergence condition.
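The loop described by the units of claim 6 can be sketched as follows. Only `hard_negative_mining` is concrete; the `model` object and its `split_samples`, `loss_of`, `loss`, and `update` methods are hypothetical placeholders standing in for the unspecified network interfaces.

```python
def hard_negative_mining(neg_losses, num_keep):
    """Select the num_keep negative samples with the highest loss (hardest to classify)."""
    order = sorted(range(len(neg_losses)), key=lambda k: -neg_losses[k])
    return order[:num_keep]

def train_until_converged(model, samples, num_neg, tol=1e-4, max_epochs=100):
    """Iterate: split pos/neg anchors, mine hard negatives, compute loss, update."""
    prev = float("inf")
    for _ in range(max_epochs):
        pos, neg = model.split_samples(samples)          # positive / negative anchors
        hard = hard_negative_mining([model.loss_of(n) for n in neg], num_neg)
        loss = model.loss(pos, [neg[i] for i in hard])
        model.update(loss)                               # one gradient step
        if abs(prev - loss) < tol:                       # preset convergence condition
            break
        prev = loss
    return model
```

Keeping only the highest-loss negatives counters the heavy class imbalance between background anchors and the comparatively few pedestrian anchors.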
7. The block occlusion perception-based pedestrian detection system of claim 6, wherein the Faster R-CNN neural network comprises a Fast R-CNN module; in this case, the model training module is further configured to perform the following operations:
based on a preset training image and according to a loss function shown in the following formula, performing network training on the Fast R-CNN module:
8. The pedestrian detection system based on block occlusion perception according to claim 6 or 7, wherein the positive and negative sample dividing unit comprises:
an IoU ratio calculation subunit configured to calculate the intersection-over-union (IoU) ratio of each anchor point frame and each pedestrian labeling frame;
a first matching subunit configured to select, for each pedestrian labeling frame, the anchor point frame with the largest IoU ratio, and to match each selected anchor point frame with its corresponding pedestrian labeling frame;
a second matching subunit configured to judge, after the selected anchor point frames are removed, whether the IoU ratio of each remaining anchor point frame with each pedestrian labeling frame is greater than a preset first threshold: if so, to match them;
a third matching subunit configured to acquire the pedestrian labeling frames whose number of matched anchor point frames is less than a preset second number, and to select all anchor point frames whose IoU ratio with each such pedestrian labeling frame is greater than a preset second threshold, the preset first threshold being larger than the preset second threshold;
a fourth matching subunit configured to select a preset third number of the selected anchor point frames, in descending order of IoU ratio, to match with the corresponding pedestrian labeling frames; the preset third number takes the value of the average number of matched anchor point frames over the pedestrian labeling frames whose number of matched anchor point frames is greater than or equal to the preset second number.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810393658.1A CN108898047B (en) | 2018-04-27 | 2018-04-27 | Pedestrian detection method and system based on blocking and shielding perception |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810393658.1A CN108898047B (en) | 2018-04-27 | 2018-04-27 | Pedestrian detection method and system based on blocking and shielding perception |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108898047A CN108898047A (en) | 2018-11-27 |
CN108898047B true CN108898047B (en) | 2021-03-19 |
Family
ID=64342527
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810393658.1A Active CN108898047B (en) | 2018-04-27 | 2018-04-27 | Pedestrian detection method and system based on blocking and shielding perception |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108898047B (en) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109583512B (en) * | 2018-12-14 | 2021-05-25 | 北京旷视科技有限公司 | Image processing method, device and system |
CN109766796B (en) * | 2018-12-20 | 2023-04-18 | 西华大学 | Deep pedestrian detection method for dense crowd |
CN110222764B (en) * | 2019-06-10 | 2021-06-18 | 中南民族大学 | Method, system, device and storage medium for detecting occluded target |
CN110222657B (en) * | 2019-06-11 | 2021-07-20 | 中国科学院自动化研究所 | Single-step face detector optimization system, method and device |
CN112307826A (en) * | 2019-07-30 | 2021-02-02 | 华为技术有限公司 | Pedestrian detection method, device, computer-readable storage medium and chip |
CN110532985B (en) * | 2019-09-02 | 2022-07-22 | 北京迈格威科技有限公司 | Target detection method, device and system |
CN110796071B (en) * | 2019-10-28 | 2021-02-19 | 广州云从博衍智能科技有限公司 | Behavior detection method, system, machine-readable medium and device |
CN110796069B (en) * | 2019-10-28 | 2021-02-05 | 广州云从博衍智能科技有限公司 | Behavior detection method, system, equipment and machine readable medium |
CN111144203B (en) * | 2019-11-19 | 2023-06-16 | 浙江工商大学 | Pedestrian shielding detection method based on deep learning |
CN110880177A (en) * | 2019-11-26 | 2020-03-13 | 北京推想科技有限公司 | Image identification method and device |
CN110796127A (en) * | 2020-01-06 | 2020-02-14 | 四川通信科研规划设计有限责任公司 | Embryo prokaryotic detection system based on occlusion sensing, storage medium and terminal |
CN111832515A (en) * | 2020-07-21 | 2020-10-27 | 上海有个机器人有限公司 | Dense pedestrian detection method, medium, terminal and device |
CN112465799A (en) * | 2020-12-09 | 2021-03-09 | 南京甄视智能科技有限公司 | Optimization of object detector and object detection |
CN112528995B (en) * | 2020-12-22 | 2023-08-04 | 北京百度网讯科技有限公司 | Method for training target detection model, target detection method and device |
CN112906732B (en) * | 2020-12-31 | 2023-12-15 | 杭州旷云金智科技有限公司 | Target detection method, target detection device, electronic equipment and storage medium |
CN113569726B (en) * | 2021-07-27 | 2023-04-14 | 湖南大学 | Pedestrian detection method combining automatic data amplification and loss function search |
CN114550221B (en) * | 2022-04-22 | 2022-07-22 | 苏州浪潮智能科技有限公司 | Pedestrian re-identification method, device, equipment and storage medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105354547A (en) * | 2015-10-30 | 2016-02-24 | 河海大学 | Pedestrian detection method in combination of texture and color features |
CN106022237A (en) * | 2016-05-13 | 2016-10-12 | 电子科技大学 | Pedestrian detection method based on end-to-end convolutional neural network |
CN106599939A (en) * | 2016-12-30 | 2017-04-26 | 深圳市唯特视科技有限公司 | Real-time target detection method based on region convolutional neural network |
CN107358182A (en) * | 2017-06-29 | 2017-11-17 | 维拓智能科技(深圳)有限公司 | Pedestrian detection method and terminal device |
CN107403141A (en) * | 2017-07-05 | 2017-11-28 | 中国科学院自动化研究所 | Method for detecting human face and device, computer-readable recording medium, equipment |
CN107463892A (en) * | 2017-07-27 | 2017-12-12 | 北京大学深圳研究生院 | Pedestrian detection method in a kind of image of combination contextual information and multi-stage characteristics |
CN107679250A (en) * | 2017-11-01 | 2018-02-09 | 浙江工业大学 | A kind of multitask layered image search method based on depth own coding convolutional neural networks |
CN107730881A (en) * | 2017-06-13 | 2018-02-23 | 银江股份有限公司 | Traffic congestion vision detection system based on depth convolutional neural networks |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9858496B2 (en) * | 2016-01-20 | 2018-01-02 | Microsoft Technology Licensing, Llc | Object detection and classification in images |
- 2018-04-27 CN CN201810393658.1A patent/CN108898047B/en active Active
Non-Patent Citations (2)
Title |
---|
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks;Shaoqing Ren等;《arXiv:1506.01497v3》;20160107;第1-14页 * |
Pedestrian Re-identification Algorithm Based on a Novel Triplet Convolutional Neural Network; Zhu Jianqing et al.; Journal of Electronics & Information Technology; 2018-04-19; Vol. 40, No. 4; pp. 1012-1016 *
Also Published As
Publication number | Publication date |
---|---|
CN108898047A (en) | 2018-11-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108898047B (en) | Pedestrian detection method and system based on blocking and shielding perception | |
CN110084292B (en) | Target detection method based on DenseNet and multi-scale feature fusion | |
CN106960195B (en) | Crowd counting method and device based on deep learning | |
CN113160192B (en) | Visual sense-based snow pressing vehicle appearance defect detection method and device under complex background | |
CN110738101B (en) | Behavior recognition method, behavior recognition device and computer-readable storage medium | |
CN108121984B (en) | Character recognition method and device | |
CN103699905B (en) | Method and device for positioning license plate | |
CN108830188A (en) | Vehicle checking method based on deep learning | |
CN110781836A (en) | Human body recognition method and device, computer equipment and storage medium | |
CN109977997B (en) | Image target detection and segmentation method based on convolutional neural network rapid robustness | |
CN104615986B (en) | The method that pedestrian detection is carried out to the video image of scene changes using multi-detector | |
CN111046856B (en) | Parallel pose tracking and map creating method based on dynamic and static feature extraction | |
CN107507170A (en) | A kind of airfield runway crack detection method based on multi-scale image information fusion | |
CN106780727B (en) | Vehicle head detection model reconstruction method and device | |
CN108537286A (en) | A kind of accurate recognition methods of complex target based on key area detection | |
CN111126393A (en) | Vehicle appearance refitting judgment method and device, computer equipment and storage medium | |
CN104463240B (en) | A kind of instrument localization method and device | |
CN108229524A (en) | A kind of chimney and condensing tower detection method based on remote sensing images | |
CN108009556A (en) | A kind of floater in river detection method based on fixed point graphical analysis | |
CN114821102A (en) | Intensive citrus quantity detection method, equipment, storage medium and device | |
CN108133235A (en) | A kind of pedestrian detection method based on neural network Analysis On Multi-scale Features figure | |
CN105740751A (en) | Object detection and identification method and system | |
CN111008576A (en) | Pedestrian detection and model training and updating method, device and readable storage medium thereof | |
CN106845458A (en) | A kind of rapid transit label detection method of the learning machine that transfinited based on core | |
CN112541372B (en) | Difficult sample screening method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||