CN108898047B - Pedestrian detection method and system based on block occlusion perception

Info

Publication number: CN108898047B (grant of application CN108898047A)
Application number: CN201810393658.1A
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: pedestrian, frame, preset, anchor, anchor point
Legal status: Active (granted)
Inventors: 雷震 (Zhen Lei), 张士峰 (Shifeng Zhang), 庄楚斌 (Chubin Zhuang)
Assignee: Institute of Automation, Chinese Academy of Sciences
Application filed by the Institute of Automation, Chinese Academy of Sciences, with priority to CN201810393658.1A.


Classifications

    • G06V 40/103: Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06F 18/253: Fusion techniques of extracted features
    • G06F 18/29: Graphical models, e.g. Bayesian networks

Abstract

The invention belongs to the technical field of pattern recognition, and particularly relates to a pedestrian detection method and system based on block occlusion perception, aiming to solve the technical problem of low pedestrian detection accuracy caused by occluded pedestrians. To this end, the pedestrian detection method of the invention includes: acquiring, based on a pre-constructed pedestrian detection model and according to an image of the pedestrian to be detected, the image features corresponding to each preset human body detection region; performing feature fusion on the acquired image features to obtain the overall features of the corresponding pedestrians; acquiring a plurality of detection result frames of the image according to the overall features; and selecting, among the obtained detection result frames, those that meet preset screening conditions. Based on these steps, occluded pedestrians in the image to be detected can be detected effectively. The pedestrian detection system of the invention can likewise execute and realize this method.

Description

Pedestrian detection method and system based on block occlusion perception
Technical Field
The invention belongs to the technical field of pattern recognition, and particularly relates to a pedestrian detection method and system based on block occlusion perception.
Background
Pedestrian detection is a technology for automatically locating the position and size of pedestrians in an arbitrary input image. It is widely applied in computer vision and pattern recognition, for example in automatic driving, video surveillance, and biometric recognition.
In complex real-life environments, occlusion of pedestrians is one of the biggest challenges currently facing pedestrian detection; how to detect pedestrians efficiently and accurately, especially in crowded scenes, is a research hot spot and difficulty. To address this problem, most current pedestrian detection methods use a block-based model: a series of block detectors is learned, and the results of the detectors are combined for the final pedestrian localization. These methods, however, simply require each predicted detection window to be as close as possible to its pedestrian labeling box, without taking the inherent links between the windows into account. The performance of these pedestrian detectors is therefore very sensitive to the setting of the non-maximum suppression (NMS) threshold, which has a large impact on detector performance, especially in large-scale crowded scenes.
Disclosure of Invention
In order to solve the above-mentioned problems in the prior art, that is, to solve the technical problem of low pedestrian detection accuracy caused by pedestrian occlusion, an aspect of the present invention provides a pedestrian detection method based on block occlusion perception, including:
acquiring image characteristics corresponding to each preset human body detection area based on a pre-constructed pedestrian detection model and according to a to-be-detected pedestrian image;
performing feature fusion on the acquired image features to obtain the overall features of the corresponding pedestrians;
acquiring a plurality of detection result frames of the to-be-detected pedestrian image according to the overall characteristics;
selecting a detection result frame which meets a preset screening condition from the obtained detection result frames;
wherein the pedestrian detection model is a model constructed based on a Faster R-CNN neural network, and anchor boxes are associated with a high convolution layer of the Faster R-CNN neural network.
Further, before "based on a pre-constructed pedestrian detection model, and according to a to-be-detected pedestrian image, acquiring image features corresponding to each preset human body detection area", the method further includes:
performing data augmentation processing on a preset training image to obtain a training sample;
matching the anchor boxes with the pedestrian labeling boxes in the training sample, and dividing the anchor boxes into positive samples and negative samples according to the matching result, a positive sample being an anchor box matched with a pedestrian labeling box and a negative sample being an anchor box not matched with any pedestrian labeling box;
selecting a preset first number of negative samples by a hard negative mining method;
calculating a loss function value according to the positive samples and the selected negative samples, and updating the Faster R-CNN neural network according to the loss function value; network training is then performed again on the updated Faster R-CNN neural network until the updated Faster R-CNN neural network meets a preset convergence condition.
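The four training steps above can be summarized in the following sketch (PyTorch-style; `augment_sample`, `match_anchors`, `mine_hard_negatives`, and `compute_loss` are hypothetical helpers standing in for the operations just described, and the learning rate, momentum, iteration budget, and convergence test are assumed values, not taken from the patent):

```python
import torch

def train_detector(model, loader, max_iters=80000, lr=1e-3, tol=1e-4):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _, (image, gt_boxes) in zip(range(max_iters), loader):
        image, gt_boxes = augment_sample(image, gt_boxes)      # data augmentation
        pos, neg = match_anchors(model.anchors, gt_boxes)      # anchor matching
        neg = mine_hard_negatives(model, image, neg)           # hard negatives
        loss = compute_loss(model(image), pos, neg, gt_boxes)  # loss value
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                                       # update network
        if loss.item() < tol:  # preset convergence condition (assumed form)
            break
    return model
```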
Further, the Faster R-CNN neural network comprises an RPN module; before "based on a pre-constructed pedestrian detection model, and according to a to-be-detected pedestrian image, acquiring image features corresponding to each preset human body detection area", the method further includes:
based on a preset training image, performing network training on the RPN module according to the loss function shown in the following formula:

$$\mathcal{L}_{RPN}(\{p_i\},\{t_i\}) = \mathcal{L}_{cls}(\{p_i,p_i^*\}) + \alpha_1\,\mathcal{L}_{agg}(\{p_i^*\},\{t_i\},\{t_i^*\})$$

where $\mathcal{L}_{cls}$ is the pedestrian classification loss function, $\mathcal{L}_{agg}$ is the aggregation loss function, $i$ denotes the anchor box index, $p_i$ and $t_i$ respectively denote the predicted probability that the $i$-th anchor box is a pedestrian and the corresponding predicted coordinates, $p_i^*$ and $t_i^*$ respectively denote the object class label associated with the $i$-th anchor box and the corresponding calibrated coordinates, and $\alpha_1$ is a first hyperparameter;

the pedestrian classification loss function is:

$$\mathcal{L}_{cls}(\{p_i,p_i^*\}) = \frac{1}{N_{cls}}\sum_i L_{log}(p_i,p_i^*)$$

where $N_{cls}$ is the total number of anchor boxes in the RPN module classification process and $L_{log}$ is the log loss;

the aggregation loss function is:

$$\mathcal{L}_{agg} = \mathcal{L}_{reg} + \beta\,\mathcal{L}_{com}$$

where $\mathcal{L}_{reg}$ is the regression loss function, $\mathcal{L}_{com}$ is the compactness loss function, and $\beta$ is a second hyperparameter;

the regression loss function is:

$$\mathcal{L}_{reg} = \frac{1}{N_{reg}}\sum_i p_i^*\,\mathrm{smooth}_{L_1}(t_i,t_i^*)$$

where $N_{reg}$ is the total number of anchor boxes in the regression stage and $\mathrm{smooth}_{L_1}(t_i,t_i^*)$ is the smooth $L_1$ loss value of the predicted detection window $t_i$;

the compactness loss function is:

$$\mathcal{L}_{com} = \frac{1}{N_{com}}\sum_{p}\frac{1}{|\Phi_p|}\sum_{j\in\Phi_p}\mathrm{smooth}_{L_1}(t_j,t_p^*)$$

where $N_{com}$ is the total number of calibrated pedestrians that intersect an anchor box, $\Phi_p$ is the set of anchor boxes associated with the $p$-th calibrated pedestrian window, $|\Phi_p|$ is the total number of anchor boxes associated with that pedestrian, $j$ is the anchor box index, $t_j$ denotes the predicted coordinates of the $j$-th anchor box, and $t_p^*$ denotes the calibrated coordinates of the $p$-th pedestrian window.
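As a minimal illustration of how these terms combine, the following sketch assumes the three per-batch loss values (classification, regression, compactness) have already been computed; the default hyperparameter values are placeholders rather than values prescribed by the patent:

```python
def rpn_loss(l_cls, l_reg, l_com, alpha1=1.0, beta=1.0):
    # L_RPN = L_cls + alpha1 * L_agg, with L_agg = L_reg + beta * L_com;
    # alpha1 and beta are weighting hyperparameters (placeholder defaults).
    l_agg = l_reg + beta * l_com
    return l_cls + alpha1 * l_agg
```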
Further, the Faster R-CNN neural network also comprises a Fast R-CNN module; before "based on a pre-constructed pedestrian detection model, and according to a to-be-detected pedestrian image, acquiring image features corresponding to each preset human body detection area", the method further includes:
based on a preset training image, performing network training on the Fast R-CNN module according to the loss function shown in the following formula:

$$\mathcal{L}_{FRCNN}(\{p_i\},\{t_i\}) = \mathcal{L}_{cls}(\{p_i,p_i^*\}) + \alpha_3\,\mathcal{L}_{agg}(\{p_i^*\},\{t_i\},\{t_i^*\}) + \lambda\,\mathcal{L}_{occ}$$

where $\mathcal{L}_{cls}$ is the pedestrian classification loss function, $\mathcal{L}_{agg}$ is the aggregation loss function, $\mathcal{L}_{occ}$ is the occlusion handling loss function, $i$ denotes the anchor box index, $p_i$ and $t_i$ respectively denote the predicted probability that the $i$-th anchor box is a pedestrian and the corresponding predicted coordinates, $p_i^*$ and $t_i^*$ respectively denote the object class label associated with the $i$-th anchor box and the corresponding calibrated coordinates, $\alpha_3$ is a third hyperparameter, and $\lambda$ is a fourth hyperparameter;

the pedestrian classification loss function is:

$$\mathcal{L}_{cls}(\{p_i,p_i^*\}) = \frac{1}{N_{cls}}\sum_i L_{log}(p_i,p_i^*)$$

where $N_{cls}$ is the total number of anchor boxes in the classification process;

the aggregation loss function is:

$$\mathcal{L}_{agg} = \mathcal{L}_{reg} + \beta\,\mathcal{L}_{com}$$

where $\mathcal{L}_{reg}$ is the regression loss function, $\mathcal{L}_{com}$ is the compactness loss function, and $\beta$ is a second hyperparameter;

the regression loss function is:

$$\mathcal{L}_{reg} = \frac{1}{N_{reg}}\sum_i p_i^*\,\mathrm{smooth}_{L_1}(t_i,t_i^*)$$

where $N_{reg}$ is the total number of anchor boxes in the regression stage and $\mathrm{smooth}_{L_1}(t_i,t_i^*)$ is the smooth $L_1$ loss value of the predicted detection window $t_i$;

the compactness loss function is:

$$\mathcal{L}_{com} = \frac{1}{N_{com}}\sum_{p}\frac{1}{|\Phi_p|}\sum_{j\in\Phi_p}\mathrm{smooth}_{L_1}(t_j,t_p^*)$$

where $N_{com}$ is the total number of calibrated pedestrians that intersect an anchor box, $\Phi_p$ is the set of anchor boxes associated with the $p$-th calibrated pedestrian window, $|\Phi_p|$ is the total number of anchor boxes associated with that pedestrian, $j$ is the anchor box index, $t_j$ denotes the predicted coordinates of the $j$-th anchor box, and $t_p^*$ denotes the calibrated coordinates of the $p$-th pedestrian window.
Further, the step of "matching the anchor boxes with the pedestrian labeling boxes in the training sample" specifically includes:
calculating the intersection-over-union (IoU) ratio between each anchor box and each pedestrian labeling box;
for each pedestrian labeling box, selecting the anchor box with the largest IoU, and matching each selected anchor box with its corresponding pedestrian labeling box;
after removing the selected anchor boxes, judging whether the IoU between each remaining anchor box and each pedestrian labeling box is greater than a preset first threshold, and if so, matching them;
acquiring the pedestrian labeling boxes whose number of matched anchor boxes is less than a preset second number, and selecting all anchor boxes whose IoU with each such pedestrian labeling box is greater than a preset second threshold, the preset first threshold being larger than the preset second threshold;
selecting, in descending order of IoU among all the selected anchor boxes, a preset third number of anchor boxes to match with the corresponding pedestrian labeling box, where the value of the preset third number is the average number of matched anchor boxes over the pedestrian labeling boxes whose number of matches is greater than or equal to the preset second number.
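A sketch of this compensated matching strategy is given below; the IoU matrix is assumed to be precomputed, and the `enough` parameter stands in for the preset second number, whose concrete value the text leaves open:

```python
import numpy as np

def match_anchors(anchors, gt_boxes, iou, t1=0.4, t2=0.1, enough=2):
    """iou: (num_anchors, num_gt) precomputed IoU matrix.
    Returns a dict: gt index -> list of matched anchor indices."""
    # stage 1: each labeling box gets its highest-IoU anchor
    matches = {g: [iou[:, g].argmax()] for g in range(len(gt_boxes))}
    taken = {a for v in matches.values() for a in v}
    # stage 2: remaining anchors with IoU above the first threshold
    for a in range(len(anchors)):
        if a in taken:
            continue
        g = iou[a].argmax()
        if iou[a, g] > t1:
            matches[g].append(a)
    # average match count N_p over sufficiently matched boxes
    well_matched = [len(v) for v in matches.values() if len(v) >= enough]
    n_p = int(np.mean(well_matched)) if well_matched else 1
    # stage 3: compensation for under-matched boxes, by descending IoU > t2
    for g, v in matches.items():
        if len(v) < enough:
            order = np.argsort(-iou[:, g])
            extra = [a for a in order if iou[a, g] > t2][:n_p]
            matches[g] = sorted(set(v) | set(extra))
    return matches
```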
In another aspect of the present invention, a pedestrian detection system based on block occlusion perception is further provided, including:
the image characteristic acquisition module is configured to acquire image characteristics corresponding to each preset human body detection area based on a pre-constructed pedestrian detection model and according to a to-be-detected pedestrian image;
the image feature fusion module is configured to perform feature fusion on the image features acquired by the image feature acquisition module to obtain the overall features of the corresponding pedestrians;
the detection result frame acquisition module is configured to acquire a plurality of detection result frames of the to-be-detected pedestrian image according to the overall characteristics obtained by the image characteristic fusion module;
a detection result frame screening module configured to select a detection result frame satisfying a preset screening condition among the plurality of obtained detection result frames;
wherein the pedestrian detection model is a model constructed based on a Faster R-CNN neural network, and anchor boxes are associated with a high convolution layer of the Faster R-CNN neural network.
Further, the pedestrian detection system further comprises a model training module, the model training module comprising:
the training image processing unit is configured to perform data augmentation processing on the preset training image to obtain a training sample;
the positive and negative sample dividing unit is configured to match the anchor boxes with the pedestrian labeling boxes in the training sample and to divide the anchor boxes into positive samples and negative samples according to the matching result, a positive sample being an anchor box matched with a pedestrian labeling box and a negative sample being an anchor box not matched with any pedestrian labeling box;
the negative sample screening unit is configured to select a preset first number of negative samples by a hard negative mining method;
the network updating unit is configured to calculate a loss function value according to the positive samples and the selected negative samples, to update the Faster R-CNN neural network according to the loss function value, and to perform network training again on the updated Faster R-CNN neural network until the updated Faster R-CNN neural network meets the preset convergence condition.
Further, the Faster R-CNN neural network comprises an RPN module; in this case, the model training module is further configured to perform the following operations:
based on a preset training image, performing network training on the RPN module according to the loss function shown in the following formula:

$$\mathcal{L}_{RPN}(\{p_i\},\{t_i\}) = \mathcal{L}_{cls}(\{p_i,p_i^*\}) + \alpha_1\,\mathcal{L}_{agg}(\{p_i^*\},\{t_i\},\{t_i^*\})$$

where $\mathcal{L}_{cls}$ is the pedestrian classification loss function, $\mathcal{L}_{agg}$ is the aggregation loss function, $i$ denotes the anchor box index, $p_i$ and $t_i$ respectively denote the predicted probability that the $i$-th anchor box is a pedestrian and the corresponding predicted coordinates, $p_i^*$ and $t_i^*$ respectively denote the object class label associated with the $i$-th anchor box and the corresponding calibrated coordinates, and $\alpha_1$ is a first hyperparameter;

the pedestrian classification loss function is:

$$\mathcal{L}_{cls}(\{p_i,p_i^*\}) = \frac{1}{N_{cls}}\sum_i L_{log}(p_i,p_i^*)$$

where $N_{cls}$ is the total number of anchor boxes in the RPN module classification process and $L_{log}$ is the log loss;

the aggregation loss function is:

$$\mathcal{L}_{agg} = \mathcal{L}_{reg} + \beta\,\mathcal{L}_{com}$$

where $\mathcal{L}_{reg}$ is the regression loss function, $\mathcal{L}_{com}$ is the compactness loss function, and $\beta$ is a second hyperparameter;

the regression loss function is:

$$\mathcal{L}_{reg} = \frac{1}{N_{reg}}\sum_i p_i^*\,\mathrm{smooth}_{L_1}(t_i,t_i^*)$$

where $N_{reg}$ is the total number of anchor boxes in the regression stage and $\mathrm{smooth}_{L_1}(t_i,t_i^*)$ is the smooth $L_1$ loss value of the predicted detection window $t_i$;

the compactness loss function is:

$$\mathcal{L}_{com} = \frac{1}{N_{com}}\sum_{p}\frac{1}{|\Phi_p|}\sum_{j\in\Phi_p}\mathrm{smooth}_{L_1}(t_j,t_p^*)$$

where $N_{com}$ is the total number of calibrated pedestrians that intersect an anchor box, $\Phi_p$ is the set of anchor boxes associated with the $p$-th calibrated pedestrian window, $|\Phi_p|$ is the total number of anchor boxes associated with that pedestrian, $j$ is the anchor box index, $t_j$ denotes the predicted coordinates of the $j$-th anchor box, and $t_p^*$ denotes the calibrated coordinates of the $p$-th pedestrian window.
Further, the Faster R-CNN neural network comprises a Fast R-CNN module; in this case, the model training module is further configured to perform the following operations:
based on a preset training image, performing network training on the Fast R-CNN module according to the loss function shown in the following formula:

$$\mathcal{L}_{FRCNN}(\{p_i\},\{t_i\}) = \mathcal{L}_{cls}(\{p_i,p_i^*\}) + \alpha_3\,\mathcal{L}_{agg}(\{p_i^*\},\{t_i\},\{t_i^*\}) + \lambda\,\mathcal{L}_{occ}$$

where $\mathcal{L}_{cls}$ is the pedestrian classification loss function, $\mathcal{L}_{agg}$ is the aggregation loss function, $\mathcal{L}_{occ}$ is the occlusion handling loss function, $i$ denotes the anchor box index, $p_i$ and $t_i$ respectively denote the predicted probability that the $i$-th anchor box is a pedestrian and the corresponding predicted coordinates, $p_i^*$ and $t_i^*$ respectively denote the object class label associated with the $i$-th anchor box and the corresponding calibrated coordinates, $\alpha_3$ is a third hyperparameter, and $\lambda$ is a fourth hyperparameter;

the pedestrian classification loss function is:

$$\mathcal{L}_{cls}(\{p_i,p_i^*\}) = \frac{1}{N_{cls}}\sum_i L_{log}(p_i,p_i^*)$$

where $N_{cls}$ is the total number of anchor boxes in the classification process;

the aggregation loss function is:

$$\mathcal{L}_{agg} = \mathcal{L}_{reg} + \beta\,\mathcal{L}_{com}$$

where $\mathcal{L}_{reg}$ is the regression loss function, $\mathcal{L}_{com}$ is the compactness loss function, and $\beta$ is a second hyperparameter;

the regression loss function is:

$$\mathcal{L}_{reg} = \frac{1}{N_{reg}}\sum_i p_i^*\,\mathrm{smooth}_{L_1}(t_i,t_i^*)$$

where $N_{reg}$ is the total number of anchor boxes in the regression stage and $\mathrm{smooth}_{L_1}(t_i,t_i^*)$ is the smooth $L_1$ loss value of the predicted detection window $t_i$;

the compactness loss function is:

$$\mathcal{L}_{com} = \frac{1}{N_{com}}\sum_{p}\frac{1}{|\Phi_p|}\sum_{j\in\Phi_p}\mathrm{smooth}_{L_1}(t_j,t_p^*)$$

where $N_{com}$ is the total number of calibrated pedestrians that intersect an anchor box, $\Phi_p$ is the set of anchor boxes associated with the $p$-th calibrated pedestrian window, $|\Phi_p|$ is the total number of anchor boxes associated with that pedestrian, $j$ is the anchor box index, $t_j$ denotes the predicted coordinates of the $j$-th anchor box, and $t_p^*$ denotes the calibrated coordinates of the $p$-th pedestrian window.
Further, the positive and negative sample division unit includes:
an intersection-over-union (IoU) calculation subunit configured to calculate the IoU ratio between each anchor box and each pedestrian labeling box;
a first matching subunit configured to select, for each pedestrian labeling box, the anchor box with the largest IoU, and to match each selected anchor box with its corresponding pedestrian labeling box;
a second matching subunit configured to judge, after the selected anchor boxes are removed, whether the IoU between each remaining anchor box and each pedestrian labeling box is greater than a preset first threshold, and if so, to match them;
a third matching subunit configured to acquire the pedestrian labeling boxes whose number of matched anchor boxes is less than a preset second number, and to select all anchor boxes whose IoU with each such pedestrian labeling box is greater than a preset second threshold, the preset first threshold being larger than the preset second threshold;
a fourth matching subunit configured to select, in descending order of IoU among all the selected anchor boxes, a preset third number of anchor boxes to match with the corresponding pedestrian labeling box, the value of the preset third number being the average number of matched anchor boxes over the pedestrian labeling boxes whose number of matches is greater than or equal to the preset second number.
Compared with the closest prior art, the technical scheme at least has the following beneficial effects:
1. According to the pedestrian detection method based on block occlusion perception provided by the invention, the image features of a pedestrian are obtained block by block according to the preset human body detection regions, using the pedestrian detection model built on the Faster R-CNN neural network, and the obtained image features are then fused, so that occluded pedestrians in the image to be detected can be effectively detected.
2. The high convolution layer in the pedestrian detection model provided by the invention is associated with the anchor boxes; since the high convolution layer can extract deeper semantic information, the pedestrian detection precision is improved.
3. The pedestrian detection system based on block occlusion perception of the invention can implement the above pedestrian detection method based on block occlusion perception.
Drawings
FIG. 1 is a schematic diagram of the main steps of the pedestrian detection method based on block occlusion perception in an embodiment of the present invention;
FIG. 2 is a schematic diagram of the main structure of the block occlusion-aware RoI pooling unit in an embodiment of the present invention;
FIG. 3 is a schematic diagram of the main structure of the block occlusion-aware occlusion processing unit in an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of the pedestrian detection system based on block occlusion perception in an embodiment of the present invention.
Detailed Description
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.
Pedestrians are easily occluded in large-scale crowded environments, which makes effective pedestrian detection difficult. Based on this, the pedestrian detection method based on block occlusion perception of the invention can detect pedestrians efficiently and accurately in complex environments, and still obtains satisfactory detection results in the presence of large-scale occlusion.
The following describes a pedestrian detection method based on block occlusion perception according to the present invention with reference to the accompanying drawings.
Fig. 1 exemplarily shows an implementation flow of a pedestrian detection method based on block occlusion perception in this embodiment, and as shown in fig. 1, the pedestrian detection method based on block occlusion perception in this embodiment may include the following steps:
step S101: and acquiring image characteristics corresponding to each preset human body detection area based on a pre-constructed pedestrian detection model and according to the image of the pedestrian to be detected.
Step S102: and carrying out feature fusion on the acquired image features to obtain the overall features of the corresponding pedestrians.
Step S103: and acquiring a plurality of detection result frames of the pedestrian image to be detected according to the overall characteristics.
Step S104: selecting a detection result frame which meets a preset screening condition from the obtained multiple detection result frames;
specifically, the pedestrian detection model in the present embodiment is a model constructed based on the Faster R-CNN neural network, and an anchor point frame is associated in the high convolution layer of the Faster R-CNN neural network. The dimensions and associated layers of the anchor block of the pedestrian detection model, as well as the basic network framework of the design, are described in detail below.
In the design of the size of the anchor point frame and the related layer, the abundance degree of semantic information and spatial information of feature maps extracted by different convolution layers is different, and considering that under the condition of large-scale shielding, the feature information of a target pedestrian is difficult to extract due to the shielding, and more semantic information is required to support. In practical application, the pedestrian target with the extremely small size as the human face detection does not exist, and the requirement on the spatial information is greatly reduced. The semantic information contained in the features of the bottom layer of the shallow neural network is shallow, and the identification capability of the object with a larger scale is insufficient due to the small receptive field; moreover, because the extracted shallow features lack enough semantic information, under the interference of shielding and the like, the performance of the resolution device is greatly reduced and the robustness is insufficient due to the fact that the extraction of the features is more difficult; and the deep neural network layer can extract deeper semantic information and global information, and although part of spatial information is lost, the characteristics of the deep convolutional layer can effectively overcome the problem of insufficient feature extraction caused by occlusion in a complex environment, particularly under the condition of occlusion.
Therefore, in this embodiment, the top convolutional layer (i.e., the high convolutional layer) is selected to be associated with the anchor frame. For example, if the VGG-16 model is selected as the basic architecture and the selected high-level convolution layer is conv5_3, then a pedestrian image to be detected with a size of 1000 × 600 is obtained, and the corresponding feature map size is 60 × 40. In order to realize the detection of pedestrians with different sizes in the image, for each position of the feature map, anchor point frames with 11 different sizes are densely paved: the areas are respectively (32)2,432,582,782,1062,1442,1942,2612,3532,4772,6432) The width-to-height ratio of all anchor points is 0.41 (human body approximate scale), so as to realize pedestrian detection of different sizes in the image.
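The following sketch tiles these 11 anchor scales over the conv5_3 feature map; the stride of 16 is the usual VGG-16 value for conv5_3 and is assumed here rather than stated in the patent:

```python
import numpy as np

AREAS = np.array([32, 43, 58, 78, 106, 144, 194, 261, 353, 477, 643]) ** 2
RATIO = 0.41  # width / height, the approximate scale of a human body

def make_anchors(feat_w=60, feat_h=40, stride=16):
    ws = np.sqrt(AREAS * RATIO)   # width  = sqrt(area * ratio)
    hs = ws / RATIO               # height = width / ratio
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            for w, h in zip(ws, hs):
                anchors.append([cx - w / 2, cy - h / 2,
                                cx + w / 2, cy + h / 2])
    return np.array(anchors)      # (feat_h * feat_w * 11, 4) corner boxes
```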
For the problems of false detection and missed detection caused by occlusion between pedestrians, in the network framework of the pedestrian detection model of this embodiment, the RoI pooling layer in the original Fast R-CNN module is replaced by a block occlusion-aware RoI pooling unit, which integrates the structural information from different positions of the human body, feeds the integrated information to the Fast R-CNN module, and estimates the occlusion state through a small neural network.
Referring to fig. 2, fig. 2 illustrates the main structure of the block occlusion-aware RoI pooling unit in this embodiment. As shown in fig. 2, the body region is first divided into five parts, and for each part the RoI pooling layer samples the features into a small feature map of fixed size (7 in both width and height). Then, based on the obtained feature maps of the different human body regions, the visibility of each part is estimated using an occlusion processing unit. Referring to fig. 3, fig. 3 is a schematic diagram of the main structure of the block occlusion-aware occlusion processing unit in this embodiment; as shown in fig. 3, the occlusion processing unit consists of three convolution layers followed by a softmax layer, and its parameters are trained with a log loss function. Specifically, let $c_{i,j}$ denote the $j$-th part of the $i$-th candidate window, $o_{i,j}$ the corresponding predicted visibility score, and $o_{i,j}^*$ the true visibility score of the corresponding calibration. If $c_{i,j}$ is more than half visible, then $o_{i,j}^*=1$; otherwise it is 0. Mathematically, if the ratio of the intersection between $c_{i,j}$ and the corresponding calibrated part region to the area of $c_{i,j}$ is greater than or equal to 0.5, then $o_{i,j}^*=1$; otherwise it is 0. Formula (1) gives the visibility score of each part on which the occlusion processing unit is based:

$$o_{i,j}^* = \begin{cases}1, & \Omega(c_{i,j}\cap c_{i,j}^*)\,/\,\Omega(c_{i,j}) \ge \theta\\ 0, & \text{otherwise}\end{cases}\tag{1}$$

where $\Omega(\cdot)$ is an area calculation function, $\Omega(c_{i,j})$ is the area of $c_{i,j}$, $c_{i,j}^*$ is the calibrated part region corresponding to $c_{i,j}$, and $\theta$ is the set intersection-ratio threshold, here set to 0.5, indicating that more than half of the part is visible. Accordingly, this embodiment defines the loss function of the occlusion processing unit as a log loss over the predicted part visibilities, as in equation (2):

$$\mathcal{L}_{occ} = -\frac{1}{N_{occ}}\sum_i\sum_{j=1}^{5}\left[o_{i,j}^*\log o_{i,j} + (1-o_{i,j}^*)\log(1-o_{i,j})\right]\tag{2}$$

where $i$ indexes the candidate windows associated with calibrated pedestrians, $j$ indexes the five human body parts, and $N_{occ}$ is the number of such candidate windows.
Then, a dot-product operation is performed between the feature map of each human body part and the corresponding predicted visibility to obtain the final features, whose dimension is 512 × 7 × 7. Finally, the feature maps of the five human body parts are added element-wise and used for the classification and window regression of the Fast R-CNN module.
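A sketch of this block occlusion-aware RoI pooling unit in PyTorch follows; the exact geometry of the five body parts is not specified above, so the part layout used here is an assumption, as are the channel widths inside the occlusion processing unit:

```python
import torch
import torch.nn as nn
from torchvision.ops import roi_pool

class OcclusionAwareRoIPool(nn.Module):
    """Pool five body parts, weight each part's features by its predicted
    visibility, then sum element-wise (sketch, not the patented design)."""

    def __init__(self, channels=512, out_size=7, spatial_scale=1.0 / 16):
        super().__init__()
        self.out_size, self.scale = out_size, spatial_scale
        # occlusion processing unit: three convolution layers + softmax
        self.occ = nn.Sequential(
            nn.Conv2d(channels, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 2, out_size),  # 7x7 -> 1x1 logits per window
        )

    def split_parts(self, rois):
        # rois: (N, 5) rows of (batch_idx, x1, y1, x2, y2); the five-part
        # layout below (head, torso halves, leg halves) is an assumption
        b, x1, y1, x2, y2 = rois.unbind(dim=1)
        w, h = x2 - x1, y2 - y1
        parts = [
            (x1, y1, x2, y1 + 0.3 * h),                      # head
            (x1, y1 + 0.2 * h, x1 + 0.5 * w, y1 + 0.6 * h),  # left torso
            (x1 + 0.5 * w, y1 + 0.2 * h, x2, y1 + 0.6 * h),  # right torso
            (x1, y1 + 0.5 * h, x1 + 0.5 * w, y2),            # left leg
            (x1 + 0.5 * w, y1 + 0.5 * h, x2, y2),            # right leg
        ]
        return [torch.stack([b, *p], dim=1) for p in parts]

    def forward(self, feat, rois):
        fused = 0.0
        for part_rois in self.split_parts(rois):
            f = roi_pool(feat, part_rois, self.out_size, self.scale)
            vis = torch.softmax(self.occ(f).flatten(1), dim=1)[:, 1]
            fused = fused + f * vis.view(-1, 1, 1, 1)  # visibility weighting
        return fused  # (N, channels, 7, 7): element-wise sum of five parts
```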
Further, in the pedestrian detection method shown in fig. 1 in this embodiment, the pedestrian detection model may be subjected to network training according to a preset training image, so as to obtain the pedestrian detection model meeting a preset convergence condition.
Specifically, in this embodiment, the network training may be performed on the pedestrian detection model according to the following steps:
step S201: and carrying out data augmentation processing on the preset training image to obtain a training sample.
In this embodiment, the data augmentation of the training images may include a color jitter operation, a random cropping operation, a horizontal flipping operation, and a scale transformation operation:
First, a color jitter operation is performed on the training image: parameters such as brightness, contrast, and saturation of the training image are each randomly adjusted with a probability of 0.6.
Second, a random cropping operation is performed on the color-jittered training image: five square sub-images are randomly cropped, where one sub-image is the largest square sub-image in the training image and the side lengths of the other four sub-images are 0.4 to 1.0 times the short side of the training image; one of the five sub-images is then randomly selected as the final training sample.
Third, a horizontal flipping operation is performed on the selected training sample: the sample is flipped horizontally at random with a probability of 0.6.
Finally, a scale transformation operation is performed on the horizontally flipped training sample: the training sample is rescaled to a 1000 × 600 image.
In this embodiment, performing the color jitter, random cropping, horizontal flipping, and scale transformation operations on the training image in sequence increases the amount of data without changing the image categories, which improves the generalization ability of the model.
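A sketch of this four-step augmentation pipeline (operating on the image only; the corresponding remapping of the pedestrian labeling boxes is omitted, and the jitter strengths are assumed values):

```python
import random
from PIL import Image
from torchvision import transforms
from torchvision.transforms import functional as TF

def augment(img: Image.Image) -> Image.Image:
    # 1. color jitter with probability 0.6 (jitter strengths assumed)
    if random.random() < 0.6:
        img = transforms.ColorJitter(brightness=0.4, contrast=0.4,
                                     saturation=0.4)(img)
    # 2. random square crop: the largest square, or one of four squares
    #    whose side is 0.4-1.0x the short side, chosen uniformly
    w, h = img.size
    short = min(w, h)
    sides = [short] + [int(short * random.uniform(0.4, 1.0)) for _ in range(4)]
    side = random.choice(sides)
    x0, y0 = random.randint(0, w - side), random.randint(0, h - side)
    img = img.crop((x0, y0, x0 + side, y0 + side))
    # 3. horizontal flip with probability 0.6
    if random.random() < 0.6:
        img = TF.hflip(img)
    # 4. rescale to 1000 x 600
    return img.resize((1000, 600))
```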
Step S202: matching the anchor boxes with the pedestrian labeling boxes in the training sample, and dividing the anchor boxes into positive samples and negative samples according to the matching result, where a positive sample is an anchor box matched with a pedestrian labeling box and a negative sample is an anchor box not matched with any pedestrian labeling box.
Specifically, in order to solve the problem that some pedestrians cannot be matched with enough anchor boxes under the existing matching strategy, the invention adopts a compensation strategy for the labeling boxes. The steps of matching the anchor boxes with the pedestrian labeling boxes in the training sample are as follows:
First, the intersection-over-union (IoU) ratio between each anchor box and each pedestrian labeling box is calculated.
Second, for each pedestrian labeling box, the anchor box with the largest IoU is selected, and each selected anchor box is matched with its corresponding pedestrian labeling box.
Third, after removing the selected anchor boxes, it is judged whether the IoU between each remaining anchor box and each pedestrian labeling box is greater than a preset first threshold; if so, they are matched. In this embodiment the first threshold is 0.4. It should be noted that the average number of matched anchor boxes over all pedestrian labeling boxes matched with enough anchor boxes is denoted $N_p$.
Next, the pedestrian labeling boxes whose number of matched anchor boxes is less than the preset second number are acquired, and all anchor boxes whose IoU with each such pedestrian labeling box is greater than a preset second threshold are selected, the preset first threshold being larger than the preset second threshold. In this embodiment, this step is a scale-compensation operation for the under-matched boxes: the second threshold is set to 0.1, and for every pedestrian labeling box not matched with enough anchor boxes, all anchor boxes whose IoU with it is greater than 0.1 are selected. Equation (3) shows the sequence of all anchor boxes with IoU greater than 0.1:

$$[a_1, a_2, a_3, \dots, a_N]\tag{3}$$

where each $a_N$ includes the location and size of the anchor box.
Finally, a preset third number of anchor boxes are selected, in descending order of IoU among all the selected anchor boxes, to match with the corresponding pedestrian labeling box. In this embodiment, the selected anchor boxes are sorted in descending order of their IoU with the pedestrian labeling box, as in equation (4):

$$[A_1, A_2, A_3, \dots, A_N]\tag{4}$$

and the first $N_p$ anchor boxes are taken as the anchor boxes matched with this pedestrian labeling box, where $N_p$ defaults to the average matching number of the pedestrian labeling boxes and is an adjustable parameter.
The value of the preset third number is thus the average number of matched anchor boxes over the pedestrian labeling boxes whose number of matches is greater than or equal to the preset second number.
Step S203: selecting a preset first number of negative samples by a hard negative mining method.
Specifically, for all negative samples, the error value brought by their classification prediction is calculated; the negative samples are sorted in descending order of error value; the batch of negative samples with the largest error values is selected as the negative samples of the training data set, and all remaining negative samples are discarded, so that the quantity ratio of positive to negative samples is 1:3. The positive and negative samples thus have a relatively balanced quantitative relation, which benefits smooth network training.
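A sketch of this hard negative mining step, assuming the per-anchor classification losses and a mask of negative anchors are already available:

```python
import torch

def mine_hard_negatives(cls_loss, neg_mask, num_pos):
    """Keep the negatives with the largest classification error so that
    positives:negatives = 1:3; cls_loss is the per-anchor loss vector."""
    scores = cls_loss.clone()
    scores[~neg_mask] = -1.0                           # exclude non-negatives
    num_neg = min(3 * num_pos, int(neg_mask.sum()))
    return scores.argsort(descending=True)[:num_neg]   # indices of kept negatives
```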
Step S204: calculating a loss function value according to the positive samples and the selected negative samples, and updating the Faster R-CNN neural network according to the loss function value; network training is then performed again on the updated Faster R-CNN neural network until the updated Faster R-CNN neural network meets the preset convergence condition.
Specifically, to reduce the false detections caused by mutual occlusion between adjacent pedestrians, each candidate window is required to be closer to the calibrated location of the pedestrian associated with it in the data set. The traditional Faster R-CNN detection framework consists of two parts: a region proposal network (RPN) module and a Fast R-CNN module. The former generates high-quality candidate windows, while the latter performs object classification and regression on these candidate windows to better locate the objects.
For the false detection problem caused by occlusion from adjacent pedestrians, this embodiment adjusts and redefines the loss function of the region proposal network (RPN) module, as shown in formula (5):

$$\mathcal{L}_{RPN}(\{p_i\},\{t_i\}) = \mathcal{L}_{cls}(\{p_i,p_i^*\}) + \alpha_1\,\mathcal{L}_{agg}(\{p_i^*\},\{t_i\},\{t_i^*\})\tag{5}$$

where $i$ is the anchor box index, $p_i$ and $t_i$ are the predicted probability that the $i$-th anchor box is a pedestrian and the corresponding predicted coordinates, $p_i^*$ and $t_i^*$ are the object class label associated with the $i$-th anchor box and the corresponding calibrated coordinates (a binary problem here: the pedestrian class is 1 and the background class is 0), and $\alpha_1$ is the first hyperparameter, introduced to adjust the weights of the two loss functions; $\mathcal{L}_{cls}$ is the pedestrian classification loss function and $\mathcal{L}_{agg}$ is the aggregation loss function.
The classification loss is estimated using a log loss function, defined as equation (6):

$$\mathcal{L}_{cls}(\{p_i,p_i^*\}) = \frac{1}{N_{cls}}\sum_i L_{log}(p_i,p_i^*)\tag{6}$$

where $N_{cls}$ is the total number of anchor boxes in the classification process.
In order to enable the RPN module to generate correct candidate windows more efficiently, the invention introduces a new loss function into the RPN module, called the aggregation loss function (aggregation loss). This loss not only makes the candidate windows locate the annotated positions of their associated pedestrians more accurately, but also reduces the distance between candidate windows associated with the same pedestrian. The aggregation loss function is defined in equation (7):

$$\mathcal{L}_{agg} = \mathcal{L}_{reg} + \beta\,\mathcal{L}_{com}\tag{7}$$

where $\mathcal{L}_{reg}$ is the regression loss function, which constrains each candidate window to be closer to its target calibration window; $\mathcal{L}_{com}$ is the compactness loss function, which constrains the candidate windows to locate the position of the target calibrated object more compactly; and $\beta$ is a second hyperparameter for adjusting the weights of the two loss functions.
The invention uses a smooth $L_1$ loss to define the regression loss function $\mathcal{L}_{reg}$, which measures the accuracy of the predicted detection windows, as shown in equation (8):

$$\mathcal{L}_{reg} = \frac{1}{N_{reg}}\sum_i p_i^*\,\mathrm{smooth}_{L_1}(t_i,t_i^*)\tag{8}$$

where $N_{reg}$ is the total number of anchor boxes in the regression stage and $\mathrm{smooth}_{L_1}(t_i,t_i^*)$ is the smooth $L_1$ loss value of the predicted detection window $t_i$.
The compactness loss function $\mathcal{L}_{com}$ evaluates the confidence of all candidate windows associated with the same labeled pedestrian. Specifically, let the calibrated pedestrians that have anchor boxes associated with them (i.e., those intersected by at least one anchor box) be indexed $p=1,\dots,N_{com}$, with calibrated coordinates $t_p^*$, and let $\{\Phi_1,\dots,\Phi_p,\dots\}$ be the corresponding sequence of associated anchor box sets, i.e., every anchor box whose index lies in $\Phi_p$ is associated with the $p$-th calibrated pedestrian. The smooth $L_1$ loss is used to measure the error between the predicted position information of each anchor box and the actually calibrated position information, describing the compactness between the predicted detection windows and the actual calibration window. The specific form of the compactness loss function is shown in equation (9):

$$\mathcal{L}_{com} = \frac{1}{N_{com}}\sum_{p}\frac{1}{|\Phi_p|}\sum_{j\in\Phi_p}\mathrm{smooth}_{L_1}(t_j,t_p^*)\tag{9}$$

where $N_{com}$ is the total number of pedestrians intersected by anchor boxes, $|\Phi_p|$ is the total number of anchor boxes associated with the $p$-th calibrated pedestrian, $j$ is the anchor box index, and $t_j$ denotes the predicted coordinates of the $j$-th anchor box.
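A sketch of the compactness loss of equation (9), assuming the anchor-to-pedestrian association sets Φ_p have already been built during matching:

```python
import torch
import torch.nn.functional as F

def compactness_loss(pred, gt, assoc):
    """Equation (9): for each calibrated pedestrian p, average the smooth-L1
    error of all anchors j in Phi_p against t_p*, then average over the
    N_com pedestrians.
    pred:  (A, 4) predicted regression targets t_j
    gt:    (P, 4) calibrated targets t_p*
    assoc: list of LongTensors; assoc[p] holds the anchor indices in Phi_p
    """
    terms = []
    for p, idx in enumerate(assoc):
        if idx.numel() == 0:   # pedestrian intersects no anchor box
            continue
        target = gt[p].expand(idx.numel(), 4)
        terms.append(F.smooth_l1_loss(pred[idx], target, reduction='mean'))
    return torch.stack(terms).mean() if terms else pred.sum() * 0.0
```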
Meanwhile, in order to further improve the accuracy of window regression and strengthen the model's pedestrian detection capability in occluded environments, the invention also introduces the aggregation loss term into the loss function of the Fast R-CNN module, as shown in formula (10):

$$\mathcal{L}_{FRCNN}(\{p_i\},\{t_i\}) = \mathcal{L}_{cls}(\{p_i,p_i^*\}) + \alpha_3\,\mathcal{L}_{agg}(\{p_i^*\},\{t_i\},\{t_i^*\}) + \lambda\,\mathcal{L}_{occ}\tag{10}$$

where $\alpha_3$ is the third hyperparameter and $\lambda$ is the fourth hyperparameter; the classification loss function $\mathcal{L}_{cls}$ and the aggregation loss function $\mathcal{L}_{agg}$ are defined as in the RPN network, and $\mathcal{L}_{occ}$ is the occlusion handling loss function shown in equation (2). By introducing the aggregation loss term into both the RPN module and the Fast R-CNN module of the pedestrian detector, the localization ability of the detection windows is enhanced and the overall detection performance is improved.
Then, the network parameters are updated iteratively by stochastic gradient descent with back-propagated errors until training converges or the set maximum number of training iterations is reached, yielding the final network model parameters.
In the testing stage, the test image is input into the trained network model for pedestrian detection, and detection result frames are output. Since the number of output detection frames is very large, most detection frames are first screened out with a confidence threshold T of 0.05, and the top $N_a = 400$ detection frames are then selected by confidence. Next, a non-maximum suppression method removes duplicate detection frames, and the top $N_b = 200$ detection frames by confidence are taken as the final detection result.
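A sketch of this test-stage filtering with torchvision's NMS; the NMS IoU threshold of 0.5 is an assumed value, since the text does not specify it:

```python
import torch
from torchvision.ops import nms

def filter_detections(boxes, scores, t=0.05, n_a=400, n_b=200, iou_th=0.5):
    keep = scores > t                              # confidence threshold T
    boxes, scores = boxes[keep], scores[keep]
    order = scores.argsort(descending=True)[:n_a]  # top-Na by confidence
    boxes, scores = boxes[order], scores[order]
    keep = nms(boxes, scores, iou_th)[:n_b]        # NMS, then top-Nb final
    return boxes[keep], scores[keep]
```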
Aiming at the pedestrian detection problem in large-scale occlusion environments, the invention improves the accuracy of pedestrian detection by introducing an occlusion-aware R-CNN model. Specifically, the invention designs a new aggregation loss function to reduce the false detections caused by overlap between adjacent pedestrians and to make the candidate windows locate the target pedestrian positions more compactly and accurately. Meanwhile, to address the detection problems caused by occlusion, the invention designs a block occlusion-aware RoI pooling unit to replace the RoI pooling layer used in the traditional Fast R-CNN; this pooling unit reduces the influence of occlusion on pedestrian detection by integrating the predicted visibility values of different parts of the human body. When training the convolutional neural network, the pedestrian labeling boxes need to be matched with the anchor boxes, but under the existing matching strategy, pedestrian labeling boxes of certain scales cannot be matched with enough anchor boxes; this is addressed by the scale-compensation matching strategy described above. Finally, the invention realizes the pedestrian detection method based on block occlusion perception, which can detect pedestrians in images efficiently and accurately, and in particular significantly improves the pedestrian detection capability in large-scale occlusion environments.
The invention further provides a pedestrian detection system based on block occlusion perception, and referring to fig. 4, fig. 4 exemplarily shows a schematic diagram of a pedestrian detection system based on block occlusion perception in the embodiment, and as shown in fig. 4, the system includes:
the image characteristic acquisition module is configured to acquire image characteristics corresponding to each preset human body detection area based on a pre-constructed pedestrian detection model and according to a to-be-detected pedestrian image;
the image feature fusion module is configured to perform feature fusion on the image features acquired by the image feature acquisition module to obtain the overall features of the corresponding pedestrians;
the detection result frame acquisition module is configured to acquire a plurality of detection result frames of the image of the pedestrian to be detected according to the overall characteristics acquired by the image characteristic fusion module;
a detection result frame screening module configured to select a detection result frame satisfying a preset screening condition among the plurality of obtained detection result frames;
wherein the pedestrian detection model is a model constructed based on a Faster R-CNN neural network, and anchor boxes are associated with a high convolution layer of the Faster R-CNN neural network.
In a preferred embodiment of the above pedestrian detection system based on blocking occlusion perception, the pedestrian detection system further includes a model training module, and the model training module includes:
the training image processing unit is configured to perform data augmentation processing on a preset training image to obtain a training sample;
the positive and negative sample dividing unit is configured to match the anchor boxes with the pedestrian labeling boxes in the training sample and to divide the anchor boxes into positive samples and negative samples according to the matching result, a positive sample being an anchor box matched with a pedestrian labeling box and a negative sample being an anchor box not matched with any pedestrian labeling box;
the negative sample screening unit is configured to select a preset first number of negative samples by a hard negative mining method;
the network updating unit is configured to calculate a loss function value according to the positive samples and the selected negative samples, to update the Faster R-CNN neural network according to the loss function value, and to perform network training again on the updated Faster R-CNN neural network until the updated Faster R-CNN neural network meets the preset convergence condition.
In the above preferred embodiment of the pedestrian detection system based on block occlusion perception, the Faster R-CNN neural network comprises an RPN module; in this case, the model training module is further configured to perform the following operations:
based on a preset training image, performing network training on the RPN module according to the loss function shown in formula (11):

$$\mathcal{L}_{RPN}(\{p_i\},\{t_i\}) = \mathcal{L}_{cls}(\{p_i,p_i^*\}) + \alpha_1\,\mathcal{L}_{agg}(\{p_i^*\},\{t_i\},\{t_i^*\})\tag{11}$$

where $\mathcal{L}_{cls}$ is the pedestrian classification loss function, $\mathcal{L}_{agg}$ is the aggregation loss function, $i$ denotes the anchor box index, $p_i$ and $t_i$ respectively denote the predicted probability that the $i$-th anchor box is a pedestrian and the corresponding predicted coordinates, $p_i^*$ and $t_i^*$ respectively denote the object class label associated with the $i$-th anchor box and the corresponding calibrated coordinates, and $\alpha_1$ is a first hyperparameter;

the pedestrian classification loss function is shown in equation (12):

$$\mathcal{L}_{cls}(\{p_i,p_i^*\}) = \frac{1}{N_{cls}}\sum_i L_{log}(p_i,p_i^*)\tag{12}$$

where $N_{cls}$ is the total number of anchor boxes in the RPN module classification process and $L_{log}$ is the log loss;

the aggregation loss function is shown in equation (13):

$$\mathcal{L}_{agg} = \mathcal{L}_{reg} + \beta\,\mathcal{L}_{com}\tag{13}$$

where $\mathcal{L}_{reg}$ is the regression loss function, $\mathcal{L}_{com}$ is the compactness loss function, and $\beta$ is a second hyperparameter;

the regression loss function is shown in equation (14):

$$\mathcal{L}_{reg} = \frac{1}{N_{reg}}\sum_i p_i^*\,\mathrm{smooth}_{L_1}(t_i,t_i^*)\tag{14}$$

where $N_{reg}$ is the total number of anchor boxes in the regression stage and $\mathrm{smooth}_{L_1}(t_i,t_i^*)$ is the smooth $L_1$ loss value of the predicted detection window $t_i$;

the compactness loss function is shown in equation (15):

$$\mathcal{L}_{com} = \frac{1}{N_{com}}\sum_{p}\frac{1}{|\Phi_p|}\sum_{j\in\Phi_p}\mathrm{smooth}_{L_1}(t_j,t_p^*)\tag{15}$$

where $N_{com}$ is the total number of calibrated pedestrians that intersect an anchor box, $\Phi_p$ is the set of anchor boxes associated with the $p$-th calibrated pedestrian window, $|\Phi_p|$ is the total number of anchor boxes associated with that pedestrian, $j$ is the anchor box index, $t_j$ denotes the predicted coordinates of the $j$-th anchor box, and $t_p^*$ denotes the calibrated coordinates of the $p$-th pedestrian window.
In the above preferred embodiment of the pedestrian detection system based on block occlusion perception, the Faster R-CNN neural network comprises a Fast R-CNN module; in this case, the model training module is further configured to perform the following operations:
based on a preset training image, performing network training on the Fast R-CNN module according to the loss function shown in formula (16):

$$\mathcal{L}_{FRCNN}(\{p_i\},\{t_i\}) = \mathcal{L}_{cls}(\{p_i,p_i^*\}) + \alpha_3\,\mathcal{L}_{agg}(\{p_i^*\},\{t_i\},\{t_i^*\}) + \lambda\,\mathcal{L}_{occ}\tag{16}$$

where $\mathcal{L}_{cls}$ is the pedestrian classification loss function, $\mathcal{L}_{agg}$ is the aggregation loss function, $\mathcal{L}_{occ}$ is the occlusion handling loss function, $i$ denotes the anchor box index, $p_i$ and $t_i$ respectively denote the predicted probability that the $i$-th anchor box is a pedestrian and the corresponding predicted coordinates, $p_i^*$ and $t_i^*$ respectively denote the object class label associated with the $i$-th anchor box and the corresponding calibrated coordinates, $\alpha_3$ is a third hyperparameter, and $\lambda$ is a fourth hyperparameter;

the pedestrian classification loss function is shown in equation (17):

$$\mathcal{L}_{cls}(\{p_i,p_i^*\}) = \frac{1}{N_{cls}}\sum_i L_{log}(p_i,p_i^*)\tag{17}$$

where $N_{cls}$ is the total number of anchor boxes in the classification process;

the aggregation loss function is shown in equation (18):

$$\mathcal{L}_{agg} = \mathcal{L}_{reg} + \beta\,\mathcal{L}_{com}\tag{18}$$

where $\mathcal{L}_{reg}$ is the regression loss function, $\mathcal{L}_{com}$ is the compactness loss function, and $\beta$ is a second hyperparameter;

the regression loss function is shown in equation (19):

$$\mathcal{L}_{reg} = \frac{1}{N_{reg}}\sum_i p_i^*\,\mathrm{smooth}_{L_1}(t_i,t_i^*)\tag{19}$$

where $N_{reg}$ is the total number of anchor boxes in the regression stage and $\mathrm{smooth}_{L_1}(t_i,t_i^*)$ is the smooth $L_1$ loss value of the predicted detection window $t_i$;

the compactness loss function is shown in equation (20):

$$\mathcal{L}_{com} = \frac{1}{N_{com}}\sum_{p}\frac{1}{|\Phi_p|}\sum_{j\in\Phi_p}\mathrm{smooth}_{L_1}(t_j,t_p^*)\tag{20}$$

where $N_{com}$ is the total number of calibrated pedestrians that intersect an anchor box, $\Phi_p$ is the set of anchor boxes associated with the $p$-th calibrated pedestrian window, $|\Phi_p|$ is the total number of anchor boxes associated with that pedestrian, $j$ is the anchor box index, $t_j$ denotes the predicted coordinates of the $j$-th anchor box, and $t_p^*$ denotes the calibrated coordinates of the $p$-th pedestrian window.
In a preferred embodiment of the above pedestrian detection system based on block occlusion perception, the positive and negative sample dividing unit includes:
an intersection-over-union (IoU) calculation subunit configured to calculate the IoU ratio between each anchor box and each pedestrian labeling box;
the first matching subunit is configured to select, for each pedestrian labeling box, the anchor box with the largest IoU, and to match each selected anchor box with its corresponding pedestrian labeling box;
the second matching subunit is configured to judge, after the selected anchor boxes are removed, whether the IoU between each remaining anchor box and each pedestrian labeling box is greater than a preset first threshold, and if so, to match them;
the third matching subunit is configured to acquire the pedestrian labeling boxes whose number of matched anchor boxes is less than a preset second number, and to select all anchor boxes whose IoU with each such pedestrian labeling box is greater than a preset second threshold, the preset first threshold being larger than the preset second threshold;
the fourth matching subunit is configured to select, in descending order of IoU among all the selected anchor boxes, a preset third number of anchor boxes to match with the corresponding pedestrian labeling box, the value of the preset third number being the average number of matched anchor boxes over the pedestrian labeling boxes whose number of matches is greater than or equal to the preset second number.
Those of skill in the art will appreciate that the various illustrative systems and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate the interchangeability of electronic hardware and software. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (8)

1. A pedestrian detection method based on block occlusion perception is characterized by comprising the following steps:
acquiring image characteristics corresponding to each preset human body detection area based on a pre-constructed pedestrian detection model and according to a to-be-detected pedestrian image;
performing feature fusion on the acquired image features to obtain the overall features of the corresponding pedestrians;
acquiring a plurality of detection result frames of the to-be-detected pedestrian image according to the overall characteristics;
selecting a detection result frame which meets a preset screening condition from the obtained detection result frames (this four-step flow is sketched in code after this claim);
the pedestrian detection model is a model constructed based on a Faster R-CNN neural network, and anchor boxes are associated with the higher convolutional layers of the Faster R-CNN neural network;
the Faster R-CNN neural network comprises an RPN module, and the method further comprises the step of carrying out network training on the RPN module based on a preset training image and according to a loss function shown in the following formula:
L(\{p_i\}, \{t_i\}) = L_{cls}(\{p_i\}) + \alpha_1 \, L_{agg}(\{p_i^*\}, \{t_i\})

wherein L_cls is the pedestrian classification loss function and L_agg is the aggregation loss function; i denotes the anchor box index; p_i and t_i respectively denote the predicted probability that the i-th anchor box is a pedestrian and the corresponding predicted pedestrian coordinates; p_i^* and t_i^* respectively denote the object class label associated with the i-th anchor box and the corresponding calibrated coordinates; α_1 is a first hyperparameter;
the pedestrian classification loss function is:

L_{cls}(\{p_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*)

wherein N_cls is the total number of anchor boxes in the classification process of the RPN module;
the aggregation loss function is:

L_{agg}(\{p_i^*\}, \{t_i\}) = L_{reg}(\{p_i^*\}, \{t_i\}) + \beta \, L_{com}(\{p_i^*\}, \{t_i\})

wherein L_reg is the regression loss function, L_com is the compactness loss function, and β is a second hyperparameter;
the regression loss function is:

L_{reg}(\{p_i^*\}, \{t_i\}) = \frac{1}{N_{reg}} \sum_i p_i^* \, L_1(t_i, t_i^*)

wherein N_reg is the total number of anchor boxes in the regression stage and L_1(t_i, t_i^*) is the smooth-L1 loss value with respect to the predicted detection window t_i;
the compactness loss function is:

L_{com}(\{p_i^*\}, \{t_i\}) = \frac{1}{N_{com}} \sum_{i=1}^{N_{com}} L_1\left(t_i^* - \frac{1}{|\Phi_i|} \sum_{j \in \Phi_i} t_j\right)

wherein N_com is the total number of calibrated pedestrians that intersect with anchor boxes, Φ_i is the set of anchor boxes associated with the i-th calibrated pedestrian window, |Φ_i| is the number of anchor boxes in that set, j is an anchor box index within Φ_i, t_j denotes the predicted pedestrian coordinates of the j-th anchor box, and t_i^* denotes the calibrated coordinates of the i-th pedestrian window.
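Read end to end, claim 1 recites a four-stage inference flow: per-region features, feature fusion, candidate result frames, and screening. The sketch below illustrates that flow; the model object and its extract_part_features, fuse_features and predict_boxes methods are hypothetical stand-ins, and the thresholds are illustrative values, not taken from the patent.

import torchvision

def detect_pedestrians(model, image, score_thr=0.5, nms_thr=0.45):
    # 1. Image features for each preset human body detection area.
    part_feats = model.extract_part_features(image)   # e.g. head, torso, legs
    # 2. Feature fusion into the overall features of the pedestrian.
    fused = model.fuse_features(part_feats)
    # 3. A plurality of detection result frames from the fused features.
    boxes, scores = model.predict_boxes(fused)
    # 4. Preset screening condition: confidence threshold followed by
    #    non-maximum suppression.
    keep = scores > score_thr
    boxes, scores = boxes[keep], scores[keep]
    keep = torchvision.ops.nms(boxes, scores, nms_thr)
    return boxes[keep], scores[keep]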
2. The pedestrian detection method based on block occlusion perception according to claim 1, wherein before "acquiring image features corresponding to each preset human body detection area based on a pre-constructed pedestrian detection model and according to a to-be-detected pedestrian image", the method further comprises:

performing data augmentation processing on a preset training image to obtain training samples;

matching the anchor boxes with the pedestrian labeling frames in the training samples, and dividing the anchor boxes into positive samples and negative samples according to the matching result, wherein a positive sample is an anchor box matched with a pedestrian labeling frame and a negative sample is an anchor box not matched with any pedestrian labeling frame;

selecting a preset first number of negative samples by a hard negative mining method;

calculating a loss function value according to the positive samples and the selected negative samples, updating the Faster R-CNN neural network according to the loss function value, and performing network training again on the updated Faster R-CNN neural network until the updated Faster R-CNN neural network meets a preset convergence condition (this training step is sketched in code after this claim).
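A rough sketch of the training step recited in claim 2, assuming a net that returns per-anchor scores, box predictions and their matched targets, and an augment helper for the data augmentation step; the 3:1 negative-to-positive ratio stands in for the preset first number of negative samples.

import torch
import torch.nn.functional as F

def train_step(net, optimizer, images, gt_boxes, neg_ratio=3):
    images, gt_boxes = augment(images, gt_boxes)       # assumed augmentation helper
    cls_logits, box_preds, labels, box_targets = net(images, gt_boxes)

    per_anchor = F.binary_cross_entropy_with_logits(
        cls_logits, labels.float(), reduction='none')
    pos = labels == 1

    # Hard negative mining: keep only the highest-loss negatives,
    # at a preset ratio to the number of positive samples.
    neg_losses = per_anchor[labels == 0]
    n_neg = min(neg_ratio * int(pos.sum()), neg_losses.numel())
    hard_neg, _ = neg_losses.topk(n_neg)

    loss = (per_anchor[pos].sum() + hard_neg.sum()) / max(int(pos.sum()) + n_neg, 1)
    if pos.any():
        loss = loss + F.smooth_l1_loss(box_preds[pos], box_targets[pos])

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)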
3. The pedestrian detection method based on block occlusion perception according to claim 2, wherein the Faster R-CNN neural network further comprises a Fast R-CNN module; before "based on a pre-constructed pedestrian detection model, and according to a to-be-detected pedestrian image, acquiring image features corresponding to each preset human body detection area", the method further includes:
based on a preset training image, performing network training on the Fast R-CNN module according to a loss function shown in the following formula:
L(\{p_i\}, \{t_i\}) = L_{cls}(\{p_i\}) + \alpha_3 \, L_{agg}(\{p_i^*\}, \{t_i\}) + \lambda \, L_{occ}

wherein L_occ is the occlusion handling loss function, α_3 is a third hyperparameter, and λ is a fourth hyperparameter.
4. The pedestrian detection method based on block occlusion perception according to claim 2 or 3, wherein the step of matching the anchor point frame with the pedestrian labeling frame in the training sample specifically comprises:
calculating the intersection-over-union (IoU) ratio between each anchor box and each pedestrian labeling frame;

selecting, for each pedestrian labeling frame, the anchor box having the largest IoU with it, and matching each selected anchor box with its corresponding pedestrian labeling frame;

judging, after the selected anchor boxes are removed, whether the IoU between each remaining anchor box and each pedestrian labeling frame is greater than a preset first threshold, and if so, matching them;

acquiring the pedestrian labeling frames whose number of matched anchor boxes is less than a preset second number, and selecting all anchor boxes whose IoU with each such pedestrian labeling frame is greater than a preset second threshold, the preset first threshold being greater than the preset second threshold;

selecting, in descending order of IoU, a preset third number of the selected anchor boxes and matching them with the corresponding pedestrian labeling frames, the preset third number taking the value of the average number of matched anchor boxes over the pedestrian labeling frames whose number of matched anchor boxes is greater than or equal to the preset second number.
5. A pedestrian detection system based on blocking shielding perception is characterized by comprising:
the image characteristic acquisition module is configured to acquire image characteristics corresponding to each preset human body detection area based on a pre-constructed pedestrian detection model and according to a to-be-detected pedestrian image;
the image feature fusion module is configured to perform feature fusion on the image features acquired by the image feature acquisition module to obtain the overall features of the corresponding pedestrians;
the detection result frame acquisition module is configured to acquire a plurality of detection result frames of the to-be-detected pedestrian image according to the overall characteristics obtained by the image characteristic fusion module;
a detection result frame screening module configured to select a detection result frame satisfying a preset screening condition among the plurality of obtained detection result frames;
the pedestrian detection model is a model constructed based on a Faster R-CNN neural network, and anchor boxes are associated with the higher convolutional layers of the Faster R-CNN neural network;
the system also comprises a model training module, wherein the Faster R-CNN neural network comprises an RPN module; in this case, the model training module is configured to perform the following operations:
based on a preset training image and according to a loss function shown in the following formula, performing network training on the RPN module:
L(\{p_i\}, \{t_i\}) = L_{cls}(\{p_i\}) + \alpha_1 \, L_{agg}(\{p_i^*\}, \{t_i\})

wherein L_cls is the pedestrian classification loss function and L_agg is the aggregation loss function; i denotes the anchor box index; p_i and t_i respectively denote the predicted probability that the i-th anchor box is a pedestrian and the corresponding predicted pedestrian coordinates; p_i^* and t_i^* respectively denote the object class label associated with the i-th anchor box and the corresponding calibrated coordinates; α_1 is a first hyperparameter;
the pedestrian classification loss function is:

L_{cls}(\{p_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*)

wherein N_cls is the total number of anchor boxes in the classification process of the RPN module;
the aggregation loss function is:

L_{agg}(\{p_i^*\}, \{t_i\}) = L_{reg}(\{p_i^*\}, \{t_i\}) + \beta \, L_{com}(\{p_i^*\}, \{t_i\})

wherein L_reg is the regression loss function, L_com is the compactness loss function, and β is a second hyperparameter;
the regression loss function is:

L_{reg}(\{p_i^*\}, \{t_i\}) = \frac{1}{N_{reg}} \sum_i p_i^* \, L_1(t_i, t_i^*)

wherein N_reg is the total number of anchor boxes in the regression stage and L_1(t_i, t_i^*) is the smooth-L1 loss value with respect to the predicted detection window t_i;
the compactness loss function is:

L_{com}(\{p_i^*\}, \{t_i\}) = \frac{1}{N_{com}} \sum_{i=1}^{N_{com}} L_1\left(t_i^* - \frac{1}{|\Phi_i|} \sum_{j \in \Phi_i} t_j\right)

wherein N_com is the total number of calibrated pedestrians that intersect with anchor boxes, Φ_i is the set of anchor boxes associated with the i-th calibrated pedestrian window, |Φ_i| is the number of anchor boxes in that set, j is an anchor box index within Φ_i, t_j denotes the predicted pedestrian coordinates of the j-th anchor box, and t_i^* denotes the calibrated coordinates of the i-th pedestrian window.
6. The pedestrian detection system based on block occlusion perception according to claim 5, wherein the model training module comprises:
a training image processing unit configured to perform data augmentation processing on a preset training image to obtain training samples;

a positive and negative sample dividing unit configured to match the anchor boxes with the pedestrian labeling frames in the training samples and to divide the anchor boxes into positive samples and negative samples according to the matching result, wherein a positive sample is an anchor box matched with a pedestrian labeling frame and a negative sample is an anchor box not matched with any pedestrian labeling frame;

a negative sample screening unit configured to select a preset first number of negative samples by a hard negative mining method;

a network updating unit configured to calculate a loss function value according to the positive samples and the selected negative samples, to update the Faster R-CNN neural network according to the loss function value, and to perform network training again on the updated Faster R-CNN neural network until the updated Faster R-CNN neural network meets a preset convergence condition.
7. The block occlusion perception-based pedestrian detection system of claim 6, wherein the Faster R-CNN neural network comprises a Fast R-CNN module; in this case, the model training module is further configured to perform the following operations:
based on a preset training image and according to a loss function shown as the following formula, performing network training on the Fast R-CNN module:
L(\{p_i\}, \{t_i\}) = L_{cls}(\{p_i\}) + \alpha_3 \, L_{agg}(\{p_i^*\}, \{t_i\}) + \lambda \, L_{occ}

wherein L_occ is the occlusion handling loss function, α_3 is the third hyperparameter, and λ is the fourth hyperparameter.
8. The pedestrian detection system based on block occlusion perception according to claim 6 or 7, wherein the positive and negative sample division unit comprises:
an IoU calculation subunit configured to calculate the intersection-over-union (IoU) ratio between each anchor box and each pedestrian labeling frame;

a first matching subunit configured to select, for each pedestrian labeling frame, the anchor box having the largest IoU with it, and to match each selected anchor box with its corresponding pedestrian labeling frame;

a second matching subunit configured to judge, after the selected anchor boxes are removed, whether the IoU between each remaining anchor box and each pedestrian labeling frame is greater than a preset first threshold, and if so, to match them;

a third matching subunit configured to acquire the pedestrian labeling frames whose number of matched anchor boxes is less than a preset second number, and to select all anchor boxes whose IoU with each such pedestrian labeling frame is greater than a preset second threshold, the preset first threshold being greater than the preset second threshold;

a fourth matching subunit configured to select, in descending order of IoU, a preset third number of the selected anchor boxes and match them with the corresponding pedestrian labeling frames, the preset third number taking the value of the average number of matched anchor boxes over the pedestrian labeling frames whose number of matched anchor boxes is greater than or equal to the preset second number.
CN201810393658.1A 2018-04-27 2018-04-27 Pedestrian detection method and system based on blocking and shielding perception Active CN108898047B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810393658.1A CN108898047B (en) 2018-04-27 2018-04-27 Pedestrian detection method and system based on blocking and shielding perception

Publications (2)

Publication Number Publication Date
CN108898047A CN108898047A (en) 2018-11-27
CN108898047B true CN108898047B (en) 2021-03-19

Family

ID=64342527

Country Status (1)

Country Link
CN (1) CN108898047B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109583512B (en) * 2018-12-14 2021-05-25 北京旷视科技有限公司 Image processing method, device and system
CN109766796B (en) * 2018-12-20 2023-04-18 西华大学 Deep pedestrian detection method for dense crowd
CN110222764B (en) * 2019-06-10 2021-06-18 中南民族大学 Method, system, device and storage medium for detecting occluded target
CN110222657B (en) * 2019-06-11 2021-07-20 中国科学院自动化研究所 Single-step face detector optimization system, method and device
CN112307826A (en) * 2019-07-30 2021-02-02 华为技术有限公司 Pedestrian detection method, device, computer-readable storage medium and chip
CN110532985B (en) * 2019-09-02 2022-07-22 北京迈格威科技有限公司 Target detection method, device and system
CN110796071B (en) * 2019-10-28 2021-02-19 广州云从博衍智能科技有限公司 Behavior detection method, system, machine-readable medium and device
CN110796069B (en) * 2019-10-28 2021-02-05 广州云从博衍智能科技有限公司 Behavior detection method, system, equipment and machine readable medium
CN111144203B (en) * 2019-11-19 2023-06-16 浙江工商大学 Pedestrian shielding detection method based on deep learning
CN110880177A (en) * 2019-11-26 2020-03-13 北京推想科技有限公司 Image identification method and device
CN110796127A (en) * 2020-01-06 2020-02-14 四川通信科研规划设计有限责任公司 Embryo prokaryotic detection system based on occlusion sensing, storage medium and terminal
CN111832515A (en) * 2020-07-21 2020-10-27 上海有个机器人有限公司 Dense pedestrian detection method, medium, terminal and device
CN112465799A (en) * 2020-12-09 2021-03-09 南京甄视智能科技有限公司 Optimization of object detector and object detection
CN112528995B (en) * 2020-12-22 2023-08-04 北京百度网讯科技有限公司 Method for training target detection model, target detection method and device
CN112906732B (en) * 2020-12-31 2023-12-15 杭州旷云金智科技有限公司 Target detection method, target detection device, electronic equipment and storage medium
CN113569726B (en) * 2021-07-27 2023-04-14 湖南大学 Pedestrian detection method combining automatic data amplification and loss function search
CN114550221B (en) * 2022-04-22 2022-07-22 苏州浪潮智能科技有限公司 Pedestrian re-identification method, device, equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9858496B2 (en) * 2016-01-20 2018-01-02 Microsoft Technology Licensing, Llc Object detection and classification in images

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105354547A (en) * 2015-10-30 2016-02-24 河海大学 Pedestrian detection method in combination of texture and color features
CN106022237A (en) * 2016-05-13 2016-10-12 电子科技大学 Pedestrian detection method based on end-to-end convolutional neural network
CN106599939A (en) * 2016-12-30 2017-04-26 深圳市唯特视科技有限公司 Real-time target detection method based on region convolutional neural network
CN107730881A (en) * 2017-06-13 2018-02-23 银江股份有限公司 Traffic congestion vision detection system based on depth convolutional neural networks
CN107358182A (en) * 2017-06-29 2017-11-17 维拓智能科技(深圳)有限公司 Pedestrian detection method and terminal device
CN107403141A (en) * 2017-07-05 2017-11-28 中国科学院自动化研究所 Method for detecting human face and device, computer-readable recording medium, equipment
CN107463892A (en) * 2017-07-27 2017-12-12 北京大学深圳研究生院 Pedestrian detection method in a kind of image of combination contextual information and multi-stage characteristics
CN107679250A (en) * 2017-11-01 2018-02-09 浙江工业大学 A kind of multitask layered image search method based on depth own coding convolutional neural networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks; Shaoqing Ren et al.; arXiv:1506.01497v3; 2016-01-07; pp. 1-14 *
Pedestrian re-identification algorithm based on a novel triplet convolutional neural network; Zhu Jianqing et al.; Journal of Electronics & Information Technology; 2018-04-19; Vol. 40, No. 4; pp. 1012-1016 *

Also Published As

Publication number Publication date
CN108898047A (en) 2018-11-27

Similar Documents

Publication Publication Date Title
CN108898047B (en) Pedestrian detection method and system based on blocking and shielding perception
CN110084292B (en) Target detection method based on DenseNet and multi-scale feature fusion
CN106960195B (en) Crowd counting method and device based on deep learning
CN113160192B (en) Visual sense-based snow pressing vehicle appearance defect detection method and device under complex background
CN110738101B (en) Behavior recognition method, behavior recognition device and computer-readable storage medium
CN108121984B (en) Character recognition method and device
CN103699905B (en) Method and device for positioning license plate
CN108830188A (en) Vehicle checking method based on deep learning
CN110781836A (en) Human body recognition method and device, computer equipment and storage medium
CN109977997B (en) Image target detection and segmentation method based on convolutional neural network rapid robustness
CN104615986B (en) The method that pedestrian detection is carried out to the video image of scene changes using multi-detector
CN111046856B (en) Parallel pose tracking and map creating method based on dynamic and static feature extraction
CN107507170A (en) A kind of airfield runway crack detection method based on multi-scale image information fusion
CN106780727B (en) Vehicle head detection model reconstruction method and device
CN108537286A (en) A kind of accurate recognition methods of complex target based on key area detection
CN111126393A (en) Vehicle appearance refitting judgment method and device, computer equipment and storage medium
CN104463240B (en) A kind of instrument localization method and device
CN108229524A (en) A kind of chimney and condensing tower detection method based on remote sensing images
CN108009556A (en) A kind of floater in river detection method based on fixed point graphical analysis
CN114821102A (en) Intensive citrus quantity detection method, equipment, storage medium and device
CN108133235A (en) A kind of pedestrian detection method based on neural network Analysis On Multi-scale Features figure
CN105740751A (en) Object detection and identification method and system
CN111008576A (en) Pedestrian detection and model training and updating method, device and readable storage medium thereof
CN106845458A (en) A kind of rapid transit label detection method of the learning machine that transfinited based on core
CN112541372B (en) Difficult sample screening method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant