CN112613472B - Pedestrian detection method and system based on deep search matching - Google Patents

Pedestrian detection method and system based on deep search matching

Info

Publication number
CN112613472B
CN112613472B (application CN202011629766.8A)
Authority
CN
China
Prior art keywords
target candidate
matching
box
depth
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011629766.8A
Other languages
Chinese (zh)
Other versions
CN112613472A (en)
Inventor
张重阳
罗艳
孙军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN202011629766.8A priority Critical patent/CN112613472B/en
Publication of CN112613472A publication Critical patent/CN112613472A/en
Application granted granted Critical
Publication of CN112613472B publication Critical patent/CN112613472B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Abstract

The invention discloses a pedestrian detection method and system based on deep search matching. The method comprises the following steps: generating target candidate boxes for the original image based on a region generation network; calculating the matching loss between each target candidate box and each truth box of the real pedestrian targets in the original image; matching a certain number of target candidate boxes to each truth box in turn through a search matching algorithm using the matching loss; passing the matched target candidate boxes in the original image through a region pooling layer to obtain the corresponding features; and calculating classification scores and regression positions from the features to obtain the final detection result, namely the pedestrian targets to be detected in the original image. The system comprises: a region generation network module, a depth loss estimation module, a search matching module, a region pooling module, and a classification regression module. The invention is better suited to real, complex, and changeable environments, and effectively improves the detection capability of current pedestrian detectors.

Description

Pedestrian detection method and system based on deep search matching
Technical Field
The invention relates to the technical field of target detection, in particular to a pedestrian detection method and system based on deep search matching.
Background
The advent of the big data era has driven the continuous updating and development of computer technology, and pedestrian detection, a research hotspot in the field of computer vision, has shown important application value in fields such as intelligent video surveillance and intelligent transportation. Existing pedestrian detection algorithms face the following difficulties and challenges, and their detection results still need to be improved: due to long shooting distances, the image is large but the target pedestrians are small, so after downsampling through the deep convolutional neural network the target region retains few features, making effective detection and recognition difficult; due to fixed shooting angles, pedestrians' bodies are often partially occluded, so the useful information available during detection is reduced, leading to missed detections.
At present, mature pedestrian detection methods can basically be divided into two categories. (1) Background modeling. This class of methods is mainly used for detecting moving targets in video: the input image sequence is segmented into foreground and background using methods such as a Gaussian Mixture Model (GMM) or motion detection, and specific moving objects are extracted from the foreground. Such methods require a continuous image sequence for modeling and are not suitable for target detection in a single image. (2) Statistical learning. That is, images known to contain pedestrian targets are collected to form a training set, and features are extracted from the training images using manually designed algorithms (such as HOG, Haar, and the like). The extracted features are generally grayscale, texture, gradient-histogram, and edge information of the target. A pedestrian detection classifier is then built from the feature library of a large number of training samples. The classifier can generally use models such as SVM, AdaBoost, and neural networks.
In general, target detection algorithms based on statistical learning have performed better in recent years, and can be divided into traditional hand-crafted-feature target detection algorithms and deep-feature machine learning target detection algorithms.
Traditional hand-crafted-feature target detection algorithms mainly perform target detection modeling using manually designed features. Manually designed feature algorithms with excellent performance in recent years mainly include: the DPM (Deformable Part Model) algorithm proposed by Pedro F. Felzenszwalb et al. in 2010 (Object Detection with Discriminatively Trained Part-Based Models); the ICF (Integral Channel Features) proposed by Piotr Dollár et al. in 2009 and the ACF algorithm proposed in 2014 (Fast Feature Pyramids for Object Detection); and the Informed Haar method proposed by Shanshan Zhang et al. in 2014 (Informed Haar-like Features Improve Pedestrian Detection), which aims to extract Haar features carrying more discriminative information for training. Although manually designed features have a certain effect, detection accuracy is still not high because the representational capability of hand-crafted features is insufficient. Because of its stronger feature learning and expression ability, the deep convolutional neural network model has been more and more widely and successfully applied to pedestrian detection. The basic pedestrian detection operator is the R-CNN (Region-based Convolutional Neural Network) model. In 2014, Girshick et al. proposed R-CNN for general target detection, and later Fast RCNN and Faster RCNN were proposed, improving the accuracy and speed of target detection algorithms based on deep learning.
Target detection based on deep learning mostly uses features extracted from the whole candidate box for classification and regression, and still suffers from insufficient depth-feature extraction, especially for occluded and small-size pedestrian targets. On the one hand, because part of an occluded target's body is invisible, the visible features are limited; on the other hand, because feature maps shrink layer by layer in a deep convolutional neural network, the features of small targets become even smaller. These two factors result in low detection accuracy for pedestrian targets and a miss rate that is difficult to reduce further.
In 2018, the part-attention method proposed by Shanshan Zhang et al. (Occluded Pedestrian Detection Through Guided Attention in CNNs) aimed to extract body-part features carrying more discriminative information for training. However, some problems remain. On the one hand, this method still uses the features of the whole target candidate box and does not fully extract body-part features; moreover, over-emphasizing either the body-part features or the features of the whole candidate box leads to an imbalance between the global and local networks, so the detector generalizes poorly. On the other hand, using a part detector to extract body-part features introduces additional body-part annotations, increasing cost.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a pedestrian detection method and system based on deep search matching, which effectively alleviate the inconsistency problem in the matching process of pedestrian detection boxes and help train a more robust and accurate pedestrian detector, especially reducing the false detection rate of pedestrian detection under occlusion. The method and system are therefore better suited to complex and changeable real environments and effectively improve the detection capability of current pedestrian detectors.
In a first aspect of the present invention, a pedestrian detection method based on deep search matching is provided, which includes:
S11: generating target candidate boxes for the original image based on a region generation network;
S12: calculating the matching loss between each target candidate box and each truth box of the real pedestrian targets in the original image;
S13: matching a certain number of target candidate boxes to each truth box in turn through a search matching algorithm using the matching loss;
S14: passing the matched target candidate boxes in the original image through a region pooling (RoI Pooling) layer to obtain the corresponding features;
S15: calculating classification scores and regression positions from the features obtained in step S14 to obtain the final detection result, namely the pedestrian targets to be detected in the original image.
Preferably, the depth loss estimation function for calculating the matching loss in S12 is:

l_mat(b_i, g_j) = f_dis(b_i, g_j) + f_dep(b_i, g_j)

where b_i denotes the i-th target candidate box, i = 1,2,3,…,N; g_j denotes the j-th truth box, j = 1,2,3,…,M; f_dis(b_i, g_j) denotes the function for computing the search distance between the target candidate box b_i and the truth box g_j; and f_dep(b_i, g_j) denotes the function for computing the search depth between the target candidate box b_i and the truth box g_j.
Preferably, the function of the search distance is the Manhattan distance, specifically:

f_dis(b_i, g_j) = |x_i − x_j| + |y_i − y_j|

where (x_i, y_i) denotes the coordinates of the center point of the target candidate box b_i, and (x_j, y_j) denotes the coordinates of the center point of the truth box g_j.
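The search-distance term above is just the Manhattan distance between the two center points. A minimal sketch (the function name and tuple representation are illustrative, not from the patent):

```python
def f_dis(b_center, g_center):
    """Manhattan distance between a candidate-box center and a truth-box center."""
    (xi, yi), (xj, yj) = b_center, g_center
    return abs(xi - xj) + abs(yi - yj)

# Candidate centered at (10, 20), truth box centered at (13, 24):
# |10 - 13| + |20 - 24| = 3 + 4 = 7
print(f_dis((10, 20), (13, 24)))  # 7
```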
Preferably, the function of the search depth is:

f_dep(b_i, g_j) = min{ v_k(b_i, g_j) | k = 1,2,3,…,T }

where V = {v_k(b_i, g_j) | k = 1,2,3,…,T}; v_k denotes the depth-change sum of the k-th Manhattan path matching the target candidate box b_i and the truth box g_j; the number of elements of the set V is T, indicating that there are T Manhattan paths between the target candidate box b_i and the truth box g_j.

Further, the depth-change sum v_k(b_i, g_j) of the k-th Manhattan path is the sum of the depth differences between consecutive coordinate points along this path, specifically:

v_k(b_i, g_j) = Σ_q | d(p_q^k) − d(p_{q−1}^k) |

where p_q^k denotes the q-th coordinate point in the path from the target candidate box b_i to the truth box g_j, and q is an integer greater than or equal to 1; more specifically, p_0^k denotes the starting-point coordinates of the path, i.e. the center point of the target candidate box b_i; the final point of the path denotes the end-point coordinates, i.e. the center point of the truth box g_j; and d(p_q^k) denotes the depth value at the position of the q-th coordinate point, which, more specifically, is calculated by a depth estimation network.
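The search-depth computation can be sketched as follows. Assumptions are flagged explicitly: the text defines the per-path depth-change sum v_k and the path set V, so this sketch takes f_dep to aggregate the enumerated paths by minimum (the aggregation choice is an assumption), and it replaces the depth estimation network with a plain lookup function.

```python
def path_depth_change(path, depth_of):
    """v_k: sum of |d(p_q) - d(p_{q-1})| between consecutive points on one path."""
    return sum(abs(depth_of(path[q]) - depth_of(path[q - 1]))
               for q in range(1, len(path)))

def f_dep(paths, depth_of):
    """Search depth: minimum depth-change sum over the T enumerated Manhattan
    paths (aggregation by minimum is an assumption, not stated in the text)."""
    return min(path_depth_change(p, depth_of) for p in paths)

# Toy depth "network": depth rises by 1 per unit of x.
depth = lambda p: p[0]

# Two Manhattan paths from center (0, 0) to center (2, 1):
paths = [
    [(0, 0), (1, 0), (2, 0), (2, 1)],  # depth changes: 1 + 1 + 0 = 2
    [(0, 0), (0, 1), (1, 1), (2, 1)],  # depth changes: 0 + 1 + 1 = 2
]
print(f_dep(paths, depth))  # 2
```

In the actual method, d(·) would be the output of the depth estimation network sampled at each coordinate point.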
Preferably, the step S15 of calculating the classification score and the regression position from the features obtained in step S14 further comprises: carrying out weighted summation and back propagation of the matching loss, the loss of the search matching algorithm, and the losses of the classification score and the regression position, so as to construct an end-to-end training network.
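The end-to-end objective described above is a weighted sum of the four losses. A one-line sketch, where the weights w1..w4 are hypothetical placeholders (the patent does not state their values):

```python
def total_loss(l_match, l_search, l_cls, l_reg, w=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of matching, search-matching, classification, and
    regression losses; backpropagated as the whole network's objective.
    The uniform default weights are an assumption for illustration."""
    return w[0] * l_match + w[1] * l_search + w[2] * l_cls + w[3] * l_reg

loss = total_loss(0.5, 0.2, 0.8, 0.1)  # about 1.6 with unit weights
```

In a framework such as PyTorch, each term would be a tensor and `loss.backward()` would drive the joint training of all modules.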
In a second aspect of the present invention, a pedestrian detection system based on deep search matching is provided, which includes: a region generation network module, a depth loss estimation module, a search matching module, a region pooling module, and a classification regression module; wherein:
the region generation network module is used for generating a target candidate frame for the original image based on a region generation network;
the depth loss estimation module is used for calculating the matching loss of each target candidate box and each truth box of the real pedestrian targets in the original image;
the search matching module is used for matching a certain number of target candidate boxes for each truth-value box in sequence through a search matching algorithm by utilizing the matching loss;
the region pooling module is used for enabling the matched target candidate frame in the original image to pass through a region pooling layer to obtain corresponding features;
and the classification regression module is used for calculating classification scores and regression positions from the features obtained by the region pooling module to obtain the final detection result, namely the pedestrian targets to be detected in the original image.
Preferably, the method further comprises the following steps: and the detection network model module is used for carrying out weighted summation and back propagation on the losses of the depth loss estimation module, the search matching module and the classification regression module, constructing an end-to-end detection network model and training the detection network model by using the sum of the losses.
Compared with the prior art, the invention has at least one of the following advantages:
(1) according to the pedestrian detection method and system based on depth search matching, target candidate boxes with more consistent features are matched to each truth box through the depth-based search matching algorithm, so the method and system adapt to changeable conditions in practical application environments, strengthening detection robustness and reducing the probability of false and missed detections; especially for small-scale and occluded pedestrians, for which the available information is relatively scarce and interference from redundant noise is severe, the detection capability for pedestrian targets can be effectively improved;
(2) according to the pedestrian detection method and system based on depth search matching, accurate and efficient detection of a target in an occlusion scene can be well achieved through depth loss estimation and search matching;
(3) according to the pedestrian detection method and system based on deep search matching, the weighted summation and the back propagation are carried out through the loss of the matching loss, the loss of the search matching algorithm, the classification score and the loss of the regression position, end-to-end network training is achieved, and the detection result is more accurate.
Drawings
Embodiments of the invention are further described below with reference to the accompanying drawings:
FIG. 1 is a flow chart of a pedestrian detection method based on deep search matching according to an embodiment of the present invention;
fig. 2 is a flow chart of a search matching mechanism according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that variations and modifications can be made by persons skilled in the art without departing from the spirit of the invention, all of which fall within the scope of the present invention.
Existing pedestrian detection methods can identify pedestrian targets that are not seriously occluded, but because practical application scenes are more complicated, pedestrians that are unoccluded or only slightly occluded account for only a very small portion; as a result, most pedestrian detectors perform poorly on occluded targets. Pedestrian detection in complex scenes has the following characteristics:
the pedestrian shielding method has the advantages that firstly, pedestrians are shielded frequently. In an actual application scene, the situation that the pedestrian target is partially shielded in the image is inevitable. Most existing algorithms fail because the global structural features of the pedestrian are destroyed. Furthermore, due to the diversity of the occlusion patterns, the performance of the algorithm that is too dependent on the site detector is poor.
Second, the matching between the target candidate boxes and the truth boxes of a pedestrian detector is inconsistent. Especially in occlusion scenes, because the pedestrian truth boxes are dense, target candidate boxes with similar positions (and hence similar features) are easily matched to different truth boxes. Under this condition the pedestrian detector is difficult to train, pedestrian targets are difficult to localize accurately, and the false detection rate increases.
Based on these difficulties in real-world pedestrian detection, the embodiment of the invention provides a pedestrian detection method and system that perform depth search matching on pedestrian images: deep features in the CNN network are used to extract target candidate boxes, and the depth search matching loss is calculated for each target candidate box and each truth box. Using this matching loss, more consistent target candidate boxes are matched to each truth box, so that the network can learn more consistent features, ensuring excellent detection performance on ordinary pedestrian samples while improving detection accuracy on occluded samples and reducing the false detection rate.
Fig. 1 is a flowchart of a pedestrian detection method based on depth search matching according to an embodiment of the present invention.
Referring to fig. 1, the pedestrian detection method based on depth search matching of the present embodiment includes:
S11: generating a target candidate box set B = {b_i | i = 1,2,3,…} for the original image based on the region generation network;
S12: calculating the matching loss l_mat(b_i, g_j) between each target candidate box and each truth box in the set G = {g_j | j = 1,2,3,…} of real pedestrian targets in the original image;
S13: matching a certain number of target candidate boxes to each truth box in turn through a search matching algorithm using the matching loss;
S14: passing the matched target candidate boxes in the original image through a region pooling layer to obtain the corresponding features;
S15: calculating classification scores and regression positions from the features obtained in step S14 to obtain the final detection result, namely the pedestrian targets to be detected in the original image.
In this embodiment, the original image in S11 may be subjected to operations such as multilayer convolution to obtain a feature map of the image: the image is passed through a Deep convolution layer (Deep CNN, DCNN) of a convolutional neural network module, such as VGG16 or ResNet, and the input image is subjected to operations such as multilayer convolution to obtain a feature map.
In the preferred embodiment, in S12, the number of elements in the target candidate box set B is N; in the whole detection process, the parameter N is set to 512, indicating that 512 target candidate boxes are extracted from the original image. The number of elements in the truth box set G is M, indicating that M pedestrian targets really exist in the original image. Of course, other values of N may be selected in other embodiments. In this step, the RPN module in the Faster RCNN network may be utilized to generate the target candidate boxes for input to the depth loss estimation module.
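Keeping a fixed number N = 512 of proposals is commonly done by ranking RPN outputs by objectness score and taking the top N, which is how standard Faster RCNN implementations behave; the sketch below assumes that convention (the patent only fixes N, not the selection rule), with toy data.

```python
def top_n_proposals(boxes_with_scores, n=512):
    """Keep the n highest-scoring proposals; each item is (box, score)."""
    ranked = sorted(boxes_with_scores, key=lambda bs: bs[1], reverse=True)
    return [box for box, _ in ranked[:n]]

# Toy proposals as ((x, y, w, h), objectness_score):
proposals = [((0, 0, 10, 20), 0.9), ((5, 5, 10, 20), 0.3), ((8, 2, 10, 20), 0.7)]
print(top_n_proposals(proposals, n=2))  # keeps the 0.9 and 0.7 boxes
```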
In the preferred embodiment, the depth loss estimation function for calculating the matching loss in S12 is:

l_mat(b_i, g_j) = f_dis(b_i, g_j) + f_dep(b_i, g_j)

where b_i denotes the i-th target candidate box, i = 1,2,3,…,N; g_j denotes the j-th truth box, j = 1,2,3,…,M; f_dis(b_i, g_j) denotes the function for computing the search distance between the target candidate box b_i and the truth box g_j; and f_dep(b_i, g_j) denotes the function for computing the search depth between the target candidate box b_i and the truth box g_j.
In a preferred embodiment, the function of the search distance is the Manhattan distance, specifically:

f_dis(b_i, g_j) = |x_i − x_j| + |y_i − y_j|

where (x_i, y_i) denotes the coordinates of the center point of the target candidate box b_i, and (x_j, y_j) denotes the coordinates of the center point of the truth box g_j.
In a preferred embodiment, the function of the search depth is:

f_dep(b_i, g_j) = min{ v_k(b_i, g_j) | k = 1,2,3,…,T }

where V = {v_k(b_i, g_j) | k = 1,2,3,…,T}; v_k denotes the depth-change sum of the k-th Manhattan path matching the target candidate box b_i and the truth box g_j; the number of elements of the set V is T, indicating that there are T Manhattan paths between the target candidate box b_i and the truth box g_j.

The depth-change sum v_k(b_i, g_j) of the k-th Manhattan path is the sum of the depth differences between consecutive coordinate points along this path, specifically:

v_k(b_i, g_j) = Σ_q | d(p_q^k) − d(p_{q−1}^k) |

where p_q^k denotes the q-th coordinate point in the path from the target candidate box b_i to the truth box g_j; more specifically, p_0^k denotes the starting-point coordinates of the path, i.e. the center point of the target candidate box b_i; the final point of the path denotes the end-point coordinates, i.e. the center point of the truth box g_j; and d(p_q^k) denotes the depth value at the position of the q-th coordinate point, which, more specifically, is calculated by a depth estimation network.
As shown in fig. 2, the search matching in S13 is computed as follows:

For each truth box g_j and the i-th target candidate box b_i, the depth search matching loss can be noted as l_mat(b_i, g_j), and the depth search matching losses with all target candidate boxes can be represented as the set L_mat(g_j) = {l_mat(b_i, g_j) | i = 1,2,3,…,N}. The elements of the set L_mat(g_j) are reordered from small to large by depth search matching loss. The first m_j target candidate boxes in L_mat(g_j), i.e. the m_j boxes with the smallest depth search matching loss, are matched to the truth box g_j, where m_j is the maximum number of matches for truth box g_j. After the search matching is completed, if the same target candidate box has been matched to different truth boxes, that target candidate box is matched to the truth box with the smaller depth search matching loss, and suitable target candidate boxes are matched anew for the remaining truth boxes. The maximum number of matches m_j for truth box g_j can be expressed as:

m_j = r, j = 1,2,3,…,M

where r denotes the largest integer such that M × r is less than N.

The matched target candidate boxes are sent to the region pooling module to obtain features for classification and regression; these features are sent to the classification and regression module, which detects and localizes the pedestrian targets to obtain the detection result, namely the pedestrian targets to be detected in the image.
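A runnable sketch of the search-matching procedure described above, assuming m_j = floor(N / M). Visiting the candidate/truth-box pairs in one global pass ordered by loss is an implementation choice (not stated in the patent) that realizes both the top-m_j selection and the conflict rule: a candidate contested by two truth boxes automatically goes to the one with the smaller loss, and the other truth box picks up its next-best unmatched candidate.

```python
def search_match(loss, n_candidates, n_gt):
    """loss[i][j]: depth search matching loss of candidate i with truth box j.
    Returns a dict mapping candidate index -> matched truth-box index."""
    m = n_candidates // n_gt          # m_j: maximum matches per truth box
    # Smallest loss first, so conflicts resolve toward the smaller-loss pair.
    pairs = sorted((loss[i][j], i, j)
                   for i in range(n_candidates) for j in range(n_gt))
    assigned, counts = {}, [0] * n_gt
    for _, i, j in pairs:
        if i in assigned or counts[j] >= m:
            continue                  # candidate taken, or truth box is full
        assigned[i] = j
        counts[j] += 1
    return assigned

# 4 candidates, 2 truth boxes -> m = 2 matches per truth box.
loss = [[0.1, 0.9], [0.2, 0.3], [0.8, 0.4], [0.7, 0.6]]
print(search_match(loss, 4, 2))  # {0: 0, 1: 0, 2: 1, 3: 1}
```

Note that candidate 1 has a small loss to both truth boxes (0.2 and 0.3); it goes to truth box 0, and truth box 1 falls back to candidates 2 and 3, exactly the conflict behavior the text describes.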
In the preferred embodiment, the step of calculating the classification score and the regression position in S15 from the features obtained in step S14 further includes: carrying out weighted summation and back propagation of the matching loss, the loss of the search matching algorithm, and the losses of the classification score and the regression position, constructing an end-to-end training mode.
In another embodiment of the present invention, a pedestrian detection system based on deep search matching is further provided, and the system is used for implementing the pedestrian detection method in the above embodiments. Specifically, the system comprises: a region generation network module, a depth loss estimation module, a search matching module, a region pooling module, and a classification regression module. The region generation network module is used for generating target candidate boxes for the original image based on the region generation network; the depth loss estimation module is used for calculating the matching loss of each target candidate box and each truth box of the real pedestrian targets in the original image; the search matching module is used for matching a certain number of target candidate boxes to each truth box in turn through a search matching algorithm using the matching loss; the region pooling module is used for passing the matched target candidate boxes in the original image through a region pooling layer to obtain the corresponding features; and the classification regression module is used for calculating classification scores and regression positions from the features obtained by the region pooling module to obtain the final detection result, namely the pedestrian targets to be detected in the original image.
In a preferred embodiment, the method further comprises: and the detection network model module is used for carrying out weighted summation and back propagation by using the losses of the depth loss estimation module, the search matching module and the classification regression module, constructing an end-to-end detection network model and training the detection network model by using the sum of the losses.
According to the embodiment of the invention, an end-to-end deep search matching pedestrian detection system is constructed, the consistent characteristics of pedestrians are fully extracted for the target candidate frames with consistent matching characteristics of each truth value frame, and the environmental interference is effectively removed, so that the detection performance of the pedestrian detector in a complex scene is effectively ensured.
In a preferred embodiment, the depth loss estimation function with which the depth loss estimation module calculates the matching loss is:

l_mat(b_i, g_j) = f_dis(b_i, g_j) + f_dep(b_i, g_j)

where b_i denotes the i-th target candidate box, i = 1,2,3,…,N; g_j denotes the j-th truth box, j = 1,2,3,…,M; f_dis(b_i, g_j) denotes the function for computing the search distance between the target candidate box b_i and the truth box g_j; and f_dep(b_i, g_j) denotes the function for computing the search depth between the target candidate box b_i and the truth box g_j.
In a preferred embodiment, the function of the search distance is the Manhattan distance, specifically:

f_dis(b_i, g_j) = |x_i − x_j| + |y_i − y_j|

where (x_i, y_i) denotes the coordinates of the center point of the target candidate box b_i, and (x_j, y_j) denotes the coordinates of the center point of the truth box g_j.
In a preferred embodiment, the function of the search depth is:

f_dep(b_i, g_j) = min{ v_k(b_i, g_j) | k = 1,2,3,…,T }

where V = {v_k(b_i, g_j) | k = 1,2,3,…,T}; v_k denotes the depth-change sum of the k-th Manhattan path matching the target candidate box b_i and the truth box g_j; the number of elements of the set V is T, indicating that there are T Manhattan paths between the target candidate box b_i and the truth box g_j.

Further, the depth-change sum v_k(b_i, g_j) of the k-th Manhattan path is the sum of the depth differences between consecutive coordinate points along this path, specifically:

v_k(b_i, g_j) = Σ_q | d(p_q^k) − d(p_{q−1}^k) |

where p_q^k denotes the q-th coordinate point in the path from the target candidate box b_i to the truth box g_j; more specifically, p_0^k denotes the starting-point coordinates of the path, i.e. the center point of the target candidate box b_i; the final point of the path denotes the end-point coordinates, i.e. the center point of the truth box g_j; and d(p_q^k) denotes the depth value at the position of the q-th coordinate point, which, more specifically, is calculated by a depth estimation network.
According to the pedestrian detection method and system of the embodiments of the invention, the occlusion problem in pedestrian detection is addressed through the design of the depth loss estimation module, the search matching module, and related components, so that targets in occluded scenes can be detected accurately and efficiently.
In another embodiment, a pedestrian detection method combined with the pedestrian detection system comprises: sending the image to be detected into a CNN network to generate features of different levels, and preliminarily extracting target candidates using the deep features and an RPN module; calculating the matching loss of each target candidate box and each truth box through a depth loss estimation module; matching a certain number of target candidate boxes to each truth box using a search matching module to obtain more consistent and more robust pedestrian features, which are sent to the final classification and regression module for pedestrian target detection and accurate localization; and performing a weighted summation over the losses of the individual modules, which serves as the loss function of the whole network and realizes end-to-end network training. The whole detection process comprises the following links:
firstly, the image to be detected is sent to a CNN network to carry out multilayer convolution operation to generate characteristics of different layers.
Secondly, a target candidate box set B = { b_i | i = 1, 2, 3, … } is generated using the deep features and the RPN module of a Faster RCNN network; the pedestrian targets actually present in the original image form the truth box set G = { g_j | j = 1, 2, 3, … }; the matching loss l_mat(b_i, g_j) of each target candidate box and each truth box is calculated using the depth loss estimation function. The target candidate box set B has N elements; in the whole detection process the parameter N is set to 512, indicating that 512 target candidate boxes are extracted from the original image. The truth box set G has M elements, indicating that M pedestrian targets are actually present in the original image.
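Outside the patent text, the N × M table of matching losses produced in this link can be assembled as in the following sketch; representing boxes as plain values and passing the per-pair loss (the patent's l_mat) as a callable are assumptions for illustration:

```python
def loss_matrix(candidates, truths, pair_loss):
    """N x M table: entry [i][j] is pair_loss(candidates[i], truths[j])."""
    return [[pair_loss(b, g) for g in truths] for b in candidates]

# Toy 1-D "boxes": the pair loss is just the center distance.
table = loss_matrix([1, 2], [10, 20], lambda b, g: abs(b - g))
print(table)  # [[9, 19], [8, 18]]
```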
Thirdly, the matching loss l_mat is used as the input of the search matching module; a certain number of target candidate boxes are matched to the different truth boxes in turn through the search matching algorithm; all matched target candidate boxes in the original image are sent to the region pooling module to obtain the corresponding features, which are sent to the classification and regression module to calculate classification scores and regression positions, yielding the final detection result, i.e. the pedestrian targets to be detected in the image.
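The search matching in this link can be approximated with a single greedy pass, sketched below outside the patent text. Two simplifications are assumed: conflicts are resolved in one globally ascending-loss sweep rather than by match-then-repair, and the maximum number of matches per truth box is read as the largest integer r with M · r < N:

```python
def max_matches_per_truth(n, m):
    """Largest integer r with m * r < n (one reading of the patent's m_j = r)."""
    return (n - 1) // m

def search_match(loss, num_candidates, num_truth, max_matches):
    """Greedy assignment in globally ascending loss order. Each candidate is
    used at most once and each truth box receives at most max_matches
    candidates, so a candidate wanted by several truth boxes goes to the one
    with the smaller loss, mirroring the conflict rule in the text."""
    pairs = sorted((loss[i][j], i, j)
                   for i in range(num_candidates)
                   for j in range(num_truth))
    matched = {j: [] for j in range(num_truth)}
    used = set()
    for _, i, j in pairs:
        if i not in used and len(matched[j]) < max_matches:
            matched[j].append(i)
            used.add(i)
    return matched

loss = [[0.1, 0.9], [0.2, 0.3], [0.8, 0.05], [0.7, 0.6]]  # 4 candidates, 2 truths
r = max_matches_per_truth(4, 2)                            # 1
print(search_match(loss, 4, 2, r))  # {0: [0], 1: [2]}
```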
In the present embodiment, a partially occluded pedestrian means that the ratio of the height of the visible body part to the height of the complete pedestrian target lies in (0.65, 1), and a severely occluded pedestrian means that this ratio lies in (0.20, 0.65).
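The visibility-ratio bucketing just described can be expressed as the following sketch, which is not part of the patent text; treating the intervals as open, so that boundary values and fully visible pedestrians fall into a catch-all category, is an assumption:

```python
def occlusion_category(visible_height, full_height):
    """Bucket a pedestrian by the ratio of visible-part height to full height,
    using the open intervals stated in the text; values outside both intervals
    (fully visible, barely visible, or exactly on a boundary) map to "other"."""
    r = visible_height / full_height
    if 0.65 < r < 1.0:
        return "partial"
    if 0.20 < r < 0.65:
        return "severe"
    return "other"

print(occlusion_category(80, 100))   # partial
print(occlusion_category(40, 100))   # severe
```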
According to the method and system provided by the embodiments of the invention, a depth-based search matching algorithm is constructed to match target candidate boxes with more consistent features to each truth box. The method adapts to variable conditions in real application environments, enhances detection robustness, and reduces the probability of false and missed detections; in particular, for small-scale pedestrians with relatively little usable information and severe interference from redundant noise, the detection capability for pedestrian targets in video images is effectively improved.
It should be noted that the steps in the method provided by the invention can be implemented with the corresponding modules, devices, and units in the system; those skilled in the art can implement the step flow of the method with reference to the technical solution of the system. That is, the embodiments of the system can be understood as preferred examples for implementing the method, and details are not repeated here.
Those skilled in the art will appreciate that, in addition to implementing the system and its various modules, devices, units provided by the present invention in pure computer readable program code, the system and its various devices provided by the present invention can be implemented with the same functionality in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like by entirely logically programming method steps. Therefore, the system and various devices thereof provided by the present invention can be regarded as a hardware component, and the devices included in the system and various devices thereof for realizing various functions can also be regarded as structures in the hardware component; means for performing the functions may also be regarded as structures within both software modules and hardware components for performing the methods.
The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, and not to limit the invention. Any modifications and variations within the scope of the description that may occur to those skilled in the art are intended to fall within the scope of the invention.

Claims (6)

1. A pedestrian detection method based on deep search matching, characterized by comprising the following steps:
S11: generating target candidate boxes for the original image based on a region generation network;
S12: calculating the matching loss of each target candidate box and each truth box of the real pedestrian targets in the original image;
S13: matching a certain number of target candidate boxes to each truth box in turn through a search matching algorithm using the matching loss;
S14: passing the matched target candidate boxes in the original image through a region pooling layer to obtain the corresponding features;
S15: calculating classification scores and regression positions according to the features obtained in step S14 to obtain the final detection result, i.e. the pedestrian targets to be detected in the original image;
and in S12, the matching loss of each target candidate box and each truth box of the real pedestrian targets in the original image is calculated using a depth loss estimation function, wherein the depth loss estimation function l_mat(b_i, g_j) is:

l_mat(b_i, g_j) = f_dis(b_i, g_j) + f_dep(b_i, g_j)

wherein b_i denotes the i-th target candidate box, i = 1, 2, 3, …, N, g_j denotes the j-th truth box, j = 1, 2, 3, …, M, f_dis(b_i, g_j) denotes the function for computing the search distance between the target candidate box b_i and the truth box g_j, and f_dep(b_i, g_j) denotes the function for computing the search depth between them;
the function of the search distance is the Manhattan distance
Figure FDA0003541694450000014
The method specifically comprises the following steps:
Figure FDA0003541694450000015
wherein (x)i,yi) Representing target candidate box bi(x) coordinates of the center point of (c)j,yj) Box g for representing true valuejThe coordinates of the center point of (a);
the function of the search depth f_dep(b_i, g_j) is:

f_dep(b_i, g_j) = min{ v_k(b_i, g_j) : v_k ∈ V }

wherein V = { v_k(b_i, g_j) | k = 1, 2, 3, …, T }, v_k denotes the depth change sum of the k-th Manhattan path matching the target candidate box b_i and the truth box g_j, and the set V has T elements, indicating that there are T Manhattan paths between the target candidate box b_i and the truth box g_j;
the depth change sum v_k(b_i, g_j) of the k-th Manhattan path is the sum of the depth differences between consecutive coordinate points along the path, specifically:

v_k(b_i, g_j) = Σ_{q=1}^{Q-1} | d(p_q^k) - d(p_{q+1}^k) |

wherein p_q^k denotes the q-th coordinate point on the path from the target candidate box b_i to the truth box g_j, q being an integer greater than or equal to 1 and Q being the number of coordinate points on the path; p_1^k denotes the coordinates of the starting point of the path, i.e. the center point of the target candidate box b_i; p_Q^k denotes the coordinates of the end point of the path, i.e. the center point of the truth box g_j; and d(p_q^k) denotes the depth value at the position of the q-th coordinate point;
the search matching algorithm in S13 is:

for each truth box g_j, the deep search matching loss with the i-th target candidate box b_i is recorded as l_mat(b_i, g_j), and the deep search matching losses with all target candidate boxes are represented as the set L_mat(g_j) = { l_mat(b_i, g_j) | i = 1, 2, 3, …, N }; the elements of the set L_mat(g_j) are reordered from small to large by deep search matching loss; the first m_j target candidate boxes of the set L_mat(g_j), i.e. the m_j target candidate boxes with the smallest deep search matching loss, are matched to the truth box g_j, wherein m_j is the maximum number of matches of the truth box g_j; after the search matching is completed, if the same target candidate box is matched to different truth boxes, the target candidate box is matched to the truth box with the smaller deep search matching loss, and suitable target candidate boxes are re-matched for the remaining truth boxes; the maximum number of matches m_j of the truth box g_j is expressed as:

m_j = r, j = 1, 2, 3, …, M

wherein r denotes the maximum integer such that M · r is less than N; and the matched target candidate boxes are sent to the region pooling module to obtain features for classification and regression, which are sent to the classification and regression module, and the pedestrian targets are detected and located to obtain the detection result, i.e. the pedestrian targets to be detected in the image.
2. The pedestrian detection method based on deep search matching according to claim 1, wherein the depth value d(p_q^k) is calculated by a depth estimation network.
3. The pedestrian detection method based on deep search matching according to claim 1, wherein the step of calculating classification scores and regression positions according to the features obtained in step S14 in step S15 further comprises: performing weighted summation and back propagation on the matching loss, the loss of the search matching algorithm, and the losses of the classification scores and regression positions to construct an end-to-end training network.
4. A pedestrian detection system based on deep search matching, characterized by comprising: a region generation network module, a depth loss estimation module, a search matching module, a region pooling module, and a classification regression module; wherein,
the region generation network module is used for generating target candidate boxes for the original image based on a region generation network;
the depth loss estimation module is used for calculating the matching loss of each target candidate box and each truth box of the real pedestrian targets in the original image;
the search matching module is used for matching a certain number of target candidate boxes to each truth box in turn through a search matching algorithm using the matching loss;
the region pooling module is used for passing the matched target candidate boxes in the original image through a region pooling layer to obtain the corresponding features;
the classification regression module is used for calculating classification scores and regression positions according to the features obtained by the region pooling module to obtain the final detection result, i.e. the pedestrian targets to be detected in the original image;
the depth loss estimation module calculates the matching loss using a depth loss estimation function as follows:

l_mat(b_i, g_j) = f_dis(b_i, g_j) + f_dep(b_i, g_j)

wherein b_i denotes the i-th target candidate box, i = 1, 2, 3, …, N, g_j denotes the j-th truth box, j = 1, 2, 3, …, M, f_dis(b_i, g_j) denotes the function for computing the search distance between the target candidate box b_i and the truth box g_j, and f_dep(b_i, g_j) denotes the function for computing the search depth between them;
the function of the search distance is the Manhattan distance, specifically:

f_dis(b_i, g_j) = |x_i - x_j| + |y_i - y_j|

wherein (x_i, y_i) denotes the coordinates of the center point of the target candidate box b_i, and (x_j, y_j) denotes the coordinates of the center point of the truth box g_j;
the function of the search depth is:

f_dep(b_i, g_j) = min{ v_k(b_i, g_j) : v_k ∈ V }

wherein V = { v_k(b_i, g_j) | k = 1, 2, 3, …, T }, v_k denotes the depth change sum of the k-th Manhattan path matching the target candidate box b_i and the truth box g_j, and the set V has T elements, indicating that there are T Manhattan paths between the target candidate box b_i and the truth box g_j;
the depth change sum v_k(b_i, g_j) of the k-th Manhattan path is the sum of the depth differences between consecutive coordinate points along the path, specifically:

v_k(b_i, g_j) = Σ_{q=1}^{Q-1} | d(p_q^k) - d(p_{q+1}^k) |

wherein p_q^k denotes the q-th coordinate point on the path from the target candidate box b_i to the truth box g_j, Q being the number of coordinate points on the path; p_1^k denotes the coordinates of the starting point of the path, i.e. the center point of the target candidate box b_i; p_Q^k denotes the coordinates of the end point of the path, i.e. the center point of the truth box g_j; and d(p_q^k) denotes the depth value at the position of the q-th coordinate point;
the search matching algorithm of the search matching module is:

for each truth box g_j, the deep search matching loss with the i-th target candidate box b_i is recorded as l_mat(b_i, g_j), and the deep search matching losses with all target candidate boxes are represented as the set L_mat(g_j) = { l_mat(b_i, g_j) | i = 1, 2, 3, …, N }; the elements of the set L_mat(g_j) are reordered from small to large by deep search matching loss; the first m_j target candidate boxes of the set L_mat(g_j), i.e. the m_j target candidate boxes with the smallest deep search matching loss, are matched to the truth box g_j, wherein m_j is the maximum number of matches of the truth box g_j; after the search matching is completed, if the same target candidate box is matched to different truth boxes, the target candidate box is matched to the truth box with the smaller deep search matching loss, and suitable target candidate boxes are re-matched for the remaining truth boxes; the maximum number of matches m_j of the truth box g_j is expressed as:

m_j = r, j = 1, 2, 3, …, M

wherein r denotes the maximum integer such that M · r is less than N; and the matched target candidate boxes are sent to the region pooling module to obtain features for classification and regression, which are sent to the classification and regression module, and the pedestrian targets are detected and located to obtain the detection result, i.e. the pedestrian targets to be detected in the image.
5. The pedestrian detection system based on deep search matching according to claim 4, characterized by further comprising: a detection network model module, which performs weighted summation and back propagation on the losses of the depth loss estimation module, the search matching module, and the classification regression module to construct an end-to-end detection network model, and trains the detection network model with the sum of the losses.
6. The pedestrian detection system based on deep search matching according to claim 4, wherein the depth value d(p_q^k) is calculated by a depth estimation network.
CN202011629766.8A 2020-12-31 2020-12-31 Pedestrian detection method and system based on deep search matching Active CN112613472B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011629766.8A CN112613472B (en) 2020-12-31 2020-12-31 Pedestrian detection method and system based on deep search matching


Publications (2)

Publication Number Publication Date
CN112613472A CN112613472A (en) 2021-04-06
CN112613472B (en) 2022-04-26

Family

ID=75253223

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011629766.8A Active CN112613472B (en) 2020-12-31 2020-12-31 Pedestrian detection method and system based on deep search matching

Country Status (1)

Country Link
CN (1) CN112613472B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114612769B (en) * 2022-03-14 2023-05-26 电子科技大学 Integrated sensing infrared imaging ship detection method integrated with local structure information

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108399362A (en) * 2018-01-24 2018-08-14 中山大学 A kind of rapid pedestrian detection method and device
CN109753853A (en) * 2017-11-06 2019-05-14 北京航天长峰科技工业集团有限公司 One kind being completed at the same time pedestrian detection and pedestrian knows method for distinguishing again
CN110135243A (en) * 2019-04-02 2019-08-16 上海交通大学 A kind of pedestrian detection method and system based on two-stage attention mechanism
CN111160407A (en) * 2019-12-10 2020-05-15 重庆特斯联智慧科技股份有限公司 Deep learning target detection method and system
CN111476089A (en) * 2020-03-04 2020-07-31 上海交通大学 Pedestrian detection method, system and terminal based on multi-mode information fusion in image

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106897673B (en) * 2017-01-20 2020-02-21 南京邮电大学 Retinex algorithm and convolutional neural network-based pedestrian re-identification method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109753853A (en) * 2017-11-06 2019-05-14 北京航天长峰科技工业集团有限公司 One kind being completed at the same time pedestrian detection and pedestrian knows method for distinguishing again
CN108399362A (en) * 2018-01-24 2018-08-14 中山大学 A kind of rapid pedestrian detection method and device
WO2019144575A1 (en) * 2018-01-24 2019-08-01 中山大学 Fast pedestrian detection method and device
CN110135243A (en) * 2019-04-02 2019-08-16 上海交通大学 A kind of pedestrian detection method and system based on two-stage attention mechanism
CN111160407A (en) * 2019-12-10 2020-05-15 重庆特斯联智慧科技股份有限公司 Deep learning target detection method and system
CN111476089A (en) * 2020-03-04 2020-07-31 上海交通大学 Pedestrian detection method, system and terminal based on multi-mode information fusion in image

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Hui Zhang et al., "Pedestrian Detection Based on Imbalance Prior for Surveillance Video", 2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA), 2017-12-21, entire document *
Yan Luo et al., "Where, What, Whether: Multi-modal Learning Meets Pedestrian Detection", arXiv:2012.10880v1, 2020-12-20, entire document *

Also Published As

Publication number Publication date
CN112613472A (en) 2021-04-06

Similar Documents

Publication Publication Date Title
CN110135243B (en) Pedestrian detection method and system based on two-stage attention mechanism
CN110009679B (en) Target positioning method based on multi-scale feature convolutional neural network
CN104268539B (en) A kind of high performance face identification method and system
CN111914664A (en) Vehicle multi-target detection and track tracking method based on re-identification
CN112184752A (en) Video target tracking method based on pyramid convolution
CN109191497A (en) A kind of real-time online multi-object tracking method based on much information fusion
CN108665481A (en) Multilayer depth characteristic fusion it is adaptive resist block infrared object tracking method
CN108564598B (en) Improved online Boosting target tracking method
CN111767847B (en) Pedestrian multi-target tracking method integrating target detection and association
CN107909081A (en) The quick obtaining and quick calibrating method of image data set in a kind of deep learning
CN110263712A (en) A kind of coarse-fine pedestrian detection method based on region candidate
Yang et al. Single shot multibox detector with kalman filter for online pedestrian detection in video
CN113744311A (en) Twin neural network moving target tracking method based on full-connection attention module
CN108460790A (en) A kind of visual tracking method based on consistency fallout predictor model
CN109242883A (en) Optical remote sensing video target tracking method based on depth S R-KCF filtering
Liu et al. Hand gesture recognition based on single-shot multibox detector deep learning
CN113763424B (en) Real-time intelligent target detection method and system based on embedded platform
An et al. Transitive transfer learning-based anchor free rotatable detector for SAR target detection with few samples
Pang et al. Analysis of computer vision applied in martial arts
Liu et al. Video face detection based on improved SSD model and target tracking algorithm
CN112613472B (en) Pedestrian detection method and system based on deep search matching
Zhang Sports action recognition based on particle swarm optimization neural networks
Moridvaisi et al. An extended KCF tracking algorithm based on TLD structure in low frame rate videos
CN110826575A (en) Underwater target identification method based on machine learning
Altaf et al. Presenting an effective algorithm for tracking of moving object based on support vector machine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant