CN112613472B

CN112613472B - Pedestrian detection method and system based on deep search matching

Info

Publication number: CN112613472B
Application number: CN202011629766.8A
Authority: CN
Inventors: 张重阳; 罗艳; 孙军
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2020-12-31
Filing date: 2020-12-31
Publication date: 2022-04-26
Anticipated expiration: 2040-12-31
Also published as: CN112613472A

Abstract

The invention discloses a pedestrian detection method and system based on deep search matching, wherein the method comprises the following steps: generating a target candidate frame for the original image based on the area generation network; calculating the matching loss of each target candidate frame and each real value frame of the real pedestrian target in the original image: matching a certain number of target candidate boxes for each truth box in sequence by a search matching algorithm by utilizing matching loss; the matched target candidate frame in the original image is subjected to a regional pooling layer to obtain corresponding characteristics; and calculating the classification score and the regression position of the features to obtain a final detection result, namely the pedestrian target needing to be detected in the original image. The system comprises: the device comprises a region generation network module, a depth loss estimation module, a search matching module, a region pooling module and a classification regression module. The invention is more suitable for the real complex and changeable environment, and effectively improves the detection capability of the current pedestrian detector.

Description

Pedestrian detection method and system based on deep search matching

Technical Field

The invention relates to the technical field of target detection, in particular to a pedestrian detection method and system based on deep search matching.

Background

The advent of the big data era pushes the continuous updating and development of the computer technology, and the pedestrian detection technology as a research hotspot in the field of computer vision shows important application value in the fields of intelligent video monitoring, intelligent transportation and the like. The existing pedestrian detection algorithm has the following difficulties and challenges, and the detection result still needs to be improved: due to the shooting distance, the image is large, but the size of the target pedestrian is small, the characteristics of the target area are few after the reduction through the deep learning convolutional neural network, and effective detection and identification are difficult to perform; due to the fixity of the shooting angle, the body of the pedestrian has the problem of partial shielding, and useful information in the detection process is relatively reduced, so that missing detection is caused.

At present, mature pedestrian detection methods can be basically divided into two categories: (1) modeling based on the background. The method is mainly used for detecting the moving target in the video: the method comprises the steps of carrying out scene segmentation on an input static image, segmenting the foreground and the background of the input static image by using methods such as a Gaussian Mixture Model (GMM) or motion detection and the like, and extracting a specific moving object from the foreground. Such methods require a continuous sequence of images to achieve modeling and are not suitable for target detection in a single image. (2) Learning based on statistics. That is, all images known to belong to pedestrian targets are collected to form a training set, and features are extracted from the images of the training set based on a manually designed algorithm (such as HOG, Harr and the like). The extracted features are generally information of gray scale, texture, gradient histogram, edge, etc. of the target. And then constructing a pedestrian detection classifier according to the feature library of a large number of training samples. The classifier can generally use models such as SVM, Adaboost and neural network.

In general, in recent years, the target detection algorithm based on statistical learning is superior in performance, and can be divided into a traditional artificial characteristic target detection algorithm and a depth characteristic machine learning target detection algorithm.

The traditional artificial characteristic target detection algorithm mainly refers to that the target detection modeling is carried out by utilizing the characteristics of artificial design. The characteristic algorithm of artificial design which has excellent performance in recent years mainly comprises the following steps: DPM (Deformable Part model) algorithm (Object detection with discrete transformed Part-based models) proposed by Pedro F.Felzenszwalb et al in 2010. ICF (Integrated Channel features) proposed in Piotr Doll-r et al 2009, ACF algorithm (Fast Feature pyramides for Object Detection) proposed in 2014. An Informed Harr method (Informed Haar-like Features improved decision Detection) proposed by Shanshan Zhang et al in 2014 aims to extract Harr Features with more characteristic information for training. Although the artificially designed features have certain effects, the detection precision is still not high due to insufficient characterization capability of the artificial features. Because of the stronger feature learning and expression ability of the deep convolutional neural network model, the deep convolutional neural network model is more and more widely and successfully applied to the aspect of pedestrian detection. The basic pedestrian detection operator is the R-CNN (Region-conditional Neural Network) model. In 2014, Girshick et al proposed RCNN for the detection of a general target, and then proposed Fast RCNN and Fast RCNN, which improved the accuracy and speed of a target detection algorithm based on deep learning. The target detection based on the deep learning technology mostly uses the features extracted from the whole candidate frame for classification regression, and still has the problem of insufficient depth feature extraction, especially aiming at the shielded target and the small-size target in the pedestrian, on one hand, due to the invisibility of the part of the shielded target body, the visual features are limited; on the other hand, the characteristic size is smaller because the size of the deep convolutional neural network layer is reduced; these two factors result in low detection accuracy of the pedestrian target and a further reduction in the missing rate.

In 2018, a part annotation method (occupied peer Detection Through Guided Attention in CNNs) proposed by Shanghan Zhang et al aims to extract body part features with more characteristic information for training. However, there still exist some problems, on one hand, this method still uses the features of the whole target candidate frame, does not fully extract the body part features, and in addition, over-emphasizes the body part features or over-emphasizes the features of the whole candidate frame, resulting in the imbalance between the global and local networks, so the detector has not good generalization performance; on the other hand, using a part detector to extract body part features introduces additional labeled body part information, resulting in increased cost.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a pedestrian detection method and system based on deep search matching, which effectively relieve the problem of inconsistency in the matching process of a pedestrian detection frame, are beneficial to training a more robust and accurate pedestrian detector, and especially reduce the false detection rate of pedestrian detection under the shielding condition, so that the method and system are more suitable for the complex and changeable environment, and effectively improve the detection capability of the current pedestrian detector.

In a first aspect of the present invention, a pedestrian detection method based on deep search matching is provided, which includes:

s11: generating a target candidate frame for the original image based on the area generation network;

s12: calculating the matching loss of each target candidate frame and each real value frame of the real pedestrian target in the original image:

s13: matching a certain number of target candidate boxes for each truth box in sequence by a search matching algorithm by utilizing the matching loss;

s14: the matched target candidate frame in the original image is processed by a region Pooling (RoI Pooling) layer to obtain corresponding characteristics;

s15: and calculating a classification score and a regression position according to the features obtained in the step S14 to obtain a final detection result, namely the pedestrian target needing to be detected in the original image.

Preferably, the depth loss estimation function for calculating the matching loss function in S12 is:

wherein, b_iIndicates the ith target frame candidate, i is 1,2,3 … N, g_jDenotes the jth true value box, j 12,3,. M,

representing a frame b for computing a target candidate_iAnd truth box g_jAs a function of the search distance of (c),

representing a frame b for computing a target candidate_iAnd truth box g_jAs a function of the search depth.

Preferably, the function of the search distance is manhattan distance, specifically:

wherein (x)_i,y_i) Representing target candidate box b_i(x) coordinates of the center point of (c)_j,y_j) Box g for representing true value_jCoordinates of the center point of (a).

Preferably, the function of the search depth is:

wherein V ═ { V ═ V_k(b_i,g_j)|k＝1,2,3,…T}，v_kRepresenting target candidate box b_iAnd truth box g_jThe depth change sum of the matched k-th Manhattan path; the number of elements of the set V is T, and the target candidate box b is represented_iAnd truth box g_jThere are T manhattan paths;

further, the air conditioner is provided with a fan,

depth variation sum v of k-th Manhattan path_k(b_i,g_j) The depth difference of the front coordinate point and the rear coordinate point along the path specifically comprises the following steps:

wherein the content of the first and second substances,

representing the slave target candidate box b_iTo the truth box g_jQ is an integer greater than or equal to 1; more specifically, the present invention is to provide a novel,

representing coordinates of a starting point of the path, i.e. the target candidate box b_iA center point of (a);

representing path end coordinates, i.e. true value box g_jA center point of (a);

the depth value representing the position of the qth coordinate point is calculated, more specifically, by a depth estimation network.

Preferably, the step of calculating classification scores and regression positions according to the features obtained in the step S14 in the step S15 further comprises: and carrying out weighted summation and back propagation on the loss of the matching loss, the loss of the search matching algorithm, the classification score and the loss of the regression position to construct an end-to-end training network.

In a second aspect of the present invention, a pedestrian detection system based on deep search matching is provided, which includes: the system comprises a region generation network module, a depth loss estimation module, a search matching module, a region pooling module and a classification regression module; wherein the content of the first and second substances,

the region generation network module is used for generating a target candidate frame for the original image based on a region generation network;

the depth loss estimation module is used for calculating the matching loss of each target candidate frame and each real value frame of the real pedestrian target in the original image:

the search matching module is used for matching a certain number of target candidate boxes for each truth-value box in sequence through a search matching algorithm by utilizing the matching loss;

the region pooling module is used for enabling the matched target candidate frame in the original image to pass through a region pooling layer to obtain corresponding features;

and the classification regression module is used for calculating a classification score and a regression position according to the features obtained by the region pooling module to obtain a final detection result, namely the final detection result is the pedestrian target needing to be detected in the original image.

Preferably, the method further comprises the following steps: and the detection network model module is used for carrying out weighted summation and back propagation on the losses of the depth loss estimation module, the search matching module and the classification regression module, constructing an end-to-end detection network model and training the detection network model by using the sum of the losses.

Compared with the prior art, the invention has at least one of the following advantages:

(1) according to the pedestrian detection method and system based on depth search matching, the target candidate frames with more consistent characteristics are matched for each true value frame through the depth-based search matching algorithm, so that the method and system are suitable for variable conditions in a practical application environment, the detection robustness is enhanced, the false detection and missed detection probability is reduced, and particularly for small-scale pedestrians with relatively less available information and serious interference of blocked pedestrians and redundant noise, the detection capability of a pedestrian target can be effectively improved;

(2) according to the pedestrian detection method and system based on depth search matching, accurate and efficient detection of a target in an occlusion scene can be well achieved through depth loss estimation and search matching;

(3) according to the pedestrian detection method and system based on deep search matching, the weighted summation and the back propagation are carried out through the loss of the matching loss, the loss of the search matching algorithm, the classification score and the loss of the regression position, end-to-end network training is achieved, and the detection result is more accurate.

Drawings

Embodiments of the invention are further described below with reference to the accompanying drawings:

FIG. 1 is a flow chart of a pedestrian detection method based on deep search matching according to an embodiment of the present invention;

fig. 2 is a flow chart of a search matching mechanism according to an embodiment of the present invention.

Detailed Description

The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that variations and modifications can be made by persons skilled in the art without departing from the spirit of the invention. All falling within the scope of the present invention.

The existing pedestrian detection method can well identify the pedestrian target which is not seriously shielded, but because the actual application scene is more complicated, the pedestrian occupies a very small part without shielding or with less shielding, and therefore, most of pedestrian detectors have poor detection effect on the shielded target. The pedestrian detection under the complex scene has the following characteristics:

the pedestrian shielding method has the advantages that firstly, pedestrians are shielded frequently. In an actual application scene, the situation that the pedestrian target is partially shielded in the image is inevitable. Most existing algorithms fail because the global structural features of the pedestrian are destroyed. Furthermore, due to the diversity of the occlusion patterns, the performance of the algorithm that is too dependent on the site detector is poor.

And secondly, matching between the target candidate frame of the pedestrian detector and the truth value frame is inconsistent. Especially in an occlusion scene, because the pedestrian real value frames are dense, it is easy for target candidate frames with similar positions (similar features) to be matched with different real value frames. The pedestrian detector is difficult to train under the condition, the pedestrian target is difficult to be accurately positioned, and the false detection rate is increased.

Based on the difficulty in pedestrian detection in reality, the embodiment of the invention provides the pedestrian detection method and the system for carrying out depth search matching on the pedestrian image, the deep features in the CNN network are utilized to extract the target candidate frames, and the depth search matching loss is calculated for each target candidate frame and each truth value frame. By means of the matching loss, more consistent target candidate frames are matched for each truth value frame, so that the network can learn more consistent characteristics, excellent detection performance of conventional pedestrian samples is guaranteed, detection accuracy of shielding samples is improved, and false detection rate is reduced.

Fig. 1 is a flowchart of a pedestrian detection method based on depth search matching according to an embodiment of the present invention.

Referring to fig. 1, the pedestrian detection method based on depth search matching of the present embodiment includes:

s11: generating target candidate frame B ═ { B } for original image based on area generation network_i|i＝1,2,3,…}；

S12: calculating each target candidate frame and each real value frame G-G of the real pedestrian target in the original image_jMatching loss l of 1,2,3, … | j |_mat(b_i,g_j)：

S13: matching a certain number of target candidate boxes for each truth box in sequence by a search matching algorithm by utilizing matching loss;

s14: the matched target candidate frame in the original image is subjected to a regional pooling layer to obtain corresponding characteristics;

In this embodiment, the original image in S11 may be subjected to operations such as multilayer convolution to obtain a feature map of the image: the image is passed through a Deep convolution layer (Deep CNN, DCNN) of a convolutional neural network module, such as VGG16 or ResNet, and the input image is subjected to operations such as multilayer convolution to obtain a feature map.

In the preferred embodiment, in S12, the number of elements in the target candidate frame set B is N, and in the entire detection process, the parameter N is set to 512, which indicates that 512 target candidate frames are extracted from the original image; the element number of the truth-value frame set G is M, and the fact that M pedestrian targets really exist in the original image is shown. Of course, other N may be selected in other embodiments. In this step, the RPN module in the fast RCNN network may be utilized to generate a target candidate box for input to the depth loss estimation module.

In the preferred embodiment, the depth loss estimation function for calculating the matching loss function in S12 is:

In a preferred embodiment, the function of the search distance is manhattan distance, which is specifically:

In a preferred embodiment, the function of the search depth is:

depth variation sum v of k-th Manhattan path_k(b_i,g_j) For the depth difference of the front and back coordinate points along this path,the method specifically comprises the following steps:

wherein the content of the first and second substances,

representing the slave target candidate box b_iTo the truth box g_jThe q-th coordinate point in the path of (1); more specifically, the present invention is to provide a novel,

As shown in fig. 2, the search for a match in S13 is calculated as:

for each true value box g_jAnd the ith target candidate frame b_iThe depth search match penalty of can be noted as l_mat(b_i,g_j) The loss of deep search matches with all target candidate boxes may be represented as L in the set_mat(g_j)＝{l_mat(b_i,g_j) 1,2,3, …, N }; will set L_mat(g_j) The elements in (1) are reordered from small to large based on depth search match loss; will be L in the set_mat(g_j) Front m_jOne target candidate box, i.e. m with minimum loss of deep search matching_jMatching the target candidate box to the true value box g_jWherein m is_jIs true value box g_jThe maximum number of matches; after the search matching is completed, if the condition that the same target candidate frame is matched with different true value frames occurs, the target candidate frame is matched with the different true value framesThe target candidate box is matched to the true box with less loss in the deep search matching, and the proper target candidate box is matched again for the rest of the true boxes. True value box g_jMaximum number of matches m_jCan be expressed as:

m_j＝r^j,j＝1,2,3,…,M

wherein r represents such that

A largest integer less than N. And sending the matched target candidate frame into a region pooling module to obtain characteristics for classification and regression, sending the characteristics into a classification and regression module, and detecting and positioning the pedestrian target to obtain a detection result, namely the pedestrian target needing to be detected in the image.

In the preferred embodiment, the step of calculating the classification score and the regression position according to the features obtained in step S14 in step S15 further includes: and carrying out weighted summation and back propagation on the loss of the matching loss, the loss of the search matching algorithm, the classification score and the loss of the regression position, and constructing an end-to-end training mode.

In another embodiment of the present invention, a pedestrian detection system based on deep search matching is further provided, and the system is used for implementing the pedestrian detection method in the above embodiments. Specifically, the system comprises: the system comprises a region generation network module, a depth loss estimation module, a search matching module, a region pooling module and a classification regression module; the region generation network module is used for generating a target candidate frame for the original image based on the region generation network; the depth loss estimation module is used for calculating the matching loss of each target candidate frame and each real value frame of the real pedestrian target in the original image: the search matching module is used for matching a certain number of target candidate boxes for each truth-value box in sequence through a search matching algorithm by utilizing the matching loss; the region pooling module is used for enabling the matched target candidate frame in the original image to pass through a region pooling layer to obtain corresponding features; and the classification regression module is used for calculating a classification score and a regression position according to the features obtained by the region pooling module to obtain a final detection result, namely the pedestrian target needing to be detected in the original image.

In a preferred embodiment, the method further comprises: and the detection network model module is used for carrying out weighted summation and back propagation by using the losses of the depth loss estimation module, the search matching module and the classification regression module, constructing an end-to-end detection network model and training the detection network model by using the sum of the losses.

According to the embodiment of the invention, an end-to-end deep search matching pedestrian detection system is constructed, the consistent characteristics of pedestrians are fully extracted for the target candidate frames with consistent matching characteristics of each truth value frame, and the environmental interference is effectively removed, so that the detection performance of the pedestrian detector in a complex scene is effectively ensured.

In a preferred embodiment, the depth loss estimation module calculates the depth loss estimation function of the matching loss function as:

In a preferred embodiment, the function of the search depth is:

further, the sum v of the depth variation of the kth Manhattan path_k(b_i,g_j) The depth difference of the front coordinate point and the rear coordinate point along the path specifically comprises the following steps:

wherein the content of the first and second substances,

According to the pedestrian detection method and system in the embodiment of the invention, aiming at the shielding problem in pedestrian detection, through the design of the depth loss estimation module, the search matching module and the like, the accurate and efficient detection of the target in the shielding scene can be better solved.

In another embodiment, a pedestrian detection method in combination with the pedestrian detection system includes: the method comprises the steps of constructing and utilizing an image to be detected to be sent into a CNN network to generate features of different levels, and utilizing deep features and an RPN module to preliminarily extract target candidates; calculating the matching loss of each target candidate box and each true value box through a depth loss estimation module; matching a certain number of target candidate frames for each truth value frame by using a search matching module to obtain more consistent pedestrian characteristics with stronger robustness, and sending the pedestrian characteristics into a final classification and regression module for pedestrian target detection and accurate positioning; the whole network carries out weighted summation through the loss of each module, and the weighted summation is used as a loss function of the whole network to realize end-to-end network training. The whole detection process comprises four links:

firstly, the image to be detected is sent to a CNN network to carry out multilayer convolution operation to generate characteristics of different layers.

Secondly, generating a target candidate frame set B ═ B by utilizing the deep features and an RPN module in a Faster RCNN network_i1,2,3, … }; taking a pedestrian target really existing in the original image as a truth value frame set G ═ G_j1,2,3, … }; calculating the matching loss l of each target candidate frame and each true value frame by using a depth loss estimation function_mat(b_i,g_j). The number of elements of the target candidate frame set B is N, and in the whole detection process, a parameter N is set to 512, which indicates that 512 target candidate frames are extracted from the original image; the element number of the truth-value frame set G is M, and the fact that M pedestrian targets really exist in the original image is shown.

Thirdly, utilizing the matching loss l_matAs input to a search matching module; matching a certain number of target candidate frames for different truth value frames in sequence through a search matching algorithm; sending all matched target candidate frames in the original image into a region pooling module to obtainAnd sending the corresponding characteristics into a classification and regression module to calculate a classification score and a regression position to obtain a final detection result, namely the pedestrian target needing to be detected in the image.

In the present embodiment, partially blocking the pedestrian means that the ratio of the height of the visible body part of the pedestrian to the height of the target of the complete pedestrian is between (0.65,1), and severely blocking the pedestrian means that the ratio of the height of the visible body part of the pedestrian to the height of the target of the complete pedestrian is between (0.20, 0.65).

According to the method and the system provided by the embodiment of the invention, a depth-based search matching algorithm is constructed, and the target candidate frames with more consistent characteristics are matched for each true value frame, so that the method and the system are suitable for variable conditions in a real application environment, the detection robustness is enhanced, the false detection and the missing detection probability are reduced, and particularly for small-scale pedestrians with relatively less available information and seriously interfered by redundant noise, the detection capability of the pedestrian target in the video image can be effectively improved.

It should be noted that, the steps in the method provided by the present invention can be implemented by using corresponding modules, devices, units, and the like in the system, and those skilled in the art can implement the step flow of the method by referring to the technical solution of the system, that is, the embodiment in the system can be understood as a preferred example for implementing the method, and details are not described herein.

Those skilled in the art will appreciate that, in addition to implementing the system and its various modules, devices, units provided by the present invention in pure computer readable program code, the system and its various devices provided by the present invention can be implemented with the same functionality in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like by entirely logically programming method steps. Therefore, the system and various devices thereof provided by the present invention can be regarded as a hardware component, and the devices included in the system and various devices thereof for realizing various functions can also be regarded as structures in the hardware component; means for performing the functions may also be regarded as structures within both software modules and hardware components for performing the methods.

The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, and not to limit the invention. Any modifications and variations within the scope of the description, which may occur to those skilled in the art, are intended to be within the scope of the invention.

Claims

1. A pedestrian detection method based on deep search matching is characterized by comprising the following steps:

s15: calculating a classification score and a regression position according to the features obtained in the step S14 to obtain a final detection result, namely a pedestrian target needing to be detected in the original image;

and S12, calculating the matching loss of each target candidate frame and each real value frame of the real pedestrian target in the original image by using a depth loss estimation function, wherein the depth loss estimation function l_mat(b_i,g_j) Comprises the following steps:

wherein, b_iIndicates the ith target frame candidate, i is 1,2,3 …, N, g_jThe j-th truth box is shown, j is 1,2,3, …, M,

representing a frame b for computing a target candidate_iAnd truth box g_jAs a function of the search depth of;

the function of the search distance is the Manhattan distance

The method specifically comprises the following steps:

wherein (x)_i,y_i) Representing target candidate box b_i(x) coordinates of the center point of (c)_j,y_j) Box g for representing true value_jThe coordinates of the center point of (a);

function of the search depth

Comprises the following steps:

wherein V ═ { V ═ V_k(b_i,g_j)|k＝1,2,3,…,T}，v_kRepresenting target candidate box b_iAnd truth box g_jThe depth change sum of the matched k-th Manhattan path; the number of elements of the set V is T, and the target candidate box b is represented_iAnd truth box g_jThere are T manhattan paths;

wherein the content of the first and second substances,

representing the slave target candidate box b_iTo the truth box g_jQ is an integer of 1 or more, wherein,

a depth value representing a position of the qth coordinate point;

the search matching algorithm in S13 is:

for each true value box g_jAnd the ith target candidate frame b_iThe loss of deep search matching is recorded as l_mat(b_i,g_j) The depth search matching loss with all target candidate boxes is represented as L by a set_mat(g_j)＝{l_mat(b_i,g_j) 1,2,3, …, N }; will set L_mat(g_j) The elements in (1) are reordered from small to large based on depth search match loss; will be L in the set_mat(g_j) Front m_jOne target candidate box, i.e. m with minimum loss of deep search matching_jMatching the target candidate box to the true value box g_jWherein m is_jIs true value box g_jThe maximum number of matches; after the searching and matching are completed, if the same target candidate frame is matched with different true value frames, matching the target candidate frame to the true value frame with smaller depth searching and matching loss, and re-matching the proper target candidate frame for the rest true value frames; truth valueFrame g_jMaximum number of matches m_jExpressed as:

m_j＝r^j,j＝1,2,3,…,M

wherein r represents such that

A maximum integer less than N; and sending the matched target candidate frame into a region pooling module to obtain characteristics for classification and regression, sending the characteristics into a classification and regression module, and detecting and positioning the pedestrian target to obtain a detection result, namely the pedestrian target needing to be detected in the image.

2. The pedestrian detection method based on depth search matching according to claim 1, wherein the depth values

And calculating by a depth estimation network.

3. The pedestrian detection method based on deep search matching according to claim 1, wherein the step of calculating classification scores and regression positions according to the features obtained in the step S14 in the step S15 further comprises: and carrying out weighted summation and back propagation on the loss of the matching loss, the loss of the search matching algorithm, the classification score and the loss of the regression position to construct an end-to-end training network.

4. A pedestrian detection system based on deep search matching, comprising: the system comprises a region generation network module, a depth loss estimation module, a search matching module, a region pooling module and a classification regression module; wherein the content of the first and second substances,

the classification regression module is used for calculating a classification score and a regression position according to the features obtained by the region pooling module to obtain a final detection result, namely a pedestrian target needing to be detected in the original image;

the depth loss estimation module calculates a depth loss estimation function of the matching loss function as follows:

the function of the search distance is a manhattan distance, which specifically comprises the following steps:

the function of the search depth is:

wherein the content of the first and second substances,

representing the slave target candidate box b_iTo the truth box g_jThe q-th coordinate point in the path of (1),

a depth value representing a position of the qth coordinate point;

the search matching module, wherein the search matching algorithm is as follows:

for eachA true value box g_jAnd the ith target candidate frame b_iThe loss of deep search matching is recorded as l_mat(b_i,g_j) The depth search matching loss with all target candidate boxes is represented as L by a set_mat(g_j)＝{l_mat(b_i,g_j) 1,2,3, …, N }; will set L_mat(g_j) The elements in (1) are reordered from small to large based on depth search match loss; will be L in the set_mat(g_j) Front m_jOne target candidate box, i.e. m with minimum loss of deep search matching_jMatching the target candidate box to the true value box g_jWherein m is_jIs true value box g_jThe maximum number of matches; after the searching and matching are completed, if the same target candidate frame is matched with different true value frames, matching the target candidate frame to the true value frame with smaller depth searching and matching loss, and re-matching the proper target candidate frame for the rest true value frames; true value box g_jMaximum number of matches m_jExpressed as:

m_j＝r^j,j＝1,2,3,…,M

wherein r represents such that

5. The pedestrian detection system based on deep search matching according to claim 4, further comprising: and the detection network model module is used for carrying out weighted summation and back propagation on the losses of the depth loss estimation module, the search matching module and the classification regression module, constructing an end-to-end detection network model and training the detection network model by using the sum of the losses.

6. The pedestrian detection method based on depth search matching according to claim 4, wherein the depth values

And calculating by a depth estimation network.