CN111027372A - Pedestrian target detection and identification method based on monocular vision and deep learning - Google Patents
- Publication number
- CN111027372A (application CN201910991615.8A)
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- feature
- image
- pyramid
- phase
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/53—Recognition of crowd images, e.g. recognition of crowd congestion
Abstract
The invention belongs to the technical field of pedestrian target detection, and discloses a pedestrian target detection and identification method based on monocular vision and deep learning: establishing a small-sample pedestrian data set and collecting road pedestrian images in real scenes; carrying out pedestrian detection with a depth-feature-based target detection algorithm built on whole-image candidates and single regression; fine-tuning the weight parameters of the higher network layers on the VOC data set and the small-sample pedestrian data set through secondary transfer learning; extracting features of multi-scale pyramid images based on phase consistency and extracting contour features of pedestrian images to obtain a multi-scale pyramid feature map; and adopting a balanced focal loss function in place of the cross-entropy loss function to measure the classification accuracy of the target. The method uses a CNN to obtain depth features and trains a deformable part model, effectively improving detection precision; transfer learning is introduced, and by analyzing the hidden layers of the AlexNet model the accuracy of pedestrian target detection and identification is improved.
Description
Technical Field
The invention belongs to the technical field of pedestrian target detection, and particularly relates to a pedestrian target detection and identification method based on monocular vision and deep learning.
Background
Currently, the common state of the art in the industry is as follows: pedestrian detection has important applications in intelligent vehicles, robots, video surveillance and similar fields. Owing to variable pedestrian poses and the influence of factors such as illumination, background, clothing and occlusion, pedestrian detection remains a challenging topic in computer vision. At present, pedestrian detection based on computer vision mostly follows a pipeline of feature extraction plus machine learning. For feature extraction, features such as contours, textures, frequency-domain information and color regions are commonly used to describe the difference between pedestrians and the background: HOG, EOH, Edgelet and Shapelet features describe pedestrian contours, LBP features describe pedestrian texture, CSS features describe structural regions of the human body using the color similarity between local parts, and Haar wavelet features describe frequency-domain information of pedestrians. Among these, the HOG feature describes the gradient strength and gradient-orientation distribution of a local image region, represents the appearance and shape of a pedestrian well, is insensitive to illumination and small offsets, has shown excellent performance in pedestrian detection, and has become the mainstream method for pedestrian detection.
In summary, the problems of the prior art are as follows: traffic conditions under real road cameras are complex, pedestrian flow is dense and its distribution is unbalanced, which causes class imbalance among the samples; conditions also differ greatly from those in public data sets, so the characteristics of the different classes cannot be learned in a balanced way during model training, and the detection performance on the under-represented classes is poor.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a pedestrian target detection and identification method based on monocular vision and deep learning.
The invention is realized in such a way that a pedestrian target detection and identification method based on monocular vision and deep learning comprises the following steps:
the method comprises the steps of: firstly, establishing a small sample pedestrian data set and collecting road pedestrian images in a real scene; extracting the features of the image and video data sets in the source domain with the improved triplet network; and analyzing the common edge information of pedestrians based on the a-posteriori HOG feature, whose gradient feature energy is the statistical embodiment of the gradient information of a large number of positive pedestrian samples;
secondly, a depth feature-based target detection algorithm based on whole image candidates and single regression is realized and applied to pedestrian detection, double convolution is used for replacing a single convolution kernel, a gradient descent method with momentum is used for learning weight parameters of the network in a training stage, and a cross entropy loss function and a smooth L1 loss function are used as loss functions of a classifier and position regression;
thirdly, fine-tuning weight parameters of a network higher layer on the VOC data set and the small sample pedestrian data set through secondary transfer learning;
fourthly, extracting the features of the multi-scale pyramid images based on consistent phases, and extracting the contour features of the pedestrian images to obtain a multi-scale pyramid feature map;
and fifthly, adopting a balanced focal loss function in place of the cross-entropy loss function to measure the classification accuracy of the target.
Further, for the pedestrian data set in the first step, the tagged source domain S contains N_s pedestrian image-video pairs (I_si, V_si), where I_si ∈ R^P is the i-th pedestrian image of the source domain, corresponding to the pedestrian video V_si ∈ R^P in the source domain; in the same way, the unlabeled pedestrian images and pedestrian videos in the target domain T are denoted I_ti and V_ti, respectively; the triplet network is constructed so that the distance between the video of the target pedestrian and an image of the same pedestrian is smaller than the distance between that video and an image of another pedestrian, and the triplet loss is defined as follows:
where V_a, I_p and I_n come from the source domain X, f_2d denotes a 2D image feature extraction sub-network composed of several 2D convolutional layers, and f_3d a 3D video feature extraction sub-network composed of several 3D convolutional layers.
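The loss formula itself is not reproduced legibly in this text; a standard triplet-loss form consistent with the symbols defined above (the margin m is an assumed hyperparameter, not taken from the patent) would be:

```latex
L_{tri} = \sum_{(V_a,\, I_p,\, I_n)} \max\!\Big(0,\;
  \big\| f_{3d}(V_a) - f_{2d}(I_p) \big\|_2^2
  - \big\| f_{3d}(V_a) - f_{2d}(I_n) \big\|_2^2 + m \Big)
```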
Furthermore, in the second step, the AlexNet model is first truncated to obtain its convolutional layers, and the CNN model parameters are then obtained through transfer learning to extract rich high-level features; specifically, the first 5 convolutional layers of the model are used to obtain depth features, and a latent support vector machine is then trained with the features of each level of the feature pyramid to obtain the global detector and local detectors of the DPM. In the detection process, a global feature map and local feature maps are constructed for the test image; the local feature maps are pooled and then concatenated with the global feature map to obtain a new feature map, and the concatenated feature map is convolved with the trained discriminative model to obtain the detection result.
Further comprising:
Firstly, depth features are extracted from the image pyramid using the truncated CNN containing 5 convolutional layers, forming a 7-level, 256-channel depth feature pyramid; the 256-channel feature maps of each level are then convolved with the initialized global detector and local detectors respectively; the feature maps produced by the local detectors are pooled and concatenated with the global detector feature map to form a concatenated feature map; finally, the target geometry filter is convolved with the concatenated feature map to obtain the final single-component model detection score.
Further comprising: the maximum pooling equivalence is:
where, for r ∈ {-k, ..., k}, d_max is the kernel function of max pooling; through d_max, the connection between max pooling and distance-transform pooling can be established, with f: G → R and M_f: G → R; using a deformation cost d, the distance-transform pooling D_f: G → R is defined as:
in the convolutional layer, the calculation formula of the characteristic pixel value of the pedestrian image is as follows:
where K = {(u, v) ∈ N^2 | 0 ≤ u < k_x; 0 ≤ v < k_y}, and k_x and k_y respectively denote the length and width of the l-th layer convolution kernel; b_j^l is the bias of the j-th feature map of layer l; the variables c and r represent the current vertical and horizontal feature pixels, respectively, and the variables u and v index the convolution kernel, respectively; p denotes the corresponding p-th training sample; f denotes the activation function of the l-th layer; the operation that occurs is equivalent to convolving the kernel with the input feature map.
And further, establishing a multi-resolution pyramid model by using a Laplacian pyramid to finish multi-scale pyramid representation, extracting features of each layer of pedestrian images in the pyramid by using a phase-consistent algorithm, and finally fusing the pedestrian images in the multi-scale phase-consistent pyramid from top to bottom by using a multi-scale fusion algorithm to obtain an original pedestrian image phase-consistent feature map.
Further comprising: extracting features of each layer of pedestrian images in the pyramid by using a phase consistency algorithm, and finally fusing the pedestrian images in the multi-scale phase consistency pyramid from top to bottom by using a multi-scale fusion algorithm to obtain an original pedestrian image phase consistency feature map specifically comprises the following steps:
(1) search the phase consistency feature map of scale n to obtain the initial position (x_0, y_0) of a phase-consistent feature point;
(2) in the phase consistency feature map of scale n-1, search for phase-consistent feature points within the 3 × 3 neighborhood of (x_0, y_0); if a feature point is found at some position (x, y), then position (x, y) in the fused phase consistency feature map is a feature point; otherwise it is left unchanged;
(3) at each feature point of the fused phase-consistent image, search for points connected to that feature point at scale n-1 to obtain detail features not contained in the fused image of scale n;
(4) search for the next feature point in the scale-n phase consistency feature map and repeat steps (1)-(3) until the whole map has been processed.
The invention further aims to provide a road traffic monitoring platform applying the pedestrian target detection and identification method based on monocular vision and deep learning.
In summary, the advantages and positive effects of the invention are: an integrated convolutional neural network target detection model based on whole-image candidates and single regression is realized; double convolution replaces the single convolution kernel, and an internal competition mechanism replaces the activation layer to realize the nonlinearity of the network, reducing the number of network parameters, improving the abstraction ability of the features with respect to the target, and achieving end-to-end real-time detection of pedestrian information. In the training stage, the weight parameters of the network are learned with a gradient descent method with momentum, and the cross-entropy loss function and the smooth L1 loss function serve as the loss functions for classification and position regression. Fine-tuning the weight parameters of the higher network layers on the VOC data set and the small-sample road pedestrian data set through secondary transfer learning strengthens the feature representation ability of the model and improves the average accuracy by 12 percentage points. To address missed pedestrian detections, a feature-map pyramid with lateral connections is introduced, increasing the context information in the feature representation to a certain extent; the newly added pyramid is attached to the original network through skip connections, leaving the original network structure unchanged, adding only a small number of weight parameters and computing operations, and not significantly affecting the overall detection speed of the pedestrian algorithm.
The method utilizes the CNN to obtain the depth characteristics, trains the deformable component model and effectively improves the detection precision of the algorithm. The concept of Transfer Learning (TL) is introduced, and the hidden layer in the AlexNet model is analyzed to discover that the function of the bottom layer is the extraction of the general image features, and the depth features of the pedestrian image are generated at the high layer, so that the accuracy of pedestrian target detection and identification is improved.
Drawings
Fig. 1 is a flowchart of a pedestrian target detection and identification method based on monocular vision and deep learning according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The following detailed description of the principles of the invention is provided in connection with the accompanying drawings.
As shown in fig. 1, the pedestrian target detection and identification method based on monocular vision and deep learning provided by the embodiment of the present invention includes the following steps:
s101: establishing a small sample pedestrian data set and collecting road pedestrian images in a real scene; extracting the features of the image and video data sets in the source domain with the improved triplet network; and analyzing the common edge information of pedestrians based on the a-posteriori HOG feature, whose gradient feature energy is the statistical embodiment of the gradient information of a large number of positive pedestrian samples;
s102: realizing a depth-feature-based target detection algorithm built on whole-image candidates and single regression, and applying it to pedestrian detection; replacing the single convolution kernel with double convolution; learning the weight parameters of the network with a gradient descent method with momentum in the training stage; and taking the cross-entropy loss function and the smooth L1 loss function as the loss functions for classification and position regression;
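The loss functions and optimizer named in this step can be sketched as follows (a minimal plain-Python illustration under assumed hyperparameter values, not the patent's actual network code):

```python
import math

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss used for position regression:
    quadratic near zero, linear for large errors."""
    d = abs(pred - target)
    return 0.5 * d * d / beta if d < beta else d - 0.5 * beta

def cross_entropy(p, y):
    """Binary cross-entropy for the classifier branch; p is the
    predicted probability of the positive (pedestrian) class."""
    eps = 1e-12
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.9):
    """One gradient-descent-with-momentum update over a list of weights."""
    new_v = [momentum * v - lr * g for v, g in zip(velocity, grad)]
    new_w = [wi + vi for wi, vi in zip(w, new_v)]
    return new_w, new_v
```

The smooth L1 loss is preferred over plain L2 for box regression because its gradient is bounded for large errors, making training less sensitive to outlier boxes.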
s103: fine-tuning weight parameters of a network higher layer on the VOC data set and the small sample pedestrian data set through secondary transfer learning;
s104: extracting the features of the multi-scale pyramid images based on the consistent phase, and extracting the contour features of the pedestrian images to obtain a multi-scale pyramid feature map;
s105: adopting a balanced focal loss function in place of the cross-entropy loss function to measure the classification accuracy of the target.
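The balanced focal loss of step S105 can be sketched as follows; the alpha and gamma values follow common focal-loss practice and are assumptions, not values from the patent:

```python
import math

def balanced_focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Balanced focal loss: the (1 - p_t)^gamma factor down-weights
    well-classified examples, and alpha re-weights the classes,
    addressing the class imbalance described in the background."""
    eps = 1e-12
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t + eps)

def cross_entropy(p, y):
    """Plain cross-entropy, for comparison."""
    eps = 1e-12
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t + eps)
```

Compared with cross-entropy, an easy example (p = 0.9 for a positive) contributes almost nothing to the focal loss, so training gradient is dominated by the hard, minority-class samples.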
In a preferred embodiment of the present invention, step S101 specifically includes: suppose that the tagged source domain S contains N_s pedestrian image-video pairs (I_si, V_si), where I_si ∈ R^P is the i-th pedestrian image of the source domain, corresponding to the pedestrian video V_si ∈ R^P in the source domain. In the same way, the unlabeled pedestrian images and pedestrian videos in the target domain T are denoted I_ti and V_ti, respectively. Since the video features of pedestrians tend to contain richer information than the image features, the triplet network is constructed so that the distance between the video of the target pedestrian and an image of the same pedestrian (positive example) is smaller than the distance between that video and an image of another pedestrian (negative example). The triplet loss is defined as follows:
where V_a, I_p and I_n come from the source domain X, f_2d denotes a 2D image feature extraction sub-network composed of several 2D convolutional layers, and f_3d a 3D video feature extraction sub-network composed of several 3D convolutional layers.
To make the model converge faster, the more "difficult" triplets tend to be selected; that is, given an anchor video, a positive image is selected so that the anchor-positive distance is as large as possible, and a negative image is selected so that the anchor-negative distance is as small as possible. In particular, an online triplet generator and batches containing more samples are used, but only the hardest samples in the batch, those with the smallest and largest distances, are computed.
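The hard-triplet selection described above can be sketched as follows (a simplified illustration on precomputed distances; the margin value is an assumption):

```python
def mine_hard_triplet(d_ap, d_an):
    """Given an anchor video's distances to candidate positive images (d_ap)
    and candidate negative images (d_an) within a batch, return the indices
    of the hardest positive (largest distance) and hardest negative
    (smallest distance), as in online hard-triplet mining."""
    hard_pos = max(range(len(d_ap)), key=lambda i: d_ap[i])
    hard_neg = min(range(len(d_an)), key=lambda i: d_an[i])
    return hard_pos, hard_neg

def triplet_hinge(d_ap, d_an, margin=0.2):
    """Hinge form of the triplet loss on precomputed distances:
    zero once the negative is farther than the positive by the margin."""
    return max(0.0, d_ap - d_an + margin)
```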
In a preferred embodiment of the present invention, in the depth-feature-based target detection algorithm of step S102, the AlexNet model is first truncated to obtain its convolutional layers, and the CNN model parameters are then obtained through transfer learning to extract rich high-level features; specifically, depth features are obtained with the first 5 convolutional layers of the model, and a latent support vector machine (LSVM) is then trained with the features of each level of the feature pyramid to obtain the global detector and local detectors of the DPM. In the detection process, a global feature map and local feature maps are constructed for the test image; the local feature maps are pooled and then concatenated with the global feature map to obtain a new feature map, and the concatenated feature map is convolved with the trained discriminative model to obtain the detection result.
Firstly, depth features are extracted from the image pyramid using the truncated CNN containing 5 convolutional layers, forming a 7-level, 256-channel depth feature pyramid; the 256-channel feature maps of each level are then convolved with the initialized global detector and local detectors respectively; the feature maps produced by the local detectors are pooled and concatenated with the global detector feature map to form a concatenated feature map; finally, the target geometry filter is convolved with the concatenated feature map to obtain the final single-component model detection score.
The maximum pooling equivalence is:
where, for r ∈ {-k, ..., k}, d_max is the kernel function of max pooling; through d_max, the connection between max pooling and distance-transform pooling can be established, with f: G → R and M_f: G → R; using a deformation cost d, the distance-transform pooling D_f: G → R is defined as:
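The relation between max pooling and distance-transform pooling can be illustrated in one dimension (a sketch; the window size k and the quadratic deformation cost are assumptions, not taken from the patent):

```python
def dt_pool(scores, defcost, k=1):
    """1D distance-transform pooling: out[p] = max over r in {-k,...,k}
    of scores[p + r] - defcost(r), restricted to valid positions."""
    n = len(scores)
    out = []
    for p in range(n):
        best = max(scores[p + r] - defcost(r)
                   for r in range(-k, k + 1) if 0 <= p + r < n)
        out.append(best)
    return out

def max_pool(scores, k=1):
    """Ordinary max pooling is the special case of distance-transform
    pooling with zero deformation cost."""
    return dt_pool(scores, lambda r: 0.0, k)
```

With a non-zero cost (e.g. quadratic in the displacement r, as in DPM deformation models), far-away responses are penalized instead of being taken at face value.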
in the convolutional layer, the calculation formula of the characteristic pixel value of the pedestrian image is as follows:
where K = {(u, v) ∈ N^2 | 0 ≤ u < k_x; 0 ≤ v < k_y}, and k_x and k_y respectively denote the length and width of the l-th layer convolution kernel; b_j^l is the bias of the j-th feature map of layer l; the variables c and r represent the current vertical and horizontal feature pixels, respectively, and the variables u and v index the convolution kernel, respectively; p denotes the corresponding p-th training sample; f denotes the activation function of the l-th layer; the operation that occurs is equivalent to convolving the kernel with the input feature map.
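The feature-pixel computation described above can be sketched as follows (a plain-Python illustration assuming a ReLU activation; variable names mirror the formula's c, r, u, v):

```python
def conv_feature_pixel(x, kernel, bias, c, r, f=lambda s: max(0.0, s)):
    """Compute one output feature pixel at (row c, column r): sum the
    kernel-weighted k_x-by-k_y input window, add the feature map's bias,
    and apply the layer activation f (ReLU assumed here)."""
    kx, ky = len(kernel), len(kernel[0])
    s = bias
    for u in range(kx):          # vertical kernel index
        for v in range(ky):      # horizontal kernel index
            s += kernel[u][v] * x[c + u][r + v]
    return f(s)
```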
In a preferred embodiment of the present invention, step S104 specifically includes: establishing a multi-resolution pyramid model by using a Laplacian Pyramid (LP), completing multi-scale pyramid representation, extracting features of pedestrian images in each layer of the pyramid by using a phase-consistent algorithm, and finally fusing the pedestrian images in the multi-scale phase-consistent pyramid from top to bottom by using a multi-scale fusion algorithm to obtain an original pedestrian image phase-consistent feature map.
The Laplace pyramid decomposition is an image decomposition method for decomposing an original pedestrian image into different spatial scales, and the steps of constructing the Laplace pyramid are as follows:
(1) take the original pedestrian image G_0 as the bottom level of the Gaussian pyramid;
(2) filter the original pedestrian image with the Gaussian low-pass filter G and downsample it by alternate rows and columns to obtain a low-pass pedestrian image, which is the first level G_1 of the Gaussian pyramid;
(3) expand G_1 by interpolation upsampling and filter it with the band-pass filter H to obtain the enlarged image G'_1; compute the difference between this image and the original pedestrian image to obtain the band-pass component, the zero level LP_1 of the Laplacian pyramid; here the low-pass filter G and the band-pass filter H are normalized filters;
(4) the next level of decomposition is actually performed on the low-pass Gaussian pyramid image just obtained; the multi-scale decomposition is completed by iteration, and the iteration process can be expressed by the formulas:
G_l(i,j) = Σ_(m,n) G(m,n) G_(l-1)(2i+m, 2j+n)
LP_l(i,j) = G_(l-1)(i,j) - G'_l(i,j);
In the formulas, l is the decomposition level of the Gaussian pyramid G and the Laplacian pyramid LP; i and j represent the row and column indices of the l-th pyramid level; the pyramid formed by G_0, G_1, ..., G_n is the Gaussian pyramid.
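The pyramid construction in steps (1)-(4) can be sketched as follows; the 3 × 3 average filter and nearest-neighbour expansion are simplified stand-ins for the patent's filters G and H:

```python
def blur(img):
    """Simple 3x3 average filter, standing in for the normalized
    low-pass filter G (the true Gaussian kernel is not given here)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [img[a][b] for a in range(max(0, i - 1), min(h, i + 2))
                              for b in range(max(0, j - 1), min(w, j + 2))]
            out[i][j] = sum(vals) / len(vals)
    return out

def expand(img, h, w):
    """Nearest-neighbour upsampling to (h, w), a simple stand-in for the
    interpolation-expansion + band-pass filter H step."""
    return [[img[min(i // 2, len(img) - 1)][min(j // 2, len(img[0]) - 1)]
             for j in range(w)] for i in range(h)]

def laplacian_pyramid(img, levels):
    """Build Gaussian levels G_0..G_n by blur + alternate-row/column
    downsampling, then Laplacian levels LP_l = G_l - expand(G_{l+1})."""
    gauss = [img]
    for _ in range(levels):
        low = blur(gauss[-1])
        gauss.append([row[::2] for row in low[::2]])  # keep alternate rows/cols
    lap = []
    for l in range(levels):
        up = expand(gauss[l + 1], len(gauss[l]), len(gauss[l][0]))
        lap.append([[gauss[l][i][j] - up[i][j]
                     for j in range(len(gauss[l][0]))]
                    for i in range(len(gauss[l]))])
    return gauss, lap
```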
in a preferred embodiment of the present invention, extracting features from each layer of pedestrian images in the pyramid by using a phase-consistent algorithm, and finally fusing the pedestrian images in the multi-scale phase-consistent pyramid from top to bottom by using a multi-scale fusion algorithm to obtain an original pedestrian image phase-consistent feature map specifically includes:
(1) search the phase consistency feature map of scale n to obtain the initial position (x_0, y_0) of a phase-consistent feature point;
(2) in the phase consistency feature map of scale n-1, search for phase-consistent feature points within the 3 × 3 neighborhood of (x_0, y_0); if a feature point is found at some position (x, y), then position (x, y) in the fused phase consistency feature map is a feature point; otherwise it is left unchanged;
(3) at each feature point of the fused phase-consistent image, search for points connected to that feature point at scale n-1 to obtain detail features not contained in the fused image of scale n;
(4) search for the next feature point in the scale-n phase consistency feature map and repeat steps (1)-(3) until the whole map has been processed.
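Steps (1)-(4) above can be sketched as follows (a simplified illustration; it assumes feature points are given as coordinate sets and that scale-n coordinates are already expressed in the scale n-1 grid):

```python
def fuse_feature_points(coarse_pts, fine_pts):
    """Fuse phase-consistency feature points across two pyramid scales:
    for each feature point at scale n, look for a feature point in its
    3x3 neighbourhood at scale n-1; if found, refine to the finer
    location, otherwise retain the coarse point unchanged."""
    fine = set(fine_pts)
    fused = []
    for (x0, y0) in coarse_pts:
        neighbours = [(x0 + dx, y0 + dy)
                      for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
        match = next((p for p in neighbours if p in fine), None)
        fused.append(match if match is not None else (x0, y0))
    return fused
```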
The application effect of the present invention will be described in detail with reference to the simulation.
Training samples of pedestrian images and videos were acquired in different road environments, and a sample library containing 2000 positive samples of pedestrian head-and-shoulder regions was established. The program was implemented with Intel's open-source computer vision library OpenCV, using the SVM classification principle; a classifier was trained to obtain the classification model. When performing pedestrian detection on a video sequence, the trained classifier is imported and multi-scale detection is performed on the frame to be detected with a sliding window; whether an accurate pedestrian position can be obtained is then judged, and if the detection rate does not meet the requirement or the false detection rate is too high, the classifier is retrained; if detection is accurate, the detected pedestrian is marked with a rectangular box. The hardware environment of the experiment was an Intel i3-4130 3.40 GHz CPU with 2 GB of memory; to keep the real-time requirement of the video, the frame rate of image acquisition and transmission was 20-30 frames/s. When there are few moving targets in the detection area, the average detection time per image is 30 ms; when there are more moving targets, it is 80 ms, which can meet the real-time requirement. Under small pose changes and small angle deviations, the detection rate of the method reaches 95% while a low false detection rate is maintained.
Table 1. Comparison of the conventional method with the method of the invention
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.
Claims (8)
1. A pedestrian target detection and identification method based on monocular vision and deep learning is characterized in that the pedestrian target detection and identification method based on monocular vision and deep learning comprises the following steps:
the method comprises the steps of: firstly, establishing a small sample pedestrian data set and collecting road pedestrian images in a real scene; extracting the features of the image and video data sets in the source domain with the improved triplet network; and analyzing the common edge information of pedestrians based on the a-posteriori HOG feature, whose gradient feature energy is the statistical embodiment of the gradient information of a large number of positive pedestrian samples;
secondly, a depth feature-based target detection algorithm based on whole image candidates and single regression is realized and applied to pedestrian detection, double convolution is used for replacing a single convolution kernel, a gradient descent method with momentum is used for learning weight parameters of the network in a training stage, and a cross entropy loss function and a smooth L1 loss function are used as loss functions of a classifier and position regression;
thirdly, fine-tuning weight parameters of a network higher layer on the VOC data set and the small sample pedestrian data set through secondary transfer learning;
fourthly, extracting the features of the multi-scale pyramid images based on consistent phases, and extracting the contour features of the pedestrian images to obtain a multi-scale pyramid feature map;
and fifthly, adopting a balanced focal loss function in place of the cross-entropy loss function to measure the classification accuracy of the target.
2. The pedestrian target detection and identification method based on monocular vision and deep learning of claim 1, wherein, for the pedestrian data set in the first step, the tagged source domain S contains N_s pedestrian image-video pairs (I_si, V_si), where I_si ∈ R^P is the i-th pedestrian image of the source domain, corresponding to the pedestrian video V_si ∈ R^P in the source domain; in the same way, the unlabeled pedestrian images and pedestrian videos in the target domain T are denoted I_ti and V_ti, respectively; the triplet network is constructed so that the distance between the video of the target pedestrian and an image of the same pedestrian is smaller than the distance between that video and an image of another pedestrian, and the triplet loss is defined as follows:
3. The pedestrian target detection and identification method based on monocular vision and deep learning as claimed in claim 1, wherein in the second step the depth-feature-based target detection algorithm first truncates the AlexNet model to obtain its convolutional layers and then obtains the CNN model parameters through transfer learning to extract rich high-level features; specifically, depth features are obtained with the first 5 convolutional layers of the model, and a latent support vector machine is then trained with the features of each level of the feature pyramid to obtain the global detector and local detectors of the DPM; in the detection process, a global feature map and local feature maps are constructed for the test image, the local feature maps are pooled and then concatenated with the global feature map to obtain a new feature map, and the concatenated feature map is convolved with the trained discriminative model to obtain the detection result.
4. The pedestrian target detection and identification method based on monocular vision and deep learning of claim 3, further comprising:
firstly, extracting depth features from an image pyramid by using truncated CNN containing 5 layers of convolution layers to form a depth feature pyramid of 7 layers of 256 channels, then deconvolving the 256 channel feature layers of each layer by using an initialized global detector and a local detector respectively, pooling a feature map obtained after convolution of the local detector, cascading the pooled feature map with a global detector feature map to form a cascaded feature map, and then performing convolution operation by using a target geometric filter and the cascaded feature map to obtain a final single-component model detection score.
5. The pedestrian target detection and identification method based on monocular vision and deep learning of claim 4, further comprising: the maximum pooling equivalence is:
wherein the max pooling M_f: G → R of a function f: G → R over a window of radius k is M_f(x) = max_{Δx ∈ {−k, ..., k}} f(x + Δx); the distance transform pooling D_f: G → R of f: G → R is defined with a deformation cost d as: D_f(x) = max_{z ∈ G} ( f(z) − d(x − z) ); when d(r) = 0 for r ∈ {−k, ..., k} and d(r) = ∞ otherwise, D_f reduces to M_f, which establishes the connection between max pooling and distance transform pooling.
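The max pooling / distance transform pooling relation described in this claim can be checked numerically. A minimal 1-D sketch, assuming a zero-inside / infinite-outside box deformation cost (the function names are introduced here for illustration):

```python
import numpy as np

def window_max_pool(f, k):
    """M_f(x) = max over dx in {-k,...,k} of f(x+dx), clipped at the edges."""
    n = len(f)
    return np.array([f[max(0, x - k):min(n, x + k + 1)].max() for x in range(n)])

def distance_transform_pool(f, d):
    """D_f(x) = max_z ( f(z) - d(x - z) ): pooling with a deformation cost d."""
    n = len(f)
    return np.array([max(f[z] - d(x - z) for z in range(n)) for x in range(n)])

# the zero/infinity box cost makes distance transform pooling equal max pooling
k = 2
box_cost = lambda r: 0.0 if -k <= r <= k else np.inf
scores = np.random.default_rng(1).standard_normal(30)
pooled_max = window_max_pool(scores, k)
pooled_dt = distance_transform_pool(scores, box_cost)
```

With any finite cost elsewhere, distance transform pooling instead softly penalizes displacement rather than cutting it off at the window edge.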
in the convolutional layer, the characteristic pixel value of the pedestrian image is calculated as:
y_j^l(c, r) = f( Σ_i Σ_{(u,v) ∈ K} x_i^{l−1}(c + u, r + v) · w_{ij}^l(u, v) + b_j^l )
wherein K = { (u, v) ∈ N² | 0 ≤ u < k_x; 0 ≤ v < k_y }, and k_x and k_y respectively represent the length and width of the l-th layer convolution kernel; b_j^l is the bias of the j-th feature map of the corresponding layer l; the variables c and r represent the current vertical and horizontal feature pixel positions respectively, and the variables u and v respectively index the positions within the convolution kernel; p represents the corresponding p-th training sample; f represents the activation function of the l-th layer; the convolution operation is equivalent to convolving the convolution kernel with the input feature map.
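The per-pixel formula above can be sketched directly and checked against a vectorized window product. This is an illustrative numpy version with random toy inputs and a tanh activation standing in for f (the activation and all tensor sizes are assumptions, not taken from the patent):

```python
import numpy as np

def feature_pixel(x_prev, w, b, c, r, act=np.tanh):
    """One output pixel: y_j^l(c, r) = f( sum_i sum_{(u,v) in K}
    x_i^{l-1}(c+u, r+v) * w_ij^l(u, v) + b_j^l ), by explicit summation."""
    n_in, kx, ky = w.shape
    total = b
    for i in range(n_in):
        for u in range(kx):
            for v in range(ky):
                total += x_prev[i, c + u, r + v] * w[i, u, v]
    return act(total)

rng = np.random.default_rng(0)
x_prev = rng.standard_normal((3, 10, 10))  # previous-layer feature maps x^{l-1}
w = rng.standard_normal((3, 5, 5))         # kernel for the j-th map, kx = ky = 5
b = 0.1                                    # bias b_j^l
y = feature_pixel(x_prev, w, b, c=2, r=3)

# vectorized check: same window, same kernel, same bias
y_vec = np.tanh((x_prev[:, 2:7, 3:8] * w).sum() + b)
```

Sliding (c, r) over all valid positions yields the full j-th feature map, i.e. the kernel convolved with the input feature maps.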
6. The method for detecting and identifying a pedestrian target based on monocular vision and deep learning as claimed in claim 1, wherein in the fourth step a multi-resolution pyramid model is established using a Laplacian pyramid to complete the multi-scale pyramid representation, the features of each layer of pedestrian images in the pyramid are extracted using a phase consistency algorithm, and finally the pedestrian images in the multi-scale phase consistency pyramid are fused from top to bottom using a multi-scale fusion algorithm to obtain the phase consistency feature map of the original pedestrian image.
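A Laplacian pyramid stores, at each level, the detail lost when moving to the next coarser level, so the original image is exactly recoverable. A minimal numpy sketch, using 2×2 mean downsampling and nearest-neighbour upsampling as simple stand-ins for the Gaussian blur/expand pair (all function names here are illustrative):

```python
import numpy as np

def downsample(img):
    """2x2 mean downsampling (a simple stand-in for Gaussian blur + decimate)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img, shape):
    """Nearest-neighbour upsampling back to `shape`."""
    up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    return up[:shape[0], :shape[1]]

def laplacian_pyramid(img, levels):
    """L_i = G_i - expand(G_{i+1}); the top level stores the coarsest image."""
    pyr, cur = [], img.astype(float)
    for _ in range(levels - 1):
        nxt = downsample(cur)
        pyr.append(cur - upsample(nxt, cur.shape))
        cur = nxt
    pyr.append(cur)
    return pyr

def reconstruct(pyr):
    """Invert the pyramid: expand and add detail, coarsest to finest."""
    cur = pyr[-1]
    for lap in reversed(pyr[:-1]):
        cur = lap + upsample(cur, lap.shape)
    return cur

img = np.random.default_rng(2).standard_normal((32, 32))
pyr = laplacian_pyramid(img, 3)
rec = reconstruct(pyr)
```

In the claimed method, the phase consistency algorithm would then be applied to each pyramid level before the top-down fusion.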
7. The pedestrian target detection and identification method based on monocular vision and deep learning of claim 6, further comprising: extracting features of each layer of pedestrian images in the pyramid using the phase consistency algorithm, and finally fusing the pedestrian images in the multi-scale phase consistency pyramid from top to bottom using the multi-scale fusion algorithm to obtain the phase consistency feature map of the original pedestrian image, specifically comprises the following steps:
(1) searching the phase consistency feature map of scale n to obtain the initial position (x0, y0) of a phase consistency feature point;
(2) in the phase consistency feature map of scale n−1, searching for a phase consistency feature point in the 3×3 neighborhood of (x0, y0); if such a feature point exists at position (x, y), the point at (x, y) in the fused phase consistency feature map is taken as a feature point; if not, the point at (x0, y0) is retained;
(3) at each feature point of the fused phase consistency map, searching for points connected to the feature point at scale n−1, so as to obtain detail features not contained in the scale-n fused map;
(4) searching for the next feature point in the scale-n phase consistency feature map and repeating steps (1) to (3) until the whole map has been traversed.
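The neighborhood search in steps (1)–(2) can be sketched on binary feature maps. This toy version assumes the two scales have already been registered to a common resolution (a real pyramid would also rescale coordinates between levels); `fuse_top_down` is a name introduced here for illustration:

```python
import numpy as np

def fuse_top_down(coarse_points, fine_map):
    """For each feature point (x0, y0) found at scale n, look for a feature
    point in its 3x3 neighbourhood at scale n-1; keep the match if one exists,
    otherwise retain the original point."""
    h, w = fine_map.shape
    fused = set()
    for x0, y0 in coarse_points:
        match = None
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                x, y = x0 + dx, y0 + dy
                if 0 <= x < h and 0 <= y < w and fine_map[x, y]:
                    match = (x, y)
                    break
            if match:
                break
        fused.add(match if match is not None else (x0, y0))
    return fused

fine = np.zeros((6, 6), dtype=bool)
fine[3, 3] = True                      # a finer-scale feature near (2, 2)
moved = fuse_top_down({(2, 2)}, fine)  # refined to the finer-scale position
kept = fuse_top_down({(0, 5)}, fine)   # no neighbour found, original retained
```

Step (3) would then trace the connected component around each fused point at scale n−1 to pick up detail the coarser map missed.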
8. A road traffic monitoring platform applying the pedestrian target detection and identification method based on monocular vision and deep learning of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910991615.8A CN111027372A (en) | 2019-10-10 | 2019-10-10 | Pedestrian target detection and identification method based on monocular vision and deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111027372A true CN111027372A (en) | 2020-04-17 |
Family
ID=70201148
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910991615.8A Pending CN111027372A (en) | 2019-10-10 | 2019-10-10 | Pedestrian target detection and identification method based on monocular vision and deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111027372A (en) |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111582126A (en) * | 2020-04-30 | 2020-08-25 | 浙江工商大学 | Pedestrian re-identification method based on multi-scale pedestrian contour segmentation fusion |
CN111582126B (en) * | 2020-04-30 | 2024-02-27 | 浙江工商大学 | Pedestrian re-recognition method based on multi-scale pedestrian contour segmentation fusion |
CN111709294A (en) * | 2020-05-18 | 2020-09-25 | 杭州电子科技大学 | Express delivery personnel identity identification method based on multi-feature information |
CN111709294B (en) * | 2020-05-18 | 2023-07-14 | 杭州电子科技大学 | Express delivery personnel identity recognition method based on multi-feature information |
CN111738088A (en) * | 2020-05-25 | 2020-10-02 | 西安交通大学 | Pedestrian distance prediction method based on monocular camera |
CN111709336B (en) * | 2020-06-08 | 2024-04-26 | 杭州像素元科技有限公司 | Expressway pedestrian detection method, equipment and readable storage medium |
CN111709336A (en) * | 2020-06-08 | 2020-09-25 | 杭州像素元科技有限公司 | Highway pedestrian detection method and device and readable storage medium |
CN112052886A (en) * | 2020-08-21 | 2020-12-08 | 暨南大学 | Human body action attitude intelligent estimation method and device based on convolutional neural network |
WO2022036777A1 (en) * | 2020-08-21 | 2022-02-24 | 暨南大学 | Method and device for intelligent estimation of human body movement posture based on convolutional neural network |
CN112052886B (en) * | 2020-08-21 | 2022-06-03 | 暨南大学 | Intelligent human body action posture estimation method and device based on convolutional neural network |
CN112006678A (en) * | 2020-09-10 | 2020-12-01 | 齐鲁工业大学 | Electrocardiogram abnormity identification method and system based on combination of AlexNet and transfer learning |
CN112287854A (en) * | 2020-11-02 | 2021-01-29 | 湖北大学 | Building indoor personnel detection method and system based on deep neural network |
CN112597802A (en) * | 2020-11-25 | 2021-04-02 | 中国科学院空天信息创新研究院 | Pedestrian motion simulation method based on visual perception network deep learning |
CN112733730A (en) * | 2021-01-12 | 2021-04-30 | 中国石油大学(华东) | Oil extraction operation field smoke suction personnel identification processing method and system |
CN112733730B (en) * | 2021-01-12 | 2022-11-18 | 中国石油大学(华东) | Oil extraction operation field smoke suction personnel identification processing method and system |
CN113257008A (en) * | 2021-05-12 | 2021-08-13 | 兰州交通大学 | Pedestrian flow dynamic control system and method based on deep learning |
CN117876711A (en) * | 2024-03-12 | 2024-04-12 | 金锐同创(北京)科技股份有限公司 | Image target detection method, device, equipment and medium based on image processing |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111027372A (en) | Pedestrian target detection and identification method based on monocular vision and deep learning | |
CN111797716B (en) | Single target tracking method based on Siamese network | |
CN108846358B (en) | Target tracking method for feature fusion based on twin network | |
CN107341517B (en) | Multi-scale small object detection method based on deep learning inter-level feature fusion | |
CN105869173B (en) | A kind of stereoscopic vision conspicuousness detection method | |
CN108416266B (en) | Method for rapidly identifying video behaviors by extracting moving object through optical flow | |
CN109740413A (en) | Pedestrian recognition methods, device, computer equipment and computer storage medium again | |
CN108009509A (en) | Vehicle target detection method | |
CN106709568A (en) | RGB-D image object detection and semantic segmentation method based on deep convolution network | |
CN110097575B (en) | Target tracking method based on local features and scale pool | |
CN112131908A (en) | Action identification method and device based on double-flow network, storage medium and equipment | |
CN103810473B (en) | A kind of target identification method of human object based on HMM | |
CN112288627B (en) | Recognition-oriented low-resolution face image super-resolution method | |
Geng et al. | Using deep learning in infrared images to enable human gesture recognition for autonomous vehicles | |
CN105787481B (en) | A kind of object detection method and its application based on the potential regional analysis of Objective | |
CN106529494A (en) | Human face recognition method based on multi-camera model | |
CN106530407A (en) | Three-dimensional panoramic splicing method, device and system for virtual reality | |
CN112465021B (en) | Pose track estimation method based on image frame interpolation method | |
CN114419732A (en) | HRNet human body posture identification method based on attention mechanism optimization | |
CN116596966A (en) | Segmentation and tracking method based on attention and feature fusion | |
CN104463962B (en) | Three-dimensional scene reconstruction method based on GPS information video | |
CN116977674A (en) | Image matching method, related device, storage medium and program product | |
CN109508640A (en) | A kind of crowd's sentiment analysis method, apparatus and storage medium | |
CN111488951A (en) | Countermeasure metric learning algorithm based on RGB-D image classification problem | |
CN115147644A (en) | Method, system, device and storage medium for training and describing image description model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20200417 |