CN114639115A - 3D pedestrian detection method based on fusion of human body key points and laser radar - Google Patents
- Publication number: CN114639115A
- Application number: CN202210155255.XA
- Authority: CN (China)
- Prior art keywords: pedestrian, detection, radar, network, human body
- Legal status: Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a 3D pedestrian detection method that fuses human body key points with a laser radar, comprising key point detection, 3D feature extraction, and pedestrian position prediction. The method makes full use of the pedestrian key points in the image and the depth features in the point cloud data, strengthens pedestrian target recognition through the joint use of point cloud and image information, effectively improves the precision of 3D pedestrian detection, and overcomes the lack of color information in point cloud features and the lack of three-dimensional position for image targets. The method has great significance and application value in fields such as intelligent robots, augmented reality, and automatic driving.
Description
Technical Field
The invention belongs to the technical field of 3D target detection, and relates to a 3D pedestrian detection method based on fusion of human body key points and a laser radar.
Background
The 3D pedestrian detection task is driven by application scenarios such as automatic driving, augmented reality, and intelligent robots, and is currently one of the research hotspots in computer vision. In these scenarios, humans, as the main acting subjects, are the most common detection targets. Especially in vehicle automatic driving, uncertainty often originates from pedestrians or riders. The flexibility of people in the traffic environment and their importance demand higher detection precision for pedestrians. However, problems such as small pedestrian targets, insufficient features, and background interference pose great challenges to 3D pedestrian detection.
The laser radar is an optical remote-sensing technology that acquires target information by detecting the laser light scattered by distant objects, combining traditional radar with modern laser technology. By detecting the laser scattered from the surface of a target object it obtains information, and it is widely applied in distance measurement, speed measurement, scanning, and target detection. In automatic driving, the surrounding environment is perceived mainly through a laser radar scanner in order to plan the vehicle's route and control the vehicle to reach a preset destination safely. Compared with traditional measurement technologies, laser radar data acquisition offers high measurement precision, high detection efficiency, all-weather operation, and non-contact detection.
Human body key point detection is a fundamental task in computer vision and a precursor to human action recognition, behavior analysis, human-computer interaction, and the like. Human skeleton key points are important for describing human posture and predicting human behavior, and form the basis of many computer vision tasks such as action classification, abnormal behavior detection, and automatic driving. Key point detection reaches roughly 80 percent precision in human body recognition and performs well in human behavior prediction. Therefore, studying point cloud and image based three-dimensional pedestrian detection in combination with human key points has great significance and application value in the field of automatic driving.
Existing 3D target detection methods mainly perform target recognition on point cloud data alone. Such algorithms achieve good detection precision, but because they use no image data and thus lack color information, parts of the background are easily mistaken for pedestrians. Human key point detection, for its part, has mainly been applied to human target detection and behavior prediction in 2D scenes, where the position, size, and other characteristics of pedestrians in three-dimensional space are lost.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a 3D pedestrian detection method that fuses human body key points with a laser radar, combining the high ranging precision of point clouds with the strong pedestrian recognition capability of human body key points to detect 3D pedestrian targets in three-dimensional space. The specific technical scheme of the invention is as follows:
A 3D pedestrian detection method fusing human body key points and a laser radar, wherein a fisheye camera and a laser radar are mounted at the front end of a vehicle to acquire visible light images and 3D space radar point cloud data of the areas in front of and beside the vehicle, comprises the following steps:
S1: detecting human body key points based on the visible light image: the positions of human body key points are extracted from the image, the connection relations between key points are inferred, and the detection-frame position of each pedestrian is traced back;
S2: building a radar 3D point cloud feature extraction network based on human body key points to extract features: on top of a voxel-based radar signal detection network, registration with the two-dimensional image is performed through feature dimension reduction, and the key point positions are accurately introduced according to the registration result so that the network extracts three-dimensional features around the key point positions;
S3: training and prediction of the pedestrian 3D position detection network: a neural network regresses the center position and the length, width, and height of each pedestrian in 3D space and is trained with a loss function to obtain the final pedestrian 3D detection network; the final prediction result is given after detection-frame post-processing.
Further, the specific process of step S1 is as follows:
S1-1: annotating training data: an OpenPose key point recognition algorithm is trained on the key point data and annotations of the MSCOCO data set to detect human body key points in the input images, covering 14 key points: head, neck, left and right shoulders, left and right elbows, left and right wrists, left and right hips, left and right knees, left and right ankles;
S1-2: the OpenPose key point detection algorithm is trained with the data of step S1-1 to obtain a pedestrian key point detection network; the visible light images acquired by the fisheye camera are input to the trained network to obtain the key point detections of all pedestrians, the pedestrian to which each key point belongs is determined by the Hungarian algorithm, and finally the 2D key points of every person in the image are output;
S1-3: the maximum and minimum position coordinates of each pedestrian are obtained from the key point ownership information, and a human body candidate region is generated bottom-up to give the pedestrian candidate-frame result.
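To make step S1-3 concrete, the following is a minimal Python sketch of deriving a pedestrian candidate frame from one person's 2D key points; the array layout, the padding margin, and the function name are illustrative assumptions, not part of the patented method.

```python
import numpy as np

def candidate_box_from_keypoints(keypoints, margin=0.1):
    """Derive a 2D pedestrian candidate box from one person's key points.

    keypoints: (K, 2) array of (x, y) pixel coordinates of the 14 joints
    detected for a single pedestrian (an assumed layout).
    margin: fractional padding, since joints lie inside the silhouette.
    """
    pts = np.asarray(keypoints, dtype=float)
    x_min, y_min = pts.min(axis=0)   # minimum position coordinates
    x_max, y_max = pts.max(axis=0)   # maximum position coordinates
    w, h = x_max - x_min, y_max - y_min
    # Pad slightly so the head and feet extremities are enclosed.
    return (x_min - margin * w, y_min - margin * h,
            x_max + margin * w, y_max + margin * h)
```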
Further, the radar 3D point cloud feature extraction network based on human body key points in step S2 comprises a voxel division module, a feature mapping matching module, a feature enhancement module, and a prediction module connected in series, wherein:
① the voxel division module: the 3D space radar point cloud carries three-dimensional spatial information, with width W, height H, and depth D along the X, Y, Z axes; the point cloud is divided into uniform cuboid voxels whose minimum width, height, and depth units are v_W, v_H, v_D respectively, so that the three-dimensional voxel grid generated after division has size W' = W/v_W, H' = H/v_H, D' = D/v_D; after division, a voxel cell containing more than T radar points is marked non-empty (otherwise empty), and T points are randomly sampled in each non-empty cell as that voxel's feature (a voxelization sketch follows this module list);
② the feature mapping matching module: point cloud features are extracted from the output of the voxel division module through a three-layer convolution network and split into two paths; each path uses a feature addition layer to reduce the dimensionality of the 3D space radar point cloud, the two reduction directions corresponding to a radar front view and a two-dimensional bird's-eye view respectively; the radar front view direction coincides with the visible light image direction, signal registration between the radar front view and the visible light image is performed in this direction, and the human body key points obtained in step S1 are introduced into the radar front view;
③ the feature enhancement module: based on the output of the feature mapping matching module, features are stacked and enhanced from the two-dimensional bird's-eye-view direction; the layers preceding the feature addition layers, i.e. the features of the three-layer convolution network in the feature mapping matching module, are used to build a feature pyramid; features are summed over the 3-pixel neighborhood around each local extremum in the radar front view and the two-dimensional bird's-eye view, and the sums are then concatenated with the bird's-eye-view features to obtain the enhanced features;
④ the prediction module: built on the enhanced features to predict the position of the pedestrian 3D detection frame, it comprises three fully connected layers for feature abstraction, each downsampling by 1/2, followed by two prediction branches that output the pedestrian category and the pedestrian 3D coordinates respectively.
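As a reading aid for the voxel division module ①, the following is a minimal Python sketch of dividing a point cloud into uniform voxels and sampling T points from each non-empty cell; the dictionary-based grid and the default values of v_W, v_H, v_D and T are assumptions for illustration only.

```python
import numpy as np

def voxelize(points, voxel_size=(0.2, 0.2, 0.4), T=35):
    """Divide lidar points into uniform cuboid voxels.

    points: (N, 3) array of (x, y, z) radar points.
    voxel_size: minimum units (v_W, v_H, v_D) of width, height, depth.
    T: cells with more than T points are non-empty; T points are then
       randomly sampled from each non-empty cell as its feature.
    """
    idx = np.floor(points / np.asarray(voxel_size)).astype(np.int64)
    cells = {}
    for key, pt in zip(map(tuple, idx), points):
        cells.setdefault(key, []).append(pt)   # group points by cell
    features = {}
    for key, pts in cells.items():
        if len(pts) > T:                       # non-empty voxel
            sel = np.random.choice(len(pts), T, replace=False)
            features[key] = np.asarray(pts)[sel]
    return features
```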
Further, the specific process of step S3 is as follows:
S3-1: the radar point cloud feature extraction network based on human body key point information built in step S2 is trained as a whole on a 3D pedestrian detection data set, and a focal loss function is introduced to optimize the prediction result (a code sketch of this loss follows these steps); its mathematical form is:
FL(p_t) = -(1 - p_t)^γ · log(p_t)
where y denotes the label, taking values in {+1, -1} for binary classification, and p denotes the predicted probability that the sample belongs to class 1, in the range 0 to 1; for convenience p_t is written in place of p, with p_t denoting the probability that the sample belongs to the positive class; γ is the focusing parameter, γ ≥ 0, and (1 - p_t)^γ is the modulating factor: by increasing the weight of hard-to-classify samples, training focuses the model on the small, hard targets, finally yielding a high-precision pedestrian 3D detection network;
S3-2: after the 3D pedestrian detection network is obtained by the training of S3-1, the key point image and the radar point cloud are input directly into the network during detection; the network predicts the 3D pedestrian detection frames, and the final result is output after non-maximum suppression.
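The focal loss of step S3-1 can be written out directly; below is a minimal NumPy sketch under the binary-label convention of the text, with γ = 2 as an assumed (commonly used) default.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0):
    """Focal loss FL(p_t) = -(1 - p_t)^gamma * log(p_t).

    p: predicted probability that the sample belongs to class 1.
    y: label in {+1, -1}.
    gamma: focusing parameter, gamma >= 0; larger values up-weight
           hard samples such as small pedestrian targets.
    """
    p_t = p if y == 1 else 1.0 - p   # probability of the true class
    return -((1.0 - p_t) ** gamma) * np.log(p_t)
```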
The invention has the beneficial effects that:
1. The method realizes 3D detection of pedestrian targets by combining the human body key points in the image with the depth information of the laser radar point cloud. It makes full use of the pedestrian key points in the image and the depth features in the point cloud data, strengthens pedestrian target recognition from point cloud and image information, effectively improves the precision of 3D pedestrian detection, and overcomes the lack of image color information in point cloud features and the low precision of three-dimensional target recognition from images alone;
2. The method has great significance and application value in fields such as automatic driving.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below so that the features and advantages of the present invention can be understood more clearly. The drawings are schematic and should not be construed as limiting the present invention in any way; a person skilled in the art can obtain other drawings from them without inventive effort. In the drawings:
FIG. 1 is a schematic diagram of a main flow of 3D pedestrian detection according to the present invention;
FIG. 2 is a schematic diagram of the overall design framework of the method of the present invention;
FIG. 3 is a schematic diagram of pyramid feature enhancement in the region proposal network according to the present invention.
Detailed Description
In order that the above objects, features, and advantages of the present invention may be understood more clearly, the invention is described in further detail below with reference to the accompanying drawings. It should be noted that, in the absence of conflict, the embodiments of the present invention and the features in the embodiments may be combined with each other.
Numerous specific details are set forth in the following description to facilitate a thorough understanding of the present invention; however, the invention may also be practiced in ways other than those described here, and the scope of the present invention is therefore not limited to the specific embodiments disclosed below.
Research on point cloud based 3D pedestrian detection shows that although point cloud data performs excellently in detecting large targets such as vehicles, it is not accurate enough for the 3D pedestrian detection task. The reasons are that pedestrians are small targets in the overall road scene and are easily disturbed by the background; that, because of the pedestrian's non-rigid structure, radar scanning yields less point cloud information than for vehicles, with some features missing entirely; and that the lack of image color information leaves no reference when recognizing pedestrians, so the predicted detection result cannot be further calibrated and the detection accuracy for pedestrian targets remains low.
The invention provides a method for 3D pedestrian target detection that fuses human key points with the laser radar point cloud: the 3D features of pedestrians, carrying depth information, are recognized from the point cloud data, and the pedestrians recognized from the point cloud are calibrated against the pedestrian targets recognized through the human key points in the image, realizing 3D pedestrian target detection.
FIG. 1 is a schematic diagram of the 3D pedestrian detection process. The overall idea is as follows: a 3D target detection algorithm on the radar point cloud recognizes the depth information and makes a preliminary category prediction of pedestrians, while a human key point detection scheme recognizes the key point features of the human body from the image information and detects the pedestrian target through the connectivity of the limbs; the 3D pedestrian detection result predicted from the radar point cloud is then calibrated with the pedestrian information predicted from the human body key points, and the final prediction result is output. The overall algorithm is trained as a model, tested and verified, and the detection results are analyzed.
Specifically, as shown in FIGS. 1-2, a 3D pedestrian detection method fusing human body key points and a laser radar, wherein a fisheye camera and a laser radar are mounted at the front end of a vehicle to acquire visible light images and 3D space radar point cloud data of the areas in front of and beside the vehicle, comprises the following steps:
S1: detecting human body key points based on the visible light image: the positions of human body key points are extracted from the image, the connection relations between key points are inferred, and the detection-frame position of each pedestrian is traced back; the specific process is as follows:
S1-1: annotating training data: an OpenPose key point recognition algorithm is trained on the key point data and annotations of the MSCOCO data set to detect human body key points in the input images, covering 14 key points: head, neck, left and right shoulders, left and right elbows, left and right wrists, left and right hips, left and right knees, left and right ankles;
S1-2: the OpenPose key point detection algorithm is trained with the data of step S1-1 to obtain a pedestrian key point detection network; the visible light images acquired by the fisheye camera are input to the trained network to obtain the key point detections of all pedestrians, the pedestrian to which each key point belongs is determined by the Hungarian algorithm (see the matching sketch after these steps), and finally the 2D key points of every person in the image are output;
S1-3: the maximum and minimum position coordinates of each pedestrian are obtained from the key point ownership information, and a human body candidate region is generated bottom-up to give the pedestrian candidate-frame result.
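The Hungarian assignment of step S1-2 can be illustrated with SciPy's linear_sum_assignment; in OpenPose the score matrix comes from part-affinity-field integrals, which are outside this sketch, so the matrix below is a stand-in assumption.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_joints(score):
    """Assign candidate joints of one body part to pedestrians.

    score: (num_candidates, num_pedestrians) connection-quality matrix.
    The Hungarian algorithm maximizes the total connection score.
    """
    rows, cols = linear_sum_assignment(-score)  # negate to maximize
    return list(zip(rows, cols))                # (joint, pedestrian)

# Example: three candidate left wrists scored against two pedestrians.
s = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.4, 0.3]])
print(match_joints(s))  # [(0, 0), (1, 1)]; wrist 2 stays unmatched
```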
S2: building a radar 3D point cloud feature extraction network based on human body key points to extract features: on top of a voxel-based radar signal detection network, registration with the two-dimensional image is performed through feature dimension reduction, and the key point positions are accurately introduced according to the registration result so that the network extracts three-dimensional features around the key point positions;
S3: training and prediction of the pedestrian 3D position detection network: a neural network regresses the center position and the length, width, and height of each pedestrian in 3D space and is trained with a loss function to obtain the final pedestrian 3D detection network; the final prediction result is given after detection-frame post-processing.
In some embodiments, the radar 3D point cloud feature extraction network based on human body key points in step S2 comprises a voxel division module, a feature mapping matching module, a feature enhancement module, and a prediction module connected in series, wherein:
① the voxel division module: the 3D space radar point cloud carries three-dimensional spatial information, with width W, height H, and depth D along the X, Y, Z axes; the point cloud is divided into uniform cuboid voxels whose minimum width, height, and depth units are v_W, v_H, v_D respectively, so that the three-dimensional voxel grid generated after division has size W' = W/v_W, H' = H/v_H, D' = D/v_D; after division, a voxel cell containing more than T radar points is marked non-empty (otherwise empty), and T points are randomly sampled in each non-empty cell as that voxel's feature;
② the feature mapping matching module: point cloud features are extracted from the output of the voxel division module through a three-layer convolution network and split into two paths; each path uses a feature addition layer to reduce the dimensionality of the 3D space radar point cloud, the two reduction directions corresponding to a radar front view and a two-dimensional bird's-eye view respectively; the radar front view direction coincides with the visible light image direction, signal registration between the radar front view and the visible light image is performed in this direction, and the human body key points obtained in step S1 are introduced into the radar front view.
Taking a KITTI data acquisition vehicle as an example, a spatial point m in the laser radar coordinate system is converted into a point n in the camera coordinate system by
n = P_rect^(i) · R_rect^(0) · T_velo→cam · m
where R_rect^(0) denotes the corrected (rectifying) camera rotation matrix that brings the images into one plane; for the actual calculation it is expanded to a 4 × 4 matrix by appending a fourth row and column whose diagonal element is 1. T_velo→cam = [R t; 0 1] is the transformation matrix from the laser radar to the No. 0 gray-scale camera coordinate system, where R denotes the rotation matrix and t the translation vector. P_rect^(i) denotes the corrected camera projection matrix, expressed as
P_rect^(i) = [f_u 0 c_u -f_u·b_x^(i); 0 f_v c_v 0; 0 0 1 0]
where b_x^(i) is the offset of the i-th camera from the No. 0 gray-scale camera along the X axis (i = 2 when projecting points from the laser radar point cloud coordinate system onto the left color image), f_u and f_v are the focal lengths of the camera, and c_u and c_v are the offsets of the principal point (see the projection sketch after this module list).
③ the feature enhancement module: based on the output of the feature mapping matching module, features are stacked and enhanced from the two-dimensional bird's-eye-view direction; the layers preceding the feature addition layers, i.e. the features of the three-layer convolution network in the feature mapping matching module, are used to build a feature pyramid; features are summed over the 3-pixel neighborhood around each local extremum in the radar front view and the two-dimensional bird's-eye view, and the sums are then concatenated with the bird's-eye-view features to obtain the enhanced features;
④ the prediction module: built on the enhanced features to predict the position of the pedestrian 3D detection frame, it comprises three fully connected layers for feature abstraction, each downsampling by 1/2, followed by two prediction branches that output the pedestrian category and the pedestrian 3D coordinates respectively.
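Putting the calibration matrices of the feature mapping matching module together, the following is a minimal Python sketch of projecting lidar points into the left color image under the KITTI convention described above; the matrix names mirror the text, and the function itself is illustrative rather than the patented implementation.

```python
import numpy as np

def project_lidar_to_image(pts, Tr_velo_to_cam, R0_rect, P_rect_2):
    """Project lidar points m into camera-2 pixels (KITTI, i = 2).

    pts: (N, 3) points in the laser radar coordinate system.
    Tr_velo_to_cam: (4, 4) [R t; 0 1], lidar -> No. 0 gray-scale camera.
    R0_rect: (4, 4) rectifying rotation, expanded with a trailing 1.
    P_rect_2: (3, 4) corrected projection matrix of the left color camera.
    """
    hom = np.hstack([pts, np.ones((len(pts), 1))])         # homogeneous m
    cam = (P_rect_2 @ R0_rect @ Tr_velo_to_cam @ hom.T).T  # n = P·R·T·m
    uv = cam[:, :2] / cam[:, 2:3]                          # perspective divide
    return uv, cam[:, 2]                                   # pixels, depth
```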
In some embodiments, the specific process of step S3 is:
S3-1: the radar point cloud feature extraction network based on human body key point information built in step S2 is trained as a whole on a 3D pedestrian detection data set, and a focal loss function is introduced to optimize the prediction result; its mathematical form is:
FL(p_t) = -(1 - p_t)^γ · log(p_t)
where y denotes the label, taking values in {+1, -1} for binary classification, and p denotes the predicted probability that the sample belongs to class 1, in the range 0 to 1; for convenience p_t is written in place of p, with p_t denoting the probability that the sample belongs to the positive class; γ is the focusing parameter, γ ≥ 0, and (1 - p_t)^γ is the modulating factor: by increasing the weight of hard-to-classify samples, training focuses the model on the small, hard targets, finally yielding a high-precision pedestrian 3D detection network;
S3-2: after the 3D pedestrian detection network is obtained by the training of S3-1, the key point image and the radar point cloud are input directly into the network during detection; the network predicts the 3D pedestrian detection frames, and the final result is output after non-maximum suppression.
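Finally, the non-maximum suppression of step S3-2 can be sketched as greedy suppression over the bird's-eye-view footprints of the predicted 3D frames; axis-aligned boxes and the 0.5 IoU threshold are simplifying assumptions for illustration.

```python
import numpy as np

def nms(boxes, scores, iou_thr=0.5):
    """Greedy non-maximum suppression over axis-aligned 2D boxes.

    boxes: (N, 4) as (x1, y1, x2, y2); scores: (N,) confidences.
    Returns indices of the detections to keep, highest score first.
    """
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Overlap of the current top box with all remaining boxes.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = ((boxes[rest, 2] - boxes[rest, 0]) *
                  (boxes[rest, 3] - boxes[rest, 1]))
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thr]   # drop boxes overlapping too much
    return keep
```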
In summary, the invention provides a 3D pedestrian detection method fusing human body key points and a laser radar that can be applied in many fields such as intelligent robots, augmented reality, and automatic driving. In automatic driving, for example, the optical images acquired by the camera and the point clouds scanned by the laser radar are combined by the method of the invention into a multi-sensor-fusion mode of 3D pedestrian detection.
In the present invention, the terms "first", "second", "third" and "fourth" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The term "plurality" means two or more unless expressly limited otherwise.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (4)
1. A 3D pedestrian detection method fusing human body key points and a laser radar, wherein a fisheye camera and a laser radar are mounted at the front end of a vehicle to acquire visible light images and 3D space radar point cloud data of the areas in front of and beside the vehicle, the method comprising the following steps:
S1: detecting human body key points based on the visible light image: the positions of human body key points are extracted from the image, the connection relations between key points are inferred, and the detection-frame position of each pedestrian is traced back;
S2: building a radar 3D point cloud feature extraction network based on human body key points to extract features: on top of a voxel-based radar signal detection network, registration with the two-dimensional image is performed through feature dimension reduction, and the key point positions are accurately introduced according to the registration result so that the network extracts three-dimensional features around the key point positions;
S3: training and prediction of the pedestrian 3D position detection network: a neural network regresses the center position and the length, width, and height of each pedestrian in 3D space and is trained with a loss function to obtain the final pedestrian 3D detection network; the final prediction result is given after detection-frame post-processing.
2. The 3D pedestrian detection method fusing human body key points and a laser radar according to claim 1, wherein the specific process of step S1 is as follows:
S1-1: annotating training data: an OpenPose key point recognition algorithm is trained on the key point data and annotations of the MSCOCO data set to detect human body key points in the input images, covering 14 key points: head, neck, left and right shoulders, left and right elbows, left and right wrists, left and right hips, left and right knees, left and right ankles;
S1-2: the OpenPose key point detection algorithm is trained with the data of step S1-1 to obtain a pedestrian key point detection network; the visible light images acquired by the fisheye camera are input to the trained network to obtain the key point detections of all pedestrians, the pedestrian to which each key point belongs is determined by the Hungarian algorithm, and finally the 2D key points of every person in the image are output;
S1-3: the maximum and minimum position coordinates of each pedestrian are obtained from the key point ownership information, and a human body candidate region is generated bottom-up to give the pedestrian candidate-frame result.
3. The 3D pedestrian detection method fusing human body key points and a laser radar according to claim 1 or 2, wherein the radar 3D point cloud feature extraction network based on human body key points in step S2 comprises a voxel division module, a feature mapping matching module, a feature enhancement module, and a prediction module connected in series, wherein:
① the voxel division module: the 3D space radar point cloud carries three-dimensional spatial information, with width W, height H, and depth D along the X, Y, Z axes; the point cloud is divided into uniform cuboid voxels whose minimum width, height, and depth units are v_W, v_H, v_D respectively, so that the three-dimensional voxel grid generated after division has size W' = W/v_W, H' = H/v_H, D' = D/v_D; after division, a voxel cell containing more than T radar points is marked non-empty (otherwise empty), and T points are randomly sampled in each non-empty cell as that voxel's feature;
② the feature mapping matching module: point cloud features are extracted from the output of the voxel division module through a three-layer convolution network and split into two paths; each path uses a feature addition layer to reduce the dimensionality of the 3D space radar point cloud, the two reduction directions corresponding to a radar front view and a two-dimensional bird's-eye view respectively; the radar front view direction coincides with the visible light image direction, signal registration between the radar front view and the visible light image is performed in this direction, and the human body key points obtained in step S1 are introduced into the radar front view;
③ the feature enhancement module: based on the output of the feature mapping matching module, features are stacked and enhanced from the two-dimensional bird's-eye-view direction; the layers preceding the feature addition layers, i.e. the features of the three-layer convolution network in the feature mapping matching module, are used to build a feature pyramid; features are summed over the 3-pixel neighborhood around each local extremum in the radar front view and the two-dimensional bird's-eye view, and the sums are then concatenated with the bird's-eye-view features to obtain the enhanced features;
④ the prediction module: built on the enhanced features to predict the position of the pedestrian 3D detection frame, it comprises three fully connected layers for feature abstraction, each downsampling by 1/2, followed by two prediction branches that output the pedestrian category and the pedestrian 3D coordinates respectively.
4. The 3D pedestrian detection method fusing human body key points and a laser radar according to any one of claims 1 to 3, wherein the specific process of step S3 is as follows:
S3-1: the radar point cloud feature extraction network based on human body key point information built in step S2 is trained as a whole on a 3D pedestrian detection data set, and a focal loss function is introduced to optimize the prediction result; its mathematical form is:
FL(p_t) = -(1 - p_t)^γ · log(p_t)
where y denotes the label, taking values in {+1, -1} for binary classification, and p denotes the predicted probability that the sample belongs to class 1, in the range 0 to 1; for convenience p_t is written in place of p, with p_t denoting the probability that the sample belongs to the positive class; γ is the focusing parameter, γ ≥ 0, and (1 - p_t)^γ is the modulating factor: by increasing the weight of hard-to-classify samples, training focuses the model on the small, hard targets, finally yielding a high-precision pedestrian 3D detection network;
S3-2: after the 3D pedestrian detection network is obtained by the training of S3-1, the key point image and the radar point cloud are input directly into the network during detection; the network predicts the 3D pedestrian detection frames, and the final result is output after non-maximum suppression.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202210155255.XA (granted as CN114639115B) | 2022-02-21 | 2022-02-21 | Human body key point and laser radar fused 3D pedestrian detection method
Publications (2)

Publication Number | Publication Date
---|---
CN114639115A | 2022-06-17
CN114639115B | 2024-07-05
Family

ID=81946596

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202210155255.XA (active, granted as CN114639115B) | Human body key point and laser radar fused 3D pedestrian detection method | 2022-02-21 | 2022-02-21

Country Status (1)

Country | Link
---|---
CN | CN114639115B (en)
2022-02-21: application CN202210155255.XA filed in China; patent granted as CN114639115B (active).
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111127667A (en) * | 2019-11-19 | 2020-05-08 | 西北大学 | Point cloud initial registration method based on region curvature binary descriptor |
CN111243093A (en) * | 2020-01-07 | 2020-06-05 | 腾讯科技(深圳)有限公司 | Three-dimensional face grid generation method, device, equipment and storage medium |
US20210365697A1 (en) * | 2020-05-20 | 2021-11-25 | Toyota Research Institute, Inc. | System and method for generating feature space data |
CN111898405A (en) * | 2020-06-03 | 2020-11-06 | 东南大学 | Three-dimensional human ear recognition method based on 3DHarris key points and optimized SHOT characteristics |
CN113313822A (en) * | 2021-06-30 | 2021-08-27 | 深圳市豪恩声学股份有限公司 | 3D human ear model construction method, system, device and medium |
CN113807366A (en) * | 2021-09-16 | 2021-12-17 | 电子科技大学 | Point cloud key point extraction method based on deep learning |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20230219578A1 (en) * | 2022-01-07 | 2023-07-13 | Ford Global Technologies, Llc | Vehicle occupant classification using radar point cloud |
US12017657B2 (en) * | 2022-01-07 | 2024-06-25 | Ford Global Technologies, Llc | Vehicle occupant classification using radar point cloud |
CN114881906A (en) * | 2022-06-24 | 2022-08-09 | 福建省海峡智汇科技有限公司 | Method and system for fusing laser point cloud and visible light image |
CN114862957A (en) * | 2022-07-08 | 2022-08-05 | 西南交通大学 | Subway car bottom positioning method based on 3D laser radar |
Similar Documents
Publication | Title
---|---
CN110415342B | Three-dimensional point cloud reconstruction device and method based on multi-fusion sensor
CN113111887B | Semantic segmentation method and system based on information fusion of camera and laser radar
CN110443898A | AR intelligent terminal target identification system and method based on deep learning
CN110852182B | Depth video human body behavior recognition method based on three-dimensional space time sequence modeling
CN114639115B | Human body key point and laser radar fused 3D pedestrian detection method
CN114359181B | Intelligent traffic target fusion detection method and system based on image and point cloud
CN114114312A | Three-dimensional target detection method based on fusion of multi-focal-length camera and laser radar
CN114494248B | Three-dimensional target detection system and method based on point cloud and images under different visual angles
CN113688738B | Target identification system and method based on laser radar point cloud data
CN113298781B | Mars surface three-dimensional terrain detection method based on image and point cloud fusion
TWI745204B | High-efficiency LiDAR object detection method based on deep learning
Ouyang et al. | A CGANs-based scene reconstruction model using lidar point cloud
CN112270694B | Method for detecting urban environment dynamic target based on laser radar scanning pattern
CN116486287A | Target detection method and system based on environment self-adaptive robot vision system
CN114966696A | Transformer-based cross-modal fusion target detection method
Alidoost et al. | Y-shaped convolutional neural network for 3D roof elements extraction to reconstruct building models from a single aerial image
Priya et al. | 3DYOLO: Real-time 3D object detection in 3D point clouds for autonomous driving
CN118429524A | Binocular stereoscopic vision-based vehicle running environment modeling method and system
CN118038226A | Road safety monitoring method based on LiDAR and thermal infrared visible light information fusion
CN112233079B | Method and system for fusing images of multiple sensors
CN117372697A | Point cloud segmentation method and system for single-mode sparse orbit scene
CN116386003A | Three-dimensional target detection method based on knowledge distillation
CN113836975A | Binocular vision unmanned aerial vehicle obstacle avoidance method based on YOLOV3
Yang et al. | Analysis of Model Optimization Strategies for a Low-Resolution Camera-Lidar Fusion Based Road Detection Network
Nagiub et al. | 3D Object Detection for Autonomous Driving: A Comprehensive Review
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant