CN114639115A - 3D pedestrian detection method based on fusion of human body key points and laser radar - Google Patents

3D pedestrian detection method based on fusion of human body key points and laser radar

Info

Publication number
CN114639115A
CN114639115A
Authority
CN
China
Prior art keywords
pedestrian
detection
radar
network
human body
Prior art date
Legal status
Granted
Application number
CN202210155255.XA
Other languages
Chinese (zh)
Other versions
CN114639115B (en)
Inventor
程景春
杨生
张春熹
金靖
戴敏鹏
高爽
Current Assignee
Beihang University
Original Assignee
Beihang University
Priority date
Filing date
Publication date
Application filed by Beihang University
Priority to CN202210155255.XA
Publication of CN114639115A
Application granted
Publication of CN114639115B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Processing (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a 3D pedestrian detection method that fuses human body key points with laser radar data, comprising key point detection, 3D feature extraction and pedestrian position prediction. The method makes full use of the pedestrian key points in the image and the depth features in the point cloud data, strengthens the contribution of both point cloud and image information to pedestrian target recognition, and effectively improves the precision of 3D pedestrian detection, overcoming the lack of image color information in point cloud features and the lack of three-dimensional position information for image targets. The method has very important significance and application value in fields such as intelligent robots, augmented reality and automatic driving.

Description

3D pedestrian detection method based on fusion of human body key points and laser radar
Technical Field
The invention belongs to the technical field of 3D target detection, and relates to a 3D pedestrian detection method based on fusion of human body key points and a laser radar.
Background
The 3D pedestrian detection task arises in application scenarios such as automatic driving, augmented reality and intelligent robots, and is currently one of the research hotspots in the field of computer vision. In these scenarios, humans, as the main acting subjects, are the most common detection target. Especially in vehicle automatic driving, uncertainty often originates from pedestrians or riders. Because people move flexibly in the traffic environment and occupy an important position in it, pedestrians require higher detection precision; however, problems such as the small size of pedestrian targets, insufficient features and background interference pose great challenges to 3D pedestrian detection.
The laser radar is an optical remote sensing technology that acquires target information by detecting the laser light scattered by distant objects, and is a product that combines traditional radar with modern laser technology. It obtains information by detecting the laser scattered from the surface of a target object and is widely used in fields such as ranging, speed measurement, scanning and target detection. In automatic driving, the surrounding spatial environment is perceived mainly through a laser radar scanner in order to plan the vehicle route and control the vehicle so that it safely reaches a preset destination. Compared with traditional measurement technology, laser radar data acquisition has the advantages of high measurement precision, high detection efficiency, all-weather operation and non-contact detection.
Human body key point detection is a basic task in computer vision and a prerequisite for human action recognition, behavior analysis, human-computer interaction and the like. Human skeleton key points are important for describing human posture and predicting human behavior, and are the basis of many computer vision tasks such as action classification, abnormal behavior detection and automatic driving. Human body key point detection reaches an accuracy of about 80% in human recognition and performs well in human behavior prediction. Therefore, studying three-dimensional pedestrian detection based on point clouds and images in combination with human key points has very important significance and application value in the field of automatic driving.
Existing 3D target detection methods mainly perform target recognition on point cloud data. Such algorithms achieve good detection precision, but because no image data is used and color information is missing, some background features are easily misdetected as pedestrians. Human key point detection, in turn, is mainly used for human target detection and behavior prediction in 2D scenes, and loses characteristics such as the position and size of pedestrians in three-dimensional space.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a 3D pedestrian detection method fusing human body key points with a laser radar, which combines the high ranging precision of the point cloud with the strong pedestrian recognition capability of human body key points to detect 3D pedestrian targets in three-dimensional space. The specific technical scheme of the invention is as follows:
A 3D pedestrian detection method fusing human body key points and a laser radar, wherein a fisheye camera and a laser radar are mounted at the front end of a vehicle to acquire visible light images and 3D space radar point cloud data of the areas in front of and beside the vehicle, comprising the following steps:
S1: detecting human body key points based on the visible light image; extracting the positions of the human body key points from the image, inferring the connection relations between the key points, and tracing back to the detection frame position of each pedestrian;
S2: building a radar 3D point cloud feature extraction network based on human body key points to extract features; starting from a voxel-based radar signal detection network, the dimension-reduced features are registered with the two-dimensional image, and the key point positions are introduced accurately according to the registration result so that the network can extract three-dimensional features around the key point positions;
S3: training and prediction of the pedestrian 3D position detection network; the center point position and the length, width and height of each pedestrian in 3D space are predicted by regression through a neural network, the network is trained with a loss function to obtain the final pedestrian 3D detection network, and the final prediction result is given after detection frame post-processing.
Further, the specific process of step S1 is as follows:
s1-1: marking training data; the OpenPose key point recognition algorithm is trained with the key point data and annotation information of the MSCOCO data set, and human body key point detection is carried out on the input image information, covering 14 key points: head, neck, left and right shoulders, left and right elbows, left and right wrists, left and right hips, left and right knees, and left and right ankles;
s1-2: the OpenPose key point detection algorithm and the training data of step S1-1 are used for training to obtain a pedestrian key point detection network; the visible light images acquired by the fisheye camera are input into the trained network to obtain the key point detection results of all pedestrians, the pedestrian to which each key point belongs is determined by the Hungarian algorithm, and finally the 2D key points of every person in the image are given;
s1-3: the maximum and minimum position coordinates of each pedestrian are obtained according to the pedestrian to which the key points belong, and a human body candidate region is generated from bottom to top to obtain the pedestrian candidate frame result.
Further, the radar 3D point cloud feature extraction network based on human body key points in step S2 includes a voxel division module, a feature mapping matching module, a feature enhancement module and a prediction module connected in series, wherein,
the voxel division module: the 3D space radar point cloud contains three-dimensional spatial information with width W, height H and depth D along the X, Y and Z axes; the point cloud is divided into cuboid voxels of uniform size, the minimum units used for the width, height and depth of the voxel division being v_W, v_H and v_D respectively, so that the three-dimensional voxel grid generated after division has size W' = W/v_W, H' = H/v_H, D' = D/v_D; after division, a voxel grid cell that contains more than T radar points is a non-empty voxel, otherwise it is an empty voxel, and T points are randomly sampled in each non-empty voxel grid cell as the features of that voxel;
the feature mapping matching module: point cloud features are extracted from the output of the voxel division module through a three-layer convolution network and split into two paths; a feature addition layer is used on each path to reduce the dimensionality of the 3D space radar point cloud features, the reduction directions corresponding to a radar front view and a two-dimensional bird's-eye view respectively; the radar front view direction is consistent with the visible light image direction, signal registration is performed in this direction between the radar front view and the visible light image, and the human body key points obtained in step S1 are introduced into the radar front view;
the feature enhancement module: based on the processing result of the feature mapping matching module, features are stacked and enhanced from the direction of the two-dimensional bird's-eye view; the features of the layers before the feature addition layer, namely the features of the three-layer convolution network in the feature mapping matching module, are used to build a feature pyramid, feature summation is carried out over the 3-pixel regions around the points where the local extrema of the radar front view and the two-dimensional bird's-eye view are located, and the result is then concatenated with the bird's-eye view features to obtain the enhanced features;
the prediction module: based on the enhanced features, a prediction module is built to predict the position of the pedestrian 3D detection frame; it comprises three fully connected layers for feature abstraction and two prediction branches, each fully connected layer downsampling by 1/2, the two prediction branches predicting the pedestrian category and the pedestrian 3D coordinates respectively.
Further, the specific process of step S3 is as follows:
s3-1: the radar point cloud feature extraction network based on human body key point information built in step S2 is trained as a whole on a 3D pedestrian detection data set, and a focal loss function is introduced to optimize the prediction result; its mathematical expression is:
FL(p_t) = -(1 - p_t)^γ · log(p_t)
p_t = p, if y = +1; p_t = 1 - p, if y = -1
where y denotes the label, which takes values in {+1, -1} since the classification is binary; p denotes the probability that the predicted sample belongs to class 1 and ranges from 0 to 1; for convenience of presentation, p_t is used in place of p, where p_t represents the probability that the sample belongs to the positive class; γ is the focusing parameter with γ ≥ 0, and (1 - p_t)^γ is the modulating factor; by increasing the weight of samples that are difficult to classify, the model pays attention to small, hard-to-classify targets during training, and a high-precision pedestrian 3D detection network is finally obtained;
s3-2: the 3D pedestrian detection network is obtained by the training of step S3-1; during detection, the key point image and the radar point cloud are input directly into the network, the network predicts the 3D pedestrian detection frames, and the final result is output after non-maximum suppression.
The invention has the beneficial effects that:
1. the method disclosed by the invention realizes 3D detection of pedestrian targets by combining the human body key points in the image with the depth information of the laser radar point cloud; it makes full use of the pedestrian key points in the image and the depth features in the point cloud data, strengthens the contribution of both point cloud and image information to pedestrian target recognition, effectively improves the precision of 3D pedestrian detection, and overcomes the lack of image color information in point cloud features and the low precision of three-dimensional target recognition from images alone;
2. the method has very important significance and application value in the fields of automatic driving and the like.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below, so that the features and advantages of the present invention can be understood more clearly. The drawings are schematic and should not be construed as limiting the present invention in any way; for a person skilled in the art, other drawings can be obtained from these drawings without any inventive effort. In the drawings:
FIG. 1 is a schematic diagram of a main flow of 3D pedestrian detection according to the present invention;
FIG. 2 is a schematic diagram of the overall design framework of the method of the present invention;
fig. 3 is a schematic diagram of pyramid feature enhancement in the area generation network according to the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments of the present invention and features of the embodiments may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
Research on point-cloud-based 3D pedestrian detection has found that, although point cloud data performs excellently when detecting large targets such as vehicles, it is not accurate enough for the 3D pedestrian detection task. The reasons are as follows: pedestrians are small targets in the whole road scene and are easily disturbed by the background; because of the non-rigid structure of a pedestrian, the point cloud information obtained by radar scanning is sparser than that of a vehicle, and some features are even missing; and because the color information of the image is lacking, there is no reference when identifying pedestrians, so the predicted detection result cannot be further calibrated and the detection accuracy for pedestrian targets is not high.
The invention provides a method for detecting 3D pedestrian targets by fusing human body key points with the laser radar point cloud: the 3D features of pedestrians, with depth information, are identified from the point cloud data, and the pedestrians identified from the point cloud are calibrated against the pedestrian targets identified from the human body key points in the image, thereby realizing 3D pedestrian target detection.
Fig. 1 is a schematic diagram of the 3D pedestrian detection process. The overall idea is as follows: a 3D target detection algorithm on the radar point cloud identifies the depth information of pedestrians and gives a preliminary category prediction; a human body key point detection scheme identifies the key point features of the human body from the image information and further detects pedestrian targets through the connectivity of the limbs; the 3D pedestrian detection result predicted from the radar point cloud is then calibrated with the pedestrian information predicted from the human body key points, and the final prediction result is output. The overall algorithm is trained, tested and verified, and the detection results are analyzed.
Specifically, as shown in figs. 1-2, a 3D pedestrian detection method fusing human body key points with a laser radar, wherein a fisheye camera and a laser radar are mounted at the front end of a vehicle to acquire visible light images and 3D space radar point cloud data of the areas in front of and beside the vehicle, comprises the following steps:
S1: detecting human body key points based on the visible light image; extracting the positions of the human body key points from the image, inferring the connection relations between the key points, and tracing back to the detection frame position of each pedestrian; the specific process is as follows:
s1-1: marking training data; the OpenPose key point recognition algorithm is trained with the key point data and annotation information of the MSCOCO data set, and human body key point detection is carried out on the input image information, covering 14 key points: head, neck, left and right shoulders, left and right elbows, left and right wrists, left and right hips, left and right knees, and left and right ankles;
s1-2: the OpenPose key point detection algorithm and the training data of step S1-1 are used for training to obtain a pedestrian key point detection network; the visible light images acquired by the fisheye camera are input into the trained network to obtain the key point detection results of all pedestrians, the pedestrian to which each key point belongs is determined by the Hungarian algorithm, and finally the 2D key points of every person in the image are given;
s1-3: the maximum and minimum position coordinates of each pedestrian are obtained according to the pedestrian to which the key points belong, and a human body candidate region is generated from bottom to top to obtain the pedestrian candidate frame result, as sketched below.
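A minimal sketch of step S1-3 is given below. It assumes the key point detector has already grouped the 2D key points by person (the output of steps S1-1 and S1-2); the per-person minimum and maximum coordinates, padded by a small margin, give the bottom-up candidate frame. The function name, array layout and margin value are illustrative assumptions, not part of the patent.

```python
import numpy as np

def candidate_boxes_from_keypoints(people_keypoints, margin=0.1):
    """For each pedestrian, derive a 2D candidate box from its detected key points.

    people_keypoints: list of (K, 3) arrays, one per person, holding (x, y, confidence)
                      for up to 14 key points; undetected points have confidence 0.
    margin:           fractional padding around the min/max extent (assumed value).
    Returns a list of [x_min, y_min, x_max, y_max] candidate frames.
    """
    boxes = []
    for kps in people_keypoints:
        valid = kps[kps[:, 2] > 0]          # keep only detected key points
        if valid.shape[0] < 2:              # too few points to span a box
            continue
        x_min, y_min = valid[:, :2].min(axis=0)
        x_max, y_max = valid[:, :2].max(axis=0)
        pad_x = margin * (x_max - x_min)    # pad so the box covers the whole body,
        pad_y = margin * (y_max - y_min)    # not only the key-point extremities
        boxes.append([x_min - pad_x, y_min - pad_y, x_max + pad_x, y_max + pad_y])
    return boxes
```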
S2: building a radar 3D point cloud feature extraction network based on human body key points to extract features; starting from a voxel-based radar signal detection network, the dimension-reduced features are registered with the two-dimensional image, and the key point positions are introduced accurately according to the registration result so that the network can extract three-dimensional features around the key point positions;
S3: training and prediction of the pedestrian 3D position detection network; the center point position and the length, width and height of each pedestrian in 3D space are predicted by regression through a neural network, the network is trained with a loss function to obtain the final pedestrian 3D detection network, and the final prediction result is given after detection frame post-processing.
In some embodiments, the human body key point-based radar 3D point cloud feature extraction network in step S2 includes a voxel division module, a feature mapping matching module, a feature enhancement module, and a prediction module connected in series, wherein,
the voxel division module: the 3D space radar point cloud contains three-dimensional spatial information with width W, height H and depth D along the X, Y and Z axes; the point cloud is divided into cuboid voxels of uniform size, the minimum units used for the width, height and depth of the voxel division being v_W, v_H and v_D respectively, so that the three-dimensional voxel grid generated after division has size W' = W/v_W, H' = H/v_H, D' = D/v_D; after division, a voxel grid cell that contains more than T radar points is a non-empty voxel, otherwise it is an empty voxel, and T points are randomly sampled in each non-empty voxel grid cell as the features of that voxel;
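The voxel division described above can be sketched as follows. The grid extent, the minimum units v_W, v_H, v_D and the threshold T are placeholder values of this sketch; only the rule that a voxel is non-empty when it holds more than T points, and that T points are then randomly sampled per non-empty voxel, follows the description in this embodiment.

```python
import numpy as np

def voxelize(points, extent, voxel_size, T=35):
    """Divide a radar point cloud into uniform voxels and sample T points per non-empty voxel.

    points:     (N, 3) array of X, Y, Z coordinates inside the given extent.
    extent:     ((x_min, x_max), (y_min, y_max), (z_min, z_max)) covering W, H, D.
    voxel_size: (v_W, v_H, v_D) minimum units of width, height and depth.
    T:          a voxel is non-empty if it holds more than T points (assumed value).
    Returns a dict mapping voxel grid index -> (T, 3) array of sampled points.
    """
    mins = np.array([e[0] for e in extent])
    idx = np.floor((points - mins) / np.array(voxel_size)).astype(int)  # voxel index per point
    voxels = {}
    for key in np.unique(idx, axis=0):
        mask = np.all(idx == key, axis=1)
        pts = points[mask]
        if pts.shape[0] > T:                       # non-empty voxel: more than T radar points
            sample = pts[np.random.choice(pts.shape[0], T, replace=False)]
            voxels[tuple(key)] = sample            # T randomly sampled points as the voxel feature
    return voxels
```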
the feature mapping matching module: point cloud features are extracted from the output of the voxel division module through a three-layer convolution network and split into two paths; a feature addition layer is used on each path to reduce the dimensionality of the 3D space radar point cloud features, the reduction directions corresponding to a radar front view and a two-dimensional bird's-eye view respectively; the radar front view direction is consistent with the visible light image direction, signal registration is performed in this direction between the radar front view and the visible light image, and the human body key points obtained in step S1 are introduced into the radar front view;
Taking the KITTI data acquisition vehicle as an example, a spatial point m in the laser radar coordinate system is converted into a point n in the camera coordinate system according to the following relationship:

n = P_rect^(i) · R_rect^(0) · T_velo^cam · m

R_rect^(0) denotes the rectifying rotation matrix of the reference camera, which brings the images into one plane; in the actual calculation it is expanded to a 4 x 4 matrix, with 1 as the fourth diagonal element and zeros in the remaining entries of the added row and column.

T_velo^cam denotes the transformation matrix from the laser radar to the camera, expressed as follows:

T_velo^cam = [ R_velo^cam  t_velo^cam ]
             [     0            1     ]

where R_velo^cam denotes the rotation matrix and t_velo^cam denotes the translation vector of the transformation from the laser radar to the coordinate system of the No. 0 gray-scale camera.

P_rect^(i) denotes the rectified camera projection matrix, expressed as follows:

P_rect^(i) = [ f_u^(i)   0         c_u^(i)   -f_u^(i)·b_x^(i) ]
             [ 0         f_v^(i)   c_v^(i)    0               ]
             [ 0         0         1          0               ]

where b_x^(i) is the offset of the i-th camera from the No. 0 gray-scale camera in the X-axis direction; when a point in the laser radar point cloud coordinate system is projected onto the left color image, i takes the value 2. f_u^(i) and f_v^(i) are the focal lengths of the camera, and c_u^(i) and c_v^(i) are the offsets of the principal point.
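A minimal numpy sketch of this projection chain is shown below, assuming the calibration matrices P_rect^(2), R_rect^(0) and T_velo^cam have already been loaded (for example from a KITTI calibration file); the function name and array shapes are assumptions of this sketch.

```python
import numpy as np

def lidar_to_image(points_velo, P_rect_2, R_rect_0, T_velo_cam):
    """Project lidar points m into pixel coordinates of the left color camera (i = 2).

    points_velo: (N, 3) lidar points in the laser radar coordinate system.
    P_rect_2:    (3, 4) rectified projection matrix of camera 2.
    R_rect_0:    (3, 3) rectifying rotation matrix of camera 0.
    T_velo_cam:  (3, 4) [R | t] transform from lidar to the camera-0 coordinate system.
    Returns (N, 2) pixel coordinates and (N,) depths in the camera frame.
    """
    N = points_velo.shape[0]
    m = np.hstack([points_velo, np.ones((N, 1))]).T   # homogeneous lidar points, 4 x N

    T = np.vstack([T_velo_cam, [0, 0, 0, 1]])         # expand T_velo^cam to 4 x 4
    R = np.eye(4)
    R[:3, :3] = R_rect_0                              # expand R_rect^(0) to 4 x 4

    cam = P_rect_2 @ R @ T @ m                        # n = P_rect^(2) R_rect^(0) T_velo^cam m
    depth = cam[2]
    uv = (cam[:2] / depth).T                          # normalise to pixel coordinates
    return uv, depth
```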
the feature enhancement module: based on the processing result of the feature mapping matching module, features are stacked and enhanced from the direction of the two-dimensional bird's-eye view; the features of the layers before the feature addition layer, namely the features of the three-layer convolution network in the feature mapping matching module, are used to build a feature pyramid, feature summation is carried out over the 3-pixel regions around the points where the local extrema of the radar front view and the two-dimensional bird's-eye view are located, and the result is then concatenated with the bird's-eye view features to obtain the enhanced features;
the prediction module: based on the enhanced features, a prediction module is built to predict the position of the pedestrian 3D detection frame; it comprises three fully connected layers for feature abstraction and two prediction branches, each fully connected layer downsampling by 1/2, the two prediction branches predicting the pedestrian category and the pedestrian 3D coordinates respectively.
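One possible sketch of the prediction module is given below, assuming a PyTorch implementation (the patent does not name a framework). The input feature width and the 6-value box encoding (center x, y, z and length, width, height) are assumptions of this sketch; the 1/2 down-sampling per fully connected layer and the two branches follow the description above.

```python
import torch
import torch.nn as nn

class PredictionHead(nn.Module):
    """Three fully connected layers for feature abstraction, each halving the feature width,
    followed by two branches: pedestrian classification and 3D box regression
    (center x, y, z and length, width, height). The feature width is an assumed value."""

    def __init__(self, in_dim=512, num_classes=2):
        super().__init__()
        dims = [in_dim, in_dim // 2, in_dim // 4, in_dim // 8]   # 1/2 down-sampling per layer
        layers = []
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            layers += [nn.Linear(d_in, d_out), nn.ReLU(inplace=True)]
        self.abstraction = nn.Sequential(*layers)
        self.cls_branch = nn.Linear(dims[-1], num_classes)       # pedestrian / background score
        self.box_branch = nn.Linear(dims[-1], 6)                 # x, y, z, l, w, h regression

    def forward(self, enhanced_features):
        x = self.abstraction(enhanced_features)
        return self.cls_branch(x), self.box_branch(x)
```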
In some embodiments, the specific process of step S3 is:
s3-1: the radar point cloud feature extraction network based on human body key point information built in step S2 is trained as a whole on a 3D pedestrian detection data set, and a focal loss function is introduced to optimize the prediction result; its mathematical expression is:
FL(p_t) = -(1 - p_t)^γ · log(p_t)
p_t = p, if y = +1; p_t = 1 - p, if y = -1
where y denotes the label, which takes values in {+1, -1} since the classification is binary; p denotes the probability that the predicted sample belongs to class 1 and ranges from 0 to 1; for convenience of presentation, p_t is used in place of p, where p_t represents the probability that the sample belongs to the positive class; γ is the focusing parameter with γ ≥ 0, and (1 - p_t)^γ is the modulating factor; by increasing the weight of samples that are difficult to classify, the model pays attention to small, hard-to-classify targets during training, and a high-precision pedestrian 3D detection network is finally obtained;
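The focal loss above can be written directly from the formula; the sketch below works on per-sample probabilities p and labels y in {+1, -1}, with γ = 2 as a commonly used default (the patent does not fix a value).

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, eps=1e-7):
    """FL(p_t) = -(1 - p_t)^gamma * log(p_t) for binary labels y in {+1, -1}.

    p: (N,) predicted probabilities that each sample belongs to class +1.
    y: (N,) ground-truth labels, +1 or -1.
    """
    p = np.clip(p, eps, 1.0 - eps)          # numerical safety for the log
    p_t = np.where(y == 1, p, 1.0 - p)      # p_t = p if y = +1, else 1 - p
    return np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t))
```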
s3-2: the 3D pedestrian detection network is obtained by the training of step S3-1; during detection, the key point image and the radar point cloud are input directly into the network, the network predicts the 3D pedestrian detection frames, and the final result is output after non-maximum suppression.
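The non-maximum suppression of step S3-2 can be sketched in the bird's-eye view with axis-aligned IoU, which is a simplification of whatever overlap criterion a full implementation would use; the 0.5 threshold is an assumed value.

```python
import numpy as np

def bev_nms(boxes, scores, iou_thr=0.5):
    """Greedy NMS on axis-aligned bird's-eye-view boxes [x_min, y_min, x_max, y_max]."""
    order = np.argsort(scores)[::-1]        # process boxes from highest to lowest score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        if order.size == 1:
            break
        rest = boxes[order[1:]]
        # intersection of the remaining boxes with the current highest-scoring box
        xx1 = np.maximum(boxes[i, 0], rest[:, 0])
        yy1 = np.maximum(boxes[i, 1], rest[:, 1])
        xx2 = np.minimum(boxes[i, 2], rest[:, 2])
        yy2 = np.minimum(boxes[i, 3], rest[:, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (rest[:, 2] - rest[:, 0]) * (rest[:, 3] - rest[:, 1])
        iou = inter / (area_i + area_r - inter)
        order = order[1:][iou <= iou_thr]   # drop boxes overlapping the kept box too much
    return keep
```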
In summary, the invention provides a method for 3D pedestrian detection by fusing human body key points with a laser radar, which can be applied in many fields such as intelligent robots, augmented reality and automatic driving; for example, in automatic driving, the optical image acquired by a camera and the point cloud scanned by the laser radar are combined, and the method of the invention realizes a multi-sensor-fusion mode of 3D pedestrian detection.
In the present invention, the terms "first", "second", "third" and "fourth" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The term "plurality" means two or more unless expressly limited otherwise.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (4)

1. A 3D pedestrian detection method fusing human body key points and a laser radar, characterized in that a fisheye camera and a laser radar are mounted at the front end of a vehicle to acquire visible light images and 3D space radar point cloud data of the areas in front of and beside the vehicle, and that the method comprises the following steps:
S1: detecting human body key points based on the visible light image; extracting the positions of the human body key points from the image, inferring the connection relations between the key points, and tracing back to the detection frame position of each pedestrian;
S2: building a radar 3D point cloud feature extraction network based on human body key points to extract features; starting from a voxel-based radar signal detection network, the dimension-reduced features are registered with the two-dimensional image, and the key point positions are introduced accurately according to the registration result so that the network can extract three-dimensional features around the key point positions;
S3: training and prediction of the pedestrian 3D position detection network; the center point position and the length, width and height of each pedestrian in 3D space are predicted by regression through a neural network, the network is trained with a loss function to obtain the final pedestrian 3D detection network, and the final prediction result is given after detection frame post-processing.
2. The method according to claim 1, wherein the specific process of step S1 is as follows:
s1-1: marking training data; the OpenPose key point recognition algorithm is trained with the key point data and annotation information of the MSCOCO data set, and human body key point detection is carried out on the input image information, covering 14 key points: head, neck, left and right shoulders, left and right elbows, left and right wrists, left and right hips, left and right knees, and left and right ankles;
s1-2: the OpenPose key point detection algorithm and the training data of step S1-1 are used for training to obtain a pedestrian key point detection network; the visible light images acquired by the fisheye camera are input into the trained network to obtain the key point detection results of all pedestrians, the pedestrian to which each key point belongs is determined by the Hungarian algorithm, and finally the 2D key points of every person in the image are given;
s1-3: the maximum and minimum position coordinates of each pedestrian are obtained according to the pedestrian to which the key points belong, and a human body candidate region is generated from bottom to top to obtain the pedestrian candidate frame result.
3. The method according to claim 1 or 2, wherein the radar 3D point cloud feature extraction network based on human body key points in step S2 comprises a voxel division module, a feature mapping matching module, a feature enhancement module and a prediction module connected in series, wherein,
the voxel division module: the 3D space radar point cloud contains three-dimensional spatial information with width W, height H and depth D along the X, Y and Z axes; the point cloud is divided into cuboid voxels of uniform size, the minimum units used for the width, height and depth of the voxel division being v_W, v_H and v_D respectively, so that the three-dimensional voxel grid generated after division has size W' = W/v_W, H' = H/v_H, D' = D/v_D; after division, a voxel grid cell that contains more than T radar points is a non-empty voxel, otherwise it is an empty voxel, and T points are randomly sampled in each non-empty voxel grid cell as the features of that voxel;
the feature mapping matching module: point cloud features are extracted from the output of the voxel division module through a three-layer convolution network and split into two paths; a feature addition layer is used on each path to reduce the dimensionality of the 3D space radar point cloud features, the reduction directions corresponding to a radar front view and a two-dimensional bird's-eye view respectively; the radar front view direction is consistent with the visible light image direction, signal registration is performed in this direction between the radar front view and the visible light image, and the human body key points obtained in step S1 are introduced into the radar front view;
the feature enhancement module: based on the processing result of the feature mapping matching module, features are stacked and enhanced from the direction of the two-dimensional bird's-eye view; the features of the layers before the feature addition layer, namely the features of the three-layer convolution network in the feature mapping matching module, are used to build a feature pyramid, feature summation is carried out over the 3-pixel regions around the points where the local extrema of the radar front view and the two-dimensional bird's-eye view are located, and the result is then concatenated with the bird's-eye view features to obtain the enhanced features;
the prediction module: based on the enhanced features, a prediction module is built to predict the position of the pedestrian 3D detection frame; it comprises three fully connected layers for feature abstraction and two prediction branches, each fully connected layer downsampling by 1/2, the two prediction branches predicting the pedestrian category and the pedestrian 3D coordinates respectively.
4. The method according to any one of claims 1 to 3, wherein the specific process of step S3 is as follows:
s3-1: the radar point cloud feature extraction network based on human body key point information built in step S2 is trained as a whole on a 3D pedestrian detection data set, and a focal loss function is introduced to optimize the prediction result; its mathematical expression is:
FL(p_t) = -(1 - p_t)^γ · log(p_t)
p_t = p, if y = +1; p_t = 1 - p, if y = -1
where y denotes the label, which takes values in {+1, -1} since the classification is binary; p denotes the probability that the predicted sample belongs to class 1 and ranges from 0 to 1; for convenience of presentation, p_t is used in place of p, where p_t represents the probability that the sample belongs to the positive class; γ is the focusing parameter with γ ≥ 0, and (1 - p_t)^γ is the modulating factor; by increasing the weight of samples that are difficult to classify, the model pays attention to small, hard-to-classify targets during training, and a high-precision pedestrian 3D detection network is finally obtained;
s3-2: the 3D pedestrian detection network is obtained by the training of step S3-1; during detection, the key point image and the radar point cloud are input directly into the network, the network predicts the 3D pedestrian detection frames, and the final result is output after non-maximum suppression.
CN202210155255.XA 2022-02-21 2022-02-21 Human body key point and laser radar fused 3D pedestrian detection method Active CN114639115B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210155255.XA CN114639115B (en) 2022-02-21 2022-02-21 Human body key point and laser radar fused 3D pedestrian detection method


Publications (2)

Publication Number Publication Date
CN114639115A (en) 2022-06-17
CN114639115B CN114639115B (en) 2024-07-05

Family

ID=81946596

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210155255.XA Active CN114639115B (en) 2022-02-21 2022-02-21 Human body key point and laser radar fused 3D pedestrian detection method

Country Status (1)

Country Link
CN (1) CN114639115B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111127667A (en) * 2019-11-19 2020-05-08 西北大学 Point cloud initial registration method based on region curvature binary descriptor
CN111243093A (en) * 2020-01-07 2020-06-05 腾讯科技(深圳)有限公司 Three-dimensional face grid generation method, device, equipment and storage medium
US20210365697A1 (en) * 2020-05-20 2021-11-25 Toyota Research Institute, Inc. System and method for generating feature space data
CN111898405A (en) * 2020-06-03 2020-11-06 东南大学 Three-dimensional human ear recognition method based on 3DHarris key points and optimized SHOT characteristics
CN113313822A (en) * 2021-06-30 2021-08-27 深圳市豪恩声学股份有限公司 3D human ear model construction method, system, device and medium
CN113807366A (en) * 2021-09-16 2021-12-17 电子科技大学 Point cloud key point extraction method based on deep learning

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230219578A1 (en) * 2022-01-07 2023-07-13 Ford Global Technologies, Llc Vehicle occupant classification using radar point cloud
US12017657B2 (en) * 2022-01-07 2024-06-25 Ford Global Technologies, Llc Vehicle occupant classification using radar point cloud
CN114881906A (en) * 2022-06-24 2022-08-09 福建省海峡智汇科技有限公司 Method and system for fusing laser point cloud and visible light image
CN114862957A (en) * 2022-07-08 2022-08-05 西南交通大学 Subway car bottom positioning method based on 3D laser radar

Also Published As

Publication number Publication date
CN114639115B (en) 2024-07-05

Similar Documents

Publication Publication Date Title
CN110415342B (en) Three-dimensional point cloud reconstruction device and method based on multi-fusion sensor
CN113111887B (en) Semantic segmentation method and system based on information fusion of camera and laser radar
CN110443898A (en) A kind of AR intelligent terminal target identification system and method based on deep learning
CN110852182B (en) Depth video human body behavior recognition method based on three-dimensional space time sequence modeling
CN114639115B (en) Human body key point and laser radar fused 3D pedestrian detection method
CN114359181B (en) Intelligent traffic target fusion detection method and system based on image and point cloud
CN114114312A (en) Three-dimensional target detection method based on fusion of multi-focal-length camera and laser radar
CN114494248B (en) Three-dimensional target detection system and method based on point cloud and images under different visual angles
CN113688738B (en) Target identification system and method based on laser radar point cloud data
CN113298781B (en) Mars surface three-dimensional terrain detection method based on image and point cloud fusion
TWI745204B (en) High-efficiency LiDAR object detection method based on deep learning
Ouyang et al. A cgans-based scene reconstruction model using lidar point cloud
CN112270694B (en) Method for detecting urban environment dynamic target based on laser radar scanning pattern
CN116486287A (en) Target detection method and system based on environment self-adaptive robot vision system
CN114966696A (en) Transformer-based cross-modal fusion target detection method
Alidoost et al. Y-shaped convolutional neural network for 3d roof elements extraction to reconstruct building models from a single aerial image
Priya et al. 3dyolo: Real-time 3d object detection in 3d point clouds for autonomous driving
CN118429524A (en) Binocular stereoscopic vision-based vehicle running environment modeling method and system
CN118038226A (en) Road safety monitoring method based on LiDAR and thermal infrared visible light information fusion
CN112233079B (en) Method and system for fusing images of multiple sensors
CN117372697A (en) Point cloud segmentation method and system for single-mode sparse orbit scene
CN116386003A (en) Three-dimensional target detection method based on knowledge distillation
CN113836975A (en) Binocular vision unmanned aerial vehicle obstacle avoidance method based on YOLOV3
Yang et al. Analysis of Model Optimization Strategies for a Low-Resolution Camera-Lidar Fusion Based Road Detection Network
Nagiub et al. 3D Object Detection for Autonomous Driving: A Comprehensive Review

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant