CN115861944A - Traffic target detection system based on laser radar - Google Patents


Info

Publication number
CN115861944A
CN115861944A (application CN202211692170.1A)
Authority
CN
China
Prior art keywords
voxel
network
point
detection system
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211692170.1A
Other languages
Chinese (zh)
Inventor
王秉路
张磊
胡世超
李宁
王小旭
赵永强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Architecture and Technology
Original Assignee
Xian University of Architecture and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Architecture and Technology
Priority to CN202211692170.1A
Publication of CN115861944A
Legal status: Pending


Abstract

The invention discloses a traffic target detection system based on a laser radar, and relates to the technical field of traffic target detection systems. The system comprises a radar voxel-point set feature extraction network, which in turn comprises: a voxel-based multi-scale sparse convolutional network; a feature extraction network based on keypoint sampling; and a feature pooling network for regions of interest. The invention combines the voxel-based method and the point-based method, so that the model has both the efficiency of a voxel-based model and the accuracy of a point-based model. Meanwhile, to further optimize model training, a self-distillation scheme is used to improve the training target during training, so that the model converges faster and achieves higher accuracy.

Description

Traffic target detection system based on laser radar
Technical Field
The invention relates to the technical field of traffic target detection systems, in particular to a traffic target detection system based on a laser radar.
Background
With the rapid development of 3D target detection on laser radar point clouds in the field of automatic driving, environmental perception in intelligent driving systems is continuously expanding. In practical applications, obstacle detection is a basic function of the environment sensing system, and the obstacle types mainly include common objects on structured roads such as pedestrians, vehicles and bicycles. Unlike a camera, the laser radar can directly measure the real distance and size of a target, which makes it a main sensor in automatic driving schemes.
However, the point cloud data generated by the laser radar are unordered and much sparser than images, and their density gradually decreases with distance. Aiming at the sparse and unordered characteristics of laser radar point cloud data, two radar point cloud feature processing approaches have emerged: voxel-based and point-based.
However, the voxel-based approach inevitably loses information because it aggregates point set features into voxels, while the point-based approach suffers from low computational efficiency because features must be processed point by point. There is therefore a need for a lidar-based traffic target detection system that overcomes the above deficiencies.
Disclosure of Invention
The invention aims to solve the defects in the prior art, and provides a traffic target detection system based on a laser radar.
In order to achieve the purpose, the invention adopts the following technical scheme:
a traffic target detection system based on laser radar comprises a radar voxel-point set characteristic extraction network, wherein the radar voxel-point set characteristic extraction network comprises:
a voxel-based multi-scale sparse convolutional network;
extracting a network based on the feature of the key point sampling;
a feature pooling network of the region of interest;
the feature extraction and correction method of the radar voxel-point set feature extraction network comprises the following steps of:
s1: firstly, the point cloud scene is divided into voxels, which are fed into a multi-scale 3D sparse convolution network to extract voxel features; the voxel features are converted into BEV (bird's-eye-view) features, from which target categories and target frames are predicted and proposals are generated;
s2: secondly, keypoints are sampled by FPS, and a VSA module extracts multi-scale voxel, raw point cloud and BEV features for the keypoints;
s3: finally, each proposal obtained by the voxel-based multi-scale sparse convolution network is divided into a number of grids, the keypoint features are pooled onto the grid points, and these features are used to refine the target frame.
Preferably: in the detection system, a 3D sparse convolution network performs voxel feature extraction and serves as the backbone to generate 3D proposals; wherein:
3D voxel CNN network: the point cloud input P is divided into L×H×W voxel grids, where the feature of a non-empty voxel grid is the average of the features of the points it contains; the voxelized point cloud features are passed through a series of multi-scale 3D convolution networks to obtain local and global point cloud features, the down-sampling scales of the 3D convolution networks being 1×, 2×, 4× and 8× respectively;
3D proposal generation: the 8×-down-sampled 3D voxel features are stacked along the Z axis to obtain an L/8×W/8 BEV feature map representation; candidate anchor frames are then generated for each class of target in the point cloud scene, with 2×L/8×W/8 3D anchor frames per class; the anchor size is the average size of targets of that class, with two orientations, 0° and 90°.
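The voxel-averaging step described above can be sketched as follows (an illustrative numpy sketch under assumed array layouts, not the patent's implementation; all names are hypothetical):

```python
import numpy as np

def voxelize_mean(points, voxel_size, pc_range):
    """Average the features of all points falling in each voxel.

    points:     (N, C) array; columns 0:3 are x, y, z, the rest are
                extra attributes such as reflection intensity.
    voxel_size: (3,) edge lengths of one voxel.
    pc_range:   [xmin, ymin, zmin, xmax, ymax, zmax].
    Returns (coords, feats): integer indices of the non-empty voxels
    and the per-voxel mean point feature.
    """
    pts = np.asarray(points, dtype=np.float64)
    lo = np.asarray(pc_range[:3], dtype=np.float64)
    idx = np.floor((pts[:, :3] - lo) / np.asarray(voxel_size)).astype(np.int64)
    # Group points by voxel index, then average their features per voxel.
    coords, inverse = np.unique(idx, axis=0, return_inverse=True)
    inverse = np.asarray(inverse).reshape(-1)
    counts = np.bincount(inverse, minlength=len(coords)).astype(np.float64)
    feats = np.zeros((len(coords), pts.shape[1]))
    for c in range(pts.shape[1]):
        feats[:, c] = np.bincount(inverse, weights=pts[:, c],
                                  minlength=len(coords)) / counts
    return coords, feats
```

The non-empty voxels and their mean features would then be fed to the sparse 3D convolution backbone.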
And further: in the detection system, the multi-scale voxel features are gathered onto a small number of keypoints, so that the keypoints become a bridge between the 3D voxel features and the proposal refinement network;
n keypoints K = {p1, p2, …, pn} are selected from the point cloud scene by the Furthest-Point-Sampling (FPS) algorithm; using FPS for keypoint sampling ensures that the keypoints are distributed over the whole point cloud scene, so that their features can represent the feature information of the whole scene;
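Furthest-Point-Sampling itself is compact enough to sketch (illustrative numpy, not the patent's code; the deterministic `start` index is an assumption):

```python
import numpy as np

def farthest_point_sampling(points, n_samples, start=0):
    """Furthest-Point-Sampling (FPS): repeatedly pick the point that is
    farthest from all points chosen so far, so the samples spread over
    the whole scene.  points: (N, 3) xyz; returns the chosen indices."""
    pts = np.asarray(points, dtype=np.float64)
    chosen = [start]
    # dist[i] = distance from point i to its nearest chosen keypoint
    dist = np.linalg.norm(pts - pts[start], axis=1)
    for _ in range(1, n_samples):
        nxt = int(np.argmax(dist))          # farthest remaining point
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(pts - pts[nxt], axis=1))
    return chosen
```

Because each new sample maximizes the distance to the current set, the keypoints cover the scene rather than clustering in dense regions.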
and, with each keypoint position as a reference point, a VSA module is used to gather the voxel features around the keypoint.
Further preferred is: in the detection system, $F^{(l_k)} = \{f_1^{(l_k)}, \ldots, f_{N_k}^{(l_k)}\}$ denotes the 3D voxel features at the k-th scale; $V^{(l_k)} = \{v_1^{(l_k)}, \ldots, v_{N_k}^{(l_k)}\}$ denotes the corresponding 3D voxel coordinates, computed from the voxel indices and the voxel size at that scale; $N_k$ denotes the number of non-empty voxels at the k-th scale; for each keypoint $p_i$, its neighboring non-empty voxels within radius $r_k$ are first determined as:

$$S_i^{(l_k)} = \left\{ \left[ f_j^{(l_k)};\; v_j^{(l_k)} - p_i \right]^{\mathsf T} : \left\| v_j^{(l_k)} - p_i \right\| < r_k,\; \forall v_j^{(l_k)} \in V^{(l_k)},\; \forall f_j^{(l_k)} \in F^{(l_k)} \right\}$$

In this process, the relative coordinates $v_j^{(l_k)} - p_i$ are concatenated with the voxel features to encode the relative position between the two; a PointNet block then generates the feature representation of keypoint $p_i$:

$$f_i^{(pv_k)} = \max\left\{ G\left( M\left( S_i^{(l_k)} \right) \right) \right\}$$

where $M$ randomly samples the features of at most $T_k$ voxels from $S_i^{(l_k)}$ to save computation, and $G$ denotes an MLP network that encodes the voxel features and relative positions; several radii are selected simultaneously for the VSA operation; gathering the voxel features of all scales onto the keypoints, the feature of keypoint $p_i$ is:

$$f_i^{(pv)} = \left[ f_i^{(pv_1)},\, f_i^{(pv_2)},\, f_i^{(pv_3)},\, f_i^{(pv_4)} \right]$$
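A toy version of this set-abstraction step may help fix ideas: for one keypoint and one scale, voxels within the radius are gathered as [feature; relative offset], subsampled to at most T_k, passed through a stand-in for the MLP G, and max-pooled (all names here are illustrative; a learned network would replace `mlp`):

```python
import numpy as np

def vsa_gather(keypoint, voxel_xyz, voxel_feats, radius, t_k, mlp, seed=0):
    """Gather voxel features around one keypoint (one scale of VSA).

    Builds [f_j ; v_j - p_i] for every non-empty voxel within `radius`
    of the keypoint, subsamples at most t_k of them, pushes each row
    through `mlp` and max-pools along the set dimension.
    """
    diff = voxel_xyz - keypoint
    mask = np.linalg.norm(diff, axis=1) < radius
    if not mask.any():                      # no neighbors: zero feature
        return np.zeros(mlp(np.zeros(voxel_feats.shape[1] + 3)).shape)
    s = np.concatenate([voxel_feats[mask], diff[mask]], axis=1)
    if len(s) > t_k:                        # random subsample, saves compute
        rng = np.random.default_rng(seed)
        s = s[rng.choice(len(s), t_k, replace=False)]
    return np.max(np.stack([mlp(row) for row in s]), axis=0)
```

The max-pool yields a fixed-length vector regardless of how many voxels fall inside the radius.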
as a preferable aspect of the present invention: in the detection system, the expanded VSA module: in addition to gathering multi-scale voxel characteristics on key points, gathering characteristics of BEVs obtained by octave down-sampling of original point clouds on the key points, projecting the key points pi onto a BEV view, and then gathering adjacent BEV characteristics fi (BEVs) on the key points by a bilinear interpolation method; finally, the keypoint features are represented by the following formula:
Figure BDA0004021703560000044
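The bilinear gathering of BEV features onto a projected keypoint can be sketched as (illustrative numpy; the (H, W, C) layout is an assumption):

```python
import numpy as np

def bilinear_bev_feature(bev, x, y):
    """Bilinearly interpolate a BEV feature map at a continuous
    (x, y) grid position.  bev: (H, W, C); x indexes W, y indexes H."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, bev.shape[1] - 1)
    y1 = min(y0 + 1, bev.shape[0] - 1)
    wx, wy = x - x0, y - y0                 # fractional offsets
    return ((1 - wy) * ((1 - wx) * bev[y0, x0] + wx * bev[y0, x1])
            + wy * ((1 - wx) * bev[y1, x0] + wx * bev[y1, x1]))
```

A keypoint's (x, y) here would come from projecting its 3D position into the BEV grid at the 8×-down-sampled resolution.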
further preferred as the invention: in the detection system, a Predicted Keypoint Weighting module is used for predicting the weight of key points; the PKW module takes the label of the 3D bounding box as supervision, and the supervision label of the key point contained in the 3D target frame is the foreground key point; finally, after weight prediction network processing, the key point characteristics are shown as follows:
Figure BDA0004021703560000051
a represents a three-layer MLP network and a sigmoid function for foreground confidence prediction; the PKW network is trained over focal local.
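A minimal sketch of the keypoint re-weighting and of binary focal loss, assuming a scalar foreground logit per keypoint (names illustrative; the three-layer MLP A is reduced to its final sigmoid here):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pkw_reweight(keypoint_feat, fg_logit):
    """Scale a keypoint feature by its predicted foreground confidence."""
    return sigmoid(fg_logit) * keypoint_feat

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss used to train the foreground classifier;
    down-weights easy examples to counter fg/bg imbalance."""
    pt = p if y == 1 else 1.0 - p
    a = alpha if y == 1 else 1.0 - alpha
    return -a * (1.0 - pt) ** gamma * np.log(pt)
```

Because most keypoints land on background, the focal term (1 - pt)^gamma keeps confident easy negatives from dominating the gradient.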
As a still further scheme of the invention: in the detection system, a region-of-interest pooling network is used to gather the keypoint features onto the grid points of the region of interest over several receptive fields simultaneously; each 3D proposal is divided into 6 × 6 × 6 grid points, represented as:

$$G = \{g_1, \ldots, g_{216}\}$$
the neighboring keypoints for each grid point are determined by:
$$\tilde{\Psi} = \left\{ \left[ f_j^{(p)};\; p_j - g_i \right]^{\mathsf T} : \left\| p_j - g_i \right\| < \tilde{r},\; \forall p_j \in K \right\}$$
where $p_j - g_i$ preserves the relative position between the grid point and the neighboring keypoint $p_j$; a PointNet module is then adopted to aggregate and encode the neighboring keypoint features onto grid point $g_i$:

$$\tilde{f}_i^{(g)} = \max\left\{ G\left( M\left( \tilde{\Psi} \right) \right) \right\}$$
keypoint features of different receptive fields are gathered with several radii $\tilde{r}$ and combined together; the vectorized feature is then converted by a two-layer MLP network into a 256-dimensional feature to represent the proposal.
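The grid-point construction and keypoint pooling above can be sketched as (illustrative numpy; axis-aligned boxes only, whereas real proposals are oriented):

```python
import numpy as np

def roi_grid_points(center, size, n=6):
    """Uniform n*n*n grid of cell centers inside an axis-aligned 3D box."""
    offs = (np.arange(n) + 0.5) / n - 0.5       # cell centers in [-0.5, 0.5)
    gx, gy, gz = np.meshgrid(offs, offs, offs, indexing="ij")
    grid = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)
    return np.asarray(center) + grid * np.asarray(size)

def pool_to_grid(grid_pts, kp_xyz, kp_feats, radius):
    """Max-pool [f_j ; p_j - g_i] over keypoints within `radius` of
    each grid point (the relative offsets preserve position info)."""
    out = np.zeros((len(grid_pts), kp_feats.shape[1] + 3))
    for i, g in enumerate(grid_pts):
        diff = kp_xyz - g
        mask = np.linalg.norm(diff, axis=1) < radius
        if mask.any():
            out[i] = np.max(np.concatenate([kp_feats[mask], diff[mask]],
                                           axis=1), axis=0)
    return out
```

Pooling with two or more radii and concatenating the results would mimic the multi-receptive-field variant described above.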
On the basis of the scheme: in the detection system, a refinement network predicts the size and position information of the 3D proposal from the proposal features; the whole refinement network consists of two branches, a confidence prediction branch and a box regression branch, each composed of a two-layer MLP network;
the confidence coefficient prediction network adopts the ROI of the 3D interested region and the 3D IoU between the corresponding GT as training targets, and for the kth 3D interested region, the confidence coefficient training target yk is as follows:
Y k =min(1,max(0,2IoU k -0.5))
then, the confidence gt and the predicted confidence score are subjected to loss calculation:
$$\mathcal{L}_{\mathrm{iou}} = -\, y_k \log \tilde{y}_k - \left(1 - y_k\right) \log\left(1 - \tilde{y}_k\right)$$

where $\tilde{y}_k$ is the predicted confidence score.
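The IoU-based confidence target, the cross-entropy confidence loss, and the smooth-L1 loss used for box regression can be sketched as (illustrative numpy helpers, not the patent's code):

```python
import numpy as np

def confidence_target(iou):
    """y_k = min(1, max(0, 2*IoU_k - 0.5)): IoU below 0.25 maps to 0,
    above 0.75 maps to 1, linear in between."""
    return np.clip(2.0 * np.asarray(iou) - 0.5, 0.0, 1.0)

def bce_loss(pred, target, eps=1e-7):
    """Binary cross-entropy between confidence target and prediction."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))

def smooth_l1(x):
    """Smooth L1 applied to residual-based box regression targets."""
    x = np.abs(x)
    return np.where(x < 1.0, 0.5 * x * x, x - 0.5)
```

The soft target makes the confidence branch rank proposals by localization quality rather than by a hard positive/negative split.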
the regression target of the target frame is obtained through a traditional residual-based mode, and smooth L1 loss is used for optimization.
On the basis of the foregoing scheme, it is preferable that: in the detection system, for a model with input x and a K-dimensional one-hot supervision target y, feeding x into the model yields the logit vector $z(x) = [z_1(x), \ldots, z_K(x)]$; a softmax function then gives the prediction confidence $P(x) = [P_1(x), \ldots, P_K(x)]$; the confidence is softened as:

$$\tilde{P}_i(x) = \frac{\exp\left(z_i(x)/\tau\right)}{\sum_{j=1}^{K} \exp\left(z_j(x)/\tau\right)}$$
where $\tau$ denotes the temperature coefficient of temperature scaling; the outputs of the teacher model and the student model are passed through this softened softmax to obtain $\tilde{P}^{T}(x)$ and $\tilde{P}^{S}(x)$; for the student model, the training objective is:

$$\mathcal{L} = (1-\alpha)\, H\left(y,\, P^{S}(x)\right) + \alpha\, \tau^{2}\, H\left(\tilde{P}^{T}(x),\, \tilde{P}^{S}(x)\right)$$

where H denotes the cross-entropy; when the temperature coefficient $\tau$ is 1, the objective degenerates to the cross-entropy of $P^{S}(x)$ against the soft supervision target.
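The temperature-scaled softmax can be sketched as (illustrative numpy; larger τ yields a flatter distribution):

```python
import numpy as np

def soften(logits, tau):
    """Temperature-scaled softmax: larger tau flattens the distribution,
    exposing the relative confidence carried by the small logits."""
    z = np.asarray(logits, dtype=np.float64) / tau
    z -= z.max()                    # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()
```

With tau = 1 this is the ordinary softmax, matching the degenerate case noted above.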
It is further preferable on the basis of the foregoing scheme that: in the detection system, the knowledge self-distillation model acquires knowledge from the model itself so as to improve the generalization ability of the model; at the t-th epoch, with the self-distillation prediction for x denoted $P_t^{S}(x)$, the objective function is:

$$\mathcal{L}_t = (1-\alpha)\, H\left(y,\, P_t^{S}(x)\right) + \alpha\, H\left(\tilde{P}_{t-1}^{S}(x),\, \tilde{P}_t^{S}(x)\right)$$

For the model at epoch t, the training target is $(1-\alpha)\, y + \alpha\, P_{t-1}^{S}(x)$; the parameter α reflects the degree of trust placed in the teacher model.
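The mixed training target can be sketched as (illustrative numpy; `alpha` is the trust placed in the previous-epoch teacher):

```python
import numpy as np

def self_distill_target(one_hot, prev_epoch_pred, alpha):
    """Soft target for epoch t: (1 - alpha) * y + alpha * P_{t-1}^S(x).
    The model's own prediction from the previous epoch plays the
    teacher; alpha = 0 recovers plain one-hot supervision."""
    y = np.asarray(one_hot, dtype=np.float64)
    p = np.asarray(prev_epoch_pred, dtype=np.float64)
    return (1.0 - alpha) * y + alpha * p
```

Since both inputs are probability vectors, the mixture is itself a valid distribution, so it can be plugged into an ordinary cross-entropy loss.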
The invention has the beneficial effects that:
1. the invention combines the voxel-based method and the point-based method, so that the model has both the efficiency of a voxel-based model and the accuracy of a point-based model; meanwhile, to further optimize model training, a self-distillation scheme is used to improve the training target during training, so that the model converges faster and achieves higher accuracy.
2. Detection performance comparison experiments were carried out against several laser radar point cloud target detection models; the improved model achieves excellent performance on the KITTI dataset. Ablation experiments on the self-distillation module show that it plays a significant role in the training process.
Drawings
Fig. 1 is a flowchart of a traffic target detection system based on a laser radar according to the present invention.
Detailed Description
The technical solution of the present patent will be described in further detail with reference to the following embodiments.
Example 1:
a traffic target detection system based on laser radar comprises a radar voxel-point set characteristic extraction network, wherein the radar voxel-point set characteristic extraction network comprises:
a voxel-based multi-scale sparse convolutional network;
extracting a network based on the feature of the key point sampling;
a feature pooling network of the region of interest;
the feature extraction and correction method of the radar voxel-point set feature extraction network comprises the following steps:
s1: firstly, the point cloud scene is divided into voxels, which are fed into a multi-scale 3D sparse convolution network to extract voxel features; the voxel features are converted into BEV (bird's-eye-view) features, from which target categories and target frames are predicted and proposals are generated;
s2: secondly, sampling key points in an FPS mode, and extracting multi-scale voxels, original point clouds and BEV characteristics for the key points by using a VSA module;
s3: and finally, dividing the proposal obtained by the multi-scale sparse convolution network based on the voxels into a plurality of grids, extracting key point features to grid points through pooling operation, and further performing fine correction on the target frame by using the features.
In the detection system, a 3D sparse convolution network performs voxel feature extraction and serves as the backbone to generate 3D proposals; wherein:
3D voxel CNN network: the point cloud input P is divided into L×H×W voxel grids, where the feature of a non-empty voxel grid is the average of the features of the points it contains, the features generally comprising the 3D coordinates and reflection intensity attributes of the points; the voxelized point cloud features are aggregated by a series of multi-scale 3D convolution networks into local and global point cloud features, the down-sampling scales of the 3D convolution networks being 1×, 2×, 4× and 8× respectively;
3D proposal generation: the 8×-down-sampled 3D voxel features are stacked along the Z axis to obtain an L/8×W/8 BEV feature map representation; candidate anchor frames are then generated for each class of target in the point cloud scene, with 2×L/8×W/8 3D anchor frames per class; the anchor size is the average size of targets of that class, with two orientations, 0° and 90°; compared with PointNet-based methods, the 3D voxel CNN network and anchor frame strategy achieve a higher recall rate;
and (5) a discussion of fine modification of the detection target. On the one hand, the refinement and correction of propofol directly on the 3D voxelized feature or the 2D feature map brings many problems. Firstly, point cloud characteristics processed by a backbone downsampling network cannot complete fine positioning of a target frame, and the fine correction effect of the target frame is influenced; secondly, even if the features are subjected to upsampling processing by means of linear interpolation and the like, the features are sparse, and fine correction of the target frame cannot be realized.
On the other hand, the set interaction operation proposed in the PointNet network can extract features from surrounding points in an arbitrary radius range; therefore, the effective fine correction work of the target box is realized by the operation of the set iteration. The problems of memory occupation and computational efficiency caused by extracting all the voxel characteristics are avoided. And selecting partial key points in the point cloud scene, and extracting the voxel characteristics to the partial key points. By the method, the problem that the voxelization features are too sparse can be avoided, the implementation problems of memory occupation and the like caused by extraction of all the voxelization features can also be avoided, and the fine correction effect of the target box is greatly improved.
In the detection system, the multi-scale voxel features are gathered onto a small number of keypoints, so that the keypoints become a bridge between the 3D voxel features and the proposal refinement network;
Keypoint sampling: firstly, n keypoints K = {p1, p2, …, pn} are selected from the point cloud scene by the Furthest-Point-Sampling (FPS) algorithm; using FPS for keypoint sampling ensures that the keypoints are distributed over the whole point cloud scene, so that their features can represent the feature information of the whole scene;
VSA module: the selected keypoints contain only a small part of the feature information of the whole point cloud scene; therefore, with each keypoint as a reference point, a VSA module is used to gather the voxel features around the keypoint;
$F^{(l_k)} = \{f_1^{(l_k)}, \ldots, f_{N_k}^{(l_k)}\}$ denotes the 3D voxel features at the k-th scale; $V^{(l_k)} = \{v_1^{(l_k)}, \ldots, v_{N_k}^{(l_k)}\}$ denotes the corresponding 3D voxel coordinates, computed from the voxel indices and the voxel size at that scale; $N_k$ denotes the number of non-empty voxels at the k-th scale. For each keypoint $p_i$, its neighboring non-empty voxels within radius $r_k$ are first determined as:

$$S_i^{(l_k)} = \left\{ \left[ f_j^{(l_k)};\; v_j^{(l_k)} - p_i \right]^{\mathsf T} : \left\| v_j^{(l_k)} - p_i \right\| < r_k,\; \forall v_j^{(l_k)} \in V^{(l_k)},\; \forall f_j^{(l_k)} \in F^{(l_k)} \right\}$$
in this process, the relative coordinates $v_j^{(l_k)} - p_i$ are concatenated with the voxel features to encode the relative position between the two; a PointNet block then generates the feature representation of keypoint $p_i$:

$$f_i^{(pv_k)} = \max\left\{ G\left( M\left( S_i^{(l_k)} \right) \right) \right\}$$
where $M$ randomly samples the features of at most $T_k$ voxels from $S_i^{(l_k)}$ to save computation, and $G$ denotes an MLP network that encodes the voxel features and relative positions; since different keypoints have different numbers of voxels in their neighborhoods, the max-pooling along the channel dimension effectively resolves this mismatch between keypoints; to gather multi-scale semantic information, several radii are selected simultaneously for the VSA operation;
Through multiple VSA operations, the voxel features of several scales are gathered onto the keypoints; after concatenation, the feature representation of keypoint $p_i$ is:

$$f_i^{(pv)} = \left[ f_i^{(pv_1)},\, f_i^{(pv_2)},\, f_i^{(pv_3)},\, f_i^{(pv_4)} \right]$$
Extended VSA module: besides the multi-scale voxel features, the features of the raw point cloud and of the BEV map obtained by 8× down-sampling are also gathered onto the keypoints; the raw point cloud aggregation follows formula (2); for BEV feature aggregation, keypoint $p_i$ is projected onto the BEV view, and the neighboring BEV features $f_i^{(bev)}$ are then gathered onto the keypoint by bilinear interpolation; finally, the keypoint feature is represented as:

$$f_i^{(p)} = \left[ f_i^{(pv)},\, f_i^{(raw)},\, f_i^{(bev)} \right], \quad i = 1, \ldots, n$$
predicting the weight of the key point:
now, a scene feature representation encoded by a small set of keypoints has been obtained, and these keypoint features are to be used to refine the target frame; the keypoints sampled by the FPS algorithm are distributed over the whole point cloud scene, covering both foreground and background regions; to improve the refinement precision, keypoints in foreground regions should be given larger weights and keypoints in background regions smaller ones;
Therefore, keypoint weight prediction is performed by a Predicted Keypoint Weighting (PKW) module; the PKW module is supervised by the labels of the 3D bounding boxes, i.e. keypoints contained in a 3D target frame are labelled as foreground keypoints; finally, after the weight prediction network, the keypoint features become:

$$\tilde{f}_i^{(p)} = A\left( f_i^{(raw)} \right) \cdot f_i^{(p)}$$

where A denotes a three-layer MLP network with a sigmoid function for foreground confidence prediction; the PKW network is trained with focal loss, whose hyper-parameters are set to address the imbalance between foreground and background points.
Region-of-interest feature pooling network: a region-of-interest pooling network gathers the keypoint features onto the grid points of the region of interest over several receptive fields; each 3D proposal is divided into 6 × 6 × 6 grid points, represented as:

$$G = \{g_1, \ldots, g_{216}\}$$
the neighboring keypoints for each grid point are determined by:
$$\tilde{\Psi} = \left\{ \left[ f_j^{(p)};\; p_j - g_i \right]^{\mathsf T} : \left\| p_j - g_i \right\| < \tilde{r},\; \forall p_j \in K \right\}$$
where $p_j - g_i$ preserves the relative position between the grid point and the neighboring keypoint $p_j$; a PointNet module is then adopted to aggregate and encode the neighboring keypoint features onto grid point $g_i$:

$$\tilde{f}_i^{(g)} = \max\left\{ G\left( M\left( \tilde{\Psi} \right) \right) \right\}$$
wherein M and G are as in formula (2); keypoint features of different receptive fields are gathered with several radii $\tilde{r}$ and combined together; the vectorized feature is then converted by a two-layer MLP network into a 256-dimensional feature to represent the proposal;
Compared with previous voxel-based region-of-interest feature pooling operations, this network obtains richer semantic information and allows more flexible receptive-field selection.
Proposal refinement and confidence prediction: using the proposal features, the refinement network predicts the size and position information of the 3D proposal. The whole refinement network consists of two branches, a confidence prediction branch and a box regression branch, each composed of a two-layer MLP network.
The confidence prediction network takes as training target the 3D IoU between the 3D region of interest (RoI) and its corresponding ground-truth (GT) box; for the k-th 3D region of interest, the confidence training target $y_k$ is:

$$y_k = \min\left(1,\, \max\left(0,\, 2\,\mathrm{IoU}_k - 0.5\right)\right)$$

then, a loss is computed between the confidence ground truth and the predicted confidence score:
$$\mathcal{L}_{\mathrm{iou}} = -\, y_k \log \tilde{y}_k - \left(1 - y_k\right) \log\left(1 - \tilde{y}_k\right)$$

where $\tilde{y}_k$ is the predicted confidence score.
the regression target of the target frame is obtained through a traditional residual-based mode, and smooth L1 loss is used for optimization.
Self-distillation network: knowledge distillation with a soft supervision target transfers the knowledge of one model (the teacher) to another model (the student), usually from a large model to a small one. Besides the one-hot supervision target, the student model also learns from the information provided by the teacher model. The smaller student model, trained under the teacher, can finally reach performance consistent with the teacher's; if the two models are of the same size and scale, the student's performance can even be better.
For a model with input x and a K-dimensional one-hot supervision target y, feeding x into the model yields the logit vector $z(x) = [z_1(x), \ldots, z_K(x)]$; a softmax function then gives the prediction confidence $P(x) = [P_1(x), \ldots, P_K(x)]$. For better knowledge distillation, the confidence is softened:

$$\tilde{P}_i(x) = \frac{\exp\left(z_i(x)/\tau\right)}{\sum_{j=1}^{K} \exp\left(z_j(x)/\tau\right)}$$
where $\tau$ denotes the temperature coefficient of temperature scaling; the outputs of the teacher model and the student model are passed through this softened softmax to obtain $\tilde{P}^{T}(x)$ and $\tilde{P}^{S}(x)$; for the student model, the training objective is:

$$\mathcal{L} = (1-\alpha)\, H\left(y,\, P^{S}(x)\right) + \alpha\, \tau^{2}\, H\left(\tilde{P}^{T}(x),\, \tilde{P}^{S}(x)\right)$$

where H denotes the cross-entropy; when the temperature coefficient $\tau$ is 1, the objective degenerates to the cross-entropy of $P^{S}(x)$ against the soft supervision target.
Knowledge is distilled from the prediction of the previous stage: the self-distillation model acquires knowledge from the model itself to improve its generalization ability; at the t-th epoch, with the self-distillation prediction for x denoted $P_t^{S}(x)$, the objective function is:

$$\mathcal{L}_t = (1-\alpha)\, H\left(y,\, P_t^{S}(x)\right) + \alpha\, H\left(\tilde{P}_{t-1}^{S}(x),\, \tilde{P}_t^{S}(x)\right)$$

Compared with the traditional knowledge distillation model, the teacher of the self-distillation model changes dynamically: any model from a past training epoch can serve as the teacher of the current student. To obtain the most valuable information, the model at epoch t-1 is chosen as the teacher. For the model at epoch t, the training target is $(1-\alpha)\, y + \alpha\, P_{t-1}^{S}(x)$; the parameter α reflects the degree of trust placed in the teacher model.
The training target is improved in a self-distillation mode in the model training process, so that the model can be converged more quickly, and the accuracy is improved.
The above description covers only preferred embodiments of the present invention, but the scope of the present invention is not limited thereto; any equivalent substitution or change that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention, according to the technical solutions of the present invention and the inventive concept thereof, shall fall within the scope of the present invention.

Claims (10)

1. A traffic target detection system based on laser radar is characterized by comprising a radar voxel-point set feature extraction network, wherein the radar voxel-point set feature extraction network comprises:
a voxel-based multi-scale sparse convolutional network;
extracting a network based on the feature of the key point sampling;
a feature pooling network of regions of interest;
the feature extraction and correction method of the radar voxel-point set feature extraction network comprises the following steps:
s1: firstly, the point cloud scene is divided into voxels, which are fed into a multi-scale 3D sparse convolution network to extract voxel features; the voxel features are converted into BEV (bird's-eye-view) features, from which target categories and target frames are predicted and proposals are generated;
s2: secondly, sampling key points in an FPS mode, and extracting multi-scale voxels, original point clouds and BEV characteristics for the key points by using a VSA module;
s3: and finally, dividing the proposal obtained by the multi-scale sparse convolution network based on the voxels into a plurality of grids, extracting key point features to grid points through pooling operation, and further performing fine correction on the target frame by using the features.
2. The system of claim 1, wherein in the detection system, a 3D sparse convolution network performs voxel feature extraction and serves as the backbone to generate 3D proposals; wherein:
3D voxel CNN network: the point cloud input P is divided into L×H×W voxel grids, where the feature of a non-empty voxel grid is the average of the features of the points it contains; the voxelized point cloud features are passed through a series of multi-scale 3D convolution networks to obtain local and global point cloud features, the down-sampling scales of the 3D convolution networks being 1×, 2×, 4× and 8× respectively;
3D proposal generation: the 8×-down-sampled 3D voxel features are stacked along the Z axis to obtain an L/8×W/8 BEV feature map representation; candidate anchor frames are then generated for each class of target in the point cloud scene, with 2×L/8×W/8 3D anchor frames per class; the anchor size is the average size of targets of that class, with two orientations, 0° and 90°.
3. The lidar-based traffic target detection system according to claim 2, wherein in the detection system, the multi-scale voxel features are gathered onto a small number of keypoints, so that the keypoints become a bridge between the 3D voxel features and the proposal refinement network;
n keypoints K = {p1, p2, …, pn} are selected from the point cloud scene by the Furthest-Point-Sampling (FPS) algorithm; using FPS for keypoint sampling ensures that the keypoints are distributed over the whole point cloud scene, so that their features can represent the feature information of the whole scene;
and, with each keypoint position as a reference point, a VSA module gathers the voxel features around the keypoint.
4. The lidar-based traffic target detection system of claim 3, wherein in the detection system, F (lk) ={f 1 (lk) ,...,f Nk (lk) Denotes the K-th scale 3D voxel characteristics; v () = { V = 1 (lk) ,...,v Nk (lk) Denotes the 3D coordinates of the corresponding acceleration feature, which is measured by the voxel index and the size of the corresponding scale voxelCalculated, nk represents the number of non-empty voxels in the k-th scale; for each keypoint pi, its neighboring non-empty voxels are first determined by rk, which is denoted as:
$$S_i^{(l_k)} = \left\{ \left[ f_j^{(l_k)};\; v_j^{(l_k)} - p_i \right]^{\mathsf T} \;\middle|\; \left\| v_j^{(l_k)} - p_i \right\|^2 < r_k,\; \forall v_j^{(l_k)} \in \mathcal{V}^{(l_k)},\; \forall f_j^{(l_k)} \in \mathcal{F}^{(l_k)} \right\}$$
in this process, the relative coordinates of $p_i$ with respect to $v_j^{(l_k)}$ are concatenated with the voxel features to encode the relative positional relationship between the two; the feature representation of keypoint $p_i$ is then generated by a PointNet block:
$$f_i^{(pv_k)} = \max\left\{ G\!\left( \mathcal{M}\!\left( S_i^{(l_k)} \right) \right) \right\}$$
$\mathcal{M}$ denotes random sampling of at most $T_k$ voxel features from $S_i^{(l_k)}$ to save computation; $G$ denotes an MLP network that encodes the voxel features and relative coordinates; the VSA operation is performed with several radii simultaneously, so that voxel features of multiple scales are aggregated onto the keypoints, and the feature of keypoint $p_i$ is represented as:
$$f_i^{(pv)} = \left[ f_i^{(pv_1)},\; f_i^{(pv_2)},\; f_i^{(pv_3)},\; f_i^{(pv_4)} \right]$$
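A single-radius VSA step can be sketched as below; the tanh "encoder" stands in for the learned MLP $G$ (whose weights are not given here), and all names are illustrative:

```python
import numpy as np

def vsa_single_radius(keypoint, voxel_xyz, voxel_feat, radius, t_max, rng):
    """Gather voxel features within `radius` of a keypoint, append relative
    coordinates, randomly keep at most t_max of them (the M(.) operation),
    encode them (stand-in for the MLP G) and max-pool over the set."""
    d = np.linalg.norm(voxel_xyz - keypoint, axis=1)
    idx = np.where(d < radius)[0]
    if len(idx) > t_max:
        idx = rng.choice(idx, t_max, replace=False)
    rel = voxel_xyz[idx] - keypoint                     # relative positions
    encoded = np.tanh(np.concatenate([voxel_feat[idx], rel], axis=1))
    return encoded.max(axis=0)                          # max-pool over the set

rng = np.random.default_rng(0)
kp = np.zeros(3)
vxyz = np.array([[0.1, 0.0, 0.0], [0.2, 0.0, 0.0], [5.0, 0.0, 0.0]])
vfeat = np.array([[1.0], [2.0], [3.0]])
f_pv = vsa_single_radius(kp, vxyz, vfeat, radius=1.0, t_max=16, rng=rng)
```

Only the two voxels within the radius contribute; the distant voxel at x = 5 is excluded, and max-pooling makes the result invariant to the neighbour ordering.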
5. The lidar-based traffic target detection system of claim 4, wherein the extended VSA module: in addition to aggregating the multi-scale voxel features onto the keypoints, the features of the raw point cloud and of the 8× down-sampled BEV map are also aggregated onto the keypoints; each keypoint $p_i$ is projected onto the BEV view, and the neighbouring BEV features $f_i^{(bev)}$ are aggregated onto the keypoint by bilinear interpolation; finally, the keypoint feature is represented by the following formula:
$$f_i^{(p)} = \left[ f_i^{(pv)},\; f_i^{(raw)},\; f_i^{(bev)} \right]$$
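The bilinear sampling of BEV features at a projected keypoint can be sketched as follows (a minimal version; the map and location are toy values):

```python
import numpy as np

def bilinear_bev(bev, x, y):
    """Sample a C x H x W BEV map at continuous (x, y) by bilinear
    interpolation of the four surrounding cells."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    return ((1 - dx) * (1 - dy) * bev[:, y0,     x0]
          + dx       * (1 - dy) * bev[:, y0,     x0 + 1]
          + (1 - dx) * dy       * bev[:, y0 + 1, x0]
          + dx       * dy       * bev[:, y0 + 1, x0 + 1])

bev = np.array([[[0.0, 1.0],
                 [2.0, 3.0]]])          # one channel, 2 x 2 map
f_bev = bilinear_bev(bev, 0.5, 0.5)     # centre of the four cells
```

At the exact centre each of the four cells contributes a quarter, giving (0 + 1 + 2 + 3) / 4 = 1.5.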
6. The lidar-based traffic target detection system according to claim 5, wherein in the detection system, a Predicted Keypoint Weighting (PKW) module is used to predict a weight for each keypoint; the PKW module is supervised by the 3D bounding-box labels, keypoints contained in a 3D target box being labelled as foreground keypoints; after the weight prediction network, the keypoint features are as follows:
$$\tilde{f}_i^{(p)} = \mathcal{A}\!\left( f_i^{(p)} \right) \cdot f_i^{(p)}$$
$\mathcal{A}$ denotes a three-layer MLP network followed by a sigmoid function that predicts the foreground confidence; the PKW network is trained with focal loss.
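A minimal sketch of the PKW weighting and its focal-loss supervision, under stated assumptions (the MLP is reduced to a single logit, and the focal-loss constants are the common defaults, not values from the patent):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pkw_weighting(keypoint_feat, logit):
    """Scale a keypoint feature by its predicted foreground confidence."""
    return sigmoid(logit) * keypoint_feat

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss: down-weights easy examples via (1 - pt)^gamma."""
    pt = p if y == 1 else 1.0 - p
    a = alpha if y == 1 else 1.0 - alpha
    return -a * (1.0 - pt) ** gamma * np.log(pt)

weighted = pkw_weighting(np.array([2.0, 4.0]), 0.0)   # sigmoid(0) = 0.5
```

A confident correct prediction (p = 0.9 for a foreground point) incurs far less loss than an uncertain one (p = 0.5), which is the focusing effect that lets the module cope with the foreground/background imbalance.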
7. The lidar-based traffic target detection system according to claim 6, wherein the detection system uses a region-of-interest pooling network that aggregates keypoint features onto the RoI grid points over several receptive fields simultaneously; each 3D proposal is divided into 6 × 6 × 6 grid points, represented as:
$$G = \{ g_1, \ldots, g_{216} \}$$
the neighboring keypoints for each grid point are determined by:
$$\tilde{\Psi} = \left\{ \left[ f_j^{(p)};\; p_j - g_i \right]^{\mathsf T} \;\middle|\; \left\| p_j - g_i \right\|^2 < \tilde{r},\; \forall p_j \in \mathcal{K} \right\}$$
where $p_j - g_i$ preserves the relative position between the grid point and its neighbouring keypoint $p_j$; the features of the neighbouring keypoints are then aggregated onto the grid point $g_i$ by a PointNet module:
$$\tilde{f}_i^{(g)} = \max\left\{ G\!\left( \mathcal{M}\!\left( \tilde{\Psi} \right) \right) \right\}$$
keypoint features are gathered over different receptive fields by using several radii $r$, and the features from the different receptive fields are concatenated; the vectorized feature is then mapped by a two-layer MLP network into a 256-dimensional feature that represents the proposal.
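The 6 × 6 × 6 lattice of grid points inside an axis-aligned proposal can be sketched as follows (an illustrative version that ignores the proposal's yaw rotation):

```python
import numpy as np

def roi_grid_points(center, size, n=6):
    """Divide an axis-aligned 3D proposal into an n x n x n lattice of
    grid points (6 x 6 x 6 = 216 points, matching the claim)."""
    steps = (np.arange(n) + 0.5) / n - 0.5      # cell centres in (-0.5, 0.5)
    gx, gy, gz = np.meshgrid(*[steps] * 3, indexing="ij")
    offsets = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)
    return center + offsets * size

grid = roi_grid_points(np.array([0.0, 0.0, 0.0]), np.array([4.0, 2.0, 2.0]))
```

The grid is symmetric about the proposal centre, so the mean of all 216 points recovers the centre itself.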
8. The lidar-based traffic target detection system according to claim 7, wherein the refinement and correction network predicts the size and position of the 3D proposal from the proposal features; the whole refinement network consists of two branches, a confidence prediction branch and a box regression branch, each composed of a two-layer MLP network;
the confidence prediction network takes as its training target the 3D IoU between the 3D region of interest (RoI) and its corresponding ground-truth box; for the k-th 3D RoI, the confidence training target $y_k$ is:
$$y_k = \min\left(1,\; \max\left(0,\; 2\,\mathrm{IoU}_k - 0.5\right)\right)$$
the loss is then computed between the ground-truth confidence and the predicted confidence score:
$$\mathcal{L}_{iou} = -\, y_k \log \tilde{y}_k - \left( 1 - y_k \right) \log\left( 1 - \tilde{y}_k \right)$$
the regression target of the target box is obtained in the conventional residual-based manner and optimized with smooth-L1 loss.
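The IoU-guided confidence target and its binary cross-entropy loss can be sketched directly from the two formulas above:

```python
import numpy as np

def confidence_target(iou):
    """y_k = min(1, max(0, 2*IoU_k - 0.5)): IoU <= 0.25 maps to 0,
    IoU >= 0.75 maps to 1, and it is linear in between."""
    return min(1.0, max(0.0, 2.0 * iou - 0.5))

def bce(y, y_hat):
    """Binary cross-entropy between the soft target and the predicted score."""
    return -(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))
```

For example, an RoI with IoU 0.5 gets a soft target of 0.5 rather than a hard 0/1 label, so the confidence branch learns to rank proposals by localization quality.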
9. The lidar-based traffic target detection system of claim 8, wherein, for a model with input $x$ and a K-dimensional one-hot supervision target $y$, feeding $x$ into the model yields a logit vector $z(x) = [z_1(x), \ldots, z_K(x)]$; a prediction confidence $P(x) = [p_1(x), \ldots, p_K(x)]$ is obtained through a softmax function; the confidence is softened as:
$$p_i(x) = \frac{\exp\left( z_i(x) / \tau \right)}{\sum_{j=1}^{K} \exp\left( z_j(x) / \tau \right)}$$
$\tau$ is the temperature coefficient of temperature scaling; the outputs of the teacher model and the student model after softmax are $P^T(x)$ and $P^S(x)$; for the student model, the training objective is:
$$\mathcal{L} = \left( 1 - \alpha \right) \mathcal{H}\!\left( y,\; P^S(x) \right) + \alpha\, \tau^2\, \mathcal{H}\!\left( P_\tau^T(x),\; P_\tau^S(x) \right)$$
when the temperature coefficient $\tau$ is 1, the objective function degenerates to the cross-entropy of $P^S(x)$ with respect to the soft supervision target $(1-\alpha)\,y + \alpha\, P^T(x)$.
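A sketch of the temperature-scaled softmax and the distillation objective, under the assumption (reconstructed from the τ = 1 degeneration noted above) that the objective is the standard mix of hard-label cross-entropy and a τ²-weighted soft-teacher term:

```python
import numpy as np

def softmax_t(z, tau):
    """Temperature-scaled softmax: larger tau flattens the distribution."""
    e = np.exp((z - z.max()) / tau)     # shift by max for numerical stability
    return e / e.sum()

def cross_entropy(target, pred, eps=1e-12):
    return -np.sum(target * np.log(pred + eps))

def kd_loss(y_onehot, z_student, z_teacher, alpha, tau):
    """(1-a) * H(y, P_S)  +  a * tau^2 * H(P_T_tau, P_S_tau)."""
    p_s = softmax_t(z_student, 1.0)
    p_s_tau = softmax_t(z_student, tau)
    p_t_tau = softmax_t(z_teacher, tau)
    return ((1.0 - alpha) * cross_entropy(y_onehot, p_s)
            + alpha * tau ** 2 * cross_entropy(p_t_tau, p_s_tau))

z = np.array([2.0, 1.0, 0.0])
p1, p5 = softmax_t(z, 1.0), softmax_t(z, 5.0)   # higher tau -> softer
```

With α = 0 the distillation term vanishes and the loss reduces to plain cross-entropy on the hard label, which is a quick sanity check on the formula.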
10. The lidar-based traffic target detection system of claim 9, wherein in the detection system, the distilled knowledge is obtained from the model itself so as to improve the generalization ability of the model; at the t-th epoch, with the self-distillation prediction for $x$ denoted $P_t^S(x)$, the objective function is:
$$\mathcal{L}_t = \mathcal{H}\!\left( \left( 1 - \alpha \right) y + \alpha\, P_{t-1}^S(x),\; P_t^S(x) \right)$$
for the model at the t-th epoch, the training target is $(1-\alpha)\,y + \alpha\, P_{t-1}^S(x)$; the parameter $\alpha$ is the degree of confidence placed in the teacher model.
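The progressive self-distillation target update is a one-liner; the sketch below uses illustrative values (the hypothetical epoch t-1 prediction is made up):

```python
import numpy as np

def self_distill_target(y_onehot, prev_pred, alpha):
    """Epoch-t soft target: (1 - alpha) * hard label + alpha * the model's
    own softmax prediction from epoch t-1."""
    return (1.0 - alpha) * y_onehot + alpha * prev_pred

y = np.array([0.0, 1.0, 0.0])
prev = np.array([0.2, 0.6, 0.2])        # hypothetical epoch t-1 prediction
target = self_distill_target(y, prev, alpha=0.5)
```

Since both the hard label and the previous prediction are probability vectors, any convex mix of them is still a valid probability vector, so the cross-entropy objective above stays well defined at every epoch.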
CN202211692170.1A 2022-12-28 2022-12-28 Traffic target detection system based on laser radar Pending CN115861944A (en)

Publications (1)

Publication Number Publication Date
CN115861944A true CN115861944A (en) 2023-03-28

Family

ID=85655309



Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116259028A (en) * 2023-05-06 2023-06-13 杭州宏景智驾科技有限公司 Abnormal scene detection method for laser radar, electronic device and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination