CN115908829A - Point column-based two-order multi-attention mechanism 3D point cloud target detection method - Google Patents
- Publication number
- CN115908829A (application CN202211104980.0A)
- Authority
- CN
- China
- Prior art keywords
- order
- point
- attention mechanism
- pseudo
- attention
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Image Analysis (AREA)
Abstract
The invention provides a point-pillar-based second-order multi-attention-mechanism 3D point cloud target detection method, comprising the following steps: providing a method that realizes target detection through three point-pillar-based mechanisms, namely a second-order point attention mechanism, a second-order channel attention mechanism, and a pseudo-image spatial attention mechanism; providing a network mainly composed of the second-order point attention mechanism, a pillar feature network, the second-order channel attention mechanism, a backbone network, the pseudo-image spatial attention mechanism, and an SSD detection head; then voxelizing the point cloud, applying the second-order point attention mechanism to the point cloud and converting it into pseudo-image features, applying the second-order channel attention mechanism to the pseudo-image features and outputting pseudo-space features, applying the pseudo-image spatial attention mechanism to the pseudo-space features, and outputting the detection result. The method guarantees relatively high detection speed together with high feature extraction accuracy.
Description
Technical Field
The invention belongs to the field of lidar-only 3D point cloud target detection, and particularly relates to a method that realizes target detection through three point-pillar-based mechanisms: a second-order point attention mechanism, a second-order channel attention mechanism, and a pseudo-image spatial attention mechanism.
Background
Currently, 3D point cloud target detection methods are increasingly widely used in computer vision, autonomous driving, robotics, virtual reality, and other fields. Compared with target detection on two-dimensional images, lidar can provide more reliable depth information, locate objects more accurately, and supply shape information. However, point clouds lack texture and suffer from occlusion, truncation, and uneven reflectance; lidar point clouds are sparse and their density varies greatly, which often degrades the precision of traditional 3D target detection methods based on hand-crafted features. In recent years, deep neural networks have shown excellent feature extraction capability and can process high-dimensional data, so the precision of deep-network-based 3D point cloud target detection has improved to some extent. Nevertheless, due to the high sparsity and intrinsic irregularity of point clouds, the detection accuracy for some categories still leaves considerable room for improvement.
In 2016, Li et al. proposed VeloFCN, which converts the point cloud into a front-view feature-map representation and then applies an off-the-shelf detector. (See B. Li, T. Zhang, and T. Xia, "Vehicle detection from 3D lidar using fully convolutional network," in Robotics: Science and Systems, 2016.) In 2017, Qi et al. proposed PointNet, which for the first time feeds raw point cloud data directly into a deep neural network for training. (See C. R. Qi, H. Su, K. Mo, and L. J. Guibas, "PointNet: deep learning on point sets for 3D classification and segmentation," in CVPR, 2017.) In 2018, Martin Simon et al. introduced Complex-YOLO, which projects the point cloud onto a two-dimensional plane and performs target detection with image methods, thereby accelerating network inference; however, the projection is limited by the sparsity of the point cloud, so convolution cannot extract features well. (See M. Simon, S. Milz, K. Amende, and H.-M. Gross, "Complex-YOLO: real-time 3D object detection on point clouds," arXiv:1803.06199, 2018.) To alleviate the occlusion caused by front-view overlap, Yang et al. proposed PIXOR, which rasterizes the point cloud into a more compact BEV representation, but it has the obvious disadvantage of requiring hand-crafted features; such manual design cannot fully exploit the three-dimensional information of an object and does not generalize well to other lidars. (See B. Yang, W. Luo, and R. Urtasun, "PIXOR: real-time 3D object detection from point clouds," in CVPR, 2018.) In 2018, Zhou et al. first proposed VoxelNet, an end-to-end trainable network and a general 3D detection framework. Unlike most previous work, VoxelNet learns information-rich feature representations and can learn different feature representations from the point cloud simultaneously.
However, a drawback of 3D convolution is that it is too time-consuming and computationally heavy, resulting in slow network inference. (See Y. Zhou and O. Tuzel, "VoxelNet: end-to-end learning for point cloud based 3D object detection," in CVPR, 2018.) Yan et al. then proposed SECOND, which reduces memory consumption and speeds up computation through sparse convolution operations. (See Y. Yan, Y. Mao, and B. Li, "SECOND: sparsely embedded convolutional detection," Sensors, 18(10), 2018.) In 2019, A. H. Lang et al. proposed PointPillars, which encodes the point cloud into vertical pillars (essentially a special partition of voxels) in order to improve inference speed with a standard 2D convolutional detection pipeline. (See A. H. Lang, S. Vora, H. Caesar, L. Zhou, J. Yang, and O. Beijbom, "PointPillars: fast encoders for object detection from point clouds," in CVPR, 2019.)
Furthermore, the method proposed in the prior-art paper A. H. Lang, S. Vora, H. Caesar, L. Zhou, J. Yang, and O. Beijbom, "PointPillars: fast encoders for object detection from point clouds," in CVPR, 2019, is implemented by the following steps. First, the input raw point cloud is divided into regions: the point cloud is voxelized and then converted into a sparse pseudo-image form. A fixed number of points is randomly retained in each pillar, and in this step the feature dimension of the points in the pillars is augmented from the original 4-dimensional information to 9 dimensions, so each lidar point carries a 9-dimensional feature. In the backbone network, feature learning is performed with a 2D network. The backbone comprises two sub-networks: a top-down network that produces features at progressively smaller spatial resolutions, and a second network that upsamples and concatenates the top-down features. The final output feature is the concatenation of all features of the same dimension coming from different strides. In the detection-head module, an SSD detection head is chosen for bounding-box regression: 2D intersection over union (IoU) is used to match prior boxes to the ground truth; the height and elevation of the bounding box are not used for matching but serve as additional regression targets. Although the PointPillars network increases speed by pillarizing the point cloud, feature information of the input is usually lost during the down-sampling of the backbone network, and the points inside a voxel are correlated with each other, so processing the points of the point cloud in isolation inevitably loses part of the useful geometric information and degrades detection precision.
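The 4-to-9 dimensional point augmentation described above can be sketched as follows. This is a minimal illustration assuming the standard PointPillars decoration (offsets of each point from the pillar's point centroid, plus offsets from the pillar's x-y center); the function and variable names are hypothetical:

```python
import numpy as np

def augment_pillar_points(points, pillar_center_xy):
    """Augment raw lidar points (x, y, z, r) within one pillar to
    9-dim features: the original 4 values, the 3 offsets from the
    pillar's arithmetic-mean point, and the 2 offsets from the
    pillar's x-y center. points: (N, 4); pillar_center_xy: (2,)."""
    centroid = points[:, :3].mean(axis=0)         # mean of x, y, z
    offsets_c = points[:, :3] - centroid          # (xc, yc, zc)
    offsets_p = points[:, :2] - pillar_center_xy  # (xp, yp)
    return np.hstack([points, offsets_c, offsets_p])  # (N, 9)

pts = np.array([[1.0, 2.0, -0.5, 0.3],
                [1.2, 2.1, -0.4, 0.5]])
feat = augment_pillar_points(pts, np.array([1.1, 2.0]))
print(feat.shape)  # (2, 9)
```

Each pillar is then padded or randomly subsampled to a fixed number of points before being fed to the pillar feature network.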
In the backbone network, processing each channel separately and independently ignores the correlation between channels, so part of the useful information is lost and detection precision drops. After the pseudo-image is generated, all features in the pseudo space are processed identically. Since not all features of the pseudo space contribute equally to the detection task, and regions more relevant to the task are more important, treating them identically also reduces the final detection precision. A real-time and accurate 3D point cloud target detection method that strikes a dynamic balance between speed and precision is therefore urgently needed.
Disclosure of Invention
In view of the above defects in the prior art, an object of the present invention is to provide a real-time and accurate 3D point cloud target detection method that achieves a dynamic balance between speed and accuracy, solving, through three point-pillar-based mechanisms (a second-order point attention mechanism, a second-order channel attention mechanism, and a pseudo-image spatial attention mechanism), the problem that existing methods cannot perform higher-accuracy target detection in real time.
The technical problems solved by the invention are as follows:
First, in the feature extraction network, feature information of the input is usually lost during the down-sampling of the backbone network, and the points inside a voxel are correlated with each other, so processing the points of the point cloud in isolation inevitably loses part of the useful geometric information and degrades detection precision. The invention provides a point-pillar-based second-order point attention mechanism that connects the points within the same voxel so as to retain more useful information, extracting finer feature information at a relatively small cost in inference speed.
Second, in the backbone network, processing each channel in isolation ignores the correlation between channels, so part of the useful information is lost and detection precision drops. The invention provides a point-pillar-based second-order channel attention mechanism that links the channels, retains more useful feature information, and improves overall detection precision.
Third, after the pseudo-image is generated, all features in the pseudo space are processed identically. Since not all features of the pseudo space contribute equally to the detection task, and regions more relevant to the task are more important, treating them identically degrades detection accuracy. In view of this, the invention provides a point-pillar-based pseudo-image spatial attention mechanism that assigns a different weight to each pixel in the pseudo space according to the importance of its region to the task, yielding a more accurate detection result.
The technical scheme adopted by the invention for solving the technical problems is as follows: a point-column-based second-order multi-attention-mechanism 3D point cloud target detection method comprises the following steps:
s1: providing a method for respectively realizing target detection by a second-order point attention mechanism, a second-order channel attention mechanism and a pseudo-image space attention mechanism based on point columns;
s2: the method comprises the steps that a network is provided based on S1, the network mainly comprises a second-order point attention mechanism, a point column characteristic network, a second-order channel attention mechanism, a backbone network, a pseudo image space attention mechanism and an SSD detection head, and the network is also divided into a second-order attention module, a second-order point attention module and a second-order channel attention module;
s3: performing voxelization on the point cloud, and then performing second-order point attention mechanism operation on the point cloud to convert the point cloud into the characteristics of a pseudo image;
s4: performing a second-order channel attention mechanism operation on the characteristics of the pseudo image, and outputting the characteristics of a pseudo space;
s5: performing pseudo-image space attention mechanism operation on the features of the pseudo space, and outputting to obtain a detection result;
wherein the SSD detection head predicts the three-dimensional bounding box of the object using the backbone features; the second-order attention module comprises global max pooling, covariance pooling, and row convolution; in S3, when point features are fed into the second-order attention module, the resulting second-order point attention weights are the output, and this process constitutes the second-order point attention module; when channel features are fed into the second-order attention module, second-order channel attention weights are obtained, and this process constitutes the second-order channel attention module.
In a given K-th voxel, consider all points X ∈ R^(N×C) in the voxel, where N is the maximum number of points and C the number of channels. Global max pooling yields a vector x_m ∈ R^(N×1) composed of the maximum value in each dimension (N×1 denotes a vector of N rows and 1 column). x_m is fed into a fully connected layer W1, producing a vector h ∈ R^(t×1), where t is the reduced number of points after the W1 layer. A ReLU activation follows the W1 layer, and the covariance matrix between the points in the same voxel, Cov ∈ R^(t×t), is computed, where t is the number of points in the second-order point attention mechanism and the number of channels in the second-order channel attention mechanism, and t×t is its dimension. The covariance matrix is convolved row by row to obtain a vector z ∈ R^(t×1), which is fed into the fully connected layer W2 followed by a Sigmoid activation, yielding the N-dimensional attention vector s ∈ R^(N×1). In S3, the second-order point attention mechanism is represented as:
s=σ(W 2 RC(Cov(σ(W 1 (GMP(X))))))
where Cov(·) computes the covariance matrix of the points, RC(·) is the row convolution, GMP(·) is global max pooling, σ denotes the activation function (ReLU after W1 and Sigmoid after W2), W1 and W2 are two different fully connected layers, and X ∈ R^(N×C) is the set of points in the given K-th voxel.
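The pipeline of the second-order attention module (GMP, fully connected layer with ReLU, covariance pooling, row convolution, fully connected layer with Sigmoid) can be sketched numerically as below. This is an illustration, not the patented implementation: the row convolution is approximated by a full-width kernel applied to each row of the covariance matrix, and all weights and names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def second_order_attention(x, w1, w2, k):
    """Second-order attention sketch: GMP -> FC(W1) + ReLU ->
    covariance pooling -> row convolution -> FC(W2) + Sigmoid.
    x: (N, C) points in one voxel; returns an (N,) attention vector."""
    gmp = x.max(axis=1)                          # global max pooling: (N,)
    h = np.maximum(w1 @ gmp, 0.0)                # FC W1 + ReLU: (t,)
    hc = h - h.mean()
    cov = np.outer(hc, hc) / max(h.size - 1, 1)  # covariance pooling: (t, t)
    rc = cov @ k                                 # row conv, full-width kernel: (t,)
    return 1.0 / (1.0 + np.exp(-(w2 @ rc)))      # FC W2 + Sigmoid: (N,)

N, C, t = 32, 9, 8
x = rng.normal(size=(N, C))
w1 = rng.normal(size=(t, N)) * 0.1   # reduces N points to t
w2 = rng.normal(size=(N, t)) * 0.1   # restores the N-dim output
k = rng.normal(size=t)               # row-convolution kernel

s = second_order_attention(x, w1, w2, k)
x_weighted = x * s[:, None]          # reweight each point by its attention
```

The same module yields channel weights when fed channel features instead of point features, as described in S4.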
The second-order channel attention mechanism is similar to the second-order point attention mechanism: the channel features pass through the second-order attention module to produce analogous weights. In S4, the second-order channel attention mechanism is represented as:
M=σ(W 2 RC(Cov(σ(W 1 (GMP(Y))))))
where Y ∈ R^(C×H×W) denotes the features of the pseudo-image, and H and W are its height and width.
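Applying the same second-order pipeline to the channel dimension can be sketched as follows. Here the global max pooling is taken over the spatial dimensions so that each channel contributes one value, which is an assumption about the pooling axis; all shapes and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def channel_attention(y, w1, w2, k):
    """Second-order channel attention sketch: per-channel GMP over
    the spatial dims, then FC + ReLU, covariance pooling, row
    convolution (full-width kernel), FC + Sigmoid.
    y: (C, H, W) pseudo-image features; returns a (C,) weight vector."""
    gmp = y.reshape(y.shape[0], -1).max(axis=1)  # (C,)
    h = np.maximum(w1 @ gmp, 0.0)                # (t,)
    hc = h - h.mean()
    cov = np.outer(hc, hc) / max(h.size - 1, 1)  # (t, t)
    return 1.0 / (1.0 + np.exp(-(w2 @ (cov @ k))))  # (C,)

C, H, W, t = 16, 8, 8, 4
y = rng.normal(size=(C, H, W))
m = channel_attention(y,
                      rng.normal(size=(t, C)) * 0.1,
                      rng.normal(size=(C, t)) * 0.1,
                      rng.normal(size=t))
y_weighted = y * m[:, None, None]  # scale each channel by its weight
```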
Different weights are assigned to each pixel in the pseudo space according to the importance of its region to the task, so as to obtain a more accurate detection result. The pseudo-space feature P and the signal G are taken as input, and the final output is the spatial attention weight S. The pseudo-image spatial attention mechanism is represented as follows:
the relationship of P and G can be expressed as:
where W_θ and W_φ are linear transformation operations realized by 1×1 convolutions: W_θ performs a linear transformation on P with a 1×1 convolution, and W_φ performs a linear transformation on G with a 1×1 convolution.
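The spatial attention equations themselves did not survive extraction, so the sketch below assumes a common additive form (combine the two 1×1-transformed inputs, apply ReLU, project to one channel, and apply Sigmoid); the combining rule and the projection ψ are assumptions, and all names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

def spatial_attention(p, g, w_theta, w_phi, psi):
    """Hedged sketch of the pseudo-image spatial attention: W_theta
    and W_phi are 1x1 convolutions (per-pixel linear maps over the
    channel dim); an additive-attention combination is assumed.
    p, g: (C, H, W); returns per-pixel weights of shape (1, H, W)."""
    def conv1x1(x, w):  # a 1x1 conv is a channel-wise matmul per pixel
        c, h, wd = x.shape
        return (w @ x.reshape(c, -1)).reshape(-1, h, wd)
    q = np.maximum(conv1x1(p, w_theta) + conv1x1(g, w_phi), 0.0)
    return 1.0 / (1.0 + np.exp(-conv1x1(q, psi)))  # (1, H, W)

C, H, W = 4, 8, 8
p = rng.normal(size=(C, H, W))
g = rng.normal(size=(C, H, W))
w_theta = rng.normal(size=(C, C)) * 0.1
w_phi = rng.normal(size=(C, C)) * 0.1
psi = rng.normal(size=(1, C)) * 0.1

s = spatial_attention(p, g, w_theta, w_phi, psi)
out = p * s  # each pixel of P reweighted by its importance
```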
The three-dimensional ground-truth box is parameterized as (x, y, z, w, l, h, θ), where (x, y, z) is the center position and (w, l, h) and θ are the size and heading angle of the bounding box. The localization regression residuals between the ground truth and the anchors are defined as follows:

Δx = (x^gt − x^a)/d^a,  Δy = (y^gt − y^a)/d^a,  Δz = (z^gt − z^a)/h^a,
Δw = log(w^gt/w^a),  Δl = log(l^gt/l^a),  Δh = log(h^gt/h^a),
Δθ = sin(θ^gt − θ^a),

where d^a = √((l^a)² + (w^a)²), the superscript gt denotes the ground truth and the superscript a the anchor-box parameters; (x^gt, y^gt, z^gt) are the center coordinates of the 3D ground-truth box, (l^gt, w^gt, h^gt) its length, width, and height, and θ^gt its yaw angle around the Z axis; (x^a, y^a, z^a) are the center coordinates of the anchor box, (l^a, w^a, h^a) its length, width, and height, and θ^a its yaw angle around the Z axis.
the regression loss is expressed as:
wherein SmoothL1 is a SmoothL1 loss function;
since the angular positioning penalty cannot distinguish between flipped frames, a softmax classification penalty L is used in the discretization direction dir This enables the network to learn orientation, using focus loss for object classification loss:
L cls =-a(1-p) r logp
where p is the probability of a correctly detected box and r and a are parameter settings. The total loss is finally expressed as:

L = (1/N_pos)(β_loc·L_loc + β_cls·L_cls + β_dir·L_dir),

where N_pos is the number of correctly matched detection boxes and β_loc, β_cls, β_dir are preset values.
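The residual encoding and the combined loss above can be sketched numerically as below. This follows the SECOND/PointPillars-style encoding the patent builds on; the Smooth L1 threshold and the focal-loss defaults (a = 0.25, r = 2) are assumptions, and all names are illustrative:

```python
import numpy as np

def smooth_l1(x, beta=1.0 / 9.0):
    """Smooth L1: quadratic near zero, linear beyond beta (assumed)."""
    ax = np.abs(x)
    return np.where(ax < beta, 0.5 * ax**2 / beta, ax - 0.5 * beta)

def box_residuals(gt, anchor):
    """Localization residuals between a ground-truth box and an anchor,
    both given as (x, y, z, w, l, h, theta)."""
    xg, yg, zg, wg, lg, hg, tg = gt
    xa, ya, za, wa, la, ha, ta = anchor
    da = np.sqrt(la**2 + wa**2)       # anchor diagonal d^a
    return np.array([(xg - xa) / da, (yg - ya) / da, (zg - za) / ha,
                     np.log(wg / wa), np.log(lg / la), np.log(hg / ha),
                     np.sin(tg - ta)])

def focal_loss(p, a=0.25, r=2.0):
    return -a * (1.0 - p)**r * np.log(p)

def total_loss(l_loc, l_cls, l_dir, n_pos,
               b_loc=2.0, b_cls=1.0, b_dir=0.2):
    return (b_loc * l_loc + b_cls * l_cls + b_dir * l_dir) / n_pos

gt = (10.0, 5.0, -1.0, 1.8, 4.5, 1.6, 0.3)
anchor = (9.5, 5.2, -1.0, 1.6, 3.9, 1.56, 0.0)
res = box_residuals(gt, anchor)
l_loc = smooth_l1(res).sum()
loss = total_loss(l_loc, focal_loss(0.7), 0.1, n_pos=1)
```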
The invention has the advantages and beneficial effects that:
the second-order point attention mechanism in the invention considers that the points in the voxels have relevance, and compared with the existing method in which the points in the voxels are processed in an isolated manner, more useful geometric characteristic information can be reserved, and the detection accuracy is improved. Similarly, the second-order channel attention mechanism in the invention considers the correlation among the channels, and further improves the detection precision. The pseudo-image space attention mechanism provided by the invention considers that not all features of the pseudo space have the same contribution to the detection task, and the region importance with the larger task relevance is higher, so that different weights are distributed to each pixel point in the pseudo-space features, and the feature extraction effect is further improved. Therefore, the three mechanisms based on the point column ensure relatively high detection speed and extraction accuracy.
Drawings
FIG. 1 is a diagram of one embodiment of a second order attention module architecture of the present invention;
FIG. 2 is an overall framework diagram of the point-pillar-based second-order multi-attention-mechanism 3D point cloud target detection method.
Detailed Description
The following examples illustrate the invention in detail and disclose specific embodiments and procedures, but the scope of protection of the invention is not limited to these examples.
The invention provides a method for realizing target detection using a point-pillar-based Second-Order Point Attention (SOPA) mechanism, Second-Order Channel Attention (SOCA) mechanism, and Spatial Attention on the Pseudo-Image (SAPI), comprising the following steps:
s1: providing a method for respectively realizing target detection by a second-order point attention mechanism, a second-order channel attention mechanism and a pseudo-image space attention mechanism based on point columns;
s2: the method comprises the steps that a network is provided based on S1, the network mainly comprises a second-order point attention mechanism, a point column characteristic network, a second-order channel attention mechanism, a backbone network, a pseudo image space attention mechanism and an SSD detection head, and the network is also divided into a second-order attention module, a second-order point attention module and a second-order channel attention module;
s3: voxelizing the point cloud, and then performing second-order point attention mechanism operation on the point cloud to convert the point cloud into the characteristics of a pseudo image;
s4: performing a second-order channel attention mechanism operation on the characteristics of the pseudo image, and outputting the characteristics of a pseudo space;
s5: performing pseudo-image space attention mechanism operation on the features of the pseudo space, and outputting to obtain a detection result;
wherein the SSD detection head predicts the three-dimensional bounding box of the object using the backbone features; the second-order attention module comprises global max pooling, covariance pooling, and row convolution; in S3, when point features are fed into the second-order attention module, the resulting second-order point attention weights are the output, and this process constitutes the second-order point attention module; when channel features are fed into the second-order attention module, second-order channel attention weights are obtained, and this process constitutes the second-order channel attention module.
In a given K-th voxel, consider all points X ∈ R^(N×C) in the voxel, where N is the maximum number of points and C the number of channels. Global max pooling yields a vector x_m ∈ R^(N×1) composed of the maximum value in each dimension (N×1 denotes a vector of N rows and 1 column). x_m is fed into a fully connected layer W1, producing a vector h ∈ R^(t×1), where t is the reduced number of points after the W1 layer. A ReLU activation follows the W1 layer, and the covariance matrix between the points in the same voxel, Cov ∈ R^(t×t), is computed, where t is the number of points in the second-order point attention mechanism and the number of channels in the second-order channel attention mechanism, and t×t is its dimension. The covariance matrix is convolved row by row to obtain a vector z ∈ R^(t×1), which is fed into the fully connected layer W2 followed by a Sigmoid activation, yielding the N-dimensional attention vector s ∈ R^(N×1). In S3, the second-order point attention mechanism is represented as:
s=σ(W 2 RC(Cov(σ(W 1 (GMP(X))))))
where Cov(·) computes the covariance matrix of the points, RC(·) is the row convolution, GMP(·) is global max pooling, σ denotes the activation function (ReLU after W1 and Sigmoid after W2), W1 and W2 are two different fully connected layers, and X ∈ R^(N×C) is the set of points in the given K-th voxel.
The second-order channel attention mechanism is similar to the second-order point attention mechanism: the channel features pass through the second-order attention module to produce analogous weights. In S4, the second-order channel attention mechanism is represented as:
M=σ(W 2 RC(Cov(σ(W 1 (GMP(Y))))))
where Y ∈ R^(C×H×W) denotes the features of the pseudo-image, and H and W are its height and width.
Different weights are assigned to each pixel in the pseudo space according to the importance of its region to the task, so as to obtain a more accurate detection result. The pseudo-space feature P and the signal G are taken as input, and the final output is the spatial attention weight S. The pseudo-image spatial attention mechanism is represented as follows:
the relationship of P and G can be expressed as:
where W_θ and W_φ are linear transformation operations realized by 1×1 convolutions: W_θ performs a linear transformation on P with a 1×1 convolution, and W_φ performs a linear transformation on G with a 1×1 convolution.
The three-dimensional ground-truth box is parameterized as (x, y, z, w, l, h, θ), where (x, y, z) is the center position and (w, l, h) and θ are the size and heading angle of the bounding box. The localization regression residuals between the ground truth and the anchors are defined as follows:

Δx = (x^gt − x^a)/d^a,  Δy = (y^gt − y^a)/d^a,  Δz = (z^gt − z^a)/h^a,
Δw = log(w^gt/w^a),  Δl = log(l^gt/l^a),  Δh = log(h^gt/h^a),
Δθ = sin(θ^gt − θ^a),

where d^a = √((l^a)² + (w^a)²), the superscript gt denotes the ground truth and the superscript a the anchor-box parameters; (x^gt, y^gt, z^gt) are the center coordinates of the 3D ground-truth box, (l^gt, w^gt, h^gt) its length, width, and height, and θ^gt its yaw angle around the Z axis; (x^a, y^a, z^a) are the center coordinates of the anchor box, (l^a, w^a, h^a) its length, width, and height, and θ^a its yaw angle around the Z axis.
the regression loss is expressed as:
wherein SmoothL1 is a SmoothL1 loss function;
since the angular positioning penalty cannot distinguish between flipped frames, a softmax classification penalty L is used in the discretization direction dir This enables the network to learn orientation, using focus loss for object classification loss:
L cls =-a(1-p) r logp
where p is the probability of a correctly detected box and r and a are parameter settings. The total loss is finally expressed as:

L = (1/N_pos)(β_loc·L_loc + β_cls·L_cls + β_dir·L_dir),

where N_pos is the number of correctly matched detection boxes. Here β_loc = 2, β_cls = 1, β_dir = 0.2.
The foregoing is a detailed description of the preferred embodiments of the invention. It should be understood that those skilled in the art could devise numerous modifications and variations in light of the present teachings without departing from the inventive concept. Therefore, technical solutions obtainable by those skilled in the art through logical analysis, reasoning, and limited experimentation based on the prior art and the concept of the present invention shall fall within the scope of protection defined by the claims.
Claims (4)
1. A point-pillar-based second-order multi-attention-mechanism 3D point cloud target detection method is characterized by comprising the following steps of:
s1: providing a method for respectively realizing target detection by a second-order point attention mechanism, a second-order channel attention mechanism and a pseudo-image space attention mechanism based on point columns;
s2: the method comprises the steps that a network is provided based on S1, the network mainly comprises a second-order point attention mechanism, a point column characteristic network, a second-order channel attention mechanism, a backbone network, a pseudo image space attention mechanism and an SSD detection head, and the network is also divided into a second-order attention module, a second-order point attention module and a second-order channel attention module;
s3: voxelizing the point cloud, and then performing second-order point attention mechanism operation on the point cloud to convert the point cloud into the characteristics of a pseudo image;
s4: performing a second-order channel attention mechanism operation on the characteristics of the pseudo image, and outputting the characteristics of a pseudo space;
s5: performing pseudo-image space attention mechanism operation on the characteristics of the pseudo space, and outputting to obtain a detection result;
wherein the SSD detection head predicts the three-dimensional bounding box of the object using the backbone features; the second-order attention module comprises global max pooling, covariance pooling, and row convolution; in S3, when point features are fed into the second-order attention module, the resulting second-order point attention weights are the output, and this process constitutes the second-order point attention module; when channel features are fed into the second-order attention module, second-order channel attention weights are obtained, and this process constitutes the second-order channel attention module.
2. The method for detecting the second-order multi-attention mechanism 3D point cloud target based on the point column as claimed in claim 1, wherein the method comprises the following steps:
in a given K-th voxel, for all points X ∈ R^(N×C) in the voxel, where N is the maximum number of points and C the number of channels, global max pooling yields a vector x_m ∈ R^(N×1) composed of the maximum value in each dimension (N×1 denotes a vector of N rows and 1 column); x_m is fed into a fully connected layer W1, producing a vector h ∈ R^(t×1), where t is the reduced number of points after the W1 layer; a ReLU activation follows the W1 layer, and the covariance matrix between the points in the same voxel, Cov ∈ R^(t×t), is computed, where t is the number of points in the second-order point attention mechanism and the number of channels in the second-order channel attention mechanism, and t×t is its dimension; the covariance matrix is convolved row by row to obtain a vector z ∈ R^(t×1), which is fed into the fully connected layer W2 followed by a Sigmoid activation, yielding the N-dimensional attention vector s ∈ R^(N×1); in S3, the second-order point attention mechanism is represented as:
s=σ(W 2 RC(Cov(σ(W 1 (GMP(X))))))
where Cov(·) computes the covariance matrix of the points, RC(·) is the row convolution, GMP(·) is global max pooling, σ denotes the activation function (ReLU after W1 and Sigmoid after W2), W1 and W2 are two different fully connected layers, and X ∈ R^(N×C) is the set of points in the given K-th voxel;
the second-order channel attention mechanism is similar to the second-order point attention mechanism: the channel features pass through the second-order attention module to produce analogous weights; in S4, the second-order channel attention mechanism is represented as:
M=σ(W 2 RC(Cov(σ(W 1 (GMP(Y))))))
3. The point-column-based second-order multi-attention-mechanism 3D point cloud target detection method as claimed in claim 2, wherein: different weights are assigned to each pixel in the pseudo space according to the importance of its region to the task, so as to obtain a more accurate detection result; the pseudo-space feature P and the signal G are taken as input, and the final output is the spatial attention weight S; the pseudo-image spatial attention mechanism is represented as follows:
the relationship of P and G can be expressed as:
where W_θ and W_φ are linear transformation operations realized by 1×1 convolutions: W_θ performs a linear transformation on P with a 1×1 convolution, and W_φ performs a linear transformation on G with a 1×1 convolution.
4. The point-column-based second-order multi-attention-mechanism 3D point cloud target detection method as claimed in claim 3, wherein: the three-dimensional ground-truth box is parameterized as (x, y, z, w, l, h, θ), where (x, y, z) is the center position and (w, l, h) and θ are the size and heading angle of the bounding box; the localization regression residuals between the ground truth and the anchors are defined as follows:

Δx = (x^gt − x^a)/d^a,  Δy = (y^gt − y^a)/d^a,  Δz = (z^gt − z^a)/h^a,
Δw = log(w^gt/w^a),  Δl = log(l^gt/l^a),  Δh = log(h^gt/h^a),
Δθ = sin(θ^gt − θ^a),

where d^a = √((l^a)² + (w^a)²), the superscript gt denotes the ground truth and the superscript a the anchor-box parameters; (x^gt, y^gt, z^gt) are the center coordinates of the 3D ground-truth box, (l^gt, w^gt, h^gt) its length, width, and height, and θ^gt its yaw angle around the Z axis; (x^a, y^a, z^a) are the center coordinates of the anchor box, (l^a, w^a, h^a) its length, width, and height, and θ^a its yaw angle around the Z axis;
the regression loss is expressed as:
wherein SmoothL1 is a SmoothL1 loss function;
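A minimal sketch of the smooth L1 term and the summed regression loss, assuming the common transition point beta = 1 (the claim does not state it):

```python
def smooth_l1(d, beta=1.0):
    """Smooth L1: quadratic below beta, linear above, so large
    residuals do not dominate the gradient."""
    ad = abs(d)
    return 0.5 * d * d / beta if ad < beta else ad - 0.5 * beta

def loc_loss(residuals):
    """Sum of smooth L1 over the seven residuals (x, y, z, w, l, h, theta)."""
    return sum(smooth_l1(r) for r in residuals)
```

For example, smooth_l1(0.5) falls on the quadratic branch and smooth_l1(2.0) on the linear one.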
since the angle localization loss cannot distinguish flipped boxes, a softmax classification loss L dir on the discretized direction is used, which enables the network to learn the heading; the focal loss is used for the object classification loss:
L_cls = −a(1 − p)^r log(p)
in the formula, p is the probability that a box is correctly detected, and r and a are preset parameters; the finally obtained total loss is expressed as:
L = (1 / N_pos)(β_loc L_loc + β_cls L_cls + β_dir L_dir),
in the formula, N_pos is the number of correctly detected boxes, and β_loc, β_cls, β_dir are preset values.
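The classification and total losses can be sketched as follows; the defaults a = 0.25, r = 2 and β_loc = 2, β_cls = 1, β_dir = 0.2 are the values commonly used with PointPillars, assumed here since the claim only calls them preset parameters:

```python
import math

def focal_loss(p, a=0.25, r=2.0):
    """Focal loss for one positive example: -a * (1 - p)**r * log(p).
    The (1 - p)**r factor down-weights easy examples (p close to 1)."""
    return -a * (1.0 - p) ** r * math.log(p)

def total_loss(n_pos, l_loc, l_cls, l_dir,
               b_loc=2.0, b_cls=1.0, b_dir=0.2):
    """Weighted sum of localization, classification and direction losses,
    normalized by the number of positive boxes N_pos."""
    return (b_loc * l_loc + b_cls * l_cls + b_dir * l_dir) / n_pos
```

A well-classified box (p = 0.9) thus contributes far less loss than a hard one (p = 0.6), which is the point of the focusing term.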
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211104980.0A CN115908829A (en) | 2022-09-09 | 2022-09-09 | Point column-based two-order multi-attention mechanism 3D point cloud target detection method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115908829A true CN115908829A (en) | 2023-04-04 |
Family
ID=86488691
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN117111013A (en) * | 2023-08-22 | 2023-11-24 | 南京慧尔视智能科技有限公司 | Radar target tracking track starting method, device, equipment and medium
CN117111013B (en) * | 2023-08-22 | 2024-04-30 | 南京慧尔视智能科技有限公司 | Radar target tracking track starting method, device, equipment and medium
CN116863433A (en) * | 2023-09-04 | 2023-10-10 | 深圳大学 | Target detection method based on point cloud sampling and weighted fusion and related equipment
CN116863433B (en) * | 2023-09-04 | 2024-01-09 | 深圳大学 | Target detection method based on point cloud sampling and weighted fusion and related equipment
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||