CN115908829A - Point-pillar-based second-order multi-attention mechanism 3D point cloud target detection method - Google Patents

Point-pillar-based second-order multi-attention mechanism 3D point cloud target detection method

Info

Publication number
CN115908829A
Authority
CN
China
Prior art keywords
order
point
attention mechanism
pseudo
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211104980.0A
Other languages
Chinese (zh)
Inventor
严一尔 (Yan Yi'er)
李鑫 (Li Xin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou University
Original Assignee
Guangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou University
Priority to CN202211104980.0A
Publication of CN115908829A
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a point-pillar-based second-order multi-attention mechanism 3D point cloud target detection method comprising the following steps: providing a method for realizing target detection through a point-pillar-based second-order point attention mechanism, a second-order channel attention mechanism and a pseudo-image spatial attention mechanism; providing a network composed mainly of the second-order point attention mechanism, a pillar feature network, the second-order channel attention mechanism, a backbone network, the pseudo-image spatial attention mechanism and an SSD detection head; voxelizing the point cloud, applying the second-order point attention mechanism to the point cloud and converting it into pseudo-image features; applying the second-order channel attention mechanism to the pseudo-image features and outputting pseudo-space features; and applying the pseudo-image spatial attention mechanism to the pseudo-space features and outputting the detection result. The method guarantees a relatively high detection speed together with high feature extraction accuracy.

Description

Point-pillar-based second-order multi-attention mechanism 3D point cloud target detection method
Technical Field
The invention belongs to the field of 3D target detection from pure lidar point clouds, and particularly relates to a method that realizes target detection through three point-pillar-based mechanisms: a second-order point attention mechanism, a second-order channel attention mechanism and a pseudo-image spatial attention mechanism.
Background
Currently, 3D point cloud target detection methods are increasingly widely used in computer vision, autonomous driving, robotics, virtual reality and related fields. Compared with target detection on two-dimensional images, lidar provides more reliable depth information, locates objects more accurately and supplies shape information. However, 3D point clouds lack texture, suffer from occlusion and truncation, and exhibit uneven reflectance; lidar point clouds are sparse and their density varies greatly, which often degrades the precision of traditional 3D target detection methods based on hand-crafted features. In recent years, deep neural networks have shown excellent feature extraction capability and can process high-dimensional data, so the precision of deep-learning-based 3D point cloud target detection has improved to a certain extent. Nevertheless, due to the high sparsity and intrinsic irregularity of point clouds, there is still considerable room to improve the detection accuracy of some categories.
Li et al. proposed VeloFCN in 2016, which converts the point cloud into a front-view feature representation and then applies an off-the-shelf detector (see B. Li, T. Zhang, and T. Xia, "Vehicle detection from 3D lidar using fully convolutional network," in Robotics: Science and Systems, 2016). Qi et al. proposed PointNet in 2017, which for the first time feeds raw point cloud data directly into a deep neural network for training (see C. R. Qi, H. Su, K. Mo, and L. J. Guibas, "PointNet: Deep learning on point sets for 3D classification and segmentation," in CVPR, 2017). In 2018, Martin Simon et al. introduced Complex-YOLO, which projects the point cloud onto a two-dimensional plane and applies image-based target detection, thereby accelerating network inference; however, the projection is limited by the sparsity of the point cloud, so convolution cannot extract features well (see M. Simon, S. Milz, K. Amende, and H.-M. Gross, "Complex-YOLO: Real-time 3D object detection on point clouds," arXiv:1803.06199, 2018). To alleviate the occlusion caused by overlap in the front view, Yang et al. proposed PIXOR, which rasterizes the point cloud into a more compact BEV representation; its obvious disadvantage is that features must be extracted manually, and such hand-crafted designs cannot fully exploit the three-dimensional information of an object and do not generalize well to other lidar sensors (see B. Yang, W. Luo, and R. Urtasun, "PIXOR: Real-time 3D object detection from point clouds," in CVPR, 2018). In 2018, Zhou et al. first proposed VoxelNet, an end-to-end trainable network and a general 3D detection framework; unlike most previous work, VoxelNet learns information-rich feature representations and can learn different feature representations from the point cloud simultaneously, but 3D convolution is too time-consuming and computationally heavy, resulting in slow network inference (see Y. Zhou and O. Tuzel, "VoxelNet: End-to-end learning for point cloud based 3D object detection," in CVPR, 2018). Yan et al. then proposed SECOND, which reduces memory consumption and speeds up computation through sparse convolution operations (see Y. Yan, Y. Mao, and B. Li, "SECOND: Sparsely embedded convolutional detection," Sensors, 18(10), 2018). In 2019, A. H. Lang et al. proposed PointPillars, which encodes the point cloud into vertical pillars, essentially a special partition of voxels, in order to improve inference speed with a standard 2D convolutional detection pipeline (see A. H. Lang, S. Vora, H. Caesar, L. Zhou, J. Yang, and O. Beijbom, "PointPillars: Fast encoders for object detection from point clouds," in CVPR, 2019).
Furthermore, the method proposed in the prior-art paper A. H. Lang, S. Vora, H. Caesar, L. Zhou, J. Yang, and O. Beijbom, "PointPillars: Fast encoders for object detection from point clouds," in CVPR, 2019, is implemented by the following steps. First, the input raw point cloud is divided into regions; the point cloud is voxelized and then converted into a sparse pseudo-image. A fixed number of points is randomly retained in each pillar, and in this step the feature dimension of the points in a pillar is augmented from the original 4-dimensional lidar information to 9 dimensions, so that every point carries a 9-dimensional feature. In the backbone network, features are learned with a 2D network. The backbone network consists of two sub-networks: a top-down network that produces features at progressively smaller spatial resolution, and a second network that upsamples and concatenates the top-down features. The final output feature is the concatenation of all features that originate from the same dimension at different strides. In the detection head module, an SSD detection head performs bounding-box regression. A 2D intersection over union (IoU) is used to match prior boxes to the ground truth; the height and elevation of the box are not used for matching but serve as additional regression targets. Although the PointPillars network speeds up detection by pillarizing the point cloud, feature information of the input is usually lost in the down-sampling process of the backbone network. Moreover, the points inside a voxel are correlated with each other, so processing the points of the point cloud in isolation inevitably loses part of the useful geometric information and degrades detection precision. In the backbone network, processing each channel separately and independently ignores the correlation between channels, so part of the useful information is lost and detection precision decreases. After the pseudo-image is generated, the features of the pseudo-space are all processed identically; since not all features of the pseudo-space contribute equally to the detection task, and regions more relevant to the task are more important, treating them identically also reduces the final detection precision. A real-time and accurate 3D point cloud target detection method that achieves a dynamic balance between speed and precision is therefore urgently needed.
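For illustration only, the following is a minimal sketch of the pillar feature augmentation described above, in which raw 4-dimensional lidar points are expanded to 9 dimensions per pillar. The function name, the choice of 32 as the maximum number of points per pillar and the NumPy implementation are assumptions made for the example, not details taken from the prior-art paper or from the invention.

```python
import numpy as np

def augment_pillar_points(points, pillar_center_xy, max_points=32):
    """Illustrative sketch: augment raw lidar points (x, y, z, reflectance)
    inside one pillar to a 9-dimensional PointPillars-style encoding.

    points:            (n, 4) array of raw points in the pillar
    pillar_center_xy:  (2,) x/y center of the pillar grid cell
    Returns an array of shape (max_points, 9); pillars with fewer points
    are zero-padded, pillars with more points are randomly sub-sampled.
    """
    n = points.shape[0]
    if n > max_points:                        # random sub-sampling of excess points
        points = points[np.random.choice(n, max_points, replace=False)]
        n = max_points

    xyz = points[:, :3]
    centroid = xyz.mean(axis=0)               # mean of all points in the pillar
    offset_c = xyz - centroid                 # offsets to the pillar centroid
    offset_p = xyz[:, :2] - pillar_center_xy  # x/y offsets to the pillar center

    feats = np.concatenate([points, offset_c, offset_p], axis=1)  # (n, 9)

    out = np.zeros((max_points, 9), dtype=np.float32)
    out[:n] = feats
    return out
```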
Disclosure of Invention
In view of the above defects of the prior art, an object of the present invention is to provide a real-time and accurate 3D point cloud target detection method that achieves a dynamic balance between speed and accuracy, and to solve the problem that existing methods cannot perform high-accuracy target detection in real time, through three point-pillar-based mechanisms: a second-order point attention mechanism, a second-order channel attention mechanism and a pseudo-image spatial attention mechanism.
The technical problems solved by the invention are as follows:
Firstly, in the feature extraction network, feature information of the input is usually lost during down-sampling in the backbone network, and the points inside a voxel are correlated with each other, so processing the points of the point cloud in isolation inevitably loses part of the useful geometric information and degrades detection precision. The invention provides a point-pillar-based second-order point attention mechanism which connects the points within the same voxel with one another, retaining more useful information and extracting finer feature information at a relatively small cost in inference speed.
Secondly, in the backbone network, processing each channel in isolation ignores the correlation between the channels, so part of the useful information is lost and detection precision decreases. The invention provides a point-pillar-based second-order channel attention mechanism which links the channels, retains more useful feature information and improves the overall detection precision.
Thirdly, after the pseudo-image is generated, the features of the pseudo-space are all processed identically. Since not all features of the pseudo-space contribute equally to the detection task, and regions more relevant to the task are more important, treating them identically affects the detection accuracy. In view of this, the invention provides a point-pillar-based pseudo-image spatial attention mechanism which assigns a different weight to each pixel of the pseudo-space according to the importance of its region to the task, so as to obtain a more accurate detection result.
The technical scheme adopted by the invention to solve the above technical problems is as follows: a point-pillar-based second-order multi-attention mechanism 3D point cloud target detection method comprising the following steps:
S1: providing a method for realizing target detection through a point-pillar-based second-order point attention mechanism, a second-order channel attention mechanism and a pseudo-image spatial attention mechanism, respectively;
S2: providing, based on S1, a network composed mainly of the second-order point attention mechanism, a pillar feature network, the second-order channel attention mechanism, a backbone network, the pseudo-image spatial attention mechanism and an SSD detection head, the network further comprising a second-order attention module that is instantiated as a second-order point attention module and a second-order channel attention module;
S3: voxelizing the point cloud, then applying the second-order point attention mechanism to the point cloud and converting it into pseudo-image features;
S4: applying the second-order channel attention mechanism to the pseudo-image features and outputting pseudo-space features;
S5: applying the pseudo-image spatial attention mechanism to the pseudo-space features and outputting the detection result;
wherein the SSD detection head predicts the three-dimensional bounding box of the object from the backbone features; the second-order attention module consists of global max pooling, covariance pooling and row convolution; in S3, when point features are fed to the second-order attention module, the resulting second-order point attention weights are its output, and this process constitutes the second-order point attention module; when channel features are fed to the second-order attention module, second-order channel attention weights are obtained, and this process constitutes the second-order channel attention module. An overall sketch of this pipeline is given below.
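To make the flow of S1 to S5 concrete, a minimal PyTorch skeleton of the forward pass is sketched below. The class and module names are illustrative placeholders, and the internals of each module are omitted; this is not the actual implementation of the invention.

```python
import torch.nn as nn

class SecondOrderMultiAttentionDetector(nn.Module):
    """Illustrative skeleton of the S1-S5 pipeline (module internals omitted)."""
    def __init__(self, sopa, pillar_net, soca, backbone, sapi, ssd_head):
        super().__init__()
        self.sopa = sopa              # second-order point attention (S3)
        self.pillar_net = pillar_net  # pillar feature network -> pseudo-image
        self.soca = soca              # second-order channel attention (S4)
        self.backbone = backbone      # 2D backbone on the pseudo-image
        self.sapi = sapi              # pseudo-image spatial attention (S5)
        self.ssd_head = ssd_head      # SSD detection head -> 3D boxes

    def forward(self, voxel_points):
        x = self.sopa(voxel_points)               # weight points inside each voxel
        pseudo_image = self.pillar_net(x)         # scatter pillar features into a pseudo-image
        pseudo_image = self.soca(pseudo_image)    # weight the channels
        feats = self.backbone(pseudo_image)       # pseudo-space features
        feats = self.sapi(feats)                  # weight the spatial locations
        return self.ssd_head(feats)               # detection result
```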
In a given K-th voxel, all points in the voxel are denoted X ∈ R^(N×C), where N is the maximum number of points and C is the number of channels. After global max pooling, a vector g ∈ R^(N×1) composed of the maximum value of each point over its dimensions is obtained, where N×1 denotes a vector of N rows and 1 column. The vector g is fed into a fully connected layer W1 to obtain a vector u ∈ R^(t×1), where t is the reduced number of points after the W1 fully connected layer; a ReLU activation function follows W1. The covariance matrix between the points in the same voxel, Cov ∈ R^(t×t), is then computed, where t is the number of points in the second-order point attention mechanism (and the number of channels in the second-order channel attention mechanism) and t×t is its dimension. Convolving the covariance matrix row by row yields a vector v ∈ R^(t×1), which is fed into the fully connected layer W2 and passed through a Sigmoid activation function to obtain the N-dimensional attention vector s ∈ R^(N×1).
In S3, the second-order point attention mechanism is expressed as:
s = δ(W2 RC(Cov(σ(W1(GMP(X))))))
where Cov(·) computes the covariance matrix of the points, RC(·) is the row convolution, GMP(·) is global max pooling, σ is the ReLU activation function, δ is the Sigmoid activation function, W1 and W2 are two different fully connected layers, and X ∈ R^(N×C) is the set of points in the given K-th voxel.
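A rough PyTorch sketch of the second-order attention module (global max pooling, fully connected reduction with ReLU, covariance pooling, row convolution, fully connected expansion with Sigmoid) is given below for illustration. The exact covariance-pooling formulation is not fully specified in the text, so a simple outer-product second-order statistic is used as a stand-in, and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class SecondOrderAttention(nn.Module):
    """Sketch of the second-order attention module under the dimension
    interpretation given in the text above; not the actual implementation."""
    def __init__(self, n, t):
        super().__init__()
        self.fc1 = nn.Linear(n, t)   # W1: reduce N points to t
        # row-by-row convolution over the t x t covariance matrix
        self.row_conv = nn.Conv1d(t, t, kernel_size=t, groups=t)
        self.fc2 = nn.Linear(t, n)   # W2: expand back to N attention weights

    def forward(self, x):
        # x: (B, N, C) -- points of one voxel (or channels, for the channel variant)
        g = x.max(dim=2).values                    # global max pooling -> (B, N)
        u = torch.relu(self.fc1(g))                # ReLU(W1 g) -> (B, t)
        # second-order statistic standing in for the point covariance
        centered = u - u.mean(dim=1, keepdim=True)
        cov = centered.unsqueeze(2) * centered.unsqueeze(1)   # (B, t, t)
        v = self.row_conv(cov).squeeze(-1)         # row convolution -> (B, t)
        s = torch.sigmoid(self.fc2(v))             # Sigmoid(W2 v) -> (B, N)
        return x * s.unsqueeze(-1)                 # re-weight the points
```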
The second-order channel attention mechanism is similar to the second-order point attention mechanism: the channel features produce analogous weights after passing through the second-order attention module. In S4, the second-order channel attention mechanism is expressed as:
M = δ(W2 RC(Cov(σ(W1(GMP(Y))))))
where Y ∈ R^(C×H×W) denotes the features of the pseudo-image, and H and W are the height and width of the pseudo-image.
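Under the same assumptions as the previous sketch, the channel variant can reuse that module by letting the channels of the pseudo-image play the role of the points. The helper below is illustrative only; its name and interface are not taken from the text.

```python
import torch

def second_order_channel_attention(soca_module, pseudo_image):
    """pseudo_image: (B, C, H, W); soca_module: SecondOrderAttention(n=C, t=reduced)."""
    b, c, h, w = pseudo_image.shape
    y = pseudo_image.view(b, c, h * w)   # (B, C, H*W): channels act as the "point" axis
    weighted = soca_module(y)            # per-channel second-order attention weights applied
    return weighted.view(b, c, h, w)
```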
According to the importance of each region of the pseudo-space to the task, a different weight is assigned to every pixel of the pseudo-space so as to obtain a more accurate detection result. The pseudo-space feature P and the signal G are taken as inputs, and the spatial attention weight produced as the final output is S; the pseudo-image spatial attention mechanism thus maps P and G to the weight S.
The relationship between P and G is established through two linear transformation operations, each implemented as a 1×1 convolution: one performs a 1×1 convolution (linear transformation) on P, and the other performs a 1×1 convolution (linear transformation) on G.
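Because the exact combination of P and G is not recoverable from the text, the sketch below assumes an additive attention-gate style fusion of the two 1×1-convolved inputs purely for illustration; the layer names and the fusion rule are assumptions, not the formulation of the invention.

```python
import torch
import torch.nn as nn

class PseudoImageSpatialAttention(nn.Module):
    """Hypothetical sketch of the pseudo-image spatial attention."""
    def __init__(self, channels_p, channels_g, inter_channels):
        super().__init__()
        self.w_p = nn.Conv2d(channels_p, inter_channels, kernel_size=1)  # 1x1 conv on P
        self.w_g = nn.Conv2d(channels_g, inter_channels, kernel_size=1)  # 1x1 conv on G
        self.psi = nn.Conv2d(inter_channels, 1, kernel_size=1)           # one weight per pixel

    def forward(self, p, g):
        # p: (B, Cp, H, W) pseudo-space feature; g: (B, Cg, H, W) guiding signal
        fused = torch.relu(self.w_p(p) + self.w_g(g))   # assumed additive fusion
        s = torch.sigmoid(self.psi(fused))              # (B, 1, H, W) spatial weights S
        return p * s                                    # re-weight each pixel of P
```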
The three-dimensional ground-truth box is parameterized as (x, y, z, w, l, h, θ), where (x, y, z) is the center position, (w, l, h) is the size of the bounding box and θ is its heading angle. The localization regression residuals between the ground truth and the anchor are defined as follows:
Δx = (x_gt - x_a)/d_a, Δy = (y_gt - y_a)/d_a, Δz = (z_gt - z_a)/h_a,
Δw = log(w_gt/w_a), Δl = log(l_gt/l_a), Δh = log(h_gt/h_a),
Δθ = sin(θ_gt - θ_a),
where d_a = √((w_a)² + (l_a)²), gt denotes the ground-truth value and a denotes the anchor-box parameters; (x_gt, y_gt, z_gt) are the center coordinates of the 3D ground-truth box, (l_gt, w_gt, h_gt) are its length, width and height, and θ_gt is its yaw angle around the Z axis; (x_a, y_a, z_a) are the center coordinates of the anchor box, (l_a, w_a, h_a) are its length, width and height, and θ_a is its yaw angle around the Z axis.
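A small illustrative helper for the residual encoding above is given below; the dictionary-based interface is an assumption made for the example.

```python
import math

def encode_box_residuals(gt, anchor):
    """Sketch of the localization residual encoding described above.
    gt, anchor: dicts with keys x, y, z, w, l, h, theta (assumed layout)."""
    d_a = math.sqrt(anchor["w"] ** 2 + anchor["l"] ** 2)   # anchor box diagonal
    return {
        "dx": (gt["x"] - anchor["x"]) / d_a,
        "dy": (gt["y"] - anchor["y"]) / d_a,
        "dz": (gt["z"] - anchor["z"]) / anchor["h"],
        "dw": math.log(gt["w"] / anchor["w"]),
        "dl": math.log(gt["l"] / anchor["l"]),
        "dh": math.log(gt["h"] / anchor["h"]),
        "dtheta": math.sin(gt["theta"] - anchor["theta"]),  # sin residual cannot separate flipped boxes
    }
```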
the regression loss is expressed as:
L_loc = Σ_{b ∈ (x, y, z, w, l, h, θ)} SmoothL1(Δb)
where SmoothL1 is the SmoothL1 loss function. Since the angle localization loss cannot distinguish flipped boxes, a softmax classification loss L_dir over the discretized heading directions is used, which enables the network to learn the orientation. Focal loss is used as the object classification loss:
L_cls = -a(1-p)^r log p
where p is the probability of a correct detection box and r and a are preset parameters. The total loss is finally expressed as:
L = (1/N_pos)(β_loc·L_loc + β_cls·L_cls + β_dir·L_dir)
where N_pos is the number of correct detection boxes and β_loc, β_cls and β_dir are preset values.
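The following sketch combines the three loss terms as described. The β defaults follow the embodiment given later in the description, while the focal-loss parameters a and r (written alpha and gamma below) are common defaults assumed for the example rather than values taken from the text.

```python
import torch
import torch.nn.functional as F

def detection_loss(loc_pred, loc_target, cls_prob, cls_target,
                   dir_logits, dir_target, num_pos,
                   beta_loc=2.0, beta_cls=1.0, beta_dir=0.2,
                   alpha=0.25, gamma=2.0):
    """Illustrative combination of the regression, classification and direction losses."""
    # SmoothL1 regression loss over the 7 box residuals (x, y, z, w, l, h, theta)
    l_loc = F.smooth_l1_loss(loc_pred, loc_target, reduction="sum")

    # focal loss: -a * (1 - p)^r * log(p), with p the probability of the true class
    p_t = torch.where(cls_target.bool(), cls_prob, 1.0 - cls_prob)
    l_cls = (-alpha * (1.0 - p_t) ** gamma * torch.log(p_t.clamp(min=1e-6))).sum()

    # softmax classification over discretized heading directions
    l_dir = F.cross_entropy(dir_logits, dir_target, reduction="sum")

    return (beta_loc * l_loc + beta_cls * l_cls + beta_dir * l_dir) / max(num_pos, 1)
```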
The advantages and beneficial effects of the invention are as follows:
The second-order point attention mechanism of the invention takes into account the correlation between the points inside a voxel; compared with existing methods that process the points in a voxel in isolation, it retains more useful geometric feature information and improves detection accuracy. Similarly, the second-order channel attention mechanism considers the correlation between channels and further improves detection precision. The pseudo-image spatial attention mechanism considers that not all features of the pseudo-space contribute equally to the detection task and that regions more relevant to the task are more important, so it assigns a different weight to each pixel of the pseudo-space features, further improving feature extraction. Together, the three point-pillar-based mechanisms guarantee a relatively high detection speed and high extraction accuracy.
Drawings
FIG. 1 is a diagram of one embodiment of the second-order attention module architecture of the present invention;
FIG. 2 is an overall framework diagram of the point-pillar-based second-order multi-attention mechanism 3D point cloud target detection method.
Detailed Description
The following examples illustrate the present invention in detail and give specific embodiments and procedures of its implementation, but the scope of protection of the present invention is not limited to the following examples.
The invention provides a method for realizing target detection through a point-pillar-based Second-Order Point Attention mechanism (SOPA), Second-Order Channel Attention mechanism (SOCA) and Pseudo-Image Spatial Attention mechanism (SAPI), comprising the following steps:
S1: providing a method for realizing target detection through the point-pillar-based second-order point attention mechanism, the second-order channel attention mechanism and the pseudo-image spatial attention mechanism, respectively;
S2: providing, based on S1, a network composed mainly of the second-order point attention mechanism, a pillar feature network, the second-order channel attention mechanism, a backbone network, the pseudo-image spatial attention mechanism and an SSD detection head, the network further comprising a second-order attention module that is instantiated as a second-order point attention module and a second-order channel attention module;
S3: voxelizing the point cloud, then applying the second-order point attention mechanism to the point cloud and converting it into pseudo-image features;
S4: applying the second-order channel attention mechanism to the pseudo-image features and outputting pseudo-space features;
S5: applying the pseudo-image spatial attention mechanism to the pseudo-space features and outputting the detection result;
wherein the SSD detection head predicts the three-dimensional bounding box of the object from the backbone features; the second-order attention module consists of global max pooling, covariance pooling and row convolution; in S3, when point features are fed to the second-order attention module, the resulting second-order point attention weights are its output, and this process constitutes the second-order point attention module; when channel features are fed to the second-order attention module, second-order channel attention weights are obtained, and this process constitutes the second-order channel attention module.
In a given K-th voxel, all points in the voxel are denoted X ∈ R^(N×C), where N is the maximum number of points and C is the number of channels. After global max pooling, a vector g ∈ R^(N×1) composed of the maximum value of each point over its dimensions is obtained, where N×1 denotes a vector of N rows and 1 column. The vector g is fed into a fully connected layer W1 to obtain a vector u ∈ R^(t×1), where t is the reduced number of points after the W1 fully connected layer; a ReLU activation function follows W1. The covariance matrix between the points in the same voxel, Cov ∈ R^(t×t), is then computed, where t is the number of points in the second-order point attention mechanism (and the number of channels in the second-order channel attention mechanism) and t×t is its dimension. Convolving the covariance matrix row by row yields a vector v ∈ R^(t×1), which is fed into the fully connected layer W2 and passed through a Sigmoid activation function to obtain the N-dimensional attention vector s ∈ R^(N×1).
In S3, the second-order point attention mechanism is expressed as:
s = δ(W2 RC(Cov(σ(W1(GMP(X))))))
where Cov(·) computes the covariance matrix of the points, RC(·) is the row convolution, GMP(·) is global max pooling, σ is the ReLU activation function, δ is the Sigmoid activation function, W1 and W2 are two different fully connected layers, and X ∈ R^(N×C) is the set of points in the given K-th voxel.
The second-order channel attention mechanism is similar to the second-order point attention mechanism: the channel features produce analogous weights after passing through the second-order attention module. In S4, the second-order channel attention mechanism is expressed as:
M = δ(W2 RC(Cov(σ(W1(GMP(Y))))))
where Y ∈ R^(C×H×W) denotes the features of the pseudo-image, and H and W are the height and width of the pseudo-image.
According to the importance of each region of the pseudo-space to the task, a different weight is assigned to every pixel of the pseudo-space so as to obtain a more accurate detection result. The pseudo-space feature P and the signal G are taken as inputs, and the spatial attention weight produced as the final output is S; the pseudo-image spatial attention mechanism thus maps P and G to the weight S.
The relationship between P and G is established through two linear transformation operations, each implemented as a 1×1 convolution: one performs a 1×1 convolution (linear transformation) on P, and the other performs a 1×1 convolution (linear transformation) on G.
The three-dimensional ground-truth box is parameterized as (x, y, z, w, l, h, θ), where (x, y, z) is the center position, (w, l, h) is the size of the bounding box and θ is its heading angle. The localization regression residuals between the ground truth and the anchor are defined as follows:
Δx = (x_gt - x_a)/d_a, Δy = (y_gt - y_a)/d_a, Δz = (z_gt - z_a)/h_a,
Δw = log(w_gt/w_a), Δl = log(l_gt/l_a), Δh = log(h_gt/h_a),
Δθ = sin(θ_gt - θ_a),
where d_a = √((w_a)² + (l_a)²), gt denotes the ground-truth value and a denotes the anchor-box parameters; (x_gt, y_gt, z_gt) are the center coordinates of the 3D ground-truth box, (l_gt, w_gt, h_gt) are its length, width and height, and θ_gt is its yaw angle around the Z axis; (x_a, y_a, z_a) are the center coordinates of the anchor box, (l_a, w_a, h_a) are its length, width and height, and θ_a is its yaw angle around the Z axis.
the regression loss is expressed as:
L_loc = Σ_{b ∈ (x, y, z, w, l, h, θ)} SmoothL1(Δb)
where SmoothL1 is the SmoothL1 loss function. Since the angle localization loss cannot distinguish flipped boxes, a softmax classification loss L_dir over the discretized heading directions is used, which enables the network to learn the orientation. Focal loss is used as the object classification loss:
L_cls = -a(1-p)^r log p
where p is the probability of a correct detection box and r and a are preset parameters. The total loss is finally expressed as:
L = (1/N_pos)(β_loc·L_loc + β_cls·L_cls + β_dir·L_dir)
where N_pos is the number of correct detection boxes. Here β_loc = 2, β_cls = 1 and β_dir = 0.2.
The foregoing is a detailed description of the preferred embodiments of the invention. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concept. Therefore, the technical solutions that those skilled in the art can obtain through logical analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention shall fall within the scope of protection defined by the claims.

Claims (4)

1. A point-pillar-based second-order multi-attention mechanism 3D point cloud target detection method, characterized by comprising the following steps:
S1: providing a method for realizing target detection through a point-pillar-based second-order point attention mechanism, a second-order channel attention mechanism and a pseudo-image spatial attention mechanism, respectively;
S2: providing, based on S1, a network composed mainly of the second-order point attention mechanism, a pillar feature network, the second-order channel attention mechanism, a backbone network, the pseudo-image spatial attention mechanism and an SSD detection head, the network further comprising a second-order attention module that is instantiated as a second-order point attention module and a second-order channel attention module;
S3: voxelizing the point cloud, then applying the second-order point attention mechanism to the point cloud and converting it into pseudo-image features;
S4: applying the second-order channel attention mechanism to the pseudo-image features and outputting pseudo-space features;
S5: applying the pseudo-image spatial attention mechanism to the pseudo-space features and outputting the detection result;
wherein the SSD detection head predicts the three-dimensional bounding box of the object from the backbone features; the second-order attention module consists of global max pooling, covariance pooling and row convolution; in S3, when point features are fed to the second-order attention module, the resulting second-order point attention weights are its output, and this process constitutes the second-order point attention module; when channel features are fed to the second-order attention module, second-order channel attention weights are obtained, and this process constitutes the second-order channel attention module.
2. The point-pillar-based second-order multi-attention mechanism 3D point cloud target detection method according to claim 1, characterized in that:
in a given K-th voxel, all points in the voxel are denoted X ∈ R^(N×C), where N is the maximum number of points and C is the number of channels; after global max pooling, a vector g ∈ R^(N×1) composed of the maximum value of each point over its dimensions is obtained, where N×1 denotes a vector of N rows and 1 column; the vector g is fed into a fully connected layer W1 to obtain a vector u ∈ R^(t×1), where t is the reduced number of points after the W1 fully connected layer, and a ReLU activation function follows W1; the covariance matrix between the points in the same voxel, Cov ∈ R^(t×t), is then computed, where t is the number of points in the second-order point attention mechanism, t is the number of channels in the second-order channel attention mechanism, and t×t is its dimension; convolving the covariance matrix row by row yields a vector v ∈ R^(t×1), which is fed into the fully connected layer W2 and passed through a Sigmoid activation function to obtain the N-dimensional attention vector s ∈ R^(N×1);
in S3, the second-order point attention mechanism is expressed as:
s = δ(W2 RC(Cov(σ(W1(GMP(X))))))
where Cov(·) computes the covariance matrix of the points, RC(·) is the row convolution, GMP(·) is global max pooling, σ is the ReLU activation function, δ is the Sigmoid activation function, W1 and W2 are two different fully connected layers, and X ∈ R^(N×C) is the set of points in the given K-th voxel;
the second-order channel attention mechanism is similar to the second-order point attention mechanism: the channel features produce analogous weights after passing through the second-order attention module, and in S4 the second-order channel attention mechanism is expressed as:
M = δ(W2 RC(Cov(σ(W1(GMP(Y))))))
where Y ∈ R^(C×H×W) denotes the features of the pseudo-image, and H and W are the height and width of the pseudo-image.
3. The point-pillar-based second-order multi-attention mechanism 3D point cloud target detection method according to claim 2, characterized in that: according to the importance of each region of the pseudo-space to the task, a different weight is assigned to every pixel of the pseudo-space so as to obtain a more accurate detection result; the pseudo-space feature P and the signal G are taken as inputs, and the spatial attention weight produced as the final output is S, so that the pseudo-image spatial attention mechanism maps P and G to the weight S; the relationship between P and G is established through two linear transformation operations, each implemented as a 1×1 convolution, one performing a linear transformation on P and the other performing a linear transformation on G.
4. The point-pillar-based second-order multi-attention mechanism 3D point cloud target detection method according to claim 3, characterized in that: the three-dimensional ground-truth box is parameterized as (x, y, z, w, l, h, θ), where (x, y, z) is the center position, (w, l, h) is the size of the bounding box and θ is its heading angle; the localization regression residuals between the ground truth and the anchor are defined as:
Δx = (x_gt - x_a)/d_a, Δy = (y_gt - y_a)/d_a, Δz = (z_gt - z_a)/h_a,
Δw = log(w_gt/w_a), Δl = log(l_gt/l_a), Δh = log(h_gt/h_a),
Δθ = sin(θ_gt - θ_a),
where d_a = √((w_a)² + (l_a)²), gt denotes the ground-truth value and a denotes the anchor-box parameters; (x_gt, y_gt, z_gt) are the center coordinates of the 3D ground-truth box, (l_gt, w_gt, h_gt) are its length, width and height, and θ_gt is its yaw angle around the Z axis; (x_a, y_a, z_a) are the center coordinates of the anchor box, (l_a, w_a, h_a) are its length, width and height, and θ_a is its yaw angle around the Z axis;
the regression loss is expressed as:
L_loc = Σ_{b ∈ (x, y, z, w, l, h, θ)} SmoothL1(Δb)
where SmoothL1 is the SmoothL1 loss function; since the angle localization loss cannot distinguish flipped boxes, a softmax classification loss L_dir over the discretized heading directions is used, which enables the network to learn the orientation; focal loss is used as the object classification loss:
L_cls = -a(1-p)^r log p
where p is the probability of a correct detection box and r and a are preset parameters; the total loss is finally expressed as:
L = (1/N_pos)(β_loc·L_loc + β_cls·L_cls + β_dir·L_dir)
where N_pos is the number of correct detection boxes and β_loc, β_cls and β_dir are preset values.
CN202211104980.0A 2022-09-09 2022-09-09 Point column-based two-order multi-attention mechanism 3D point cloud target detection method Pending CN115908829A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211104980.0A CN115908829A (en) 2022-09-09 2022-09-09 Point column-based two-order multi-attention mechanism 3D point cloud target detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211104980.0A CN115908829A (en) 2022-09-09 2022-09-09 Point column-based two-order multi-attention mechanism 3D point cloud target detection method

Publications (1)

Publication Number Publication Date
CN115908829A true CN115908829A (en) 2023-04-04

Family

ID=86488691

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211104980.0A Pending CN115908829A (en) 2022-09-09 2022-09-09 Point column-based two-order multi-attention mechanism 3D point cloud target detection method

Country Status (1)

Country Link
CN (1) CN115908829A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117111013A (en) * 2023-08-22 2023-11-24 南京慧尔视智能科技有限公司 Radar target tracking track starting method, device, equipment and medium
CN117111013B (en) * 2023-08-22 2024-04-30 南京慧尔视智能科技有限公司 Radar target tracking track starting method, device, equipment and medium
CN116863433A (en) * 2023-09-04 2023-10-10 深圳大学 Target detection method based on point cloud sampling and weighted fusion and related equipment
CN116863433B (en) * 2023-09-04 2024-01-09 深圳大学 Target detection method based on point cloud sampling and weighted fusion and related equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination