CN113378854A - Point cloud target detection method integrating original point cloud and voxel division - Google Patents

Point cloud target detection method integrating original point cloud and voxel division

Info

Publication number
CN113378854A
CN113378854A (application CN202110651776.XA)
Authority
CN
China
Prior art keywords
point cloud
point
voxel
layer
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110651776.XA
Other languages
Chinese (zh)
Inventor
姚剑
蒋天园
李寅暄
龚烨
李礼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202110651776.XA priority Critical patent/CN113378854A/en
Publication of CN113378854A publication Critical patent/CN113378854A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a point cloud target detection method that fuses the original point cloud with voxel division. First, local detail features and semantic features of the point cloud are extracted with the lossless feature extraction network PointNet++; a loss function is then constructed to further improve the ability of PointNet++ to perceive local neighborhood information. The local detail features and semantic features, extracted without information loss, are embedded into a voxel-division-based point cloud target detection network at the voxel feature initialization stage and the sparse convolution perception stage by trilinear interpolation. Finally, the preset detection anchor frames are classified and regressed by a two-dimensional RPN to obtain the final detection targets. By embedding the lossless multi-scale encoding of the point cloud into the voxel method, the detection network gains multi-scale, multi-level information fusion and perception capability; the method integrates the two families of point cloud target detection methods, based on the original point cloud and on voxel division, and thus has both efficient point cloud perception capability and lossless feature encoding capability.

Description

Point cloud target detection method integrating original point cloud and voxel division
Technical Field
The invention belongs to the technical field of 3D point cloud target detection, and particularly relates to a point cloud target detection method fusing original point cloud and voxel division.
Background
With the continuous upgrading of vehicle-mounted lidar technology, a vehicle-mounted lidar can quickly and conveniently acquire point cloud data of the current scene, and targets in the scene can be extracted from the geometric structure information of the scene point cloud; this technology has penetrated industries such as smart city construction, autonomous driving and unmanned delivery. Because laser point clouds are unordered and their density varies greatly, traditional target detection algorithms that apply uniform hand-crafted feature extraction to massive point cloud data cannot adapt to the shape changes of targets in the complex road scenes of autonomous driving. Point cloud target detection algorithms based on deep learning have therefore developed rapidly and been applied in autonomous-driving scenes.
Current mainstream deep-learning-based point cloud target detection methods fall into two categories: target detection based on the raw point cloud and point cloud target detection based on voxel division.
The 3D target detection algorithms based on the raw point cloud perform no preprocessing of the scene point cloud: the coordinates of the raw points and the corresponding reflectivity values are fed directly into a neural network built from multilayer perceptrons (MLPs); the point cloud scene is sampled layer by layer, from shallow to deep, with Farthest Point Sampling (FPS); local detail features and semantic features are extracted by local point set feature extraction modules (Set Abstraction); and finally the detail and semantic features are assigned to all points of the original scene by a feature transfer layer (Feature Propagation) using trilinear interpolation. Such methods suffer no information loss, but the perception capability of a multilayer perceptron for unordered point clouds is lower than that of a structure built from convolutional neural networks in the voxel-division-based methods.
Point cloud target detection based on voxel division partitions the scene point cloud into uniform voxel grids according to the point cloud density scanned by lidars with different numbers of beams, extracts a feature for each voxel with a voxel feature extraction scheme adapted to different voxel sizes, extracts semantic features from the initialized voxel scene with 3D convolution or 3D sparse convolution while gradually compressing the height dimension to one, and then builds a region proposal network (RPN) with two-dimensional convolutions to classify and predict the anchor frames preset for every convolution grid point in the top view of the scene. In an autonomous-driving point cloud scene, such methods can quickly and efficiently classify objects that deform little and have high point density; however, voxel division geometrically deforms the original point cloud structure, and for small objects such as pedestrians and bicycles in particular, the deformation caused by voxel division loses local detail information, so the detected classification and regression results deviate from the real targets.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a point cloud target detection method fusing an original point cloud and voxel division. First, local detail features and semantic features of the point cloud are extracted with the lossless feature extraction network PointNet++; a loss function is then constructed to further improve the ability of PointNet++ to perceive local neighborhood information; the local detail features and semantic features, extracted without information loss, are then embedded into a voxel-division-based point cloud target detection network at the voxel feature initialization stage and the sparse convolution perception stage, each by trilinear interpolation; finally, each preset detection anchor frame is classified and regressed by a two-dimensional RPN to obtain the final detection targets.
In order to achieve the aim, the technical scheme provided by the invention is a point cloud target detection method fusing original point cloud and voxel division, which comprises the following steps:
step 1, extracting local detail features and semantic features of the point cloud by using the lossless feature extraction network PointNet++;
step 1.1, constructing a multilayer encoder;
step 1.2, extracting the local detail features and semantic features of each layer of point cloud through the SA (Set Abstraction) module without information loss;
step 1.3, assigning the detail features and semantic features extracted in step 1.2 to all points of the original scene through a feature transfer layer using trilinear interpolation;
step 2, constructing a loss function to supervise the feature extraction in step 1 and improve the ability of the lossless feature extraction network PointNet++ to perceive feature information;
step 3, embedding the local detail features and semantic features without information loss into a point cloud target detection network based on voxel division;
step 3.1, initializing voxel characteristics by using the local detail characteristics extracted in the step 1;
step 3.2, performing feature extraction on the voxel scene semantic information initialized in the step 3.1 by using 3D sparse convolution;
step 3.3, converting the semantic features obtained in the step 1 into voxel features by adopting trilinear interpolation;
step 3.4, fusing the semantic features subjected to sparse convolution sensing in the step 3.2 with the voxel features obtained by conversion in the step 3.3 by adopting an attention mechanism mode to obtain semantic information fusing two sensing modes;
step 4, projecting the semantic features fused in step 3 onto a two-dimensional top view, building a region proposal network (RPN) with two-dimensional convolutions, and classifying and regressing the detection anchor frames preset for each pixel point in the top view of the scene to obtain the final detection targets;
step 4.1, setting an RPN network structure and a predefined detection anchor frame;
and 4.2, designing a point cloud target detection loss function.
Moreover, in step 1.1 the multilayer encoder is constructed by first sampling N points from the original point cloud with the farthest point sampling (FPS) strategy as the input point cloud, and then using FPS to sample point sets of sizes N1 > N2 > N3 > N4 layer by layer from the input point cloud data. These four point sets form a 4-layer encoder, where the point cloud input to each layer is the point set output by the previous layer.
Moreover, the input of each layer's SA module in step 1.2 is the fixed-size point set obtained by FPS sampling in the previous layer. Let p_i be the i-th point obtained by FPS sampling in the current layer and let Φ_r(p_i) be the set of points of the previous layer lying within the spherical neighborhood of radius r centered at p_i. The output feature of point p_i is computed in the following steps:
Step 1.2.1, randomly sample K points from the set Φ_r(p_i) to form a subset Φ_K(p_i).
Step 1.2.2, perform feature fusion and extraction on the points sampled in step 1.2.1 with a multilayer perceptron, using the formula
f(p_i) = max_{p_j ∈ Φ_K(p_i)} MLP(f(p_j)),
where MLP denotes the high-dimensional mapping of the point features by the multilayer perceptron, max() denotes taking the maximum over the point set along the feature dimension, and f(p_i) is the output feature of point p_i.
Step 1.2.3, repeat the FPS sampling on each layer's input point cloud to obtain the corresponding number of points and aggregate neighborhood features onto the sampled points via step 1.2.2, thereby completing feature extraction without information loss. The first layer extracts local detail features and the last three layers extract semantic features.
Moreover, the feature transfer in step 1.3 is the reverse process of feature extraction: starting from the last extraction layer, features are passed to the next shallower layer in turn, i.e. from the N4 layer to the N3 layer, from the N3 layer to the N2 layer, from the N2 layer to the N1 layer, and from the N1 layer to the N input points. Taking the transfer from the N4 layer to the N3 layer as an example, assume p_i is a point of the N3 layer that needs to receive a feature, φ(p_i) denotes the set of the k points of the N4 layer closest to p_i in Euclidean space, and p_j denotes a point of φ(p_i). The trilinear-interpolation feature transfer is computed as
f(p_i) = Σ_{j=1}^{k} w_ij · f(p_j),    w_ij = (1 / d(p_i, p_j)) / Σ_{j'=1}^{k} 1 / d(p_i, p_j'),
where f(p_i) is the feature to be transferred, f(p_j) denotes the feature of the j-th point p_j in the neighborhood of p_i, and w_ij denotes the interpolation weight of the j-th point p_j in the neighborhood of p_i.
The feature of each receiving point is thus a Euclidean-distance-weighted sum of the features of its k nearest points in the next deeper layer; applied layer by layer, this propagates the features forward to every point of the scene, so that each point carries lossless information features.
In step 2, the point cloud coordinates of the original scene are used as the point cloud supervision information and the Smooth-L1 loss is used as the loss function:
L_coord = (1/|φ(p)|) Σ_{p ∈ φ(p)} SmoothL1(r' − r),    SmoothL1(x) = 0.5 x² if |x| < 1, |x| − 0.5 otherwise,
where r' and r respectively denote the point coordinates predicted by the lossless feature extraction network and the coordinates of the original point cloud, and φ(p) denotes the point cloud set of the whole original scene. Under the supervision of this loss function, the ability of the lossless feature extraction network PointNet++ to perceive local neighborhood information is further improved.
In step 3.1, the initialization first divides the point cloud space uniformly into a voxel grid, retains the voxels that contain points, discards the voxels that contain no points, and then initializes the retained voxels with the local detail features obtained in step 1. Assume the output of the first layer of the encoder network in step 1 is the set {(P_i, F_i^P)}, where P_i denotes a point of the original point cloud space carrying a transferred feature, F_i^P is the feature of point P_i, and the encoder extracted local detail features for the N1 such points in step 1. Let {(V_j, F_j^V)}, j = 1, ..., M, denote the voxel centers, where V_j is a voxel center, F_j^V is the feature to be assigned to it, and M is the total number of voxel centers to be assigned. The voxel-center features are assigned by the trilinear interpolation function: let φ(V_j) denote the set of the k points closest to V_j in Euclidean space and P_t a point of φ(V_j); then F_j^V is computed as
F_j^V = Σ_{t=1}^{k} w_tj · F_t^P,    w_tj = (1 / d(V_j, P_t)) / Σ_{t'=1}^{k} 1 / d(V_j, P_t'),
where F_t^P denotes the feature of the t-th point P_t in the neighborhood of the voxel center V_j and w_tj denotes the interpolation weight of the t-th point P_t in that neighborhood.
Furthermore, step 3.2 stacks four sparse convolution modules using the Spconv library, where each sparse convolution module comprises two submanifold convolution layers and one point cloud sparse convolution layer with downsampling factor 2. Assuming the input voxel feature tensor is represented as L × W × H × C, where L, W, H and C respectively denote the length, width and height of the voxel scene and the feature dimension of each voxel, the output of the four sparse convolution modules can be represented as L' × W' × H' × C', where L', W' and H' are the downsampled spatial dimensions and C' denotes the feature dimension after feature extraction.
Furthermore, in step 3.3 assume the three levels of semantic features extracted in step 1 are represented as {F^P_{t,4×}}, {F^P_{t,8×}} and {F^P_{t,16×}}, where 4× denotes four-fold downsampling, and let V'_j denote a voxel center after sparse convolution with F'^V_j the feature to be assigned to it. The point semantic features are converted into the voxel-center representation by trilinear interpolation: let φ_{4×}(V'_j), φ_{8×}(V'_j) and φ_{16×}(V'_j) denote the sets of the k points of the respective levels closest to V'_j in Euclidean space, with points P_{t,4×}, P_{t,8×} and P_{t,16×}; then the features gathered at V'_j from the three levels, which together form F'^V_j, are computed as
F^V_{j,4×} = Σ_{t=1}^{k} w_{tj,4×} · F^P_{t,4×},
F^V_{j,8×} = Σ_{t=1}^{k} w_{tj,8×} · F^P_{t,8×},
F^V_{j,16×} = Σ_{t=1}^{k} w_{tj,16×} · F^P_{t,16×},
w_{tj,4×} = (1 / d(V'_j, P_{t,4×})) / Σ_{t'=1}^{k} 1 / d(V'_j, P_{t',4×})  (and analogously for 8× and 16×),
where V'_j denotes the voxel center after 3D sparse convolution, P_{t,4×}, P_{t,8×}, P_{t,16×} denote the spatial points used for the feature weighting, F^P_{t,4×} denotes the feature of the t-th neighborhood point of V'_j at the four-fold downsampled layer, and w_{tj,4×} denotes the corresponding interpolation weight of that point.
In step 3.4, the two kinds of semantic information are concatenated along the feature dimension. Assume the voxel feature obtained by conversion in step 3.3 has dimension M1 and the voxel feature obtained by sparse convolution perception has dimension M2; the concatenated voxel feature then has dimension M1 + M2, and a single multilayer perceptron layer maps the (M1 + M2)-dimensional feature back to M1 dimensions.
In step 4.1, the RPN is built from a four-layer two-dimensional convolutional neural network with layer-wise outputs in a U-Net structure, and each layer employs 3 × 3 convolutions to reduce the number of learnable parameters. This encoding-decoding network structure further abstracts the fused features; a corresponding detection anchor frame is preset for every pixel point of the final feature map, and the preset detection anchor frames are classified and regressed to obtain the objects detected by the RPN. A three-dimensional detection anchor frame can be expressed as {x, y, z, l, w, h, r}, where (x, y, z) denotes the center position of the detection anchor frame, l, w and h correspond to the length, width and height respectively, and r is the rotation angle in the x-y plane. The voxel features after 3D sparse convolution and semantic-information fusion form a tensor of size L' × W' × H' × C'; compressing the height dimension into the feature dimension yields a two-dimensional feature map of size L' × W' × (H'·C'). Thus, for a feature map of size L' × W' with A predefined anchor frames per pixel, there are L' · W' · A predefined detection anchor frames in total.
Furthermore, the classification loss function L_cls in step 4.2 uses the cross-entropy loss:
L_cls = -(1/n) Σ_{i=1}^{n} [ Q(a_i) log P(a_i) + (1 − Q(a_i)) log(1 − P(a_i)) ],
where n denotes the number of preset detection anchor frames, P(a_i) denotes the predicted score of the i-th detection anchor frame and Q(a_i) denotes the true label value of the detection anchor frame.
The regression loss function L_reg uses the Smooth-L1 loss:
L_reg = (1/n) Σ_{i=1}^{n} SmoothL1(v'_i − v_i),    SmoothL1(x) = 0.5 x² if |x| < 1, |x| − 0.5 otherwise,
where n denotes the number of preset detection anchor frames, v denotes the true value of a detection anchor frame and v' denotes the anchor-frame value predicted by the RPN.
Through the combined supervision of the classification loss function and the regression loss function, the network can finally learn the capability of detecting the point cloud target.
Compared with the prior art, the invention has the following advantages: (1) it combines the advantages of the current voxel-division-based and raw-point-cloud-based point cloud target detection methods, offering both efficient point cloud perception capability and lossless feature encoding capability; (2) by embedding the lossless encoding of the point cloud into the voxel method at multiple scales and levels, the detection network gains multi-scale, multi-level information-fusion perception capability.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
Fig. 2 is a detection example of an embodiment of the present invention, in which Fig. 2(a) is the input point cloud and Fig. 2(b) shows the detection anchor frames obtained on the point cloud.
Detailed Description
The invention provides a point cloud target detection method integrating an original point cloud and voxel division. First, local detail features and semantic features of the point cloud are extracted with the lossless feature extraction network PointNet++; a loss function is then constructed to further improve the ability of PointNet++ to perceive local neighborhood information; the local detail features and semantic features, extracted without information loss, are then embedded into a voxel-division-based point cloud target detection network at the voxel feature initialization stage and the sparse convolution perception stage, each by trilinear interpolation; finally, each preset detection anchor frame is classified and regressed by a two-dimensional RPN to obtain the final detection targets.
The technical solution of the present invention is further explained with reference to the drawings and the embodiments.
Step 1, extract local detail features and semantic features of the point cloud using the lossless feature extraction network PointNet++.
First, a fixed number N of input points is collected; then local point cloud feature extractors (Set Abstraction, SA) are built and applied layer by layer to sample the scene and extract the features of local regions; finally, trilinear interpolation is used to assign the local detail features and semantic features to all points of the original scene through the feature transfer layer (Feature Propagation). The method comprises the following substeps:
and 1.1, constructing a multilayer encoder.
First, a farthest point sampling (FPS) strategy is used to collect N points from the original point cloud as the input point cloud; FPS is then used to sample point sets of sizes N1 > N2 > N3 > N4 layer by layer from the input point cloud data. These four point sets form a 4-layer encoder, where the point cloud input to each layer is the point set output by the previous layer.
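The following is a minimal NumPy sketch of the farthest point sampling strategy described above; the function name, the starting index and the per-layer point counts used in the example are illustrative assumptions rather than values fixed by the invention.
```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, n_samples: int) -> np.ndarray:
    """Select n_samples indices from points (N, 3) so that each newly chosen
    point is the one farthest from the set already chosen."""
    n = points.shape[0]
    chosen = np.zeros(n_samples, dtype=np.int64)
    dist = np.full(n, np.inf)          # distance of every point to the chosen set
    chosen[0] = 0                      # start from an arbitrary point
    for i in range(1, n_samples):
        diff = points - points[chosen[i - 1]]
        dist = np.minimum(dist, np.einsum("ij,ij->i", diff, diff))
        chosen[i] = int(np.argmax(dist))
    return chosen

# Example: build a 4-layer pyramid of sampled point sets, each sampled from the previous one.
cloud = np.random.rand(4096, 3).astype(np.float32)
layer_sizes = [1024, 256, 64, 16]      # illustrative per-layer point counts
current = cloud
for size in layer_sizes:
    current = current[farthest_point_sampling(current, size)]
    print(current.shape)
```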
Step 1.2, extract the local detail features and semantic features of each layer's point cloud through the SA module without information loss.
The input of each layer's SA module is the fixed-size point set obtained by FPS sampling in the previous layer. Let p_i be the i-th point obtained by FPS sampling in the current layer and let Φ_r(p_i) be the set of points of the previous layer lying within the spherical neighborhood of radius r centered at p_i. The output feature of p_i is computed in the following steps:
Step 1.2.1, randomly sample K points from the set Φ_r(p_i) to form a subset Φ_K(p_i).
Step 1.2.2, perform feature fusion and extraction on the points sampled in step 1.2.1 with a multilayer perceptron to obtain the output feature of point p_i.
First, a multilayer perceptron is applied to the randomly sampled point set Φ_K(p_i) from step 1.2.1 to extract local detail features, giving a high-dimensional mapping of each point's feature; max pooling along the feature dimension then keeps the strongest response in each feature channel, and this pooled high-dimensional feature is the output feature of point p_i. The calculation formula is
f(p_i) = max_{p_j ∈ Φ_K(p_i)} MLP(f(p_j)),
where MLP denotes the high-dimensional mapping of the point features by the multilayer perceptron, max() denotes taking the maximum over the point set along the feature dimension, and f(p_i) is the output feature of point p_i.
Step 1.2.3, repeat the FPS sampling on each layer's input point cloud to obtain the corresponding number of points and aggregate neighborhood features onto the sampled points via step 1.2.2, thereby completing feature extraction without information loss. The first layer extracts local detail features and the last three layers extract semantic features.
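The PyTorch sketch below illustrates one way the per-layer SA step f(p_i) = max(MLP(·)) could be realised: gather up to K neighbours within radius r around each sampled point, map them with a shared multilayer perceptron and max-pool over the neighbourhood. The class name, layer widths and the simple top-K neighbour selection are assumptions for illustration, not the exact implementation of the invention.
```python
import torch
import torch.nn as nn

class SetAbstraction(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, radius: float, k: int):
        super().__init__()
        self.radius, self.k = radius, k
        self.mlp = nn.Sequential(
            nn.Linear(in_dim + 3, out_dim), nn.ReLU(),
            nn.Linear(out_dim, out_dim), nn.ReLU(),
        )

    def forward(self, xyz, feats, centers):
        # xyz: (N, 3) previous-layer points, feats: (N, C), centers: (M, 3) FPS samples
        d = torch.cdist(centers, xyz)                                  # (M, N)
        d = torch.where(d <= self.radius, d, torch.full_like(d, float("inf")))
        idx = d.topk(self.k, largest=False).indices                    # (M, k) neighbour indices
        neigh_xyz = xyz[idx] - centers[:, None, :]                     # relative coordinates
        neigh = torch.cat([neigh_xyz, feats[idx]], dim=-1)             # (M, k, C + 3)
        return self.mlp(neigh).max(dim=1).values                       # max-pool over the neighbourhood

# usage: reflectivity as the 1-dimensional input feature
sa = SetAbstraction(in_dim=1, out_dim=64, radius=0.2, k=16)
pts = torch.rand(4096, 3); refl = torch.rand(4096, 1); ctr = pts[:1024]
print(sa(pts, refl, ctr).shape)    # torch.Size([1024, 64])
```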
Step 1.3, assign the detail features and semantic features extracted in step 1.2 to all points of the original scene through the feature transfer layer using trilinear interpolation.
Feature transfer is the reverse process of feature extraction: starting from the last extraction layer, features are passed to the next shallower layer in turn, i.e. from the N4 layer to the N3 layer, from the N3 layer to the N2 layer, from the N2 layer to the N1 layer, and finally from the N1 layer to the N input points. Taking the transfer from the N4 layer to the N3 layer as an example, assume p_i is a point of the N3 layer that needs to receive a feature, φ(p_i) denotes the set of the k points of the N4 layer closest to p_i in Euclidean space, and p_j is a point of φ(p_i). The trilinear-interpolation feature transfer is computed as
f(p_i) = Σ_{j=1}^{k} w_ij · f(p_j),    w_ij = (1 / d(p_i, p_j)) / Σ_{j'=1}^{k} 1 / d(p_i, p_j'),
where f(p_i) is the feature to be transferred, f(p_j) denotes the feature of the j-th point p_j in the neighborhood of p_i, and w_ij denotes the interpolation weight of the j-th point p_j in the neighborhood of p_i.
The feature of each receiving point is thus a Euclidean-distance-weighted sum of the features of its k nearest points in the next deeper layer; applied layer by layer, this propagates the features forward to every point of the scene, so that each point carries lossless information features.
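A minimal PyTorch sketch of this feature transfer, assuming inverse-distance weighting over the k nearest points of the deeper layer (k = 3 is a common choice and an assumption here):
```python
import torch

def propagate_features(dst_xyz, src_xyz, src_feats, k: int = 3, eps: float = 1e-8):
    """dst_xyz: (N, 3) points that receive features; src_xyz: (M, 3) points that
    already carry features src_feats: (M, C). Returns interpolated (N, C) features."""
    d = torch.cdist(dst_xyz, src_xyz)                  # (N, M) Euclidean distances
    dist, idx = d.topk(k, largest=False)               # k nearest source points per target
    w = 1.0 / (dist + eps)                             # inverse-distance weights
    w = w / w.sum(dim=1, keepdim=True)                 # normalise so the weights sum to 1
    return (w.unsqueeze(-1) * src_feats[idx]).sum(dim=1)

# usage: push 64 deep-layer features back onto 1024 shallower-layer points
dst = torch.rand(1024, 3); src = torch.rand(64, 3); f = torch.rand(64, 128)
print(propagate_features(dst, src, f).shape)           # torch.Size([1024, 128])
```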
Step 2, construct a loss function to supervise the feature extraction of step 1 and improve the ability of the lossless feature extraction network PointNet++ to perceive feature information.
The point cloud coordinates of the original scene are used as the point cloud supervision information and the Smooth-L1 loss is used as the loss function:
L_coord = (1/|φ(p)|) Σ_{p ∈ φ(p)} SmoothL1(r' − r),    SmoothL1(x) = 0.5 x² if |x| < 1, |x| − 0.5 otherwise,
where r' and r respectively denote the point coordinates predicted by the lossless feature extraction network and the coordinates of the original point cloud, and φ(p) denotes the point cloud set of the whole original scene. Under the supervision of this loss function, the ability of the lossless feature extraction network PointNet++ to perceive local neighborhood information is further improved.
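A minimal PyTorch sketch of this supervision, assuming a simple linear head predicts the coordinates r' from the propagated point features (the head itself is an illustrative assumption):
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

coord_head = nn.Linear(128, 3)            # maps per-point features to predicted (x, y, z)

def coord_supervision_loss(point_feats: torch.Tensor, point_xyz: torch.Tensor):
    r_pred = coord_head(point_feats)      # r': coordinates predicted from the features
    # Smooth-L1: 0.5 x^2 if |x| < 1, |x| - 0.5 otherwise, averaged over the point set
    return F.smooth_l1_loss(r_pred, point_xyz, reduction="mean")

feats = torch.rand(4096, 128); xyz = torch.rand(4096, 3)
print(coord_supervision_loss(feats, xyz))
```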
Step 3, embed the local detail features and semantic features, extracted without information loss, into the voxel-division-based point cloud target detection network.
First the original point cloud is divided into voxels and the voxel features are initialized with the local detail features extracted in step 1; the point cloud spatial structure is then perceived through sparse 3D convolution, and the semantic features extracted in step 1 are fused at the semantic level. This comprises the following substeps:
Step 3.1, initialize the voxel features using the local detail features extracted in step 1.
First, the point cloud space is divided uniformly into a voxel grid, the voxels that contain points are retained, the voxels that contain no points are discarded, and the retained voxels are initialized with the local detail features obtained in step 1. Assume the output of the first layer of the encoder network in step 1 is the set {(P_i, F_i^P)}, where P_i denotes a point of the original point cloud space carrying a transferred feature, F_i^P is the feature of point P_i, and the encoder extracted local detail features for the N1 such points in step 1. Let {(V_j, F_j^V)}, j = 1, ..., M, denote the voxel centers, where V_j is a voxel center, F_j^V is the feature to be assigned to it, and M is the total number of voxel centers to be assigned. The voxel-center features are assigned by the trilinear interpolation function: let φ(V_j) denote the set of the k points closest to V_j in Euclidean space and P_t a point of φ(V_j); then F_j^V is computed as
F_j^V = Σ_{t=1}^{k} w_tj · F_t^P,    w_tj = (1 / d(V_j, P_t)) / Σ_{t'=1}^{k} 1 / d(V_j, P_t'),
where F_t^P denotes the feature of the t-th point P_t in the neighborhood of the voxel center V_j and w_tj denotes the interpolation weight of the t-th point P_t in that neighborhood.
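A minimal PyTorch sketch of this initialization, assuming an axis-aligned grid with an illustrative voxel size and inverse-distance weighting over the k nearest encoder points:
```python
import torch

def init_voxel_features(point_xyz, point_feats, voxel_size=0.1, k=3, eps=1e-8):
    # 1. assign every point to a voxel and keep only the non-empty voxels
    grid_idx = torch.floor(point_xyz / voxel_size).long()          # (N, 3) voxel indices
    occupied = torch.unique(grid_idx, dim=0)                       # (M, 3) retained voxels
    centers = (occupied.float() + 0.5) * voxel_size                # voxel centers V_j
    # 2. interpolate the encoder point features onto the voxel centers
    d = torch.cdist(centers, point_xyz)                            # (M, N)
    dist, idx = d.topk(k, largest=False)
    w = 1.0 / (dist + eps)
    w = w / w.sum(dim=1, keepdim=True)                             # interpolation weights w_tj
    feats = (w.unsqueeze(-1) * point_feats[idx]).sum(dim=1)        # (M, C) voxel features
    return occupied, centers, feats

xyz = torch.rand(4096, 3) * 10.0
f = torch.rand(4096, 64)
vox_idx, vox_ctr, vox_feat = init_voxel_features(xyz, f)
print(vox_idx.shape, vox_feat.shape)
```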
Step 3.2, perform feature extraction on the voxel scene semantic information initialized in step 3.1 using 3D sparse convolution.
Four sparse convolution modules are stacked using the Spconv library, where each sparse convolution module comprises two submanifold convolution layers and one point cloud sparse convolution layer with downsampling factor 2. Assuming the input voxel feature tensor is represented as L × W × H × C, where L, W, H and C respectively denote the length, width and height of the voxel scene and the feature dimension of each voxel, the output of the four sparse convolution modules can be represented as L' × W' × H' × C', where L', W' and H' are the downsampled spatial dimensions and C' denotes the feature dimension after feature extraction.
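The sketch below mirrors this backbone layout with dense PyTorch 3D convolutions so it runs without the Spconv dependency; in the actual network Spconv's submanifold and strided sparse convolutions would take the place of nn.Conv3d, and the channel widths and grid size shown are illustrative assumptions.
```python
import torch
import torch.nn as nn

def conv_block(in_c, out_c, stride):
    # stride 1 stands in for a submanifold convolution, stride 2 for the downsampling sparse convolution
    return nn.Sequential(nn.Conv3d(in_c, out_c, 3, stride=stride, padding=1),
                         nn.BatchNorm3d(out_c), nn.ReLU())

class VoxelBackbone(nn.Module):
    def __init__(self, c_in=64, widths=(64, 64, 128, 128)):
        super().__init__()
        blocks, c = [], c_in
        for w in widths:                   # four modules: two stride-1 convs + one stride-2 conv each
            blocks += [conv_block(c, w, 1), conv_block(w, w, 1), conv_block(w, w, 2)]
            c = w
        self.net = nn.Sequential(*blocks)

    def forward(self, voxels):             # voxels: (batch, C, H, L, W) dense voxel grid
        return self.net(voxels)

grid = torch.rand(1, 64, 16, 64, 64)       # illustrative voxel scene with C = 64
print(VoxelBackbone()(grid).shape)         # spatial dimensions shrink by 2 per module
```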
Step 3.3, convert the semantic features obtained in step 1 into voxel features using trilinear interpolation.
Assume the three levels of semantic features obtained in step 1 are represented as {F^P_{t,4×}}, {F^P_{t,8×}} and {F^P_{t,16×}}, where 4× denotes four-fold downsampling, and let V'_j denote a voxel center after sparse convolution with F'^V_j the feature to be assigned to it. The point semantic features are converted into the voxel-center representation by trilinear interpolation: let φ_{4×}(V'_j), φ_{8×}(V'_j) and φ_{16×}(V'_j) denote the sets of the k points of the respective levels closest to V'_j in Euclidean space, with points P_{t,4×}, P_{t,8×} and P_{t,16×}; then the features gathered at V'_j from the three levels, which together form F'^V_j, are computed as
F^V_{j,4×} = Σ_{t=1}^{k} w_{tj,4×} · F^P_{t,4×},
F^V_{j,8×} = Σ_{t=1}^{k} w_{tj,8×} · F^P_{t,8×},
F^V_{j,16×} = Σ_{t=1}^{k} w_{tj,16×} · F^P_{t,16×},
w_{tj,4×} = (1 / d(V'_j, P_{t,4×})) / Σ_{t'=1}^{k} 1 / d(V'_j, P_{t',4×})  (and analogously for 8× and 16×),
where V'_j denotes the voxel center after 3D sparse convolution, P_{t,4×}, P_{t,8×}, P_{t,16×} denote the spatial points used for the feature weighting, F^P_{t,4×} denotes the feature of the t-th neighborhood point of V'_j at the four-fold downsampled layer, and w_{tj,4×} denotes the corresponding interpolation weight of that point.
Step 3.4, fuse the semantic features perceived by sparse convolution in step 3.2 with the voxel features obtained by conversion in step 3.3 in an attention-mechanism manner to obtain semantic information that fuses the two perception modes.
First, the two kinds of semantic information are concatenated along the feature dimension. Assume the voxel feature obtained by conversion in step 3.3 has dimension M1 and the voxel feature obtained by sparse convolution perception has dimension M2; the concatenated voxel feature then has dimension M1 + M2, and a single multilayer perceptron layer maps the (M1 + M2)-dimensional feature back to M1 dimensions.
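A minimal PyTorch sketch of this fusion step; the sigmoid gate shown is only one simple way to read the "attention mechanism" wording, and the class name and dimensions are assumptions:
```python
import torch
import torch.nn as nn

class VoxelFusion(nn.Module):
    def __init__(self, m1: int, m2: int):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(m1 + m2, m1), nn.ReLU())
        self.gate = nn.Sequential(nn.Linear(m1 + m2, m1), nn.Sigmoid())

    def forward(self, conv_feats, interp_feats):
        # conv_feats: (M, M2) sparse-conv features, interp_feats: (M, M1) interpolated semantics
        x = torch.cat([interp_feats, conv_feats], dim=-1)    # concatenate to M1 + M2 channels
        return self.gate(x) * self.proj(x)                   # gated mapping back to M1 channels

fusion = VoxelFusion(m1=128, m2=128)
print(fusion(torch.rand(500, 128), torch.rand(500, 128)).shape)   # torch.Size([500, 128])
```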
Step 4, project the semantic features fused in step 3 onto a two-dimensional top view, build a region proposal network (RPN) with two-dimensional convolutions, and classify and regress the detection anchor frames preset for each pixel point in the top view of the scene to obtain the final detection targets. This comprises the following substeps:
step 4.1, RPN network structure and predefined box setting.
The RPN is built from a four-layer two-dimensional convolutional neural network with layer-wise outputs in a U-Net structure, and each layer employs 3 × 3 convolutions to reduce the number of learnable parameters. This encoding-decoding network structure further abstracts the fused features; a corresponding detection anchor frame is preset for every pixel point of the final feature map, and the preset detection anchor frames are classified and regressed to obtain the objects detected by the RPN. A three-dimensional detection anchor frame can be expressed as {x, y, z, l, w, h, r}, where (x, y, z) denotes the center position of the detection anchor frame, l, w and h correspond to the length, width and height respectively, and r is the rotation angle in the x-y plane. The voxel features after 3D sparse convolution and semantic-information fusion form a tensor of size L' × W' × H' × C'; compressing the height dimension into the feature dimension yields a two-dimensional feature map of size L' × W' × (H'·C'). Thus, for a feature map of size L' × W' with A predefined anchor frames per pixel, there are L' · W' · A predefined detection anchor frames in total.
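A minimal PyTorch sketch of laying out the predefined anchor frames on the top-view feature map; the anchor template size, the two yaw angles, the stride and the z center are illustrative assumptions:
```python
import math
import torch

def make_anchors(feat_h, feat_w, stride, z_center=-1.0):
    templates = torch.tensor([[3.9, 1.6, 1.56]])            # one car-sized (l, w, h) template
    angles = [0.0, math.pi / 2]                             # two yaw orientations per pixel
    ys, xs = torch.meshgrid(torch.arange(feat_h), torch.arange(feat_w), indexing="ij")
    centers = torch.stack([(xs + 0.5) * stride, (ys + 0.5) * stride], dim=-1)   # (H, W, 2)
    anchors = []
    for l, w, h in templates:
        for r in angles:
            box = torch.cat([centers,
                             torch.full((feat_h, feat_w, 1), z_center),
                             torch.stack([l, w, h]).expand(feat_h, feat_w, 3),
                             torch.full((feat_h, feat_w, 1), r)], dim=-1)       # {x, y, z, l, w, h, r}
            anchors.append(box)
    return torch.stack(anchors, dim=2).reshape(-1, 7)       # (H * W * A, 7) anchors in total

print(make_anchors(200, 176, stride=0.4).shape)             # 200 * 176 * 2 = 70400 anchors
```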
Step 4.2, design the point cloud target detection loss function.
The preset detection anchor frames are classified and regressed under a classification loss function and a regression loss function so as to obtain the objects detected by the RPN.
The classification loss function L_cls uses the cross-entropy loss:
L_cls = -(1/n) Σ_{i=1}^{n} [ Q(a_i) log P(a_i) + (1 − Q(a_i)) log(1 − P(a_i)) ],
where n denotes the number of preset detection anchor frames, P(a_i) denotes the predicted score of the i-th detection anchor frame and Q(a_i) denotes the true label value of that anchor frame.
The regression loss function L_reg uses the Smooth-L1 loss:
L_reg = (1/n) Σ_{i=1}^{n} SmoothL1(v'_i − v_i),    SmoothL1(x) = 0.5 x² if |x| < 1, |x| − 0.5 otherwise,
where n denotes the number of preset detection anchor frames, v denotes the true value of a detection anchor frame, and v' denotes the anchor-frame value predicted by the RPN.
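A minimal PyTorch sketch of the joint supervision, assuming binary anchor labels, regression restricted to the positive anchors and a simple loss weighting (all illustrative assumptions):
```python
import torch
import torch.nn.functional as F

def detection_loss(cls_scores, cls_labels, box_preds, box_targets, pos_mask, beta=2.0):
    # cls_scores: (n,) predicted scores P(a_i); cls_labels: (n,) true labels Q(a_i)
    l_cls = F.binary_cross_entropy_with_logits(cls_scores, cls_labels)
    # regress only the anchors matched to a ground-truth box
    l_reg = F.smooth_l1_loss(box_preds[pos_mask], box_targets[pos_mask])
    return l_cls + beta * l_reg

n = 70400
scores = torch.randn(n); labels = (torch.rand(n) > 0.99).float()
boxes = torch.randn(n, 7); targets = torch.randn(n, 7)
print(detection_loss(scores, labels, boxes, targets, labels.bool()))
```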
Through the combined supervision of the classification loss function and the regression loss function, the network can finally learn the capability of detecting the point cloud target.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.

Claims (10)

1. A point cloud target detection method fusing an original point cloud and voxel division is characterized by comprising the following steps:
step 1, extracting local detail features and semantic features of the point cloud by using the lossless feature extraction network PointNet++;
step 2, constructing a loss function to supervise the feature extraction in step 1 and improve the ability of the lossless feature extraction network PointNet++ to perceive feature information;
step 3, embedding the local detail features and semantic features without information loss into a point cloud target detection network based on voxel division;
step 3.1, initializing voxel characteristics by using the local detail characteristics extracted in the step 1;
step 3.2, performing feature extraction on the voxel scene semantic information initialized in the step 3.1 by using 3D sparse convolution;
step 3.3, converting the semantic features obtained in the step 1 into voxel features by adopting trilinear interpolation;
step 3.4, fusing the semantic features subjected to sparse convolution sensing in the step 3.2 with the voxel features obtained by conversion in the step 3.3 by adopting an attention mechanism mode to obtain semantic information fusing two sensing modes;
and step 4, projecting the semantic features fused in step 3 onto a two-dimensional top view, building a region proposal network (RPN) with two-dimensional convolutions, and classifying and regressing the detection anchor frames preset for each pixel point in the top view of the scene to obtain the final detection targets.
2. The point cloud target detection method fusing an original point cloud and voxel division according to claim 1, wherein step 1 comprises the following substeps:
step 1.1, constructing a multilayer encoder;
collecting N points from the original point cloud as the input point cloud using a farthest point sampling strategy, and then using the farthest point sampling strategy to sample point sets of sizes N1 > N2 > N3 > N4 layer by layer from the input point cloud data; these four point sets form a 4-layer encoder, and the point cloud input to each layer is the point set output by the previous layer;
step 1.2, extracting the local detail features and semantic features of each layer of point cloud through a local point set feature extraction module without information loss;
and step 1.3, assigning the detail features and semantic features extracted in step 1.2 to all points of the original scene through a feature transfer layer using trilinear interpolation.
3. The point cloud target detection method fusing an original point cloud and voxel division according to claim 2, wherein the input of each layer's local point set feature extraction module in step 1.2 is the fixed-size point set obtained by farthest point sampling in the previous layer; let p_i be the i-th point sampled by the farthest point sampling strategy in the current layer and Φ_r(p_i) the set of points of the previous layer within the spherical neighborhood of radius r centered at p_i; the output feature of point p_i is computed in the following steps:
step 1.2.1, randomly sampling K points from the set Φ_r(p_i) to form a subset Φ_K(p_i);
step 1.2.2, performing feature fusion and extraction on the points sampled in step 1.2.1 with a multilayer perceptron, using the formula
f(p_i) = max_{p_j ∈ Φ_K(p_i)} MLP(f(p_j)),
where MLP denotes the high-dimensional mapping of the point features by the multilayer perceptron, max() denotes taking the maximum over the point set along the feature dimension, and f(p_i) is the output feature of point p_i;
step 1.2.3, repeating the farthest point sampling on each layer's input point cloud to sample the corresponding number of points and aggregating neighborhood features onto the sampled points via step 1.2.2, thereby completing feature extraction without information loss, wherein the first layer extracts local detail features and the last three layers extract semantic features.
4. The point cloud target detection method fusing an original point cloud and voxel division according to claim 2, wherein the feature transfer in step 1.3 is the reverse process of feature extraction: starting from the last extraction layer, features are transferred to the next shallower layer in turn, i.e. from the N4 layer to the N3 layer, from the N3 layer to the N2 layer, from the N2 layer to the N1 layer, and from the N1 layer to the N input points; taking the transfer from the N4 layer to the N3 layer as an example, assume p_i is a point of the N3 layer that needs to receive a feature, φ(p_i) denotes the set of the k points of the N4 layer closest to p_i in Euclidean space, and p_j denotes a point of φ(p_i); the trilinear-interpolation feature transfer is computed as
f(p_i) = Σ_{j=1}^{k} w_ij · f(p_j),    w_ij = (1 / d(p_i, p_j)) / Σ_{j'=1}^{k} 1 / d(p_i, p_j'),
where f(p_i) is the feature to be transferred, f(p_j) denotes the feature of the j-th point p_j in the neighborhood of p_i, and w_ij denotes the interpolation weight of the j-th point p_j in the neighborhood of p_i.
5. The point cloud target detection method fusing an original point cloud and voxel division according to claim 1, wherein in step 2 the point cloud coordinates of the original scene are used as the point cloud supervision information and the Smooth-L1 loss is used as the loss function:
L_coord = (1/|φ(p)|) Σ_{p ∈ φ(p)} SmoothL1(r' − r),    SmoothL1(x) = 0.5 x² if |x| < 1, |x| − 0.5 otherwise,
where r' and r respectively denote the point coordinates predicted by the lossless feature extraction network and the coordinates of the original point cloud, and φ(p) denotes the point cloud set of the whole original scene; under the supervision of this loss function, the ability of the lossless feature extraction network PointNet++ to perceive local neighborhood information is further improved.
6. The point cloud target detection method fusing an original point cloud and voxel division according to claim 1, wherein the initialization in step 3.1 divides the point cloud space into a voxel grid, retains the voxels containing points and discards the voxels containing no points, and then initializes the retained voxels with the local detail features obtained in step 1; let φ(V_j) denote the set of the k points closest to the voxel center V_j in Euclidean space and P_t a point of φ(V_j); the voxel-center feature F_j^V is then computed as
F_j^V = Σ_{t=1}^{k} w_tj · F_t^P,    w_tj = (1 / d(V_j, P_t)) / Σ_{t'=1}^{k} 1 / d(V_j, P_t'),
where F_t^P denotes the feature of the t-th point P_t in the neighborhood of the voxel center V_j and w_tj denotes the interpolation weight of the t-th point P_t in that neighborhood.
7. The point cloud target detection method fusing an original point cloud and voxel division according to claim 1, wherein step 3.2 stacks four sparse convolution modules using the Spconv library, each sparse convolution module comprising two submanifold convolution layers and one point cloud sparse convolution layer with downsampling factor 2; if the input voxel feature tensor is represented as L × W × H × C, where L, W, H and C respectively denote the length, width and height of the voxel scene and the feature dimension of each voxel, then the output of the four sparse convolution modules can be expressed as L' × W' × H' × C', where L', W' and H' are the downsampled spatial dimensions and C' denotes the feature dimension after feature extraction.
8. The point cloud target detection method fusing an original point cloud and voxel division according to claim 1, wherein in step 3.3 the three levels of semantic features extracted in step 1 are assumed to be represented as {F^P_{t,4×}}, {F^P_{t,8×}} and {F^P_{t,16×}}, where 4× denotes four-fold downsampling; let φ_{4×}(V'_j), φ_{8×}(V'_j) and φ_{16×}(V'_j) denote the sets of the k points of the respective levels closest in Euclidean space to the voxel center V'_j obtained after sparse convolution, with points P_{t,4×}, P_{t,8×} and P_{t,16×}; the sparse-convolved voxel-center features gathered from the three levels, which together form F'^V_j, are then computed as
F^V_{j,4×} = Σ_{t=1}^{k} w_{tj,4×} · F^P_{t,4×},
F^V_{j,8×} = Σ_{t=1}^{k} w_{tj,8×} · F^P_{t,8×},
F^V_{j,16×} = Σ_{t=1}^{k} w_{tj,16×} · F^P_{t,16×},
w_{tj,4×} = (1 / d(V'_j, P_{t,4×})) / Σ_{t'=1}^{k} 1 / d(V'_j, P_{t',4×})  (and analogously for 8× and 16×),
where V'_j denotes the voxel center after 3D sparse convolution, P_{t,4×}, P_{t,8×}, P_{t,16×} denote the spatial points used for the feature weighting, F^P_{t,4×} denotes the feature of the t-th neighborhood point of V'_j at the four-fold downsampled layer, and w_{tj,4×} denotes the corresponding interpolation weight of that point.
9. The point cloud target detection method fusing an original point cloud and voxel division according to claim 1, wherein step 4 comprises the following two substeps:
step 4.1, setting the RPN network structure and the predefined detection anchor frames; the RPN is built from a four-layer two-dimensional convolutional neural network with layer-wise outputs in a U-Net structure, each layer employs 3 × 3 convolutions to reduce the learning parameters, and the encoding-decoding network structure further abstracts the fused features; a corresponding detection anchor frame is preset for every pixel point on the final feature map, and the objects detected by the RPN are obtained by classifying and regressing the preset detection anchor frames; a three-dimensional detection anchor frame can be expressed as {x, y, z, l, w, h, r}, where (x, y, z) denotes the center position of the detection anchor frame, l, w and h correspond to the length, width and height respectively, and r is the rotation angle in the x-y plane; the voxel features after 3D sparse convolution and semantic-information fusion form a tensor of size L' × W' × H' × C', and compressing the height dimension into the feature dimension yields a two-dimensional feature map of size L' × W' × (H'·C'); thus, for a feature map of size L' × W' with A predefined anchor frames per pixel, there are L' · W' · A predefined detection anchor frames in total;
and step 4.2, designing the point cloud target detection loss function.
10. The point cloud target detection method fusing an original point cloud and voxel division according to claim 9, wherein the point cloud target detection loss function in step 4.2 comprises a classification loss function and a regression loss function; the classification loss function L_cls uses the cross-entropy loss:
L_cls = -(1/n) Σ_{i=1}^{n} [ Q(a_i) log P(a_i) + (1 − Q(a_i)) log(1 − P(a_i)) ],
where n denotes the number of preset detection anchor frames, P(a_i) denotes the predicted score of the i-th detection anchor frame and Q(a_i) denotes the true label value of the detection anchor frame;
the regression loss function L_reg uses the Smooth-L1 loss:
L_reg = (1/n) Σ_{i=1}^{n} SmoothL1(v'_i − v_i),    SmoothL1(x) = 0.5 x² if |x| < 1, |x| − 0.5 otherwise,
where n denotes the number of preset detection anchor frames, v denotes the true value of a detection anchor frame, and v' denotes the anchor-frame value predicted by the RPN.
CN202110651776.XA 2021-06-11 2021-06-11 Point cloud target detection method integrating original point cloud and voxel division Pending CN113378854A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110651776.XA CN113378854A (en) 2021-06-11 2021-06-11 Point cloud target detection method integrating original point cloud and voxel division

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110651776.XA CN113378854A (en) 2021-06-11 2021-06-11 Point cloud target detection method integrating original point cloud and voxel division

Publications (1)

Publication Number Publication Date
CN113378854A true CN113378854A (en) 2021-09-10

Family

ID=77573977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110651776.XA Pending CN113378854A (en) 2021-06-11 2021-06-11 Point cloud target detection method integrating original point cloud and voxel division

Country Status (1)

Country Link
CN (1) CN113378854A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109118564A (en) * 2018-08-01 2019-01-01 湖南拓视觉信息技术有限公司 A kind of three-dimensional point cloud labeling method and device based on fusion voxel
CN111160214A (en) * 2019-12-25 2020-05-15 电子科技大学 3D target detection method based on data fusion
CN112052860A (en) * 2020-09-11 2020-12-08 中国人民解放军国防科技大学 Three-dimensional target detection method and system
CN112418084A (en) * 2020-11-23 2021-02-26 同济大学 Three-dimensional target detection method based on point cloud time sequence information fusion

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHARLES R. QI ET AL.: "PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space", 《ADVANCES 31ST CONFERENCE ON NEURAL INFORMATION PROCESSING SYSTEMS (NIPS 2017)》 *
TIANYUAN JIANG,ET AL.: "VIC-Net: Voxelization Information Compensation Network for Point Cloud 3D Object Detection", 《2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021)》 *

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113900119B (en) * 2021-09-29 2024-01-30 苏州浪潮智能科技有限公司 Method, system, storage medium and equipment for laser radar vehicle detection
CN113900119A (en) * 2021-09-29 2022-01-07 苏州浪潮智能科技有限公司 Laser radar vehicle detection method, system, storage medium and equipment
CN114155524A (en) * 2021-10-29 2022-03-08 中国科学院信息工程研究所 Single-stage 3D point cloud target detection method and device, computer equipment and medium
CN114120115A (en) * 2021-11-19 2022-03-01 东南大学 Point cloud target detection method for fusing point features and grid features
CN114120115B (en) * 2021-11-19 2024-08-23 东南大学 Point cloud target detection method integrating point features and grid features
CN114463736A (en) * 2021-12-28 2022-05-10 天津大学 Multi-target detection method and device based on multi-mode information fusion
CN114494183A (en) * 2022-01-25 2022-05-13 哈尔滨医科大学附属第一医院 Artificial intelligence-based automatic acetabular radius measurement method and system
CN114494183B (en) * 2022-01-25 2024-04-02 哈尔滨医科大学附属第一医院 Automatic acetabular radius measurement method and system based on artificial intelligence
CN114638953A (en) * 2022-02-22 2022-06-17 深圳元戎启行科技有限公司 Point cloud data segmentation method and device and computer readable storage medium
CN114638953B (en) * 2022-02-22 2023-12-22 深圳元戎启行科技有限公司 Point cloud data segmentation method and device and computer readable storage medium
CN114821033A (en) * 2022-03-23 2022-07-29 西安电子科技大学 Three-dimensional information enhanced detection and identification method and device based on laser point cloud
CN114882495A (en) * 2022-04-02 2022-08-09 华南理工大学 3D target detection method based on context-aware feature aggregation
CN114882495B (en) * 2022-04-02 2024-04-12 华南理工大学 3D target detection method based on context-aware feature aggregation
CN115222988A (en) * 2022-07-17 2022-10-21 桂林理工大学 Laser radar point cloud data urban ground feature PointEFF fine classification method
CN115375731A (en) * 2022-07-29 2022-11-22 大连宗益科技发展有限公司 3D point cloud single-target tracking method of associated points and voxels and related device
CN115471513A (en) * 2022-11-01 2022-12-13 小米汽车科技有限公司 Point cloud segmentation method and device
CN116664874B (en) * 2023-08-02 2023-10-20 安徽大学 Single-stage fine-granularity light-weight point cloud 3D target detection system and method
CN116664874A (en) * 2023-08-02 2023-08-29 安徽大学 Single-stage fine-granularity light-weight point cloud 3D target detection system and method
CN117058402B (en) * 2023-08-15 2024-03-12 北京学图灵教育科技有限公司 Real-time point cloud segmentation method and device based on 3D sparse convolution
CN117058402A (en) * 2023-08-15 2023-11-14 北京学图灵教育科技有限公司 Real-time point cloud segmentation method and device based on 3D sparse convolution
CN117475410B (en) * 2023-12-27 2024-03-15 山东海润数聚科技有限公司 Three-dimensional target detection method, system, equipment and medium based on foreground point screening
CN117475410A (en) * 2023-12-27 2024-01-30 山东海润数聚科技有限公司 Three-dimensional target detection method, system, equipment and medium based on foreground point screening

Similar Documents

Publication Publication Date Title
CN113378854A (en) Point cloud target detection method integrating original point cloud and voxel division
Zamanakos et al. A comprehensive survey of LIDAR-based 3D object detection methods with deep learning for autonomous driving
CN112529015B (en) Three-dimensional point cloud processing method, device and equipment based on geometric unwrapping
CN109410307B (en) Scene point cloud semantic segmentation method
CN114937151B (en) Lightweight target detection method based on multiple receptive fields and attention feature pyramid
Ye et al. 3d recurrent neural networks with context fusion for point cloud semantic segmentation
CN111242041B (en) Laser radar three-dimensional target rapid detection method based on pseudo-image technology
CN112488210A (en) Three-dimensional point cloud automatic classification method based on graph convolution neural network
CN113850270B (en) Semantic scene completion method and system based on point cloud-voxel aggregation network model
CN114255238A (en) Three-dimensional point cloud scene segmentation method and system fusing image features
CN113345082A (en) Characteristic pyramid multi-view three-dimensional reconstruction method and system
CN112347987A (en) Multimode data fusion three-dimensional target detection method
Cheng et al. S3Net: 3D LiDAR sparse semantic segmentation network
CN112560865B (en) Semantic segmentation method for point cloud under outdoor large scene
CN113870160B (en) Point cloud data processing method based on transformer neural network
CN114373104A (en) Three-dimensional point cloud semantic segmentation method and system based on dynamic aggregation
Ahmad et al. 3D capsule networks for object classification from 3D model data
CN115147601A (en) Urban street point cloud semantic segmentation method based on self-attention global feature enhancement
CN112488117B (en) Point cloud analysis method based on direction-induced convolution
Hazer et al. Deep learning based point cloud processing techniques
CN117765258A (en) Large-scale point cloud semantic segmentation method based on density self-adaption and attention mechanism
CN112132207A (en) Target detection neural network construction method based on multi-branch feature mapping
CN111860668A (en) Point cloud identification method of deep convolution network for original 3D point cloud processing
CN116894940A (en) Point cloud semantic segmentation method based on feature fusion and attention mechanism
CN115424225A (en) Three-dimensional real-time target detection method for automatic driving system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20210910