CN116385452A - LiDAR point cloud panorama segmentation method based on polar coordinate BEV graph - Google Patents
LiDAR point cloud panorama segmentation method based on polar coordinate BEV graph
- Publication number
- CN116385452A CN116385452A CN202310273933.7A CN202310273933A CN116385452A CN 116385452 A CN116385452 A CN 116385452A CN 202310273933 A CN202310273933 A CN 202310273933A CN 116385452 A CN116385452 A CN 116385452A
- Authority
- CN
- China
- Prior art keywords
- bev
- segmentation
- point cloud
- instance
- semantic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/00—Image analysis; G06T7/10—Segmentation; Edge detection
- G06N3/02—Neural networks; G06N3/045—Combinations of networks; G06N3/0455—Auto-encoder networks; Encoder-decoder networks; G06N3/0464—Convolutional networks [CNN, ConvNet]; G06N3/048—Activation functions
- G06T5/50—Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
- G06V10/26—Segmentation of patterns in the image field; G06V10/82—Image or video recognition or understanding using neural networks
- G06T2207/10028—Range image; Depth image; 3D point clouds; G06T2207/20084—Artificial neural networks [ANN]; G06T2207/20221—Image fusion; Image merging
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change
Abstract
The invention provides a LiDAR point cloud panorama segmentation method based on a polar coordinate BEV map, comprising the steps of polar coordinate BEV encoding, semantic/instance segmentation prediction, and point cloud panorama segmentation fusion. The polar coordinate BEV encoding encodes the original point cloud data into a fixed-size 2D BEV representation under polar coordinates; the semantic/instance segmentation prediction feeds the encoded point cloud feature matrix through a reference depth network to generate an independent 3D semantic prediction, a 2D BEV center heat map, and a 2D instance center offset; the point cloud panorama segmentation fusion first generates a 2D BEV things mask from the 3D semantic segmentation result, combines it with the 2D BEV center heat map and the instance center offsets to form class-agnostic instance clusters, and merges these with the 3D semantic segmentation prediction to form the final panorama segmentation result.
Description
Technical Field
The invention relates to the technical field of digital image processing, in particular to a LiDAR point cloud panorama segmentation method based on a polar coordinate BEV graph.
Background
Image segmentation for video analysis plays an important role in research fields such as smart cities, medical care, computer vision, and remote sensing. Panorama (panoptic) segmentation fuses semantic segmentation and instance segmentation; it helps obtain finer knowledge of image scenes in applications such as video surveillance, crowd counting, autonomous driving, and medical image analysis, and a deeper understanding of general scenes. With the introduction of LiDAR point cloud datasets, the nature of 3D data, real-time processing requirements, and the level of accuracy required for safety and security (e.g., in an autonomous car) present new challenges to panorama segmentation. The goal is to effectively resolve panorama segmentation with minimal prediction conflicts (instances and classes) and achieve real-time or near real-time speeds without sacrificing accuracy.
Some researchers have explored indoor point cloud panorama segmentation methods that combine instance segmentation and semantic segmentation. Liu et al, in the paper "Self-prediction for joint instance and semantic segmentation of point clouds" (In ECCV, 2020), proposed using a discriminative loss to learn an embedded feature space in which to cluster instances; Zhou et al, in the paper "Joint 3d instance segmentation and object detection for autonomous driving" (In CVPR, 2020), proposed extracting instance partitions from region proposals for semantic partition clusters; Hurtado et al, in the paper "MOPT: Multi-object panoptic tracking" (In CVPR Workshops, 2020), proposed the MOPT model, appending a semantic head to Mask R-CNN to generate panoramic segmentations on range images; Milioto et al, in the paper "Lidar panoptic segmentation for autonomous driving" (In IROS, 2020), proposed first resolving LiDAR point cloud panorama segmentation on the range image and then restoring it to the point cloud level by trilinear upsampling; Zhou et al, in the paper "Panoptic-PolarNet: Proposal-free LiDAR Point Cloud Panoptic Segmentation" (In CVPR, 2021), proposed a fast, robust LiDAR point cloud panorama segmentation framework (Panoptic-PolarNet) using a polar Bird's Eye View (BEV) representation that learns semantic segmentation and class-agnostic instance clustering in a single inference network, which can circumvent the occlusion problem between instances in urban street scenes, together with an instance augmentation technique and a novel adversarial point cloud pruning method to improve the network's learning ability.
The invention patent application with the application number CN113379748A discloses a point cloud panorama segmentation method and device, comprising the following steps: a point cloud mapping step, projecting the acquired point cloud to a world coordinate system to obtain a mapping point cloud; a video frame association step, projecting each point of the mapping point cloud into a projectable video frame; and a panorama segmentation step, performing panorama segmentation on the projectable video frames so as to uniformly number the semantic identification probability of each point. The disadvantage of this method is that, although panorama segmentation is possible, the segmentation speed is relatively slow and the accuracy is not high.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a LiDAR point cloud panorama segmentation method based on a polar coordinate BEV map, comprising the steps of polar coordinate BEV encoding, semantic/instance segmentation prediction, and point cloud panorama segmentation fusion. The polar coordinate BEV encoding encodes the original point cloud data into a fixed-size 2D BEV representation under polar coordinates; the semantic/instance segmentation prediction feeds the encoded point cloud feature matrix through a reference depth network to generate an independent 3D semantic prediction, a 2D BEV center heat map, and a 2D instance center offset; the point cloud panorama segmentation fusion first generates a 2D BEV things mask from the 3D semantic segmentation result, combines it with the 2D BEV center heat map and the instance center offsets to form class-agnostic instance clusters, and merges these with the 3D semantic segmentation prediction to form the final panorama segmentation result. The invention can improve the accuracy and robustness of panorama segmentation and realize a real-time or near real-time segmentation speed.
The invention provides a LiDAR point cloud panorama segmentation method based on a polar coordinate BEV graph, which comprises the steps of obtaining original point cloud data containing points with random sizes, and further comprises the following steps:
step 1: performing polar coordinate BEV coding on the original point cloud data;
step 2: given a LiDAR point cloud space, carrying out semantic/instance segmentation prediction on the BEV codes with fixed sizes;
step 3: and carrying out panorama segmentation fusion on the semantic/instance segmentation prediction result to form a 3D panorama segmentation result.
Preferably, the polar BEV encoding means that the original point cloud data is processed by creating a fixed-size representation through projection and quantization, with the polar representation balancing the distribution of points across different ranges.
In any of the above schemes, preferably, the step 1 includes the following substeps:
step 11: grouping the original point cloud data according to the position of the BEV graph in polar coordinates;
step 12: performing block coding on the point cloud using PolarNet;
step 13: loading a max-pooling layer on each BEV grid, creating a fixed-size BEV code F ∈ ℝ^{H×W×C}, where ℝ denotes real space, H and W are the grid sizes of the BEV map, and C is the feature channel.
In any of the above aspects, preferably, the step 11 includes grouping the point cloud data P ∈ ℝ^{N×D} into G ∈ ℝ^{H×W×N*×D}, where D is the input feature dimension, N is the number of points in the cloud, and N* is the number of points in each BEV grid.
In any of the above embodiments, preferably, the step 12 includes encoding the point cloud groups G with a shared multi-layer perceptron (MLP) using the PolarNet network.
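The grouping and max-pooling encoding of steps 11 to 13 can be sketched as follows. This is a minimal NumPy illustration: the grid sizes, the range limit r_max, and the per-point features (standing in for the shared-MLP output) are illustrative assumptions, not the patent's exact PolarNet configuration.

```python
import numpy as np

def polar_bev_encode(points, feats, H=480, W=360, r_max=50.0):
    """Group points into a polar BEV grid and max-pool per cell.

    points: (N, 3) xyz coordinates; feats: (N, C) per-point features
    (e.g. the output of a shared MLP). H (radial bins), W (angular
    bins) and r_max are illustrative choices.
    """
    r = np.hypot(points[:, 0], points[:, 1])            # radial distance
    theta = np.arctan2(points[:, 1], points[:, 0])      # azimuth in [-pi, pi]
    ri = np.clip((r / r_max * H).astype(int), 0, H - 1)
    ti = np.clip(((theta + np.pi) / (2 * np.pi) * W).astype(int), 0, W - 1)

    C = feats.shape[1]
    bev = np.full((H, W, C), -np.inf)                   # fixed-size BEV code
    cell = ri * W + ti                                  # flat cell index per point
    for idx in np.unique(cell):                         # max-pool each occupied cell
        mask = cell == idx
        bev[idx // W, idx % W] = feats[mask].max(axis=0)
    bev[bev == -np.inf] = 0.0                           # empty cells -> 0
    return bev
```

The max-pool makes the output independent of the (random) number of points per cell, which is what yields the fixed-size BEV code F ∈ ℝ^{H×W×C}.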
In any of the above schemes, preferably, the step 2 includes the following substeps:
step 21: traversing all points in the LiDAR point cloud and calculating the visibility of the whole 3D space, i.e., under the polar coordinate system, for each point (x, y, z), all points α·(x, y, z) along the same direction satisfying 0 < α < 1 are included in the visibility space;
step 22: constructing a reference depth network by taking U-Net as the basic framework, comprising a depth network model with 4 encoding layers and 4 decoding layers;
step 23: connecting the visibility feature with a feature representation generated by a polar BEV encoder, inputting the visibility feature into the reference depth network, and generating a 2D instance header and a 3D semantic header;
step 24: processing the 2D instance header;
step 25: and processing the 3D semantic header.
In any of the above-described aspects, it is preferred that each layer of the encoding portion of the reference depth network consists of a 3×3 convolution, batch normalization, a rectified linear unit, and a max-pooling operation; each layer of the decoding portion consists of an up-sampling convolution, attention-gate-based feature concatenation, and a 3×3 convolution; the last layer in FCN-1 uses the sigmoid function as the activation function to normalize the output to a probability map in [0, 1].
Preferably in any of the above schemes, the step 24 further comprises predicting, with the 2D instance head, a center heat map for each BEV pixel and an offset to the center of the object; grouping pixels having the same nearest center into the same group; providing class-independent instance groupings using a bottom-up approach; and, without marking bounding boxes, encoding the ground-truth center map for training with a two-dimensional Gaussian distribution centered on each instance centroid.
In any of the above schemes, preferably, each pixel in the BEV map is denoted p, and its center heat map H_p is expressed as:
H_p = max_i exp(−‖p − C_i‖² / (2σ²))
where C_i is the centroid of the i-th instance in the polar BEV and σ is the standard deviation of the Gaussian.
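The Gaussian center-heat-map encoding can be sketched as follows; it assumes the max-merged Gaussian form H_p = max_i exp(−‖p − C_i‖² / (2σ²)), with σ an illustrative choice:

```python
import numpy as np

def render_center_heatmap(centroids, H, W, sigma=2.0):
    """Encode instance centroids as a 2D Gaussian center heat map.

    For each BEV pixel p, H_p = max_i exp(-||p - C_i||^2 / (2*sigma^2)),
    so every value lies in [0, 1] and equals 1 exactly at a centroid.
    """
    ys, xs = np.mgrid[0:H, 0:W]
    heat = np.zeros((H, W))
    for cy, cx in centroids:
        g = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
        heat = np.maximum(heat, g)   # overlapping Gaussians: keep the max
    return heat
```

Taking the per-pixel maximum (rather than the sum) keeps the target bounded in [0, 1], which matches training the heat map against a sigmoid output.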
In any of the above schemes, preferably, the step 25 further includes sharing the first 3 decoding layers with the instance segmentation, generating multiple predictions at each pixel and recombining them into 3D voxels to separate labels at different heights along the Z-axis, computing a voxel-level loss over the points within the same voxel using a voting algorithm, and generating the 3D semantic segmentation prediction.
In any of the above schemes, preferably, the step 3 includes the following substeps:
step 31: selecting the first k centers from the 2D BEV center heat map by a non-maximum suppression operation;
step 32: creating a 2D BEV foreground mask using the 3D semantic segmentation prediction, while ensuring that at least one things class can be detected for each BEV pixel;
step 33: calculating, for each foreground pixel p, the minimum distance d(p, c_i) to the k instance centroids c_i (i = 1, 2, …, k), and grouping the pixels accordingly;
step 34: predicting the things class in the semantic segmentation head using majority voting based on the semantic segmentation probabilities, designating a unique instance label L for each group G_i in the BEV;
step 35: and fusing the generated class agnostic instance cluster with 3D semantic segmentation prediction, and finally outputting a 3D panoramic segmentation result through a majority voting mechanism.
In any of the above embodiments, it is preferable that the minimum distance d(p, c_i) is expressed as:
d(p, c_i) = ‖p + offset(p) − c_i‖
where offset(p) is the center offset of pixel p.
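The nearest-center grouping defined by d(p, c_i) = ‖p + offset(p) − c_i‖ can be sketched as follows (a minimal vectorized illustration; the pixel coordinates, offsets, and centers would come from the instance head):

```python
import numpy as np

def group_foreground_pixels(pixels, offsets, centers):
    """Assign each foreground BEV pixel to its nearest predicted center.

    pixels: (M, 2) pixel coordinates p; offsets: (M, 2) predicted
    offset(p); centers: (k, 2) instance centroids c_i. Each pixel is
    grouped with the center minimizing ||p + offset(p) - c_i||.
    """
    shifted = pixels + offsets                       # p + offset(p)
    d = np.linalg.norm(shifted[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)                          # index of nearest center
```

Because each pixel first moves by its predicted offset, pixels belonging to the same object collapse toward one centroid before the distance test, which is what makes the grouping class-agnostic.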
In any of the above aspects, preferably, the instance label is obtained from the semantic segmentation probabilities as:
L(G_i) = argmax_c Σ_{p∈G_i} Pr(c | p)
where Pr(c | p) is the predicted probability that pixel p belongs to class c.
the invention provides a LiDAR point cloud panorama segmentation method based on a polar coordinate BEV map, which simultaneously learns semantic and instance characteristics on a discretized BEV map, rapidly and robustly realizes the point cloud panorama segmentation based on LiDAR, effectively solves panorama segmentation with minimum conflict between predicted instance and class, and realizes real-time or near real-time speed under the condition of not affecting accuracy.
BEV: birds eye view, i.e., a bird's eye view.
LiDAR: is a system integrating laser, global positioning system and inertial navigation system.
Polar net network: is a lightweight neural network used for realizing real-time on-line semantic segmentation for single laser radar scanning data.
The thins class: i.e. class of things.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of a LiDAR point cloud panorama segmentation method based on polar BEV graphs according to the present invention.
FIG. 2 is a flow chart of another preferred embodiment of a LiDAR point cloud panorama segmentation method based on polar BEV graphs according to the present invention.
FIG. 3 is a schematic diagram of one embodiment of a reference network model of a LiDAR point cloud panorama segmentation method based on polar BEV graphs, according to the present invention.
FIG. 4 is a semantic/instance segmentation visualization on the COCO dataset for one embodiment of a LiDAR point cloud panorama segmentation method based on polar BEV graphs according to the present invention.
FIG. 5 is a schematic view of a panoramic segmentation visualization on a Semantic KITTI dataset of an embodiment of a LiDAR point cloud panoramic segmentation method based on polar BEV graphs in accordance with the present invention.
Detailed Description
The invention is further illustrated by the following figures and specific examples.
Example 1
As shown in fig. 1, step 100 is performed, including obtaining original point cloud data comprising points of random size.
Step 111 is executed to group the original point cloud data according to its position in the polar BEV map, grouping the point cloud data P ∈ ℝ^{N×D} into G ∈ ℝ^{H×W×N*×D}, where ℝ denotes real space, D is the dimension of the input feature, N is the number of points, and N* is the number of points in each BEV grid.
Step 112 is executed to encode the point cloud groups G with a shared multi-layer perceptron (MLP) using the PolarNet network.
Step 113 is performed, loading a max-pooling layer on each BEV grid to create a fixed-size BEV code F ∈ ℝ^{H×W×C}, where H and W are the grid sizes of the BEV map and C is the feature channel.
Step 120 is performed to carry out semantic/instance segmentation prediction on the fixed-size BEV code: given a LiDAR point cloud space, segmentation prediction of semantics and instances is realized for all points P ∈ ℝ^{N×D}. This step includes the following sub-steps:
step 121 is performed to traverse all points in the LiDAR point cloud, and calculate the visibility of the entire 3D space, i.e., under a polar coordinate system, all points (x, y, z) along the same direction α (x, y, z) and satisfying 0< α <1 are included in the visibility space.
Step 122 is performed to construct a reference depth network with U-Net as the basic framework, comprising a depth network model with 4 encoding layers and 4 decoding layers.
Step 123 is performed to connect the visibility features with the feature representation generated by the polar BEV encoder, input them into the reference depth network, and generate a 2D instance head and a 3D semantic head. Each layer of the encoding portion of the reference depth network consists of a 3×3 convolution, batch normalization, a rectified linear unit, and a max-pooling operation; each layer of the decoding portion consists of an up-sampling convolution, attention-gate-based feature concatenation, and a 3×3 convolution; the last layer in FCN-1 uses the sigmoid function as the activation function to normalize the output to a probability map in [0, 1].
Step 124 is performed to process the 2D instance head: the center heat map of each BEV pixel and the offset to the center of the object are predicted with the 2D instance head; pixels with the same nearest center are grouped together; a bottom-up method provides class-independent instance groupings, avoiding conflicts between class prediction and training the instance head; and, without marking bounding boxes, the ground-truth center map is encoded for training with a two-dimensional Gaussian distribution centered on each instance centroid. Denoting each pixel in the BEV map by p, the center heat map H_p is expressed as:
H_p = max_i exp(−‖p − C_i‖² / (2σ²))
where C_i is the centroid of the i-th instance in the polar BEV and σ is the standard deviation of the Gaussian.
Step 125 is executed to process the 3D semantic header, share the first 3 decoding layers with the instance segmentation, generate multiple predictions at each pixel point, and recombine into 3D voxels to separate markers at different heights along the Z-axis, calculate voxel level loss for multiple points within the same voxel using a voting algorithm, and generate a 3D semantic segmentation prediction.
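The voxel-level voting of step 125 can be sketched as follows. This is a minimal stand-in: the per-point predicted labels falling in one voxel vote, and the majority label becomes the voxel's label; the patent's actual voxel-level loss computation is not reproduced here.

```python
from collections import Counter, defaultdict

def voxel_majority_vote(voxel_ids, point_labels):
    """Aggregate per-point semantic predictions into per-voxel labels.

    voxel_ids: iterable of voxel indices, one per point;
    point_labels: iterable of predicted labels, one per point.
    Points sharing a voxel vote, and the most common label wins.
    """
    buckets = defaultdict(list)
    for vid, lab in zip(voxel_ids, point_labels):
        buckets[vid].append(lab)
    return {vid: Counter(labs).most_common(1)[0][0]
            for vid, labs in buckets.items()}
```

Voting resolves the conflict that arises when several points at different heights map to the same BEV pixel but receive different per-point predictions.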
Step 130 is executed to perform panorama segmentation fusion on the semantic/instance segmentation prediction result to form a 3D panorama segmentation result. In this step, the following sub-steps are included:
step 131 is performed to select the first k centers from the 2D BEV center heat map by a non-maximum suppression operation.
Step 132 is performed to create a 2D BEV foreground mask using the 3D semantic segmentation prediction, while ensuring that at least one things class can be detected for each BEV pixel.
Step 133 is performed to calculate, for each foreground pixel p, the minimum distance d(p, c_i) to the k instance centroids c_i (i = 1, 2, …, k) and group the pixels, where the minimum distance d(p, c_i) is expressed as:
d(p, c_i) = ‖p + offset(p) − c_i‖
where offset(p) is the center offset of pixel p.
Step 134 is executed to predict the things class in the semantic segmentation head using majority voting according to the semantic segmentation probabilities, assigning a unique instance label L to each group G_i in the BEV:
L(G_i) = argmax_c Σ_{p∈G_i} Pr(c | p)
where Pr(c | p) is the predicted probability that pixel p belongs to class c.
step 135 is executed, in which the generated class-agnostic instance clusters are fused with 3D semantic segmentation predictions, and the 3D panorama segmentation result is finally output through a majority voting mechanism.
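The majority-voting fusion of steps 134 and 135 can be sketched as a soft majority vote: each class-agnostic group takes the class with the largest summed semantic probability over its pixels. This is a minimal illustration, not the patent's exact fusion procedure.

```python
import numpy as np

def assign_instance_classes(group_ids, sem_probs):
    """Give each class-agnostic instance group a single semantic label.

    group_ids: (M,) instance group index per foreground pixel;
    sem_probs: (M, num_classes) per-pixel semantic probabilities.
    The group label is the class with the largest summed probability,
    i.e. a soft majority vote over the group's pixels.
    """
    labels = {}
    for g in np.unique(group_ids):
        labels[int(g)] = int(sem_probs[group_ids == g].sum(axis=0).argmax())
    return labels
```

Summing probabilities rather than counting hard labels lets confident pixels outweigh uncertain ones, which reduces conflicts between the instance clustering and the semantic prediction.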
Example two
The invention provides a panoramic segmentation framework which can learn the semantics and the example characteristics on a discretized BEV map at the same time and realize the point cloud panoramic segmentation based on LiDAR rapidly and robustly. Considering the unique features of LiDAR data, panoramic segmentation is effectively resolved with minimal prediction conflicts (instances and classes) and real-time or near real-time speeds are achieved without affecting accuracy.
As shown in fig. 2, the method for segmenting the LiDAR point cloud panorama based on the polar coordinate BEV graph provided by the invention comprises the following specific steps:
and 1, performing polar coordinate BEV coding on the original point cloud data. First, the point cloud data is calculated from the position in the polar BEV graphGrouping->Where D is the input feature dimension, H and W are the mesh size of the BEV map, N * Is the number of points in each BEV grid; then, sharing a multi-layer perceptron MLP, grouping point clouds by using a polar Net network +.>Coding; finally, a Max-pooling layer is loaded on each BEV grid creating a fixed-size representation +.>Wherein C is a characteristic channel, and c=512 is taken here.
And 2, carrying out semantic/instance segmentation prediction on the fixed-size BEV codes. Given a LiDAR point cloud space, segmentation prediction of semantics and instances is realized for all points P ∈ ℝ^{N×D}. The specific implementation method is as follows:
1) Traversing all points in the LiDAR point cloud and calculating the visibility of the whole 3D space: under the polar coordinate system, for each point (x, y, z), all points α·(x, y, z) along the same direction satisfying 0 < α < 1 are included in the visibility space;
2) A reference depth network is designed, based on U-Net, comprising a depth network model with 4 encoding layers and 4 decoding layers. Each layer of the encoding section consists of a 3×3 convolution, batch normalization, a rectified linear unit (ReLU), and a max-pooling operation; each layer of the decoding section consists of an up-sampling convolution, Attention Gate (AG) based feature concatenation, and a 3×3 convolution. The last layer in FCN-1 uses the sigmoid function as the activation function, normalizing the output to a probability map in [0, 1]; the specific network model is shown in FIG. 3;
3) Connecting the visibility feature with the feature representation generated by the polar BEV encoder and inputting the connection result in the implementation into the reference depth network shown in fig. 3, generating a 2D instance header and a 3D semantic header;
4) The center heat map of each BEV pixel and the offset to the center of the object are predicted with the 2D instance head; pixels with the same nearest center are divided into the same group; a bottom-up method provides class-independent instance groupings, avoiding conflicts between class prediction and training the instance head; and, without marking bounding boxes, the ground-truth center map is trained with a two-dimensional Gaussian distribution centered on each instance centroid. Denoting each pixel in the BEV map by p, the center heat map H_p can be expressed as:
H_p = max_i exp(−‖p − C_i‖² / (2σ²)) (1)
where C_i is the centroid of the i-th instance in the polar BEV and σ is the standard deviation of the Gaussian;
5) The first 3 decoding layers are shared with the example segmentation, a plurality of predictions are generated at each pixel point and recombined into 3D voxels to separate marks at different heights along a Z axis, a voting algorithm is utilized for a plurality of points in the same voxel, voxel level losses are calculated, and 3D semantic segmentation predictions are generated.
And 3, carrying out panoramic segmentation fusion on the semantic/instance segmentation prediction result to form a 3D panoramic segmentation result. The specific calculation method is as follows:
1) Selecting the first k centers from the 2D BEV center heat map by a non-maximum suppression operation;
2) Creating a 2D BEV foreground mask using the 3D semantic segmentation predictions generated in steps 2-5), while ensuring that each BEV pixel can detect at least one "things" class;
3) Calculating, for each foreground pixel p, the minimum distance d(p, c_i) to the k instance centroids c_i (i = 1, 2, …, k) and grouping them; the minimum distance is expressed as follows:
d(p, c_i) = ‖p + offset(p) − c_i‖ (2)
wherein offset(p) is the center offset of pixel p;
4) Predicting the "things" class in the semantic segmentation head using majority voting based on the semantic segmentation probabilities; each group G_i in the BEV is assigned a unique instance label L, expressed as follows:
L(G_i) = argmax_c Σ_{p∈G_i} Pr(c | p) (3)
wherein Pr(c | p) is the predicted probability that pixel p belongs to class c;
5) And fusing the generated class agnostic instance cluster with 3D semantic segmentation prediction, and finally outputting a 3D panoramic segmentation result through a majority voting mechanism.
Example III
The invention provides a LiDAR point cloud panorama segmentation method based on a polar coordinate BEV graph, which comprises the following steps:
step 1. Polar BEV encoding step
The polar BEV encoding creates a fixed-size representation by projection and quantization to process a point cloud containing a random number of points, and uses the polar representation to balance the distribution of points across different ranges. The specific steps are:
(1.1) grouping the raw point cloud data according to the position in the polar BEV plot;
(1.2) adopting a shared multi-layer perceptron mechanism and performing block coding with a simplified PolarNet point cloud encoder;
(1.3) loading a max pooling layer on each BEV grid, creating a fixed size BEV code;
step 2. Semantic/instance segmentation prediction step
The semantic/instance segmentation prediction uses a U-Net with a symmetric structure as the reference network and comprises the following steps:
(2.1) fusing the BEV code output in step (1.3) with the visibility features, and feeding the fused result as input into the reference depth network for training, generating a 2D instance head and a 3D semantic head;
(2.2) processing the 2D instance head output from (2.1), calculating the pixel-level loss, and generating a 2D BEV center heat map and 2D instance center offsets;
(2.3) processing the 3D semantic head output from (2.1), calculating the voxel-level loss, and generating the 3D semantic segmentation prediction.
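The two-head layout of step 2 can be sketched as a toy forward pass: one shared layer stands in for the U-Net backbone, the instance head emits a center heat map and 2D offsets, and the semantic head emits logits reshaped over height bins. The layer sizes, random weights, and the name `dual_head_forward` are assumptions made only for this sketch.

```python
import numpy as np

def dual_head_forward(bev_code, n_classes=3, n_z=4, seed=0):
    """Sketch of step 2: shared backbone features feed a 2D instance
    head (sigmoid center heat map + per-pixel 2D offsets) and a 3D
    semantic head (per-height-bin class logits)."""
    rng = np.random.default_rng(seed)
    H, W, C = bev_code.shape
    shared = np.maximum(bev_code @ rng.standard_normal((C, C)), 0.0)
    # 2D instance head: center heat map in [0, 1] and (dy, dx) offsets
    heatmap = 1.0 / (1.0 + np.exp(-(shared @ rng.standard_normal((C, 1)))[..., 0]))
    offsets = shared @ rng.standard_normal((C, 2))
    # 3D semantic head: BEV logits reorganized into n_z height bins
    sem = (shared @ rng.standard_normal((C, n_z * n_classes))
           ).reshape(H, W, n_z, n_classes)
    return heatmap, offsets, sem
```

Sharing the backbone between the two heads is what later allows the early-fusion/shared-decoder strategy described in the advantages section.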
step 3. Panoramic segmentation fusion step
The panoramic segmentation fusion fuses the prediction results from the semantic head and the instance head to create the final panoramic segmentation result, and comprises the following steps:
(3.1) projecting the 3D semantic segmentation prediction output in (2.3) onto the BEV plane to generate a 2D BEV "things" mask;
(3.2) merging the 2D BEV "things" mask generated in (3.1) with the 2D BEV center heat map and the 2D instance center offsets output in (2.2) to generate class-agnostic instance clusters;
(3.3) performing majority-voting fusion of the class-agnostic instance clusters generated in (3.2) with the 3D semantic segmentation prediction output in (2.3), finally generating the 3D panoramic segmentation result.
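The majority-voting fusion of step (3.3) reduces, per point, to the following sketch. The 1-D per-point layout, the convention that instance id 0 means "stuff", and the function name are assumptions for illustration; the patent itself operates on the full 3D prediction.

```python
import numpy as np

def majority_vote_fusion(instance_ids, semantic_labels):
    """Sketch of step (3.3): each class-agnostic instance cluster takes
    the majority semantic label of its member points, so every point of
    an instance ends up with one consistent (class, instance id) pair."""
    fused = semantic_labels.copy()
    for inst in np.unique(instance_ids):
        if inst == 0:            # stuff points keep their semantic label
            continue
        member = instance_ids == inst
        fused[member] = np.bincount(semantic_labels[member]).argmax()
    return np.stack([fused, instance_ids], axis=1)  # per-point (class, id)
```

The vote resolves the semantic/instance prediction conflicts mentioned above: even if a few points of a cluster were assigned a different class, the cluster as a whole receives a single label.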
The invention discloses a LiDAR point cloud panorama segmentation method based on a polar coordinate BEV diagram, which belongs to the proposal-free panoramic segmentation frameworks. Aiming at the requirements of 3D point cloud panoramic segmentation, the invention carries out technical research and algorithm improvement on point cloud encoding, semantic/instance segmentation prediction conflicts, the panoramic segmentation strategy, and related aspects, and provides effective processing strategies. Compared with the prior art, the invention has the following advantages:
1) In the raw point cloud encoding stage, a polar coordinate BEV encoding scheme is adopted. The polar coordinates balance the distribution of points across different ranges, which offers better potential to the neural network: discriminative features can be learned close to the sensor, and the information loss caused by quantization is minimized. At the same time, the BEV representation provides a compromise between computational cost and accuracy, enabling the use of more efficient 2D convolutional networks to process the data and obtaining a projection well suited to object detection;
2) In the semantic/instance segmentation stage, a proposal-free design is used and the instance head is trained without bounding-box annotations, which effectively avoids conflicting class predictions;
3) In the panoramic segmentation stage, a strategy is designed in which the semantic head and the instance head share decoding layers and are fused early at the feature extraction level, which reduces redundancy between the networks and improves computational efficiency.
The foregoing description of the invention has been presented for purposes of illustration and description, but is not intended to be limiting. Any simple modification of the above embodiments according to the technical substance of the present invention still falls within the scope of the technical solution of the present invention. In this specification, each embodiment is described mainly in terms of its differences from the other embodiments, and the same or similar parts between embodiments may be referred to each other. Since the system embodiments essentially correspond to the method embodiments, their description is relatively brief, and reference should be made to the description of the method embodiments for the relevant points.
Claims (10)
1. A LiDAR point cloud panorama segmentation method based on a polar coordinate BEV graph, comprising obtaining raw point cloud data containing an arbitrary number of points, characterized by further comprising the following steps:
step 1: performing polar coordinate BEV coding on the original point cloud data;
step 2: given a LiDAR point cloud space, performing semantic/instance segmentation prediction on the fixed-size BEV code;
step 3: and carrying out panorama segmentation fusion on the semantic/instance segmentation prediction result to form a 3D panorama segmentation result.
2. The method for segmenting the LiDAR point cloud panorama based on the polar BEV graph according to claim 1, wherein the step 1 comprises the following steps:
step 11: grouping the raw point cloud data according to their positions in the polar BEV graph;
step 12: performing block coding using the PolarNet point cloud encoder;
3. The method for panoramic segmentation of LiDAR point clouds based on polar BEV graphs according to claim 2, wherein said step 11 comprises mapping the point cloud data P ∈ ℝ^(N×D) into groups {P_j ∈ ℝ^(N*×D)}, wherein D is the input feature dimension, N is the number of points in the point cloud, and N* is the number of points in each BEV grid.
5. The method for panoramic segmentation of LiDAR point clouds based on polar BEV graphs according to claim 4, wherein said step 2 comprises the sub-steps of:
step 21: traversing all points in the LiDAR point cloud, and calculating the visibility of the whole 3D space, namely under a polar coordinate system, taking all points (x, y, z) which are along the same direction alpha (x, y, z) and meet 0< alpha <1 into the visibility space;
step 22: constructing a reference depth network by taking a Unet as a basic framework, wherein the reference depth network comprises a depth network model of 4 coding layers and 4 decoding layers;
step 23: connecting the visibility feature with a feature representation generated by a polar BEV encoder, inputting the visibility feature into the reference depth network, and generating a 2D instance header and a 3D semantic header;
step 24: processing the 2D instance header;
step 25: and processing the 3D semantic header.
6. The method of claim 5, wherein each layer of the encoding part of the reference depth network consists of a 3 × 3 convolution, batch normalization, a rectified linear unit, and a max pooling operation; each layer of the decoding part consists of an up-sampling convolution, attention-gate-based feature concatenation, and a 3 × 3 convolution; the last layer in FCN-1 normalizes the output to a probability map in [0, 1] using the sigmoid function as the activation function.
7. The method of claim 6, wherein step 24 further comprises: using the 2D instance head to predict, for each BEV pixel, a center heat map and an offset to its object center; grouping pixels having the same nearest center into the same instance; providing class-independent instance groupings in a bottom-up manner; and, during training, encoding the ground-truth center map as a two-dimensional Gaussian distribution centered on each instance centroid, without labeled bounding boxes.
8. The LiDAR point cloud panorama segmentation method based on polar BEV graphs according to claim 7, wherein, with p denoting a pixel in the BEV graph, the center heat map H_p is expressed as follows:
H_p = max_i exp(−‖p − C_i‖² / (2σ²))
wherein C_i is the centroid of an instance in the polar BEV and σ is the standard deviation of the Gaussian kernel.
9. The method of claim 8, wherein step 25 further comprises: sharing the first 3 decoding layers with the instance segmentation branch; generating multiple predictions at each pixel and reorganizing them into 3D voxels so as to separate labels at different heights along the Z axis; computing the voxel-level loss, using a voting algorithm for multiple points within the same voxel; and generating the 3D semantic segmentation prediction.
10. The method for panoramic segmentation of LiDAR point clouds based on polar BEV graphs according to claim 9, wherein said step 3 comprises the sub-steps of:
step 31: selecting the first k centers from the 2D BEV center heat map by a non-maximum suppression operation;
step 32: creating a 2D BEV foreground mask using the 3D semantic segmentation prediction while ensuring that at least one thongs class can be detected for each BEV pixel;
step 33: calculating the foreground pixels p to k example centroids c i Minimum distance d (p, c) of (i=1, 2, …, k) i ) And groups them;
step 34: prediction of thins classes in semantic segmentation heads using majority voting based on semantic segmentation probabilities for each group G in BEV i Designating a unique instance tag L;
step 35: and fusing the generated class agnostic instance cluster with 3D semantic segmentation prediction, and finally outputting a 3D panoramic segmentation result through a majority voting mechanism.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310273933.7A CN116385452A (en) | 2023-03-20 | 2023-03-20 | LiDAR point cloud panorama segmentation method based on polar coordinate BEV graph |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116385452A true CN116385452A (en) | 2023-07-04 |
Family
ID=86962623
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310273933.7A Pending CN116385452A (en) | 2023-03-20 | 2023-03-20 | LiDAR point cloud panorama segmentation method based on polar coordinate BEV graph |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116385452A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111145174A (en) * | 2020-01-02 | 2020-05-12 | 南京邮电大学 | 3D target detection method for point cloud screening based on image semantic features |
CN114529727A (en) * | 2022-04-25 | 2022-05-24 | 武汉图科智能科技有限公司 | Street scene semantic segmentation method based on LiDAR and image fusion |
JP7224682B1 (en) * | 2021-08-17 | 2023-02-20 | 忠北大学校産学協力団 | 3D multiple object detection device and method for autonomous driving |
US20230072731A1 (en) * | 2021-08-30 | 2023-03-09 | Thomas Enxu LI | System and method for panoptic segmentation of point clouds |
Non-Patent Citations (1)
Title |
---|
贾喆姝 (Jia Zheshu): "Research on Image Semantic Segmentation Technology Based on Deep Learning", China Doctoral Dissertations Full-text Database, no. 01, pages 27-30 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Huang et al. | Autonomous driving with deep learning: A survey of state-of-art technologies | |
CN111626217B (en) | Target detection and tracking method based on two-dimensional picture and three-dimensional point cloud fusion | |
CN110827415B (en) | All-weather unknown environment unmanned autonomous working platform | |
CN111666921B (en) | Vehicle control method, apparatus, computer device, and computer-readable storage medium | |
CN111080659A (en) | Environmental semantic perception method based on visual information | |
KR20210074353A (en) | Point cloud segmentation method, computer readable storage medium and computer device | |
CN110956651A (en) | Terrain semantic perception method based on fusion of vision and vibrotactile sense | |
CN110852182B (en) | Depth video human body behavior recognition method based on three-dimensional space time sequence modeling | |
CN110688905B (en) | Three-dimensional object detection and tracking method based on key frame | |
Paz et al. | Probabilistic semantic mapping for urban autonomous driving applications | |
US20230072731A1 (en) | System and method for panoptic segmentation of point clouds | |
CN114972763A (en) | Laser radar point cloud segmentation method, device, equipment and storage medium | |
Ouyang et al. | A cgans-based scene reconstruction model using lidar point cloud | |
Maalej et al. | Vanets meet autonomous vehicles: A multimodal 3d environment learning approach | |
Berrio et al. | Octree map based on sparse point cloud and heuristic probability distribution for labeled images | |
CN115984586A (en) | Multi-target tracking method and device under aerial view angle | |
Liu et al. | Layered interpretation of street view images | |
Florea et al. | Enhanced perception for autonomous driving using semantic and geometric data fusion | |
Dewangan et al. | Towards the design of vision-based intelligent vehicle system: methodologies and challenges | |
Gosala et al. | Skyeye: Self-supervised bird's-eye-view semantic mapping using monocular frontal view images | |
Pu et al. | Visual SLAM integration with semantic segmentation and deep learning: A review | |
WO2023155903A1 (en) | Systems and methods for generating road surface semantic segmentation map from sequence of point clouds | |
CN116664851A (en) | Automatic driving data extraction method based on artificial intelligence | |
Zhao et al. | DHA: Lidar and vision data fusion-based on road object classifier | |
CN116385452A (en) | LiDAR point cloud panorama segmentation method based on polar coordinate BEV graph |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||