CN116385452A - LiDAR point cloud panorama segmentation method based on polar coordinate BEV graph - Google Patents

LiDAR point cloud panorama segmentation method based on polar coordinate BEV graph Download PDF

Info

Publication number
CN116385452A
Authority
CN
China
Prior art keywords
bev
segmentation
point cloud
instance
semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310273933.7A
Other languages
Chinese (zh)
Inventor
王波 (Wang Bo)
陈宗仁 (Chen Zongren)
张军 (Zhang Jun)
余君 (Yu Jun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Institute of Science and Technology
Original Assignee
Guangdong Institute of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Institute of Science and Technology filed Critical Guangdong Institute of Science and Technology
Priority to CN202310273933.7A priority Critical patent/CN116385452A/en
Publication of CN116385452A publication Critical patent/CN116385452A/en
Pending legal-status Critical Current

Classifications

    • G06T7/10: Physics; Computing; Image data processing; Image analysis; Segmentation; Edge detection
    • G06N3/045: Computing arrangements based on biological models; Neural networks; Combinations of networks
    • G06N3/0455: Auto-encoder networks; Encoder-decoder networks
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G06N3/048: Activation functions
    • G06T5/50: Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06V10/26: Segmentation of patterns in the image field; clustering-based techniques; detection of occlusion
    • G06V10/82: Image or video recognition or understanding using neural networks
    • G06T2207/10028: Range image; Depth image; 3D point clouds
    • G06T2207/20084: Artificial neural networks [ANN]
    • G06T2207/20221: Image fusion; Image merging
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change


Abstract

The invention provides a LiDAR point cloud panorama segmentation method based on a polar coordinate BEV graph, which comprises the steps of polar coordinate BEV encoding, semantic/instance segmentation prediction and point cloud panorama segmentation fusion. The polar coordinate BEV encoding encodes the original point cloud data into a fixed-size 2D BEV code under polar coordinates; the semantic/instance segmentation prediction passes the encoded point cloud feature matrix through a reference depth network to generate independent 3D semantic predictions, a 2D BEV center heat map and 2D instance center offsets; the point cloud panorama segmentation fusion first generates a 2D BEV Things mask from the 3D semantic segmentation result, forms class-agnostic instance clusters together with the 2D BEV center heat map and the instance center offsets, and combines them with the 3D semantic segmentation prediction to form the final panorama segmentation result.

Description

LiDAR point cloud panorama segmentation method based on polar coordinate BEV graph
Technical Field
The invention relates to the technical field of digital image processing, in particular to a LiDAR point cloud panorama segmentation method based on a polar coordinate BEV graph.
Background
Image segmentation for video analysis plays an important role in different research fields such as smart cities, medical care, computer vision and remote sensing applications. Panoramic segmentation fuses semantic segmentation and instance segmentation, and helps to obtain finer knowledge of image scenes in applications such as video monitoring, crowd counting, automatic driving and medical image analysis, as well as a deeper understanding of general scenes. With the introduction of LiDAR point cloud datasets, the nature of 3D data, real-time processing requirements, and the level of accuracy required for safety and security (e.g., in an autonomous car) pose new challenges to panoramic segmentation. The goal is to effectively resolve panorama segmentation with minimal prediction conflicts (instances and classes) and achieve real-time or near real-time speed without affecting accuracy.
Researchers have explored point cloud panorama segmentation methods that combine instance segmentation and semantic segmentation. Liu et al., in the paper "Self-prediction for joint instance and semantic segmentation of point clouds" (In ECCV, 2020), propose using a discriminative loss to learn an embedding feature space in which instances are clustered; Zhou et al., in the paper "Joint 3d instance segmentation and object detection for autonomous driving" (In CVPR, 2020), propose extracting instance segments from region proposals for semantic segmentation clusters; Hurtado et al., in the paper "MOPT: Multi-object panoptic tracking" (In CVPR Workshops, 2020), propose the MOPT model, appending a semantic head to Mask R-CNN to generate panoramic segmentation on range images; Milioto et al., in the paper "LiDAR panoptic segmentation for autonomous driving" (In IROS, 2020), propose first solving LiDAR point cloud panorama segmentation on the range image and then restoring it to the point cloud level by trilinear up-sampling; the paper "Panoptic-PolarNet: Proposal-free LiDAR Point Cloud Panoptic Segmentation" (In CVPR, 2021) by Zhou et al. proposes a fast and robust LiDAR point cloud panorama segmentation framework (Panoptic-PolarNet) that uses a polar Bird's Eye View (BEV) representation and learns semantic segmentation and class-agnostic instance clustering in a single inference network, which can circumvent the occlusion problem between instances in urban street scenes; it also proposes an adaptive instance augmentation technique and a novel adversarial point cloud pruning method to improve the network's learning ability.
The invention patent application No. CN113379748A discloses a point cloud panorama segmentation method and device, the method comprising: a point cloud mapping step of projecting the acquired point cloud into a world coordinate system to obtain a mapping point cloud; a video frame association step of projecting each point of the mapping point cloud into the projectable video frames; and a panoramic segmentation step of performing panoramic segmentation on the projectable video frames so as to uniformly number the semantic identification probability of each point. The disadvantage of this method is that, although panoramic segmentation is possible, the segmentation speed is relatively slow and the accuracy is not high.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a LiDAR point cloud panorama segmentation method based on a polar coordinate BEV graph, which comprises the steps of polar coordinate BEV encoding, semantic/instance segmentation prediction and point cloud panorama segmentation fusion. The polar coordinate BEV encoding encodes the original point cloud data into a fixed-size 2D BEV code under polar coordinates; the semantic/instance segmentation prediction passes the encoded point cloud feature matrix through a reference depth network to generate independent 3D semantic predictions, a 2D BEV center heat map and 2D instance center offsets; the point cloud panorama segmentation fusion first generates a 2D BEV Things mask from the 3D semantic segmentation result, forms class-agnostic instance clusters together with the 2D BEV center heat map and the instance center offsets, and combines them with the 3D semantic segmentation prediction to form the final panorama segmentation result. The invention can improve the accuracy and robustness of panoramic segmentation and achieve a real-time or near real-time segmentation speed.
The invention provides a LiDAR point cloud panorama segmentation method based on a polar coordinate BEV graph, which comprises obtaining original point cloud data containing an arbitrary number of points, and further comprises the following steps:
step 1: performing polar coordinate BEV coding on the original point cloud data;
step 2: given a LiDAR point cloud space, carrying out semantic/instance segmentation prediction on the BEV codes with fixed sizes;
step 3: and carrying out panorama segmentation fusion on the semantic/instance segmentation prediction result to form a 3D panorama segmentation result.
Preferably, the polar BEV encoding means that the original point cloud data are processed by creating a fixed-size representation through projection and quantization, and that the polar representation balances the distribution of points across different ranges.
In any of the above schemes, preferably, the step 1 includes the following substeps:
step 11: grouping the original point cloud data according to the position of the BEV graph in polar coordinates;
step 12: performing point cloud block coding using the PolarNet network;
step 13: loading a max-pooling layer on each BEV grid, creating a fixed-size BEV code in ℝ^(H×W×C), wherein ℝ is real space, H and W are the grid sizes of the BEV map, and C is the feature channel.
In any of the above aspects, preferably, the step 11 includes grouping the point cloud data P ∈ ℝ^(N×D) into {G_hw ∈ ℝ^(N*×D)}, wherein D is the input feature dimension, N is the number of points in the point cloud, and N* is the number of points in each BEV grid.
In any of the above embodiments, preferably, the step 12 includes encoding each point cloud group G_hw with a shared multi-layer perceptron (MLP), following the PolarNet network.
In any of the above schemes, preferably, the step 2 includes the following substeps:
step 21: traversing all points in the LiDAR point cloud and calculating the visibility of the whole 3D space, namely, under the polar coordinate system, all points α·(x, y, z) lying along the same direction as a measured point (x, y, z) and satisfying 0 < α < 1 are taken into the visibility space (an illustrative sketch follows this list);
step 22: constructing a reference depth network with U-Net as the basic framework, the reference depth network comprising a depth network model with 4 encoding layers and 4 decoding layers;
step 23: connecting the visibility feature with a feature representation generated by a polar BEV encoder, inputting the visibility feature into the reference depth network, and generating a 2D instance header and a 3D semantic header;
step 24: processing the 2D instance header;
step 25: and processing the 3D semantic header.
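For the visibility computation of step 21, a rough Python/NumPy sketch is given below for illustration only; the polar voxel grid resolution, the height range and the number of samples taken along each ray are arbitrary assumptions and are not specified by the method itself.

import numpy as np

def visibility_volume(points, grid_shape=(480, 360, 32),
                      r_max=50.0, z_min=-3.0, z_max=1.5, n_samples=64):
    """Mark polar voxels lying between the sensor and each return as visible.

    Every measured point (x, y, z) is scaled by alpha in (0, 1); the voxels hit
    by these interior samples are free space that the LiDAR beam passed through.
    """
    vis = np.zeros(grid_shape, dtype=np.uint8)
    alphas = np.linspace(0.0, 1.0, n_samples, endpoint=False)[1:]  # 0 < alpha < 1
    H, W, Z = grid_shape
    for alpha in alphas:
        p = points * alpha                                   # interior ray samples
        r = np.hypot(p[:, 0], p[:, 1])
        theta = np.arctan2(p[:, 1], p[:, 0])
        r_idx = np.clip((r / r_max * H).astype(int), 0, H - 1)
        t_idx = np.clip(((theta + np.pi) / (2 * np.pi) * W).astype(int), 0, W - 1)
        z_idx = np.clip(((p[:, 2] - z_min) / (z_max - z_min) * Z).astype(int), 0, Z - 1)
        vis[r_idx, t_idx, z_idx] = 1
    return vis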
In any of the above-described aspects, it is preferred that each layer of the encoding part of the reference depth network consists of a 3×3 convolution, batch normalization, a rectified linear unit (ReLU) and a max-pooling operation; each layer of the decoding part consists of an up-sampling convolution, attention-gate-based feature concatenation, and a 3×3 convolution; the last layer (FCN-1) uses the sigmoid function as the activation function to normalize the output to a probability map in [0, 1].
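The following PyTorch sketch is one possible reading of the reference depth network described above (4 encoding layers and 4 decoding layers, attention-gated skip connections, sigmoid output); the channel widths and the additive attention-gate form follow the common Attention U-Net pattern and are assumptions rather than the exact architecture, and the input height and width are assumed divisible by 16.

import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Additive attention gate weighting the skip features (assumed form)."""
    def __init__(self, ch_g, ch_x, ch_int):
        super().__init__()
        self.wg = nn.Conv2d(ch_g, ch_int, 1)
        self.wx = nn.Conv2d(ch_x, ch_int, 1)
        self.psi = nn.Sequential(nn.Conv2d(ch_int, 1, 1), nn.Sigmoid())

    def forward(self, g, x):
        # g: gating signal from the decoder, x: skip features from the encoder
        a = self.psi(torch.relu(self.wg(g) + self.wx(x)))
        return x * a

def conv_block(cin, cout):
    # 3x3 convolution -> batch normalization -> ReLU
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class ReferenceUNet(nn.Module):
    """4 encoding / 4 decoding layers with attention-gated skips (sketch)."""
    def __init__(self, cin=512, cout=1, widths=(64, 128, 256, 512)):
        super().__init__()
        self.pool = nn.MaxPool2d(2)
        self.encs = nn.ModuleList()
        c = cin
        for w in widths:                                # 4 encoding layers
            self.encs.append(conv_block(c, w))
            c = w
        self.ups = nn.ModuleList()
        self.gates = nn.ModuleList()
        self.decs = nn.ModuleList()
        for w in reversed(widths):                      # 4 decoding layers
            self.ups.append(nn.ConvTranspose2d(c, w, 2, stride=2))
            self.gates.append(AttentionGate(w, w, max(w // 2, 1)))
            self.decs.append(conv_block(2 * w, w))
            c = w
        # final layer: 1x1 convolution + sigmoid -> probability map in [0, 1]
        self.head = nn.Sequential(nn.Conv2d(c, cout, 1), nn.Sigmoid())

    def forward(self, x):
        skips = []
        for enc in self.encs:
            x = enc(x)
            skips.append(x)
            x = self.pool(x)
        for up, gate, dec, skip in zip(self.ups, self.gates, self.decs,
                                       reversed(skips)):
            x = up(x)
            x = dec(torch.cat([gate(x, skip), x], dim=1))
        return self.head(x)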
Preferably, in any of the above schemes, the step 24 further comprises using the 2D instance header to predict, for each BEV pixel, a center heat map and an offset to the object center, grouping pixels having the same nearest center into the same group, providing class-agnostic instance grouping in a bottom-up manner, and, without marking bounding boxes, encoding the ground-truth center map by training with a two-dimensional Gaussian distribution centered on each instance centroid.
In any of the above schemes, preferably, let each pixel in the BEV map be p; its center heat map H_p is expressed as follows:
H_p = max_i exp(-‖p - C_i‖² / (2σ²))
wherein C_i is the centroid of one instance in the polar BEV.
In any of the above schemes, preferably, the step 25 further includes sharing the first 3 decoding layers with the instance segmentation, generating a plurality of predictions at each pixel point, and recombining the predictions into 3D voxels to separate markers at different heights along the Z-axis, calculating voxel level losses for a plurality of points within the same voxel using a voting algorithm, and generating the 3D semantic segmentation predictions.
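As a hedged sketch of how the two headers might share decoder features as described in step 25, the fragment below adds one final semantic stage and one final instance stage on top of the three shared decoding layers; the channel width, the number of semantic classes and the number of Z bins are illustrative assumptions only.

import torch
import torch.nn as nn

class PanopticHeads(nn.Module):
    """Instance and semantic headers on top of shared decoder features (sketch)."""
    def __init__(self, ch=64, num_classes=20, z_bins=32):
        super().__init__()
        def block(cin, cout):
            return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                                 nn.BatchNorm2d(cout), nn.ReLU(inplace=True))
        # each branch adds its own final (4th) decoding stage
        self.sem_stage = block(ch, ch)
        self.inst_stage = block(ch, ch)
        # 3D semantic header: per-pixel predictions reassembled into Z voxels
        self.sem_out = nn.Conv2d(ch, num_classes * z_bins, 1)
        # 2D instance header: 1-channel center heat map + 2-channel offsets
        self.heat_out = nn.Sequential(nn.Conv2d(ch, 1, 1), nn.Sigmoid())
        self.offset_out = nn.Conv2d(ch, 2, 1)
        self.num_classes, self.z_bins = num_classes, z_bins

    def forward(self, shared_feat):
        # shared_feat: output of the 3 shared decoding layers, shape (B, ch, H, W)
        s = self.sem_out(self.sem_stage(shared_feat))
        b, _, h, w = s.shape
        sem_voxels = s.view(b, self.num_classes, self.z_bins, h, w)
        i = self.inst_stage(shared_feat)
        return sem_voxels, self.heat_out(i), self.offset_out(i)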
In any of the above schemes, preferably, the step 3 includes the following substeps:
step 31: selecting the first k centers from the 2D BEV center heat map by a non-maximum suppression operation;
step 32: creating a 2D BEV foreground mask using the 3D semantic segmentation prediction, while ensuring that at least one Things class can be detected for each BEV pixel;
step 33: calculating, for each foreground pixel p, the minimum distance d(p, c_i) to the k instance centroids c_i (i = 1, 2, …, k) and grouping the pixels accordingly;
step 34: predicting the Things classes in the semantic segmentation head using majority voting based on the semantic segmentation probabilities, and assigning a unique instance label L to each group G_i in the BEV;
step 35: and fusing the generated class agnostic instance cluster with 3D semantic segmentation prediction, and finally outputting a 3D panoramic segmentation result through a majority voting mechanism.
In any of the above embodiments, preferably, the minimum distance d(p, c_i) is expressed as:
d(p, c_i) = ‖p + offset(p) - c_i‖
wherein offset(p) is the predicted center offset of pixel p.
In any of the above aspects, preferably, the semantic segmentation probability is the per-class probability P(c|p) output by the semantic header for each BEV pixel p, and the label of each group is taken as L(G_i) = argmax_c Σ_{p∈G_i} P(c|p).
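Putting steps 31 to 35 together, a simplified NumPy/SciPy sketch of the fusion stage could look as follows; the non-maximum-suppression window, the heat-map threshold, the value of k and the set of Things class ids are assumptions used only for illustration.

import numpy as np
from scipy.ndimage import maximum_filter

def panoptic_fuse(heat, offset, sem_prob, things_ids, k=100, nms_win=5):
    """Fuse BEV center heat map, center offsets and semantic predictions.

    heat     : (H, W) predicted center heat map
    offset   : (H, W, 2) predicted offset of each BEV pixel to its center
    sem_prob : (H, W, C) per-pixel semantic class probabilities
    Returns (instance_id_map, semantic_map) on the BEV grid.
    """
    # 1) non-maximum suppression: keep local maxima, take the top-k centers
    peaks = (heat == maximum_filter(heat, size=nms_win)) & (heat > 0.1)
    ys, xs = np.nonzero(peaks)
    order = np.argsort(heat[ys, xs])[::-1][:k]
    centers = np.stack([ys[order], xs[order]], axis=1).astype(np.float32)

    semantic = sem_prob.argmax(-1)
    # 2) foreground (Things) mask from the semantic prediction
    fg = np.isin(semantic, things_ids)

    inst = np.zeros(heat.shape, dtype=np.int32)
    if len(centers):
        py, px = np.nonzero(fg)
        # 3) shift each foreground pixel by its offset and pick the nearest center
        shifted = np.stack([py, px], 1) + offset[py, px]
        d = np.linalg.norm(shifted[:, None, :] - centers[None, :, :], axis=-1)
        inst[py, px] = d.argmin(1) + 1            # instance ids start at 1

        # 4) majority vote of semantic labels inside each instance group
        for gid in range(1, len(centers) + 1):
            m = inst == gid
            if m.any():
                semantic[m] = np.bincount(semantic[m]).argmax()
    return inst, semantic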
the invention provides a LiDAR point cloud panorama segmentation method based on a polar coordinate BEV map, which simultaneously learns semantic and instance characteristics on a discretized BEV map, rapidly and robustly realizes the point cloud panorama segmentation based on LiDAR, effectively solves panorama segmentation with minimum conflict between predicted instance and class, and realizes real-time or near real-time speed under the condition of not affecting accuracy.
BEV: birds eye view, i.e., a bird's eye view.
LiDAR: is a system integrating laser, global positioning system and inertial navigation system.
Polar net network: is a lightweight neural network used for realizing real-time on-line semantic segmentation for single laser radar scanning data.
The thins class: i.e. class of things.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of a LiDAR point cloud panorama segmentation method based on polar BEV graphs according to the present invention.
FIG. 2 is a flow chart of another preferred embodiment of a LiDAR point cloud panorama segmentation method based on polar BEV graphs according to the present invention.
FIG. 3 is a schematic diagram of one embodiment of a reference network model of a LiDAR point cloud panorama segmentation method based on polar BEV graphs, according to the present invention.
FIG. 4 is a semantic/instance segmentation visualization on the COCO dataset for one embodiment of the LiDAR point cloud panorama segmentation method based on polar BEV graphs according to the present invention.
FIG. 5 is a panoramic segmentation visualization on the SemanticKITTI dataset for one embodiment of the LiDAR point cloud panoramic segmentation method based on polar BEV graphs in accordance with the present invention.
Detailed Description
The invention is further illustrated by the following figures and specific examples.
Example 1
As shown in fig. 1, step 100 is executed, comprising obtaining original point cloud data containing an arbitrary number of points.
step 110 is performed to carry out polar coordinate BEV encoding on the original point cloud data, where the polar BEV encoding creates a fixed-size representation through projection and quantization to process the original point cloud data and uses the polar representation to balance the distribution of points across different ranges. In this step, the following sub-steps are included:
step 111 is executed to group the original point cloud data according to their positions in the polar BEV map; the point cloud data are obtained
as P ∈ ℝ^(N×D) and grouped into {G_hw ∈ ℝ^(N*×D)}, wherein ℝ is real space, D is the dimension of the input features, N is the number of points in the point cloud, and N* is the number of points in each BEV grid.
Step 112 is executed to share a multi-layer perceptron (MLP); each point cloud group
G_hw is encoded following the PolarNet network.
Step 113 is performed, loading a max-pooling layer on each BEV grid, creating a fixed-size BEV code,
in ℝ^(H×W×C), where H and W are the grid sizes of the BEV map and C is the feature channel.
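Purely as an illustrative sketch of steps 111 to 113 (not part of the patented method itself), the polar grouping and max-pooling could be written as follows in Python/NumPy; the grid size H×W, the radial range r_max and the callable point_mlp are hypothetical placeholders standing in for the shared MLP of step 112.

import numpy as np

def polar_bev_encode(points, feats, H=480, W=360, r_max=50.0, point_mlp=None):
    """Group points by polar BEV cell and max-pool per-cell features.

    points : (N, 3) array of x, y, z coordinates
    feats  : (N, D) per-point features; point_mlp (if given) maps them to the
             C-dimensional features that are max-pooled per cell
    Returns a fixed-size (H, W, C) BEV feature map.
    """
    feats = np.asarray(feats, dtype=np.float32)
    if point_mlp is not None:
        feats = point_mlp(feats)                       # shared per-point MLP
    r = np.hypot(points[:, 0], points[:, 1])           # radius
    theta = np.arctan2(points[:, 1], points[:, 0])     # azimuth in [-pi, pi)
    # quantize into a fixed H x W polar grid
    r_idx = np.clip((r / r_max * H).astype(int), 0, H - 1)
    t_idx = np.clip(((theta + np.pi) / (2 * np.pi) * W).astype(int), 0, W - 1)

    C = feats.shape[1]
    bev = np.full((H, W, C), -np.inf, dtype=np.float32)
    # max-pool the features of all points falling into the same BEV cell
    np.maximum.at(bev, (r_idx, t_idx), feats)
    bev[np.isinf(bev)] = 0.0                           # empty cells -> 0
    return bev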
Step 120 is performed to perform semantic/instance segmentation prediction on the BEV code of fixed size, the semantic/instance segmentation prediction being given a LiDAR point cloud space for all points
in the point cloud, and the segmentation prediction of semantics and instances is realized. In this step, the following sub-steps are included:
step 121 is performed to traverse all points in the LiDAR point cloud and calculate the visibility of the entire 3D space, i.e., under the polar coordinate system, all points α·(x, y, z) lying along the same direction as a measured point (x, y, z) and satisfying 0 < α < 1 are included in the visibility space.
Step 122 is performed to construct a reference depth network with U-Net as the basic framework, comprising a depth network model with 4 encoding layers and 4 decoding layers.
Step 123 is performed to connect the visibility feature with the feature representation generated by the polar BEV encoder, input the result into the reference depth network, and generate a 2D instance header and a 3D semantic header. Each layer of the encoding part of the reference depth network consists of a 3×3 convolution, batch normalization, a rectified linear unit (ReLU) and a max-pooling operation; each layer of the decoding part consists of an up-sampling convolution, attention-gate-based feature concatenation, and a 3×3 convolution; the last layer (FCN-1) uses the sigmoid function as the activation function to normalize the output to a probability map in [0, 1].
Step 124 is performed to process the 2D instance header: the 2D instance header predicts, for each BEV pixel, a center heat map and an offset to the object center; pixels with the same nearest center are grouped into the same group, and a bottom-up approach provides class-agnostic instance grouping to avoid conflicts between class prediction and the training of the instance header; without marking bounding boxes, the ground-truth center map is encoded for training with a two-dimensional Gaussian distribution centered on each instance centroid. Let each pixel in the BEV map be p; its center heat map H_p is expressed as follows:
H_p = max_i exp(-‖p - C_i‖² / (2σ²))
wherein C_i is the centroid of one instance in the polar BEV.
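A small NumPy sketch of how such a ground-truth center heat map could be rendered, keeping the per-pixel maximum over the per-instance Gaussians; the grid size and the value of sigma are assumptions used only for illustration.

import numpy as np

def center_heatmap(centroids, H=480, W=360, sigma=5.0):
    """Ground-truth BEV center heat map: H_p = max_i exp(-||p - C_i||^2 / (2 sigma^2))."""
    ys, xs = np.mgrid[0:H, 0:W]                       # BEV pixel coordinates p
    heat = np.zeros((H, W), dtype=np.float32)
    for cy, cx in centroids:                          # instance centroids C_i
        g = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))
        np.maximum(heat, g, out=heat)                 # keep the per-pixel maximum
    return heat

# usage sketch: heat = center_heatmap([(120.0, 200.0), (300.5, 64.2)])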
Step 125 is executed to process the 3D semantic header: the first 3 decoding layers are shared with the instance segmentation, multiple predictions are generated at each pixel point and recombined into 3D voxels so as to separate labels at different heights along the Z-axis, a voting algorithm is applied to the points within the same voxel to calculate the voxel-level loss, and the 3D semantic segmentation prediction is generated.
Step 130 is executed to perform panorama segmentation fusion on the semantic/instance segmentation prediction result to form a 3D panorama segmentation result. In this step, the following sub-steps are included:
step 131 is performed to select the first k centers from the 2D BEV center heat map by a non-maximum suppression operation.
Step 132 is performed to create a 2D BEV foreground mask using the 3D semantic segmentation prediction, while ensuring that at least one Things class can be detected for each BEV pixel.
Step 133 is performed to calculate, for each foreground pixel p, the minimum distance d(p, c_i) to the k instance centroids c_i (i = 1, 2, …, k) and group the pixels accordingly; the minimum distance is expressed as:
d(p, c_i) = ‖p + offset(p) - c_i‖
wherein offset(p) is the predicted center offset of pixel p.
Step 134 is executed to predict the Things classes in the semantic segmentation head using majority voting based on the semantic segmentation probabilities, assigning a unique instance label L to each group G_i in the BEV; the voted label is expressed as:
L(G_i) = argmax_c Σ_{p∈G_i} P(c|p), where P(c|p) is the class probability predicted by the semantic header at BEV pixel p.
step 135 is executed, in which the generated class-agnostic instance clusters are fused with 3D semantic segmentation predictions, and the 3D panorama segmentation result is finally output through a majority voting mechanism.
Example two
The invention provides a panoramic segmentation framework which simultaneously learns semantic and instance features on a discretized BEV map and achieves LiDAR-based point cloud panoramic segmentation quickly and robustly. Considering the unique characteristics of LiDAR data, panoramic segmentation is effectively resolved with minimal prediction conflicts (instances and classes), and real-time or near real-time speed is achieved without affecting accuracy.
As shown in fig. 2, the method for segmenting the LiDAR point cloud panorama based on the polar coordinate BEV graph provided by the invention comprises the following specific steps:
Step 1, perform polar coordinate BEV encoding on the original point cloud data. First, according to their positions in the polar BEV map, the point cloud data are
grouped into {G_hw ∈ ℝ^(N*×D)}, where the point cloud P ∈ ℝ^(N×D), D is the input feature dimension, H and W are the grid sizes of the BEV map, and N* is the number of points in each BEV grid; then, a shared multi-layer perceptron (MLP) is applied and the point cloud groups G_hw are encoded with the PolarNet network; finally, a max-pooling layer is loaded on each BEV grid, creating a fixed-size representation in ℝ^(H×W×C), where C is the feature channel, and C = 512 is taken here.
And 2, carrying out semantic/instance segmentation prediction on BEV codes with fixed sizes. Given a LiDAR point cloud space, for all points
in the point cloud, the segmentation prediction of semantics and instances is realized. The specific implementation is as follows:
1) Traverse all points in the LiDAR point cloud and calculate the visibility of the whole 3D space, i.e., under the polar coordinate system, all points α·(x, y, z) lying along the same direction as a measured point (x, y, z) and satisfying 0 < α < 1 are included in the visibility space;
2) A reference depth network is designed, based on U-Net, comprising a depth network model with 4 encoding layers and 4 decoding layers. Each layer of the encoding part consists of a 3×3 convolution, batch normalization, a rectified linear unit (ReLU) and a max-pooling operation; each layer of the decoding part consists of an up-sampling convolution, Attention Gate (AG) based feature concatenation, and a 3×3 convolution. The last layer (FCN-1) uses the sigmoid function as the activation function and normalizes the output to a probability map in [0, 1]; the specific network model is shown in FIG. 3;
3) The visibility feature is connected with the feature representation generated by the polar BEV encoder, and the connection result is input into the reference depth network shown in FIG. 3, generating a 2D instance header and a 3D semantic header;
4) The 2D instance head is used to predict the center heat map of each BEV pixel and the offset to the object center; the pixels with the same nearest center are grouped into the same group, and class-agnostic instance grouping is provided in a bottom-up manner so as to avoid conflicts between class prediction and the training of the instance head; without marking bounding boxes, the ground-truth center map is trained with a two-dimensional Gaussian distribution centered on the centroid of each instance. Let every pixel in the BEV map be p; its center heat map H_p can be expressed as follows:
H_p = max_i exp(-‖p - C_i‖² / (2σ²))
wherein C_i is the centroid of one instance in the polar BEV;
5) The first 3 decoding layers are shared with the example segmentation, a plurality of predictions are generated at each pixel point and recombined into 3D voxels to separate marks at different heights along a Z axis, a voting algorithm is utilized for a plurality of points in the same voxel, voxel level losses are calculated, and 3D semantic segmentation predictions are generated.
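A minimal sketch of the voting over the points that fall into the same voxel, as described in item 5) above; the flat voxel indexing and the use of the voted labels as voxel-level targets for the loss are assumptions made for illustration.

import numpy as np

def voxel_majority_vote(voxel_idx, point_labels, num_classes):
    """Majority vote of per-point labels inside each voxel.

    voxel_idx    : (N,) flat voxel index of each point
    point_labels : (N,) semantic label of each point
    Returns a dict mapping voxel index -> voted label, usable as the voxel-level
    target when computing the semantic segmentation loss.
    """
    votes = {}
    for v, lab in zip(voxel_idx, point_labels):
        hist = votes.setdefault(v, np.zeros(num_classes, dtype=np.int64))
        hist[lab] += 1
    return {v: int(hist.argmax()) for v, hist in votes.items()}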
And 3, carrying out panoramic segmentation fusion on the semantic/instance segmentation prediction result to form a 3D panoramic segmentation result. The specific calculation method is as follows:
1) Selecting the first k centers from the 2D BEV center heat map by a non-maximum suppression operation;
2) A 2D BEV foreground mask is created using the 3D semantic segmentation prediction generated in step 2-5), while ensuring that at least one "Things" class can be detected for each BEV pixel;
3) For each foreground pixel p, the minimum distance d(p, c_i) to the k instance centroids c_i (i = 1, 2, …, k) is calculated and the pixels are grouped accordingly; the minimum distance is expressed as follows:
d(p, c_i) = ‖p + offset(p) - c_i‖ (2)
wherein offset(p) is the predicted center offset of pixel p;
4) The "Things" classes are predicted in the semantic segmentation head using majority voting based on the semantic segmentation probabilities, and a unique instance label L is assigned to each group G_i in the BEV; the voted label can be expressed as:
L(G_i) = argmax_c Σ_{p∈G_i} P(c|p)
where P(c|p) is the class probability predicted by the semantic head at BEV pixel p;
5) And fusing the generated class agnostic instance cluster with 3D semantic segmentation prediction, and finally outputting a 3D panoramic segmentation result through a majority voting mechanism.
Example III
The invention provides a LiDAR point cloud panorama segmentation method based on a polar coordinate BEV graph, which comprises the following steps:
step 1. Polar BEV encoding step
The polar BEV encoding creates a fixed-size representation through projection and quantization to process a point cloud containing an arbitrary number of points, and uses the polar representation to balance the distribution of points across different ranges. The specific steps are:
(1.1) grouping the raw point cloud data according to the position in the polar BEV plot;
(1.2) adopting a shared multi-layer perceptron mechanism and using a simplified PolarNet network to perform point cloud block coding;
(1.3) loading a max pooling layer on each BEV grid, creating a fixed size BEV code;
step 2. Semantic/instance segmentation prediction step
The semantic/instance segmentation prediction is realized by taking U-Net with a symmetrical structure as a reference network, and comprises the following steps of:
(2.1) fusing the BEV codes output in the step (1.3) with visibility characteristics, and sending the fused BEV codes as input into a reference depth network for training to generate a 2D instance header and a 3D semantic header;
(2.2) processing the 2D instance header output from (2.1), calculating pixel level loss, generating a 2D BEV center heat map and a 2D instance center offset;
(2.3) processing the 3D semantic header output in (2.1), calculating voxel level loss, and generating 3D semantic segmentation prediction.
Step 3, panoramic segmentation and fusion step
The panorama segmentation fusion is to fuse the prediction results from the semantic header and the instance header to create a final panorama segmentation result, and the steps are as follows:
(3.1) generating a 2D BEV Things mask from the 3D semantic segmentation prediction output in (2.3);
(3.2) merging the 2D BEV Things mask generated in (3.1) with the 2D BEV center heat map and the 2D instance center offset output in (2.2) to generate class-agnostic instance clusters;
and (3.3) performing majority voting fusion on the class-agnostic instance cluster generated in the step (3.2) and the 3D semantic segmentation prediction output in the step (2.3), and finally generating a 3D panoramic segmentation result.
The invention discloses a LiDAR point cloud panorama segmentation method based on a polar coordinate BEV graph, which belongs to the proposal-free panoramic segmentation frameworks. Aiming at the requirements of 3D point cloud panoramic segmentation, the invention carries out technical research and algorithm improvement on point cloud encoding, semantic/instance segmentation prediction conflicts, the panoramic segmentation strategy and the like, and provides an effective processing strategy. Compared with the prior art, the invention has the following advantages:
1) In the original point cloud encoding stage, a polar coordinate BEV encoding scheme is adopted; the polar coordinates balance the distribution of points across different ranges, give the neural network better potential to learn discriminative features in regions close to the sensor, and reduce the information loss caused by quantization to a minimum; at the same time, the BEV provides a compromise between computational cost and accuracy, allowing a more efficient 2D convolution network to process the data and obtain an optimal projection for object detection;
2) In the semantic/instance segmentation stage, a proposal-free design is used and the instance header is trained without bounding-box annotations, effectively avoiding conflicts in class prediction;
3) In the panoramic segmentation stage, a strategy of sharing decoding layers between the semantic header and the instance header and performing early fusion at the feature extraction level is designed, which reduces redundancy between networks and improves computational efficiency.
The foregoing description of the invention has been presented for purposes of illustration and description, but is not intended to be limiting. Any simple modification of the above embodiments according to the technical substance of the present invention still falls within the scope of the technical solution of the present invention. In this specification, each embodiment is mainly described in the specification as a difference from other embodiments, and the same or similar parts between the embodiments need to be referred to each other. For system embodiments, the description is relatively simple as it essentially corresponds to method embodiments, and reference should be made to the description of method embodiments for relevant points.

Claims (10)

1. The LiDAR point cloud panorama segmentation method based on the polar coordinate BEV graph comprises the steps of obtaining original point cloud data containing an arbitrary number of points, and is characterized by further comprising the following steps:
step 1: performing polar coordinate BEV coding on the original point cloud data;
step 2: given a LiDAR point cloud space, carrying out semantic/instance segmentation prediction on the BEV codes with fixed sizes;
step 3: and carrying out panorama segmentation fusion on the semantic/instance segmentation prediction result to form a 3D panorama segmentation result.
2. The method for segmenting the LiDAR point cloud panorama based on the polar BEV graph according to claim 1, wherein the step 1 comprises the following steps:
step 11: grouping the original point cloud data according to the position of the BEV graph in polar coordinates;
step 12: performing point cloud block coding using the PolarNet network;
step 13: loading a max-pooling layer on each BEV grid, creating a fixed-size BEV code in ℝ^(H×W×C), wherein ℝ is real space, H and W are the grid sizes of the BEV map, and C is the feature channel.
3. The method for panoramic segmentation of LiDAR point clouds based on polar BEV graphs according to claim 2, wherein the step 11 comprises grouping the point cloud data P ∈ ℝ^(N×D) into {G_hw ∈ ℝ^(N*×D)}, wherein D is the input feature dimension, N is the number of points in the point cloud, and N* is the number of points in each BEV grid.
4. The method of claim 3, wherein the step 12 includes encoding each point cloud group G_hw with a shared multi-layer perceptron (MLP), following the PolarNet network.
5. The method for panoramic segmentation of LiDAR point clouds based on polar BEV graphs according to claim 4, wherein said step 2 comprises the sub-steps of:
step 21: traversing all points in the LiDAR point cloud and calculating the visibility of the whole 3D space, namely, under the polar coordinate system, all points α·(x, y, z) lying along the same direction as a measured point (x, y, z) and satisfying 0 < α < 1 are taken into the visibility space;
step 22: constructing a reference depth network with U-Net as the basic framework, the reference depth network comprising a depth network model with 4 encoding layers and 4 decoding layers;
step 23: connecting the visibility feature with a feature representation generated by a polar BEV encoder, inputting the visibility feature into the reference depth network, and generating a 2D instance header and a 3D semantic header;
step 24: processing the 2D instance header;
step 25: and processing the 3D semantic header.
6. The method of claim 5, wherein each layer of the encoding part of the reference depth network consists of a 3×3 convolution, batch normalization, a rectified linear unit (ReLU) and a max-pooling operation; each layer of the decoding part consists of an up-sampling convolution, attention-gate-based feature concatenation, and a 3×3 convolution; the last layer (FCN-1) uses the sigmoid function as the activation function to normalize the output to a probability map in [0, 1].
7. The method of claim 6, wherein step 24 further comprises using the 2D instance header to predict, for each BEV pixel, a center heat map and an offset to the object center, grouping pixels having the same nearest center into the same group, providing class-agnostic instance grouping in a bottom-up manner, and, without marking bounding boxes, encoding the ground-truth center map by training with a two-dimensional Gaussian distribution centered on each instance centroid.
8. The LiDAR point cloud panorama segmentation method based on polar BEV images according to claim 7, wherein each pixel in the BEV map is set to be p, and its center heat map H_p is expressed as follows:
H_p = max_i exp(-‖p - C_i‖² / (2σ²))
wherein C_i is the centroid of one instance in the polar BEV.
9. The method of claim 8, wherein step 25 further comprises sharing the first 3 decoding layers with the instance segmentation, generating a plurality of predictions at each pixel point and reorganizing into 3D voxels to separate markers at different heights along the Z-axis, calculating voxel level losses for a plurality of points within the same voxel using a voting algorithm, and generating a 3D semantic segmentation prediction.
10. The method for panoramic segmentation of LiDAR point clouds based on polar BEV graphs according to claim 9, wherein said step 3 comprises the sub-steps of:
step 31: selecting the first k centers from the 2D BEV center heat map by a non-maximum suppression operation;
step 32: creating a 2D BEV foreground mask using the 3D semantic segmentation prediction, while ensuring that at least one Things class can be detected for each BEV pixel;
step 33: calculating, for each foreground pixel p, the minimum distance d(p, c_i) to the k instance centroids c_i (i = 1, 2, …, k) and grouping the pixels accordingly;
step 34: predicting the Things classes in the semantic segmentation head using majority voting based on the semantic segmentation probabilities, and assigning a unique instance label L to each group G_i in the BEV;
step 35: and fusing the generated class agnostic instance cluster with 3D semantic segmentation prediction, and finally outputting a 3D panoramic segmentation result through a majority voting mechanism.
CN202310273933.7A 2023-03-20 2023-03-20 LiDAR point cloud panorama segmentation method based on polar coordinate BEV graph Pending CN116385452A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310273933.7A CN116385452A (en) 2023-03-20 2023-03-20 LiDAR point cloud panorama segmentation method based on polar coordinate BEV graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310273933.7A CN116385452A (en) 2023-03-20 2023-03-20 LiDAR point cloud panorama segmentation method based on polar coordinate BEV graph

Publications (1)

Publication Number Publication Date
CN116385452A true CN116385452A (en) 2023-07-04

Family

ID=86962623

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310273933.7A Pending CN116385452A (en) 2023-03-20 2023-03-20 LiDAR point cloud panorama segmentation method based on polar coordinate BEV graph

Country Status (1)

Country Link
CN (1) CN116385452A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111145174A (en) * 2020-01-02 2020-05-12 南京邮电大学 3D target detection method for point cloud screening based on image semantic features
CN114529727A (en) * 2022-04-25 2022-05-24 武汉图科智能科技有限公司 Street scene semantic segmentation method based on LiDAR and image fusion
JP7224682B1 (en) * 2021-08-17 2023-02-20 忠北大学校産学協力団 3D multiple object detection device and method for autonomous driving
US20230072731A1 (en) * 2021-08-30 2023-03-09 Thomas Enxu LI System and method for panoptic segmentation of point clouds

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
贾喆姝 (Jia Zheshu): "Research on Image Semantic Segmentation Technology Based on Deep Learning" (基于深度学习的图像语义分割技术研究), China Doctoral Dissertations Full-text Database, no. 01, pages 27-30 *

Similar Documents

Publication Publication Date Title
Huang et al. Autonomous driving with deep learning: A survey of state-of-art technologies
CN111626217B (en) Target detection and tracking method based on two-dimensional picture and three-dimensional point cloud fusion
CN110827415B (en) All-weather unknown environment unmanned autonomous working platform
CN111666921B (en) Vehicle control method, apparatus, computer device, and computer-readable storage medium
CN111080659A (en) Environmental semantic perception method based on visual information
KR20210074353A (en) Point cloud segmentation method, computer readable storage medium and computer device
CN110956651A (en) Terrain semantic perception method based on fusion of vision and vibrotactile sense
CN110852182B (en) Depth video human body behavior recognition method based on three-dimensional space time sequence modeling
CN110688905B (en) Three-dimensional object detection and tracking method based on key frame
Paz et al. Probabilistic semantic mapping for urban autonomous driving applications
US20230072731A1 (en) System and method for panoptic segmentation of point clouds
CN114972763A (en) Laser radar point cloud segmentation method, device, equipment and storage medium
Ouyang et al. A cgans-based scene reconstruction model using lidar point cloud
Maalej et al. Vanets meet autonomous vehicles: A multimodal 3d environment learning approach
Berrio et al. Octree map based on sparse point cloud and heuristic probability distribution for labeled images
CN115984586A (en) Multi-target tracking method and device under aerial view angle
Liu et al. Layered interpretation of street view images
Florea et al. Enhanced perception for autonomous driving using semantic and geometric data fusion
Dewangan et al. Towards the design of vision-based intelligent vehicle system: methodologies and challenges
Gosala et al. Skyeye: Self-supervised bird's-eye-view semantic mapping using monocular frontal view images
Pu et al. Visual SLAM integration with semantic segmentation and deep learning: A review
WO2023155903A1 (en) Systems and methods for generating road surface semantic segmentation map from sequence of point clouds
CN116664851A (en) Automatic driving data extraction method based on artificial intelligence
Zhao et al. DHA: Lidar and vision data fusion-based on road object classifier
CN116385452A (en) LiDAR point cloud panorama segmentation method based on polar coordinate BEV graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination