CN116229076A - Point cloud semantic segmentation method and device based on object occlusion compensation


Info

Publication number
CN116229076A
CN116229076A
Authority
CN
China
Prior art keywords
point cloud
dimensional
unit
points
semantic segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310233767.8A
Other languages
Chinese (zh)
Inventor
张新钰
谢涛
李骏
王力
戴崑
蒋志强
吴新刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202310233767.8A
Publication of CN116229076A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 Segmentation of patterns by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/443 Local feature extraction by matching or filtering
    • G06V 10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V 10/451 Biologically inspired filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V 10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V 10/52 Scale-space analysis, e.g. wavelet analysis
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion of extracted features
    • G06V 10/82 Arrangements for image or video recognition or understanding using neural networks
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V 20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Processing (AREA)

Abstract

The application provides a point cloud semantic segmentation method and device based on object occlusion compensation, relating to the technical field of automatic driving. The method comprises the following steps: randomly selecting a point from the original point cloud as a center point, and generating m sets of original point cloud data at different scales from n randomly selected points lying within each of m different preset radii around the center point; and processing the m sets of original point cloud data at different scales with a pre-trained point cloud semantic segmentation model to obtain a point cloud semantic segmentation result, wherein the point cloud semantic segmentation model fuses the contextual relationships of the m sets of original point cloud data at different scales. By fusing the contextual relationships of original point cloud data at multiple scales, the method and device improve the accuracy of point cloud semantic segmentation.

Description

Point cloud semantic segmentation method and device based on object occlusion compensation
Technical Field
The application relates to the technical field of automatic driving, and in particular to a point cloud semantic segmentation method and device based on object occlusion compensation.
Background
Currently, PointNet achieves good results in 3D semantic segmentation. PointNet is a deep neural network that, when used for semantic segmentation, takes a point cloud as input and outputs a semantic class label for each point. It first partitions the point cloud into 3D blocks, then takes n points within a block and maps them through a series of multi-layer perceptrons (MLPs) into a higher-dimensional space; the results are referred to as local point features. Max pooling is then used to aggregate information from all points, producing a global feature that is invariant to the ordering of the input points. The global feature is concatenated with all of the point features, and after a further series of MLPs these combined features are used to predict the output scores.
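To make this mechanism concrete, the following is a minimal PyTorch sketch of a PointNet-style segmentation block; the layer widths, input dimension and activation functions are illustrative assumptions, not the published PointNet configuration:

import torch
import torch.nn as nn

class PointNetSegSketch(nn.Module):
    # Per-point MLPs lift each point into a higher-dimensional space,
    # max pooling aggregates them into an order-invariant global feature,
    # and the global feature is concatenated back onto every point feature
    # before per-point class scores are predicted.
    def __init__(self, in_dim=9, num_classes=13):
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Linear(128 + 128, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, pts):                         # pts: (n, in_dim), one block
        local_feat = self.point_mlp(pts)            # (n, 128) local point features
        global_feat = local_feat.max(dim=0).values  # (128,), order-invariant
        fused = torch.cat([local_feat,
                           global_feat.expand(pts.shape[0], -1)], dim=1)
        return self.head(fused)                     # (n, num_classes) scores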
The problem with this approach is that the global feature in PointNet summarizes the context of only a single block, so aggregated information is shared only among points within the same block. Context from outside the block is equally important and can help make better class label predictions.
Disclosure of Invention
In view of the above, the present application provides a point cloud semantic segmentation method and device based on object occlusion compensation, so as to solve the above technical problems.
In a first aspect, an embodiment of the present application provides a point cloud semantic segmentation method based on object occlusion compensation, including:
randomly selecting a point from the original point cloud as a center point, and generating m sets of original point cloud data at different scales from n randomly selected points lying within each of m different preset radii around the center point;
processing the m sets of original point cloud data at different scales with a pre-trained point cloud semantic segmentation model to obtain a point cloud semantic segmentation result, wherein the point cloud semantic segmentation model fuses the contextual relationships of the m sets of original point cloud data at different scales.
Further, the point cloud semantic segmentation model includes: a fusion unit, a first merging unit, a second merging unit and a multi-layer perceptron.
Processing the m sets of original point cloud data at different scales with the pre-trained point cloud semantic segmentation model to obtain the point cloud semantic segmentation result comprises the following steps:
using the fusion unit to perform dimension raising and max pooling on the original point cloud features at each of the m scales to obtain m 128-dimensional feature vectors, and concatenating these into a 128×m-dimensional feature vector; replicating the 128×m-dimensional feature vector n times and concatenating it with the point cloud features of any one scale to obtain fused point cloud features of size n×(128×m+128);
processing the fused point cloud features with the first merging unit to obtain first common point cloud features of size n×512;
processing the first common point cloud features with the second merging unit to obtain second common point cloud features of size n×256;
and processing the second common point cloud features with the multi-layer perceptron to obtain scores for c categories for each of the n points, thereby obtaining the classification result.
Further, the fusion unit includes: m dimension-raising branches, a first concatenation unit, a first replication unit and a second concatenation unit; each dimension-raising branch comprises a first multi-layer perceptron, a second multi-layer perceptron and a first max pooling layer connected in sequence.
Using the fusion unit to obtain the fused point cloud features comprises the following steps:
using the first multi-layer perceptron to raise the d-dimensional original point cloud features of the n points to 64 dimensions;
using the second multi-layer perceptron to raise the 64-dimensional point cloud features of the n points to 128 dimensions;
using the first max pooling layer to max-pool the 128-dimensional point cloud features of the n points into a 128-dimensional feature vector;
using the first concatenation unit to concatenate the m 128-dimensional feature vectors output by the m dimension-raising branches into a 128×m-dimensional feature vector;
using the first replication unit to make n-1 copies of the 128×m-dimensional feature vector, generating the 128×m-dimensional first intermediate point cloud features of the n points;
and using the second concatenation unit to concatenate the 128×m-dimensional first intermediate point cloud features of the n points with the point cloud features output by the second multi-layer perceptron of any one dimension-raising branch, obtaining fused point cloud features of size n×(128×m+128) that fuse the contextual relationships of the multiple scales.
Further, the first merging unit includes: a third multi-layer perceptron, a second max pooling layer, a second replication unit and a third concatenation unit.
Processing the fused point cloud features with the first merging unit to obtain the first common point cloud features includes:
using the third multi-layer perceptron to convert the fused point cloud features into 256-dimensional point cloud features of the n points;
using the second max pooling layer to max-pool the 256-dimensional point cloud features of the n points into a 256-dimensional feature vector;
using the second replication unit to make n-1 copies of the 256-dimensional feature vector, generating the 256-dimensional second intermediate point cloud features of the n points;
and using the third concatenation unit to concatenate the 256-dimensional point cloud features of the n points output by the third multi-layer perceptron with the 256-dimensional second intermediate point cloud features of the n points, obtaining first common point cloud features of size n×512.
Further, the second merging unit includes: a fourth multi-layer perceptron, a third max pooling layer, a third replication unit and a fourth concatenation unit.
Processing the first common point cloud features with the second merging unit to obtain the second common point cloud features includes:
using the fourth multi-layer perceptron to convert the 512-dimensional first common point cloud features of the n points into 128-dimensional point cloud features of the n points;
using the third max pooling layer to max-pool the 128-dimensional point cloud features of the n points into a 128-dimensional feature vector;
using the third replication unit to make n-1 copies of the 128-dimensional feature vector, generating the 128-dimensional third intermediate point cloud features of the n points;
and using the fourth concatenation unit to concatenate the 128-dimensional point cloud features of the n points output by the fourth multi-layer perceptron with the 128-dimensional third intermediate point cloud features of the n points, obtaining second common point cloud features of size n×256.
Further, the method further comprises: training the point cloud semantic segmentation model.
In a second aspect, an embodiment of the present application provides a point cloud semantic segmentation apparatus based on object occlusion compensation, including:
a multi-scale data generation unit, configured to randomly select a point from the original point cloud as a center point and to generate m sets of original point cloud data at different scales from n randomly selected points lying within each of m different preset radii around the center point; and
a point cloud semantic segmentation unit, configured to process the m sets of original point cloud data at different scales with a pre-trained point cloud semantic segmentation model to obtain a point cloud semantic segmentation result, wherein the point cloud semantic segmentation model fuses the contextual relationships of the m sets of original point cloud data at different scales.
In a third aspect, an embodiment of the present application provides an electronic device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the methods of the embodiments of the present application when executing the computer program.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing computer instructions that, when executed by a processor, implement a method of embodiments of the present application.
By fusing the contextual relationships of original point cloud data at multiple scales, the method and device of the present application improve the accuracy of point cloud semantic segmentation under object occlusion.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a point cloud semantic segmentation method based on object occlusion compensation according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a point cloud semantic segmentation model according to an embodiment of the present application;
fig. 3 is a functional block diagram of a point cloud semantic segmentation device based on object occlusion compensation according to an embodiment of the present application;
fig. 4 is a functional block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, as provided in the accompanying drawings, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
First, the design concept of the embodiment of the present application will be briefly described.
Semantic segmentation is an important capability for intelligent vehicles such as self-driving cars and mobile robots. Recognizing the semantics of the three-dimensional structure around the vehicle is a prerequisite for downstream tasks such as navigation and reconstruction. The problem has therefore attracted considerable attention and, with the help of deep learning techniques, has seen significant success. However, most state-of-the-art semantic segmentation methods operate on 2D images, which lend themselves naturally to processing with convolutional neural networks.
Processing unstructured 3D point clouds (such as those obtained by lidar or 3D sensors) is a difficult problem. Hackel et al. use a conventional random forest classifier with 3D features (without color) built from the eigenvalues and eigenvectors of covariance tensors over nearest-neighbor points; their main contribution is an efficient approximate nearest-neighbor computation at different scales. Munoz et al. use a similar approach but replace the random forest classifier with an associative Markov network. Other works use random forest classifiers to classify data from both 2D images and 3D point clouds. Similarly, Xu et al. fuse camera and LiDAR sensor data. Xiong et al. propose a sequential parsing procedure for learning the spatial relationships of objects. Lai et al. introduce a hierarchical sparse coding technique for learning features from synthetic data. Vosselman et al. combine multiple segmentation and post-processing methods to achieve useful point cloud segmentation.
In a deep learning setting, the point cloud may be represented on a regular volumetric grid so that 3D convolutions can be applied. Alternatively, 3D points may be mapped to a 2D representation to which 2D convolutions are applied. In some works, 2D convolutions are performed on a 2D projection of the 3D point cloud and the labels are then projected back into 3D space. In other works, a deep learning framework learns semantic segmentation by tracking point clouds. Yi et al. use a spectral CNN to perform shape part segmentation on three-dimensional models represented as shape graphs. Only recently were the first successful deep learning methods proposed that accomplish this task directly. Such point clouds may be obtained from a lidar sensor mounted on top of the recording vehicle, or from a visual SLAM method running on the vehicle's cameras. Finding a way to operate directly on point cloud data is highly desirable, because it avoids expensive preprocessing and format conversion steps. However, the question of which network architecture best handles unstructured 3D point clouds remains largely unexplored. PointNet recently achieved good results and currently defines the state of the art in 3D semantic segmentation. PointNet is a deep neural network that, when used for semantic segmentation, takes a point cloud as input and outputs a semantic class label for each point. It first partitions the point cloud into 3D blocks, then takes n points within a block and maps them through a series of multi-layer perceptrons (MLPs) into a higher-dimensional space; the results are referred to as local point features. Max pooling is then used to aggregate information from all points, producing a global feature that is invariant to the ordering of the input points. The global feature is concatenated with all of the point features, and after a further series of MLPs these combined features are used to predict the output scores. The problem is that the global feature in PointNet summarizes the context of only a single block, so aggregated information is shared only among points within the same block.
In view of the above, the present application provides a point cloud semantic segmentation method based on object occlusion compensation. It increases the context awareness of the network by considering a whole set of point cloud inputs at the same time, rather than only one input at a time as PointNet does, so that context is shared among all blocks in the set. These inputs are obtained at different scales around the same location, and for each input the PointNet mechanism is used to learn scale-dependent features from the input features of that scale. The context obtained in this way is at the input level; since context from outside a block is equally important, it helps make more accurate class label predictions.
After the application scenario and the design idea of the embodiment of the present application are introduced, the technical solution provided by the embodiment of the present application is described below.
As shown in fig. 1, the embodiment of the application provides a point cloud semantic segmentation method based on object occlusion compensation, which specifically includes the following steps:
step 101: randomly selecting a point from the original point cloud as a center point, and generating m original point cloud data with different scales by using n randomly selected points positioned in m different preset radiuses;
step 102: processing m original point cloud data with different scales by utilizing a pre-trained point cloud semantic segmentation model to obtain a point cloud semantic segmentation result; the point cloud semantic segmentation model is used for fusing the context relation of m original point cloud data with different scales.
As shown in fig. 2, the point cloud semantic segmentation model includes: a fusion unit, a first merging unit, a second merging unit and a multi-layer perceptron (MLP). To further consolidate context, this embodiment introduces the merging unit, which further integrates the fused feature vectors produced upstream. The merging unit generates a common feature by lifting the feature vectors into a higher-dimensional space with an MLP and max pooling; this common feature is then concatenated back onto each high-dimensional input feature, a process similar to the PointNet mechanism. Crucially, multiple merging units can be chained in sequence to form a deeper network.
In this embodiment, step 102 includes:
using the fusion unit to perform dimension raising and max pooling on the original point cloud features at each of the m scales to obtain m 128-dimensional feature vectors, and concatenating these into a 128×m-dimensional feature vector; replicating the 128×m-dimensional feature vector n times and concatenating it with the point cloud features of any one scale to obtain fused point cloud features of size n×(128×m+128);
processing the fused point cloud features with the first merging unit to obtain first common point cloud features of size n×512;
processing the first common point cloud features with the second merging unit to obtain second common point cloud features of size n×256;
and processing the second common point cloud features with the multi-layer perceptron to obtain scores for c categories for each of the n points, thereby obtaining the classification result.
Here the fusion unit includes: m dimension-raising branches (m=3 in fig. 2), a first concatenation unit, a first replication unit and a second concatenation unit; each dimension-raising branch comprises a first multi-layer perceptron, a second multi-layer perceptron and a first max pooling layer connected in sequence.
Using the fusion unit to obtain the fused point cloud features comprises the following steps (a code sketch follows this list):
using the first multi-layer perceptron to raise the d-dimensional original point cloud features of the n points to 64 dimensions;
using the second multi-layer perceptron to raise the 64-dimensional point cloud features of the n points to 128 dimensions;
using the first max pooling layer to max-pool the 128-dimensional point cloud features of the n points into a 128-dimensional feature vector;
using the first concatenation unit to concatenate the m 128-dimensional feature vectors output by the m dimension-raising branches into a 128×m-dimensional feature vector;
using the first replication unit to make n-1 copies of the 128×m-dimensional feature vector, generating the 128×m-dimensional first intermediate point cloud features of the n points;
and using the second concatenation unit to concatenate the 128×m-dimensional first intermediate point cloud features of the n points with the point cloud features output by the second multi-layer perceptron of any one dimension-raising branch, obtaining fused point cloud features of size n×(128×m+128) that fuse the contextual relationships of the multiple scales.
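The following PyTorch sketch summarizes the fusion unit as just described: one dimension-raising branch per scale (MLP d→64→128 plus max pooling), concatenation of the m pooled vectors, replication over the n points, and concatenation with the 128-dimensional per-point features of one chosen scale. The ReLU activations are an assumption, as the text does not name the non-linearity:

import torch
import torch.nn as nn

class FusionUnit(nn.Module):
    def __init__(self, in_dim, m):
        super().__init__()
        # One dimension-raising branch per scale: first MLP d -> 64,
        # second MLP 64 -> 128 (whose per-point output is also kept).
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                          nn.Linear(64, 128), nn.ReLU())
            for _ in range(m)
        )

    def forward(self, scales):                     # list of m tensors, each (n, d)
        pooled, kept = [], None
        for branch, pts in zip(self.branches, scales):
            feat = branch(pts)                     # (n, 128) per-point features
            pooled.append(feat.max(dim=0).values)  # (128,) max-pooled per scale
            if kept is None:                       # features of one chosen scale
                kept = feat
        ctx = torch.cat(pooled)                    # (128*m,) multi-scale context
        ctx = ctx.expand(kept.shape[0], -1)        # replicate over the n points
        return torch.cat([ctx, kept], dim=1)       # (n, 128*m + 128)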
Here the first merging unit includes: a third multi-layer perceptron, a second max pooling layer, a second replication unit and a third concatenation unit.
Processing the fused point cloud features with the first merging unit to obtain the first common point cloud features includes:
using the third multi-layer perceptron to convert the fused point cloud features into 256-dimensional point cloud features of the n points;
using the second max pooling layer to max-pool the 256-dimensional point cloud features of the n points into a 256-dimensional feature vector;
using the second replication unit to make n-1 copies of the 256-dimensional feature vector, generating the 256-dimensional second intermediate point cloud features of the n points;
and using the third concatenation unit to concatenate the 256-dimensional point cloud features of the n points output by the third multi-layer perceptron with the 256-dimensional second intermediate point cloud features of the n points, obtaining first common point cloud features of size n×512.
Here the second merging unit includes: a fourth multi-layer perceptron, a third max pooling layer, a third replication unit and a fourth concatenation unit.
Processing the first common point cloud features with the second merging unit to obtain the second common point cloud features includes (a combined sketch of both merging units follows this list):
using the fourth multi-layer perceptron to convert the 512-dimensional first common point cloud features of the n points into 128-dimensional point cloud features of the n points;
using the third max pooling layer to max-pool the 128-dimensional point cloud features of the n points into a 128-dimensional feature vector;
using the third replication unit to make n-1 copies of the 128-dimensional feature vector, generating the 128-dimensional third intermediate point cloud features of the n points;
and using the fourth concatenation unit to concatenate the 128-dimensional point cloud features of the n points output by the fourth multi-layer perceptron with the 128-dimensional third intermediate point cloud features of the n points, obtaining second common point cloud features of size n×256.
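Since the two merging units share one structure and differ only in width, a single parameterized sketch covers both; the assembly below also chains the units into the full model, reusing the FusionUnit sketch from the earlier listing. The ReLU activations are again assumptions:

import torch
import torch.nn as nn

class MergeUnit(nn.Module):
    # Lift the input features with an MLP, max-pool them into one common
    # feature, replicate it over the n points, and concatenate it back onto
    # the per-point features: (n, in_dim) -> (n, 2 * width).
    def __init__(self, in_dim, width):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, width), nn.ReLU())

    def forward(self, x):
        feat = self.mlp(x)                          # (n, width)
        common = feat.max(dim=0).values             # (width,) common feature
        common = common.expand(feat.shape[0], -1)   # replicate over n points
        return torch.cat([feat, common], dim=1)     # (n, 2 * width)

class SegModelSketch(nn.Module):
    # End-to-end assembly with the widths stated in the text.
    def __init__(self, in_dim, m, num_classes):
        super().__init__()
        self.fusion = FusionUnit(in_dim, m)          # -> (n, 128*m + 128)
        self.merge1 = MergeUnit(128 * m + 128, 256)  # -> (n, 512)
        self.merge2 = MergeUnit(512, 128)            # -> (n, 256)
        self.head = nn.Linear(256, num_classes)      # -> (n, c) class scores

    def forward(self, scales):
        return self.head(self.merge2(self.merge1(self.fusion(scales))))

With m=3 scales of n points each, SegModelSketch(d, 3, c)(scales) returns an n×c score matrix, matching the sizes stated above.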
Furthermore, the method comprises the following steps: training the point cloud semantic segmentation model.
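The text does not specify the training procedure; a minimal sketch, assuming per-point cross-entropy over the c class scores and an Adam optimizer (both assumptions, along with the hypothetical train_samples iterable), could look as follows:

import torch

model = SegModelSketch(in_dim=9, m=3, num_classes=13)      # widths assumed
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed optimizer
loss_fn = torch.nn.CrossEntropyLoss()

# train_samples: hypothetical iterable of (scales, labels) pairs,
# where labels is an (n,) tensor of class indices, one per point.
for scales, labels in train_samples:
    optimizer.zero_grad()
    scores = model(scales)             # (n, c) per-point class scores
    loss_fn(scores, labels).backward()
    optimizer.step()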
Based on the foregoing embodiments, the embodiment of the present application provides a point cloud semantic segmentation apparatus based on object occlusion compensation. As shown in fig. 3, the point cloud semantic segmentation apparatus 200 based on object occlusion compensation provided in the embodiment of the present application includes at least:
a multi-scale data generation unit 201, configured to randomly select a point from the original point cloud as a center point and to generate m sets of original point cloud data at different scales from n randomly selected points lying within each of m different preset radii around the center point; and
a point cloud semantic segmentation unit 202, configured to process the m sets of original point cloud data at different scales with a pre-trained point cloud semantic segmentation model to obtain a point cloud semantic segmentation result, wherein the point cloud semantic segmentation model fuses the contextual relationships of the m sets of original point cloud data at different scales.
It should be noted that, the principle of solving the technical problem by the object occlusion compensation-based point cloud semantic segmentation apparatus 200 provided in the embodiment of the present application is similar to that of the method provided in the embodiment of the present application, so that the implementation of the object occlusion compensation-based point cloud semantic segmentation apparatus 200 provided in the embodiment of the present application can be referred to the implementation of the method provided in the embodiment of the present application, and the repetition is omitted.
Based on the foregoing embodiments, the embodiment of the present application further provides an electronic device. As shown in fig. 4, the electronic device 300 provided in the embodiment of the present application includes at least a processor 301 and a memory 302 storing a computer program; when the processor 301 executes the computer program, the point cloud semantic segmentation method based on object occlusion compensation provided by the embodiments of the present application is implemented.
The electronic device 300 provided by the embodiments of the present application may also include a bus 303 that connects the different components, including the processor 301 and the memory 302. Bus 303 represents one or more of several types of bus structures, including a memory bus, a peripheral bus, a local bus, and so forth.
The Memory 302 may include readable media in the form of volatile Memory, such as random access Memory (Random Access Memory, RAM) 3021 and/or cache Memory 3022, and may further include Read Only Memory (ROM) 3023.
The memory 302 may also include a program tool 3025 having a set (at least one) of program modules 3024, the program modules 3024 including, but not limited to: an operating subsystem, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The electronic device 300 may also communicate with one or more external devices 304 (e.g., keyboard, remote control, etc.), one or more devices that enable a user to interact with the electronic device 300 (e.g., cell phone, computer, etc.), and/or any device that enables the electronic device 300 to communicate with one or more other electronic devices 300 (e.g., router, modem, etc.). Such communication may occur through an Input/Output (I/O) interface 305. Also, electronic device 300 may communicate with one or more networks such as a local area network (Local Area Network, LAN), a wide area network (Wide Area Network, WAN), and/or a public network such as the internet via network adapter 306. As shown in fig. 4, the network adapter 306 communicates with other modules of the electronic device 300 over the bus 303. It should be appreciated that although not shown in fig. 4, other hardware and/or software modules may be used in connection with electronic device 300, including, but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, disk array (Redundant Arrays of Independent Disks, RAID) subsystems, tape drives, data backup storage subsystems, and the like.
It should be noted that the electronic device 300 shown in fig. 4 is only an example, and should not impose any limitation on the functions and application scope of the embodiments of the present application.
The embodiment of the application also provides a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the point cloud semantic segmentation method based on object occlusion compensation. Specifically, the executable program may be built into or installed in the electronic device 300, so that the electronic device 300 implements the point cloud semantic segmentation method based on object occlusion compensation provided in the embodiments of the present application by executing the built-in or installed executable program.
The point cloud semantic segmentation method based on object occlusion compensation provided in the embodiments of the present application may also be implemented as a program product comprising program code; when the program product runs on the electronic device 300, the program code causes the electronic device 300 to execute the point cloud semantic segmentation method based on object occlusion compensation provided in the embodiments of the present application.
The program product provided by the embodiments of the present application may employ any combination of one or more readable media, where the readable media may be a readable signal medium or a readable storage medium, and the readable storage medium may be, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof, and more specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a RAM, a ROM, an erasable programmable read-Only Memory (Erasable Programmable Read Only Memory, EPROM), an optical fiber, a portable compact disk read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product provided by the embodiments of the present application may be implemented as a CD-ROM and include program code that may also be run on a computing device. However, the program product provided by the embodiments of the present application is not limited thereto, and in the embodiments of the present application, the readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functions of two or more of the elements described above may be embodied in one element in accordance with embodiments of the present application. Conversely, the features and functions of one unit described above may be further divided into a plurality of units to be embodied.
Furthermore, although the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
Finally, it should be noted that the above embodiments are merely illustrative of the technical solution of the present application and not limiting. Although the present application has been described in detail with reference to the embodiments, it should be understood by those skilled in the art that the modifications and equivalents may be made to the technical solutions of the present application without departing from the spirit and scope of the technical solutions of the present application, and all such modifications and equivalents are intended to be encompassed in the scope of the claims of the present application.

Claims (9)

1. A point cloud semantic segmentation method based on object occlusion compensation, characterized by comprising the following steps:
randomly selecting a point from the original point cloud as a center point, and generating m sets of original point cloud data at different scales from n randomly selected points lying within each of m different preset radii around the center point;
processing the m sets of original point cloud data at different scales with a pre-trained point cloud semantic segmentation model to obtain a point cloud semantic segmentation result, wherein the point cloud semantic segmentation model fuses the contextual relationships of the m sets of original point cloud data at different scales.
2. The method of claim 1, wherein the point cloud semantic segmentation model comprises: a fusion unit, a first merging unit, a second merging unit and a multi-layer perceptron;
and wherein processing the m sets of original point cloud data at different scales with the pre-trained point cloud semantic segmentation model to obtain the point cloud semantic segmentation result comprises the following steps:
using the fusion unit to perform dimension raising and max pooling on the original point cloud features at each of the m scales to obtain m 128-dimensional feature vectors, and concatenating these into a 128×m-dimensional feature vector; replicating the 128×m-dimensional feature vector n times and concatenating it with the point cloud features of any one scale to obtain fused point cloud features of size n×(128×m+128);
processing the fused point cloud features with the first merging unit to obtain first common point cloud features of size n×512;
processing the first common point cloud features with the second merging unit to obtain second common point cloud features of size n×256;
and processing the second common point cloud features with the multi-layer perceptron to obtain scores for c categories for each of the n points, thereby obtaining the classification result.
3. The method of claim 2, wherein the fusion unit comprises: m dimension-raising branches, a first concatenation unit, a first replication unit and a second concatenation unit; each dimension-raising branch comprises a first multi-layer perceptron, a second multi-layer perceptron and a first max pooling layer connected in sequence;
and wherein using the fusion unit to obtain the fused point cloud features comprises the following steps:
using the first multi-layer perceptron to raise the d-dimensional original point cloud features of the n points to 64 dimensions;
using the second multi-layer perceptron to raise the 64-dimensional point cloud features of the n points to 128 dimensions;
using the first max pooling layer to max-pool the 128-dimensional point cloud features of the n points into a 128-dimensional feature vector;
using the first concatenation unit to concatenate the m 128-dimensional feature vectors output by the m dimension-raising branches into a 128×m-dimensional feature vector;
using the first replication unit to make n-1 copies of the 128×m-dimensional feature vector, generating the 128×m-dimensional first intermediate point cloud features of the n points;
and using the second concatenation unit to concatenate the 128×m-dimensional first intermediate point cloud features of the n points with the point cloud features output by the second multi-layer perceptron of any one dimension-raising branch, obtaining fused point cloud features of size n×(128×m+128) that fuse the contextual relationships of the multiple scales.
4. The method of claim 3, wherein the first merging unit comprises: a third multi-layer perceptron, a second max pooling layer, a second replication unit and a third concatenation unit;
and wherein processing the fused point cloud features with the first merging unit to obtain the first common point cloud features comprises:
using the third multi-layer perceptron to convert the fused point cloud features into 256-dimensional point cloud features of the n points;
using the second max pooling layer to max-pool the 256-dimensional point cloud features of the n points into a 256-dimensional feature vector;
using the second replication unit to make n-1 copies of the 256-dimensional feature vector, generating the 256-dimensional second intermediate point cloud features of the n points;
and using the third concatenation unit to concatenate the 256-dimensional point cloud features of the n points output by the third multi-layer perceptron with the 256-dimensional second intermediate point cloud features of the n points, obtaining first common point cloud features of size n×512.
5. The method of claim 4, wherein the second merging unit comprises: a fourth multi-layer perceptron, a third max pooling layer, a third replication unit and a fourth concatenation unit;
and wherein processing the first common point cloud features with the second merging unit to obtain the second common point cloud features comprises:
using the fourth multi-layer perceptron to convert the 512-dimensional first common point cloud features of the n points into 128-dimensional point cloud features of the n points;
using the third max pooling layer to max-pool the 128-dimensional point cloud features of the n points into a 128-dimensional feature vector;
using the third replication unit to make n-1 copies of the 128-dimensional feature vector, generating the 128-dimensional third intermediate point cloud features of the n points;
and using the fourth concatenation unit to concatenate the 128-dimensional point cloud features of the n points output by the fourth multi-layer perceptron with the 128-dimensional third intermediate point cloud features of the n points, obtaining second common point cloud features of size n×256.
6. The method of claim 5, wherein the method further comprises: training the point cloud semantic segmentation model.
7. A point cloud semantic segmentation apparatus based on object occlusion compensation, characterized by comprising:
a multi-scale data generation unit, configured to randomly select a point from the original point cloud as a center point and to generate m sets of original point cloud data at different scales from n randomly selected points lying within each of m different preset radii around the center point; and
a point cloud semantic segmentation unit, configured to process the m sets of original point cloud data at different scales with a pre-trained point cloud semantic segmentation model to obtain a point cloud semantic segmentation result, wherein the point cloud semantic segmentation model fuses the contextual relationships of the m sets of original point cloud data at different scales.
8. An electronic device, comprising: memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method according to any of claims 1-6 when the computer program is executed.
9. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the method of any one of claims 1-6.
CN202310233767.8A 2023-03-03 2023-03-03 Point cloud semantic segmentation method and device based on object occlusion compensation Pending CN116229076A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310233767.8A CN116229076A (en) Point cloud semantic segmentation method and device based on object occlusion compensation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310233767.8A CN116229076A (en) Point cloud semantic segmentation method and device based on object occlusion compensation

Publications (1)

Publication Number Publication Date
CN116229076A 2023-06-06

Family

ID=86582223

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310233767.8A Pending CN116229076A (en) 2023-03-03 2023-03-03 Point cloud semantic segmentation method and device based on object shielding compensation

Country Status (1)

Country Link
CN (1) CN116229076A (en)

Similar Documents

Kaymak et al. A brief survey and an application of semantic image segmentation for autonomous driving
Mittal A survey on optimized implementation of deep learning models on the nvidia jetson platform
Huang et al. Autonomous driving with deep learning: A survey of state-of-art technologies
WO2021093435A1 (en) Semantic segmentation network structure generation method and apparatus, device, and storage medium
CN111368993B (en) Data processing method and related equipment
CN106796580B (en) Method, apparatus, and medium for processing multiple asynchronous event driven samples
WO2023160472A1 (en) Model training method and related device
CN112016467B (en) Traffic sign recognition model training method, recognition method, system, device and medium
Ayachi et al. Pedestrian detection based on light-weighted separable convolution for advanced driver assistance systems
CN113039555B (en) Method, system and storage medium for classifying actions in video clips
US11270425B2 (en) Coordinate estimation on n-spheres with spherical regression
US20240046067A1 (en) Data processing method and related device
Kolekar et al. Behavior prediction of traffic actors for intelligent vehicle using artificial intelligence techniques: A review
KR102479671B1 (en) Method for providing parts information of vehicle
Chen et al. LODNU: lightweight object detection network in UAV vision
CN112115744B (en) Point cloud data processing method and device, computer storage medium and electronic equipment
CN116467513A (en) Attention mechanism-based multi-mode knowledge graph recommendation method, device and medium
Tan et al. 3D detection transformer: Set prediction of objects using point clouds
CN116229076A (en) Point cloud semantic segmentation method and device based on object shielding compensation
Duggal Design Space Exploration of DNNs for Autonomous Systems
Luo et al. Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives
CN116977820B (en) Self-attention-based light three-dimensional target real-time detection method and device
KR102479672B1 (en) Method for detecting damage area of vehicle
Yang et al. CSGAT-Net: a conditional pedestrian trajectory prediction network based on scene semantic maps and spatiotemporal graph attention
Bapu Sridhar et al. Pedestrian intent prediction using deep machine learning

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination