CN114638954B - Training method of point cloud segmentation model, point cloud data segmentation method and related device


Publication number
CN114638954B
Authority
CN
China
Prior art keywords
voxel
real
point cloud
information
network
Prior art date
Legal status
Active
Application number
CN202210163274.7A
Other languages
Chinese (zh)
Other versions
CN114638954A
Inventor
许双杰
万锐
邹晓艺
Current Assignee
DeepRoute AI Ltd
Original Assignee
DeepRoute AI Ltd
Priority date
Filing date
Publication date
Application filed by DeepRoute AI Ltd
Priority to CN202210163274.7A
Publication of CN114638954A
Application granted
Publication of CN114638954B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The application discloses a training method of a point cloud segmentation model, a point cloud data segmentation method and a related device. The method comprises the following steps: acquiring a training point cloud, wherein the data points in the training point cloud are grouped by voxel to obtain corresponding real voxels and each real voxel is marked with real information; inputting the training point cloud into the point cloud segmentation model to obtain predicted voxels output by the point cloud segmentation model and detection information corresponding to the predicted voxels; determining at least one voxel pair, each voxel pair comprising a corresponding predicted voxel and real voxel; and adjusting network parameters of the point cloud segmentation model according to the difference between the real information and the detection information of the voxel pairs. Through this method, conversions from voxel features to data point features can be reduced, lowering the loss incurred in the conversion process and the amount of computation, thereby improving the segmentation accuracy of the point cloud segmentation model.

Description

Training method of point cloud segmentation model, point cloud data segmentation method and related device
Technical Field
The application relates to the technical field of point cloud data processing, in particular to a training method of a point cloud segmentation model, a point cloud data segmentation method and a related device.
Background
Three-dimensional scene segmentation is essential for many robotic applications, especially for autonomous driving. In current three-dimensional laser point cloud data processing, three-dimensional sparse convolution is often used to directly process the voxelized point cloud features.
However, the three-dimensional sparse features are often mapped to dense features so that the loss function can be computed easily during training. This mapping increases the amount of computation, and the sparse features extracted by the three-dimensional sparse convolution lose some of their effectiveness in the mapping process.
Disclosure of Invention
The application mainly solves the technical problem of providing a training method of a point cloud segmentation model, a point cloud data segmentation method and a related device, which can reduce the conversion from voxel features to data point features, reduce the loss in the conversion process and the amount of computation, and thereby improve the segmentation accuracy of the point cloud segmentation model.
In order to solve the technical problems, the application adopts a technical scheme that: the method for training the point cloud segmentation model comprises the following steps: acquiring a training point cloud; the data points in the training point cloud are classified according to voxels to obtain corresponding real voxels, and each real voxel is marked with real information; inputting the training point cloud into the point cloud segmentation model to obtain a predicted voxel output by the point cloud segmentation model and detection information corresponding to the predicted voxel; determining at least one voxel pair, each voxel pair comprising a corresponding predicted voxel and a real voxel; and adjusting network parameters of the point cloud segmentation model according to the difference between the real information and the detection information of the voxel pairs.
Wherein determining at least one voxel pair comprises: acquiring a first coordinate of each real voxel and a second coordinate of each predicted voxel; determining a target first coordinate and a target second coordinate that have the same coordinates; and determining the real voxel corresponding to the target first coordinate and the predicted voxel corresponding to the target second coordinate as a voxel pair.
Wherein obtaining the first coordinate of each real voxel and the second coordinate of each predicted voxel comprises: hash-coding the first coordinates of all the real voxels to obtain a corresponding first hash table; and hash-coding the second coordinates of all the predicted voxels to obtain a corresponding second hash table. Determining the target first coordinate and the target second coordinate that have the same coordinates comprises: traversing the second hash table of each predicted voxel against all the first hash tables to determine the target first coordinate and the target second coordinate that have the same coordinates.
The point cloud segmentation model comprises: a sparse voxel feature encoding network, a three-dimensional sparse network, a point voxel network, a first supervision network, a second supervision network, a third supervision network and a fourth supervision network. Inputting the training point cloud into the point cloud segmentation model to obtain the predicted voxels output by the point cloud segmentation model and the detection information corresponding to the predicted voxels comprises the following steps: inputting the training point cloud into the sparse voxel feature encoding network for feature extraction to obtain voxel features; inputting the voxel features into the three-dimensional sparse network for sparse voxel feature extraction to obtain sparse voxel features; inputting the sparse voxel features into the point voxel network for feature conversion to obtain data point features; inputting the sparse voxel features into the first supervision network for semantic segmentation learning on three-dimensional voxels to obtain first detection information corresponding to the predicted voxels; inputting the sparse voxel features into the second supervision network for three-dimensional thermodynamic diagram learning to obtain second detection information corresponding to the predicted voxels; inputting the data point features into the third supervision network for point-level semantic segmentation learning to obtain third detection information corresponding to each data point; and inputting the data point features into the fourth supervision network for point-level offset supervised learning to obtain a predicted offset corresponding to each data point.
The real information comprises first real information, second real information, third real information and fourth real information; the first real information represents semantic information of real voxels; the second real information represents the object centroid information of the real voxels; the third real information characterizes semantic information of the data points; the fourth real information characterizes offset information of the data point.
Wherein adjusting the network parameters of the point cloud segmentation model according to the difference between the real information and the detection information of the voxel pairs comprises: determining the difference between the first detection information and the first real information to obtain a first loss value; determining the difference between the second detection information and the second real information to obtain a second loss value; determining the difference between the third detection information and the third real information to obtain a third loss value; determining the difference between the predicted offset and the fourth real information to obtain a fourth loss value; and adjusting the network parameters of the point cloud segmentation model by using the first loss value, the second loss value, the third loss value and the fourth loss value.
Wherein determining the difference between the first detection information and the first real information to obtain the first loss value comprises: determining a first sub-loss value between the first detection information and the first real information by using a Lovasz-Softmax loss function; determining a second sub-loss value between the first detection information and the first real information by using a cross entropy loss function; and summing the first sub-loss value and the second sub-loss value to obtain the first loss value.
Wherein, before determining the difference between the second detection information and the second real information to obtain the second loss value, the method comprises: inputting the sparse voxel features into the second supervision network for three-dimensional thermodynamic diagram learning to obtain the probability that each sparse voxel feature belongs to an object centroid; taking the probability as the second detection information; and determining the second real information by using the centroid of each object and the sparse voxel features within a preset distance.
Wherein determining a difference between the second detection information and the second real information to obtain a second loss value includes: and determining the difference between the second detection information and the second real information by using the Focal loss function to obtain a second loss value.
Wherein determining a difference between the third detection information and the third real information to obtain a third loss value includes: determining a third sub-loss value between the third detection information and the third real information by using a Lovasz-Softmax loss function; determining a fourth sub-loss value between the third detection information and the third real information using a cross entropy loss function; and summing the third sub-loss value and the fourth sub-loss value to obtain a third loss value.
Wherein determining a difference between the predicted offset and the fourth real information to obtain a fourth loss value comprises: determining a true object centroid using fourth true information of the data point; determining a true offset of the data point from a true object centroid; and obtaining a fourth loss value by using the predicted offset and the real offset.
In order to solve the technical problems, the application adopts another technical scheme that: provided is a point cloud data segmentation method, comprising the following steps: acquiring point cloud data; and inputting the point cloud data into a point cloud segmentation model, and outputting the segmented point cloud data, wherein the point cloud segmentation model is trained by using the method provided by the technical scheme.
In order to solve the technical problems, the application adopts another technical scheme that: there is provided a processing device for point cloud data, the processing device comprising a processor and a memory coupled to the processor, the memory being configured to store a computer program, the processor being configured to execute the computer program to implement the method provided in the above technical solution.
In order to solve the technical problems, the application adopts another technical scheme that: there is provided a computer readable storage medium for storing a computer program for implementing the method provided by the above technical solution when being executed by a processor.
The embodiment of the application has the following beneficial effects. Different from the prior art, the training method of the point cloud segmentation model provided by the application comprises: acquiring a training point cloud, wherein the data points in the training point cloud are grouped by voxel to obtain corresponding real voxels and each real voxel is marked with real information; inputting the training point cloud into the point cloud segmentation model to obtain predicted voxels output by the point cloud segmentation model and detection information corresponding to the predicted voxels; determining at least one voxel pair, each voxel pair comprising a corresponding predicted voxel and real voxel; and adjusting network parameters of the point cloud segmentation model according to the difference between the real information and the detection information of the voxel pairs. In this way, the corresponding predicted voxels and real voxels are determined through the voxel pairs, and the network parameters of the point cloud segmentation model are then adjusted using the difference between the real information and the detection information of the voxel pairs, so that conversions from voxel features to data point features can be reduced, the loss in the conversion process and the amount of computation are reduced, and the segmentation accuracy of the point cloud segmentation model is thereby improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. Wherein:
FIG. 1 is a schematic flow chart of a first embodiment of a training method of a point cloud segmentation model provided by the application;
FIG. 2 is a schematic flow chart of a second embodiment of a training method of a point cloud segmentation model provided by the application;
FIG. 3 is a schematic flow chart of a third embodiment of a training method of a point cloud segmentation model according to the present application;
FIGS. 4 and 5 are schematic diagrams of determining voxel pairs provided by the present application;
FIG. 6 is a flowchart of a fourth embodiment of a training method of a point cloud segmentation model according to the present application;
FIG. 7 is a flow chart of an embodiment of step 603 provided by the present application;
FIG. 8 is a flow chart of an embodiment of step 607 provided in the present application;
FIG. 9 is a flow chart illustrating an embodiment of the present application before step 610;
FIG. 10 is a flow chart of an embodiment of step 612 provided in the present application;
FIG. 11 is a flowchart illustrating an embodiment of step 614 according to the present application;
FIG. 12 is a schematic diagram of a point cloud segmentation model according to the present application;
fig. 13 is a schematic flow chart of a first embodiment of a point cloud data segmentation method provided by the present application;
Fig. 14 is a schematic structural diagram of an embodiment of a processing device for point cloud data according to the present application;
Fig. 15 is a schematic structural diagram of an embodiment of a computer readable storage medium according to the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. It is to be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present application are shown in the drawings. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
Referring to fig. 1, fig. 1 is a flowchart of a first embodiment of a training method of a point cloud segmentation model according to the present application. The method comprises the following steps:
Step 11: acquiring a training point cloud; the data points in the training point cloud are classified according to the voxels to obtain corresponding real voxels, and each real voxel is marked with real information.
It will be appreciated that the training point cloud may be collected manually or automatically. For example, a radar sensor may be manually controlled to collect point clouds within its coverage area, or a radar sensor mounted on an autonomous mobile device may acquire point clouds autonomously while the device moves. The training point cloud includes objects in the scene, such as buildings, trees, human bodies and vehicles.
In some embodiments, the corresponding information in the training point cloud, such as the category corresponding to each point, is labeled manually. The data points in the training point cloud are grouped by voxel to obtain the corresponding real voxels, and each real voxel is labeled to determine the category of the real voxel, the centroid of the real voxel, and the offset between each data point and the centroid of the object that the data point belongs to.
The labeling information is taken as real information and participates in the follow-up supervised learning.
In some embodiments, after the training point cloud is obtained, it is preprocessed with operations such as random rotation, mirror flipping, random blurring and random cropping; each operation produces an additional training point cloud whose labels change correspondingly with the applied operation. The number of training point clouds is thus greatly expanded, and training can be completed without collecting further training point clouds.
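As a hedged illustration of the preprocessing described above, the following sketch applies a random rotation about the vertical axis and a random mirror flip to a point cloud; the function name, the choice of axes and the flip probability are assumptions rather than details taken from the patent.

```python
import numpy as np

def augment_point_cloud(points, labels):
    """Randomly rotate and mirror-flip a training point cloud.

    points: (N, 4) array of x, y, z and reflection intensity.
    labels: (N,) per-point semantic labels, returned unchanged; geometric
    labels such as per-point offsets would have to be transformed consistently.
    """
    points = points.copy()

    # Random rotation around the vertical (z) axis.
    theta = np.random.uniform(0.0, 2.0 * np.pi)
    rotation = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                         [np.sin(theta),  np.cos(theta), 0.0],
                         [0.0,            0.0,           1.0]])
    points[:, :3] = points[:, :3] @ rotation.T

    # Random mirror flip across the x-z plane.
    if np.random.rand() < 0.5:
        points[:, 1] = -points[:, 1]

    return points, labels
```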
Here a voxel is the 3D-space analogue of a pixel: the point cloud is quantized into cells of fixed size with discrete coordinates. The size of the voxels may be set in advance, for example 0.1 mm by 0.1 mm or 0.2 mm by 0.2 mm. That is, one voxel may contain several data points of the point cloud data.
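A minimal sketch of how data points are grouped into voxels of a preset size is given below; the helper name, the NumPy implementation and the example voxel size are illustrative assumptions.

```python
import numpy as np

def group_points_by_voxel(points, voxel_size=0.2):
    """Assign each data point to the voxel containing it.

    points: (N, 3+) array whose first three columns are x, y, z.
    Returns the per-point voxel coordinates, the unique (non-empty) voxels,
    and an index mapping each point to its voxel, so one voxel may hold
    several data points.
    """
    voxel_coords = np.floor(points[:, :3] / voxel_size).astype(np.int64)   # (N, 3)
    unique_voxels, point_to_voxel = np.unique(voxel_coords, axis=0, return_inverse=True)
    return voxel_coords, unique_voxels, point_to_voxel
```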
Step 12: and inputting the training point cloud into the point cloud segmentation model to obtain a predicted voxel output by the point cloud segmentation model and detection information corresponding to the predicted voxel.
In some embodiments, the point cloud segmentation model includes a feature extraction network for extracting voxel features of the input training point cloud to obtain corresponding voxel features and detection information of the corresponding voxel features. These voxel features may be used as prediction voxels.
In some embodiments, a point cloud segmentation model may be built using multi-layer perceptrons, sub-manifold sparse convolution, and the like.
Step 13: at least one voxel pair is determined, each voxel pair comprising a corresponding predicted voxel and a real voxel.
After the point cloud segmentation model outputs the predicted voxels, the predicted voxels do not directly correspond to the real voxels because the predicted voxels are unordered; and if the predicted voxels cannot be made to correspond to the real voxels, the difference between the predicted voxels and the real voxels at the voxel level cannot be determined.
Based on this, the application provides a way of associating the predicted voxels with the real voxels to form a correspondence, and the difference comparison is carried out on the basis of this correspondence.
In some embodiments, the fact that corresponding predicted and real voxels share the same coordinates may be used to determine the predicted voxel and the real voxel located at the same coordinates. If only a predicted voxel exists at a coordinate, that predicted voxel is discarded. If only a real voxel exists at a coordinate, the detection information of the corresponding predicted voxel is set to 0, which indicates that the accuracy of the point cloud segmentation model is still low and training needs to continue.
Step 14: and adjusting network parameters of the point cloud segmentation model according to the difference between the real information and the detection information of the voxel pairs.
In some embodiments, the number of training iterations of the point cloud segmentation model can be adjusted according to the difference between the real information and the detection information of the voxel pair, so as to adjust the network parameters of the point cloud segmentation model. If the real information is A and the detection information is B, the number of training iterations of the point cloud segmentation model can be adjusted at this point, and the network parameters of the point cloud segmentation model are adjusted accordingly; if the real information is A and the detection information is also A but its confidence is lower than the set threshold, the number of training iterations is likewise adjusted, and the network parameters of the point cloud segmentation model are adjusted accordingly.
In some embodiments, the network parameters of the point cloud segmentation model may be adjusted directly according to the difference between the real information and the detection information of the voxel pairs; if a convolutional neural network exists in the point cloud segmentation model, the number, stride and padding of the convolution kernels may be set, the activation function may be adjusted, the parameters of the pooling layers may be adjusted, and so on.
In some embodiments, a loss value may also be calculated from the real information and the detection information of the voxel pair, and if the loss value differs from a preset loss threshold, the network parameters of the point cloud segmentation model are adjusted.
In the embodiment, training point clouds are obtained; the data points in the training point cloud are classified according to voxels to obtain corresponding real voxels, and each real voxel is marked with real information; inputting the training point cloud into the point cloud segmentation model to obtain a predicted voxel output by the point cloud segmentation model and detection information corresponding to the predicted voxel; determining at least one voxel pair, each voxel pair comprising a corresponding predicted voxel and a real voxel; according to the difference between the real information and the detection information of the voxel pairs, the network parameters of the point cloud segmentation model are adjusted, the corresponding prediction voxels and the real voxels are determined by utilizing the determined voxel pairs, and then the network parameters of the point cloud segmentation model are adjusted by utilizing the difference between the real information and the detection information of the voxel pairs, so that the conversion from voxel characteristics to data point characteristics can be reduced, the loss in the conversion process is reduced, the calculated amount is reduced, and the segmentation accuracy of the point cloud segmentation model is improved.
Referring to fig. 2, fig. 2 is a flowchart of a second embodiment of a training method of a point cloud segmentation model according to the present application. The method comprises the following steps:
Step 21: acquiring a training point cloud; the data points in the training point cloud are classified according to the voxels to obtain corresponding real voxels, and each real voxel is marked with real information.
Step 22: and inputting the training point cloud into the point cloud segmentation model to obtain a predicted voxel output by the point cloud segmentation model and detection information corresponding to the predicted voxel.
Steps 21 to 22 have the same or similar technical solutions as the above embodiments, and are not repeated here.
Step 23: the first coordinates of each real voxel and the second coordinates of each predicted voxel are obtained.
It will be appreciated that the first coordinates of each real voxel may be determined at the time of labeling. The second coordinates of each of the predicted voxels may be determined when determining the predicted voxels.
Step 24: the first coordinates of the target and the second coordinates of the target having the same coordinates are determined.
At this time, the target first coordinates and the target second coordinates having the same coordinates can be determined.
Step 25: and determining the real voxels of the first coordinate of the target and the predicted voxels corresponding to the second coordinate of the target as voxel pairs.
For example, the first coordinate of the real voxel A is a, the first coordinate of the real voxel B is b, the first coordinate of the real voxel C is c, the first coordinate of the real voxel D is d, the first coordinate of the real voxel E is e, and the first coordinate of the real voxel F is f. The second coordinate of the predicted voxel A' is a, the second coordinate of the predicted voxel B' is c, the second coordinate of the predicted voxel C' is d, the second coordinate of the predicted voxel D' is b, the second coordinate of the predicted voxel E' is e, the second coordinate of the predicted voxel F' is f, the second coordinate of the predicted voxel G' is g, and the second coordinate of the predicted voxel H' is h.
Then the real voxel A and the predicted voxel A' form a voxel pair, the real voxel B and the predicted voxel D' form a voxel pair, the real voxel C and the predicted voxel B' form a voxel pair, the real voxel D and the predicted voxel C' form a voxel pair, the real voxel E and the predicted voxel E' form a voxel pair, and the real voxel F and the predicted voxel F' form a voxel pair.
The remaining unpaired predicted voxels may then be discarded.
Step 26: and adjusting network parameters of the point cloud segmentation model according to the difference between the real information and the detection information of the voxel pairs.
In this embodiment, the corresponding predicted voxel and the corresponding real voxel are determined by determining the voxel pair, and then the network parameters of the point cloud segmentation model are adjusted by using the difference between the real information and the detection information of the voxel pair, so that the conversion from the voxel characteristics to the data point characteristics can be reduced, the loss in the conversion process is reduced, the calculation amount is reduced, and the segmentation accuracy of the point cloud segmentation model is improved.
Referring to fig. 3, fig. 3 is a flowchart of a third embodiment of a training method of a point cloud segmentation model according to the present application. The method comprises the following steps:
Step 31: acquiring a training point cloud; the data points in the training point cloud are classified according to the voxels to obtain corresponding real voxels, and each real voxel is marked with real information.
Step 32: and inputting the training point cloud into the point cloud segmentation model to obtain a predicted voxel output by the point cloud segmentation model and detection information corresponding to the predicted voxel.
Steps 31 to 32 have the same or similar technical solutions as any of the above embodiments, and are not described here again.
Step 33: and carrying out hash coding on the first coordinates of all the real voxels to obtain a corresponding first hash table.
The first coordinates of all real voxels are hash-coded to form key-value pairs: each first coordinate is used as the value of a key-value pair and a key is assigned to it, thereby forming the first hash table.
For example, the first coordinate of the true voxel a is a, the first coordinate of the true voxel B is B, the first coordinate of the true voxel C is C, the first coordinate of the true voxel D is D, the first coordinate of the true voxel E is E, and the first coordinate of the true voxel F is F. The corresponding first hash table may be expressed as { (0, a), (1, b), (2, c), (3, d) (4, e), (5, f) }.
Step 34: and carrying out hash coding on the second coordinates of all the predicted voxels to obtain a corresponding second hash table.
And similarly, carrying out hash coding on the second coordinates of all the prediction voxels to obtain a corresponding second hash table.
Step 35: traversing the second hash table of each prediction voxel and all the first hash tables to determine a target first coordinate and a target second coordinate with the same coordinates.
For example, each entry in the second hash table is traversed and matched against all entries in the first hash table, and the target first coordinate and the target second coordinate having the same coordinates are determined.
In other embodiments, the first hash table and all second hash tables for each real voxel may be utilized to traverse to determine the target first coordinate and the target second coordinate with the same coordinates.
Step 36: and determining the real voxels of the first coordinate of the target and the predicted voxels corresponding to the second coordinate of the target as voxel pairs.
The description is given with reference to fig. 4 and 5:
As shown in fig. 4, the sparse voxel-level prediction is denoted as an N×C matrix, where N is the number of all non-empty voxels and C is the feature dimension; the sparse voxel-level true value is denoted as an N1×C1 matrix. The numbers of predicted and real voxels differ, i.e. N and N1, and the order in which the predicted voxels and the real voxels are arranged also differs: for example, the feature at voxel coordinates (x, y, z) may be the 2nd row in N×C but the 4th row in N1×C1, or may not appear there at all. This arrangement randomness is a characteristic of sparse features; unlike an image, there is no one-to-one coordinate relationship. The reason is that the points distributed in space are themselves sparse, so after the whole space is voxelized some voxels are necessarily empty and carry no value; the sparse representation discards these empty positions, and hence the features are not aligned.
The above technical solution is provided to enable operations between the real voxels and the predicted voxels, such as a cross entropy loss function or an L1 loss function. First, a hash table is constructed to encode the coordinates of all valid real voxels, and the hash value is used as the key of the position coordinates; the hash values of the coordinates in the hash table are then used to query the features at the corresponding coordinates in the target, so that the features located at the same coordinates in the real voxels and the predicted voxels are obtained and can then be supervised with a standard loss function calculation.
Specifically, as shown in fig. 5, given two sparse features, their coordinate values are hash-coded to obtain hash tables X and Y respectively.
For each value in one hash table, hash matching is used to look up the matching value in the other hash table.
After a matching value is obtained, the features at the corresponding locations in the two sparse features are taken and used to compute the loss function.
If no match is found, that value is skipped and the next one is processed.
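The coordinate matching described above can be sketched as follows; a Python dict stands in for the hash table, and the function and argument names are illustrative rather than taken from the patent.

```python
import numpy as np

def match_voxel_pairs(real_coords, real_info, pred_coords, pred_info):
    """Pair real and predicted voxels that share the same (x, y, z) coordinate.

    real_coords, pred_coords: (N, 3) and (M, 3) integer voxel coordinates.
    real_info,   pred_info  : real information / detection information rows
                              aligned with those coordinates.
    Returns the aligned rows for all voxel pairs; predicted voxels with no
    matching real voxel are skipped.
    """
    # Hash table: coordinate tuple -> row index of the real voxel.
    table = {tuple(c): i for i, c in enumerate(real_coords.tolist())}

    real_rows, pred_rows = [], []
    for j, c in enumerate(pred_coords.tolist()):
        i = table.get(tuple(c))
        if i is None:      # no real voxel at this coordinate: skip to the next
            continue
        real_rows.append(i)
        pred_rows.append(j)

    return real_info[real_rows], pred_info[pred_rows]
```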
Step 37: and adjusting network parameters of the point cloud segmentation model according to the difference between the real information and the detection information of the voxel pairs.
In this embodiment, the hash encoding manner encodes the predicted voxel and the real voxel, and further determines the corresponding predicted voxel and the real voxel by using the determined voxel pair, and further adjusts the network parameters of the point cloud segmentation model by using the difference between the real information and the detection information of the voxel pair, so that the conversion from the voxel characteristics to the data point characteristics can be reduced, the loss in the conversion process is reduced, the calculation amount is reduced, and the segmentation accuracy of the point cloud segmentation model is improved.
Referring to fig. 6, fig. 6 is a flowchart of a fourth embodiment of a training method of a point cloud segmentation model according to the present application. The point cloud segmentation model comprises: a sparse voxel feature encoding network, a three-dimensional sparse network, a point voxel network, a first supervisory network, a second supervisory network, a third supervisory network, and a fourth supervisory network. The method comprises the following steps:
step 601: acquiring a training point cloud; the data points in the training point cloud are classified according to the voxels to obtain corresponding real voxels, and each real voxel is marked with real information.
In some embodiments, each data point is acquired by a radar sensor, and then the data point has four-dimensional data, namely, a three-dimensional coordinate and a reflection intensity corresponding to the radar sensor.
Step 602: and inputting the training point cloud into a sparse voxel feature coding network to perform feature extraction to obtain voxel features.
Voxelized feature extraction is carried out on the training point cloud through the sparse voxel feature encoding network to obtain point-level feature vectors and voxel-level voxel features. First, the features of each data point in the training point cloud are extracted through a point-wise multi-layer perceptron (MLP) in the sparse voxel feature encoding network to obtain the point cloud feature of each data point, for example a two-layer linear multi-layer perceptron in which each layer has 32 or 64 output channels.
And dividing the training point cloud according to the size of the voxels to obtain data points corresponding to each voxel. Because the data points all obtain the corresponding point cloud characteristics, the point cloud characteristics of the data points can be aggregated at the moment to form voxel characteristics.
Specifically, all data points in the target voxel are determined, and the operation of taking the maximum value or taking the minimum value or taking the average value is performed on the point cloud characteristics corresponding to the data points, so that one target point cloud characteristic is obtained. The target point cloud feature is taken as the voxel feature of the target voxel.
Further, feature combination can be performed on the voxel features and the point cloud features corresponding to all the data points in the target voxels, and multi-layer perceptron feature extraction operation is performed on the combined features, so that the point cloud features of the final data points have the information of the voxel features, namely the point cloud features contain the context information of the voxel features.
And then carrying out maximum value taking or minimum value taking or average value taking on the point cloud characteristics corresponding to all the data points in the target voxel to obtain one target point cloud characteristic. The target point cloud feature is taken as the voxel feature of the target voxel.
At this time, the voxel features and the point cloud features have stronger correlation.
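A hedged sketch of this encoding step is given below: a small point-wise MLP followed by per-voxel aggregation (mean pooling here, one of the options named above) and concatenation of the voxel context back onto each point. Layer sizes, class and argument names are assumptions.

```python
import torch
import torch.nn as nn

class SparseVoxelFeatureEncoder(nn.Module):
    """Point-wise MLP plus per-voxel aggregation, sketching the encoding step."""

    def __init__(self, in_dim=4, hidden=32, out_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, out_dim), nn.ReLU(inplace=True),
        )

    def forward(self, point_feats, point_to_voxel, num_voxels):
        # point_feats: (N, in_dim) per-point inputs (x, y, z, intensity).
        # point_to_voxel: (N,) long tensor, index of the voxel each point falls into.
        point_feats = self.mlp(point_feats)                               # (N, out_dim)

        # Aggregate point features into voxel features by averaging.
        voxel_feats = point_feats.new_zeros(num_voxels, point_feats.shape[1])
        voxel_feats.index_add_(0, point_to_voxel, point_feats)
        counts = torch.bincount(point_to_voxel, minlength=num_voxels).clamp(min=1)
        voxel_feats = voxel_feats / counts.unsqueeze(1).to(voxel_feats.dtype)

        # Concatenate each point's voxel context back onto its point feature.
        point_with_context = torch.cat(
            [point_feats, voxel_feats[point_to_voxel]], dim=1)
        return point_with_context, voxel_feats
```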
Step 603: and inputting the voxel characteristics into a three-dimensional sparse network for sparse voxel characteristic extraction to obtain sparse voxel characteristics.
In some embodiments, the three-dimensional sparse network comprises: the system comprises a first network block, a second network block, a third network block, a fourth network block and a fusion layer. The first network block comprises 2 base units, the second network block comprises 2 base units, the third network block comprises 3 base units, the fourth network block comprises 4 base units, and each base unit comprises two layers of sub-manifold sparse convolution and one layer of sparse convolution.
In an application scenario, referring to fig. 7, step 603 may be the following flow:
Step 6031: and carrying out sparse feature extraction on the voxel features by using the first network block to obtain first sparse voxel features.
And carrying out sparse feature extraction on the voxel features by using the sub-manifold sparse convolution and the sparse convolution in 2 basic units in the first network block to obtain first sparse voxel features.
Step 6032: and carrying out sparse feature extraction on the first sparse voxel feature by using the second network block to obtain a second sparse voxel feature.
And carrying out sparse feature extraction on the first sparse voxel feature by using the sub-manifold sparse convolution and the sparse convolution in 2 basic units in the second network block to obtain a second sparse voxel feature.
Step 6033: and carrying out sparse feature extraction on the second sparse voxel feature by using a third network block to obtain a third sparse voxel feature.
And carrying out sparse feature extraction on the second sparse voxel feature by using the sub-manifold sparse convolution and the sparse convolution in 3 basic units in the third network block to obtain a third sparse voxel feature.
Step 6034: and carrying out sparse feature extraction on the third sparse voxel feature by using a fourth network block to obtain a fourth sparse voxel feature.
And carrying out sparse feature extraction on the third sparse voxel feature by utilizing the sub-manifold sparse convolution and the sparse convolution in 4 basic units in the fourth network block to obtain a fourth sparse voxel feature.
Step 6035: and splicing and fusing the second sparse voxel feature, the third sparse voxel feature and the fourth sparse voxel feature by using a fusion layer to obtain a fifth sparse voxel feature.
The fifth sparse voxel feature has more information.
In this process, the sub-manifold sparse convolution keeps the features sparse during computation. The sparse convolution dilates the activated region so that features can diffuse outwards to cover the true object centroid, which might otherwise contain no data points. The combined application of sub-manifold sparse convolution and sparse convolution is therefore well suited to sparse point clouds that are distributed only on object surfaces.
Specifically, the sub-manifold sparse convolutions in each basic unit are used for feature extraction, and the sparse convolution is used to short-circuit the input to the output of the basic unit, completing the connection.
In some embodiments, the first network block and the second network block employ submanifold sparse max pooling to expand the voxel receptive field.
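A hedged sketch of one basic unit is shown below, assuming the spconv 2.x library for submanifold sparse convolution; since a kernel-size-1 sparse convolution acts on each active voxel independently, the shortcut branch is written as a per-voxel linear layer for simplicity. Class and argument names are illustrative, not the patent's.

```python
import torch.nn as nn
import spconv.pytorch as spconv  # assumption: spconv 2.x sparse convolution library

class BasicUnit(nn.Module):
    """Basic unit of the three-dimensional sparse network: two submanifold sparse
    convolutions (kernel size 3) for feature extraction plus a kernel-size-1
    shortcut from the input to the output of the unit."""

    def __init__(self, channels, indice_key):
        super().__init__()
        self.body = spconv.SparseSequential(
            spconv.SubMConv3d(channels, channels, 3, padding=1, bias=False,
                              indice_key=indice_key),
            nn.BatchNorm1d(channels),
            nn.ReLU(inplace=True),
            spconv.SubMConv3d(channels, channels, 3, padding=1, bias=False,
                              indice_key=indice_key),
            nn.BatchNorm1d(channels),
        )
        # A kernel-1 sparse convolution touches each active voxel on its own,
        # so a linear layer on the sparse features behaves the same way here.
        self.shortcut = nn.Linear(channels, channels, bias=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):  # x: spconv.SparseConvTensor
        out = self.body(x)
        out = out.replace_feature(out.features + self.shortcut(x.features))
        return out.replace_feature(self.relu(out.features))
```

Network blocks would then stack 2, 2, 3 and 4 such units respectively, as described above.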
Step 604: and inputting the sparse voxel characteristics into a point voxel network for characteristic conversion to obtain data point characteristics.
In some embodiments, feature conversion may be performed on the point voxel network based on the second sparse voxel feature, the third sparse voxel feature, the fourth sparse voxel feature, and the point cloud feature output in the sparse voxel feature encoding network, to obtain the data point feature.
Specifically, feature combination is carried out on the second sparse voxel feature, the third sparse voxel feature, the fourth sparse voxel feature and the point cloud features to obtain the data point features. That is, the data point features carry the information of voxel features at different scales, i.e. the point voxel features contain the context information of the voxel features.
Step 605: and inputting the sparse voxel characteristics into a first supervision network to perform semantic segmentation learning on the three-dimensional voxels, and obtaining first detection information corresponding to the predicted voxels.
Step 606: at least one voxel pair is determined, each voxel pair comprising a corresponding predicted voxel and a real voxel.
Step 606 has the same or similar technical scheme as any of the above embodiments, and will not be described herein.
Step 607: and determining the difference between the first detection information and the first real information to obtain a first loss value.
In some embodiments, referring to fig. 8, step 607 may be the following procedure:
step 6071: a first sub-loss value between the first detected information and the real information is determined using a lovassz-Softmax loss function.
The Lovasz-Softmax loss function is expressed as follows:
Step 6072: a second sub-loss value between the first detection information and the real information is determined using a cross entropy loss function.
The cross entropy loss function is expressed as follows:
Step 6073: and summing the first sub-loss value and the second sub-loss value to obtain a first loss value.
Step 608: and inputting the sparse voxel characteristic into a second monitoring network to perform thermodynamic diagram learning in three dimensions, and obtaining second detection information corresponding to the predicted voxel.
Step 609: at least one voxel pair is determined, each voxel pair comprising a corresponding predicted voxel and a real voxel.
Step 609 has the same or similar technical scheme as any of the above embodiments, and will not be described herein.
Step 610: and determining the difference between the second detection information and the second real information to obtain a second loss value.
And determining the difference between the second detection information and the second real information by using the Focal loss function to obtain a second loss value.
Wherein the Focal loss function is expressed as $L_{focal} = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$, where $p_t$ is the predicted probability of the true class, and $\alpha_t$ and $\gamma$ are the balancing and focusing parameters.
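A hedged sketch of the focal loss applied to the centroid probability of each activated voxel follows; alpha and gamma are the usual default values, not values stated in the patent, and soft truth values (e.g. 0.8 near a centroid) are treated as weights.

```python
import torch

def focal_loss(pred_prob, target, alpha=0.25, gamma=2.0, eps=1e-6):
    """Binary focal loss between predicted centroid probabilities and the
    sparse heatmap truth; target may contain soft values in [0, 1]."""
    pred_prob = pred_prob.clamp(eps, 1.0 - eps)
    pos_term = -alpha * (1.0 - pred_prob) ** gamma * torch.log(pred_prob)
    neg_term = -(1.0 - alpha) * pred_prob ** gamma * torch.log(1.0 - pred_prob)
    return (target * pos_term + (1.0 - target) * neg_term).mean()
```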
In some embodiments, referring to fig. 9, step 610 may be preceded by the following procedure:
Step 81: and inputting the sparse voxel features into a second monitoring network to perform thermodynamic diagram learning in three dimensions, and obtaining the probability that each sparse voxel feature belongs to the mass center of the object.
Step 82: the probability is taken as second detection information.
Step 83: and determining second real information by using the mass center of each object and sparse voxel characteristics within a preset distance.
Step 611: and inputting the data point characteristics into a third supervision network to perform semantic segmentation learning at the point level, and obtaining third detection information corresponding to each data point.
Step 612: and determining the difference between the third detection information and the third real information to obtain a third loss value.
In some embodiments, referring to fig. 10, step 612 may be the following procedure:
Step 6121: a third sub-loss value between the third detection information and the third real information is determined using a lovassz-Softmax loss function.
Step 6122: a fourth sub-loss value between the third detection information and the third real information is determined using a cross entropy loss function.
Step 6123: and summing the third sub-loss value and the fourth sub-loss value to obtain a third loss value.
Step 613: and inputting the data point characteristics into a fourth supervision network to perform offset supervision learning at the point level, and obtaining a predicted offset corresponding to each data point.
Step 614: and determining the difference between the predicted offset and the fourth real information to obtain a fourth loss value.
In some embodiments, referring to fig. 11, step 614 may be the following procedure:
Step 6141: and determining the mass center of the real object by using fourth real information of the data points.
It is understood that the true object centroid may not be a data point.
Step 6142: a true offset of the data point from a true object centroid is determined.
In some embodiments, this true offset may be determined in advance at the time of labeling.
Step 6143: and obtaining a fourth loss value by using the predicted offset and the real offset.
In some embodiments, a smooth L1 loss function may be used to obtain the fourth loss value.
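A hedged sketch of this fourth loss value: the true offset of each object point is its vector to the instance centroid, and the smooth L1 loss is taken against the predicted offset. The convention that non-object points carry a negative instance id is an assumption.

```python
import torch
import torch.nn.functional as F

def fourth_loss_value(points_xyz, instance_ids, predicted_offsets):
    """Smooth-L1 loss between predicted and true offsets, object points only.

    points_xyz:        (N, 3) original point coordinates.
    instance_ids:      (N,) instance label per point, < 0 for non-object points.
    predicted_offsets: (N, 3) offsets output by the point-wise offset network.
    """
    is_object = instance_ids >= 0
    if not torch.any(is_object):
        return predicted_offsets.new_zeros(())

    loss = predicted_offsets.new_zeros(())
    for inst in torch.unique(instance_ids[is_object]):
        mask = instance_ids == inst
        centroid = points_xyz[mask].mean(dim=0)               # true object centroid
        true_offset = centroid - points_xyz[mask]             # true offset per point
        loss = loss + F.smooth_l1_loss(predicted_offsets[mask], true_offset,
                                       reduction='sum')
    return loss / is_object.sum()
```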
Step 615: and adjusting network parameters of the point cloud segmentation model by using the first loss value, the second loss value, the third loss value and the fourth loss value.
The network parameters of the point cloud segmentation model can be adjusted by using a gradient descent algorithm or a stochastic gradient descent algorithm.
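One training step could then look as follows; this is a hedged sketch in which the model, batch layout and loss callables are placeholders, and stochastic gradient descent stands in for whichever optimiser is actually used.

```python
import torch

def training_step(model, optimizer, batch, loss_fns):
    """Sum the first to fourth loss values and adjust the network parameters."""
    outputs = model(batch["points"])                      # forward pass of the segmentation model
    losses = [fn(outputs, batch) for fn in loss_fns]      # first..fourth loss values
    total_loss = sum(losses)

    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()                                      # gradient descent update
    return total_loss.detach()

# Example optimiser choice (illustrative):
# optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
```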
In an application scenario, the following is described with reference to fig. 12: the point cloud segmentation model comprises: a sparse voxel feature encoding network, a three-dimensional sparse network, a point voxel network, a first supervisory network, a second supervisory network, a third supervisory network, and a fourth supervisory network.
The point voxel network comprises a first point cloud feature extraction layer, a first point voxel feature extraction layer, a second point voxel feature extraction layer and a third point voxel feature extraction layer. The three-dimensional sparse network includes: the system comprises a first network block, a second network block, a third network block, a fourth network block and a fusion layer.
The second supervisory network is a three-dimensional thermodynamic diagram network, the third supervisory network is a point-by-point semantic network and the fourth supervisory network is a point-by-point offset network.
And inputting the training point cloud data into a sparse voxel feature encoding network for feature extraction, and correspondingly obtaining the point cloud features and voxel features of each data point.
And inputting the voxel features into the first network block for feature extraction to obtain a first sparse voxel feature. And inputting the first sparse voxel feature into the second network block for feature extraction to obtain a second sparse voxel feature.
And inputting the second sparse voxel feature into the third network block for feature extraction to obtain a third sparse voxel feature.
And inputting the third sparse voxel feature into the fourth network block for feature extraction to obtain a fourth sparse voxel feature.
The point cloud features are input to a first point cloud feature extraction layer to perform feature extraction, and the first point cloud features are correspondingly obtained.
And inputting the first point cloud feature and the second sparse voxel feature into a first point voxel feature extraction layer to perform feature extraction and fusion to obtain a first point voxel feature.
And inputting the first point voxel characteristic and the third sparse voxel characteristic into a second point voxel characteristic extraction layer to perform characteristic extraction and fusion, so as to obtain the second point voxel characteristic.
And inputting the second point voxel characteristic and the fourth voxel characteristic into a third point voxel characteristic extraction layer to perform characteristic extraction and fusion, so as to obtain the third point voxel characteristic.
And respectively inputting the third point voxel characteristic into a point-by-point semantic network and a point-by-point offset network. Semantic information and offset for each data point is obtained.
And inputting the second sparse voxel feature, the third sparse voxel feature and the fourth sparse voxel feature into a fusion layer for splicing and fusion to obtain a fifth sparse voxel feature.
And inputting the fifth sparse voxel characteristic into a three-dimensional thermodynamic diagram network to perform thermodynamic diagram learning, and obtaining second detection information corresponding to the fifth sparse voxel characteristic.
And inputting the second sparse voxel feature, the third sparse voxel feature and the fourth sparse voxel feature into a first supervision network to perform semantic segmentation learning on the three-dimensional voxels, so as to obtain first detection information corresponding to the predicted voxels.
In particular, the hybrid sparse supervision consists of four supervision networks responsible for different tasks: a point-by-point semantic network for predicting amorphous surfaces; a point-by-point offset network; a 3D class-agnostic sparsely encoded centroid thermodynamic diagram network for object clustering; and an auxiliary sparse voxel semantic network, i.e. the first supervision network, for better feature learning. The four networks share the backbone network, are trained end to end, and jointly and effectively learn semantic and instance segmentation.
The point-by-point semantic network consists of a series of linear layers, as applied in many previous works. The sum of the Lovasz-Softmax loss and the cross entropy loss is used for supervision of the point-wise semantic network, and this sum is recorded as the point-wise semantic loss.
The point-by-point offset network supervises the offset of each data point. For the points belonging to objects, the network outputs one predicted three-dimensional offset per point. Adding the predicted offsets to the original coordinates of the point cloud gives the shifted points, which are expected to be distributed around the object centroids. For the true values, an instance tensor is built from the instance segmentation truth labels together with a binary mask that marks only the object points. The centroid truth value of each object point is obtained by applying a mean operator over the points of its instance and then scattering the resulting centroid back to every point of that instance.
The offsets are regressed with a smooth L1 loss, in which only the object points participate in the calculation.
3D class-agnostic sparsely encoded centroid thermodynamic diagram network. This thermodynamic diagram network models, for each activated 3D voxel, the probability that it is an object centroid. The sparsely encoded truth value is therefore calculated from the inverse of the distance between each object centroid and its surrounding activated voxels.
Here a mean operator computes the centroid of each instance, and a majority operator assigns to each voxel the instance label that occurs most often among its points, so that each distinct instance label corresponds to one centroid. To relate the voxel-level labels to these centroids, an instance vector holding the instance label of every activated voxel is built.
In addition, the voxel at each centroid and the voxels around it are set to 1 and 0.8 respectively in the truth value, to ensure that the truth value contains the true centroid. On the other hand, the sparse convolution layers in the sparsely encoded centroid thermodynamic diagram network use SC convolution, so that the thermodynamic diagram features can diffuse outwards to the real object centroid. The alignment operator therefore needs to be applied here to align the unmatched prediction and truth value. The loss is calculated using the focal loss.
Sparse voxel semantic network. Sparse voxel features from multiple levels of the backbone network are input to the sparse voxel semantic network, which contains a series of SSC convolution layers that preserve the activated regions. The sparse voxel prediction of the i-th level has a corresponding true value, a sparsely encoded tensor whose feature in each valid voxel is the class of the majority of the points in that voxel. The alignment operator is adopted to align each prediction with its true value, and the loss is calculated as the sum of the Lovasz-Softmax loss and the cross entropy loss.
The sparse voxel semantic network serves as auxiliary supervision, so that more sufficient feature learning is obtained in joint training with the other networks.
The overall loss of the point cloud segmentation model is the sum of the above four losses.
The operators used in the above procedure are described below.
For most voxel-based approaches, feature alignment is a common operation in the voxel feature encoder or in point feature gathering, used to pass features between points and voxels. However, previous work only considered two cases: 1. voxelizing point features into voxels; 2. gathering point features from voxels. Both operations fail to address the alignment problem between mismatched sparse voxel features. In order to supervise the sparse voxel features, the application introduces a new alignment operator.
The data of unordered points and sparse voxels (including predictions and labels) are unified into one sparse representation. The sparse tensor is expressed as a set of coordinate-feature pairs (c, f),
where c is a spatial coordinate in the 3D voxel grid or point cloud and f is the feature vector corresponding to that coordinate. More specifically, the point cloud segmentation network operates on two broad classes of tensors: the point cloud tensor and the sparse voxel tensor, which are converted into one another to align features between points and voxels.
1) Point-to-voxel: given a point cloud tensor T, the point-to-voxel operator converts it into a sparse voxel tensor, where s refers to the voxel size and the default reduction operator takes the maximum value among the points of each voxel; coordinates and features are voxelized simultaneously.
2) Voxel-to-point: to aggregate the point tensor T from the sparse voxel tensor S, the voxel-to-point operator assigns to each point the feature of the voxel it lies in.
3) Align: the P2V and V2P operators above only consider the transformation between points and voxels and cannot handle cases such as sparse voxel tensor alignment or supervision. Given two sparse voxel tensors S_a and S_b whose coordinates may not match, Align greedily matches their features according to the corresponding coordinates: a hash table is first constructed that encodes the coordinates of all activated voxels, and the coordinates of the target sparse tensor are then used as lookup keys. Coordinates that carry only a prediction are discarded, while coordinates that carry only a ground-truth voxel are paired with a zero prediction.
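To make the three operators concrete, the following numpy sketch implements P2V, V2P and Align as described above; the function names, the dictionary-based coordinate hashing and the max/sum reductions are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def p2v(coords, feats, s, reduce="max"):
    """Voxelize point features: quantise coordinates by voxel size s and
    reduce all features falling into the same voxel (max by default)."""
    vox = np.floor(coords / s).astype(np.int64)
    table = {}
    for c, f in zip(map(tuple, vox), feats):
        if c not in table:
            table[c] = f.copy()
        elif reduce == "max":
            table[c] = np.maximum(table[c], f)
        else:  # simple sum; a full mean reduction would also track counts
            table[c] = table[c] + f
    keys = list(table.keys())
    return np.array(keys), np.stack([table[k] for k in keys])

def v2p(coords, s, vox_coords, vox_feats):
    """Gather voxel features back to points: each point takes the feature of
    the voxel it falls into (every point's voxel is assumed to be activated)."""
    index = {tuple(c): i for i, c in enumerate(vox_coords)}
    point_vox = np.floor(coords / s).astype(np.int64)
    return np.stack([vox_feats[index[tuple(c)]] for c in point_vox])

def align(pred_coords, pred_feats, gt_coords, gt_feats):
    """Greedily match two sparse voxel tensors by coordinate hashing:
    predictions without a ground-truth voxel are dropped, ground-truth voxels
    without a prediction are paired with a zero prediction."""
    pred_index = {tuple(c): i for i, c in enumerate(pred_coords)}
    matched_pred, matched_gt = [], []
    for c, g in zip(map(tuple, gt_coords), gt_feats):
        i = pred_index.get(c)
        matched_pred.append(pred_feats[i] if i is not None
                            else np.zeros_like(pred_feats[0]))
        matched_gt.append(g)
    return np.stack(matched_pred), np.stack(matched_gt)
```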
The working principles of the sparse voxel feature encoding network, the three-dimensional sparse network and the point voxel network are described below in turn.
The sparse voxel feature encoding network assigns each data point of the training point cloud to voxels uniformly distributed in space, and simultaneously extracts point-by-point features and sparse voxel features. Each point of the point cloud is represented by its original feature concatenated with its offset to the centroid of the points inside its voxel and its offset to the voxel centre coordinates. After several linear layers, the P2V and V2P operators are used jointly to extract the output of each layer, expressed using a formula of the following form:

F_out = concat( F, V2P( P2V(F; s); s ) ),

where concat represents the feature concatenation operation and the P2V reduction here is the averaging operator. In the sparse voxel feature encoding network the point-by-point features therefore contain the geometric context of their voxels, and the sparse voxel features are fed into the next three-dimensional sparse network.
Two kinds of sparse convolution (SC and SSC) are used in combination in the three-dimensional sparse network. SSC maintains feature sparsity during computation and is used widely throughout the network; SC, in contrast, dilates the activated region and is used only in the heat map head, so that features can diffuse outwards and cover the true instance centroid, which might otherwise contain no points. This combined use is well suited to sparse point clouds that are distributed only on object surfaces.
The three-dimensional sparse network includes four network blocks. The basic block SUBM is defined as a basic unit that contains two SSC layers with a convolution kernel size of 3 and one SC layer with a convolution kernel size of 1; the former are used for feature extraction and the latter shortcut-connects the input of the unit to its output. Network blocks 1 to 4 contain 2, 2, 3 and 4 basic units, respectively. In addition, the first two network blocks employ submanifold sparse max pooling to enlarge the voxel receptive field. Denoting the input sparse features as F_0, the output features of each network block are denoted F_i, where i ranges from 1 to 4.
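The block layout can be sketched as follows; dense nn.Conv3d layers stand in for the submanifold sparse convolution (SSC) and sparse convolution (SC) of a real sparse-conv library so that the sketch is runnable, and the channel count, pooling layer and input size are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SUBM(nn.Module):
    """Basic unit: two SSC-style layers (kernel 3) for feature extraction and
    one SC-style layer (kernel 1) shortcut-connecting input to output."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.shortcut = nn.Conv3d(channels, channels, kernel_size=1)

    def forward(self, x):
        return self.body(x) + self.shortcut(x)

def make_block(num_units, channels, with_pool):
    """Network block: a stack of SUBM units; the first two blocks additionally
    apply max pooling to enlarge the voxel receptive field."""
    layers = [SUBM(channels) for _ in range(num_units)]
    if with_pool:
        layers.append(nn.MaxPool3d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

# Blocks 1-4 contain 2, 2, 3 and 4 basic units; the first two use pooling.
blocks = nn.ModuleList(
    make_block(n, channels=32, with_pool=(i < 2))
    for i, n in enumerate((2, 2, 3, 4))
)

features = torch.randn(1, 32, 16, 16, 16)   # toy dense stand-in for F_0
outputs = []                                 # F_1 ... F_4
for block in blocks:
    features = block(features)
    outputs.append(features)
```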
Point voxel network. The multi-level sparse features are jointly encoded with the point-by-point features in the point voxel network; this joint encoding is a very efficient form of feature aggregation. In the related art, however, only the non-empty voxels in the neighbourhood of key points are indexed, whereas the extraction in the application uses the V2P operator to cover the whole point cloud, expressed by a formula of the following form:

P = concat( F_point, V2P(F_2), V2P(F_3), V2P(F_4) ).

In this way, the sparse voxel features of the last three network blocks and the data point features output by the sparse voxel feature encoding network are aggregated, so that the output P of the point branch combines shallow geometric information with deep context information.
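A minimal sketch of this aggregation is given below; the per-level voxel sizes and the argument layout are illustrative assumptions, and the gather step is the V2P operation described earlier.

```python
import numpy as np

def point_voxel_aggregate(point_coords, point_feats, levels):
    """levels: list of (voxel_size, vox_coords, vox_feats) for blocks 2-4.
    For each level, every point takes the feature of the voxel it falls into
    (V2P); the results are concatenated with the point features from the
    sparse voxel feature encoding network."""
    gathered = [point_feats]
    for s, vox_coords, vox_feats in levels:
        index = {tuple(c): i for i, c in enumerate(vox_coords)}
        pc = np.floor(point_coords / s).astype(np.int64)
        gathered.append(np.stack([vox_feats[index[tuple(c)]] for c in pc]))
    return np.concatenate(gathered, axis=1)   # point-branch output P
```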
Further, when panoptic instance inference is performed with the trained point cloud segmentation model, the object centroid is inferred as follows:
At inference time, in order to obtain the centroid predictions, a sparse max pooling is first applied to the activated voxels of the predicted heat map, and only the voxels whose features are unchanged before and after pooling are kept, i.e. the local maxima of the heat map. Expressed by a formula of the following form:

C = { v : H(v) = MaxPool_k(H)(v) },

where MaxPool_k denotes a 3D sparse max pooling layer with kernel size k. Since many noisy predictions remain, the application sets a confidence threshold to filter out low- and medium-confidence predictions. The K centroids with the highest confidence are then taken as the final centroid predictions.
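The following sketch implements this centroid extraction on a sparse heat map; the kernel size, threshold tau and top-K value are illustrative hyperparameters rather than values fixed by the patent.

```python
import numpy as np

def infer_centroids(vox_coords, heat, kernel=3, tau=0.3, top_k=100):
    """Keep heat-map voxels unchanged by local max pooling (local maxima),
    filter them by confidence threshold tau, then keep the K most confident."""
    index = {tuple(c): i for i, c in enumerate(vox_coords)}
    r = kernel // 2
    keep = []
    for i, c in enumerate(vox_coords):
        # sparse max pooling: maximum heat value inside the kernel neighbourhood
        neighbourhood = [
            heat[index[(c[0] + dx, c[1] + dy, c[2] + dz)]]
            for dx in range(-r, r + 1)
            for dy in range(-r, r + 1)
            for dz in range(-r, r + 1)
            if (c[0] + dx, c[1] + dy, c[2] + dz) in index
        ]
        # a voxel is a candidate if pooling leaves its value unchanged
        if heat[i] >= max(neighbourhood) and heat[i] > tau:
            keep.append(i)
    keep = sorted(keep, key=lambda i: heat[i], reverse=True)[:top_k]
    return vox_coords[keep], heat[keep]
```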
Class-agnostic instance label assignment. Using the K predicted centroids and the point-by-point offsets, each shifted data point is assigned to its nearest centroid prediction:

id_j = argmin_{i in {1, ..., K}} || (x_j + o_j) - c_i ||,

where x_j + o_j are the coordinates of the predicted object point and id_j is the predicted instance ID. Since K should be set to a value greater than the maximum number of objects in a single scene, some predicted centroids are never assigned any point; these centroids are removed during inference. Furthermore, the instance ID of points belonging to stuff (amorphous background) categories is set to 0.
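A minimal sketch of this assignment step, assuming an `is_stuff` mask derived from the point-wise semantic prediction (an input name introduced here for illustration):

```python
import numpy as np

def assign_instances(points, offsets, centroids, is_stuff):
    """Shift every point by its predicted offset and assign it to the nearest
    of the K predicted centroids; stuff-class points receive instance ID 0."""
    shifted = points + offsets                                   # predicted object points
    d = np.linalg.norm(shifted[:, None, :] - centroids[None, :, :], axis=2)
    ids = d.argmin(axis=1) + 1                                   # instance IDs start at 1
    ids[is_stuff] = 0                                            # stuff categories get ID 0
    used = np.unique(ids[ids > 0])                               # centroids actually assigned
    return ids, used
```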
The final panoptic segmentation result is obtained by fusing the class-agnostic instance segmentation result with the point-by-point semantic result. The application adopts a parallelizable fusion strategy: for each centroid, its semantic label is obtained by voting over the semantic predictions of the set of points assigned to that centroid, and the category with the highest number of votes is set as the semantic label of the centroid. The semantic labels of that point set are then overwritten with the centroid's label. This operation improves both the semantic prediction and the instance prediction.
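A sketch of this majority-voting fusion is shown below; the array names are illustrative, and `semantic_pred` is assumed to hold non-negative integer class indices.

```python
import numpy as np

def fuse_panoptic(instance_ids, semantic_pred):
    """For every predicted instance, majority-vote over its points' semantic
    predictions and overwrite those points' semantic labels with the winner."""
    semantic_out = semantic_pred.copy()
    for inst in np.unique(instance_ids):
        if inst == 0:                          # stuff points keep their own labels
            continue
        mask = instance_ids == inst
        votes = np.bincount(semantic_pred[mask])
        semantic_out[mask] = votes.argmax()    # winning class labels the instance
    return semantic_out, instance_ids
```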
Referring to fig. 13, fig. 13 is a flowchart illustrating a first embodiment of a point cloud data segmentation method according to the present application. The method comprises the following steps:
step 131: and acquiring point cloud data.
Step 132: and inputting the point cloud data into a point cloud segmentation model, and outputting the segmented point cloud data.
The point cloud segmentation model is obtained by training the method provided by the technical scheme.
The embodiment can thus be applied to scenarios in which autonomous vehicles and robots perform panoptic segmentation.
Referring to fig. 14, fig. 14 is a schematic structural diagram of an embodiment of a processing device for point cloud data according to the present application. The processing device 140 comprises a processor 141 and a memory 142 coupled to the processor 141, the memory 142 for storing a computer program, the processor 141 for executing the computer program to implement the method of:
Acquiring a training point cloud; the data points in the training point cloud are classified according to voxels to obtain corresponding real voxels, and each real voxel is marked with real information; inputting the training point cloud into the point cloud segmentation model to obtain a predicted voxel output by the point cloud segmentation model and detection information corresponding to the predicted voxel; determining at least one voxel pair, each voxel pair comprising a corresponding predicted voxel and a real voxel; according to the difference between the real information and the detection information of the voxel pair, adjusting the network parameters of the point cloud segmentation model;
or, acquiring point cloud data; and inputting the point cloud data into a point cloud segmentation model, and outputting the segmented point cloud data.
The point cloud segmentation model is obtained by training the point cloud segmentation model training method in any embodiment.
It can be understood that the processor 141 is further configured to execute a computer program to implement the technical solution of any of the foregoing embodiments, which is not described herein.
Referring to fig. 15, fig. 15 is a schematic structural diagram of an embodiment of a computer readable storage medium according to the present application. The computer readable storage medium 150 is for storing a computer program 151, which computer program 151, when being executed by a processor, is for carrying out the method of:
Acquiring a training point cloud; the data points in the training point cloud are classified according to voxels to obtain corresponding real voxels, and each real voxel is marked with real information; inputting the training point cloud into the point cloud segmentation model to obtain a predicted voxel output by the point cloud segmentation model and detection information corresponding to the predicted voxel; determining at least one voxel pair, each voxel pair comprising a corresponding predicted voxel and a real voxel; according to the difference between the real information and the detection information of the voxel pair, adjusting the network parameters of the point cloud segmentation model;
or, acquiring point cloud data; and inputting the point cloud data into a point cloud segmentation model, and outputting the segmented point cloud data.
The point cloud segmentation model is obtained by training the point cloud segmentation model training method in any embodiment.
It will be appreciated that the computer program 151, when executed by a processor, is further configured to implement the technical solutions of any of the foregoing embodiments, which are not described herein in detail.
In summary, the application provides a point cloud segmentation model that addresses the fact that sparse point clouds are distributed only on object surfaces by directly regressing a 3D sparse coding voxel centroid heat map together with point-by-point offsets towards the centroids. The combined use of the two types of sparse convolution yields accurate centroid regression with high computational efficiency. In addition, sparse supervision of the voxel features at multiple levels allows better feature learning. To further enable this efficient hybrid sparse supervision on hybrid feature representations, three sparse alignment operators are proposed, covering the interconversion between point-by-point features and voxel features and the alignment of mismatched voxel features.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the above-described device embodiments are merely illustrative, e.g., the division of the circuits or elements is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The foregoing description is only of embodiments of the present application, and is not intended to limit the scope of the application, and all equivalent structures or equivalent processes according to the present application and the accompanying drawings, or direct or indirect application in other related technical fields, are included in the scope of the present application.

Claims (12)

1. A method for training a point cloud segmentation model, the method comprising:
Acquiring a training point cloud; the data points in the training point cloud are classified according to voxels to obtain corresponding real voxels, and each real voxel is marked with real information; the point cloud segmentation model comprises: a sparse voxel feature encoding network, a three-dimensional sparse network, a point voxel network, a first supervision network, a second supervision network, a third supervision network and a fourth supervision network;
Inputting the training point cloud into a sparse voxel feature coding network for feature extraction to obtain voxel features;
inputting the voxel features into a three-dimensional sparse network for sparse voxel feature extraction to obtain sparse voxel features; the three-dimensional sparse network comprises a first network block, a second network block, a third network block and a fourth network block, wherein the first network block comprises 2 basic units, the second network block comprises 2 basic units, the third network block comprises 3 basic units and the fourth network block comprises 4 basic units, the basic units in each network block comprise two layers of sub-manifold sparse convolution and one layer of sparse convolution, the sub-manifold sparse convolution is used for feature extraction, the sparse convolution is used for short-circuit connecting the input and the output of the unit and produces a dilation of the activated area so as to cover the real object centroid, and the first two network blocks adopt sub-manifold sparse max pooling so as to enlarge the voxel receptive field;
inputting the sparse voxel characteristics into a point voxel network for characteristic conversion to obtain data point characteristics;
Inputting the sparse voxel characteristics into the first supervision network to perform semantic segmentation learning on three-dimensional voxels to obtain first detection information corresponding to predicted voxels;
inputting the sparse voxel characteristic into the second supervision network to perform heat map learning in three dimensions, and obtaining second detection information corresponding to the prediction voxel;
Inputting the data point characteristics into the third supervision network to perform semantic segmentation learning at a point level to obtain third detection information corresponding to each data point;
Inputting the data point characteristics into the fourth supervision network to perform offset supervision learning at a point level to obtain a predicted offset corresponding to each data point;
acquiring a first coordinate of each real voxel and a second coordinate of each predicted voxel;
Determining a first target coordinate and a second target coordinate with the same coordinates;
Determining the real voxels of the first coordinates of the target and the predicted voxels corresponding to the second coordinates of the target as voxel pairs, each voxel pair comprising a corresponding one of the predicted voxels and the real voxels; if only a predicted voxel exists in the coordinates, discarding the predicted voxel, and if only a real voxel exists in the coordinates, determining detection information of the corresponding predicted voxel as 0; the detection information comprises first detection information and second detection information;
and according to the difference between the real information and the detection information of the voxel pair, third detection information and a prediction offset corresponding to each data point, adjusting network parameters of the point cloud segmentation model.
2. The method of claim 1, wherein said obtaining a first coordinate of each of said real voxels and a second coordinate of each of said predicted voxels comprises:
carrying out hash coding on the first coordinates of all the real voxels to obtain a corresponding first hash table;
Carrying out hash coding on the second coordinates of all the prediction voxels to obtain a corresponding second hash table;
the determining the first coordinate of the target and the second coordinate of the target with the same coordinates includes:
And traversing the second hash table of each prediction voxel and all the first hash tables to determine a target first coordinate and a target second coordinate with the same coordinates.
3. The method of claim 2, wherein the real information comprises first real information, second real information, third real information, and fourth real information; the first real information characterizes semantic information of the real voxels; the second real information characterizes object centroid information of the real voxels; the third real information characterizes semantic information of the data point; the fourth real information characterizes offset information of the data point.
4. A method according to claim 3, wherein said adjusting network parameters of the point cloud segmentation model based on differences between the real information and the detected information of the voxel pairs comprises:
determining the difference between the first detection information and the first real information to obtain a first loss value;
determining the difference between the second detection information and the second real information to obtain a second loss value;
determining the difference between the third detection information and the third real information to obtain a third loss value;
Determining the difference between the predicted offset and the fourth real information to obtain a fourth loss value;
And adjusting network parameters of the point cloud segmentation model by using the first loss value, the second loss value, the third loss value and the fourth loss value.
5. The method of claim 4, wherein determining the difference between the first detected information and the first real information results in a first loss value, comprising:
determining a first sub-loss value between the first detection information and the first real information by using a Lovasz-Softmax loss function;
determining a second sub-loss value between the first detection information and the first real information using a cross entropy loss function;
And summing the first sub-loss value and the second sub-loss value to obtain the first loss value.
6. The method of claim 4, wherein the determining the difference between the second detected information and the second real information, prior to obtaining a second loss value, comprises:
inputting the sparse voxel features into the second supervision network to perform heat map learning in three dimensions, and obtaining the probability that each sparse voxel feature belongs to the mass center of the object; taking the probability as the second detection information;
and determining the second real information by using each mass center of the object and the sparse voxel characteristic in a preset distance.
7. The method of claim 6, wherein determining the difference between the second detected information and the second actual information results in a second loss value, comprising:
And determining the difference between the second detection information and the second real information by using a Focal loss function to obtain the second loss value.
8. The method of claim 4, wherein determining the difference between the third detected information and the third actual information results in a third loss value, comprising:
Determining a third sub-loss value between the third detection information and the third real information by using a Lovasz-Softmax loss function;
determining a fourth sub-loss value between the third detection information and the third real information using a cross entropy loss function;
and summing the third sub-loss value and the fourth sub-loss value to obtain the third loss value.
9. The method of claim 4, wherein said determining a difference between the predicted offset and the fourth real information results in a fourth loss value, comprising:
Determining a true object centroid using the fourth true information of the data point;
Determining a true offset of the data point from the true object centroid;
and obtaining the fourth loss value by using the predicted offset and the real offset.
10. A method for partitioning point cloud data, the method comprising:
Acquiring point cloud data;
inputting the point cloud data into a point cloud segmentation model, and outputting the segmented point cloud data, wherein the point cloud segmentation model is trained by the method according to any one of claims 1-9.
11. A processing device of point cloud data, characterized in that the processing device comprises a processor and a memory coupled to the processor, the memory for storing a computer program, the processor for executing the computer program for implementing the method according to any of claims 1-10.
12. A computer readable storage medium for storing a computer program for implementing the method according to any one of claims 1-10 when executed by a processor.
CN202210163274.7A 2022-02-22 2022-02-22 Training method of point cloud segmentation model, point cloud data segmentation method and related device Active CN114638954B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210163274.7A CN114638954B (en) 2022-02-22 2022-02-22 Training method of point cloud segmentation model, point cloud data segmentation method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210163274.7A CN114638954B (en) 2022-02-22 2022-02-22 Training method of point cloud segmentation model, point cloud data segmentation method and related device

Publications (2)

Publication Number Publication Date
CN114638954A CN114638954A (en) 2022-06-17
CN114638954B true CN114638954B (en) 2024-04-19

Family

ID=81946515

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210163274.7A Active CN114638954B (en) 2022-02-22 2022-02-22 Training method of point cloud segmentation model, point cloud data segmentation method and related device

Country Status (1)

Country Link
CN (1) CN114638954B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114937265B (en) * 2022-07-25 2022-10-28 深圳市商汤科技有限公司 Point cloud detection method, model training method, device, equipment and storage medium
CN116091777A (en) * 2023-02-27 2023-05-09 阿里巴巴达摩院(杭州)科技有限公司 Point Yun Quanjing segmentation and model training method thereof and electronic equipment
CN116152267B (en) * 2023-04-24 2023-07-04 中国民用航空飞行学院 Point cloud instance segmentation method based on contrast language image pre-training technology
CN116721399B (en) * 2023-07-26 2023-11-14 之江实验室 Point cloud target detection method and device for quantitative perception training

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111462324A (en) * 2020-05-18 2020-07-28 南京大学 Online spatiotemporal semantic fusion method and system
CN111626217A (en) * 2020-05-28 2020-09-04 宁波博登智能科技有限责任公司 Target detection and tracking method based on two-dimensional picture and three-dimensional point cloud fusion
CN112287939A (en) * 2020-10-29 2021-01-29 平安科技(深圳)有限公司 Three-dimensional point cloud semantic segmentation method, device, equipment and medium
CN112444784A (en) * 2019-08-29 2021-03-05 北京市商汤科技开发有限公司 Three-dimensional target detection and neural network training method, device and equipment
CN112927234A (en) * 2021-02-25 2021-06-08 中国工商银行股份有限公司 Point cloud semantic segmentation method and device, electronic equipment and readable storage medium
CN113034675A (en) * 2021-03-26 2021-06-25 鹏城实验室 Scene model construction method, intelligent terminal and computer readable storage medium
CN113705631A (en) * 2021-08-10 2021-11-26 重庆邮电大学 3D point cloud target detection method based on graph convolution
CN113989797A (en) * 2021-10-26 2022-01-28 清华大学苏州汽车研究院(相城) Three-dimensional dynamic target detection method and device based on voxel point cloud fusion

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112444784A (en) * 2019-08-29 2021-03-05 北京市商汤科技开发有限公司 Three-dimensional target detection and neural network training method, device and equipment
CN111462324A (en) * 2020-05-18 2020-07-28 南京大学 Online spatiotemporal semantic fusion method and system
CN111626217A (en) * 2020-05-28 2020-09-04 宁波博登智能科技有限责任公司 Target detection and tracking method based on two-dimensional picture and three-dimensional point cloud fusion
CN112287939A (en) * 2020-10-29 2021-01-29 平安科技(深圳)有限公司 Three-dimensional point cloud semantic segmentation method, device, equipment and medium
CN112927234A (en) * 2021-02-25 2021-06-08 中国工商银行股份有限公司 Point cloud semantic segmentation method and device, electronic equipment and readable storage medium
CN113034675A (en) * 2021-03-26 2021-06-25 鹏城实验室 Scene model construction method, intelligent terminal and computer readable storage medium
CN113705631A (en) * 2021-08-10 2021-11-26 重庆邮电大学 3D point cloud target detection method based on graph convolution
CN113989797A (en) * 2021-10-26 2022-01-28 清华大学苏州汽车研究院(相城) Three-dimensional dynamic target detection method and device based on voxel point cloud fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Sparse Cross-scale Attention Network for Efficient LiDAR Panoptic Segmentation";Shuangjie Xu 等;《arXiv:2201.05972v1 [cs.CV]》;第1-9页 *

Also Published As

Publication number Publication date
CN114638954A (en) 2022-06-17

Similar Documents

Publication Publication Date Title
CN114638954B (en) Training method of point cloud segmentation model, point cloud data segmentation method and related device
CN108491880B (en) Object classification and pose estimation method based on neural network
CN111192270A (en) Point cloud semantic segmentation method based on point global context reasoning
CN109614874B (en) Human behavior recognition method and system based on attention perception and tree skeleton point structure
Hu et al. 3D shape completion with multi-view consistent inference
CN111797983A (en) Neural network construction method and device
CN110197206B (en) Image processing method and device
EP4174792A1 (en) Method for scene understanding and semantic analysis of objects
CN115861619A (en) Airborne LiDAR (light detection and ranging) urban point cloud semantic segmentation method and system of recursive residual double-attention kernel point convolution network
CN116824585A (en) Aviation laser point cloud semantic segmentation method and device based on multistage context feature fusion network
CN115457492A (en) Target detection method and device, computer equipment and storage medium
CN114972794A (en) Three-dimensional object recognition method based on multi-view Pooll transducer
CN114067128A (en) SLAM loop detection method based on semantic features
CN112668662B (en) Outdoor mountain forest environment target detection method based on improved YOLOv3 network
Guo et al. Efficient planar surface-based 3D mapping method for mobile robots using stereo vision
Ocegueda-Hernandez et al. A lightweight convolutional neural network for pose estimation of a planar model
Shi et al. Accurate implicit neural mapping with more compact representation in large-scale scenes using ranging data
CN114638953B (en) Point cloud data segmentation method and device and computer readable storage medium
Fan et al. PT-ResNet: Perspective transformation-based residual network for semantic road image segmentation
CN113139967A (en) Point cloud instance segmentation method, related system and storage medium
CN113947193A (en) Point cloud-oriented high-efficiency binarization neural network quantization method and device
Gan et al. Sparse Bayesian inference for dense semantic mapping
Gao et al. Semantic Segmentation of Substation Site Cloud Based on Seg-PointNet
Li et al. Structure-guided camera localization for indoor environments
Yang et al. Intelligent classification of point clouds for indoor components based on dimensionality reduction

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant