CN112149677A - Point cloud semantic segmentation method, device and equipment - Google Patents
- Publication number
- CN112149677A (application CN202010963729.4A)
- Authority
- CN
- China
- Prior art keywords
- point cloud
- cloud data
- semantic segmentation
- data
- voxelized
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
Abstract
The point cloud semantic segmentation method, device and equipment voxelize the point cloud data to be detected to obtain voxelized point cloud data; process the voxelized point cloud data with a preset point cloud semantic segmentation model to determine a semantic segmentation result; decode the semantic segmentation result to determine a decoding result, and compute the target point cloud data from the index of each point of the decoding result along each dimension; and classify all point cloud data to be detected according to the target point cloud data to determine its category information. The computational cost of semantic segmentation is thereby reduced, while the accuracy and completeness of the point cloud semantic segmentation results are ensured across scenes.
Description
Technical Field
The present application relates to the field of computers, and in particular, to a method, an apparatus, and a device for point cloud semantic segmentation.
Background
A semantic segmentation algorithm based on 2D images partitions the pixels of a picture into regions belonging to different targets. Similarly, a semantic segmentation algorithm based on 3D point clouds partitions the points in a scene into regions belonging to different targets. Existing point cloud semantic segmentation algorithms include PointNet, PointNet++, PointSIFT, and the like. These methods process the point cloud data directly and output a semantic segmentation category for each point. Although they achieve high segmentation accuracy, directly processing the raw point cloud is computationally expensive, which limits their practical application.
Disclosure of Invention
An object of the present application is to provide a method, an apparatus and a device for point cloud semantic segmentation that solve the prior-art problem of the excessive computational cost of directly processing point cloud data.
According to one aspect of the application, a method for point cloud semantic segmentation is provided, and the method comprises the following steps:
performing voxelization processing on the point cloud data to be detected to obtain voxelized point cloud data;
calculating the voxelized point cloud data by using a preset point cloud semantic segmentation model to determine a semantic segmentation result;
decoding the semantic segmentation result to determine a decoding result, and calculating the decoding result based on the index of each point of the decoding result in each dimension direction to obtain target point cloud data;
and classifying all point cloud data to be detected according to the target point cloud data to determine the category information of all point cloud data to be detected.
Further, before the calculating the voxelized point cloud data by using a preset point cloud semantic segmentation model to determine a semantic segmentation result, the method comprises the following steps:
acquiring training data, and performing voxelization processing on the training data to obtain voxelized training data;
and training the point cloud semantic segmentation model by using the voxelized training data to obtain a preset point cloud semantic segmentation model.
Further, the training of the point cloud semantic segmentation model using the voxelized training data comprises:
classifying the voxelized training data to determine a plurality of category information, generating a label according to the category information and labeling the voxelized training data with the label;
calculating marked voxelized training data by using a high-resolution network to determine a training feature map;
and coding the label to obtain a coded label, and calculating the cross entropy loss of the coded label and the training feature map to train a point cloud semantic segmentation model.
Further, the calculating the voxelized point cloud data by using a preset point cloud semantic segmentation model to determine a semantic segmentation result comprises:
calculating the voxelized point cloud data by using a preset point cloud semantic segmentation model to determine a feature map;
and determining a semantic segmentation result according to the feature map.
Further, the decoding the semantic segmentation result to determine a decoding result includes:
and decoding the semantic segmentation result by using one-hot coding to determine a decoding result.
Further, the calculating the decoding result based on the index of each point of the decoding result in each dimension direction to obtain the target point cloud data includes:
screening all points in the decoding result according to the numerical value corresponding to each point to determine a target point;
taking the voxel serial numbers of the target points on all axes as indexes, and generating an index matrix according to the indexes;
and calculating the index matrix to obtain target point cloud data.
Further, the target point cloud data includes a central point coordinate of the target point cloud, and the step of classifying all the point cloud data to be detected according to the target point cloud data to determine category information of all the point cloud data to be detected includes:
calculating a coordinate threshold of the target point cloud according to the center point coordinates of the target point cloud;
and classifying all point cloud data to be detected according to the coordinate threshold value to determine the category information of all point cloud data to be detected.
Further, classifying all point cloud data to be detected according to the coordinate threshold to determine category information of all point cloud data to be detected, including:
screening point cloud data to be detected in the target point cloud space according to the coordinate threshold;
and classifying the point cloud data to be detected in the target point cloud space into target categories.
According to another aspect of the present application, there is also provided an apparatus for point cloud semantic segmentation, wherein the apparatus includes:
the data processing module is used for carrying out voxelization processing on the point cloud data to be detected to obtain voxelized point cloud data;
the identification module is used for calculating the voxelized point cloud data by using a preset point cloud semantic segmentation model so as to determine a semantic segmentation result;
the decoding module is used for decoding the semantic segmentation result to determine a decoding result, and calculating the decoding result based on the index of each point of the decoding result in each dimension direction to obtain target point cloud data;
and the classification module is used for classifying all point cloud data to be detected according to the target point cloud data so as to determine the category information of all point cloud data to be detected.
According to yet another aspect of the application, there is also provided a computer readable medium having computer readable instructions stored thereon, the computer readable instructions being executable by a processor to implement the method of any of the preceding claims.
According to yet another aspect of the present application, there is also provided an apparatus for point cloud semantic segmentation, wherein the apparatus comprises:
one or more processors; and
a memory storing computer readable instructions that, when executed, cause the processor to perform operations of any of the methods described above.
Compared with the prior art, the present application voxelizes the point cloud data to be detected to obtain voxelized point cloud data; processes the voxelized point cloud data with a preset point cloud semantic segmentation model to determine a semantic segmentation result; decodes the semantic segmentation result to determine a decoding result, and computes the target point cloud data from the index of each point of the decoding result along each dimension; and classifies all point cloud data to be detected according to the target point cloud data to determine its category information. The computational cost of semantic segmentation is thereby reduced, while the accuracy and completeness of the point cloud semantic segmentation results are ensured across scenes.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 illustrates a flow diagram of a method of point cloud semantic segmentation provided in accordance with an aspect of the present application;
FIG. 2 is a schematic diagram of a vehicle labeling in a practical application scenario in a preferred embodiment of the present application;
FIG. 3 is a schematic diagram of three-dimensional spatial point cloud data and annotation collected by a laser radar in a preferred embodiment of the present application;
FIG. 4 illustrates a schematic flow diagram of a high resolution network in a preferred embodiment of the present application;
fig. 5 shows a schematic diagram of a framework structure of an apparatus for point cloud semantic segmentation provided according to another aspect of the present application.
The same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
The present application is described in further detail below with reference to the attached figures.
In a typical configuration of the present application, the terminal, the device serving the network, and the trusted party each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer readable media does not include transitory media, such as modulated data signals and carrier waves.
Fig. 1 shows a schematic flow chart of a method for point cloud semantic segmentation provided according to an aspect of the present application. The method includes steps S11 to S14. In step S11, the point cloud data to be detected is voxelized to obtain voxelized point cloud data. In step S12, the voxelized point cloud data is processed with a preset point cloud semantic segmentation model to determine a semantic segmentation result. In step S13, the semantic segmentation result is decoded to determine a decoding result, and the decoding result is processed, based on the index of each of its points along each dimension, to obtain target point cloud data. In step S14, all point cloud data to be detected is classified according to the target point cloud data to determine its category information. The computational cost of semantic segmentation is thereby reduced, while the accuracy and completeness of the segmentation results are ensured across scenes.
Specifically, in step S11, the point cloud data to be detected is voxelized to obtain voxelized point cloud data. Voxelization divides a continuous space into a uniform grid, discretizing it; voxelizing the point cloud data to be detected yields point cloud data composed of voxels, i.e. voxelized point cloud data, which is voxel data containing only 0s and 1s.
Fig. 2 shows a vehicle labeling diagram in an actual application scenario in a preferred embodiment of the present application, and Fig. 3 shows the three-dimensional space point cloud data and labeling acquired by a laser radar in a preferred embodiment of the present application. Here, the scene of Fig. 2 is scanned by a laser radar to obtain Fig. 3. The three-dimensional space point cloud data is preferably Kitti point cloud data, which is distributed in the space spanned by the x-axis (0, 70.4), the y-axis (-40, 40) and the z-axis (-3, 1). This space is divided into 1408 parts along the x-axis, 1600 parts along the y-axis and 40 parts along the z-axis, yielding 1600 × 1408 × 40 small spaces, each 0.05 long, 0.05 wide and 0.1 high, i.e. voxels. It should be noted that voxelization amounts to choosing a resolution, which the user can set freely as needed, for example according to the required precision or the available computing resources.
In the above embodiment, an all-zero matrix of shape 1600 × 1408 × 40 is first generated. Each position in the matrix corresponds one-to-one with a small space of the point cloud space. Each position in the matrix is then assigned a value: if the coordinates of any point fall within the coordinate range of the small space corresponding to a voxel, the point cloud is deemed to exist in that voxel. For example, if the spatial range of a certain voxel is 0.1 < x ≤ 0.2, 0.0 < y ≤ 0.1 and 0.2 < z ≤ 0.3, and the coordinates of any point fall within that region, a point cloud is considered to exist in the voxel. The matrix position corresponding to such a voxel is set to 1, and otherwise to 0. This yields three-dimensional voxel data in which positions containing point cloud are 1 and all other positions are 0. Although the coordinate information of individual points is discarded by voxelization, the geometric information of the point cloud is fully preserved and can represent the shape and position of the objects in the scene space, so the calculation is simplified and computing resources are saved without affecting the accuracy of subsequent calculations.
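The voxelization described above can be sketched as follows. This is an illustrative reconstruction, not the patent's code: the helper name `voxelize`, the use of NumPy, and the (y, x, z) axis order of the grid are assumptions based on the shapes given in the text.

```python
import numpy as np

def voxelize(points,
             x_range=(0.0, 70.4), y_range=(-40.0, 40.0), z_range=(-3.0, 1.0),
             shape=(1600, 1408, 40)):
    """Build a 0/1 occupancy grid (y, x, z order) from an (N, 3) point array."""
    grid = np.zeros(shape, dtype=np.uint8)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # keep only points inside the point cloud space
    m = ((x_range[0] <= x) & (x < x_range[1]) &
         (y_range[0] <= y) & (y < y_range[1]) &
         (z_range[0] <= z) & (z < z_range[1]))
    # voxel sizes 0.05 x 0.05 x 0.1 follow from the ranges and the grid shape
    iy = ((y[m] - y_range[0]) / 0.05).astype(int)
    ix = ((x[m] - x_range[0]) / 0.05).astype(int)
    iz = ((z[m] - z_range[0]) / 0.1).astype(int)
    grid[iy, ix, iz] = 1  # voxels containing at least one point become 1
    return grid
```

Points outside the configured ranges are simply dropped, which matches the text's restriction of the Kitti point cloud to the stated x, y, z intervals.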
Step S12, calculating the voxelized point cloud data using a preset point cloud semantic segmentation model to determine a semantic segmentation result. The preset point cloud semantic segmentation model is preferably a trained three-dimensional (3D) point cloud semantic segmentation model, which processes the voxelized point cloud data to obtain data corresponding to the point cloud shape, i.e. the semantic segmentation result.
And step S13, decoding the semantic segmentation result to determine a decoding result, and calculating the decoding result based on the index of each point of the decoding result in each dimension direction to obtain target point cloud data. Here, the index refers to the number of voxels, for example, if the space is divided into 1600 voxels on the y-axis, the index of the first voxel is 0, the index of the second voxel is 1, and so on. And calculating the decoding result according to the index of each point of the decoding result in each dimension direction to determine the coordinate value range of the voxel space, so as to obtain the target point cloud data.
And step S14, classifying all point cloud data to be detected according to the target point cloud data to determine the category information of all point cloud data to be detected. Here, all point cloud data to be detected are classified according to the coordinate value range of the voxel space corresponding to the target point cloud data, for example, when the coordinate value of the point cloud to be detected is within the coordinate value range of the voxel space corresponding to the target point cloud data, the point cloud data to be detected is classified into the category corresponding to the target point cloud data, and so on, the category information of all point cloud data to be detected is determined.
In a preferred embodiment of the application, before calculating the voxelized point cloud data using a preset point cloud semantic segmentation model to determine a semantic segmentation result, training data is acquired and voxelized to obtain voxelized training data, and the point cloud semantic segmentation model is trained using the voxelized training data to obtain the preset point cloud semantic segmentation model. The preset point cloud semantic segmentation model is preferably a trained 3D point cloud semantic segmentation model. The training data can be Kitti point cloud data; after the Kitti point cloud data is voxelized, the 3D point cloud semantic segmentation model is trained using the voxelized Kitti point cloud data to obtain the trained model.
In a preferred embodiment of the present application, the voxelized training data is classified to determine a plurality of category information, a label is generated according to the category information, and the voxelized training data is labeled with the label; the labeled voxelized training data is processed with a high-resolution network to determine a training feature map; and the label is encoded to obtain an encoded label, whose cross entropy loss against the training feature map is computed to train the point cloud semantic segmentation model. Here, the voxelized training data is classified into a plurality of categories to obtain a plurality of category information, and labels are generated from the category information to label the voxelized training data. For example, assuming the training data is only subjected to binary classification with the category being vehicle, classifying the training data yields the points belonging to vehicles and the points not belonging to vehicles; the label of points belonging to a vehicle is set to 1 and the label of points not belonging to a vehicle is set to 0. An all-zero matrix with the same size as the training data can be pre-generated; if the corresponding voxel contains a point cloud belonging to a vehicle, the matrix position corresponding to that voxel is set to 1, which completes the labeling of the point cloud data belonging to vehicles.
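The label generation above can be sketched as follows. NumPy, the helper names, and the axis origins (-40 for y, 0 for x, -3 for z, taken from the voxel layout described earlier) are assumptions; the one-hot step corresponds to the label encoding mentioned for the loss computation.

```python
import numpy as np

def make_label_volume(vehicle_points, shape=(1600, 1408, 40)):
    """All-zero matrix with the same shape as the voxelized training data;
    voxels containing a point labelled 'vehicle' are set to 1."""
    labels = np.zeros(shape, dtype=np.int64)
    iy = ((vehicle_points[:, 1] + 40.0) / 0.05).astype(int)  # y-axis starts at -40
    ix = ((vehicle_points[:, 0] - 0.0) / 0.05).astype(int)   # x-axis starts at 0
    iz = ((vehicle_points[:, 2] + 3.0) / 0.1).astype(int)    # z-axis starts at -3
    labels[iy, ix, iz] = 1
    return labels

def one_hot(labels, num_classes=2):
    """One-hot encode an integer label volume along a new trailing axis."""
    return np.eye(num_classes, dtype=np.float32)[labels]
```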
Next, the labeled voxelized training data is processed with a high-resolution network (HRNet) to determine a training feature map. HRNet maintains high-resolution features throughout. The high-resolution features retain more local features of the point cloud, which benefit the judgment of segmentation boundaries. The parallel multi-scale branch structure of HRNet ensures that multi-scale features are extracted simultaneously, and its information-exchange mechanism between parallel subnetworks repeatedly fuses the multi-scale features. This extraction and fusion of multi-scale features helps the network handle objects of different sizes and realizes the extraction of global features at different scales; the global features are used for judging the segmentation category. The final fusion layer merges features of different scales, and this fusion of local and global features benefits the judgment of the final segmentation category.
Fig. 4 shows a schematic flow diagram of the high-resolution network in a preferred embodiment of the present application, with voxelized training data of size 1600 × 1408 × 40 classified into voxels belonging to vehicles and voxels not belonging to vehicles, where 40 is the height dimension. Since the third dimension of colour picture data is usually taken as the channel dimension, the training data size is treated as 1600 × 1408 with 40 channels. The training data is input into the HRNet network for feature extraction, and depth feature data of size 400 × 352 × 128 is obtained. After 4× upsampling, the depth feature data size becomes 1600 × 1408 × 128. A convolutional layer then adjusts the number of channels of the depth features to 80, corresponding to the 40 channels of the binary-classified voxelized training data, giving a first feature map of size 1600 × 1408 × 80. Finally, the first feature map is reshaped into a feature map of size 1600 × 1408 × 40 × 2, where the last dimension corresponds to the one-hot encoding of the two classes.
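The shape transformations of this pipeline can be traced with a small sketch. The spatial sizes are scaled down 50× so it runs quickly (real sizes: backbone output 400 × 352 × 128, final map 1600 × 1408 × 40 × 2); nearest-neighbour upsampling and modelling the 1×1 convolution as a matrix product over the channel axis are assumptions, not details given in the patent.

```python
import numpy as np

deep = np.zeros((8, 7, 128), dtype=np.float32)         # stands in for 400 x 352 x 128
up = np.repeat(np.repeat(deep, 4, axis=0), 4, axis=1)  # 4x upsampling on both spatial axes
w = np.zeros((128, 80), dtype=np.float32)              # 1x1 conv: 128 -> 80 channels
first_map = up @ w                                     # 80 = 40 height slices x 2 classes
feature_map = first_map.reshape(32, 28, 40, 2)         # shape reset to per-voxel class logits
```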
After the label is encoded, for example with one-hot encoding, the cross entropy loss between the encoded label and the training feature map is calculated to train the point cloud semantic segmentation model. For example, the cross entropy loss is optimized with stochastic gradient descent: the initial learning rate is set to 0.001, the learning rate is multiplied by 0.1 every 10 epochs, and training ends after 100 epochs.
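The loss and the learning-rate schedule just described can be sketched as follows; this is an illustrative NumPy reconstruction (the patent does not give code), and the helper names are assumptions.

```python
import numpy as np

def cross_entropy(logits, onehot_targets):
    """Mean cross entropy between per-voxel logits (..., C) and one-hot targets."""
    z = logits - logits.max(axis=-1, keepdims=True)  # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return float(-(onehot_targets * np.log(p + 1e-9)).sum(axis=-1).mean())

def lr_at_epoch(epoch, base_lr=1e-3, drop_every=10, factor=0.1):
    """Step schedule from the text: start at 0.001, multiply by 0.1 every 10 epochs."""
    return base_lr * factor ** (epoch // drop_every)
```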
In a preferred embodiment of the present application, in step S12, the voxelized point cloud data is calculated by using a preset point cloud semantic segmentation model to determine a feature map; and determining a semantic segmentation result according to the feature map. The voxelized point cloud data is calculated by a preset point cloud semantic segmentation model, and then a feature map is determined, for example, the voxelized point cloud data is calculated by a trained 3D point cloud semantic segmentation model to obtain a feature map, and the feature map is calculated by the 3D point cloud semantic segmentation model to obtain a semantic segmentation result.
In a preferred embodiment of the present application, in step S13, the semantic segmentation result is decoded using one-hot decoding to determine the decoding result. Here, the data of the semantic segmentation result ranges from 0 to 1, and the last dimension of its size is the result of one-hot encoding, so one-hot decoding is performed first. For example, one-hot decoding a semantic segmentation result of size 1600 × 1408 × 40 × 2 yields data of size 1600 × 1408 × 40. It should be noted that under binary classification each position of the decoded semantic segmentation result takes one of two values, for example 0 or 1, where 0 indicates that the position does not belong to a vehicle and 1 indicates that it does.
In a preferred embodiment of the present application, in step S13, all the points in the decoding result are screened according to the corresponding numerical value of each point to determine a target point; taking the voxel serial numbers of the target points on all axes as indexes, and generating an index matrix according to the indexes; and calculating the index matrix to obtain target point cloud data. Here, the corresponding numerical value of each point in the decoding result is used to distinguish the category of each point, for example, 1 represents that the point belongs to a vehicle, 0 represents that the point does not belong to the vehicle, all the points in the decoding result are classified according to the numerical value to determine the target point, for example, the target point is a point belonging to the vehicle, and all the points in the decoding result are screened by using the numerical value 1 to determine the point belonging to the vehicle. It should be noted that 0 and 1 are only examples of two categories, and in practical applications, multiple categories may be classified, and different values may be set for each category.
Then, the voxel serial number of each target point on each axis is used as an index; the serial number is the position of the voxel containing the target point along that axis. For example, if the space is divided into 1600 parts on the y-axis, the index of the first voxel is 0, the index of the second voxel is 1, and so on, and the corresponding point cloud coordinate value can be recovered from the voxel's starting position and the coordinate range given by the voxel size. An index matrix is then generated from the indexes, and the index matrix is processed to obtain the target point cloud data.
In a preferred embodiment of the present application, the size of the decoded point cloud semantic segmentation result is 1600 × 1408 × 40, and all points with the value 1 belong to a vehicle. The indexes of all points with the value 1 in each dimension direction are combined into an N × 3 index matrix of rows [index_y, index_x, index_z], where N ranges over [0, 1600 × 1408 × 40]; index_y is the index on the y-axis of a point with value 1, ranging from 0 to 1599; index_x is the index on the x-axis, ranging from 0 to 1407; and index_z is the index on the z-axis, ranging from 0 to 39. The voxel centre coordinates are then given by:

x = (index_x + 0.5) × 0.05 + 0
y = (index_y + 0.5) × 0.05 − 40
z = (index_z + 0.5) × 0.1 − 3
Next, the above three equations are applied to the N × 3 index matrix to compute the centre point coordinates (x, y, z) of the target point cloud data. In this way the computational cost of semantic segmentation is reduced and processing efficiency is improved.
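The index-matrix step can be sketched as follows; NumPy and the helper name `target_centers` are assumptions, and the y-axis offset of −40 follows from the space starting at y = −40 in the voxel layout described earlier.

```python
import numpy as np

def target_centers(decoded):
    """decoded: (y, x, z)-ordered array of 0/1 class ids from one-hot decoding.
    Returns the (x, y, z) centre coordinates of every voxel labelled 1."""
    idx = np.argwhere(decoded == 1)      # the N x 3 index matrix [index_y, index_x, index_z]
    x = (idx[:, 1] + 0.5) * 0.05 + 0.0   # x = (index_x + 0.5) * 0.05 + 0
    y = (idx[:, 0] + 0.5) * 0.05 - 40.0  # y = (index_y + 0.5) * 0.05 - 40
    z = (idx[:, 2] + 0.5) * 0.1 - 3.0    # z = (index_z + 0.5) * 0.1 - 3
    return np.stack([x, y, z], axis=1)
```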
In a preferred embodiment of the present application, in step S14, the target point cloud data includes coordinates of a center point of the target point cloud, and a coordinate threshold of the target point cloud is calculated according to the coordinates of the center point of the target point cloud; and classifying all point cloud data to be detected according to the coordinate threshold value to determine the category information of all point cloud data to be detected. The center point coordinates are center point coordinates of point cloud voxels where all target points are located, a coordinate threshold of the target point cloud is calculated according to all the center point coordinates, for example, the coordinate threshold of the target point cloud is calculated according to the calculated center point coordinates corresponding to all voxels belonging to a vehicle, when the point cloud to be tested is located within the coordinate threshold of the target point cloud, the point cloud to be tested is determined to belong to the vehicle, and the category information is the vehicle.
In a preferred embodiment of the present application, in step S14, the point cloud data to be detected that lie in the target point cloud space are screened out according to the coordinate threshold, and these point cloud data are classified into the target category. The target point cloud space is determined by the coordinate threshold: when the coordinate values of a point cloud to be detected fall within the coordinate threshold, that point lies in the target point cloud space and is assigned the target category. For example, if the target category is "vehicle", a point cloud to be tested that falls within the coordinate threshold of the target point cloud is determined to belong to the vehicle, and its category information is "vehicle".
In a preferred embodiment of the present application, whether each point cloud to be tested belongs to a vehicle is determined from the center-point coordinates of the voxels belonging to the vehicle. For a voxel with a length of 0.05, a width of 0.05 and a height of 0.1, the test is whether the coordinates of the point cloud to be tested fall within the coordinate range of that voxel: if so, the point is judged to belong to the vehicle; if not, it does not. After all points to be tested are processed in this way, every point carries a segmentation category, and the classification of all point clouds to be tested is complete.
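The per-voxel membership test above can be sketched as follows. The function name, sample centers, and test points are all invented for this example; a point is labeled "vehicle" when it falls inside any 0.05 × 0.05 × 0.1 voxel whose center was classified as vehicle.

```python
import numpy as np

def classify_points(points, vehicle_centers):
    """Return a boolean mask: True where a point lies in a vehicle voxel."""
    half = np.array([0.025, 0.025, 0.05])  # half-extent of one voxel per axis
    mask = np.zeros(len(points), dtype=bool)
    for center in vehicle_centers:
        # A point is inside the voxel when it is within half a voxel of the
        # center on every axis.
        mask |= np.all(np.abs(points - center) <= half, axis=1)
    return mask

vehicle_centers = np.array([[35.025, 0.025, 0.05]])
points = np.array([[35.03, 0.02, 0.06],   # inside the voxel
                   [10.00, 5.00, 0.50]])  # far away
mask = classify_points(points, vehicle_centers)
```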
Fig. 5 is a schematic diagram of the framework of an apparatus for point cloud semantic segmentation according to another aspect of the present application. The apparatus includes: a data processing module 100, configured to perform voxelization on the point cloud data to be detected to obtain voxelized point cloud data; an identification module 200, configured to process the voxelized point cloud data with a preset point cloud semantic segmentation model to determine a semantic segmentation result; a decoding module 300, configured to decode the semantic segmentation result to determine a decoding result, and to compute target point cloud data from the index of each point of the decoding result in each dimension direction; and a classification module 400, configured to classify all point cloud data to be detected according to the target point cloud data to determine their category information. In this way, the computational cost of semantic segmentation is reduced while the accuracy and completeness of the point cloud semantic segmentation results are preserved across scenes.
It should be noted that the content executed by the data processing module 100, the identification module 200, the decoding module 300, and the classification module 400 is the same as or corresponding to the content in the above steps S11, S12, S13, and S14, and for brevity, the description is omitted here.
Furthermore, a computer-readable medium is provided, on which computer-readable instructions are stored, and the computer-readable instructions can be executed by a processor to implement the aforementioned method for point cloud semantic segmentation.
According to still another aspect of the present application, there is also provided an apparatus for point cloud semantic segmentation, wherein the apparatus includes:
one or more processors; and
a memory having computer-readable instructions stored thereon that, when executed, cause the one or more processors to perform the operations of the method for point cloud semantic segmentation described above.
For example, the computer readable instructions, when executed, cause the one or more processors to:
performing voxelization processing on the point cloud data to be detected to obtain voxelized point cloud data; calculating the voxelized point cloud data by using a preset point cloud semantic segmentation model to determine a semantic segmentation result; decoding the semantic segmentation result to determine a decoding result, and calculating the decoding result based on the index of each point of the decoding result in each dimension direction to obtain target point cloud data; and classifying all point cloud data to be detected according to the target point cloud data to determine the category information of all point cloud data to be detected.
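The first of the four operations above, voxelization, might look like the following sketch, assuming the grid implied by the earlier embodiment (0.05 × 0.05 × 0.1 voxels over x in [0, 70.4), y in [-40, 40), z in [-3, 1)); the origin, grid constants, and function name are illustrative, not the patented implementation.

```python
import numpy as np

ORIGIN = np.array([0.0, -40.0, -3.0])  # lower corner of the grid (x, y, z)
VOXEL = np.array([0.05, 0.05, 0.1])    # voxel size per axis
GRID = np.array([1408, 1600, 40])      # number of cells along x, y, z

def voxelize(points):
    """Return integer (ix, iy, iz) voxel indexes for points inside the grid."""
    idx = np.floor((points - ORIGIN) / VOXEL).astype(int)
    # Drop points whose index falls outside the grid on any axis.
    in_range = np.all((idx >= 0) & (idx < GRID), axis=1)
    return idx[in_range]

pts = np.array([[35.03, 0.02, 0.06],   # in range
                [99.00, 0.00, 0.00]])  # outside the x range, dropped
vox = voxelize(pts)
```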
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example, implemented using Application Specific Integrated Circuits (ASICs), general purpose computers or any other similar hardware devices. In one embodiment, the software programs of the present application may be executed by a processor to implement the steps or functions described above. Likewise, the software programs (including associated data structures) of the present application may be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Additionally, some of the steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
In addition, some of the present application may be implemented as a computer program product, such as computer program instructions, which when executed by a computer, may invoke or provide methods and/or techniques in accordance with the present application through the operation of the computer. Program instructions which invoke the methods of the present application may be stored on a fixed or removable recording medium and/or transmitted via a data stream on a broadcast or other signal-bearing medium and/or stored within a working memory of a computer device operating in accordance with the program instructions. An embodiment according to the present application comprises an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to perform a method and/or a solution according to the aforementioned embodiments of the present application.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Claims (11)
1. A method of point cloud semantic segmentation, wherein the method comprises:
performing voxelization processing on the point cloud data to be detected to obtain voxelized point cloud data;
calculating the voxelized point cloud data by using a preset point cloud semantic segmentation model to determine a semantic segmentation result;
decoding the semantic segmentation result to determine a decoding result, and calculating the decoding result based on the index of each point of the decoding result in each dimension direction to obtain target point cloud data;
and classifying all point cloud data to be detected according to the target point cloud data to determine the category information of all point cloud data to be detected.
2. The method of claim 1, wherein, before the calculating of the voxelized point cloud data using a preset point cloud semantic segmentation model to determine a semantic segmentation result, the method comprises:
acquiring training data, and performing voxelization processing on the training data to obtain voxelized training data;
and training the point cloud semantic segmentation model by using the voxelized training data to obtain a preset point cloud semantic segmentation model.
3. The method of claim 2, wherein the training of the point cloud semantic segmentation model using the voxelized training data comprises:
classifying the voxelized training data to determine a plurality of category information, generating a label according to the category information and labeling the voxelized training data with the label;
calculating marked voxelized training data by using a high-resolution network to determine a training feature map;
and coding the label to obtain a coded label, and calculating the cross entropy loss of the coded label and the training feature map to train a point cloud semantic segmentation model.
4. The method of claim 1, wherein the computing the voxelized point cloud data using a preset point cloud semantic segmentation model to determine a semantic segmentation result comprises:
calculating the voxelized point cloud data by using a preset point cloud semantic segmentation model to determine a feature map;
and determining a semantic segmentation result according to the feature map.
5. The method of claim 1, wherein the decoding the semantic segmentation result to determine a decoding result comprises:
and decoding the semantic segmentation result by using one-hot coding to determine a decoding result.
6. The method of claim 1, wherein the calculating the decoding result based on the index of each point of the decoding result in each dimension direction to obtain the target point cloud data comprises:
screening all points in the decoding result according to the numerical value corresponding to each point to determine a target point;
taking the voxel serial numbers of the target points on all axes as indexes, and generating an index matrix according to the indexes;
and calculating the index matrix to obtain target point cloud data.
7. The method of claim 6, wherein the target point cloud data comprises center point coordinates of a target point cloud, and classifying all point cloud data to be detected according to the target point cloud data to determine category information of all point cloud data to be detected comprises:
calculating a coordinate threshold of the target point cloud according to the center point coordinates of the target point cloud;
and classifying all point cloud data to be detected according to the coordinate threshold value to determine the category information of all point cloud data to be detected.
8. The method of claim 7, wherein classifying all point cloud data to be detected according to the coordinate threshold to determine category information of all point cloud data to be detected comprises:
screening point cloud data to be detected in the target point cloud space according to the coordinate threshold;
and classifying the point cloud data to be detected in the target point cloud space into target categories.
9. An apparatus for point cloud semantic segmentation, wherein the apparatus comprises:
the data processing module is used for carrying out voxelization processing on the point cloud data to be detected to obtain voxelized point cloud data;
the identification module is used for calculating the voxelized point cloud data by using a preset point cloud semantic segmentation model so as to determine a semantic segmentation result;
the decoding module is used for decoding the semantic segmentation result to determine a decoding result, and calculating the decoding result based on the index of each point of the decoding result in each dimension direction to obtain target point cloud data;
and the classification module is used for classifying all point cloud data to be detected according to the target point cloud data so as to determine the category information of all point cloud data to be detected.
10. A computer readable medium having computer readable instructions stored thereon which are executable by a processor to implement the method of any one of claims 1 to 8.
11. An apparatus for point cloud semantic segmentation, wherein the apparatus comprises:
one or more processors; and
a memory storing computer readable instructions that, when executed, cause the processor to perform the operations of the method of any of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010963729.4A CN112149677A (en) | 2020-09-14 | 2020-09-14 | Point cloud semantic segmentation method, device and equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010963729.4A CN112149677A (en) | 2020-09-14 | 2020-09-14 | Point cloud semantic segmentation method, device and equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112149677A true CN112149677A (en) | 2020-12-29 |
Family
ID=73892727
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010963729.4A Pending CN112149677A (en) | 2020-09-14 | 2020-09-14 | Point cloud semantic segmentation method, device and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112149677A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113011430A (en) * | 2021-03-23 | 2021-06-22 | 中国科学院自动化研究所 | Large-scale point cloud semantic segmentation method and system |
CN113375556A (en) * | 2021-06-18 | 2021-09-10 | 盎锐(上海)信息科技有限公司 | Full-stack actual measurement system, measurement method and laser radar |
CN113569856A (en) * | 2021-07-13 | 2021-10-29 | 盎锐(上海)信息科技有限公司 | Model semantic segmentation method for actual measurement actual quantity and laser radar |
CN114387289A (en) * | 2022-03-24 | 2022-04-22 | 南方电网数字电网研究院有限公司 | Semantic segmentation method and device for three-dimensional point cloud of power transmission and distribution overhead line |
CN114638953A (en) * | 2022-02-22 | 2022-06-17 | 深圳元戎启行科技有限公司 | Point cloud data segmentation method and device and computer readable storage medium |
CN114743001A (en) * | 2022-04-06 | 2022-07-12 | 合众新能源汽车有限公司 | Semantic segmentation method and device, electronic equipment and storage medium |
CN115131562A (en) * | 2022-07-08 | 2022-09-30 | 北京百度网讯科技有限公司 | Three-dimensional scene segmentation method, model training method and device and electronic equipment |
CN117321438A (en) * | 2021-04-14 | 2023-12-29 | 利尼芝物流有限责任公司 | Point cloud filtering |
- 2020-09-14: Application CN202010963729.4A filed; published as CN112149677A (status: Pending)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190147250A1 (en) * | 2017-11-15 | 2019-05-16 | Uber Technologies, Inc. | Semantic Segmentation of Three-Dimensional Data |
CN109410307A (en) * | 2018-10-16 | 2019-03-01 | 大连理工大学 | A kind of scene point cloud semantic segmentation method |
DE102018128531A1 (en) * | 2018-11-14 | 2020-05-14 | Valeo Schalter Und Sensoren Gmbh | System and method for analyzing a three-dimensional environment represented by a point cloud through deep learning |
CN109829399A (en) * | 2019-01-18 | 2019-05-31 | 武汉大学 | A kind of vehicle mounted road scene point cloud automatic classification method based on deep learning |
CN111144304A (en) * | 2019-12-26 | 2020-05-12 | 上海眼控科技股份有限公司 | Vehicle target detection model generation method, vehicle target detection method and device |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113011430A (en) * | 2021-03-23 | 2021-06-22 | 中国科学院自动化研究所 | Large-scale point cloud semantic segmentation method and system |
CN117321438B (en) * | 2021-04-14 | 2024-06-04 | 利尼芝物流有限责任公司 | Point cloud filtering |
US12002156B2 (en) | 2021-04-14 | 2024-06-04 | Lineage Logistics, LLC | Point cloud filtering |
CN117321438A (en) * | 2021-04-14 | 2023-12-29 | 利尼芝物流有限责任公司 | Point cloud filtering |
CN113375556A (en) * | 2021-06-18 | 2021-09-10 | 盎锐(上海)信息科技有限公司 | Full-stack actual measurement system, measurement method and laser radar |
CN113375556B (en) * | 2021-06-18 | 2024-06-04 | 盎锐(杭州)信息科技有限公司 | Full stack type actual measurement real quantity system, measurement method and laser radar |
CN113569856A (en) * | 2021-07-13 | 2021-10-29 | 盎锐(上海)信息科技有限公司 | Model semantic segmentation method for actual measurement actual quantity and laser radar |
CN113569856B (en) * | 2021-07-13 | 2024-06-04 | 盎锐(杭州)信息科技有限公司 | Model semantic segmentation method for actual measurement and laser radar |
CN114638953A (en) * | 2022-02-22 | 2022-06-17 | 深圳元戎启行科技有限公司 | Point cloud data segmentation method and device and computer readable storage medium |
CN114638953B (en) * | 2022-02-22 | 2023-12-22 | 深圳元戎启行科技有限公司 | Point cloud data segmentation method and device and computer readable storage medium |
CN114387289A (en) * | 2022-03-24 | 2022-04-22 | 南方电网数字电网研究院有限公司 | Semantic segmentation method and device for three-dimensional point cloud of power transmission and distribution overhead line |
CN114743001A (en) * | 2022-04-06 | 2022-07-12 | 合众新能源汽车有限公司 | Semantic segmentation method and device, electronic equipment and storage medium |
CN114743001B (en) * | 2022-04-06 | 2024-06-25 | 合众新能源汽车股份有限公司 | Semantic segmentation method, semantic segmentation device, electronic equipment and storage medium |
CN115131562B (en) * | 2022-07-08 | 2023-06-13 | 北京百度网讯科技有限公司 | Three-dimensional scene segmentation method, model training method, device and electronic equipment |
CN115131562A (en) * | 2022-07-08 | 2022-09-30 | 北京百度网讯科技有限公司 | Three-dimensional scene segmentation method, model training method and device and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112149677A (en) | Point cloud semantic segmentation method, device and equipment | |
CN111652217B (en) | Text detection method and device, electronic equipment and computer storage medium | |
CN111681273B (en) | Image segmentation method and device, electronic equipment and readable storage medium | |
CN111860138B (en) | Three-dimensional point cloud semantic segmentation method and system based on full fusion network | |
US8773422B1 (en) | System, method, and computer program product for grouping linearly ordered primitives | |
Hoppe et al. | Incremental Surface Extraction from Sparse Structure-from-Motion Point Clouds. | |
CN116310656B (en) | Training sample determining method and device and computer equipment | |
CN110569379A (en) | Method for manufacturing picture data set of automobile parts | |
CN115457492A (en) | Target detection method and device, computer equipment and storage medium | |
CN114694005A (en) | Target detection model training method and device, and target detection method and device | |
CN112364709A (en) | Cabinet intelligent asset checking method based on code identification | |
CN110969641A (en) | Image processing method and device | |
CN115908332A (en) | Method for detecting surface defects of battery pole piece and processor | |
CN116612280A (en) | Vehicle segmentation method, device, computer equipment and computer readable storage medium | |
CN113111708B (en) | Vehicle matching sample generation method, device, computer equipment and storage medium | |
CN108573510B (en) | Grid map vectorization method and device | |
CN111062385A (en) | Network model construction method and system for image text information detection | |
CN110852353A (en) | Intersection classification method and equipment | |
CN116486153A (en) | Image classification method, device, equipment and storage medium | |
CN116403062A (en) | Point cloud target detection method, system, equipment and medium | |
CN113591543B (en) | Traffic sign recognition method, device, electronic equipment and computer storage medium | |
CN113808142B (en) | Ground identification recognition method and device and electronic equipment | |
CN115797171A (en) | Method and device for generating composite image, electronic device and storage medium | |
CN115937451A (en) | Dynamic scene multi-semantic map construction method and device based on visual SLAM | |
CN113298822B (en) | Point cloud data selection method and device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||