CN112927234A - Point cloud semantic segmentation method and device, electronic equipment and readable storage medium - Google Patents
- Publication number
- CN112927234A (application CN202110213344.0A)
- Authority
- CN
- China
- Prior art keywords
- point cloud
- semantic segmentation
- voxel
- grid
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20112—Image segmentation details
- G06T2207/20128—Atlas-based segmentation
Abstract
The disclosure provides a point cloud semantic segmentation method, which comprises the following steps: acquiring point cloud data to be detected; obtaining a voxel grid from the point cloud data to be detected; inputting the voxel grid into a point cloud semantic segmentation model deployed on an edge device and outputting a first probability matrix, wherein the point cloud semantic segmentation model comprises a first feature extraction network and an interpolation network connected in sequence, and is obtained by compressing a depth model based on knowledge distillation and then training it; and passing the first probability matrix through a classifier to obtain a semantic segmentation result of the point cloud data to be detected. In addition, the disclosure also provides a point cloud semantic segmentation apparatus, an electronic device, a readable storage medium and a computer program product. The point cloud semantic segmentation method and apparatus can be used in the field of artificial intelligence or other fields.
Description
Technical Field
The present disclosure relates to the field of artificial intelligence, in particular to the field of deep neural network model compression, and more particularly to a point cloud semantic segmentation method, a point cloud semantic segmentation apparatus, an electronic device, a readable storage medium, and a computer program product.
Background
In computer vision, semantic segmentation performs pixel-level (or, for three-dimensional data, voxel-level) classification of an image, a fundamental problem in the field. For example, in a geographic information system, rivers, roads, buildings and the like need to be labeled according to the semantic segmentation result.
In the related art, feature extraction in classical semantic segmentation models, such as fully convolutional networks, and in improved lightweight models, such as ENet and CGNet, is realized with deep convolutional networks. When the recognition target is three-dimensional point cloud data, a larger-scale deep convolutional network is required, which imposes heavy demands on computing resources and storage space at deployment time. In realizing the present disclosure, it was found that point cloud semantic segmentation models in the related art are large in scale and difficult to run on edge devices.
Disclosure of Invention
In view of the above, the present disclosure provides a point cloud semantic segmentation method, a point cloud semantic segmentation apparatus, an electronic device, a readable storage medium, and a computer program product.
One aspect of the present disclosure provides a point cloud semantic segmentation method, including: acquiring point cloud data to be detected; obtaining a voxel grid from the point cloud data to be detected; inputting the voxel grid into a point cloud semantic segmentation model deployed on an edge device and outputting a first probability matrix, wherein the point cloud semantic segmentation model comprises a first feature extraction network and an interpolation network connected in sequence, and is obtained by compressing a depth model based on knowledge distillation and then training it; and passing the first probability matrix through a classifier to obtain a semantic segmentation result of the point cloud data to be detected.
According to an embodiment of the present disclosure, obtaining the voxel grid from the point cloud data to be detected includes: inputting the point cloud data to be detected into a three-dimensional grid map with a fixed resolution; for each grid in the three-dimensional grid map, replacing all the points in the grid with their centroid to obtain the voxel of the grid; and traversing all grids in the three-dimensional grid map to obtain the voxel grid.
According to an embodiment of the present disclosure, inputting the voxel grid into the point cloud semantic segmentation model deployed on the edge device and outputting the first probability matrix includes: inputting the voxel grid into the first feature extraction network to obtain a first feature matrix; inputting the first feature matrix into the interpolation network and obtaining a classification probability vector for each voxel in the voxel grid by upsampling; and generating the first probability matrix based on the classification probability vector of each voxel.
According to an embodiment of the present disclosure, the depth model includes a second feature extraction network and an interpolation network connected in sequence, wherein a network size of the second feature extraction network is larger than a network size of the first feature extraction network.
According to the embodiment of the disclosure, the point cloud semantic segmentation model is obtained by compressing the depth model based on knowledge distillation and then training, as follows: acquiring sample point cloud data, and obtaining a sample voxel grid based on the sample point cloud data; training the depth model to obtain a first model; inputting the sample voxel grid into the first model to obtain a second feature matrix and a second probability matrix output by the first model; and training an initial point cloud semantic segmentation model based on the sample voxel grid, using the second feature matrix and the second probability matrix as labels of the sample voxel grid, to obtain the point cloud semantic segmentation model.
According to an embodiment of the present disclosure, training the initial point cloud semantic segmentation model based on the sample voxel grid, using the second feature matrix and the second probability matrix as labels of the sample voxel grid, to obtain the point cloud semantic segmentation model includes: training the first feature extraction network of the initial point cloud semantic segmentation model based on the sample voxel grid, using the second feature matrix as the label of the sample voxel grid; and training a second model based on the sample voxel grid, using the second probability matrix as the label of the sample voxel grid, to obtain the point cloud semantic segmentation model, wherein the second model comprises the trained first feature extraction network of the initial point cloud semantic segmentation model and the interpolation network of the initial point cloud semantic segmentation model.
According to an embodiment of the present disclosure, passing the first probability matrix through a classifier to obtain the semantic segmentation result of the point cloud data to be detected includes: inputting the first probability matrix into the classifier to obtain the category of each voxel in the voxel grid; and obtaining the semantic segmentation result of the point cloud data to be detected based on the category of each voxel in the voxel grid.
Another aspect of the disclosure provides a point cloud semantic segmentation apparatus, which includes a data acquisition module, a data preprocessing module, an execution module, and a classification module.
The data acquisition module is used for acquiring point cloud data to be detected; the data preprocessing module is used for obtaining a voxel grid from the point cloud data to be detected; the execution module is used for inputting the voxel grid into a point cloud semantic segmentation model deployed on an edge device and outputting a first probability matrix, wherein the point cloud semantic segmentation model comprises a first feature extraction network and an interpolation network connected in sequence, and is obtained by compressing a depth model based on knowledge distillation and then training it; and the classification module is used for passing the first probability matrix through a classifier to obtain a semantic segmentation result of the point cloud data to be detected.
According to an embodiment of the present disclosure, the data preprocessing module includes: the device comprises a first processing unit, a second processing unit and a third processing unit.
Wherein the first processing unit is used for inputting the point cloud data to be detected into a three-dimensional grid map with a fixed resolution; the second processing unit is used for, for each grid in the three-dimensional grid map, replacing all the points in the grid with their centroid to obtain the voxel of the grid; and the third processing unit is used for traversing all grids in the three-dimensional grid map to obtain the voxel grid.
According to an embodiment of the present disclosure, the execution module includes: the device comprises a first execution unit, a second execution unit and a third execution unit.
The first execution unit is used for inputting the voxel grid into the first feature extraction network to obtain a first feature matrix; a second execution unit, configured to input the first feature matrix into the interpolation network, and obtain a classification probability vector of each voxel in a voxel grid in an upsampling manner; and a third execution unit configured to generate the first probability matrix based on the classification probability vector of each voxel.
According to an embodiment of the present disclosure, the depth model includes a second feature extraction network and an interpolation network connected in sequence, wherein a network size of the second feature extraction network is larger than a network size of the first feature extraction network.
According to an embodiment of the present disclosure, the executing module further includes: the device comprises a sample acquisition unit, a first training unit, a fourth execution unit and a second training unit.
The sample acquisition unit is used for acquiring sample point cloud data and obtaining a sample voxel grid based on the sample point cloud data; the first training unit is used for training the depth model to obtain a first model; the fourth execution unit is used for inputting the sample voxel grid into the first model to obtain a second feature matrix and a second probability matrix output by the first model; and the second training unit is used for training an initial point cloud semantic segmentation model based on the sample voxel grid, using the second feature matrix and the second probability matrix as labels of the sample voxel grid, to obtain the point cloud semantic segmentation model.
According to an embodiment of the present disclosure, the second training unit includes: a first training subunit and a second training subunit.
The first training subunit is configured to train a first feature extraction network of the initial point cloud semantic segmentation model based on the sample voxel grid by using the second feature matrix as a label of the sample voxel grid; and a second training subunit, configured to train a second model based on the sample voxel grid using the second probability matrix as a label of the sample voxel grid, to obtain the point cloud semantic segmentation model, where the second model includes a trained first feature extraction network of the initial point cloud semantic segmentation model and an interpolation network of the initial point cloud semantic segmentation model.
According to an embodiment of the present disclosure, the classification module includes: a first classification unit and a second classification unit.
The first classification unit is used for inputting the first probability matrix into a classifier to obtain the category of each voxel in the voxel grid; and the second classification unit is used for obtaining a semantic segmentation result of the point cloud data to be detected based on the category of each voxel in the voxel grid.
Another aspect of the present disclosure provides an electronic device including: one or more processors; a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as described above.
Another aspect of the present disclosure provides a readable storage medium storing computer-executable instructions for implementing the method as described above when executed.
Another aspect of the disclosure provides a computer program product comprising computer executable instructions for implementing the method as described above when executed.
According to the embodiments of the present disclosure, a voxel grid is obtained from the point cloud data to be detected; the voxel grid is input into a point cloud semantic segmentation model deployed on an edge device, which outputs a first probability matrix, wherein the model comprises a first feature extraction network and an interpolation network connected in sequence and is obtained by compressing a depth model based on knowledge distillation and then training it; and the first probability matrix is passed through a classifier to obtain the semantic segmentation result of the point cloud data to be detected. The voxelized point cloud data is input into the point cloud semantic segmentation model, and the semantic segmentation result is obtained from the model's output. By adopting the knowledge distillation method, the point cloud semantic segmentation model is compressed with little loss of precision, which reduces the computing resources and storage space required for deployment and makes it possible to deploy the model on edge devices.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:
fig. 1 schematically illustrates a schematic diagram of an application scenario of a point cloud semantic segmentation method according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow diagram of a point cloud semantic segmentation model training method 200 according to an embodiment of the present disclosure;
figs. 3a, 3b, and 3c schematically illustrate schematic diagrams of point cloud data structures according to embodiments of the present disclosure;
FIG. 4 schematically illustrates a schematic diagram of a point cloud semantic segmentation model training process according to an embodiment of the present disclosure;
fig. 5 schematically illustrates a schematic diagram of a point cloud semantic segmentation apparatus 500 according to an embodiment of the present disclosure;
fig. 6 schematically illustrates a block diagram of an electronic device 600 suitable for implementing a point cloud semantic segmentation method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B, and C, etc." is used, such a construction is intended in the sense one having skill in the art would understand it (e.g., "a system having at least one of A, B, and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). Where a convention analogous to "at least one of A, B, or C, etc." is used, such a construction is likewise intended in the sense one having skill in the art would understand it, covering the same enumeration of combinations.
Knowledge distillation means that a large network (the teacher model) guides the training of a small network (the student model). The teacher model is usually large in scale but high in accuracy, while the student model is small in scale but lower in accuracy; by having the teacher guide its training, the student model can approach the accuracy of the teacher model while keeping its original small size. Based on this idea, the teacher and student models can be trained on equipment with abundant computing and storage resources, such as a cloud computing center, and an edge device only needs enough resources to run the student model in order to perform semantic segmentation of complex data.
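As an illustrative sketch only (the patent does not specify a particular distillation loss), one common way a teacher "guides" a student is to use the teacher's temperature-softened outputs as soft training targets; the function names and the temperature value below are assumptions, not part of this disclosure:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between temperature-softened teacher and student
    distributions: the teacher's soft outputs act as training targets."""
    t = softmax(np.asarray(teacher_logits) / temperature)
    s = softmax(np.asarray(student_logits) / temperature)
    return float(-(t * np.log(s + 1e-12)).sum(axis=-1).mean())

teacher = np.array([[2.0, 0.5, -1.0]])
matched = distillation_loss(teacher, teacher)      # student mimics the teacher
mismatched = distillation_loss(-teacher, teacher)  # student disagrees
```

A student whose outputs match the teacher's incurs the minimal loss, so minimizing this quantity drives the small network toward the large network's behavior.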
In the related art, for example deep feature learning on point sets, the network structure can be divided into two parts: a feature extraction network for acquiring the global features of the point cloud, and an interpolation network for acquiring the information of each voxel. The feature extraction network maps the point cloud from three-dimensional space to an embedding space, so that the extracted global features contain the overall information of the point cloud; the interpolation network upsamples the global features to obtain the features of each voxel, from which the semantic segmentation result is derived. Since the feature extraction network accounts for most of the storage space and computation of the whole network, following the idea of knowledge distillation, a student model can be obtained by replacing the feature extraction network in the original network structure, thereby achieving model compression.
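The two-part structure described above can be sketched roughly as follows. This is a minimal stand-in with random, untrained weights rather than the patent's actual networks; it only shows how a global feature is pooled from the voxels and then expanded back into per-voxel class scores:

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_extraction(voxels, dim=16):
    """Map N voxels (x, y, z) into an embedding space and pool them into
    one global feature vector holding overall point cloud information."""
    W = rng.standard_normal((3, dim))
    return np.tanh(voxels @ W).max(axis=0)  # (dim,) global feature

def interpolation(global_feature, n_voxels, n_classes=4):
    """Upsample the global feature into one class-score vector per voxel."""
    W = rng.standard_normal((global_feature.shape[0], n_classes))
    return np.tile(global_feature @ W, (n_voxels, 1))  # (n_voxels, n_classes)

voxel_grid = rng.random((100, 3))
scores = interpolation(feature_extraction(voxel_grid), len(voxel_grid))
```

In a real model both stages are learned networks; here the point is only the data flow, where the feature extraction stage dominates the parameter count and is therefore the part replaced when building the student model.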
Fig. 1 schematically illustrates a schematic diagram of an application scenario of a point cloud semantic segmentation method according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of an application scenario in which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, but does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in FIG. 1, the system architecture 100 according to this embodiment may include a server 101, a network 102, a remote sensing satellite 103, and a detection target 104.
The remote sensing satellite 103 is equipped with electronic devices having a 3D scanning function, such as a lidar, a stereo camera, or a depth camera, which acquire point cloud data of the detection target 104 by scanning it multiple times. The satellite also carries electronic equipment with data storage and processing functions, such as an on-board computer, which can be used to process the point cloud data.
The server 101 may be a server or a server cluster with rich computing resources, configured to train the point cloud semantic segmentation model from the point cloud data acquired by the remote sensing satellite 103.
The point cloud semantic segmentation method of the embodiments of the present disclosure can generally be executed by the remote sensing satellite 103, so as to process the point cloud data in real time and reduce the occupation of the satellite's uplink and downlink resources. Alternatively, the method may be executed by the server 101, or by another server or server cluster capable of communicating with the remote sensing satellite 103.
It should be noted that the number of servers and networks in fig. 1 is merely illustrative; there may be any number of servers and networks as required by the implementation. The point cloud semantic segmentation method can also be applied to the field of autonomous driving, to classify a vehicle's surroundings in real time, and to the medical field, particularly medical image analysis, to assist doctors in distinguishing normal tissue from pathological tissue.
Fig. 2 schematically illustrates a flow chart of a point cloud semantic segmentation model training method 200 according to an embodiment of the present disclosure.
As shown in FIG. 2, the method 200 includes operations S210-S240.
In operation S210, point cloud data to be detected is acquired.
According to the embodiment of the disclosure, the point cloud data to be detected can be acquired by a 3D camera device. Point cloud data is a set of points in three-dimensional space and can represent information such as the shape and color of an object.
In operation S220, a voxel grid is obtained according to the point cloud data to be detected.
According to an embodiment of the present disclosure, a voxel is employed as a minimum unit for detection of a three-dimensional image.
In operation S230, a voxel grid is input into a point cloud semantic segmentation model deployed in an edge device, and a first probability matrix is output.
According to the embodiment of the disclosure, the point cloud semantic segmentation model comprises a first feature extraction network and an interpolation network which are sequentially connected, and the point cloud semantic segmentation model is obtained by compressing and then training a depth model based on a knowledge distillation technology.
According to an embodiment of the present disclosure, the edge device includes a device at a user end, such as a mobile phone, a computer, a VR device, and the like, and may also include a base station, a server, and the like, which establish direct communication with the device at the user end.
In operation S240, the first probability matrix is processed by a classifier to obtain a semantic segmentation result of the point cloud data to be detected.
According to an embodiment of the present disclosure, the classifier may be a softmax classifier, and is configured to calculate, according to the probability vector of each voxel, a class corresponding to the voxel. For example, the probability vector of a voxel is [0.01, 0.02, 0.95, 0.02], and the voxel can be determined to belong to the 3 rd class by the classifier.
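The classifier step can be illustrated with the probability vector from the example above; `classify_voxel` is a hypothetical helper, not a name from this disclosure, returning the 1-based index of the most probable class:

```python
import numpy as np

def classify_voxel(prob_vector):
    """Pick the class with the highest probability (1-based index)."""
    return int(np.argmax(prob_vector)) + 1

# The example vector from the description: the 3rd entry (0.95) dominates,
# so the voxel is assigned to the 3rd class.
category = classify_voxel([0.01, 0.02, 0.95, 0.02])
```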
According to the embodiments of the present disclosure, the voxelized point cloud data is input into the point cloud semantic segmentation model, and the semantic segmentation result is obtained from the model's output. Because the knowledge distillation method is adopted, the point cloud semantic segmentation model is compressed with little loss of precision, which reduces the computing resources and storage space required for deployment and makes it possible to deploy the model on edge devices.
The method shown in fig. 2 is further explained with reference to fig. 3a, 3b, 3c, 4 in conjunction with specific embodiments.
Fig. 3a, 3b, and 3c schematically illustrate schematic diagrams of point cloud data structures according to embodiments of the present disclosure.
Fig. 3a shows a multi-view picture of an object, which is a collection of two-dimensional images taken from the object by virtual cameras of different views. Multi-view pictures typically require the use of a relatively large number of pictures to construct a complete three-dimensional model.
The multi-view picture is placed in a three-dimensional coordinate system, and the appearance shape of the object is represented by a set of points, so that the point cloud image shown in fig. 3b can be obtained. The point cloud is essentially a set of three-dimensional points, which are obtained by discretizing a continuous object image. The point cloud may contain information on the shape, location, color, etc. of the object.
Because a point cloud contains a large number of points, using the point cloud data directly as input leads to a large model that is prone to overfitting during training. Therefore, when point cloud data is processed, it is often represented as a set of voxels in three-dimensional space, as shown in fig. 3c. In the embodiments of the present disclosure, the voxelized point cloud data is used as the input of the model.
According to an embodiment of the present disclosure, voxelizing the point cloud data to be detected comprises: inputting the point cloud data to be detected into a three-dimensional grid map with a fixed resolution; for each grid in the three-dimensional grid map, replacing the other points in the grid with the centroid of all points in the grid to obtain the voxel of the grid; and traversing all grids in the three-dimensional grid map to obtain the voxel grid.
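The voxelization steps above can be sketched as follows. This is a minimal illustration assuming NumPy and a cubic grid with a fixed cell edge length; the function name and signature are invented for illustration:

```python
import numpy as np

def voxelize(points: np.ndarray, grid_size: float) -> np.ndarray:
    """Replace all points that fall into one grid cell by their centroid.

    points: (N, 3) array of xyz coordinates of the point cloud.
    grid_size: edge length of each cubic cell (the fixed resolution).
    Returns an (M, 3) voxel grid with one point per occupied cell.
    """
    # Map every point to the integer index of the cell containing it.
    cell_idx = np.floor(points / grid_size).astype(np.int64)
    # Group the points by cell; `inverse` maps each point to its cell.
    _, inverse, counts = np.unique(
        cell_idx, axis=0, return_inverse=True, return_counts=True)
    # Sum the points of each cell, then divide by the count -> centroid.
    centroids = np.zeros((counts.size, 3))
    np.add.at(centroids, inverse.ravel(), points)
    return centroids / counts[:, None]
```

Replacing the per-cell mean with the midpoint of each cell's bounding box would give the geometric-center variant instead of the centroid variant.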
According to an embodiment of the present disclosure, the voxelized point cloud data have a smaller data volume, which effectively reduces the dimensionality of the input data and the resource consumption during model deployment.

It should be noted that voxelization is not limited to the above method; for example, the geometric center of the points in each grid may be used instead of the centroid to replace the other points.
Fig. 4 schematically illustrates a schematic diagram of a point cloud semantic segmentation model training process according to an embodiment of the present disclosure.
As shown in fig. 4, the point cloud semantic segmentation model is obtained by training the initial semantic segmentation model with the trained depth model as a teacher model.
According to an embodiment of the present disclosure, the depth model includes a second feature extraction network and an interpolation network connected in sequence, wherein a network scale of the second feature extraction network is larger than a network scale of the first feature extraction network.
The specific training process is as follows:
First, sample point cloud data are obtained, and a sample voxel grid is obtained based on the sample point cloud data.
According to an embodiment of the present disclosure, the sample point cloud data and the sample voxel grid may be obtained by the correlation method described with reference to fig. 3a to 3 c.
Then, the depth model is trained to obtain a first model.
According to an embodiment of the present disclosure, the depth model has a complex structure and occupies considerable computing resources and storage space when deployed. Following the idea of knowledge distillation, the first model obtained by training the depth model serves as the teacher model.
Then, the sample voxel grid is input into the first model to obtain the second feature matrix and the second probability matrix output by the first model.
According to an embodiment of the present disclosure, the sample voxel grid is input into the second feature extraction network of the first model to obtain the second feature matrix; the second feature matrix is input into the interpolation network of the first model, which obtains the classification probability vector of each sample voxel in the sample voxel grid by up-sampling; finally, the classification probability vectors of all sample voxels are assembled to generate the second probability matrix.
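The teacher forward pass described above can be sketched roughly as follows, assuming PyTorch. Every layer shape, channel count, and class count here is invented for illustration; the patent does not fix a concrete architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TeacherModel(nn.Module):
    """Hypothetical teacher: a feature extraction stage followed by an
    interpolation (up-sampling) stage that yields per-voxel class
    probabilities. All sizes are illustrative only."""

    def __init__(self, in_ch=4, feat_ch=64, num_classes=10):
        super().__init__()
        # Second feature extraction network: downsamples the voxel grid.
        self.features = nn.Sequential(
            nn.Conv3d(in_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(feat_ch, feat_ch, 3, padding=1), nn.ReLU())
        # 1x1x1 head producing one classification vector per voxel.
        self.classify = nn.Conv3d(feat_ch, num_classes, 1)

    def forward(self, voxel_grid):
        feat = self.features(voxel_grid)          # second feature matrix
        # Interpolation network: up-sample back to grid resolution.
        up = F.interpolate(feat, size=voxel_grid.shape[2:],
                           mode="trilinear", align_corners=False)
        prob = F.softmax(self.classify(up), dim=1)  # second probability matrix
        return feat, prob
```

Both outputs are kept because the distillation step described below the figure uses the feature matrix and the probability matrix as two separate labels.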
Finally, using the second feature matrix and the second probability matrix as labels of the sample voxel grid, the initial point cloud semantic segmentation model is trained based on the sample voxel grid to obtain the point cloud semantic segmentation model.
According to an embodiment of the present disclosure, the real label of the sample voxel grid may be used to supervise the training of the initial point cloud semantic segmentation model. Training proceeds in stages: first, the first feature extraction network of the initial point cloud semantic segmentation model is trained; then the interpolation network of the initial point cloud semantic segmentation model is loaded for a second stage of training.
According to an embodiment of the present disclosure, when the first feature extraction network of the initial point cloud semantic segmentation model is trained, the sample voxel grid is used as the training sample, the real label of the sample voxel grid is used as the first label, and the second feature matrix is used as the second label. The model parameters may be iterated with stochastic gradient descent or a similar method, and hyperparameters such as the learning rate and the number of training epochs may be adjusted according to the actual training effect, which can be characterized by the loss value. Training may use a loss function from the related art, such as the L2 loss shown in formula 1, or a composite loss function, as shown in formulas 2 and 3:
L = ||Pt - Ps||2 (1)

L = μ·Lhard + (1 - μ)·Lsoft(Ps, Pt) (2)

Lsoft = -Σ Pt·log Ps (3)

In formulas 1-3, L denotes the loss function used during actual training; Pt denotes the output of the teacher model, i.e., the second label; Ps denotes the output of the student model, which in this embodiment is the output matrix of the first feature extraction network after the current training round.

It should be noted that if the dimensions of Pt and Ps differ, the output matrix of the first feature extraction network is processed with a normalization method so that its dimensions align with those of the second feature matrix.
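Formulas 1-3 can be written out directly. The sketch below assumes NumPy arrays for Pt and Ps and treats μ and the hard-label loss as given; the small epsilon guarding the logarithm is an added implementation detail, not part of the formulas:

```python
import numpy as np

def l2_loss(p_t: np.ndarray, p_s: np.ndarray) -> float:
    """L2 distillation loss of formula 1: ||Pt - Ps||_2."""
    return float(np.linalg.norm(p_t - p_s))

def soft_loss(p_t: np.ndarray, p_s: np.ndarray, eps: float = 1e-12) -> float:
    """Soft-label cross entropy of formula 3: -sum(Pt * log Ps)."""
    return float(-np.sum(p_t * np.log(p_s + eps)))

def composite_loss(p_t: np.ndarray, p_s: np.ndarray,
                   hard_loss: float, mu: float = 0.5) -> float:
    """Composite loss of formula 2: mu*L_hard + (1-mu)*L_soft(Ps, Pt).

    `hard_loss` is the supervised loss against the real labels;
    `mu` is the weighting hyperparameter between the two terms.
    """
    return mu * hard_loss + (1 - mu) * soft_loss(p_t, p_s)
```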
According to an embodiment of the present disclosure, when the interpolation network of the initial point cloud semantic segmentation model is trained, the interpolation network is attached to the back end of the trained first feature extraction network to form the second model. When the second model is trained, the sample voxel grid is used as the training sample, the real label of the sample voxel grid as the first label, and the second probability matrix as the second label; the other training parameters may be set similarly to those used when training the first feature extraction network.
It should be noted that when the second model is trained, the already trained first feature extraction network is trained again. In particular, when the first feature extraction network contains structures for avoiding vanishing or exploding gradients, a smaller learning rate may be set for it during this second stage in order to avoid over-training.
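A common way to give the already trained feature extractor a smaller learning rate than the freshly loaded interpolation network is per-parameter-group options in the optimizer. The two modules below are simple stand-ins for illustration, assuming PyTorch:

```python
import torch

# Stand-ins for the trained first feature extraction network and the
# newly loaded interpolation network (hypothetical module choices).
feature_net = torch.nn.Linear(8, 8)
interp_net = torch.nn.Linear(8, 4)

# Second-stage training: the pretrained extractor gets a much smaller
# learning rate than the interpolation network being trained from scratch.
optimizer = torch.optim.SGD([
    {"params": feature_net.parameters(), "lr": 1e-4},  # smaller lr
    {"params": interp_net.parameters(), "lr": 1e-2},   # normal lr
])
```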
According to an embodiment of the present disclosure, inputting the point cloud data shown in fig. 4 into the point cloud semantic segmentation model yields the first probability matrix corresponding to the point cloud data, and obtaining the semantic segmentation result from the first probability matrix comprises: inputting the first probability matrix into a classifier to obtain the class of each voxel in the voxel grid; and obtaining the semantic segmentation result of the point cloud data to be detected based on the class of each voxel in the voxel grid.
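Mapping the first probability matrix to per-voxel classes is a per-row argmax. The sketch below assumes a flattened (num_voxels, num_classes) layout for the probability matrix; the function name is invented for illustration:

```python
import numpy as np

def segment(prob_matrix: np.ndarray) -> np.ndarray:
    """Turn a probability matrix of shape (num_voxels, num_classes)
    into per-voxel class labels, i.e. the semantic segmentation result."""
    # For each voxel, pick the class with the highest probability.
    return np.argmax(prob_matrix, axis=1)
```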
Fig. 5 schematically illustrates a schematic diagram of a point cloud semantic segmentation apparatus 500 according to an embodiment of the present disclosure.
As shown in fig. 5, the point cloud semantic segmentation apparatus 500 includes a data acquisition module 510, a data preprocessing module 520, an execution module 530, and a classification module 540.

The data acquisition module 510 is configured to acquire point cloud data to be detected.

The data preprocessing module 520 is configured to obtain a voxel grid from the point cloud data to be detected.

The execution module 530 is configured to input the voxel grid into a point cloud semantic segmentation model deployed on edge equipment and output a first probability matrix, wherein the point cloud semantic segmentation model comprises a first feature extraction network and an interpolation network connected in sequence, and is obtained by compressing and then training a depth model based on a knowledge distillation technology.

The classification module 540 is configured to pass the first probability matrix through a classifier to obtain a semantic segmentation result of the point cloud data to be detected.
According to an embodiment of the present disclosure, the voxelized point cloud data are input into the point cloud semantic segmentation model, and the semantic segmentation result of the point cloud data is obtained based on the output of the model. Because knowledge distillation is used, the point cloud semantic segmentation model is compressed with little loss of precision, which reduces the computing resources and storage space required for deployment and makes it possible to deploy the model on edge devices.
According to an embodiment of the present disclosure, the data preprocessing module 520 includes: the device comprises a first processing unit, a second processing unit and a third processing unit.
The first processing unit is configured to input the point cloud data to be detected into a three-dimensional grid map with a fixed resolution; the second processing unit is configured to, for each grid in the three-dimensional grid map, replace the other points in the grid with the centroid of all points in the grid to obtain the voxel of the grid; and the third processing unit is configured to traverse all grids in the three-dimensional grid map to obtain the voxel grid.
According to an embodiment of the present disclosure, the execution module 530 includes: the device comprises a first execution unit, a second execution unit and a third execution unit.
The first execution unit is used for inputting the voxel grid into a first feature extraction network to obtain a first feature matrix; the second execution unit is used for inputting the first characteristic matrix into the interpolation network and acquiring the classification probability vector of each voxel in the voxel grid in an up-sampling mode; and a third execution unit for generating a first probability matrix based on the classification probability vector for each voxel.
According to an embodiment of the present disclosure, the depth model includes a second feature extraction network and an interpolation network connected in sequence, wherein a network scale of the second feature extraction network is larger than a network scale of the first feature extraction network.
According to an embodiment of the present disclosure, the executing module 530 further includes: the device comprises a sample acquisition unit, a first training unit, a fourth execution unit and a second training unit.
The sample acquisition unit is configured to acquire sample point cloud data and obtain a sample voxel grid based on the sample point cloud data; the first training unit is configured to train the depth model to obtain a first model; the fourth execution unit is configured to input the sample voxel grid into the first model to obtain a second feature matrix and a second probability matrix output by the first model; and the second training unit is configured to train the initial point cloud semantic segmentation model based on the sample voxel grid, using the second feature matrix and the second probability matrix as labels of the sample voxel grid, to obtain the point cloud semantic segmentation model.
According to an embodiment of the present disclosure, the second training unit includes: a first training subunit and a second training subunit.
The first training subunit is used for training a first feature extraction network of the initial point cloud semantic segmentation model based on the sample voxel grid by using the second feature matrix as a label of the sample voxel grid; and the second training subunit is used for training a second model based on the sample voxel grid by using the second probability matrix as a label of the sample voxel grid to obtain a point cloud semantic segmentation model, wherein the second model comprises a trained first feature extraction network of the initial point cloud semantic segmentation model and an interpolation network of the initial point cloud semantic segmentation model.
According to an embodiment of the present disclosure, the classification module 540 includes: a first classification unit and a second classification unit.
The first classification unit is configured to input the first probability matrix into the classifier to obtain the class of each voxel in the voxel grid; and the second classification unit is configured to obtain the semantic segmentation result of the point cloud data to be detected based on the class of each voxel in the voxel grid.
Any number of modules, sub-modules, units, sub-units, or at least part of the functionality of any number thereof according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in any other reasonable manner of hardware or firmware by integrating or packaging a circuit, or in any one of or a suitable combination of software, hardware, and firmware implementations. Alternatively, one or more of the modules, sub-modules, units, sub-units according to embodiments of the disclosure may be at least partially implemented as a computer program module, which when executed may perform the corresponding functions.
Fig. 6 schematically illustrates a block diagram of an electronic device 600 suitable for implementing a point cloud semantic segmentation method according to an embodiment of the present disclosure. The electronic device 600 shown in fig. 6 is only an example and should not bring any limitations to the function and scope of use of the embodiments of the present disclosure.
As shown in fig. 6, an electronic device 600 according to an embodiment of the present disclosure includes a processor 601, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. Processor 601 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 601 may also include onboard memory for caching purposes. Processor 601 may include a single processing unit or multiple processing units for performing different actions of a method flow according to embodiments of the disclosure.
In the RAM 603, various programs and data necessary for the operation of the electronic apparatus 600 are stored. The processor 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. The processor 601 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 602 and/or RAM 603. It is to be noted that the programs may also be stored in one or more memories other than the ROM 602 and RAM 603. The processor 601 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to embodiments of the present disclosure, method flows according to embodiments of the present disclosure may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The computer program, when executed by the processor 601, performs the above-described functions defined in the system of the embodiments of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to an embodiment of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium. Examples may include, but are not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 602 and/or RAM 603 described above and/or one or more memories other than the ROM 602 and RAM 603.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method provided by the embodiments of the present disclosure, when the computer program product is run on an electronic device, the program code being configured to cause the electronic device to implement the point cloud semantic segmentation method provided by the embodiments of the present disclosure.
The computer program, when executed by the processor 601, performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure. The systems, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted, distributed in the form of a signal on a network medium, downloaded and installed through the communication section 609, and/or installed from the removable medium 611. The computer program containing program code may be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
In accordance with embodiments of the present disclosure, program code for carrying out the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented using high level procedural and/or object oriented programming languages, and/or assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, the "C" language, and the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the internet using an internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or by combinations of special purpose hardware and computer instructions.

Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure can be combined in various ways, even if such combinations are not expressly recited in the present disclosure. In particular, such combinations may be made without departing from the spirit and teaching of the present disclosure, and all such combinations fall within the scope of the present disclosure.
It should be noted that the point cloud semantic segmentation method and apparatus provided by the embodiments of the present disclosure may be used in the field of artificial intelligence, for example, in unmanned driving, and may also be used in any other field, for example, edge computing. The application fields of the point cloud semantic segmentation method and apparatus provided by the embodiments of the present disclosure are not limited.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.
Claims (11)
1. A point cloud semantic segmentation method comprises the following steps:
acquiring point cloud data to be detected;
acquiring a voxel grid according to the point cloud data to be detected;
inputting the voxel grid into a point cloud semantic segmentation model deployed in edge equipment, and outputting a first probability matrix, wherein the point cloud semantic segmentation model comprises a first feature extraction network and an interpolation network which are sequentially connected, and the point cloud semantic segmentation model is obtained by compressing a depth model based on a knowledge distillation technology and then training; and
passing the first probability matrix through a classifier to obtain a semantic segmentation result of the point cloud data to be detected.
2. The method of claim 1, wherein the obtaining a voxel grid from the point cloud data to be detected comprises:
inputting the point cloud data to be detected into a three-dimensional grid map with a fixed resolution;
for each grid in the three-dimensional grid map, replacing the other point clouds in the grid with the centroid of all the point clouds in the grid to obtain the voxel of the grid; and
and traversing all grids in the three-dimensional grid map to obtain a voxel grid.
3. The method of claim 1, wherein the inputting the voxel grid into a point cloud semantic segmentation model deployed in an edge device, outputting a first probability matrix comprises:
inputting the voxel grid into the first feature extraction network to obtain a first feature matrix;
inputting the first feature matrix into the interpolation network, and acquiring a classification probability vector of each voxel in a voxel grid in an up-sampling mode; and
generating the first probability matrix based on the classification probability vector for each voxel.
4. The method of claim 1, wherein:
the depth model comprises a second feature extraction network and an interpolation network which are connected in sequence, wherein the network scale of the second feature extraction network is larger than that of the first feature extraction network.
5. The method of claim 4, wherein obtaining the point cloud semantic segmentation model by compressing and then training the depth model based on the knowledge distillation technology comprises:
acquiring sample point cloud data, and acquiring a sample voxel grid based on the sample point cloud data;
training the depth model to obtain a first model;
inputting the sample voxel grid into the first model to obtain a second feature matrix and a second probability matrix output by the first model; and
and using the second feature matrix and the second probability matrix as labels of the sample voxel grids, training an initial point cloud semantic segmentation model based on the sample voxel grids, and obtaining the point cloud semantic segmentation model.
6. The method of claim 5, wherein the using the second feature matrix and the second probability matrix as labels of the sample voxel grid and training an initial point cloud semantic segmentation model based on the sample voxel grid to obtain the point cloud semantic segmentation model comprises:
training a first feature extraction network of the initial point cloud semantic segmentation model based on the sample voxel grid using the second feature matrix as a label for the sample voxel grid; and
and training a second model based on the sample voxel grid by using the second probability matrix as a label of the sample voxel grid to obtain the point cloud semantic segmentation model, wherein the second model comprises a trained first feature extraction network of the initial point cloud semantic segmentation model and an interpolation network of the initial point cloud semantic segmentation model.
7. The method according to claim 1, wherein the passing the first probability matrix through a classifier to obtain a semantic segmentation result of the point cloud data to be detected comprises:

inputting the first probability matrix into a classifier to obtain the class of each voxel in the voxel grid; and

obtaining a semantic segmentation result of the point cloud data to be detected based on the class of each voxel in the voxel grid.
8. A point cloud semantic segmentation apparatus, comprising:
the data acquisition module is used for acquiring point cloud data to be detected;
the data preprocessing module is used for acquiring a voxel grid according to the point cloud data to be detected;
the execution module is used for inputting the voxel grid into a point cloud semantic segmentation model deployed in edge equipment and outputting a first probability matrix, wherein the point cloud semantic segmentation model comprises a first feature extraction network and an interpolation network which are sequentially connected, and the point cloud semantic segmentation model is obtained by compressing a depth model based on a knowledge distillation technology and then training; and
and the classification module is used for enabling the first probability matrix to pass through a classifier to obtain a semantic segmentation result of the point cloud data to be detected.
9. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7.
10. A readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to carry out the method of any one of claims 1 to 7.
11. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110213344.0A CN112927234A (en) | 2021-02-25 | 2021-02-25 | Point cloud semantic segmentation method and device, electronic equipment and readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110213344.0A CN112927234A (en) | 2021-02-25 | 2021-02-25 | Point cloud semantic segmentation method and device, electronic equipment and readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112927234A true CN112927234A (en) | 2021-06-08 |
Family
ID=76171953
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110213344.0A Pending CN112927234A (en) | 2021-02-25 | 2021-02-25 | Point cloud semantic segmentation method and device, electronic equipment and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112927234A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113449799A (en) * | 2021-06-30 | 2021-09-28 | 上海西井信息科技有限公司 | Target detection and classification method, system, device and storage medium |
CN113553939A (en) * | 2021-07-19 | 2021-10-26 | 中国工商银行股份有限公司 | Point cloud classification model training method and device, electronic equipment and storage medium |
CN114092580A (en) * | 2021-11-03 | 2022-02-25 | 华东交通大学 | Three-dimensional point cloud data compression method and system based on deep learning |
CN114332104A (en) * | 2022-03-09 | 2022-04-12 | 南方电网数字电网研究院有限公司 | Power grid power transmission scene RGB point cloud semantic segmentation multi-stage model joint optimization method |
CN114638954A (en) * | 2022-02-22 | 2022-06-17 | 深圳元戎启行科技有限公司 | Point cloud segmentation model training method, point cloud data segmentation method and related device |
CN115131562A (en) * | 2022-07-08 | 2022-09-30 | 北京百度网讯科技有限公司 | Three-dimensional scene segmentation method, model training method and device and electronic equipment |
CN115239951A (en) * | 2022-06-08 | 2022-10-25 | 广东领慧建筑科技有限公司 | Wall surface segmentation and identification method and system based on point cloud data processing |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020186444A1 (en) * | 2019-03-19 | 2020-09-24 | 深圳市大疆创新科技有限公司 | Object detection method, electronic device, and computer storage medium |
CN110458112A (en) * | 2019-08-14 | 2019-11-15 | 上海眼控科技股份有限公司 | Vehicle checking method, device, computer equipment and readable storage medium storing program for executing |
CN111461212A (en) * | 2020-03-31 | 2020-07-28 | 中国科学院计算技术研究所 | Compression method for point cloud target detection model |
CN111462324A (en) * | 2020-05-18 | 2020-07-28 | 南京大学 | Online spatiotemporal semantic fusion method and system |
CN112184867A (en) * | 2020-09-23 | 2021-01-05 | 中国第一汽车股份有限公司 | Point cloud feature extraction method, device, equipment and storage medium |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113449799A (en) * | 2021-06-30 | 2021-09-28 | 上海西井信息科技有限公司 | Target detection and classification method, system, device and storage medium |
CN113449799B (en) * | 2021-06-30 | 2023-11-24 | 上海西井科技股份有限公司 | Target detection and classification method, system, equipment and storage medium |
CN113553939A (en) * | 2021-07-19 | 2021-10-26 | 中国工商银行股份有限公司 | Point cloud classification model training method and device, electronic equipment and storage medium |
CN114092580A (en) * | 2021-11-03 | 2022-02-25 | 华东交通大学 | Three-dimensional point cloud data compression method and system based on deep learning |
CN114638954A (en) * | 2022-02-22 | 2022-06-17 | 深圳元戎启行科技有限公司 | Point cloud segmentation model training method, point cloud data segmentation method and related device |
CN114638954B (en) * | 2022-02-22 | 2024-04-19 | 深圳元戎启行科技有限公司 | Training method of point cloud segmentation model, point cloud data segmentation method and related device |
CN114332104A (en) * | 2022-03-09 | 2022-04-12 | 南方电网数字电网研究院有限公司 | Power grid power transmission scene RGB point cloud semantic segmentation multi-stage model joint optimization method |
CN114332104B (en) * | 2022-03-09 | 2022-07-29 | 南方电网数字电网研究院有限公司 | Power grid power transmission scene RGB point cloud semantic segmentation multi-stage model joint optimization method |
CN115239951A (en) * | 2022-06-08 | 2022-10-25 | 广东领慧建筑科技有限公司 | Wall surface segmentation and identification method and system based on point cloud data processing |
CN115239951B (en) * | 2022-06-08 | 2023-09-15 | 广东领慧数字空间科技有限公司 | Wall surface segmentation recognition method and system based on point cloud data processing |
CN115131562A (en) * | 2022-07-08 | 2022-09-30 | 北京百度网讯科技有限公司 | Three-dimensional scene segmentation method, model training method and device and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112927234A (en) | Point cloud semantic segmentation method and device, electronic equipment and readable storage medium | |
US20210103763A1 (en) | Method and apparatus for processing laser radar based sparse depth map, device and medium | |
EP3506161A1 (en) | Method and apparatus for recovering point cloud data | |
US10977501B2 (en) | Object classification using extra-regional context | |
US10210418B2 (en) | Object detection system and object detection method | |
CN110622177B (en) | Instance partitioning | |
US11823443B2 (en) | Segmenting objects by refining shape priors | |
CN111723635B (en) | Real-time scene understanding system | |
Marcu et al. | SafeUAV: Learning to estimate depth and safe landing areas for UAVs from synthetic data | |
US20210073997A1 (en) | Future semantic segmentation prediction using 3d structure | |
KR102140805B1 (en) | Neural network learning method and apparatus for object detection of satellite images | |
CN110222641B (en) | Method and apparatus for recognizing image | |
US20230121534A1 (en) | Method and electronic device for 3d object detection using neural networks | |
CN112950642A (en) | Point cloud instance segmentation model training method and device, electronic equipment and medium | |
WO2024040954A1 (en) | Point cloud semantic segmentation network training method, and point cloud semantic segmentation method and apparatus | |
CN111507161B (en) | Method and device for heterogeneous sensor fusion by utilizing merging network | |
CN115147328A (en) | Three-dimensional target detection method and device | |
US11651191B2 (en) | Methods, apparatuses, and computer program products using a repeated convolution-based attention module for improved neural network implementations | |
CN114005149A (en) | Training method and device for target angle detection model | |
CN114627438A (en) | Target detection model generation method, target detection method, device and medium | |
CN116704324A (en) | Target detection method, system, equipment and storage medium based on underwater image | |
CN115082690B (en) | Target recognition method, target recognition model training method and device | |
CN115393423A (en) | Target detection method and device | |
CN112651351B (en) | Data processing method and device | |
CN114580510A (en) | Bone marrow cell fine-grained classification method, system, computer device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||