WO2024040954A1 - Point cloud semantic segmentation network training method, and point cloud semantic segmentation method and apparatus - Google Patents


Info

Publication number
WO2024040954A1
Authority
WO
WIPO (PCT)
Prior art keywords
point cloud
sub
projection image
surround
segmentation result
Prior art date
Application number
PCT/CN2023/082749
Other languages
French (fr)
Chinese (zh)
Inventor
温欣
Original Assignee
北京京东乾石科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京京东乾石科技有限公司
Publication of WO2024040954A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/06Topological mapping of higher dimensional structures onto lower dimensional surfaces
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details

Definitions

  • the present disclosure relates to the field of artificial intelligence technology, and more specifically, to a point cloud semantic segmentation network training method, a point cloud semantic segmentation method, a device, an electronic device and a storage medium.
  • point cloud data is widely used in many fields such as autonomous driving and robot grasping.
  • deep learning technology has shown good performance in point cloud data processing. Since point cloud data collected through various sensors are usually unlabeled data, and the cost of manually labeling data is high, semi-supervised training is usually used to build deep neural networks in related technologies.
  • the present disclosure provides a point cloud semantic segmentation network training method, point cloud semantic segmentation method, device, electronic device, readable storage medium and computer program product.
  • One aspect of the present disclosure provides a point cloud semantic segmentation network training method, including: mapping multiple sets of point cloud data to initial views respectively to obtain multiple surround projection images; partitioning, based on a preset size, a first surround projection image and a second surround projection image to obtain multiple first partition maps and multiple second partition maps, where the first surround projection image and the second surround projection image belong to the multiple surround projection images; determining multiple first target partition maps from the multiple first partition maps; replacing, with each of the first target partition maps, the second target partition map at the same position in the second surround projection image to obtain a hybrid projection image, where each second target partition map belongs to the multiple second partition maps and has the same position as the corresponding first target partition map; and training an initial network using the first surround projection image and the hybrid projection image as training samples to obtain a point cloud semantic segmentation network.
  • training the initial network using the first surround projection image and the hybrid projection image as training samples to obtain the point cloud semantic segmentation network includes: inputting the first surround projection image and the hybrid projection image into the initial network separately to obtain a first feature map and a first segmentation result corresponding to the first surround projection image, and a second feature map and a second segmentation result corresponding to the hybrid projection image; calculating an information entropy loss between the first feature map and the second feature map to obtain a first loss value; calculating a cross-entropy loss between the first segmentation result and the second segmentation result to obtain a second loss value; and adjusting the model parameters of the initial network using the first loss value and the second loss value to finally obtain the point cloud semantic segmentation network.
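The patent only states that both loss values are used to adjust the model parameters; it does not give a combination rule. A minimal sketch, assuming a simple weighted sum (the weights `w1` and `w2` are illustrative, not from the source):

```python
def total_loss(first_loss, second_loss, w1=1.0, w2=1.0):
    """Combine the information-entropy loss and the cross-entropy loss into one
    scalar used for the parameter update. The weighting is an assumption."""
    return w1 * first_loss + w2 * second_loss

combined = total_loss(0.5, 0.25)
```

In practice the combined scalar would be backpropagated through the encoder and decoder of the initial network by whatever optimizer is chosen (the text mentions gradient descent as one option).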
  • calculating the information entropy loss between the first feature map and the second feature map to obtain the first loss value includes: determining, from the first feature map, a first sub-feature map related to the multiple first target partition maps; splitting the second feature map into a second sub-feature map related to the multiple first target partition maps and a third sub-feature map unrelated to them; and, when the confidence probability of the first sub-feature map is greater than a preset threshold, taking the first sub-feature map and the second sub-feature map as a positive sample pair and the first sub-feature map and the third sub-feature map as a negative sample pair, and calculating the information entropy loss between the positive sample pair and the negative sample pair to obtain the first loss value.
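The patent does not give an explicit formula for the "information entropy loss" over positive and negative sample pairs. A sketch of one common contrastive formulation (InfoNCE-style) that fits this positive/negative pairing, with the temperature value and all names as assumptions:

```python
import numpy as np

def pair_loss(anchor, positive, negative, temperature=0.1):
    """Contrastive loss over one positive pair (anchor, positive) and one
    negative pair (anchor, negative) of flattened sub-feature maps.
    Illustrative only; the patent's exact loss is unspecified."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    pos = np.exp(cos(anchor, positive) / temperature)
    neg = np.exp(cos(anchor, negative) / temperature)
    # Loss is small when the anchor is close to the positive and far
    # from the negative.
    return float(-np.log(pos / (pos + neg)))

# Hypothetical flattened sub-feature maps
rng = np.random.default_rng(0)
f1 = rng.normal(size=64)               # first sub-feature map (anchor)
f2 = f1 + 0.01 * rng.normal(size=64)   # second sub-feature map (positive)
f3 = -f1                               # third sub-feature map (negative)
loss = pair_loss(f1, f2, f3)
```

Because the anchor is nearly identical to the positive and anti-correlated with the negative, the loss is close to zero here.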
  • calculating the cross-entropy loss between the first segmentation result and the second segmentation result to obtain the second loss value includes: determining, from the first segmentation result, a first sub-segmentation result related to the multiple first target partition maps; determining, from the second segmentation result, a second sub-segmentation result related to the multiple first target partition maps; determining a predicted value and a label value based on the confidence probability of the first sub-segmentation result and the confidence probability of the second sub-segmentation result; and calculating the cross-entropy loss between the predicted value and the label value to obtain the second loss value.
  • determining the predicted value and the label value based on the confidence probabilities includes: when the confidence probability of the first sub-segmentation result is greater than that of the second sub-segmentation result, taking the first sub-segmentation result as the label value and the second sub-segmentation result as the predicted value; and when the confidence probability of the first sub-segmentation result is less than that of the second sub-segmentation result, taking the first sub-segmentation result as the predicted value and the second sub-segmentation result as the label value.
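The confidence rule above can be sketched as a small helper that decides which branch provides the pseudo label (the higher-confidence result) and which is treated as the prediction. Function and variable names are illustrative:

```python
def assign_roles(first_result, second_result, first_conf, second_conf):
    """The higher-confidence sub-segmentation result acts as the label value,
    the other as the predicted value (sketch of the rule described above)."""
    if first_conf > second_conf:
        label, pred = first_result, second_result
    else:
        label, pred = second_result, first_result
    return pred, label

# The first branch is more confident, so it supplies the pseudo label.
pred, label = assign_roles("seg_A", "seg_B", 0.9, 0.6)
```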
  • the multiple first target partition maps may include a third target partition map that has a real label. In this case, calculating the cross-entropy loss between the first segmentation result and the second segmentation result to obtain the second loss value includes: determining, from the first segmentation result, a third sub-segmentation result related to the third target partition map and a fourth sub-segmentation result unrelated to the third target partition map but related to the multiple first target partition maps; determining, from the second segmentation result, a fifth sub-segmentation result related to the third target partition map and a sixth sub-segmentation result unrelated to the third target partition map but related to the multiple first target partition maps; and calculating the cross-entropy loss between the corresponding sub-segmentation results to obtain the second loss value.
  • the initial network includes an encoder and a decoder. Inputting the first surround projection image and the hybrid projection image into the initial network to obtain the first feature map and the first segmentation result corresponding to the first surround projection image, and the second feature map and the second segmentation result corresponding to the hybrid projection image, includes: inputting the first surround projection image and the hybrid projection image into the encoder separately to obtain a first image feature corresponding to the first surround projection image and a second image feature corresponding to the hybrid projection image; and inputting the first image feature and the second image feature into the decoder separately to obtain the corresponding feature maps and segmentation results.
  • mapping the multiple sets of point cloud data to initial views to obtain the multiple surround projection images includes: for each set of point cloud data, performing polar coordinate conversion on the three-dimensional coordinate data of each point to obtain the polar coordinate data of each point; mapping the points of the point cloud data to the grids of the initial view based on their polar coordinate data; for each grid of the initial view, determining the feature data of the grid based on the three-dimensional coordinate data and polar coordinate data of the points falling in the grid; and constructing the surround projection image based on the feature data of the grids.
  • Another aspect of the present disclosure provides a point cloud semantic segmentation method, including: mapping target point cloud data to an initial view to obtain a surround projection image; and inputting the surround projection image into a point cloud semantic segmentation network to obtain a semantic segmentation feature map of the target point cloud data, where the point cloud semantic segmentation network is trained using the point cloud semantic segmentation network training method described above.
  • Another aspect of the present disclosure provides a point cloud semantic segmentation network training device, including: a first mapping module for mapping multiple sets of point cloud data to initial views respectively to obtain multiple surround projection images; a first processing module for partitioning, based on a preset size, a first surround projection image and a second surround projection image to obtain multiple first partition maps and multiple second partition maps, where the first surround projection image and the second surround projection image belong to the multiple surround projection images; a determination module for determining multiple first target partition maps from the multiple first partition maps; a second processing module for replacing, with each first target partition map, the second target partition map at the same position in the second surround projection image to obtain a hybrid projection image, where each second target partition map belongs to the multiple second partition maps; and a training module for training an initial network using the first surround projection image and the hybrid projection image as training samples to obtain a point cloud semantic segmentation network.
  • Another aspect of the present disclosure provides a point cloud semantic segmentation device, including: a second mapping module for mapping target point cloud data to an initial view to obtain a surround projection image; and a third processing module for inputting the surround projection image into a point cloud semantic segmentation network to obtain a semantic segmentation feature map of the target point cloud data, where the point cloud semantic segmentation network is trained using the point cloud semantic segmentation network training method described above.
  • Another aspect of the present disclosure provides an electronic device, including one or more processors and a memory storing one or more instructions which, when executed by the one or more processors, cause the one or more processors to implement the above method.
  • Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions, which when executed are used to implement the method as described above.
  • Another aspect of the present disclosure provides a computer program product, which includes computer-executable instructions that, when executed, are used to implement the method as described above.
  • when training a point cloud semantic segmentation network, the point cloud data can be mapped into surround projection images, and the first surround projection image and the second surround projection image can be partitioned and mixed; that is, some partitions of the first surround projection image replace the corresponding partitions of the second surround projection image to obtain a hybrid projection image. The hybrid projection image and the first surround projection image can then be used to train the initial network to finally obtain a point cloud semantic segmentation network.
  • through partition mixing, the mixed-in partitions are forcibly decoupled from their original background, which effectively improves the richness of the data, reduces the network's dependence on background and global information when predicting local areas, and improves the network's recognition ability.
  • the three-dimensional shape of the original point cloud projected onto the surround projection image is effectively preserved, which at least partially overcomes the problem of three-dimensional deformation and shape information loss caused by data enhancement, and improves the robustness of the network.
  • the utilization efficiency of hardware resources during network training can be effectively improved.
  • FIG. 1 schematically illustrates an exemplary system architecture in which a point cloud semantic segmentation network training method, a point cloud semantic segmentation method and a device can be applied according to an embodiment of the present disclosure.
  • Figure 2 schematically shows a flow chart of a point cloud semantic segmentation network training method according to an embodiment of the present disclosure.
  • Figure 3 schematically shows the training process of a point cloud semantic segmentation network according to an embodiment of the present disclosure.
  • Figure 4 schematically shows a flow chart of a point cloud semantic segmentation method according to an embodiment of the present disclosure.
  • Figure 5 schematically shows a block diagram of a point cloud semantic segmentation network training device according to an embodiment of the present disclosure.
  • Figure 6 schematically shows a block diagram of a point cloud semantic segmentation device according to an embodiment of the present disclosure.
  • FIG. 7 schematically shows a block diagram of an electronic device suitable for implementing a point cloud semantic segmentation network training method or a point cloud semantic segmentation method according to an embodiment of the present disclosure.
  • embodiments of the present disclosure provide a method that can effectively utilize a large amount of lidar raw point cloud data, supplemented by a small amount of labeled data, to conduct semi-supervised training of a point cloud semantic segmentation network.
  • specifically, this method proposes a partition-mixing data enhancement strategy that increases the recognition difficulty for the model by mixing two different surround projection images while reducing the loss of shape information of the three-dimensional point cloud during data enhancement, thereby improving the training effect, robustness and reliability of the model.
  • embodiments of the present disclosure provide a point cloud semantic segmentation network training method, a point cloud semantic segmentation method, a device, an electronic device, and a storage medium.
  • the point cloud semantic segmentation network training method includes: mapping multiple sets of point cloud data to initial views respectively to obtain multiple surround projection images; partitioning, based on a preset size, a first surround projection image and a second surround projection image to obtain multiple first partition maps and multiple second partition maps, where the first surround projection image and the second surround projection image belong to the multiple surround projection images; determining multiple first target partition maps from the multiple first partition maps; replacing, with each first target partition map, the second target partition map at the same position in the second surround projection image to obtain a hybrid projection image, where each second target partition map belongs to the multiple second partition maps; and training an initial network using the first surround projection image and the hybrid projection image as training samples to obtain the point cloud semantic segmentation network.
  • FIG. 1 schematically illustrates an exemplary system architecture in which a point cloud semantic segmentation network training method, a point cloud semantic segmentation method and a device can be applied according to an embodiment of the present disclosure.
  • Figure 1 is only an example of a system architecture to which embodiments of the present disclosure can be applied, intended to help those skilled in the art understand the technical content of the present disclosure; it does not mean that embodiments of the present disclosure cannot be used in other devices, systems, environments or scenarios.
  • the system architecture 100 may include terminal devices 101, 102, 103, a network 104 and a server 105.
  • the terminal devices 101, 102, and 103 may be various types of equipment equipped with lidar, or may be various types of electronic equipment capable of controlling lidar, or may be various types of electronic equipment capable of storing point cloud data.
  • the network 104 is a medium used to provide communication links between the terminal devices 101, 102, 103 and the server 105.
  • Network 104 may include various connection types, such as wired and/or wireless communication links, and the like.
  • Server 105 may be a server that provides various services; for example, it may provide computing-resource and storage-resource support for the training of the point cloud semantic segmentation network.
  • the point cloud semantic segmentation network training method or the point cloud semantic segmentation method provided by the embodiments of the present disclosure can generally be executed by the server 105 .
  • the point cloud semantic segmentation network training device or the point cloud semantic segmentation device provided by the embodiments of the present disclosure can generally be installed in the server 105 .
  • the terminal devices 101, 102, and 103 can collect point cloud data, or the terminal devices 101, 102, and 103 can obtain point cloud data collected by other terminal devices through the Internet, and the point cloud data can be sent to the server 105 through the network.
  • the server 105 executes the method provided by the embodiment of the present disclosure to implement the training of the point cloud semantic segmentation network or perform point cloud semantic segmentation on the point cloud data.
  • the point cloud semantic segmentation network training method or point cloud semantic segmentation method provided by the embodiments of the present disclosure can also be executed by a server or server cluster that is different from the server 105 and can communicate with the terminal devices 101, 102, 103 and/or the server 105.
  • the point cloud semantic segmentation network training device or the point cloud semantic segmentation device provided by the embodiments of the present disclosure may also be provided in a server or server cluster that is different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
  • the point cloud semantic segmentation network training method or point cloud semantic segmentation method provided by the embodiments of the present disclosure can also be executed by terminal device 101, 102, or 103, or by other terminal devices different from terminal devices 101, 102, and 103.
  • the point cloud semantic segmentation network training device or the point cloud semantic segmentation device provided by the embodiments of the present disclosure can also be provided in terminal device 101, 102, or 103, or in other terminal devices different from terminal devices 101, 102, and 103.
  • Figure 2 schematically shows a flow chart of a point cloud semantic segmentation network training method according to an embodiment of the present disclosure.
  • the method includes operations S201 to S205.
  • in operation S201, multiple sets of point cloud data are respectively mapped to initial views to obtain multiple surround projection images.
  • in operation S202, partition processing is performed on the first surround projection image and the second surround projection image respectively based on a preset size to obtain multiple first partition maps and multiple second partition maps, where the first surround projection image and the second surround projection image belong to the multiple surround projection images.
  • in operation S203, multiple first target partition maps are determined from the multiple first partition maps.
  • in operation S204, each first target partition map among the multiple first target partition maps is used to replace the second target partition map at the same position in the second surround projection image to obtain a hybrid projection image, where each second target partition map belongs to the multiple second partition maps.
  • in operation S205, the first surround projection image and the hybrid projection image are used as training samples to train the initial network to obtain a point cloud semantic segmentation network.
  • point cloud data can be collected using sensing equipment such as rotating scanning lidar.
  • Each set of point cloud data can be configured with a preset rectangular coordinate system.
  • Each point in the point cloud data can be represented as a three-dimensional coordinate in the Cartesian coordinate system, and the origin of the Cartesian coordinate system can represent the position of the sensing device when the point cloud data was collected.
  • point cloud data collected using rotating scanning lidar can be distributed in a sphere, and the initial view can be obtained by unfolding the annular surface of the sphere near a horizontal plane.
  • the direction vector when mapping the point can be determined based on the coordinates of the point, and then the direction vector can be used to project the point onto the initial view.
  • partitioning the surround projection image based on a preset size may mean equally dividing the surround projection image into multiple rectangular areas.
  • the preset size can be determined according to the size of the surround projection image in a specific application scenario, and is not limited here.
  • for example, the resolution of the surround projection image can be 24×480. When partitioning the surround projection image, it can be equally divided into 16 parts along the length and 6 parts along the width, dividing the surround projection image into a total of 96 partition maps each with a resolution of 4×30.
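The 24×480 → 96 × (4×30) partitioning above can be sketched with a NumPy reshape; the tile ordering (row-major over 6×16 tile positions) is an assumption for illustration:

```python
import numpy as np

# Partition a 24x480 surround projection image into 96 tiles of 4x30:
# 6 equal parts along the height and 16 along the width, as in the text.
img = np.arange(24 * 480).reshape(24, 480)
tiles = img.reshape(6, 4, 16, 30).swapaxes(1, 2).reshape(96, 4, 30)
```

Tile `k` then corresponds to rows `4*(k // 16) : 4*(k // 16) + 4` and columns `30*(k % 16) : 30*(k % 16) + 30` of the original image.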
  • the first surround projection image and the second surround projection image may be randomly selected from a plurality of surround projection images.
  • the first surround projection image and the second surround projection image may have completely different characteristics; that is, the point cloud data corresponding to the first surround projection image and the point cloud data corresponding to the second surround projection image may be collected from different objects in different scenes.
  • the first target partition maps can be randomly selected from the multiple first partition maps and can occupy a certain proportion of the first partition maps; the proportion can be, for example, 25% or 30%, and is not limited here.
  • the operation of replacing the second target partition map in the second surround projection image with each first target partition map may include: determining the second target partition map from the second surround projection image according to the position information of the first target partition map, deleting it, and filling the first target partition map into the corresponding position.
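The select-and-replace step can be sketched as follows, using the 24×480 image and 4×30 tiles from the example above and the 25% selection ratio mentioned earlier. Function and parameter names are illustrative, not from the patent:

```python
import numpy as np

def mix_projections(first_img, second_img, tile_h=4, tile_w=30,
                    ratio=0.25, seed=0):
    """Build a hybrid projection image by copying a random subset of tiles
    from the first surround projection image into the same positions of the
    second. Sketch only; the patent does not fix these parameters."""
    h, w = first_img.shape
    rows, cols = h // tile_h, w // tile_w
    rng = np.random.default_rng(seed)
    n_pick = int(rows * cols * ratio)
    picks = rng.choice(rows * cols, size=n_pick, replace=False)
    hybrid = second_img.copy()
    for idx in picks:
        r, c = divmod(int(idx), cols)
        hybrid[r * tile_h:(r + 1) * tile_h, c * tile_w:(c + 1) * tile_w] = \
            first_img[r * tile_h:(r + 1) * tile_h, c * tile_w:(c + 1) * tile_w]
    return hybrid, picks

a = np.zeros((24, 480))   # stands in for the first surround projection image
b = np.ones((24, 480))    # stands in for the second surround projection image
hybrid, picks = mix_projections(a, b)
```

Since the replacement happens at identical tile positions, the hybrid image keeps the projected 3D shapes of both sources intact inside each tile.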
  • the method used in training the initial network is not limited here.
  • it may be the gradient descent method, the least squares method, etc.
  • the training parameters set when training the initial network such as the number of training times, batch capacity, learning rate, etc., can be set according to specific application scenarios and are not limited here.
  • when training a point cloud semantic segmentation network, the point cloud data can be mapped into surround projection images, and the first surround projection image and the second surround projection image can be partitioned and mixed; that is, some partitions of the first surround projection image replace the corresponding partitions of the second surround projection image to obtain a hybrid projection image. The hybrid projection image and the first surround projection image can then be used to train the initial network to finally obtain a point cloud semantic segmentation network.
  • through partition mixing, the mixed-in partitions are forcibly decoupled from their original background, which effectively improves the richness of the data, reduces the network's dependence on background and global information when predicting local areas, and improves the network's recognition ability.
  • the three-dimensional shape of the original point cloud projected on the surround projection map can be effectively preserved, which can at least partially overcome the problem of three-dimensional deformation and shape information loss caused by data enhancement, and can improve the robustness of the network.
  • the utilization efficiency of hardware resources during network training can be effectively improved.
  • the surround projection image can be obtained by using the method of operation S201.
  • operation S201 can include the following operations:
  • For each set of point cloud data: perform polar coordinate transformation on the three-dimensional coordinate data of each point in the point cloud data to obtain the polar coordinate data of each point; based on the polar coordinate data of each point, map multiple points in the point cloud data to multiple grids in the initial view; for each grid in the initial view, determine the characteristic data of the grid based on the three-dimensional coordinate data and polar coordinate data of the points in the grid; and construct a surround projection map based on the characteristic data of the multiple grids.
  • each point in the point cloud data may have three-dimensional coordinate data, that is, x, y, and z.
  • based on the three-dimensional coordinate data, the transformed coordinates yaw and pitch in the rotating coordinate system, that is, the polar coordinate data, can be obtained.
  • the grid of the initial view may refer to a pixel color block corresponding to a single pixel point in the initial view.
  • the resolution of the initial view may be 20×480; the initial view may then have 9600 pixel color patches and, correspondingly, 9600 grids.
  • the feature data of the point closest to the origin among the multiple points can be taken as the feature data of the grid.
  • the characteristic data of the point may include three-dimensional coordinate data, polar coordinate data, and data processed based on the three-dimensional coordinate data and polar coordinate data, such as reflectivity data, depth data, etc.
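  • The mapping described in the operations above (polar coordinate conversion, grid assignment, and keeping the point closest to the origin per grid) could be sketched as follows; the vertical field-of-view bounds and function name are illustrative assumptions, not values from the disclosure:

```python
import numpy as np

def project_to_view(points, height=20, width=480):
    """Map an (N, 3) point cloud to a surround view grid by yaw/pitch."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)                           # horizontal angle
    pitch = np.arcsin(z / np.maximum(depth, 1e-8))   # vertical angle
    fov_up, fov_down = np.radians(3.0), np.radians(-25.0)  # assumed FOV
    u = (0.5 * (yaw / np.pi + 1.0)) * width                    # column
    v = (1.0 - (pitch - fov_down) / (fov_up - fov_down)) * height  # row
    u = np.clip(np.floor(u), 0, width - 1).astype(int)
    v = np.clip(np.floor(v), 0, height - 1).astype(int)
    # Per grid cell, keep the point closest to the origin: sort by
    # descending depth so the nearest point is written last.
    view = np.full((height, width), np.nan)
    order = np.argsort(-depth)
    view[v[order], u[order]] = depth[order]
    return view
```

  • In practice each grid would store the full feature vector (x, y, z, yaw, pitch, reflectivity, depth) rather than depth alone; depth is used here only to keep the sketch short.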
  • Figure 3 schematically shows a schematic diagram of the training process of a point cloud semantic segmentation network according to an embodiment of the present disclosure.
  • the training process of the point cloud semantic segmentation network can include a sample preprocessing process and a network iterative training process.
  • a part of the partitions in the first surround projection image may be replaced into the second surround projection image to obtain a hybrid projection image.
  • the network iterative training process may be to input the first surround projection image and the mixed projection image as a sample pair into the initial network, and to adjust the model parameters of the initial network based on the set loss function and a model iteration method such as gradient descent or least squares, so as to achieve training of the initial network.
  • the initial network may include an encoder and a decoder.
  • inputting the first surround projection image and the hybrid projection image respectively into the initial network to obtain the first feature map and the first segmentation result corresponding to the first surround projection image, as well as the second feature map and the second segmentation result corresponding to the hybrid projection image, may include the following operations: inputting the first surround projection image and the hybrid projection image respectively into the encoder to obtain image features; and inputting the image features into the decoder to obtain the first feature map and the first segmentation result corresponding to the first surround projection image, and the second feature map and the second segmentation result corresponding to the hybrid projection image.
  • the encoder can be any feature extraction network, such as ResNet18, etc.
  • the decoder can be any feature upsampling network, such as UPerNet, etc.
  • the network iterative training process may specifically include the following operations:
  • the first segmentation result may represent a semantic feature segmentation result of each area in the first surround projection image.
  • the first feature map may have the same size as the first surround projection image, and areas with different semantic features on the first feature map may have different color features.
  • areas with different semantic features on the first feature map may respectively refer to areas where people, cars, and obstacles are located, and the three areas may be represented by red, blue, and green respectively.
  • calculating the information entropy loss between the first feature map and the second feature map, and obtaining the first loss value may include the following operations:
  • when the confidence probability of the first sub-feature map is greater than a preset threshold, the first sub-feature map and the second sub-feature map are used as a positive sample pair, the first sub-feature map and the third sub-feature map are used as a negative sample pair, and the information entropy loss between the positive sample pair and the negative sample pair is calculated to obtain the first loss value.
  • the first feature map may have the same size as the first surround projection image, and thus the first sub-feature map may be determined from the first feature map based on the position information of the first target partition maps in the first surround projection image.
  • the method of calculating the confidence probability of the first sub-feature map is not limited here.
  • the Gaussian formula can be used to determine the confidence probability.
  • the preset threshold can be determined according to specific application scenarios, for example, it can be set to 90%, 95%, etc., which is not limited here.
  • the calculation method of information entropy loss can be as shown in formula (1):
  • where L1 represents the information entropy loss, fp represents the first sub-feature map, fx represents the second sub-feature map, and fy represents the third sub-feature map.
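  • Formula (1) itself is not reproduced in this text. As an illustrative stand-in only (not the patented formula), an InfoNCE-style contrastive loss over the positive pair (fp, fx) and the negative pair (fp, fy) could be sketched as:

```python
import numpy as np

def info_entropy_loss(fp, fx, fy, temperature=0.1):
    """Contrastive loss with (fp, fx) as the positive pair and (fp, fy)
    as the negative pair; the temperature value is an assumption."""
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    pos = np.exp(cos(fp.ravel(), fx.ravel()) / temperature)
    neg = np.exp(cos(fp.ravel(), fy.ravel()) / temperature)
    # Loss is small when fp agrees with fx and disagrees with fy.
    return -np.log(pos / (pos + neg))
```

  • The loss pulls the replaced-partition features of the two branches together while pushing them away from the unrelated background features.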
  • calculating the cross-entropy loss between the first segmentation result and the second segmentation result, and obtaining the second loss value may include the following operations:
  • the first segmentation result may have the same size as the first surround projection image, and thus the first sub-segmentation result may be determined from the first segmentation result based on the position information of the first target partition maps in the first surround projection image.
  • the calculation method of the confidence probability of the first sub-segmentation result and the second sub-segmentation result is not limited here.
  • the Gaussian formula can be used to determine the confidence probability.
  • the predicted value and the label value can be determined by comparing the confidence probability of the first sub-segmentation result with that of the second sub-segmentation result. Specifically, when the confidence probability of the first sub-segmentation result is greater than that of the second sub-segmentation result, the first sub-segmentation result is determined as the label value and the second sub-segmentation result as the predicted value; when the confidence probability of the first sub-segmentation result is less than that of the second sub-segmentation result, the first sub-segmentation result is determined as the predicted value and the second sub-segmentation result as the label value.
  • where L2 represents the cross-entropy loss, y represents the label value, and yp represents the predicted value.
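  • The confidence comparison and cross-entropy computation described above could be sketched as follows; per-region class-probability vectors are assumed, and the Gaussian-formula confidence is simplified to a max-probability confidence for illustration:

```python
import numpy as np

def pseudo_label_ce(p1, p2):
    """Cross-entropy between sub-segmentation results of the two branches.

    p1, p2: per-class probability vectors for the same region. The more
    confident branch supplies the (pseudo) label, the other the prediction.
    """
    conf1, conf2 = p1.max(), p2.max()
    label, pred = (p1, p2) if conf1 > conf2 else (p2, p1)
    hard = np.argmax(label)            # pseudo-label class index
    return -np.log(pred[hard] + 1e-8)  # L2 = -sum y * log(yp)
```

  • Letting the more confident branch act as the teacher keeps the self-training signal stable even though neither branch has a real label for the region.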
  • the total loss used to adjust the model parameters of the initial network may be a weighted sum of the information entropy loss and the cross-entropy loss, and the weights may be hyperparameters that users can set arbitrarily when tuning the model.
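  • The weighted-sum total loss described above amounts to the following one-liner; the weight names alpha and beta are assumed for illustration:

```python
def total_loss(l1, l2, alpha=1.0, beta=1.0):
    """Weighted sum of the information entropy loss l1 and the
    cross-entropy loss l2; alpha and beta are user-set hyperparameters."""
    return alpha * l1 + beta * l2
```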
  • the plurality of first target partition maps may include a third target partition map, and the third target partition map has a real label.
  • calculating the cross-entropy loss between the first segmentation result and the second segmentation result, and obtaining the second loss value may include the following operations:
  • the network can be trained using the original unlabeled data together with a small amount of labeled data, realizing semi-supervised training of the point cloud semantic segmentation network and thereby reducing the cost of data annotation while ensuring the semantic segmentation effect of the network.
  • Figure 4 schematically shows a flow chart of a point cloud semantic segmentation method according to an embodiment of the present disclosure.
  • the method includes operations S401 to S402.
  • the target point cloud data is mapped to the initial view to obtain a surround projection image.
  • the surround projection image is input into the point cloud semantic segmentation network to obtain a semantic segmentation feature map of the target point cloud data.
  • the point cloud semantic segmentation network can be trained using the method in the point cloud semantic segmentation network training method section, which will not be described again here.
  • Figure 5 schematically shows a block diagram of a point cloud semantic segmentation network training device according to an embodiment of the present disclosure.
  • the point cloud semantic segmentation network training device 500 includes a first mapping module 510 , a first processing module 520 , a determination module 530 , a second processing module 540 and a training module 550 .
  • the first mapping module 510 is used to map multiple sets of point cloud data to the initial view respectively to obtain multiple surround projection images.
  • the first processing module 520 is configured to perform partition processing on the first surround projection image and the second surround projection image based on a preset size to obtain a plurality of first partition maps and a plurality of second partition maps, wherein the first surround projection image and the second surround projection image belong to the multiple surround projection images.
  • the determining module 530 is configured to determine a plurality of first target partition maps from a plurality of first partition maps.
  • the second processing module 540 is configured to use each first target partition map in the plurality of first target partition maps to respectively replace the second target partition map in the second surround projection image, to obtain a mixed projection map, wherein the second target partition map belongs to the multiple second partition maps, and the first target partition map and the second target partition map are at the same position.
  • the training module 550 is used to train the initial network using the first surround projection image and the mixed projection image as training samples to obtain a point cloud semantic segmentation network.
  • when training a point cloud semantic segmentation network, the point cloud data can be mapped into surround projection images, and the first surround projection image and the second surround projection image can be partitioned and mixed, that is, some partitions in the first surround projection image replace the corresponding partitions in the second surround projection image to obtain a hybrid projection image. Afterwards, the hybrid projection image and the first surround projection image can be used to train the initial network to finally obtain the point cloud semantic segmentation network.
  • through partition mixing, forced decoupling of the mixed partitions from the background can be achieved, which can effectively improve the richness of the data, reduce the network's dependence on background and global information when predicting local areas, and improve the network's recognition ability.
  • the three-dimensional shape of the original point cloud projected on the surround projection map can be effectively preserved, which can at least partially overcome the problem of three-dimensional deformation and shape information loss caused by data enhancement, and can improve the robustness of the network.
  • the utilization efficiency of hardware resources during network training can be effectively improved.
  • the training module 550 includes a first training sub-module, a second training sub-module, a third training sub-module and a fourth training sub-module.
  • the first training submodule is used to input the first surround projection image and the mixed projection image into the initial network respectively, to obtain the first feature map and the first segmentation result corresponding to the first surround projection image, and the second feature map and the second segmentation result corresponding to the mixed projection image.
  • the second training submodule is used to calculate the information entropy loss between the first feature map and the second feature map to obtain the first loss value.
  • the third training submodule is used to calculate the cross-entropy loss between the first segmentation result and the second segmentation result to obtain the second loss value.
  • the fourth training submodule is used to use the first loss value and the second loss value to adjust the model parameters of the initial network to finally obtain the point cloud semantic segmentation network.
  • the second training sub-module includes a first training unit, a second training unit and a third training unit.
  • the first training unit is configured to determine the first sub-feature map related to the plurality of first target partition maps from the first feature map.
  • the second training unit is used to split the second feature map into a second sub-feature map related to the plurality of first target partition maps and a third sub-feature map unrelated to the plurality of first target partition maps.
  • the third training unit is used to, when the confidence probability of the first sub-feature map is greater than the preset threshold, use the first sub-feature map and the second sub-feature map as a positive sample pair, use the first sub-feature map and the third sub-feature map as a negative sample pair, and calculate the information entropy loss between the positive sample pair and the negative sample pair to obtain the first loss value.
  • the third training sub-module includes a fourth training unit, a fifth training unit, a sixth training unit and a seventh training unit.
  • the fourth training unit is configured to determine the first sub-segmentation results related to the plurality of first target partition maps from the first segmentation results.
  • a fifth training unit is configured to determine second sub-segmentation results related to the plurality of first target partition maps from the second segmentation results.
  • the sixth training unit is used to determine the prediction value and the label value based on the confidence probability of the first sub-segmentation result and the confidence probability of the second sub-segmentation result.
  • the seventh training unit is used to calculate the cross-entropy loss between the predicted value and the label value to obtain the second loss value.
  • the sixth training unit includes a first training sub-unit and a second training sub-unit.
  • the first training subunit is used to determine that the first sub-segmentation result is the label value and the second sub-segmentation result is the predicted value when the confidence probability of the first sub-segmentation result is greater than the confidence probability of the second sub-segmentation result.
  • the second training subunit is used to determine that the first sub-segmentation result is the predicted value and the second sub-segmentation result is the label value when the confidence probability of the first sub-segmentation result is less than the confidence probability of the second sub-segmentation result.
  • the plurality of first target partition maps include a third target partition map, and the third target partition map has a real label.
  • the third training sub-module includes an eighth training unit, a ninth training unit, a tenth training unit, an eleventh training unit and a twelfth training unit.
  • the eighth training unit is used to determine, from the first segmentation result, a third sub-segmentation result related to the third target partition map, and a fourth sub-segmentation result that is unrelated to the third target partition map and related to the plurality of first target partition maps.
  • a ninth training unit configured to determine, from the second segmentation result, a fifth sub-segmentation result related to the third target partition map, and a sixth sub-segmentation result that is unrelated to the third target partition map and related to the plurality of first target partition maps.
  • the tenth training unit is used to calculate the cross-entropy loss between the third sub-segmentation result and the real label to obtain the third loss value.
  • the eleventh training unit is used to calculate the cross-entropy loss between the fourth sub-segmentation result and the sixth sub-segmentation result to obtain the fourth loss value.
  • the twelfth training unit is used to determine the second loss value based on the third loss value and the fourth loss value.
  • the initial network includes an encoder and a decoder.
  • the first training sub-module includes a thirteenth training unit and a fourteenth training unit.
  • the thirteenth training unit is used to input the first surround projection image and the hybrid projection image into the encoder respectively, and obtain the first image feature corresponding to the first surround projection image and the second image feature corresponding to the hybrid projection image.
  • the fourteenth training unit is used to input the first image feature and the second image feature into the decoder respectively, to obtain the first feature map and the first segmentation result corresponding to the first surround projection image, and the second feature map and the second segmentation result corresponding to the mixed projection image.
  • the first mapping module 510 includes a first mapping unit, a second mapping unit, a third mapping unit and a fourth mapping unit.
  • the first mapping unit is used to perform polar coordinate conversion on the three-dimensional coordinate data of each point in the point cloud data for each set of point cloud data, so as to obtain the polar coordinate data of each point in the point cloud data.
  • the second mapping unit is used to map multiple points in the point cloud data to multiple grids in the initial view based on the polar coordinate data of each point in the point cloud data.
  • the third mapping unit is used for determining, for each grid of the initial view, the characteristic data of the grid based on the three-dimensional coordinate data and polar coordinate data of the points in the grid.
  • the fourth mapping unit is used to construct a surround projection map based on the feature data of multiple grids.
  • the point cloud semantic segmentation network training device part in the embodiments of the present disclosure corresponds to the point cloud semantic segmentation network training method part; for a specific description of the device part, refer to the method part, which will not be repeated here.
  • Figure 6 schematically shows a block diagram of a point cloud semantic segmentation device according to an embodiment of the present disclosure.
  • the point cloud semantic segmentation device 600 includes a second mapping module 610 and a third processing module 620 .
  • the second mapping module 610 is used to map the target point cloud data to the initial view to obtain a surround projection image.
  • the third processing module 620 is used to input the surround projection image into a point cloud semantic segmentation network to obtain a semantic segmentation feature map of the target point cloud data.
  • the point cloud semantic segmentation network can be trained using the method in the point cloud semantic segmentation network training method section, which will not be described again here.
  • any one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure may be combined and implemented in one module, or any one of them may be split into multiple modules for implementation.
  • any one or more of the modules, sub-modules, units, and sub-units according to embodiments of the present disclosure may be at least partially implemented as hardware circuits, such as field programmable gate arrays (FPGAs), programmable logic arrays (PLAs), systems on chip, systems on substrate, systems on package, application-specific integrated circuits (ASICs), or any other reasonable means of integrating or packaging circuits in hardware or firmware, or may be implemented in any one of the three implementation methods of software, hardware, and firmware, or in an appropriate combination of any of them.
  • one or more of the modules, sub-modules, units, and sub-units according to embodiments of the present disclosure may be at least partially implemented as a computer program module, and when the computer program module is executed, it can perform corresponding functions.
  • any number of the first mapping module 510, the first processing module 520, the determining module 530, the second processing module 540 and the training module 550, or of the second mapping module 610 and the third processing module 620, can be combined and implemented in one module/unit/sub-unit, or any one of them can be split into multiple modules/units/sub-units. Alternatively, at least part of the functionality of one or more of these modules/units/sub-units may be combined with at least part of the functionality of other modules/units/sub-units and implemented in one module/unit/sub-unit.
  • At least one of the first mapping module 510, the first processing module 520, the determining module 530, the second processing module 540 and the training module 550, or of the second mapping module 610 and the third processing module 620, may be implemented at least in part as hardware circuitry, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on a chip, a system on a substrate, a system on a package, an application-specific integrated circuit (ASIC), or any other reasonable way of integrating or packaging circuits in hardware or firmware, or may be implemented in any one of the three implementation methods of software, hardware and firmware, or in an appropriate combination of any of them.
  • at least one of the first mapping module 510, the first processing module 520, the determining module 530, the second processing module 540 and the training module 550, or of the second mapping module 610 and the third processing module 620, can be at least partially implemented as computer program modules, and when the computer program modules are executed, corresponding functions can be performed.
  • FIG. 7 schematically shows a block diagram of an electronic device suitable for implementing a point cloud semantic segmentation network training method or a point cloud semantic segmentation method according to an embodiment of the present disclosure.
  • the electronic device shown in FIG. 7 is only an example and should not impose any limitations on the functions and scope of use of the embodiments of the present disclosure.
  • a computer electronic device 700 includes a processor 701, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage portion 708 into a random access memory (RAM) 703.
  • processor 701 may include, for example, a general purpose microprocessor (eg, a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (eg, an application specific integrated circuit (ASIC)), among others.
  • Processor 701 may also include onboard memory for caching purposes.
  • the processor 701 may include a single processing unit or multiple processing units for performing different actions of the method flow according to embodiments of the present disclosure.
  • the processor 701, ROM 702 and RAM 703 are connected to each other through a bus 704.
  • the processor 701 performs various operations according to the method flow of the embodiment of the present disclosure by executing programs in the ROM 702 and/or RAM 703. It should be noted that the program may also be stored in one or more memories other than ROM 702 and RAM 703.
  • the processor 701 may also perform various operations according to the method flow of embodiments of the present disclosure by executing programs stored in the one or more memories.
  • the electronic device 700 may further include an input/output (I/O) interface 705 that is also connected to the bus 704 .
  • Electronic device 700 may also include one or more of the following components connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, etc.; an output section 707 including a cathode ray tube (CRT), a liquid crystal display (LCD), speakers, and the like; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, and the like.
  • the communication section 709 performs communication processing via a network such as the Internet.
  • A drive 710 is also connected to the I/O interface 705 as needed.
  • Removable media 711 such as magnetic disks, optical disks, magneto-optical disks, semiconductor memories, etc., are installed on the drive 710 as needed, so that a computer program read therefrom is installed into the storage portion 708 as needed.
  • the method flow according to the embodiments of the present disclosure may be implemented as a computer software program.
  • embodiments of the present disclosure include a computer program product including a computer program carried on a computer-readable storage medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • the computer program may be downloaded and installed from the network via communication portion 709 and/or installed from removable media 711 .
  • the computer program is executed by the processor 701
  • the above-described functions defined in the system of the embodiment of the present disclosure are performed.
  • the systems, devices, devices, modules, units, etc. described above may be implemented by computer program modules.
  • the present disclosure also provides a computer-readable storage medium.
  • the computer-readable storage medium may be included in the device/system described in the above embodiments, or it may exist independently without being assembled into the device/system.
  • the above computer-readable storage medium carries one or more programs, and when the one or more programs are executed, the method according to the embodiments of the present disclosure is implemented.
  • the computer-readable storage medium may be a non-volatile computer-readable storage medium. Examples may include, but are not limited to: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the computer-readable storage medium may include one or more memories other than ROM 702 and/or RAM 703 and/or ROM 702 and RAM 703 described above.
  • Embodiments of the present disclosure also include a computer program product, which includes a computer program.
  • the computer program includes program code for executing the method provided by the embodiment of the present disclosure.
  • when the program product is run on an electronic device, the program code is used to enable the electronic device to implement the point cloud semantic segmentation network training method or the point cloud semantic segmentation method provided by the embodiments of the present disclosure.
  • the computer program may rely on tangible storage media such as optical storage devices and magnetic storage devices.
  • the computer program can also be transmitted and distributed in the form of a signal on a network medium, and downloaded and installed through the communication part 709, and/or installed from the removable medium 711.
  • the program code contained in the computer program can be transmitted using any appropriate network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the above.
  • the program code for executing the computer program provided by the embodiments of the present disclosure may be written in any combination of one or more programming languages. Specifically, a high-level procedural and/or object-oriented programming language, and/or assembly/machine language, may be utilized to implement these computing procedures. Programming languages include, but are not limited to, Java, C++, Python, the "C" language, or similar programming languages.
  • the program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server.
  • the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, connected via the Internet using an Internet service provider).
  • each block in the flowcharts or block diagrams may represent a module, program segment, or portion of code, which contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown one after another may actually execute substantially in parallel, or they may sometimes execute in the reverse order, depending on the functionality involved.
  • each block in the block diagrams or flowchart illustrations, and combinations of blocks in the block diagrams or flowchart illustrations, can be implemented by special-purpose hardware-based systems that perform the specified functions or operations, or by a combination of special-purpose hardware and computer instructions.
  • Those skilled in the art will understand that features recited in the various embodiments and/or claims of the present disclosure may be combined and/or incorporated in various ways, even if such combinations or incorporations are not explicitly recited in the present disclosure. In particular, various combinations and/or incorporations of the features recited in the various embodiments and/or claims of the present disclosure may be made without departing from the spirit and teachings of the present disclosure. All such combinations and/or incorporations fall within the scope of the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

Provided in the present disclosure are a point cloud semantic segmentation network training method, a point cloud semantic segmentation method and apparatus, an electronic device, and a storage medium, applicable to the technical field of artificial intelligence. The method comprises: respectively mapping a plurality of groups of point cloud data into an initial view to obtain a plurality of surround-view projection images; partitioning, on the basis of a preset size, a first surround-view projection image and a second surround-view projection image respectively to obtain a plurality of first partition images and a plurality of second partition images; determining a plurality of first target partition images from the plurality of first partition images; replacing, with each of the plurality of first target partition images, the corresponding second target partition image in the second surround-view projection image to obtain a mixed projection image; and training an initial network by using the first surround-view projection image and the mixed projection image as training samples to obtain a point cloud semantic segmentation network.

Description

Point Cloud Semantic Segmentation Network Training Method, Point Cloud Semantic Segmentation Method and Apparatus

This application claims priority to Chinese Patent Application No. 202211022552.3, filed on August 24, 2022, the content of which is incorporated herein by reference.

Technical Field

The present disclosure relates to the field of artificial intelligence technology and, more specifically, to a point cloud semantic segmentation network training method, a point cloud semantic segmentation method, an apparatus, an electronic device, and a storage medium.
Background

With the development of three-dimensional sensing technology, point cloud data is widely used in many fields such as autonomous driving and robotic grasping. As the mainstream solution for point cloud data analysis, deep learning has shown good performance in point cloud data processing. Since point cloud data collected by various sensors is usually unlabeled and manual annotation is costly, semi-supervised training is commonly used in the related art to build deep neural networks.

In the related art, research on improving semantic segmentation tasks with semi-supervised training algorithms has mainly focused on two-dimensional images. Directly applying such methods to the segmentation of three-dimensional point clouds causes three-dimensional shape distortion, which in turn degrades the semantic segmentation of the point cloud data.
Summary

In view of this, the present disclosure provides a point cloud semantic segmentation network training method, a point cloud semantic segmentation method, an apparatus, an electronic device, a readable storage medium, and a computer program product.
One aspect of the present disclosure provides a point cloud semantic segmentation network training method, including: mapping a plurality of groups of point cloud data respectively into an initial view to obtain a plurality of surround-view projection images; partitioning, based on a preset size, a first surround-view projection image and a second surround-view projection image respectively to obtain a plurality of first partition images and a plurality of second partition images, where the first surround-view projection image and the second surround-view projection image belong to the plurality of surround-view projection images; determining a plurality of first target partition images from the plurality of first partition images; replacing, with each of the plurality of first target partition images, a corresponding second target partition image in the second surround-view projection image to obtain a mixed projection image, where the second target partition image belongs to the plurality of second partition images and the first target partition image is located at the same position as the second target partition image; and training an initial network using the first surround-view projection image and the mixed projection image as training samples to obtain a point cloud semantic segmentation network.
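The partition-and-replace step of this aspect can be sketched in a few lines. The helper below is illustrative only: the function name `mix_projections`, the regular patch grid, and the random choice of which partitions become the first target partition images are assumptions, not part of the disclosure.

```python
import numpy as np

def mix_projections(proj_a, proj_b, patch_h, patch_w, num_swapped, rng=None):
    """Replace `num_swapped` randomly chosen patches of `proj_b` with the
    patches of `proj_a` at the same positions ("first target partition
    images" replacing "second target partition images").  Returns the
    mixed projection and a boolean mask of the swapped region."""
    rng = np.random.default_rng(rng)
    h, w = proj_a.shape[:2]
    assert proj_a.shape == proj_b.shape and h % patch_h == 0 and w % patch_w == 0
    rows, cols = h // patch_h, w // patch_w
    chosen = rng.choice(rows * cols, size=num_swapped, replace=False)
    mixed = proj_b.copy()
    mask = np.zeros((h, w), dtype=bool)
    for idx in chosen:
        r, c = divmod(int(idx), cols)
        rs = slice(r * patch_h, (r + 1) * patch_h)
        cs = slice(c * patch_w, (c + 1) * patch_w)
        mixed[rs, cs] = proj_a[rs, cs]  # same-position replacement
        mask[rs, cs] = True
    return mixed, mask
```

Because patches are copied between the same positions of the two projections, the pixels inside each swapped patch keep the three-dimensional shape they had in the source projection.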
According to an embodiment of the present disclosure, training the initial network using the first surround-view projection image and the mixed projection image as training samples to obtain the point cloud semantic segmentation network includes: inputting the first surround-view projection image and the mixed projection image respectively into the initial network to obtain a first feature map and a first segmentation result corresponding to the first surround-view projection image, and a second feature map and a second segmentation result corresponding to the mixed projection image; computing an information entropy loss between the first feature map and the second feature map to obtain a first loss value; computing a cross-entropy loss between the first segmentation result and the second segmentation result to obtain a second loss value; and adjusting model parameters of the initial network using the first loss value and the second loss value to finally obtain the point cloud semantic segmentation network.

According to an embodiment of the present disclosure, computing the information entropy loss between the first feature map and the second feature map to obtain the first loss value includes: determining, from the first feature map, a first sub-feature map related to the plurality of first target partition images; splitting the second feature map into a second sub-feature map related to the plurality of first target partition images and a third sub-feature map unrelated to the plurality of first target partition images; and, in a case where the confidence probability of the first sub-feature map is greater than a preset threshold, taking the first sub-feature map and the second sub-feature map as a positive sample pair and the first sub-feature map and the third sub-feature map as a negative sample pair, and computing the information entropy loss between the positive sample pair and the negative sample pair to obtain the first loss value.
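The positive-pair/negative-pair construction above resembles a contrastive objective. Below is a minimal InfoNCE-style sketch of such an information entropy loss over single feature vectors; the cosine similarity, the temperature `tau`, and the function name are assumptions rather than the patent's exact formulation.

```python
import numpy as np

def partition_contrastive_loss(anchor, positive, negatives, tau=0.1):
    """InfoNCE-style loss: pull the positive pair together and push the
    negative pairs apart.  `anchor` plays the role of the first
    sub-feature map, `positive` the second, `negatives` the third."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    logits = np.array([cos(anchor, positive)] + [cos(anchor, n) for n in negatives]) / tau
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                     # small when the positive pair dominates
```

The loss shrinks as the anchor aligns with its positive and grows as it aligns with a negative, which is the behavior the first loss value is meant to enforce.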
According to an embodiment of the present disclosure, computing the cross-entropy loss between the first segmentation result and the second segmentation result to obtain the second loss value includes: determining, from the first segmentation result, a first sub-segmentation result related to the plurality of first target partition images; determining, from the second segmentation result, a second sub-segmentation result related to the plurality of first target partition images; determining a predicted value and a label value based on the confidence probability of the first sub-segmentation result and the confidence probability of the second sub-segmentation result; and computing the cross-entropy loss between the predicted value and the label value to obtain the second loss value.

According to an embodiment of the present disclosure, determining the predicted value and the label value based on the confidence probability of the first sub-segmentation result and the confidence probability of the second sub-segmentation result includes: in a case where the confidence probability of the first sub-segmentation result is greater than that of the second sub-segmentation result, determining the first sub-segmentation result as the label value and the second sub-segmentation result as the predicted value; and, in a case where the confidence probability of the first sub-segmentation result is less than that of the second sub-segmentation result, determining the first sub-segmentation result as the predicted value and the second sub-segmentation result as the label value.
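A hedged sketch of this confidence-based role assignment: per pixel, the branch with the higher maximum softmax probability supplies the (hard) label value and the other branch supplies the predicted distribution for a cross-entropy term. The helper name and the use of argmax confidence are assumptions.

```python
import numpy as np

def confidence_cross_entropy(probs_a, probs_b):
    """probs_a / probs_b: (..., C) softmax outputs of the two branches.
    Where branch A is more confident, its argmax becomes the label value
    and branch B's distribution is the predicted value (and vice versa)."""
    conf_a = probs_a.max(axis=-1)
    conf_b = probs_b.max(axis=-1)
    a_is_label = conf_a >= conf_b
    labels = np.where(a_is_label, probs_a.argmax(-1), probs_b.argmax(-1))
    preds = np.where(a_is_label[..., None], probs_b, probs_a)
    # per-pixel cross entropy between predicted value and label value
    ce = -np.log(np.take_along_axis(preds, labels[..., None], axis=-1)[..., 0])
    return ce.mean()
```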
According to an embodiment of the present disclosure, the plurality of first target partition images include a third target partition image, and the third target partition image has a real label; computing the cross-entropy loss between the first segmentation result and the second segmentation result to obtain the second loss value includes: determining, from the first segmentation result, a third sub-segmentation result related to the third target partition image and a fourth sub-segmentation result unrelated to the third target partition image but related to the plurality of first target partition images; determining, from the second segmentation result, a fifth sub-segmentation result related to the third target partition image and a sixth sub-segmentation result unrelated to the third target partition image but related to the plurality of first target partition images; computing the cross-entropy loss between the third sub-segmentation result and the real label to obtain a third loss value; computing the cross-entropy loss between the fourth sub-segmentation result and the sixth sub-segmentation result to obtain a fourth loss value; and determining the second loss value based on the third loss value and the fourth loss value.
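The composition of the second loss value from the third and fourth loss values might look like the sketch below, where a supervised cross-entropy term uses the real labels and a consistency term treats one branch's argmax as a pseudo label; the equal-weight sum and all names are assumptions, not the patent's exact formulation.

```python
import numpy as np

def second_loss(probs_labeled, true_labels, probs_pred, probs_target, eps=1e-12):
    """probs_labeled: (N, C) predictions on partitions with real labels;
    probs_pred / probs_target: (M, C) predictions of the two branches on
    the remaining mixed partitions."""
    # third loss value: cross entropy against the real labels
    l3 = -np.log(np.take_along_axis(probs_labeled, true_labels[:, None], 1)[:, 0] + eps).mean()
    # fourth loss value: cross entropy between one branch's prediction
    # and the other branch's argmax (pseudo label)
    pseudo = probs_target.argmax(1)
    l4 = -np.log(np.take_along_axis(probs_pred, pseudo[:, None], 1)[:, 0] + eps).mean()
    return l3 + l4  # equal-weight combination (assumption)
```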
According to an embodiment of the present disclosure, the initial network includes an encoder and a decoder; inputting the first surround-view projection image and the mixed projection image respectively into the initial network to obtain the first feature map and the first segmentation result corresponding to the first surround-view projection image, and the second feature map and the second segmentation result corresponding to the mixed projection image, includes: inputting the first surround-view projection image and the mixed projection image respectively into the encoder to obtain a first image feature corresponding to the first surround-view projection image and a second image feature corresponding to the mixed projection image; and inputting the first image feature and the second image feature respectively into the decoder to obtain the first feature map and the first segmentation result corresponding to the first surround-view projection image, and the second feature map and the second segmentation result corresponding to the mixed projection image.
According to an embodiment of the present disclosure, mapping the plurality of groups of point cloud data respectively into the initial view to obtain the plurality of surround-view projection images includes: for each group of point cloud data, performing polar coordinate conversion on the three-dimensional coordinate data of each point in the point cloud data to obtain polar coordinate data of each point in the point cloud data; mapping, based on the polar coordinate data of each point in the point cloud data, the plurality of points in the point cloud data respectively into a plurality of grid cells of the initial view; for each grid cell of the initial view, determining feature data of the grid cell based on the three-dimensional coordinate data and the polar coordinate data of the points in the grid cell; and constructing the surround-view projection image based on the feature data of the plurality of grid cells.
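One common realization of this mapping is a spherical (range-view) projection, in which the azimuth of each point selects a column and the elevation selects a row of the initial view. The sketch below keeps only the range of the nearest point per grid cell as the cell's feature; the field of view, resolution, and per-cell feature choice are assumptions for illustration.

```python
import numpy as np

def project_point_cloud(points, h=4, w=8,
                        fov_up=np.deg2rad(15.0), fov_down=np.deg2rad(-15.0)):
    """Map an (N, 3) array of x/y/z points onto an h x w surround view.
    Azimuth -> column index, elevation -> row index; each grid cell
    stores the range of its nearest point as a stand-in feature."""
    x, y, z = points.T
    r = np.linalg.norm(points, axis=1)          # range (polar radius)
    az = np.arctan2(y, x)                       # azimuth in [-pi, pi)
    el = np.arcsin(z / r)                       # elevation angle
    u = ((az + np.pi) / (2 * np.pi) * w).astype(int) % w
    v = ((fov_up - el) / (fov_up - fov_down) * h).clip(0, h - 1).astype(int)
    img = np.zeros((h, w))
    order = np.argsort(-r)                      # far points first, near points overwrite
    img[v[order], u[order]] = r[order]
    return img
```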
Another aspect of the present disclosure provides a point cloud semantic segmentation method, including: mapping target point cloud data into an initial view to obtain a surround-view projection image; and inputting the surround-view projection image into a point cloud semantic segmentation network to obtain a semantic segmentation feature map of the target point cloud data, where the point cloud semantic segmentation network is trained using the point cloud semantic segmentation network training method described above.

Another aspect of the present disclosure provides a point cloud semantic segmentation network training apparatus, including: a first mapping module configured to map a plurality of groups of point cloud data respectively into an initial view to obtain a plurality of surround-view projection images; a first processing module configured to partition, based on a preset size, a first surround-view projection image and a second surround-view projection image respectively to obtain a plurality of first partition images and a plurality of second partition images, where the first surround-view projection image and the second surround-view projection image belong to the plurality of surround-view projection images; a determining module configured to determine a plurality of first target partition images from the plurality of first partition images; a second processing module configured to replace, with each of the plurality of first target partition images, a corresponding second target partition image in the second surround-view projection image to obtain a mixed projection image, where the second target partition image belongs to the plurality of second partition images and the first target partition image is located at the same position as the second target partition image; and a training module configured to train an initial network using the first surround-view projection image and the mixed projection image as training samples to obtain a point cloud semantic segmentation network.

Another aspect of the present disclosure provides a point cloud semantic segmentation apparatus, including: a second mapping module configured to map target point cloud data into an initial view to obtain a surround-view projection image; and a third processing module configured to input the surround-view projection image into a point cloud semantic segmentation network to obtain a semantic segmentation feature map of the target point cloud data, where the point cloud semantic segmentation network is trained using the point cloud semantic segmentation network training method described above.

Another aspect of the present disclosure provides an electronic device, including: one or more processors; and a memory configured to store one or more instructions that, when executed by the one or more processors, cause the one or more processors to implement the method described above.

Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions that, when executed, are used to implement the method described above.

Another aspect of the present disclosure provides a computer program product including computer-executable instructions that, when executed, are used to implement the method described above.
According to embodiments of the present disclosure, when training the point cloud semantic segmentation network, the point cloud data can be mapped into surround-view projection images, and the first and second surround-view projection images can be partitioned and mixed, that is, some partitions of the first surround-view projection image replace the corresponding partitions of the second surround-view projection image to obtain a mixed projection image; the mixed projection image and the first surround-view projection image are then used to train the initial network to finally obtain the point cloud semantic segmentation network. Partition mixing forcibly decouples the swapped partitions from their background, which effectively enriches the data, reduces the network's reliance on background and global information when predicting local regions, and improves the network's recognition ability. At the same time, partition mixing effectively preserves the three-dimensional shape of the original point cloud as projected onto the surround-view projection image, which can at least partially overcome the three-dimensional deformation and loss of shape information caused by data augmentation and improve the robustness of the network. Through the above technical means, the utilization efficiency of hardware resources during network training can be effectively improved.
Brief Description of the Drawings

The above and other objects, features, and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:

FIG. 1 schematically illustrates an exemplary system architecture to which the point cloud semantic segmentation network training method, the point cloud semantic segmentation method, and the apparatuses may be applied according to an embodiment of the present disclosure.

FIG. 2 schematically illustrates a flowchart of a point cloud semantic segmentation network training method according to an embodiment of the present disclosure.

FIG. 3 schematically illustrates the training process of a point cloud semantic segmentation network according to an embodiment of the present disclosure.

FIG. 4 schematically illustrates a flowchart of a point cloud semantic segmentation method according to an embodiment of the present disclosure.

FIG. 5 schematically illustrates a block diagram of a point cloud semantic segmentation network training apparatus according to an embodiment of the present disclosure.

FIG. 6 schematically illustrates a block diagram of a point cloud semantic segmentation apparatus according to an embodiment of the present disclosure.

FIG. 7 schematically illustrates a block diagram of an electronic device suitable for implementing the point cloud semantic segmentation network training method or the point cloud semantic segmentation method according to an embodiment of the present disclosure.
Detailed Description

Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood, however, that these descriptions are merely exemplary and are not intended to limit the scope of the present disclosure. In the following detailed description, numerous specific details are set forth for ease of explanation to provide a comprehensive understanding of the embodiments of the present disclosure. It will be apparent, however, that one or more embodiments may be practiced without these specific details. Furthermore, in the following description, descriptions of well-known structures and techniques are omitted to avoid unnecessarily obscuring the concepts of the present disclosure.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. The terms "comprising", "including", and the like used herein indicate the presence of the stated features, steps, operations, and/or components, but do not exclude the presence or addition of one or more other features, steps, operations, or components.

All terms used herein (including technical and scientific terms) have the meanings commonly understood by those skilled in the art, unless otherwise defined. It should be noted that the terms used herein should be interpreted as having meanings consistent with the context of this specification and should not be interpreted in an idealized or overly rigid manner.

Where an expression like "at least one of A, B, and C" is used, it should generally be interpreted in accordance with the meaning commonly understood by those skilled in the art (for example, "a system having at least one of A, B, and C" shall include, but is not limited to, systems having A alone, B alone, C alone, A and B, A and C, B and C, and/or A, B, and C, etc.). Where an expression like "at least one of A, B, or C" is used, it should likewise be interpreted in accordance with the meaning commonly understood by those skilled in the art (for example, "a system having at least one of A, B, or C" shall include, but is not limited to, systems having A alone, B alone, C alone, A and B, A and C, B and C, and/or A, B, and C, etc.).
在自动驾驶技术领域中,利用深度学习技术对周围环境进行感知、识别是一项极为重要的基础研究。然而,深度学习技术所构建的深度神经网络往往需要大量的人工标注数据进行训练,这部分人工标注数据的成本和时耗往往是阻碍深度神经网络模型性能提升的壁垒。另一方面,无人驾驶车辆在行驶过程中,通过各式传感器能够收集到大量额原始无标注数据。因此,如何利用好这些原始无标注数据,加之少量有标注数据的辅助,对神经网络进行训练,即采用半监督训练的方式提升神经网络的识别、分类性能,是研发自动驾驶系统的过程中,能够起到增效、降本作用的一个重要研究任务。In the field of autonomous driving technology, using deep learning technology to perceive and identify the surrounding environment is an extremely important basic research. However, deep neural networks built by deep learning technology often require a large amount of manually labeled data for training. The cost and time consumption of this manually labeled data are often barriers to improving the performance of deep neural network models. On the other hand, unmanned vehicles can collect a large amount of original unlabeled data through various sensors while driving. Therefore, how to make good use of these original unlabeled data, coupled with the assistance of a small amount of labeled data, to train the neural network, that is, using semi-supervised training to improve the recognition and classification performance of the neural network, is the process of developing an autonomous driving system. It is an important research task that can increase efficiency and reduce costs.
在相关技术中,利用半监督训练算法提升语义分割任务的研究主要集中在二维图像领域。针对三维点云场景,尤其是基于激光雷达扫描结果的三维点云语义分割模型的半监督训练算法研究仍然处于一个空白阶段。由于二维图像和三维点云之间存在模态差异,二维图像上的点云语义分割半监督训练算法无法直接、有效地移植到三维点云语义分割任务当中。例如,在通过环视投影图对三维点云进行语义分割时,利用常规的二维图像数据增强方法,如增加噪声、旋转、缩放等会导致三维点云的三维形状失真,进而影响模型的训练效果。Among related technologies, research on using semi-supervised training algorithms to improve semantic segmentation tasks mainly focuses on the field of two-dimensional images. Research on semi-supervised training algorithms for 3D point cloud scenes, especially 3D point cloud semantic segmentation models based on LiDAR scanning results, is still at a blank stage. Due to the modal differences between 2D images and 3D point clouds, the semi-supervised training algorithm for point cloud semantic segmentation on 2D images cannot be directly and effectively transplanted to the 3D point cloud semantic segmentation task. For example, when performing semantic segmentation on a 3D point cloud through a surround projection image, using conventional 2D image data enhancement methods, such as adding noise, rotation, scaling, etc., will cause the 3D shape of the 3D point cloud to be distorted, thereby affecting the training effect of the model. .
有鉴于此,本公开的实施例提供了一种能够有效利用大量激光雷达原始点云数据,同时辅以少量有标注数据,能够对点云语义分割网络进行半监督训练的方法,在该方法中,提出了一种分区混合的数据 增强策略,通过混合两个不同的环视投影图来提升模型的识别难度,减少了数据增强过程对三维点云的形状信息的丢失,进而提高模型的训练效果,及模型的鲁棒性和可靠性。In view of this, embodiments of the present disclosure provide a method that can effectively utilize a large amount of lidar raw point cloud data, supplemented by a small amount of labeled data, to conduct semi-supervised training of a point cloud semantic segmentation network. In this method , proposed a partitioned mixed data The enhancement strategy improves the recognition difficulty of the model by mixing two different surround projection images, reducing the loss of shape information of the three-dimensional point cloud during the data enhancement process, thereby improving the training effect of the model, and the robustness and reliability of the model. .
Specifically, embodiments of the present disclosure provide a point cloud semantic segmentation network training method, a point cloud semantic segmentation method, an apparatus, an electronic device, and a storage medium. The training method includes: mapping multiple sets of point cloud data into an initial view to obtain multiple surround-view projection images; partitioning a first surround-view projection image and a second surround-view projection image, respectively, based on a preset size to obtain multiple first partition maps and multiple second partition maps, where the first and second surround-view projection images belong to the multiple surround-view projection images; determining multiple first target partition maps from the multiple first partition maps; replacing, with each of the multiple first target partition maps, a second target partition map in the second surround-view projection image to obtain a mixed projection image, where the second target partition maps belong to the multiple second partition maps and each first target partition map occupies the same position as the second target partition map it replaces; and training an initial network with the first surround-view projection image and the mixed projection image as training samples to obtain the point cloud semantic segmentation network.
FIG. 1 schematically illustrates an exemplary system architecture to which the point cloud semantic segmentation network training method, the point cloud semantic segmentation method, and the corresponding apparatus according to embodiments of the present disclosure can be applied. It should be noted that FIG. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied, intended to help those skilled in the art understand the technical content of the present disclosure; it does not mean that the embodiments cannot be used with other devices, systems, environments, or scenarios.
As shown in FIG. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, and 103, a network 104, and a server 105.
The terminal devices 101, 102, and 103 may be various devices equipped with a lidar, various electronic devices capable of controlling a lidar, or various electronic devices capable of storing point cloud data.
The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired and/or wireless communication links.
The server 105 may be a server that provides various services; for example, it may supply the computing and storage resources that support the training of the point cloud semantic segmentation network.
It should be noted that the point cloud semantic segmentation network training method or the point cloud semantic segmentation method provided by embodiments of the present disclosure can generally be executed by the server 105. Accordingly, the corresponding training apparatus or segmentation apparatus can generally be deployed in the server 105. The terminal devices 101, 102, and 103 may collect point cloud data themselves, or may obtain point cloud data collected by other terminal devices, for example over the Internet; the point cloud data can then be sent to the server 105 over the network so that the server 105 executes the methods provided by the embodiments of the present disclosure, either to train the point cloud semantic segmentation network or to perform semantic segmentation on the point cloud data. The methods may also be executed by a server or server cluster that is different from the server 105 and can communicate with the terminal devices 101, 102, 103 and/or the server 105; accordingly, the training apparatus or segmentation apparatus may likewise be deployed in such a server or server cluster. Alternatively, the methods may be executed by the terminal device 101, 102, or 103, or by another terminal device different from them; accordingly, the apparatus may also be deployed in the terminal device 101, 102, or 103, or in another terminal device.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. Any number of terminal devices, networks, and servers may be provided according to implementation needs.
FIG. 2 schematically shows a flowchart of a point cloud semantic segmentation network training method according to an embodiment of the present disclosure.
As shown in FIG. 2, the method includes operations S201 to S205.
In operation S201, multiple sets of point cloud data are respectively mapped into an initial view to obtain multiple surround-view projection images.
In operation S202, a first surround-view projection image and a second surround-view projection image are each partitioned based on a preset size to obtain multiple first partition maps and multiple second partition maps, where the first and second surround-view projection images belong to the multiple surround-view projection images.
In operation S203, multiple first target partition maps are determined from the multiple first partition maps.
In operation S204, each of the multiple first target partition maps replaces a second target partition map in the second surround-view projection image to obtain a mixed projection image, where the second target partition maps belong to the multiple second partition maps and each first target partition map occupies the same position as the second target partition map it replaces.
In operation S205, the first surround-view projection image and the mixed projection image are used as training samples to train an initial network, yielding the point cloud semantic segmentation network.
According to embodiments of the present disclosure, the point cloud data may be collected by a sensing device such as a rotating-scan lidar. Each set of point cloud data may be associated with a preset Cartesian coordinate system: each point in the set is represented as a three-dimensional coordinate in that system, and the origin of the system represents the position of the sensing device when the point cloud data was collected.
According to embodiments of the present disclosure, the point cloud data collected by a rotating-scan lidar is distributed within a sphere, and the initial view can be obtained by unrolling the annular band of that sphere near the horizontal plane. For each point in the point cloud, a direction vector for the mapping is determined from the point's coordinates, and the point is then projected onto the initial view along that vector.
According to embodiments of the present disclosure, partitioning a surround-view projection image based on the preset size divides it into multiple equal rectangular regions. The preset size can be chosen according to the dimensions of the projection image in the specific application scenario and is not limited here. For example, for a projection image with a resolution of 24×480, the image can be divided into 16 equal parts along its width and 6 equal parts along its height, yielding a total of 96 partition maps with a resolution of 4×30 each.
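As a concrete illustration of this partitioning scheme, the sketch below splits a 24×480 range image into the 96 tiles of 4×30 described above. The function name and the NumPy-based implementation are illustrative assumptions, not part of the disclosure.

```python
import numpy as np

def partition(image, block_h, block_w):
    """Split an H x W range image into equal (block_h x block_w) tiles.

    Returns an array of shape (H/block_h, W/block_w, block_h, block_w),
    indexed first by tile row and tile column.
    """
    H, W = image.shape
    assert H % block_h == 0 and W % block_w == 0
    tiles = image.reshape(H // block_h, block_h, W // block_w, block_w)
    return tiles.transpose(0, 2, 1, 3)

# The 24x480 projection from the text, split 6 ways along the height and
# 16 ways along the width, yields 96 tiles of 4x30 each.
img = np.arange(24 * 480, dtype=np.float32).reshape(24, 480)
tiles = partition(img, 4, 30)
```

The inverse operation (reassembling the tiles) is the same reshape/transpose sequence applied in reverse, which is what makes tile-level replacement cheap.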
According to embodiments of the present disclosure, the first and second surround-view projection images may be selected at random from the multiple surround-view projection images. The two images may have entirely different characteristics; that is, the point cloud data corresponding to the first projection image and the point cloud data corresponding to the second may have been collected from different objects in different scenes.
According to embodiments of the present disclosure, the first target partition maps may be sampled at random from the multiple first partition maps and may make up a certain proportion of them, for example 25% or 30%; the proportion is not limited here.
According to embodiments of the present disclosure, replacing the second target partition maps with the first target partition maps may include: determining each second target partition map in the second surround-view projection image from the position information of the corresponding first target partition map, deleting the second target partition map, and filling the first target partition map into the vacated position.
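The replacement step can be sketched as follows: a random subset of tile positions is chosen, and the tiles of the second image at those positions are overwritten by the co-located tiles of the first. The 25% ratio, the fixed random seed, and the NumPy implementation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mix_projections(img_a, img_b, block_h, block_w, ratio=0.25):
    """Replace a random subset of img_b's tiles with the co-located
    tiles of img_a, as in operation S204. `ratio` is the fraction of
    tiles taken from img_a (an example value, not fixed by the text)."""
    H, W = img_a.shape
    mixed = img_b.copy()
    rows, cols = H // block_h, W // block_w
    n_pick = int(rows * cols * ratio)
    picked = rng.choice(rows * cols, size=n_pick, replace=False)
    mask = np.zeros((rows, cols), dtype=bool)
    mask.flat[picked] = True
    for i in range(rows):
        for j in range(cols):
            if mask[i, j]:  # same position in both images
                mixed[i*block_h:(i+1)*block_h, j*block_w:(j+1)*block_w] = \
                    img_a[i*block_h:(i+1)*block_h, j*block_w:(j+1)*block_w]
    return mixed, mask

a = np.ones((24, 480), dtype=np.float32)
b = np.zeros((24, 480), dtype=np.float32)
mixed, mask = mix_projections(a, b, 4, 30, ratio=0.25)
```

Keeping the mask is useful later: the loss terms described below distinguish the replaced regions from the untouched background.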
According to embodiments of the present disclosure, the method used to train the initial network is not limited here; for example, gradient descent or least squares may be used. Training parameters such as the number of epochs, batch size, and learning rate can be set according to the specific application scenario and are likewise not limited here.
According to embodiments of the present disclosure, when training the point cloud semantic segmentation network, the point cloud data is mapped into surround-view projection images, and the first and second projection images are partition-mixed: selected partitions of the first projection image replace the corresponding partitions of the second, producing a mixed projection image. The mixed projection image and the first projection image are then used to train the initial network until the point cloud semantic segmentation network is obtained. Partition mixing forcibly decouples the replaced partitions from their background, which enriches the data, reduces the network's reliance on background and global information when predicting local regions, and improves its recognition ability. At the same time, partition mixing effectively preserves the 3D shape of the original point cloud as projected onto the surround-view image, at least partially overcoming the 3D deformation and shape-information loss caused by conventional data augmentation, and thus improves the robustness of the network. These technical means also improve the utilization of hardware resources during network training.
The method shown in FIG. 2 is further described below with reference to FIG. 3 and specific embodiments.
According to embodiments of the present disclosure, the surround-view projection images can be obtained by the method of operation S201. Specifically, operation S201 may include the following operations:
For each set of point cloud data: converting the three-dimensional coordinates of each point into polar coordinates; mapping the multiple points of the point cloud into the multiple grid cells of the initial view based on their polar coordinates; for each grid cell of the initial view, determining the cell's feature data from the three-dimensional coordinates and polar coordinates of the points it contains; and constructing the surround-view projection image from the feature data of the multiple grid cells.
According to embodiments of the present disclosure, each point in the point cloud has three-dimensional coordinates x, y, and z. Polar conversion of the point yields its coordinates in a rotating coordinate system, namely the yaw and pitch angles, which constitute the polar coordinate data.
According to embodiments of the present disclosure, a grid cell of the initial view corresponds to a single pixel of that view. For example, an initial view with a resolution of 20×480 has 9600 pixels and, correspondingly, 9600 grid cells.
According to embodiments of the present disclosure, when multiple points map into the same grid cell, the feature data of the point closest to the origin may be taken as the cell's feature data. A point's feature data may include its three-dimensional coordinates, its polar coordinates, and data derived from them, such as reflectivity and depth.
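The projection described in the preceding paragraphs — polar conversion to yaw/pitch, mapping into grid cells, and keeping the nearest point per cell — can be sketched as below. The vertical field-of-view bounds and the depth-only cell feature are illustrative assumptions; a real implementation would store the full feature vector (coordinates, reflectivity, depth) per cell.

```python
import numpy as np

def project_to_range_image(points, H=24, W=480,
                           fov_up=np.deg2rad(15.0),
                           fov_down=np.deg2rad(-25.0)):
    """Project an N x 3 lidar point cloud onto an H x W surround view.

    yaw (azimuth) indexes the columns and pitch (elevation) the rows;
    when several points fall into one cell, the point closest to the
    sensor wins, as described in the text.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)                       # azimuth in [-pi, pi]
    pitch = np.arcsin(z / depth)                 # elevation angle
    u = ((yaw + np.pi) / (2 * np.pi) * W).astype(int).clip(0, W - 1)
    v = ((fov_up - pitch) / (fov_up - fov_down) * H).astype(int).clip(0, H - 1)
    image = np.zeros((H, W), dtype=np.float32)
    best = np.full((H, W), np.inf)
    for ui, vi, di in zip(u, v, depth):
        if di < best[vi, ui]:                    # keep the nearest point
            best[vi, ui] = di
            image[vi, ui] = di
    return image

pts = np.array([[5.0, 0.0, 0.0],    # straight ahead, 5 m
                [10.0, 0.0, 0.0],   # same direction, farther away
                [0.0, 5.0, 0.0]])   # 90 degrees to the left, 5 m
img = project_to_range_image(pts)
```

Only two cells are filled here: the two co-directional points compete for one cell and the 5 m point wins, illustrating the nearest-point rule.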
FIG. 3 schematically shows the training pipeline of the point cloud semantic segmentation network according to an embodiment of the present disclosure.
As shown in FIG. 3, the training pipeline of the point cloud semantic segmentation network may include a sample preprocessing stage and an iterative network training stage.
According to embodiments of the present disclosure, during sample preprocessing, some partitions of the first surround-view projection image are substituted into the second surround-view projection image to obtain the mixed projection image. For the specific procedure, refer to operations S202 to S204, which are not repeated here.
According to embodiments of the present disclosure, the iterative network training stage may input the first surround-view projection image and the mixed projection image into the initial network as a sample pair and adjust the model parameters of the initial network based on the configured loss function and an iterative optimization method such as gradient descent or least squares, thereby training the initial network.
According to embodiments of the present disclosure, the initial network may include an encoder and a decoder.
According to embodiments of the present disclosure, inputting the first surround-view projection image and the mixed projection image into the initial network to obtain a first feature map and a first segmentation result corresponding to the first projection image, and a second feature map and a second segmentation result corresponding to the mixed projection image, may include the following operations: inputting the first projection image and the mixed projection image into the encoder to obtain first image features corresponding to the first projection image and second image features corresponding to the mixed projection image; and inputting the first and second image features into the decoder to obtain the first feature map and the first segmentation result, and the second feature map and the second segmentation result, respectively.
According to embodiments of the present disclosure, the encoder may be any feature extraction network, such as ResNet18.
According to embodiments of the present disclosure, the decoder may be any feature upsampling network, such as UPerNet.
According to embodiments of the present disclosure, the iterative network training stage may specifically include the following operations:
inputting the first surround-view projection image and the mixed projection image into the initial network to obtain a first feature map and a first segmentation result corresponding to the first projection image, and a second feature map and a second segmentation result corresponding to the mixed projection image; computing the information entropy loss between the first feature map and the second feature map to obtain a first loss value; computing the cross-entropy loss between the first segmentation result and the second segmentation result to obtain a second loss value; and adjusting the model parameters of the initial network with the first and second loss values until the point cloud semantic segmentation network is obtained.
According to embodiments of the present disclosure, the first segmentation result may represent the semantic segmentation of each region of the first surround-view projection image.
According to embodiments of the present disclosure, the first feature map may have the same size as the first surround-view projection image, with regions of different semantic classes rendered in different colors. For example, the regions with different semantic features may correspond to people, vehicles, and obstacles, shown in red, blue, and green, respectively.
According to embodiments of the present disclosure, computing the information entropy loss between the first feature map and the second feature map to obtain the first loss value may include the following operations: determining, from the first feature map, a first sub-feature map related to the multiple first target partition maps; splitting the second feature map into a second sub-feature map related to the multiple first target partition maps and a third sub-feature map unrelated to them; and, when the confidence probability of the first sub-feature map exceeds a preset threshold, taking the first and second sub-feature maps as a positive sample pair and the first and third sub-feature maps as a negative sample pair, and computing the information entropy loss between the positive and negative pairs to obtain the first loss value.
According to embodiments of the present disclosure, since the first feature map may have the same size as the first surround-view projection image, the first sub-feature map can be located within the first feature map using the position information of the first target partition maps in the first projection image.
According to embodiments of the present disclosure, the method for computing the confidence probability of the first sub-feature map is not limited here; for example, it may be determined using a Gaussian formula.
According to embodiments of the present disclosure, the preset threshold can be chosen according to the specific application scenario, for example 90% or 95%, and is not limited here.
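As a minimal sketch of this confidence gate, the snippet below uses the maximum softmax probability as the confidence measure. The disclosure leaves the measure open (mentioning, e.g., a Gaussian formula), so this particular choice and the function names are assumptions for illustration.

```python
import numpy as np

def softmax(logits, axis=-1):
    # numerically stable softmax over per-class logits
    e = np.exp(logits - logits.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def confident_enough(logits, threshold=0.9):
    """Gate a sub-feature map by prediction confidence: the pair only
    contributes to the first loss value when confidence > threshold."""
    probs = softmax(np.asarray(logits, dtype=float))
    return float(probs.max()) > threshold

sharp = np.array([8.0, 0.0, 0.0])  # peaked distribution -> confident
flat = np.array([1.0, 1.0, 1.0])   # uniform distribution -> not confident
```

With the 90% threshold from the text, the peaked distribution passes the gate and the uniform one does not.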
According to embodiments of the present disclosure, the information entropy loss may be computed as shown in formula (1), where L1 denotes the information entropy loss, fp the first sub-feature map, fx the second sub-feature map, and fy the third sub-feature map.
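The exact expression of formula (1) is not reproduced in the text above. One plausible reading, consistent with the positive/negative-pair construction described earlier, is a cosine-similarity contrastive term that pulls the positive pair (fp, fx) together and pushes the negative pair (fp, fy) apart; the InfoNCE-style form and the temperature value below are assumptions for illustration only, not the disclosure's formula.

```python
import numpy as np

def contrastive_loss(fp, fx, fy, tau=0.1):
    """Contrastive sketch of formula (1): (fp, fx) is the positive pair,
    (fp, fy) the negative pair. tau is an assumed temperature."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    pos = np.exp(cos(fp, fx) / tau)
    neg = np.exp(cos(fp, fy) / tau)
    return float(-np.log(pos / (pos + neg)))

fp = np.array([1.0, 0.0])
fx = np.array([0.9, 0.1])    # similar to fp -> small loss
fy = np.array([-1.0, 0.0])   # dissimilar to fp -> pushed away
loss_good = contrastive_loss(fp, fx, fy)
loss_bad = contrastive_loss(fp, fy, fx)  # swapped pairs -> larger loss
```

The loss is small when the positive pair is aligned and the negative pair is not, and grows when the roles are swapped.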
According to embodiments of the present disclosure, computing the cross-entropy loss between the first segmentation result and the second segmentation result to obtain the second loss value may include the following operations: determining, from the first segmentation result, a first sub-segmentation result related to the multiple first target partition maps; determining, from the second segmentation result, a second sub-segmentation result related to the multiple first target partition maps; determining a predicted value and a label value based on the confidence probabilities of the first and second sub-segmentation results; and computing the cross-entropy loss between the predicted value and the label value to obtain the second loss value.
According to embodiments of the present disclosure, the first segmentation result may have the same size as the first surround-view projection image, so the first sub-segmentation result can be located within the first segmentation result using the position information of the first target partition maps in the first projection image.
According to embodiments of the present disclosure, the method for computing the confidence probabilities of the first and second sub-segmentation results is not limited here; for example, they may be determined using a Gaussian formula.
According to embodiments of the present disclosure, the predicted value and the label value can be determined by comparing the confidence probabilities of the two sub-segmentation results. Specifically, when the confidence probability of the first sub-segmentation result is greater than that of the second, the first sub-segmentation result serves as the label value and the second as the predicted value; when the confidence probability of the first sub-segmentation result is less than that of the second, the first sub-segmentation result serves as the predicted value and the second as the label value.
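The comparison above reduces to a simple selection rule, sketched here with toy values; the function name and the string-valued results are illustrative assumptions.

```python
def choose_label_and_prediction(result_a, conf_a, result_b, conf_b):
    """Per the text: the more confident of the two sub-segmentation
    results acts as the (pseudo-)label, the other as the prediction.
    Returns the pair (label, prediction)."""
    if conf_a > conf_b:
        return result_a, result_b
    return result_b, result_a

# First result more confident -> it becomes the label.
label, pred = choose_label_and_prediction("road", 0.95, "car", 0.60)
# First result less confident -> roles are swapped.
label2, pred2 = choose_label_and_prediction("road", 0.50, "car", 0.90)
```

In practice the same rule would be applied per pixel on class-probability maps rather than on scalar toy values.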
According to embodiments of the present disclosure, the cross-entropy loss may be computed as shown in formula (2):

L2 = -∑(y·log(yp) + (1-y)·log(1-yp))   (2)

where L2 denotes the cross-entropy loss, y the label value, and yp the predicted value.
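Formula (2) can be sketched numerically as follows, written with the conventional leading minus sign so that the loss is non-negative; the clipping constant `eps` is an implementation detail added here to avoid log(0), not part of the disclosure.

```python
import numpy as np

def cross_entropy(y, y_pred, eps=1e-12):
    """Binary cross-entropy per formula (2):
    L2 = -sum(y*log(yp) + (1-y)*log(1-yp))."""
    y = np.asarray(y, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    return float(-np.sum(y * np.log(y_pred) + (1 - y) * np.log(1 - y_pred)))

# Predictions that agree with the labels give a small loss;
# confidently wrong predictions give a large one.
loss_match = cross_entropy([1, 0, 1], [0.99, 0.01, 0.99])
loss_mismatch = cross_entropy([1, 0, 1], [0.01, 0.99, 0.01])
```

In the semi-supervised scheme above, `y` is the more confident sub-segmentation result (the pseudo-label) and `y_pred` the less confident one.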
According to embodiments of the present disclosure, the total loss used to adjust the model parameters of the initial network may be a weighted sum of the information entropy loss and the cross-entropy loss; the weight is a hyperparameter that the user may set freely during model tuning.
According to embodiments of the present disclosure, the multiple first target partition maps may include a third target partition map that carries ground-truth labels.
According to embodiments of the present disclosure, when the third target partition map is determined to exist, computing the cross-entropy loss between the first and second segmentation results to obtain the second loss value may include the following operations: determining, from the first segmentation result, a third sub-segmentation result related to the third target partition map, and a fourth sub-segmentation result unrelated to the third target partition map but related to the multiple first target partition maps; determining, from the second segmentation result, a fifth sub-segmentation result related to the third target partition map, and a sixth sub-segmentation result unrelated to the third target partition map but related to the multiple first target partition maps; computing the cross-entropy loss between the third sub-segmentation result and the ground-truth labels to obtain a third loss value; computing the cross-entropy loss between the fourth and sixth sub-segmentation results to obtain a fourth loss value; and determining the second loss value based on the third and fourth loss values.
According to embodiments of the present disclosure, this loss design allows the network to be trained on raw unlabeled data supplemented by a small amount of labeled data, realizing semi-supervised training of the point cloud semantic segmentation network and thereby reducing the cost of data annotation while preserving the network's semantic segmentation quality.
FIG. 4 schematically shows a flowchart of a point cloud semantic segmentation method according to an embodiment of the present disclosure.
As shown in FIG. 4, the method includes operations S401 and S402.
In operation S401, target point cloud data is mapped into the initial view to obtain a surround-view projection image.
In operation S402, the surround-view projection image is input into the point cloud semantic segmentation network to obtain a semantic segmentation feature map of the target point cloud data.
根据本公开的实施例,该点云语义分割网络可以利用上述点云语义分割网络训练方法部分的方法训练得到,在此不再赘述。According to embodiments of the present disclosure, the point cloud semantic segmentation network can be trained using the method in the point cloud semantic segmentation network training method section, which will not be described again here.
图5示意性示出了根据本公开实施例的点云语义分割网络训练装置的框图。Figure 5 schematically shows a block diagram of a point cloud semantic segmentation network training device according to an embodiment of the present disclosure.
如图5所示,点云语义分割网络训练装置500包括第一映射模块510、第一处理模块520、确定模块530、第二处理模块540和训练模块550。As shown in FIG. 5 , the point cloud semantic segmentation network training device 500 includes a first mapping module 510 , a first processing module 520 , a determination module 530 , a second processing module 540 and a training module 550 .
第一映射模块510,用于将多组点云数据分别映射到初始视图中,得到多个环视投影图。The first mapping module 510 is used to map multiple sets of point cloud data to the initial view respectively to obtain multiple surround projections.
第一处理模块520,用于基于预设尺寸,分别对第一环视投影图和第二环视投影图进行分区处理,得到多个第一分区图和多个第二分区图,其中,第一环视投影图和第二环视投影图属于多个环视投影图。The first processing module 520 is configured to partition the first surround projection image and the second surround projection image respectively based on a preset size to obtain a plurality of first partition maps and a plurality of second partition maps, where the first surround projection image and the second surround projection image belong to the plurality of surround projection images.
确定模块530,用于从多个第一分区图中确定多个第一目标分区图。The determining module 530 is configured to determine a plurality of first target partition maps from a plurality of first partition maps.
第二处理模块540,用于利用多个第一目标分区图中的每个第一目标分区图分别对第二环视投影图中的第二目标分区图进行替换,得到混合投影图,其中,第二目标分区图属于多个第二分区图,第一目标分区图与第二目标分区图的位置相同。The second processing module 540 is configured to replace the second target partition map in the second surround projection image with each first target partition map of the plurality of first target partition maps to obtain a mixed projection image, where the second target partition map belongs to the plurality of second partition maps and the first target partition map has the same position as the second target partition map.
训练模块550,用于将第一环视投影图和混合投影图作为训练样本来对初始网络进行训练,得到点云语义分割网络。The training module 550 is used to train the initial network using the first surround projection image and the mixed projection image as training samples to obtain a point cloud semantic segmentation network.
根据本公开的实施例,在训练点云语义分割网络时,可以将点云数据映射为环视投影图,并对第一环视投影图和第二环视投影图进行分区混合,即使用第一环视投影图中的部分分区对第二环视投影图中对应的分区进行替换,得到混合投影图,之后,可以利用混合投影图和第一环视投影图来训练初始网络,以最终得到点云语义分割网络。通过分区混合的方式,可以实现该部分分区与背景的强制解耦,能够有效提升数据的丰富度,降低网络在预测局部区域时对背景、全局信息的依赖,提高网络的识别能力。同时,通过分区混合的方式,还可以有效地保留原始点云投影在环视投影图上的三维形状,可以至少部分地克服数据增强导致的三维形变和形状信息丢失的问题,可以提升网络的鲁棒性。通过上述技术手段,可以有效提升网络训练过程中对硬件资源的利用效率。According to the embodiments of the present disclosure, when training the point cloud semantic segmentation network, the point cloud data can be mapped into surround projection images, and the first surround projection image and the second surround projection image can be partitioned and mixed, i.e., some partitions of the first surround projection image replace the corresponding partitions of the second surround projection image to obtain a mixed projection image. The mixed projection image and the first surround projection image can then be used to train the initial network to finally obtain the point cloud semantic segmentation network. Partition mixing forcibly decouples these partitions from their background, which effectively enriches the data, reduces the network's dependence on background and global information when predicting local regions, and improves the recognition ability of the network. At the same time, partition mixing effectively preserves the three-dimensional shape of the original point cloud as projected onto the surround projection image, which can at least partially overcome the three-dimensional deformation and loss of shape information caused by data augmentation, improving the robustness of the network. Through the above technical means, the utilization efficiency of hardware resources during network training can be effectively improved.
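The partition-mixing operation described above can be sketched as follows. The tile size, the number of transplanted tiles, and the random tile selection are illustrative assumptions; the patent only requires that same-position partitions of the first surround projection image replace those of the second.

```python
import numpy as np

def mix_partitions(proj_a, proj_b, tile_h, tile_w, num_tiles, rng=None):
    """Replace randomly chosen same-position tiles of proj_b with tiles of
    proj_a, returning the mixed projection and a mask of replaced pixels."""
    rng = np.random.default_rng() if rng is None else rng
    c, h, w = proj_a.shape
    mixed = proj_b.copy()
    mask = np.zeros((h, w), dtype=bool)
    rows, cols = h // tile_h, w // tile_w
    # Pick distinct tile indices to transplant from proj_a into proj_b.
    chosen = rng.choice(rows * cols, size=num_tiles, replace=False)
    for idx in chosen:
        r, cidx = divmod(int(idx), cols)
        ys, xs = r * tile_h, cidx * tile_w
        mixed[:, ys:ys + tile_h, xs:xs + tile_w] = proj_a[:, ys:ys + tile_h, xs:xs + tile_w]
        mask[ys:ys + tile_h, xs:xs + tile_w] = True
    return mixed, mask
```

The returned mask identifies which pixels of the mixed projection image came from the first surround projection image, which is exactly the bookkeeping the partition-aware losses above rely on.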
根据本公开的实施例,训练模块550包括第一训练子模块、第二训练子模块、第三训练子模块和第四训练子模块。According to an embodiment of the present disclosure, the training module 550 includes a first training sub-module, a second training sub-module, a third training sub-module and a fourth training sub-module.
第一训练子模块,用于分别将第一环视投影图和混合投影图输入初始网络中,得到与第一环视投影图对应的第一特征图谱和第一分割结果,以及与混合投影图对应的第二特征图谱和第二分割结果。The first training submodule is configured to input the first surround projection image and the mixed projection image into the initial network respectively, to obtain a first feature map and a first segmentation result corresponding to the first surround projection image, and a second feature map and a second segmentation result corresponding to the mixed projection image.
第二训练子模块,用于计算第一特征图谱和第二特征图谱之间的信息熵损失,得到第一损失值。The second training submodule is used to calculate the information entropy loss between the first feature map and the second feature map to obtain the first loss value.
第三训练子模块,用于计算第一分割结果和第二分割结果之间的交叉熵损失,得到第二损失值。The third training submodule is used to calculate the cross-entropy loss between the first segmentation result and the second segmentation result to obtain the second loss value.
第四训练子模块,用于利用第一损失值和第二损失值来调整初始网络的模型参数,以最终得到点云数据语义分割网络。The fourth training submodule is used to use the first loss value and the second loss value to adjust the model parameters of the initial network to finally obtain the point cloud data semantic segmentation network.
根据本公开的实施例,第二训练子模块包括第一训练单元、第二训练单元和第三训练单元。 According to an embodiment of the present disclosure, the second training sub-module includes a first training unit, a second training unit and a third training unit.
第一训练单元,用于从第一特征图谱中确定与多个第一目标分区图相关的第一子特征图谱。The first training unit is configured to determine the first sub-feature map related to the plurality of first target partition maps from the first feature map.
第二训练单元,用于将第二特征图谱拆分为与多个第一目标分区图相关的第二子特征图谱和与多个第一目标分区图无关的第三子特征图谱。The second training unit is used to split the second feature map into a second sub-feature map related to the plurality of first target partition maps and a third sub-feature map unrelated to the plurality of first target partition maps.
第三训练单元,用于在第一子特征图谱的置信概率大于预设阈值的情况下,以第一子特征图谱和第二子特征图谱作为正样本对,以第一子特征图谱和第三子特征图谱作为负样本对,计算正样本对和负样本对之间的信息熵损失,得到第一损失值。The third training unit is configured to, when the confidence probability of the first sub-feature map is greater than a preset threshold, take the first sub-feature map and the second sub-feature map as a positive sample pair and the first sub-feature map and the third sub-feature map as a negative sample pair, and calculate the information entropy loss between the positive sample pair and the negative sample pair to obtain the first loss value.
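One plausible realization of this information entropy loss over a positive and a negative sample pair is an InfoNCE-style contrastive term on pooled sub-feature vectors. The cosine-similarity formulation, the temperature `tau`, and the threshold gating below are assumptions for illustration; the patent does not give the exact formula.

```python
import numpy as np

def info_entropy_loss(feat1, feat2, feat3, conf, threshold=0.9, tau=0.1):
    # Apply the loss only when the anchor's confidence exceeds the preset threshold.
    if conf <= threshold:
        return 0.0
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    pos = np.exp(cos(feat1, feat2) / tau)   # positive pair: (feat1, feat2)
    neg = np.exp(cos(feat1, feat3) / tau)   # negative pair: (feat1, feat3)
    # InfoNCE-style term: pull the positive pair together, push the negative apart.
    return float(-np.log(pos / (pos + neg)))
```

With identical positive features and an opposite-direction negative feature, the loss is close to zero; below the confidence threshold the pair contributes nothing.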
根据本公开的实施例,第三训练子模块包括第四训练单元、第五训练单元、第六训练单元和第七训练单元。According to an embodiment of the present disclosure, the third training sub-module includes a fourth training unit, a fifth training unit, a sixth training unit and a seventh training unit.
第四训练单元,用于从第一分割结果中确定与多个第一目标分区图相关的第一子分割结果。The fourth training unit is configured to determine the first sub-segmentation results related to the plurality of first target partition maps from the first segmentation results.
第五训练单元,用于从第二分割结果中确定与多个第一目标分区图相关的第二子分割结果。A fifth training unit is configured to determine second sub-segmentation results related to the plurality of first target partition maps from the second segmentation results.
第六训练单元,用于基于第一子分割结果的置信概率和第二子分割结果的置信概率,确定预测值和标签值。The sixth training unit is used to determine the prediction value and the label value based on the confidence probability of the first sub-segmentation result and the confidence probability of the second sub-segmentation result.
第七训练单元,用于计算预测值和标签值之间的交叉熵损失,得到第二损失值。The seventh training unit is used to calculate the cross-entropy loss between the predicted value and the label value to obtain the second loss value.
根据本公开的实施例,第六训练单元包括第一训练子单元和第二训练子单元。According to an embodiment of the present disclosure, the sixth training unit includes a first training sub-unit and a second training sub-unit.
第一训练子单元,用于在第一子分割结果的置信概率大于第二子分割结果的置信概率的情况下,确定第一子分割结果为标签值,第二子分割结果为预测值。The first training subunit is used to determine that the first sub-segmentation result is the label value and the second sub-segmentation result is the predicted value when the confidence probability of the first sub-segmentation result is greater than the confidence probability of the second sub-segmentation result.
第二训练子单元,用于在第一子分割结果的置信概率小于第二子分割结果的置信概率的情况下,确定第一子分割结果为预测值,第二子分割结果为标签值。The second training subunit is used to determine that the first sub-segmentation result is the predicted value and the second sub-segmentation result is the label value when the confidence probability of the first sub-segmentation result is less than the confidence probability of the second sub-segmentation result.
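The role assignment performed by these two training sub-units maps directly to a small helper: whichever sub-segmentation result has the higher confidence probability acts as the label value, and the other as the predicted value. (The tie-breaking choice when the confidences are equal is an assumption.)

```python
def choose_pseudo_label(sub1, conf1, sub2, conf2):
    """Return (predicted_value, label_value): the higher-confidence
    sub-segmentation result serves as the label, the other as the prediction."""
    if conf1 > conf2:
        return sub2, sub1
    return sub1, sub2
```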
根据本公开的实施例,多个第一目标分区图中包括第三目标分区图,第三目标分区图具有真实标签。 According to an embodiment of the present disclosure, the plurality of first target partition maps include a third target partition map, and the third target partition map has a real label.
根据本公开的实施例,第三训练子模块包括第八训练单元、第九训练单元、第十训练单元、第十一训练单元和第十二训练单元。According to an embodiment of the present disclosure, the third training sub-module includes an eighth training unit, a ninth training unit, a tenth training unit, an eleventh training unit and a twelfth training unit.
第八训练单元,用于从第一分割结果中确定与第三目标分区图相关的第三子分割结果,和与第三目标分区图无关且与多个第一目标分区图相关的第四子分割结果。The eighth training unit is configured to determine, from the first segmentation result, a third sub-segmentation result related to the third target partition map and a fourth sub-segmentation result unrelated to the third target partition map but related to the plurality of first target partition maps.
第九训练单元,用于从第二分割结果中确定与第三目标分区图相关的第五子分割结果,和与第三目标分区图无关且与多个第一目标分区图相关的第六子分割结果。The ninth training unit is configured to determine, from the second segmentation result, a fifth sub-segmentation result related to the third target partition map and a sixth sub-segmentation result unrelated to the third target partition map but related to the plurality of first target partition maps.
第十训练单元,用于计算第三子分割结果和真实标签之间的交叉熵损失,得到第三损失值。The tenth training unit is used to calculate the cross-entropy loss between the third sub-segmentation result and the real label to obtain the third loss value.
第十一训练单元,用于计算第四子分割结果和第六子分割结果之间的交叉熵损失,得到第四损失值。The eleventh training unit is used to calculate the cross-entropy loss between the fourth sub-segmentation result and the sixth sub-segmentation result to obtain the fourth loss value.
第十二训练单元,用于基于第三损失值和第四损失值,确定第二损失值。The twelfth training unit is used to determine the second loss value based on the third loss value and the fourth loss value.
根据本公开的实施例,初始网络包括编码器和解码器。According to an embodiment of the present disclosure, the initial network includes an encoder and a decoder.
根据本公开的实施例,第一训练子模块包括第十三训练单元和第十四训练单元。According to an embodiment of the present disclosure, the first training sub-module includes a thirteenth training unit and a fourteenth training unit.
第十三训练单元,用于分别将第一环视投影图和混合投影图输入编码器,得到与第一环视投影图对应的第一图像特征和与混合投影图对应的第二图像特征。The thirteenth training unit is used to input the first surround projection image and the hybrid projection image into the encoder respectively, and obtain the first image feature corresponding to the first surround projection image and the second image feature corresponding to the hybrid projection image.
第十四训练单元,用于分别将第一图像特征和第二图像特征输入解码器,得到与第一环视投影图对应的第一特征图谱和第一分割结果,以及与混合投影图对应的第二特征图谱和第二分割结果。The fourteenth training unit is configured to input the first image feature and the second image feature into the decoder respectively, to obtain the first feature map and the first segmentation result corresponding to the first surround projection image, and the second feature map and the second segmentation result corresponding to the mixed projection image.
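The encoder/decoder data flow of the thirteenth and fourteenth training units can be illustrated with a toy model. The actual network architecture is not specified in this excerpt; the two linear layers, the ReLU, and the per-pixel softmax head below are placeholders showing only the shapes involved (projection → image features → feature map + segmentation result).

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyEncoderDecoder:
    """Toy stand-in: the encoder maps a surround projection to image features,
    the decoder maps those features to a per-pixel feature map and a
    segmentation result (class probabilities)."""
    def __init__(self, in_ch, feat_ch, num_classes):
        self.w_enc = rng.standard_normal((feat_ch, in_ch)) * 0.1
        self.w_dec = rng.standard_normal((num_classes, feat_ch)) * 0.1

    def forward(self, proj):                                  # proj: (in_ch, H, W)
        feats = np.einsum('fc,chw->fhw', self.w_enc, proj)    # "encoder"
        feats = np.maximum(feats, 0.0)                        # ReLU
        logits = np.einsum('kf,fhw->khw', self.w_dec, feats)  # "decoder"
        e = np.exp(logits - logits.max(axis=0, keepdims=True))
        seg = e / e.sum(axis=0, keepdims=True)                # per-pixel softmax
        return feats, seg
```

The first surround projection image and the mixed projection image would each be passed through `forward` to obtain their respective feature maps and segmentation results.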
根据本公开的实施例,第一映射模块510包括第一映射单元、第二映射单元、第三映射单元和第四映射单元。According to an embodiment of the present disclosure, the first mapping module 510 includes a first mapping unit, a second mapping unit, a third mapping unit and a fourth mapping unit.
第一映射单元,用于对于每组点云数据,分别对点云数据中每个点的三维坐标数据进行极坐标转换,以得到点云数据中每个点的极坐标数据。 The first mapping unit is used to perform polar coordinate conversion on the three-dimensional coordinate data of each point in the point cloud data for each set of point cloud data, so as to obtain the polar coordinate data of each point in the point cloud data.
第二映射单元,用于基于点云数据中每个点的极坐标数据,将点云数据中的多个点分别映射到初始视图的多个栅格中。The second mapping unit is used to map multiple points in the point cloud data to multiple grids in the initial view based on the polar coordinate data of each point in the point cloud data.
第三映射单元,用于对于初始视图的每个栅格,基于栅格中的点的三维坐标数据和极坐标数据,确定栅格的特征数据。The third mapping unit is used for determining, for each grid of the initial view, the characteristic data of the grid based on the three-dimensional coordinate data and polar coordinate data of the points in the grid.
第四映射单元,用于基于多个栅格的特征数据,构建得到环视投影图。The fourth mapping unit is used to construct a surround projection map based on the feature data of multiple grids.
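The four mapping units above can be illustrated with a range-view (spherical) projection, one common way to realize the polar-coordinate mapping of LiDAR points onto a surround grid. The field-of-view bounds, grid resolution, and per-cell features (x, y, z, depth) are assumptions for illustration, not the patent's required choices.

```python
import numpy as np

def surround_projection(points, height=64, width=1024, fov_up=3.0, fov_down=-25.0):
    """Map each 3D point to a (row, col) grid cell of the surround view via
    its polar coordinates; each occupied cell stores (x, y, z, depth)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points[:, :3], axis=1)
    yaw = np.arctan2(y, x)                          # azimuth angle
    pitch = np.arcsin(z / np.maximum(depth, 1e-8))  # elevation angle
    fu, fd = np.radians(fov_up), np.radians(fov_down)
    col = ((0.5 * (1.0 - yaw / np.pi)) * width).astype(int) % width
    row = ((fu - pitch) / (fu - fd) * height).clip(0, height - 1).astype(int)
    proj = np.zeros((4, height, width), dtype=np.float32)
    # Grid feature data: 3D coordinates plus range (depth) per cell.
    proj[0, row, col] = x
    proj[1, row, col] = y
    proj[2, row, col] = z
    proj[3, row, col] = depth
    return proj
```

A point on the positive x-axis lands in the middle column of the top rows of the grid, with its range stored in the depth channel.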
需要说明的是,本公开的实施例中点云语义分割网络训练装置部分与本公开的实施例中点云语义分割网络训练方法部分是相对应的,点云语义分割网络训练装置部分的描述具体参考点云语义分割网络训练方法部分,在此不再赘述。It should be noted that the point cloud semantic segmentation network training device part in the embodiments of the present disclosure corresponds to the point cloud semantic segmentation network training method part. For details of the device part, reference may be made to the description of the method part, which will not be repeated here.
图6示意性示出了根据本公开实施例的点云语义分割装置的框图。Figure 6 schematically shows a block diagram of a point cloud semantic segmentation device according to an embodiment of the present disclosure.
如图6所示,点云语义分割装置600包括第二映射模块610和第三处理模块620。As shown in FIG. 6 , the point cloud semantic segmentation device 600 includes a second mapping module 610 and a third processing module 620 .
第二映射模块610,用于将目标点云数据映射到初始视图中,得到环视投影图。The second mapping module 610 is used to map the target point cloud data to the initial view to obtain a surround projection image.
第三处理模块620,用于将所述环视投影图输入点云语义分割网络中,得到所述目标点云数据的语义分割特征图谱。The third processing module 620 is used to input the surround projection image into a point cloud semantic segmentation network to obtain a semantic segmentation feature map of the target point cloud data.
根据本公开的实施例,该点云语义分割网络可以利用上述点云语义分割网络训练方法部分的方法训练得到,在此不再赘述。According to embodiments of the present disclosure, the point cloud semantic segmentation network can be trained using the method in the point cloud semantic segmentation network training method section, which will not be described again here.
根据本公开的实施例的模块、子模块、单元、子单元中的任意多个、或其中任意多个的至少部分功能可以在一个模块中实现。根据本公开实施例的模块、子模块、单元、子单元中的任意一个或多个可以被拆分成多个模块来实现。根据本公开实施例的模块、子模块、单元、子单元中的任意一个或多个可以至少被部分地实现为硬件电路,例如现场可编程门阵列(FPGA)、可编程逻辑阵列(PLA)、片上系统、基板上的系统、封装上的系统、专用集成电路(ASIC),或可以通过对电路进行集成或封装的任何其他的合理方式的硬件或固件来实现,或以软件、硬件以及固件三种实现方式中任意一种或以其中任意几种的适当组合来实现。或者,根据本公开实施例的模块、子模块、单元、子单元中的一个或多个可以至少被部分地实现为计算机程序模块,当该计算机程序模块被运行时,可以执行相应的功能。Any number of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure, or at least part of the functions of any number of them, may be implemented in one module. Any one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure may be split into multiple modules for implementation. Any one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure may be implemented at least partially as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on a chip, a system on a substrate, a system on a package, an application-specific integrated circuit (ASIC), or by hardware or firmware in any other reasonable manner of integrating or packaging circuits, or by any one of, or an appropriate combination of, the three implementation manners of software, hardware, and firmware. Alternatively, one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure may be implemented at least partially as a computer program module that, when executed, performs the corresponding functions.
例如,第一映射模块510、第一处理模块520、确定模块530、第二处理模块540和训练模块550,或者,第二映射模块610和第三处理模块620中的任意多个可以合并在一个模块/单元/子单元中实现,或者其中的任意一个模块/单元/子单元可以被拆分成多个模块/单元/子单元。或者,这些模块/单元/子单元中的一个或多个模块/单元/子单元的至少部分功能可以与其他模块/单元/子单元的至少部分功能相结合,并在一个模块/单元/子单元中实现。根据本公开的实施例,第一映射模块510、第一处理模块520、确定模块530、第二处理模块540和训练模块550,或者,第二映射模块610和第三处理模块620中的至少一个可以至少被部分地实现为硬件电路,例如现场可编程门阵列(FPGA)、可编程逻辑阵列(PLA)、片上系统、基板上的系统、封装上的系统、专用集成电路(ASIC),或可以通过对电路进行集成或封装的任何其他的合理方式等硬件或固件来实现,或以软件、硬件以及固件三种实现方式中任意一种或以其中任意几种的适当组合来实现。或者,第一映射模块510、第一处理模块520、确定模块530、第二处理模块540和训练模块550,或者,第二映射模块610和第三处理模块620中的至少一个可以至少被部分地实现为计算机程序模块,当该计算机程序模块被运行时,可以执行相应的功能。For example, any number of the first mapping module 510, the first processing module 520, the determination module 530, the second processing module 540 and the training module 550, or the second mapping module 610 and the third processing module 620, may be combined and implemented in one module/unit/sub-unit, or any one of these modules/units/sub-units may be split into multiple modules/units/sub-units. Alternatively, at least part of the functions of one or more of these modules/units/sub-units may be combined with at least part of the functions of other modules/units/sub-units and implemented in one module/unit/sub-unit. According to the embodiments of the present disclosure, at least one of the first mapping module 510, the first processing module 520, the determination module 530, the second processing module 540 and the training module 550, or the second mapping module 610 and the third processing module 620, may be implemented at least partially as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on a chip, a system on a substrate, a system on a package, an application-specific integrated circuit (ASIC), or by hardware or firmware in any other reasonable manner of integrating or packaging circuits, or by any one of, or an appropriate combination of, the three implementation manners of software, hardware, and firmware. Alternatively, at least one of the above modules may be implemented at least partially as a computer program module that, when executed, performs the corresponding functions.
图7示意性示出了根据本公开实施例的适于实现点云语义分割网络训练方法或点云语义分割方法的电子设备的框图。图7示出的电子设备仅仅是一个示例,不应对本公开实施例的功能和使用范围带来任何限制。FIG. 7 schematically shows a block diagram of an electronic device suitable for implementing a point cloud semantic segmentation network training method or a point cloud semantic segmentation method according to an embodiment of the present disclosure. The electronic device shown in FIG. 7 is only an example and should not impose any limitations on the functions and scope of use of the embodiments of the present disclosure.
如图7所示,根据本公开实施例的计算机电子设备700包括处理器701,其可以根据存储在只读存储器(ROM)702中的程序或者从存储部分708加载到随机访问存储器(RAM)703中的程序而执行各种适当的动作和处理。处理器701例如可以包括通用微处理器(例如CPU)、指令集处理器和/或相关芯片组和/或专用微处理器(例如,专用集成电路(ASIC)),等等。处理器701还可以包括用于缓存用途的板载存储器。处理器701可以包括用于执行根据本公开实施例的方法流程的不同动作的单一处理单元或者是多个处理单元。As shown in FIG. 7, the computer electronic device 700 according to an embodiment of the present disclosure includes a processor 701, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage portion 708 into a random access memory (RAM) 703. The processor 701 may include, for example, a general-purpose microprocessor (e.g., a CPU), an instruction set processor and/or a related chipset, and/or a special-purpose microprocessor (e.g., an application-specific integrated circuit (ASIC)), and so on. The processor 701 may also include onboard memory for caching purposes. The processor 701 may include a single processing unit or multiple processing units for performing the different actions of the method flow according to the embodiments of the present disclosure.
在RAM 703中,存储有电子设备700操作所需的各种程序和数据。处理器701、ROM 702以及RAM 703通过总线704彼此相连。处理器701通过执行ROM 702和/或RAM 703中的程序来执行根据本公开实施例的方法流程的各种操作。需要注意,所述程序也可以存储在除ROM 702和RAM 703以外的一个或多个存储器中。处理器701也可以通过执行存储在所述一个或多个存储器中的程序来执行根据本公开实施例的方法流程的各种操作。In the RAM 703, various programs and data required for the operation of the electronic device 700 are stored. The processor 701, ROM 702 and RAM 703 are connected to each other through a bus 704. The processor 701 performs various operations according to the method flow of the embodiment of the present disclosure by executing programs in the ROM 702 and/or RAM 703. It should be noted that the program may also be stored in one or more memories other than ROM 702 and RAM 703. The processor 701 may also perform various operations according to the method flow of embodiments of the present disclosure by executing programs stored in the one or more memories.
根据本公开的实施例,电子设备700还可以包括输入/输出(I/O)接口705,输入/输出(I/O)接口705也连接至总线704。电子设备700还可以包括连接至I/O接口705的以下部件中的一项或多项:包括键盘、鼠标等的输入部分706;包括诸如阴极射线管(CRT)、液晶显示器(LCD)等以及扬声器等的输出部分707;包括硬盘等的存储部分708;以及包括诸如LAN卡、调制解调器等的网络接口卡的通信部分709。通信部分709经由诸如因特网的网络执行通信处理。驱动器710也根据需要连接至I/O接口705。可拆卸介质711,诸如磁盘、光盘、磁光盘、半导体存储器等等,根据需要安装在驱动器710上,以便于从其上读出的计算机程序根据需要被安装入存储部分708。According to embodiments of the present disclosure, the electronic device 700 may further include an input/output (I/O) interface 705, which is also connected to the bus 704. The electronic device 700 may also include one or more of the following components connected to the I/O interface 705: an input portion 706 including a keyboard, a mouse, etc.; an output portion 707 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., as well as a speaker, etc.; a storage portion 708 including a hard disk, etc.; and a communication portion 709 including a network interface card such as a LAN card or a modem. The communication portion 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is installed on the drive 710 as needed, so that a computer program read therefrom is installed into the storage portion 708 as needed.
根据本公开的实施例,根据本公开实施例的方法流程可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括承载在计算机可读存储介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的程序代码。在这样的实施例中,该计算机程序可以通过通信部分709从网络上被下载和安装,和/或从可拆卸介质711被安装。在该计算机程序被处理器701执行时,执行本公开实施例的系统中限定的上述功能。根据本公开的实施例,上文描述的系统、设备、装置、模块、单元等可以通过计算机程序模块来实现。According to embodiments of the present disclosure, the method flow according to the embodiments of the present disclosure may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product including a computer program carried on a computer-readable storage medium, the computer program containing program code for performing the method illustrated in the flowchart. In such embodiments, the computer program may be downloaded and installed from the network via communication portion 709 and/or installed from removable media 711 . When the computer program is executed by the processor 701, the above-described functions defined in the system of the embodiment of the present disclosure are performed. According to embodiments of the present disclosure, the systems, devices, devices, modules, units, etc. described above may be implemented by computer program modules.
本公开还提供了一种计算机可读存储介质,该计算机可读存储介质可以是上述实施例中描述的设备/装置/系统中所包含的;也可以是单独存在,而未装配入该设备/装置/系统中。上述计算机可读存储介质承载有一个或者多个程序,当上述一个或者多个程序被执行时,实现根据本公开实施例的方法。The present disclosure also provides a computer-readable storage medium. The computer-readable storage medium may be included in the device/apparatus/system described in the above embodiments, or it may exist alone without being assembled into the device/apparatus/system. The above computer-readable storage medium carries one or more programs, and when the one or more programs are executed, the method according to the embodiments of the present disclosure is implemented.
根据本公开的实施例,计算机可读存储介质可以是非易失性的计算机可读存储介质。例如可以包括但不限于:便携式计算机磁盘、硬盘、随机访问存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、便携式紧凑磁盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本公开中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium. Examples may include, but are not limited to: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
例如,根据本公开的实施例,计算机可读存储介质可以包括上文描述的ROM 702和/或RAM 703和/或ROM 702和RAM 703以外的一个或多个存储器。For example, according to embodiments of the present disclosure, the computer-readable storage medium may include one or more memories other than ROM 702 and/or RAM 703 and/or ROM 702 and RAM 703 described above.
本公开的实施例还包括一种计算机程序产品,其包括计算机程序,该计算机程序包含用于执行本公开实施例所提供的方法的程序代码,当计算机程序产品在电子设备上运行时,该程序代码用于使电子设备实现本公开实施例所提供的点云语义分割网络训练方法或点云语义分割方法。Embodiments of the present disclosure also include a computer program product, which includes a computer program containing program code for executing the method provided by the embodiments of the present disclosure. When the computer program product is run on an electronic device, the program code is used to enable the electronic device to implement the point cloud semantic segmentation network training method or the point cloud semantic segmentation method provided by the embodiments of the present disclosure.
在该计算机程序被处理器701执行时,执行本公开实施例的系统/装置中限定的上述功能。根据本公开的实施例,上文描述的系统、装置、模块、单元等可以通过计算机程序模块来实现。When the computer program is executed by the processor 701, the above-mentioned functions defined in the system/device of the embodiment of the present disclosure are performed. According to embodiments of the present disclosure, the systems, devices, modules, units, etc. described above may be implemented by computer program modules.
在一种实施例中,该计算机程序可以依托于光存储器件、磁存储器件等有形存储介质。在另一种实施例中,该计算机程序也可以在网络介质上以信号的形式进行传输、分发,并通过通信部分709被下载和安装,和/或从可拆卸介质711被安装。该计算机程序包含的程序代码可以用任何适当的网络介质传输,包括但不限于:无线、有线等等,或者上述的任意合适的组合。In one embodiment, the computer program may rely on tangible storage media such as optical storage devices and magnetic storage devices. In another embodiment, the computer program can also be transmitted and distributed in the form of a signal on a network medium, and downloaded and installed through the communication part 709, and/or installed from the removable medium 711. The program code contained in the computer program can be transmitted using any appropriate network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the above.
根据本公开的实施例,可以以一种或多种程序设计语言的任意组合来编写用于执行本公开实施例提供的计算机程序的程序代码,具体地,可以利用高级过程和/或面向对象的编程语言、和/或汇编/机器语言来实施这些计算程序。程序设计语言包括但不限于诸如Java,C++,python,“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算设备上执行、部分地在用户设备上执行、部分在远程计算设备上执行、或者完全在远程计算设备或服务器上执行。在涉及远程计算设备的情形中,远程计算设备可以通过任意种类的网络,包括局域网(LAN)或广域网(WAN),连接到用户计算设备,或者,可以连接到外部计算设备(例如利用因特网服务提供商来通过因特网连接)。According to the embodiments of the present disclosure, the program code for executing the computer program provided by the embodiments of the present disclosure may be written in any combination of one or more programming languages; specifically, these computing programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. Programming languages include, but are not limited to, Java, C++, python, the "C" language, or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on a remote computing device or server. In the case involving a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
附图中的流程图和框图,图示了按照本公开各种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分,上述模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个接连地表示的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图或流程图中的每个方框、以及框图或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。本领域技术人员可以理解,本公开的各个实施例和/或权利要求中记载的特征可以进行多种组合和/或结合,即使这样的组合或结合没有明确记载于本公开中。特别地,在不脱离本公开精神和教导的情况下,本公开的各个实施例和/或权利要求中记载的特征可以进行多种组合和/或结合。所有这些组合和/或结合均落入本公开的范围。The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, can be implemented by a special-purpose hardware-based system that performs the specified functions or operations, or by a combination of special-purpose hardware and computer instructions. Those skilled in the art will understand that the features recited in the various embodiments and/or claims of the present disclosure may be combined in various ways, even if such combinations are not explicitly recited in the present disclosure. In particular, without departing from the spirit and teachings of the present disclosure, the features recited in the various embodiments and/or claims of the present disclosure may be combined in various ways. All such combinations fall within the scope of the present disclosure.
以上对本公开的实施例进行了描述。但是,这些实施例仅仅是为了说明的目的,而并非为了限制本公开的范围。尽管在以上分别描述了各实施例,但是这并不意味着各个实施例中的措施不能有利地结合使用。本公开的范围由所附权利要求及其等同物限定。不脱离本公开的范围,本领域技术人员可以做出多种替代和修改,这些替代和修改都应落在本公开的范围之内。The embodiments of the present disclosure have been described above. However, these embodiments are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the various embodiments cannot be advantageously used in combination. The scope of the present disclosure is defined by the appended claims and their equivalents. Without departing from the scope of the present disclosure, those skilled in the art may make various substitutions and modifications, and all such substitutions and modifications should fall within the scope of the present disclosure.

Claims (14)

  1. A point cloud semantic segmentation network training method, comprising:
    mapping multiple sets of point cloud data into an initial view, respectively, to obtain multiple surround projection images;
    partitioning a first surround projection image and a second surround projection image, respectively, based on a preset size, to obtain multiple first partition images and multiple second partition images, wherein the first surround projection image and the second surround projection image belong to the multiple surround projection images;
    determining multiple first target partition images from the multiple first partition images;
    replacing a second target partition image in the second surround projection image with each of the multiple first target partition images, respectively, to obtain a mixed projection image, wherein the second target partition image belongs to the multiple second partition images, and the first target partition image and the second target partition image are at the same position; and
    training an initial network using the first surround projection image and the mixed projection image as training samples to obtain a point cloud semantic segmentation network.
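The partition replacement in claim 1 amounts to a CutMix-style augmentation on surround projection images: patches from one projection are copied into the same positions of another. A minimal Python sketch, where the vertical-strip layout, sizes, and all names are illustrative assumptions rather than the patent's exact partitioning:

```python
import random

def mix_projections(proj_a, proj_b, patch_w, num_patches, rng=None):
    """Split two equal-size surround projections into vertical strips of
    width patch_w, choose num_patches strip indices, and copy those strips
    from proj_a into a copy of proj_b at the same positions (same-position
    replacement, as in claim 1). Projections are 2D lists of cell features."""
    rng = rng or random.Random(0)
    h, w = len(proj_a), len(proj_a[0])
    mixed = [row[:] for row in proj_b]          # start from projection B
    n_strips = w // patch_w
    chosen = rng.sample(range(n_strips), num_patches)
    for s in chosen:
        for r in range(h):
            for c in range(s * patch_w, (s + 1) * patch_w):
                mixed[r][c] = proj_a[r][c]      # strip from projection A
    return mixed, chosen
```

The first (clean) projection and the mixed projection together would then form one training sample pair for the initial network.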
  2. The method according to claim 1, wherein training the initial network using the first surround projection image and the mixed projection image as training samples to obtain the point cloud semantic segmentation network comprises:
    inputting the first surround projection image and the mixed projection image into the initial network, respectively, to obtain a first feature map and a first segmentation result corresponding to the first surround projection image, and a second feature map and a second segmentation result corresponding to the mixed projection image;
    calculating an information entropy loss between the first feature map and the second feature map to obtain a first loss value;
    calculating a cross-entropy loss between the first segmentation result and the second segmentation result to obtain a second loss value; and
    adjusting model parameters of the initial network using the first loss value and the second loss value to finally obtain the point cloud semantic segmentation network.
  3. The method according to claim 2, wherein calculating the information entropy loss between the first feature map and the second feature map to obtain the first loss value comprises:
    determining, from the first feature map, a first sub-feature map related to the multiple first target partition images;
    splitting the second feature map into a second sub-feature map related to the multiple first target partition images and a third sub-feature map unrelated to the multiple first target partition images; and
    in a case where a confidence probability of the first sub-feature map is greater than a preset threshold, taking the first sub-feature map and the second sub-feature map as a positive sample pair and the first sub-feature map and the third sub-feature map as a negative sample pair, and calculating an information entropy loss between the positive sample pair and the negative sample pair to obtain the first loss value.
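One common way to realize a loss over one positive and one negative feature pair is an InfoNCE-style contrast; the sketch below is an interpretation of claim 3's "information entropy loss" under that assumption (the exact loss form and the vector inputs are not specified by the claim):

```python
import math

def info_pair_loss(anchor, positive, negative):
    """InfoNCE-style contrast over one positive and one negative pair:
    pull the anchor feature toward the positive feature and push it away
    from the negative one. Features are plain Python vectors here."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    s_pos = math.exp(dot(anchor, positive))   # similarity to positive pair
    s_neg = math.exp(dot(anchor, negative))   # similarity to negative pair
    return -math.log(s_pos / (s_pos + s_neg))
```

In the claim's terms, the anchor would be the first sub-feature map, the positive the second sub-feature map, and the negative the third, applied only when the anchor's confidence exceeds the preset threshold.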
  4. The method according to claim 2, wherein calculating the cross-entropy loss between the first segmentation result and the second segmentation result to obtain the second loss value comprises:
    determining, from the first segmentation result, a first sub-segmentation result related to the multiple first target partition images;
    determining, from the second segmentation result, a second sub-segmentation result related to the multiple first target partition images;
    determining a predicted value and a label value based on a confidence probability of the first sub-segmentation result and a confidence probability of the second sub-segmentation result; and
    calculating a cross-entropy loss between the predicted value and the label value to obtain the second loss value.
  5. The method according to claim 4, wherein determining the predicted value and the label value based on the confidence probability of the first sub-segmentation result and the confidence probability of the second sub-segmentation result comprises:
    in a case where the confidence probability of the first sub-segmentation result is greater than the confidence probability of the second sub-segmentation result, determining the first sub-segmentation result as the label value and the second sub-segmentation result as the predicted value; and
    in a case where the confidence probability of the first sub-segmentation result is less than the confidence probability of the second sub-segmentation result, determining the first sub-segmentation result as the predicted value and the second sub-segmentation result as the label value.
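The confidence-based role assignment of claims 4 and 5 can be sketched as mutual pseudo-labeling: per position, the more confident branch supplies a hard label and the less confident branch is penalized against it. The per-position probability-list layout below is an illustrative assumption:

```python
import math

def pick_label_and_prediction(p_clean, p_mixed):
    """Claim 5's rule: at each position, the more confident branch supplies
    the label (its argmax class) and the other branch's distribution is
    treated as the prediction. Inputs are per-position class-probability
    lists from the clean and mixed projection branches."""
    labels, preds = [], []
    for pc, pm in zip(p_clean, p_mixed):
        if max(pc) >= max(pm):                 # clean branch more confident
            labels.append(pc.index(max(pc)))
            preds.append(pm)
        else:                                  # mixed branch more confident
            labels.append(pm.index(max(pm)))
            preds.append(pc)
    return labels, preds

def cross_entropy(pred_probs, label_idx):
    """Cross-entropy of one predicted distribution against a hard label."""
    return -math.log(max(pred_probs[label_idx], 1e-12))
```

Averaging `cross_entropy` over all positions would give the second loss value of claim 4.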
  6. The method according to claim 2, wherein the multiple first target partition images comprise a third target partition image, and the third target partition image has a ground-truth label;
    wherein calculating the cross-entropy loss between the first segmentation result and the second segmentation result to obtain the second loss value comprises:
    determining, from the first segmentation result, a third sub-segmentation result related to the third target partition image, and a fourth sub-segmentation result unrelated to the third target partition image and related to the multiple first target partition images;
    determining, from the second segmentation result, a fifth sub-segmentation result related to the third target partition image, and a sixth sub-segmentation result unrelated to the third target partition image and related to the multiple first target partition images;
    calculating a cross-entropy loss between the third sub-segmentation result and the ground-truth label to obtain a third loss value;
    calculating a cross-entropy loss between the fourth sub-segmentation result and the sixth sub-segmentation result to obtain a fourth loss value; and
    determining the second loss value based on the third loss value and the fourth loss value.
  7. The method according to claim 2, wherein the initial network comprises an encoder and a decoder;
    wherein inputting the first surround projection image and the mixed projection image into the initial network, respectively, to obtain the first feature map and the first segmentation result corresponding to the first surround projection image, and the second feature map and the second segmentation result corresponding to the mixed projection image comprises:
    inputting the first surround projection image and the mixed projection image into the encoder, respectively, to obtain a first image feature corresponding to the first surround projection image and a second image feature corresponding to the mixed projection image; and
    inputting the first image feature and the second image feature into the decoder, respectively, to obtain the first feature map and the first segmentation result corresponding to the first surround projection image, and the second feature map and the second segmentation result corresponding to the mixed projection image.
  8. The method according to claim 1, wherein mapping the multiple sets of point cloud data into the initial view, respectively, to obtain the multiple surround projection images comprises:
    for each set of point cloud data, performing polar coordinate conversion on three-dimensional coordinate data of each point in the point cloud data to obtain polar coordinate data of each point in the point cloud data;
    mapping multiple points in the point cloud data to multiple grid cells of the initial view, respectively, based on the polar coordinate data of each point in the point cloud data;
    for each grid cell of the initial view, determining feature data of the grid cell based on the three-dimensional coordinate data and the polar coordinate data of the points in the grid cell; and
    constructing the surround projection image based on the feature data of the multiple grid cells.
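A common realization of this point-to-grid mapping is a spherical (range-image) projection: azimuth selects the column, elevation selects the row. The sketch below follows that convention; the grid size, field-of-view bounds, and per-cell feature tuple are illustrative assumptions, not values taken from the patent:

```python
import math

def project_to_grid(points, h=32, w=512, fov_up=15.0, fov_down=-25.0):
    """Map (x, y, z) points to (row, col) cells of an h x w surround view.
    Each cell collects the 3D coordinates plus the polar quantities
    (range, azimuth, elevation) of its points, from which per-cell
    feature data could then be aggregated."""
    cells = {}
    fu, fd = math.radians(fov_up), math.radians(fov_down)
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        az = math.atan2(y, x)                       # azimuth in [-pi, pi]
        el = math.asin(z / r)                       # elevation angle
        col = min(w - 1, int((az / math.pi + 1.0) * 0.5 * w))
        row = max(0, min(h - 1, int((fu - el) / (fu - fd) * h)))
        cells.setdefault((row, col), []).append((x, y, z, r, az, el))
    return cells
```

Stacking the per-cell features over the full h x w grid would yield one surround projection image for the set of point cloud data.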
  9. A point cloud semantic segmentation method, comprising:
    mapping target point cloud data into an initial view to obtain a surround projection image; and
    inputting the surround projection image into a point cloud semantic segmentation network to obtain a semantic segmentation feature map of the target point cloud data;
    wherein the point cloud semantic segmentation network is trained using the point cloud semantic segmentation network training method according to any one of claims 1 to 8.
  10. A point cloud semantic segmentation network training apparatus, comprising:
    a first mapping module, configured to map multiple sets of point cloud data into an initial view, respectively, to obtain multiple surround projection images;
    a first processing module, configured to partition a first surround projection image and a second surround projection image, respectively, based on a preset size, to obtain multiple first partition images and multiple second partition images, wherein the first surround projection image and the second surround projection image belong to the multiple surround projection images;
    a determining module, configured to determine multiple first target partition images from the multiple first partition images;
    a second processing module, configured to replace a second target partition image in the second surround projection image with each of the multiple first target partition images, respectively, to obtain a mixed projection image, wherein the second target partition image belongs to the multiple second partition images, and the first target partition image and the second target partition image are at the same position; and
    a training module, configured to train an initial network using the first surround projection image and the mixed projection image as training samples to obtain a point cloud semantic segmentation network.
  11. A point cloud semantic segmentation apparatus, comprising:
    a second mapping module, configured to map target point cloud data into an initial view to obtain a surround projection image; and
    a third processing module, configured to input the surround projection image into a point cloud semantic segmentation network to obtain a semantic segmentation feature map of the target point cloud data;
    wherein the point cloud semantic segmentation network is trained using the point cloud semantic segmentation network training method according to any one of claims 1 to 8.
  12. An electronic device, comprising:
    one or more processors; and
    a memory, configured to store one or more instructions,
    wherein, when the one or more instructions are executed by the one or more processors, the one or more processors are caused to implement the method according to any one of claims 1 to 9.
  13. A computer-readable storage medium having executable instructions stored thereon, wherein the executable instructions, when executed by a processor, cause the processor to implement the method according to any one of claims 1 to 9.
  14. A computer program product, comprising computer-executable instructions which, when executed, implement the method according to any one of claims 1 to 9.
PCT/CN2023/082749 2022-08-24 2023-03-21 Point cloud semantic segmentation network training method, and point cloud semantic segmentation method and apparatus WO2024040954A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211022552.3 2022-08-24
CN202211022552.3A CN115375899A (en) 2022-08-24 2022-08-24 Point cloud semantic segmentation network training method, point cloud semantic segmentation method and point cloud semantic segmentation device

Publications (1)

Publication Number Publication Date
WO2024040954A1 true WO2024040954A1 (en) 2024-02-29

Family

ID=84068279

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/082749 WO2024040954A1 (en) 2022-08-24 2023-03-21 Point cloud semantic segmentation network training method, and point cloud semantic segmentation method and apparatus

Country Status (2)

Country Link
CN (1) CN115375899A (en)
WO (1) WO2024040954A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115375899A (en) * 2022-08-24 2022-11-22 北京京东乾石科技有限公司 Point cloud semantic segmentation network training method, point cloud semantic segmentation method and point cloud semantic segmentation device
CN116721399B (en) * 2023-07-26 2023-11-14 之江实验室 Point cloud target detection method and device for quantitative perception training

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008941A (en) * 2019-06-05 2019-07-12 长沙智能驾驶研究院有限公司 Drivable region detection method, device, computer equipment and storage medium
CN111354478A (en) * 2018-12-24 2020-06-30 黄庆武整形医生集团(深圳)有限公司 Shaping simulation information processing method, shaping simulation terminal and shaping service terminal
CN113421217A (en) * 2020-03-02 2021-09-21 北京京东乾石科技有限公司 Method and device for detecting travelable area
CN113496491A (en) * 2020-03-19 2021-10-12 广州汽车集团股份有限公司 Road surface segmentation method and device based on multi-line laser radar
KR102334177B1 (en) * 2020-07-21 2021-12-03 대한민국 Method and system for establishing 3-dimensional indoor information for indoor evacuation
CN115375899A (en) * 2022-08-24 2022-11-22 北京京东乾石科技有限公司 Point cloud semantic segmentation network training method, point cloud semantic segmentation method and point cloud semantic segmentation device


Also Published As

Publication number Publication date
CN115375899A (en) 2022-11-22

Similar Documents

Publication Publication Date Title
WO2024040954A1 (en) Point cloud semantic segmentation network training method, and point cloud semantic segmentation method and apparatus
US10733441B2 (en) Three dimensional bounding box estimation from two dimensional images
US11610115B2 (en) Learning to generate synthetic datasets for training neural networks
CN108304775B (en) Remote sensing image recognition method and device, storage medium and electronic equipment
CN108229479B (en) Training method and device of semantic segmentation model, electronic equipment and storage medium
EP4033453A1 (en) Training method and apparatus for target detection model, device and storage medium
WO2020108311A1 (en) 3d detection method and apparatus for target object, and medium and device
US10817714B2 (en) Method and apparatus for predicting walking behaviors, data processing apparatus, and electronic device
WO2020020146A1 (en) Method and apparatus for processing laser radar sparse depth map, device, and medium
US10210418B2 (en) Object detection system and object detection method
CN110622177B (en) Instance partitioning
WO2020253121A1 (en) Target detection method and apparatus, intelligent driving method and device, and storage medium
US11151447B1 (en) Network training process for hardware definition
US20190080455A1 (en) Method and device for three-dimensional feature-embedded image object component-level semantic segmentation
US10346996B2 (en) Image depth inference from semantic labels
Raghavan et al. Optimized building extraction from high-resolution satellite imagery using deep learning
EP3859560A2 (en) Method and apparatus for visual question answering, computer device and medium
US11967132B2 (en) Lane marking detecting method, apparatus, electronic device, storage medium, and vehicle
CN112927234A (en) Point cloud semantic segmentation method and device, electronic equipment and readable storage medium
US20220222824A1 (en) Fully automated multimodal system architecture for semantic segmentation of large-scale 3d outdoor point cloud data
EP3665614A1 (en) Extraction of spatial-temporal features from a video
WO2022143366A1 (en) Image processing method and apparatus, electronic device, medium, and computer program product
EP4307219A1 (en) Three-dimensional target detection method and apparatus
WO2023083030A1 (en) Posture recognition method and related device
WO2023082588A1 (en) Semantic annotation method and apparatus, electronic device, storage medium, and computer program product

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23856050

Country of ref document: EP

Kind code of ref document: A1