CN115375899A - Point cloud semantic segmentation network training method, point cloud semantic segmentation method and point cloud semantic segmentation device


Info

Publication number: CN115375899A
Application number: CN202211022552.3A
Authority: CN (China)
Legal status: Pending
Prior art keywords: point cloud, sub, map, projection, partition
Other languages: Chinese (zh)
Inventor: 温欣
Current Assignee: Beijing Jingdong Qianshi Technology Co Ltd
Original Assignee: Beijing Jingdong Qianshi Technology Co Ltd
Application filed by Beijing Jingdong Qianshi Technology Co Ltd
Priority to CN202211022552.3A
Publication of CN115375899A
Priority to PCT/CN2023/082749 (WO2024040954A1)

Classifications

    • G06V10/267: Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06N3/04: Neural networks; architecture, e.g. interconnection topology
    • G06N3/08: Neural networks; learning methods
    • G06T3/06: Topological mapping of higher dimensional structures onto lower dimensional surfaces
    • G06T7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06T2207/10028: Range image; depth image; 3D point clouds
    • G06T2207/20081: Training; learning
    • G06T2207/20084: Artificial neural networks [ANN]
    • G06T2207/20112: Image segmentation details


Abstract

The present disclosure provides a point cloud semantic segmentation network training method, a point cloud semantic segmentation method, a point cloud semantic segmentation apparatus, an electronic device, and a storage medium, which can be applied in the field of artificial intelligence. The training method comprises the following steps: mapping a plurality of groups of point cloud data into an initial view respectively to obtain a plurality of surround-view projection maps; partitioning a first surround-view projection map and a second surround-view projection map respectively based on a preset size to obtain a plurality of first partition maps and a plurality of second partition maps; determining a plurality of first target partition maps from the plurality of first partition maps; replacing the second target partition map at the corresponding position in the second surround-view projection map with each of the plurality of first target partition maps to obtain a mixed projection map; and training an initial network by taking the first surround-view projection map and the mixed projection map as training samples to obtain the point cloud semantic segmentation network.

Description

Point cloud semantic segmentation network training method, point cloud semantic segmentation method and point cloud semantic segmentation device
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular to a point cloud semantic segmentation network training method, a point cloud semantic segmentation method, a point cloud semantic segmentation apparatus, an electronic device, and a storage medium.
Background
With the development of three-dimensional sensing technology, point cloud data is widely used in many fields such as autonomous driving and robotic grasping. Deep learning, the mainstream approach to point cloud data analysis, performs well in point cloud data processing. Because the point cloud data collected by various sensors is usually unlabeled and manual labeling is costly, semi-supervised training methods are commonly used in the related art to build deep neural networks.
In the related art, research on semi-supervised training algorithms for semantic segmentation tasks has mainly focused on the field of two-dimensional images. Applying these methods directly to the segmentation of three-dimensional point clouds distorts three-dimensional shapes, which in turn degrades the semantic segmentation of point cloud data.
Disclosure of Invention
In view of the above, the present disclosure provides a point cloud semantic segmentation network training method, a point cloud semantic segmentation method, an apparatus, an electronic device, a readable storage medium, and a computer program product.
One aspect of the present disclosure provides a point cloud semantic segmentation network training method, including: mapping a plurality of groups of point cloud data into an initial view respectively to obtain a plurality of surround-view projection maps; partitioning a first surround-view projection map and a second surround-view projection map respectively based on a preset size to obtain a plurality of first partition maps and a plurality of second partition maps, wherein the first surround-view projection map and the second surround-view projection map belong to the plurality of surround-view projection maps; determining a plurality of first target partition maps from the plurality of first partition maps; replacing a second target partition map in the second surround-view projection map with each of the plurality of first target partition maps to obtain a mixed projection map, wherein the second target partition map belongs to the plurality of second partition maps and is located at the same position as the corresponding first target partition map; and training an initial network by taking the first surround-view projection map and the mixed projection map as training samples to obtain the point cloud semantic segmentation network.
According to an embodiment of the present disclosure, training the initial network by taking the first surround-view projection map and the mixed projection map as training samples to obtain the point cloud semantic segmentation network includes: inputting the first surround-view projection map and the mixed projection map into the initial network respectively to obtain a first feature map and a first segmentation result corresponding to the first surround-view projection map, and a second feature map and a second segmentation result corresponding to the mixed projection map; calculating an information entropy loss between the first feature map and the second feature map to obtain a first loss value; calculating a cross entropy loss between the first segmentation result and the second segmentation result to obtain a second loss value; and adjusting model parameters of the initial network by using the first loss value and the second loss value to obtain the point cloud semantic segmentation network.
According to an embodiment of the present disclosure, the calculating an information entropy loss between the first feature map and the second feature map to obtain a first loss value includes: determining, from the first feature map, a first sub-feature map associated with the plurality of first target partition maps; splitting the second feature map into a second sub-feature map associated with the plurality of first target partition maps and a third sub-feature map not associated with the plurality of first target partition maps; and in a case that the confidence probability of the first sub-feature map is greater than a preset threshold, taking the first sub-feature map and the second sub-feature map as a positive sample pair, taking the first sub-feature map and the third sub-feature map as a negative sample pair, and calculating the information entropy loss between the positive sample pair and the negative sample pair to obtain the first loss value.
According to an embodiment of the present disclosure, the calculating a cross entropy loss between the first segmentation result and the second segmentation result to obtain a second loss value includes: determining, from the first segmentation result, a first sub-segmentation result related to the plurality of first target partition maps; determining, from the second segmentation result, a second sub-segmentation result related to the plurality of first target partition maps; determining a predicted value and a label value based on the confidence probability of the first sub-segmentation result and the confidence probability of the second sub-segmentation result; and calculating the cross entropy loss between the predicted value and the label value to obtain the second loss value.
According to an embodiment of the present disclosure, the determining a predicted value and a label value based on the confidence probability of the first sub-segmentation result and the confidence probability of the second sub-segmentation result includes: determining the first sub-segmentation result as the label value and the second sub-segmentation result as the predicted value in a case that the confidence probability of the first sub-segmentation result is greater than the confidence probability of the second sub-segmentation result; and determining the first sub-segmentation result as the predicted value and the second sub-segmentation result as the label value in a case that the confidence probability of the first sub-segmentation result is less than the confidence probability of the second sub-segmentation result.
According to an embodiment of the present disclosure, the plurality of first target partition maps include a third target partition map, and the third target partition map has a real label; the calculating the cross entropy loss between the first segmentation result and the second segmentation result to obtain a second loss value comprises: determining, from the first segmentation result, a third sub-segmentation result related to the third target partition map and a fourth sub-segmentation result unrelated to the third target partition map but related to the plurality of first target partition maps; determining, from the second segmentation result, a fifth sub-segmentation result related to the third target partition map and a sixth sub-segmentation result unrelated to the third target partition map but related to the plurality of first target partition maps; calculating the cross entropy loss between the third sub-segmentation result and the real label to obtain a third loss value; calculating the cross entropy loss between the fourth sub-segmentation result and the sixth sub-segmentation result to obtain a fourth loss value; and determining the second loss value based on the third loss value and the fourth loss value.
According to an embodiment of the present disclosure, the initial network includes an encoder and a decoder; the inputting the first surround-view projection map and the mixed projection map into the initial network to obtain a first feature map and a first segmentation result corresponding to the first surround-view projection map, and a second feature map and a second segmentation result corresponding to the mixed projection map comprises: inputting the first surround-view projection map and the mixed projection map into the encoder respectively to obtain a first image feature corresponding to the first surround-view projection map and a second image feature corresponding to the mixed projection map; and inputting the first image feature and the second image feature into the decoder respectively to obtain the first feature map and the first segmentation result corresponding to the first surround-view projection map, and the second feature map and the second segmentation result corresponding to the mixed projection map.
According to an embodiment of the present disclosure, the mapping the plurality of groups of point cloud data into the initial view respectively to obtain a plurality of surround-view projection maps includes: for each group of point cloud data, performing polar coordinate conversion on the three-dimensional coordinate data of each point in the point cloud data to obtain polar coordinate data of each point; mapping a plurality of points in the point cloud data into a plurality of grids of the initial view based on the polar coordinate data of each point; for each grid of the initial view, determining feature data of the grid based on the three-dimensional coordinate data and polar coordinate data of the points in the grid; and constructing the surround-view projection map based on the feature data of the plurality of grids.
Another aspect of the present disclosure provides a point cloud semantic segmentation method, including: mapping target point cloud data into an initial view to obtain a surround-view projection map; and inputting the surround-view projection map into a point cloud semantic segmentation network to obtain a semantic segmentation feature map of the target point cloud data, wherein the point cloud semantic segmentation network is trained by using the point cloud semantic segmentation network training method described above.
Another aspect of the present disclosure provides a point cloud semantic segmentation network training apparatus, including: a first mapping module, configured to map a plurality of groups of point cloud data into an initial view respectively to obtain a plurality of surround-view projection maps; a first processing module, configured to partition a first surround-view projection map and a second surround-view projection map respectively based on a preset size to obtain a plurality of first partition maps and a plurality of second partition maps, wherein the first surround-view projection map and the second surround-view projection map belong to the plurality of surround-view projection maps; a determining module, configured to determine a plurality of first target partition maps from the plurality of first partition maps; a second processing module, configured to replace a second target partition map in the second surround-view projection map with each of the plurality of first target partition maps to obtain a mixed projection map, wherein the second target partition map belongs to the plurality of second partition maps and is located at the same position as the corresponding first target partition map; and a training module, configured to train an initial network by taking the first surround-view projection map and the mixed projection map as training samples to obtain the point cloud semantic segmentation network.
Another aspect of the present disclosure provides a point cloud semantic segmentation apparatus, including: a second mapping module, configured to map target point cloud data into an initial view to obtain a surround-view projection map; and a third processing module, configured to input the surround-view projection map into a point cloud semantic segmentation network to obtain a semantic segmentation feature map of the target point cloud data, wherein the point cloud semantic segmentation network is trained by using the point cloud semantic segmentation network training method described above.
Another aspect of the present disclosure provides an electronic device including: one or more processors; a memory for storing one or more instructions, wherein the one or more instructions, when executed by the one or more processors, cause the one or more processors to implement the method as described above.
Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions for implementing the method as described above when executed.
Another aspect of the disclosure provides a computer program product comprising computer executable instructions for implementing the method as described above when executed.
According to the embodiments of the present disclosure, when training the point cloud semantic segmentation network, point cloud data can be mapped into surround-view projection maps, and a first surround-view projection map and a second surround-view projection map can be mixed by partition, that is, part of the partitions in the first surround-view projection map replace the corresponding partitions in the second surround-view projection map to obtain a mixed projection map; the initial network can then be trained with the mixed projection map and the first surround-view projection map to obtain the point cloud semantic segmentation network. Mixing by partition forcibly decouples the replaced partitions from their background, which effectively enriches the data, reduces the network's dependence on background and global information when predicting local areas, and improves the recognition capability of the network. Meanwhile, mixing by partition effectively preserves the three-dimensional shape of the original point cloud as projected onto the surround-view projection map, which can at least partially solve the problems of three-dimensional deformation and loss of shape information caused by data enhancement, and can improve the robustness of the network. These technical means can effectively improve the utilization efficiency of hardware resources during network training.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:
fig. 1 schematically illustrates an exemplary system architecture to which a point cloud semantic segmentation network training method, a point cloud semantic segmentation method, and an apparatus may be applied according to an embodiment of the present disclosure.
Fig. 2 schematically illustrates a flow chart of a point cloud semantic segmentation network training method according to an embodiment of the present disclosure.
Fig. 3 schematically illustrates a schematic diagram of a training flow of a point cloud semantic segmentation network according to an embodiment of the present disclosure.
Fig. 4 schematically shows a flow chart of a point cloud semantic segmentation method according to an embodiment of the present disclosure.
Fig. 5 schematically illustrates a block diagram of a point cloud semantic segmentation network training apparatus according to an embodiment of the present disclosure.
Fig. 6 schematically illustrates a block diagram of a point cloud semantic segmentation apparatus according to an embodiment of the present disclosure.
Fig. 7 schematically illustrates a block diagram of an electronic device adapted to implement a point cloud semantic segmentation network training method or a point cloud semantic segmentation method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
In those instances where a convention analogous to "at least one of A, B, and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B, and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). Where a convention analogous to "at least one of A, B, or C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B, or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.).
In the field of autonomous driving, using deep learning to sense and recognize the surrounding environment is an extremely important fundamental research topic. However, the deep neural networks built with deep learning techniques often require a large amount of manually labeled data for training, and the cost and time consumed by this manual labeling often become a barrier to improving the performance of deep neural network models. On the other hand, during the driving of an unmanned vehicle, a large amount of raw unlabeled data can be collected by various sensors. Therefore, how to train a neural network using raw unlabeled data assisted by a small amount of labeled data, that is, improving the recognition and classification performance of the neural network through semi-supervised training, is an important research task that can improve efficiency and reduce costs in the development of autonomous driving systems.
In the related art, research on using semi-supervised training algorithms to improve semantic segmentation tasks has mainly focused on the field of two-dimensional images. Research on semi-supervised training algorithms for three-dimensional point cloud scenes, in particular for three-dimensional point cloud semantic segmentation models based on lidar scans, is still at a blank stage. Due to the modal difference between two-dimensional images and three-dimensional point clouds, semi-supervised training algorithms for semantic segmentation of two-dimensional images cannot be directly and effectively transplanted to three-dimensional point cloud semantic segmentation tasks. For example, when performing semantic segmentation on a three-dimensional point cloud through a surround-view projection map, conventional two-dimensional image data enhancement methods, such as noise injection, rotation, and scaling, distort the three-dimensional shape of the point cloud and thus harm the training of the model.
In view of this, the embodiments of the present disclosure provide a method that can perform semi-supervised training of a point cloud semantic segmentation network by effectively using a large amount of raw lidar point cloud data assisted by a small amount of labeled data.
Specifically, embodiments of the present disclosure provide a point cloud semantic segmentation network training method, a point cloud semantic segmentation method, an apparatus, an electronic device, and a storage medium. The point cloud semantic segmentation network training method includes: mapping a plurality of groups of point cloud data into an initial view respectively to obtain a plurality of surround-view projection maps; partitioning a first surround-view projection map and a second surround-view projection map respectively based on a preset size to obtain a plurality of first partition maps and a plurality of second partition maps, wherein the first surround-view projection map and the second surround-view projection map belong to the plurality of surround-view projection maps; determining a plurality of first target partition maps from the plurality of first partition maps; replacing a second target partition map in the second surround-view projection map with each of the plurality of first target partition maps to obtain a mixed projection map, wherein the second target partition map belongs to the plurality of second partition maps and is located at the same position as the corresponding first target partition map; and training an initial network by taking the first surround-view projection map and the mixed projection map as training samples to obtain the point cloud semantic segmentation network.
Fig. 1 schematically illustrates an exemplary system architecture to which a point cloud semantic segmentation network training method, a point cloud semantic segmentation method, and an apparatus may be applied according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105.
The terminal devices 101, 102, and 103 may be various devices configured with a laser radar, or may be various electronic devices capable of controlling a laser radar, or may be various electronic devices capable of storing point cloud data.
Network 104 is the medium used to provide communication links between terminal devices 101, 102, 103 and server 105. Network 104 may include various connection types, such as wired and/or wireless communication links, and so forth.
The server 105 may be a server that provides various services, for example, the server may provide support for computing resources and storage resources for training processes of point cloud semantic segmentation networks.
It should be noted that the point cloud semantic segmentation network training method or the point cloud semantic segmentation method provided by the embodiments of the present disclosure may be generally executed by the server 105. Accordingly, the point cloud semantic segmentation network training device or the point cloud semantic segmentation device provided by the embodiments of the present disclosure may be generally disposed in the server 105. The terminal devices 101, 102, and 103 may acquire point cloud data, or the terminal devices 101, 102, and 103 may acquire point cloud data acquired by other terminal devices through the internet or the like, and the point cloud data may be sent to the server 105 through the network, so that the server 105 executes the method provided in the embodiment of the present disclosure, to implement training of the point cloud semantic segmentation network or perform point cloud semantic segmentation on the point cloud data. The point cloud semantic segmentation network training method or the point cloud semantic segmentation method provided by the embodiment of the present disclosure may also be executed by a server or a server cluster that is different from the server 105 and can communicate with the terminal devices 101, 102, 103 and/or the server 105. Correspondingly, the point cloud semantic segmentation network training device or the point cloud semantic segmentation device provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Alternatively, the point cloud semantic segmentation network training method or the point cloud semantic segmentation method provided by the embodiment of the present disclosure may also be executed by the terminal device 101, 102, or 103, or may also be executed by another terminal device different from the terminal device 101, 102, or 103. Accordingly, the point cloud semantic segmentation network training device or the point cloud semantic segmentation device provided by the embodiment of the present disclosure may also be disposed in the terminal device 101, 102, or 103, or in another terminal device different from the terminal device 101, 102, or 103.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 schematically illustrates a flow chart of a point cloud semantic segmentation network training method according to an embodiment of the present disclosure.
As shown in fig. 2, the method includes operations S201 to S205.
In operation S201, a plurality of groups of point cloud data are respectively mapped into an initial view to obtain a plurality of surround-view projection maps.
In operation S202, a first surround-view projection map and a second surround-view projection map are respectively partitioned based on a preset size to obtain a plurality of first partition maps and a plurality of second partition maps, wherein the first surround-view projection map and the second surround-view projection map belong to the plurality of surround-view projection maps.
In operation S203, a plurality of first target partition maps are determined from among the plurality of first partition maps.
In operation S204, a second target partition map in the second surround-view projection map is replaced with each of the plurality of first target partition maps to obtain a mixed projection map, wherein the second target partition map belongs to the plurality of second partition maps and is located at the same position as the corresponding first target partition map.
In operation S205, an initial network is trained by taking the first surround-view projection map and the mixed projection map as training samples to obtain the point cloud semantic segmentation network.
According to the embodiment of the disclosure, the point cloud data can be acquired by using sensing equipment such as a rotary scanning laser radar, each group of point cloud data can be configured with a preset rectangular coordinate system, each point in the point cloud data can be represented as a three-dimensional coordinate under the rectangular coordinate system, and the center of the rectangular coordinate system can represent the position of the sensing equipment when the point cloud data is acquired.
According to embodiments of the present disclosure, point cloud data acquired using a rotary scanning lidar may be distributed within a sphere, and the initial view may be developed from an annular surface of the sphere near a horizontal plane. For each point in the point cloud data, a direction vector of the point in mapping can be determined based on the coordinates of the point, and the point can be projected onto the initial view by using the direction vector.
According to an embodiment of the present disclosure, partitioning a surround-view projection map based on the preset size may divide the map equally into a plurality of rectangular regions. The preset size may be determined according to the size of the surround-view projection map in a specific application scenario, and is not limited herein. For example, for a surround-view projection map with a resolution of 24 × 480, the map may be divided into 6 parts along its height and 16 parts along its width, yielding 96 partition maps each with a resolution of 4 × 30.
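To make the partitioning step concrete, the following sketch splits a surround-view projection map into equal tiles. It assumes the 24 × 480 resolution and 4 × 30 tile size of the example above; the function and variable names are illustrative, not taken from the patent.

```python
import torch

def partition_projection(proj: torch.Tensor, tile_h: int = 4, tile_w: int = 30) -> torch.Tensor:
    """Split a (C, H, W) surround-view projection map into equal tiles.

    With H = 24, W = 480 and 4 x 30 tiles this yields 6 * 16 = 96 partition
    maps, matching the example in the text.
    """
    c, h, w = proj.shape
    assert h % tile_h == 0 and w % tile_w == 0, "preset size must divide the map evenly"
    tiles = proj.unfold(1, tile_h, tile_h).unfold(2, tile_w, tile_w)
    # tiles: (C, H // tile_h, W // tile_w, tile_h, tile_w)
    return tiles.permute(1, 2, 0, 3, 4).reshape(-1, c, tile_h, tile_w)
```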
According to an embodiment of the present disclosure, the first and second surround-view projection maps may be randomly selected from the plurality of surround-view projection maps. The two maps may have completely different features; that is, their corresponding point cloud data may be acquired from different objects in different scenes.
According to an embodiment of the present disclosure, the first target partition maps may be randomly selected from the plurality of first partition maps and may account for a certain proportion of them, for example 25% or 30%, which is not limited herein.
According to an embodiment of the present disclosure, replacing the second target partition map in the second surround-view projection map with each of the plurality of first target partition maps may include: determining the second target partition map from the second surround-view projection map according to the position information of the first target partition map, deleting the second target partition map, and filling the first target partition map into the corresponding position.
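A sketch of this partition-mixing step follows: randomly chosen tiles of the first map overwrite the co-located tiles of the second. The 25% default ratio follows the example above; the function and its defaults are our assumptions, not the patent's code.

```python
import torch

def mix_projections(proj_a: torch.Tensor, proj_b: torch.Tensor,
                    tile_h: int = 4, tile_w: int = 30, ratio: float = 0.25):
    """Replace randomly selected tiles of proj_b (both (C, H, W)) with the
    co-located tiles of proj_a; return the mixed map and a tile mask."""
    _, h, w = proj_a.shape
    n_h, n_w = h // tile_h, w // tile_w
    n_pick = max(1, int(n_h * n_w * ratio))
    picked = torch.randperm(n_h * n_w)[:n_pick]

    mixed = proj_b.clone()
    mask = torch.zeros(n_h, n_w, dtype=torch.bool)  # True where a tile came from proj_a
    for idx in picked.tolist():
        i, j = divmod(idx, n_w)
        mask[i, j] = True
        ys, xs = i * tile_h, j * tile_w
        mixed[:, ys:ys + tile_h, xs:xs + tile_w] = proj_a[:, ys:ys + tile_h, xs:xs + tile_w]
    return mixed, mask
```

The returned mask records which positions hold first-map tiles, which the loss computations described later need in order to split feature maps and segmentation results into related and unrelated regions.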
According to the embodiments of the present disclosure, the method used in training the initial network is not limited herein, and may be, for example, a gradient descent method, a least squares method, or the like. The training parameters, such as training times, batch capacity, learning rate, and the like, set when the initial network is trained may be set according to a specific application scenario, which is not limited herein.
According to the embodiments of the present disclosure, when training the point cloud semantic segmentation network, point cloud data can be mapped into surround-view projection maps, and a first surround-view projection map and a second surround-view projection map can be mixed by partition, that is, part of the partitions in the first surround-view projection map replace the corresponding partitions in the second surround-view projection map to obtain a mixed projection map; the initial network can then be trained with the mixed projection map and the first surround-view projection map to obtain the point cloud semantic segmentation network. Mixing by partition forcibly decouples the replaced partitions from their background, which effectively enriches the data, reduces the network's dependence on background and global information when predicting local areas, and improves the recognition capability of the network. Meanwhile, mixing by partition effectively preserves the three-dimensional shape of the original point cloud as projected onto the surround-view projection map, which can at least partially solve the problems of three-dimensional deformation and loss of shape information caused by data enhancement, and can improve the robustness of the network. These technical means can effectively improve the utilization efficiency of hardware resources during network training.
The method of fig. 2 is further described with reference to fig. 3 in conjunction with specific embodiments.
According to an embodiment of the present disclosure, the surround-view projection map may be obtained by the method of operation S201; specifically, operation S201 may include the following operations:
for each group of point cloud data, performing polar coordinate conversion on the three-dimensional coordinate data of each point in the point cloud data to obtain the polar coordinate data of each point; mapping a plurality of points in the point cloud data into a plurality of grids of the initial view based on the polar coordinate data of each point; for each grid of the initial view, determining feature data of the grid based on the three-dimensional coordinate data and polar coordinate data of the points in the grid; and constructing the surround-view projection map based on the feature data of the plurality of grids.
According to an embodiment of the present disclosure, each point in the point cloud data may have three-dimensional coordinate data, i.e., x, y, and z; performing polar coordinate conversion on the point yields the converted coordinates yaw and pitch in a rotating coordinate system, i.e., the polar coordinate data.
According to embodiments of the present disclosure, a grid of the initial view may refer to the pixel patch corresponding to a single pixel in the initial view. For example, an initial view with a resolution of 20 × 480 has 9600 pixel patches and, accordingly, 9600 grids.
According to an embodiment of the present disclosure, in a case where a plurality of points are mapped in a grid, feature data of a point closest to an origin among the plurality of points may be taken as feature data of the grid. The feature data of the point may include three-dimensional coordinate data, polar coordinate data, and data processed based on the three-dimensional coordinate data and the polar coordinate data, such as reflectivity data, depth data, and the like.
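The projection itself can be sketched as follows. The vertical field of view, the six-channel feature layout, and the names are assumptions; what the sketch keeps from the text is the yaw/pitch conversion, the grid mapping, and the rule that the point closest to the origin supplies a grid cell's features.

```python
import numpy as np

def to_surround_view(points: np.ndarray, h: int = 24, w: int = 480,
                     fov_up_deg: float = 15.0, fov_down_deg: float = -25.0) -> np.ndarray:
    """Project an (N, 3) point cloud onto a (6, H, W) surround-view map.

    Channels per grid cell: x, y, z, yaw, pitch, depth. When several points
    fall into the same cell, the point closest to the origin wins.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)                          # in [-pi, pi]
    pitch = np.arcsin(z / np.maximum(depth, 1e-8))  # elevation angle

    u = ((yaw + np.pi) / (2 * np.pi) * w).astype(int).clip(0, w - 1)
    fov = np.radians(fov_up_deg - fov_down_deg)     # assumed sensor parameter
    v = ((np.radians(fov_up_deg) - pitch) / fov * h).astype(int).clip(0, h - 1)

    proj = np.zeros((6, h, w), dtype=np.float32)
    best = np.full((h, w), np.inf)
    feats = np.stack([x, y, z, yaw, pitch, depth], axis=1)
    for i in range(len(points)):
        if depth[i] < best[v[i], u[i]]:             # nearest point wins
            best[v[i], u[i]] = depth[i]
            proj[:, v[i], u[i]] = feats[i]
    return proj
```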
Fig. 3 schematically illustrates a schematic diagram of a training flow of a point cloud semantic segmentation network according to an embodiment of the present disclosure.
As shown in fig. 3, the training process of the point cloud semantic segmentation network may include a sample preprocessing process and a network iteration training process.
According to an embodiment of the present disclosure, during sample preprocessing, a portion of the partitions in the first surround-view projection map may replace the corresponding partitions in the second surround-view projection map to obtain a mixed projection map. For the specific method, refer to operations S202 to S204, which are not repeated here.
According to the embodiments of the present disclosure, in the iterative network training process, the first surround-view projection map and the mixed projection map are input into the initial network as a sample pair, and the model parameters of the initial network are adjusted based on a set loss function and model iteration methods such as gradient descent or least squares, so as to train the initial network.
According to embodiments of the present disclosure, the initial network may include an encoder and a decoder.
According to an embodiment of the present disclosure, inputting the first surround-view projection map and the mixed projection map into the initial network respectively to obtain a first feature map and a first segmentation result corresponding to the first surround-view projection map, and a second feature map and a second segmentation result corresponding to the mixed projection map, may include the following operations:
inputting the first surround-view projection map and the mixed projection map into the encoder respectively to obtain a first image feature corresponding to the first surround-view projection map and a second image feature corresponding to the mixed projection map; and inputting the first image feature and the second image feature into the decoder respectively to obtain the first feature map and the first segmentation result corresponding to the first surround-view projection map, and the second feature map and the second segmentation result corresponding to the mixed projection map.
According to embodiments of the present disclosure, the encoder may be any feature extraction network, such as ResNet18.
According to embodiments of the present disclosure, the decoder may be any feature upsampling network, such as UPerNet.
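As a sketch of the wiring, the wrapper below assumes the decoder exposes both the upsampled feature map and the segmentation logits, the pairing used throughout this description; the class and its interface are illustrative.

```python
import torch.nn as nn

class SegNet(nn.Module):
    """Minimal encoder-decoder wrapper; the concrete encoder (e.g. a
    ResNet18-style trunk) and decoder (e.g. a UPerNet-style head) are
    injected. Assumes the decoder returns (feature map, logits)."""

    def __init__(self, encoder: nn.Module, decoder: nn.Module):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, x):
        image_features = self.encoder(x)                 # first/second image feature
        feature_map, logits = self.decoder(image_features)
        return feature_map, logits                       # feature map + segmentation result
```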
According to an embodiment of the present disclosure, the network iterative training process may specifically include the following operations:
inputting the first surround-view projection map and the mixed projection map into the initial network respectively to obtain a first feature map and a first segmentation result corresponding to the first surround-view projection map, and a second feature map and a second segmentation result corresponding to the mixed projection map; calculating the information entropy loss between the first feature map and the second feature map to obtain a first loss value; calculating the cross entropy loss between the first segmentation result and the second segmentation result to obtain a second loss value; and adjusting the model parameters of the initial network by using the first loss value and the second loss value to obtain the point cloud semantic segmentation network.
According to an embodiment of the present disclosure, the first segmentation result may represent the semantic feature segmentation result of each region in the first surround-view projection map.
According to an embodiment of the present disclosure, the first feature map may have the same size as the first surround-view projection map, and regions with different semantic features on the first feature map may have different color features. For example, regions with different semantic features on the first feature map may correspond to the regions occupied by people, cars, and obstacles, which may be represented in red, blue, and green, respectively.
According to an embodiment of the present disclosure, calculating an information entropy loss between the first feature map and the second feature map, and obtaining the first loss value may include the following operations:
determining a first sub-feature map associated with a plurality of first target partition maps from the first feature map; splitting the second feature map into a second sub-feature map associated with the plurality of first target partition maps and a third sub-feature map not associated with the plurality of first target partition maps; and under the condition that the confidence probability of the first sub-feature map is larger than a preset threshold value, taking the first sub-feature map and the second sub-feature map as a positive sample pair, taking the first sub-feature map and the third sub-feature map as a negative sample pair, and calculating the information entropy loss between the positive sample pair and the negative sample pair to obtain a first loss value.
According to an embodiment of the present disclosure, since the first feature map may have the same size as the first surround-view projection map, the first sub-feature map may be determined from the first feature map based on the position information of the first target partition maps in the first surround-view projection map.
According to an embodiment of the present disclosure, the method for calculating the confidence probability of the first sub-feature map is not limited herein; for example, the confidence probability may be determined by using a Gaussian formula.
According to an embodiment of the present disclosure, the preset threshold may be determined according to a specific application scenario, and may be set to 90%, 95%, or the like, for example, and is not limited herein.
According to an embodiment of the present disclosure, the information entropy loss may be calculated as shown in formula (1):

[Formula (1) is provided only as an image in the original publication.]

In formula (1), L_1 represents the information entropy loss; f_p denotes the first sub-feature map; f_x denotes the second sub-feature map; f_y denotes the third sub-feature map.
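Since formula (1) survives only as an image, the sketch below substitutes a common InfoNCE-style contrastive loss consistent with the described pairing: (f_p, f_x) as the positive pair and (f_p, f_y) as negative pairs. The cosine similarity, the pooling of sub-feature maps into vectors, and the temperature tau are all assumptions, not the patent's formula.

```python
import torch
import torch.nn.functional as F

def info_entropy_loss(f_p: torch.Tensor, f_x: torch.Tensor,
                      f_y: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """f_p, f_x: (N, D) pooled sub-feature vectors forming positive pairs;
    f_y: (M, D) vectors forming negative pairs with every f_p anchor."""
    pos = F.cosine_similarity(f_p, f_x, dim=-1) / tau                            # (N,)
    neg = F.cosine_similarity(f_p.unsqueeze(1), f_y.unsqueeze(0), dim=-1) / tau  # (N, M)
    logits = torch.cat([pos.unsqueeze(1), neg], dim=1)                           # (N, 1 + M)
    targets = torch.zeros(f_p.size(0), dtype=torch.long, device=f_p.device)
    return F.cross_entropy(logits, targets)  # the positive pair is class 0
```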
According to an embodiment of the present disclosure, calculating a cross entropy loss between the first segmentation result and the second segmentation result, and obtaining the second loss value may include the following operations:
determining a first sub-segmentation result related to the plurality of first target partition maps from the first segmentation result; determining a second sub-segmentation result related to the plurality of first target partition maps from the second segmentation result; determining a predicted value and a label value based on the confidence probability of the first sub-segmentation result and the confidence probability of the second sub-segmentation result; and calculating the cross entropy loss between the predicted value and the label value to obtain a second loss value.
According to an embodiment of the present disclosure, the first segmentation result may have the same size as the first surround-view projection map, and thus the first sub-segmentation result may be determined from the first segmentation result based on the position information of the first target partition maps in the first surround-view projection map.
According to an embodiment of the present disclosure, the method for calculating the confidence probabilities of the first sub-segmentation result and the second sub-segmentation result is not limited herein; for example, the confidence probabilities may be determined by using a Gaussian formula.
According to an embodiment of the present disclosure, the predicted value and the label value may be determined by comparing the confidence probability of the first sub-segmentation result with that of the second sub-segmentation result. Specifically, in a case that the confidence probability of the first sub-segmentation result is greater than that of the second sub-segmentation result, the first sub-segmentation result is determined as the label value and the second sub-segmentation result as the predicted value; in a case that the confidence probability of the first sub-segmentation result is less than that of the second sub-segmentation result, the first sub-segmentation result is determined as the predicted value and the second sub-segmentation result as the label value.
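A per-cell sketch of this selection follows, assuming, as one option, that the maximum class probability serves as the confidence probability; the function and shape conventions are ours.

```python
import torch

def pick_prediction_and_label(seg_a: torch.Tensor, seg_b: torch.Tensor):
    """seg_a, seg_b: (C, H, W) class-probability maps for the co-located
    first and second sub-segmentation results. Per cell, the more confident
    result supplies the label value, the other the predicted value."""
    conf_a = seg_a.max(dim=0).values                 # (H, W) assumed confidence
    conf_b = seg_b.max(dim=0).values
    a_is_label = conf_a > conf_b                     # (H, W) boolean
    label = torch.where(a_is_label, seg_a.argmax(0), seg_b.argmax(0))
    pred = torch.where(a_is_label.unsqueeze(0), seg_b, seg_a)  # (C, H, W) probabilities
    return pred, label
```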
According to an embodiment of the present disclosure, the cross entropy loss may be calculated as shown in formula (2):

L_2 = -∑( y·log(y_p) + (1 - y)·log(1 - y_p) )    (2)

In formula (2), L_2 represents the cross entropy loss; y represents the label value; y_p represents the predicted value.
According to the embodiments of the present disclosure, the total loss used to adjust the model parameters of the initial network may be a weighted sum of the information entropy loss and the cross entropy loss, and the weights may be hyperparameters that can be set freely by the user during model tuning.
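One training iteration could then look like the sketch below, where `info_loss_fn` and `ce_loss_fn` stand for the first- and second-loss computations above and `alpha`, `beta` are the hyperparameter weights; all names are illustrative.

```python
def train_step(model, optimizer, proj_first, proj_mixed,
               info_loss_fn, ce_loss_fn, alpha: float = 1.0, beta: float = 1.0):
    """model(x) is assumed to return (feature map, segmentation result)."""
    feat_1, seg_1 = model(proj_first)
    feat_2, seg_2 = model(proj_mixed)
    # weighted sum of the first loss (information entropy) and second loss (cross entropy)
    loss = alpha * info_loss_fn(feat_1, feat_2) + beta * ce_loss_fn(seg_1, seg_2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```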
According to an embodiment of the present disclosure, the plurality of first target partition maps may include a third target partition map having a real label.
According to an embodiment of the present disclosure, when it is determined that the third target partition map exists, calculating a cross-entropy loss between the first segmentation result and the second segmentation result, and obtaining the second loss value may include:
determining, from the first segmentation result, a third sub-segmentation result related to the third target partition map and a fourth sub-segmentation result unrelated to the third target partition map but related to the plurality of first target partition maps; determining, from the second segmentation result, a fifth sub-segmentation result related to the third target partition map and a sixth sub-segmentation result unrelated to the third target partition map but related to the plurality of first target partition maps; calculating the cross entropy loss between the third sub-segmentation result and the real label to obtain a third loss value; calculating the cross entropy loss between the fourth sub-segmentation result and the sixth sub-segmentation result to obtain a fourth loss value; and determining the second loss value based on the third loss value and the fourth loss value.
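Under assumed mask and shape conventions, this branch of the second loss can be sketched as follows. Using the argmax of the sixth sub-segmentation result as the target for the fourth loss is our assumption, since the text leaves the direction of that cross entropy open; summing the third and fourth loss values is likewise one simple way to combine them.

```python
import torch.nn.functional as F

def second_loss_with_real_labels(seg_1, seg_2, labeled_mask, other_mask, labels):
    """seg_1, seg_2: (B, C, H, W) logits. labeled_mask selects cells of the
    third target partition map, other_mask the remaining first target
    partition maps; labels: (B, H, W) real labels on the labeled cells."""
    # third loss: first segmentation result vs. the real labels
    l3 = F.cross_entropy(seg_1.permute(0, 2, 3, 1)[labeled_mask], labels[labeled_mask])
    # fourth loss: fourth vs. sixth sub-segmentation result (assumed direction)
    target = seg_2.argmax(dim=1)[other_mask].detach()
    l4 = F.cross_entropy(seg_1.permute(0, 2, 3, 1)[other_mask], target)
    return l3 + l4  # second loss value determined from the third and fourth
```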
According to the embodiments of the present disclosure, through this design of the loss function, the network can be trained on raw unlabeled data with only a small amount of labeled data added, realizing semi-supervised training of the point cloud semantic segmentation network, so that the cost of data labeling can be reduced while the semantic segmentation performance of the network is preserved.
Fig. 4 schematically shows a flow chart of a point cloud semantic segmentation method according to an embodiment of the present disclosure.
As shown in fig. 4, the method includes operations S401 to S402.
In operation S401, target point cloud data is mapped into an initial view to obtain a surround-view projection map.
In operation S402, the surround-view projection map is input into the point cloud semantic segmentation network to obtain a semantic segmentation feature map of the target point cloud data.
According to an embodiment of the present disclosure, the point cloud semantic segmentation network can be trained by the point cloud semantic segmentation network training method described above, which is not repeated herein.
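A hypothetical end-to-end usage sketch follows; `load_lidar_scan` and `segmentation_net` are placeholder names for a scan loader and the trained network, and `to_surround_view` refers to the projection sketch above.

```python
import torch

points = load_lidar_scan("scan.bin")               # hypothetical (N, 3) loader
proj = torch.from_numpy(to_surround_view(points))  # projection sketch from earlier
with torch.no_grad():
    _, seg_logits = segmentation_net(proj.unsqueeze(0))  # trained network (assumed)
labels = seg_logits.argmax(dim=1)                  # per-cell semantic class
```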
Fig. 5 schematically illustrates a block diagram of a point cloud semantic segmentation network training apparatus according to an embodiment of the present disclosure.
As shown in fig. 5, the point cloud semantic segmentation network training apparatus 500 includes a first mapping module 510, a first processing module 520, a determining module 530, a second processing module 540, and a training module 550.
The first mapping module 510 is configured to map a plurality of groups of point cloud data into an initial view respectively to obtain a plurality of surround-view projection maps.
The first processing module 520 is configured to partition a first surround-view projection map and a second surround-view projection map respectively based on a preset size to obtain a plurality of first partition maps and a plurality of second partition maps, wherein the first surround-view projection map and the second surround-view projection map belong to the plurality of surround-view projection maps.
A determining module 530, configured to determine a plurality of first target partition maps from the plurality of first partition maps.
The second processing module 540 is configured to replace a second target partition map in the second surround-view projection map with each of the plurality of first target partition maps to obtain a mixed projection map, wherein the second target partition map belongs to the plurality of second partition maps and is located at the same position as the corresponding first target partition map.
The training module 550 is configured to train an initial network by taking the first surround-view projection map and the mixed projection map as training samples to obtain the point cloud semantic segmentation network.
According to the embodiments of the present disclosure, when training the point cloud semantic segmentation network, point cloud data can be mapped into surround-view projection maps, and a first surround-view projection map and a second surround-view projection map can be mixed by partition, that is, part of the partitions in the first surround-view projection map replace the corresponding partitions in the second surround-view projection map to obtain a mixed projection map; the initial network can then be trained with the mixed projection map and the first surround-view projection map to obtain the point cloud semantic segmentation network. Mixing by partition forcibly decouples the replaced partitions from their background, which effectively enriches the data, reduces the network's dependence on background and global information when predicting local areas, and improves the recognition capability of the network. Meanwhile, mixing by partition effectively preserves the three-dimensional shape of the original point cloud as projected onto the surround-view projection map, which can at least partially solve the problems of three-dimensional deformation and loss of shape information caused by data enhancement, and can improve the robustness of the network. These technical means can effectively improve the utilization efficiency of hardware resources during network training.
According to an embodiment of the present disclosure, the training module 550 includes a first training submodule, a second training submodule, a third training submodule, and a fourth training submodule.
The first training submodule is configured to respectively input the first surround-view projection map and the mixed projection map into the initial network to obtain a first feature map and a first segmentation result corresponding to the first surround-view projection map, and a second feature map and a second segmentation result corresponding to the mixed projection map.
The second training submodule is configured to calculate an information entropy loss between the first feature map and the second feature map to obtain a first loss value.
The third training submodule is configured to calculate a cross entropy loss between the first segmentation result and the second segmentation result to obtain a second loss value.
The fourth training submodule is configured to adjust the model parameters of the initial network by using the first loss value and the second loss value, so as to finally obtain the point cloud semantic segmentation network.
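As a hedged sketch of how these four submodules might cooperate within one optimization step, assuming unbatched (C, H, W) inputs and the hypothetical helpers info_entropy_loss and mutual_pseudo_ce elaborated below:

```python
import torch

def train_step(net, optimizer, proj_first, proj_mixed, mixed_mask):
    # Forward both projection maps through the same initial network.
    feat_first, seg_first = net(proj_first)   # first feature map / segmentation result
    feat_mixed, seg_mixed = net(proj_mixed)   # second feature map / segmentation result

    # Flatten (C, H, W) outputs into per-grid-cell rows.
    fa = feat_first.flatten(1).t()            # (H*W, C)
    fb = feat_mixed.flatten(1).t()
    conf_first = seg_first.softmax(0).amax(0).flatten()  # confidence probability

    loss1 = info_entropy_loss(fa, fb, mixed_mask, conf_first)       # first loss value
    loss2 = mutual_pseudo_ce(seg_first.flatten(1).t()[mixed_mask],
                             seg_mixed.flatten(1).t()[mixed_mask])  # second loss value

    loss = loss1 + loss2   # equal weighting of the two loss values is an assumption
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```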
According to an embodiment of the present disclosure, the second training submodule includes a first training unit, a second training unit, and a third training unit.
The first training unit is configured to determine, from the first feature map, a first sub-feature map related to the plurality of first target partition maps.
The second training unit is configured to split the second feature map into a second sub-feature map related to the plurality of first target partition maps and a third sub-feature map unrelated to the plurality of first target partition maps.
The third training unit is configured to, in a case that the confidence probability of the first sub-feature map is greater than a preset threshold, take the first sub-feature map and the second sub-feature map as a positive sample pair and the first sub-feature map and the third sub-feature map as a negative sample pair, and calculate the information entropy loss between the positive sample pair and the negative sample pair to obtain the first loss value.
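One common reading of such a confidence-gated positive/negative pairing is an InfoNCE-style contrastive term; the sketch below makes that assumption explicit, with the threshold thresh and temperature tau as illustrative parameters.

```python
import torch
import torch.nn.functional as F

def info_entropy_loss(feat_first, feat_mixed, mixed_mask, conf_first,
                      thresh=0.9, tau=0.1):
    # feat_first / feat_mixed: (N, D) per-position features of the first
    # and mixed projection maps; mixed_mask marks positions covered by the
    # pasted first target partition maps; conf_first is the confidence
    # probability of the first sub-feature map.
    keep = mixed_mask & (conf_first > thresh)          # confidence gate
    anchor = F.normalize(feat_first[keep], dim=1)      # first sub-feature map
    pos = F.normalize(feat_mixed[keep], dim=1)         # second sub-feature map
    neg = F.normalize(feat_mixed[~mixed_mask], dim=1)  # third sub-feature map

    pos_logit = (anchor * pos).sum(dim=1, keepdim=True) / tau  # positive pair
    neg_logit = anchor @ neg.t() / tau                         # negative pairs
    logits = torch.cat([pos_logit, neg_logit], dim=1)
    target = torch.zeros(len(anchor), dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, target)  # contrastive cross entropy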
According to an embodiment of the present disclosure, the third training submodule includes a fourth training unit, a fifth training unit, a sixth training unit, and a seventh training unit.
The fourth training unit is configured to determine, from the first segmentation result, a first sub-segmentation result related to the plurality of first target partition maps.
The fifth training unit is configured to determine, from the second segmentation result, a second sub-segmentation result related to the plurality of first target partition maps.
The sixth training unit is configured to determine a predicted value and a label value based on the confidence probability of the first sub-segmentation result and the confidence probability of the second sub-segmentation result.
The seventh training unit is configured to calculate a cross entropy loss between the predicted value and the label value to obtain the second loss value.
According to an embodiment of the disclosure, the sixth training unit comprises a first training subunit and a second training subunit.
The first training subunit is configured to, in a case that the confidence probability of the first sub-segmentation result is greater than that of the second sub-segmentation result, determine the first sub-segmentation result as the label value and the second sub-segmentation result as the predicted value.
The second training subunit is configured to, in a case that the confidence probability of the first sub-segmentation result is less than that of the second sub-segmentation result, determine the first sub-segmentation result as the predicted value and the second sub-segmentation result as the label value.
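This confidence rule can be sketched as a mutual pseudo-labeling cross entropy, under the assumption that the label value is the argmax class of the more confident result at each position:

```python
import torch
import torch.nn.functional as F

def mutual_pseudo_ce(logits_first, logits_mixed):
    # logits_first / logits_mixed: (N, K) class logits of the first and
    # second sub-segmentation results at the same positions.
    conf_first, cls_first = logits_first.softmax(dim=1).max(dim=1)
    conf_mixed, cls_mixed = logits_mixed.softmax(dim=1).max(dim=1)

    first_is_label = conf_first > conf_mixed
    # Where the first result is more confident, it supplies the label value
    # and the second result is the predicted value, and vice versa.
    loss_fm = F.cross_entropy(logits_mixed, cls_first, reduction='none')
    loss_mf = F.cross_entropy(logits_first, cls_mixed, reduction='none')
    return torch.where(first_is_label, loss_fm, loss_mf).mean()
```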
According to the embodiment of the disclosure, the plurality of first target partition maps include a third target partition map, and the third target partition map has a real label.
According to an embodiment of the present disclosure, the third training submodule includes an eighth training unit, a ninth training unit, a tenth training unit, an eleventh training unit, and a twelfth training unit.
The eighth training unit is configured to determine, from the first segmentation result, a third sub-segmentation result related to the third target partition map and a fourth sub-segmentation result unrelated to the third target partition map and related to the plurality of first target partition maps.
The ninth training unit is configured to determine, from the second segmentation result, a fifth sub-segmentation result related to the third target partition map and a sixth sub-segmentation result unrelated to the third target partition map and related to the plurality of first target partition maps.
The tenth training unit is configured to calculate a cross entropy loss between the third sub-segmentation result and the real label to obtain a third loss value.
The eleventh training unit is configured to calculate a cross entropy loss between the fourth sub-segmentation result and the sixth sub-segmentation result to obtain a fourth loss value.
The twelfth training unit is configured to determine the second loss value based on the third loss value and the fourth loss value.
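Under the same per-position conventions, this partly labeled variant of the second loss value might be sketched as follows; the boolean masks, the reuse of mutual_pseudo_ce for the fourth loss value, and the additive combination are all assumptions of the sketch.

```python
import torch.nn.functional as F

def second_loss_with_real_labels(seg_first, seg_mixed, real_labels,
                                 labeled_mask, target_mask):
    # labeled_mask marks positions covered by the third target partition map
    # (which carries real labels); target_mask marks all positions covered
    # by the first target partition maps.
    unlabeled = target_mask & ~labeled_mask

    # Third loss value: cross entropy between the third sub-segmentation
    # result and the real label.
    loss3 = F.cross_entropy(seg_first[labeled_mask], real_labels[labeled_mask])
    # Fourth loss value: cross entropy between the fourth and sixth
    # sub-segmentation results; reusing the confidence rule above here
    # is an assumption.
    loss4 = mutual_pseudo_ce(seg_first[unlabeled], seg_mixed[unlabeled])
    return loss3 + loss4   # additive combination into the second loss is assumed
```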
According to an embodiment of the present disclosure, an initial network includes an encoder and a decoder.
According to an embodiment of the disclosure, the first training submodule includes a thirteenth training unit and a fourteenth training unit.
The thirteenth training unit is configured to respectively input the first surround-view projection map and the mixed projection map into the encoder to obtain a first image feature corresponding to the first surround-view projection map and a second image feature corresponding to the mixed projection map.
The fourteenth training unit is configured to respectively input the first image feature and the second image feature into the decoder to obtain the first feature map and the first segmentation result corresponding to the first surround-view projection map, and the second feature map and the second segmentation result corresponding to the mixed projection map.
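A minimal encoder-decoder sketch consistent with this description is shown below; the channel counts, depths, and the five input channels (e.g. x, y, z, range, intensity) are illustrative assumptions, not the claimed architecture.

```python
import torch.nn as nn

class InitialNet(nn.Module):
    # Minimal encoder-decoder sketch: the decoder yields a per-position
    # feature map, and a 1x1 head turns it into the segmentation result.
    def __init__(self, in_ch=5, feat_ch=64, n_classes=20):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(feat_ch, feat_ch, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        self.seg_head = nn.Conv2d(feat_ch, n_classes, 1)

    def forward(self, x):
        feat = self.decoder(self.encoder(x))   # feature map
        return feat, self.seg_head(feat)       # feature map, segmentation result
```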
According to an embodiment of the present disclosure, the first mapping module 510 includes a first mapping unit, a second mapping unit, a third mapping unit, and a fourth mapping unit.
The first mapping unit is configured to, for each set of point cloud data, respectively perform polar coordinate conversion on the three-dimensional coordinate data of each point in the point cloud data to obtain the polar coordinate data of each point in the point cloud data.
The second mapping unit is configured to map the points in the point cloud data into a plurality of grids of the initial view respectively based on the polar coordinate data of each point in the point cloud data.
The third mapping unit is configured to determine, for each grid of the initial view, the feature data of the grid based on the three-dimensional coordinate data and the polar coordinate data of the points in the grid.
The fourth mapping unit is configured to construct the surround-view projection map based on the feature data of the plurality of grids.
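These four mapping units correspond to a classic range-image projection; the sketch below makes the polar conversion and grid filling concrete, with the grid size, vertical field of view, and choice of per-grid feature channels as assumptions typical of a 64-beam lidar rather than values from the disclosure.

```python
import numpy as np

def to_surround_view(points, h=64, w=2048, fov_up=3.0, fov_down=-25.0):
    # points: (N, 3+) array of 3D coordinates.
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points[:, :3], axis=1)             # range
    yaw = np.arctan2(y, x)                                # azimuth angle
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))

    # Map polar coordinates to grid rows/columns of the initial view.
    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
    u = ((1.0 - (pitch - fov_down_r) / (fov_up_r - fov_down_r)) * (h - 1)).astype(int)
    v = ((0.5 * (1.0 - yaw / np.pi)) * (w - 1)).astype(int)
    u, v = np.clip(u, 0, h - 1), np.clip(v, 0, w - 1)

    # Feature data per grid: 3D coordinates, range, and azimuth here;
    # the exact channels are an assumption of this sketch.
    proj = np.zeros((5, h, w), dtype=np.float32)
    order = np.argsort(r)[::-1]          # nearer points overwrite farther ones
    proj[:, u[order], v[order]] = np.stack(
        [x[order], y[order], z[order], r[order], yaw[order]])
    return proj
```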
It should be noted that the point cloud semantic segmentation network training apparatus in the embodiments of the present disclosure corresponds to the point cloud semantic segmentation network training method in the embodiments of the present disclosure; for details of the apparatus, reference may be made to the description of the method, which is not repeated here.
Fig. 6 schematically illustrates a block diagram of a point cloud semantic segmentation apparatus according to an embodiment of the present disclosure.
As shown in fig. 6, the point cloud semantic segmentation apparatus 600 includes a second mapping module 610 and a third processing module 620.
The second mapping module 610 is configured to map target point cloud data into an initial view to obtain a surround-view projection map.
The third processing module 620 is configured to input the surround-view projection map into a point cloud semantic segmentation network to obtain a semantic segmentation feature map of the target point cloud data.
According to an embodiment of the present disclosure, the point cloud semantic segmentation network may be obtained by training with the method described in the point cloud semantic segmentation network training method section, which is not repeated here.
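End to end, inference with the hypothetical sketches above might read as follows; net and target_points are assumed names, and the unbatched tensor shapes follow the earlier sketches.

```python
import torch

# Hypothetical usage: net is an InitialNet instance, target_points an
# (N, 3+) float32 array of the target point cloud.
proj = to_surround_view(target_points)              # surround-view projection map
feat_map, seg_logits = net(torch.from_numpy(proj))  # feature map + segmentation result
pred = seg_logits.argmax(dim=0)                     # per-grid semantic labels
```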
Any number of modules, sub-modules, units, sub-units, or at least part of the functionality of any number thereof according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in any other reasonable manner of hardware or firmware by integrating or packaging a circuit, or in any one of or a suitable combination of software, hardware, and firmware implementations. Alternatively, one or more of the modules, sub-modules, units, sub-units according to embodiments of the disclosure may be at least partially implemented as a computer program module, which when executed may perform the corresponding functions.
For example, any number of the first mapping module 510, the first processing module 520, the determining module 530, the second processing module 540 and the training module 550, or the second mapping module 610 and the third processing module 620 may be combined and implemented in one module/unit/sub-unit, or any one of the modules/units/sub-units may be split into a plurality of modules/units/sub-units. Alternatively, at least part of the functionality of one or more of these modules/units/sub-units may be combined with at least part of the functionality of other modules/units/sub-units and implemented in one module/unit/sub-unit. According to an embodiment of the present disclosure, at least one of the first mapping module 510, the first processing module 520, the determining module 530, the second processing module 540, and the training module 550, or the second mapping module 610 and the third processing module 620 may be at least partially implemented as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented by any one of three implementations of software, hardware, and firmware, or by a suitable combination of any of them. Alternatively, at least one of the first mapping module 510, the first processing module 520, the determining module 530, the second processing module 540 and the training module 550, or the second mapping module 610 and the third processing module 620 may be at least partially implemented as a computer program module which, when executed, may perform a corresponding function.
Fig. 7 schematically illustrates a block diagram of an electronic device adapted to implement a point cloud semantic segmentation network training method or a point cloud semantic segmentation method according to an embodiment of the present disclosure. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 7, an electronic device 700 according to an embodiment of the present disclosure includes a processor 701, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage section 708 into a random access memory (RAM) 703. The processor 701 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an application specific integrated circuit (ASIC)), among others. The processor 701 may also include on-board memory for caching purposes. The processor 701 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present disclosure.
In the RAM 703, various programs and data necessary for the operation of the electronic device 700 are stored. The processor 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. The processor 701 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 702 and/or the RAM 703. Note that the programs may also be stored in one or more memories other than the ROM 702 and the RAM 703. The processor 701 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
Electronic device 700 may also include input/output (I/O) interface 705, which input/output (I/O) interface 705 is also connected to bus 704, according to an embodiment of the present disclosure. The electronic device 700 may also include one or more of the following components connected to the I/O interface 705: an input portion 706 including a keyboard, a mouse, and the like; an output section 707 including components such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read out therefrom is mounted into the storage section 708 as necessary.
According to an embodiment of the present disclosure, the method flows described above may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program, when executed by the processor 701, performs the above-described functions defined in the system of the embodiment of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement a method according to an embodiment of the disclosure.
According to an embodiment of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium. Examples may include, but are not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 702 and/or the RAM 703 and/or one or more memories other than the ROM 702 and the RAM 703 described above.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method provided by the embodiments of the present disclosure. When the computer program product is run on an electronic device, the program code is configured to cause the electronic device to implement the point cloud semantic segmentation network training method or the point cloud semantic segmentation method provided by the embodiments of the present disclosure.
In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted in the form of a signal over a network medium, distributed, and downloaded and installed via the communication section 709, and/or installed from the removable medium 711. The computer program containing program code may be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
In accordance with embodiments of the present disclosure, program code for the computer programs provided by the embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented using high level procedural and/or object oriented programming languages, and/or assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, the "C" language, and the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or by combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure can be combined in various ways, even if such combinations are not expressly recited in the present disclosure. In particular, various combinations of the features recited in the various embodiments and/or claims of the present disclosure may be made without departing from the spirit and teachings of the present disclosure. All such combinations fall within the scope of the present disclosure.
The embodiments of the present disclosure are described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the disclosure, and these alternatives and modifications are intended to fall within the scope of the disclosure.

Claims (14)

1. A point cloud semantic segmentation network training method, comprising:
respectively mapping multiple sets of point cloud data into an initial view to obtain multiple surround-view projection maps;
respectively partitioning a first surround-view projection map and a second surround-view projection map based on a preset size to obtain a plurality of first partition maps and a plurality of second partition maps, wherein the first surround-view projection map and the second surround-view projection map belong to the multiple surround-view projection maps;
determining a plurality of first target partition maps from the plurality of first partition maps;
replacing a second target partition map in the second surround-view projection map with each of the first target partition maps to obtain a mixed projection map, wherein the second target partition map belongs to the plurality of second partition maps, and the first target partition map and the second target partition map are located at the same position; and
training an initial network by using the first surround-view projection map and the mixed projection map as training samples to obtain a point cloud semantic segmentation network.
2. The method of claim 1, wherein the training an initial network by using the first surround-view projection map and the mixed projection map as training samples to obtain a point cloud semantic segmentation network comprises:
respectively inputting the first surround-view projection map and the mixed projection map into the initial network to obtain a first feature map and a first segmentation result corresponding to the first surround-view projection map, and a second feature map and a second segmentation result corresponding to the mixed projection map;
calculating an information entropy loss between the first feature map and the second feature map to obtain a first loss value;
calculating a cross entropy loss between the first segmentation result and the second segmentation result to obtain a second loss value; and
adjusting model parameters of the initial network by using the first loss value and the second loss value, so as to finally obtain the point cloud semantic segmentation network.
3. The method of claim 2, wherein the calculating an information entropy loss between the first feature map and the second feature map to obtain a first loss value comprises:
determining, from the first feature map, a first sub-feature map related to the plurality of first target partition maps;
splitting the second feature map into a second sub-feature map related to the plurality of first target partition maps and a third sub-feature map unrelated to the plurality of first target partition maps; and
in a case that the confidence probability of the first sub-feature map is greater than a preset threshold, taking the first sub-feature map and the second sub-feature map as a positive sample pair and the first sub-feature map and the third sub-feature map as a negative sample pair, and calculating the information entropy loss between the positive sample pair and the negative sample pair to obtain the first loss value.
4. The method of claim 2, wherein the calculating a cross entropy loss between the first segmentation result and the second segmentation result to obtain a second loss value comprises:
determining, from the first segmentation result, a first sub-segmentation result related to the plurality of first target partition maps;
determining, from the second segmentation result, a second sub-segmentation result related to the plurality of first target partition maps;
determining a predicted value and a label value based on the confidence probability of the first sub-segmentation result and the confidence probability of the second sub-segmentation result; and
calculating a cross entropy loss between the predicted value and the label value to obtain the second loss value.
5. The method of claim 4, wherein the determining a predicted value and a label value based on the confidence probability of the first sub-segmentation result and the confidence probability of the second sub-segmentation result comprises:
in a case that the confidence probability of the first sub-segmentation result is greater than the confidence probability of the second sub-segmentation result, determining the first sub-segmentation result as the label value and the second sub-segmentation result as the predicted value; and
in a case that the confidence probability of the first sub-segmentation result is less than the confidence probability of the second sub-segmentation result, determining the first sub-segmentation result as the predicted value and the second sub-segmentation result as the label value.
6. The method of claim 2, wherein the plurality of first target partition maps comprise a third target partition map, and the third target partition map has a real label;
wherein the calculating a cross entropy loss between the first segmentation result and the second segmentation result to obtain a second loss value comprises:
determining, from the first segmentation result, a third sub-segmentation result related to the third target partition map and a fourth sub-segmentation result unrelated to the third target partition map and related to the plurality of first target partition maps;
determining, from the second segmentation result, a fifth sub-segmentation result related to the third target partition map and a sixth sub-segmentation result unrelated to the third target partition map and related to the plurality of first target partition maps;
calculating a cross entropy loss between the third sub-segmentation result and the real label to obtain a third loss value;
calculating a cross entropy loss between the fourth sub-segmentation result and the sixth sub-segmentation result to obtain a fourth loss value; and
determining the second loss value based on the third loss value and the fourth loss value.
7. The method of claim 2, wherein the initial network comprises an encoder and a decoder;
wherein the respectively inputting the first surround-view projection map and the mixed projection map into the initial network to obtain a first feature map and a first segmentation result corresponding to the first surround-view projection map, and a second feature map and a second segmentation result corresponding to the mixed projection map comprises:
respectively inputting the first surround-view projection map and the mixed projection map into the encoder to obtain a first image feature corresponding to the first surround-view projection map and a second image feature corresponding to the mixed projection map; and
respectively inputting the first image feature and the second image feature into the decoder to obtain the first feature map and the first segmentation result corresponding to the first surround-view projection map, and the second feature map and the second segmentation result corresponding to the mixed projection map.
8. The method of claim 1, wherein the respectively mapping multiple sets of point cloud data into an initial view to obtain multiple surround-view projection maps comprises:
for each set of point cloud data, respectively performing polar coordinate conversion on the three-dimensional coordinate data of each point in the point cloud data to obtain the polar coordinate data of each point in the point cloud data;
mapping the points in the point cloud data into a plurality of grids of the initial view respectively based on the polar coordinate data of each point in the point cloud data;
for each grid of the initial view, determining feature data of the grid based on the three-dimensional coordinate data and the polar coordinate data of the points in the grid; and
constructing the surround-view projection map based on the feature data of the plurality of grids.
9. A point cloud semantic segmentation method, comprising:
mapping target point cloud data into an initial view to obtain a surround-view projection map; and
inputting the surround-view projection map into a point cloud semantic segmentation network to obtain a semantic segmentation feature map of the target point cloud data;
wherein the point cloud semantic segmentation network is trained by using the point cloud semantic segmentation network training method according to any one of claims 1 to 8.
10. A point cloud semantic segmentation network training apparatus, comprising:
a first mapping module configured to respectively map multiple sets of point cloud data into an initial view to obtain multiple surround-view projection maps;
a first processing module configured to respectively partition a first surround-view projection map and a second surround-view projection map based on a preset size to obtain a plurality of first partition maps and a plurality of second partition maps, wherein the first surround-view projection map and the second surround-view projection map belong to the multiple surround-view projection maps;
a determining module configured to determine a plurality of first target partition maps from the plurality of first partition maps;
a second processing module configured to replace a second target partition map in the second surround-view projection map with each of the first target partition maps to obtain a mixed projection map, wherein the second target partition map belongs to the plurality of second partition maps, and the first target partition map and the second target partition map are located at the same position; and
a training module configured to train an initial network by using the first surround-view projection map and the mixed projection map as training samples to obtain a point cloud semantic segmentation network.
11. A point cloud semantic segmentation apparatus, comprising:
a second mapping module configured to map target point cloud data into an initial view to obtain a surround-view projection map; and
a third processing module configured to input the surround-view projection map into a point cloud semantic segmentation network to obtain a semantic segmentation feature map of the target point cloud data;
wherein the point cloud semantic segmentation network is trained by using the point cloud semantic segmentation network training method according to any one of claims 1 to 8.
12. An electronic device, comprising:
one or more processors;
a memory to store one or more instructions that,
wherein the one or more instructions, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-9.
13. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to carry out the method of any one of claims 1 to 9.
14. A computer program product comprising computer executable instructions for implementing the method of any one of claims 1 to 9 when executed.
CN202211022552.3A 2022-08-24 2022-08-24 Point cloud semantic segmentation network training method, point cloud semantic segmentation method and point cloud semantic segmentation device Pending CN115375899A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211022552.3A CN115375899A (en) 2022-08-24 2022-08-24 Point cloud semantic segmentation network training method, point cloud semantic segmentation method and point cloud semantic segmentation device
PCT/CN2023/082749 WO2024040954A1 (en) 2022-08-24 2023-03-21 Point cloud semantic segmentation network training method, and point cloud semantic segmentation method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211022552.3A CN115375899A (en) 2022-08-24 2022-08-24 Point cloud semantic segmentation network training method, point cloud semantic segmentation method and point cloud semantic segmentation device

Publications (1)

Publication Number Publication Date
CN115375899A true CN115375899A (en) 2022-11-22

Family

ID=84068279

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211022552.3A Pending CN115375899A (en) 2022-08-24 2022-08-24 Point cloud semantic segmentation network training method, point cloud semantic segmentation method and point cloud semantic segmentation device

Country Status (2)

Country Link
CN (1) CN115375899A (en)
WO (1) WO2024040954A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118334352B (en) * 2024-06-13 2024-08-13 宁波大学 Training method, system, medium and equipment for point cloud semantic segmentation model

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111354478B (en) * 2018-12-24 2024-07-12 黄庆武整形医生集团(深圳)有限公司 Shaping simulation information processing method, shaping simulation terminal and shaping service terminal
CN110008941B (en) * 2019-06-05 2020-01-17 长沙智能驾驶研究院有限公司 Method and device for detecting travelable area, computer equipment and storage medium
CN113421217A (en) * 2020-03-02 2021-09-21 北京京东乾石科技有限公司 Method and device for detecting travelable area
CN113496491B (en) * 2020-03-19 2023-12-15 广州汽车集团股份有限公司 Road surface segmentation method and device based on multi-line laser radar
KR102334177B1 (en) * 2020-07-21 2021-12-03 대한민국 Method and system for establishing 3-dimensional indoor information for indoor evacuation
CN115375899A (en) * 2022-08-24 2022-11-22 北京京东乾石科技有限公司 Point cloud semantic segmentation network training method, point cloud semantic segmentation method and point cloud semantic segmentation device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024040954A1 (en) * 2022-08-24 2024-02-29 北京京东乾石科技有限公司 Point cloud semantic segmentation network training method, and point cloud semantic segmentation method and apparatus
CN116721399A (en) * 2023-07-26 2023-09-08 之江实验室 Point cloud target detection method and device for quantitative perception training
CN116721399B (en) * 2023-07-26 2023-11-14 之江实验室 Point cloud target detection method and device for quantitative perception training

Also Published As

Publication number Publication date
WO2024040954A1 (en) 2024-02-29

Similar Documents

Publication Publication Date Title
CN115375899A (en) Point cloud semantic segmentation network training method, point cloud semantic segmentation method and point cloud semantic segmentation device
EP3321842B1 (en) Lane line recognition modeling method, apparatus, storage medium, and device, recognition method and apparatus, storage medium, and device
CN108304775B (en) Remote sensing image recognition method and device, storage medium and electronic equipment
EP3859560A2 (en) Method and apparatus for visual question answering, computer device and medium
US11151447B1 (en) Network training process for hardware definition
KR102539942B1 (en) Method and apparatus for training trajectory planning model, electronic device, storage medium and program
US20220222824A1 (en) Fully automated multimodal system architecture for semantic segmentation of large-scale 3d outdoor point cloud data
Raghavan et al. Optimized building extraction from high-resolution satellite imagery using deep learning
CN110276345B (en) Convolutional neural network model training method and device and computer readable storage medium
CN112927234A (en) Point cloud semantic segmentation method and device, electronic equipment and readable storage medium
US20210049372A1 (en) Method and system for generating depth information of street view image using 2d map
CN112016569B (en) Attention mechanism-based object detection method, network, device and storage medium
CN115546630A (en) Construction site extraction method and system based on remote sensing image characteristic target detection
US12118807B2 (en) Apparatus and method for three-dimensional object recognition
EP4307219A1 (en) Three-dimensional target detection method and apparatus
CN112362059A (en) Method, apparatus, computer device and medium for positioning mobile carrier
Osuna-Coutiño et al. Structure extraction in urbanized aerial images from a single view using a CNN-based approach
CN116128048A (en) Optical remote sensing image cloud detection model training method, detection method and device
CN114419338B (en) Image processing method, image processing device, computer equipment and storage medium
CN115511870A (en) Object detection method and device, electronic equipment and storage medium
CN112434591A (en) Lane line determination method and device
Manka Developing an efficient real-time terrestrial infrastructure inspection system using autonomous drones and deep learning
CN111414925B (en) Image processing method, apparatus, computing device and medium
CN113553939A (en) Point cloud classification model training method and device, electronic equipment and storage medium
CN114387492B (en) Deep learning-based near-shore water surface area ship detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination