CN115409989A - Three-dimensional point cloud semantic segmentation method for optimizing boundary

Info

Publication number
CN115409989A
CN115409989A
Authority
CN
China
Prior art keywords
boundary
point cloud
dimensional point
points
layer
Prior art date
Legal status
Pending
Application number
CN202211156241.6A
Other languages
Chinese (zh)
Inventor
魏东
张潇瀚
朱智睿
刘欢
Current Assignee
Shenyang University of Technology
Original Assignee
Shenyang University of Technology
Priority date: 2022-09-22
Filing date: 2022-09-22
Publication date: 2022-11-29
Application filed by Shenyang University of Technology
Priority to CN202211156241.6A
Publication of CN115409989A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Abstract

The invention discloses a three-dimensional point cloud semantic segmentation method for optimizing boundaries, belonging to the fields of automatic driving, computer vision and deep learning. The segmentation method comprises the following steps. Step one: preprocess the three-dimensional point cloud. Step two: an encoding layer composed of a local feature fusion module and a random sampling module. Step three: a decoding layer, which takes the feature codes obtained in step two as input, upsamples the number of three-dimensional points, concatenates the learned feature dimensions with the contrastive boundary features and the reverse-order skip connections from the encoding layers, and feeds them into a shared MLP for feature fusion; after feature fusion, a fully connected layer raises the dimension, a dropout operation follows, and a shared MLP finally reduces the dimension to the number of semantic labels and outputs the semantic segmentation result of the three-dimensional point cloud. Step four: contrastive boundary optimization. The invention aims to improve the accuracy of large-scale three-dimensional point cloud semantic segmentation.

Description

Three-dimensional point cloud semantic segmentation method for optimizing boundary
Technical Field
The invention belongs to the fields of automatic driving, computer vision and deep learning, and particularly relates to a three-dimensional point cloud semantic segmentation method for optimizing boundaries.
Background
A three-dimensional point cloud is a set of three-dimensional data points attached to object surfaces in three-dimensional space, generated by different methods such as radar scanning, image-based reconstruction, and data synthesis. According to the scale of the generated point cloud, three-dimensional point clouds can be divided into object-level and scene-level point clouds. An object-level three-dimensional point cloud expresses a single object in three-dimensional space in the form of three-dimensional data points. A scene-level three-dimensional point cloud expresses many different objects within a certain spatial range in the form of three-dimensional data points. Scene-level point clouds are further divided by scale into indoor-level point clouds and outdoor large-scale point clouds: an indoor-level point cloud captures an indoor scene and is obtained through image-based generation, short-range lidar, data synthesis, and the like, while an outdoor large-scale point cloud covers a large spatial range and is acquired by vehicle-mounted or airborne lidar.
The segmentation task operates on the acquired three-dimensional point cloud data: it separates the point cloud according to the different characteristics of its points and expresses different regions with different labels. Segmentation of three-dimensional point clouds can be divided into part segmentation (fine-grained level), instance segmentation (object level), and semantic segmentation (scene level).
Semantic segmentation of a three-dimensional point cloud takes the data points in a scene and their feature information as input, applies a series of processing steps, and finally assigns a semantic label to every three-dimensional data point. It is mainly applied in industries such as autonomous driving, robotics, unmanned aerial vehicles, railway track inspection, three-dimensional modeling, and entertainment. Different application scenes impose different performance requirements: indoor scenes mainly demand segmentation accuracy, while for outdoor large-scale scenes the processing time of semantic segmentation must be reduced as much as possible while still meeting the accuracy requirement.
Three-dimensional point cloud semantic segmentation methods fall into two major categories according to how features are extracted. The first category combines hand-crafted features with a classifier: in the first stage, objects in a region are detected and judged by clustering or classification over the hand-crafted features, so that scattered points, linear points, and surface points can be distinguished and directly assigned semantic labels; in the second stage, to account for local context, a Markov Random Field (MRF) is introduced to model the context of the data points. The second category performs semantic segmentation with deep learning and is mainly divided into three approaches: multi-view methods, voxelization methods, and methods based on the raw three-dimensional point cloud. Multi-view methods take several pictures of the three-dimensional space, or project the three-dimensional model onto two-dimensional images, extract features through convolution and pooling layers, and finally aggregate the features and feed them into a network that returns the semantic segmentation result. This approach is unsuitable for large-scale scenes, and its accuracy depends on the choice of viewpoints. Voxelization methods quantize the point cloud into a 3D grid and extract features with 3D convolutions; however, a single voxel may contain several semantic labels, and since the network model cannot separate the boundaries of different objects inside the same voxel, the boundaries between objects severely degrade the accuracy of three-dimensional point cloud semantic segmentation.
Methods based on the raw three-dimensional point cloud extract feature information directly from the input points, which reduces the loss of original feature information and performs well on indoor scenes. However, handling point clouds of different scales depends on the structure of the network model. Some existing network models can only perform semantic segmentation on indoor scenes; the reason is that point clouds are unordered, so methods based on the raw point cloud have difficulty extracting the relationships between data points directly and must rely on preprocessing before model training.
Many raw-point-cloud network models that can process scenes of different scales still segment poorly at the local level; in particular, it is hard to separate the semantic labels of different objects at their boundaries. The main reason is occlusion, sparsity, and discontinuity during acquisition, which prevent the network model from clearly learning and separating boundary points during training. Different methods therefore perform differently on point clouds of different scales.
Disclosure of Invention
The invention provides a three-dimensional point cloud semantic segmentation method for optimizing boundaries. It aims to improve the accuracy of large-scale three-dimensional point cloud semantic segmentation, to address the facts that some deep-learning-based methods cannot process large-scale point clouds and that other semantic segmentation methods are time-inefficient, and to accelerate semantic segmentation.
The technical scheme is as follows:
A three-dimensional point cloud semantic segmentation method for optimizing boundaries, characterized in that the segmentation method comprises the following steps:
step one: three-dimensional point cloud preprocessing: preprocessing is performed only before the encoding layer, setting the feature dimension of the input three-dimensional point cloud to 8 through a shared multilayer perceptron (MLP);
step two: the encoding layer consists of a local feature fusion module and a random sampling module; the local feature fusion module consists of two consecutive groups of local spatial encoding and attention pooling parts;
1. local spatial encoding: the local feature encoding is divided into three parts: finding the points of the local region, relative position encoding, and point feature enhancement;
2. attention pooling;
the attention score of each local feature point is learned by a function consisting of Conv2d and softmax. The feature of each local region point is then combined with its attention score by dot product and summed, and the result is input into a shared MLP consisting of BatchNorm and LeakyReLU, yielding the feature vector $\tilde{f}_i$.
step three: a decoding layer: taking the feature codes obtained in step two as the input of the decoding layer, upsampling the number of three-dimensional points by nearest-neighbor interpolation, concatenating the feature dimensions with the contrastive-boundary-learning feature dimensions and with the reverse-order skip connections from the encoding layers, and feeding them into a shared MLP for feature fusion; after feature fusion, a fully connected layer raises the dimension, a dropout operation follows, and a shared MLP finally reduces the dimension to the number of semantic labels and outputs the semantic segmentation result of the three-dimensional point cloud;
step four: contrastive boundary optimization: the contrastive boundary optimization adopts a contrastive boundary learning framework, mainly divided into two modules:
1. boundary sub-scene mining: ground-truth labels are extracted from the three-dimensional point cloud sampled at different scales, where the number of points sampled at each layer equals the random sampling result of the corresponding encoding layer; since the specific labels of points sampled at the boundary become uncertain as the point cloud scale decreases, boundary sub-scene mining in the CBL (contrastive boundary learning) framework is adopted to determine the semantic labels of each layer's point cloud;
2. contrastive boundary learning: during model training, the boundary points of each layer are contrastively learned against the semantic labels of the corresponding layer obtained by boundary sub-scene mining; the InfoNCE loss and its generalization are adopted in contrastive learning to optimize the objective function defined on the boundary points, so that the final result of a boundary point is drawn closer to neighboring points of the same category, ultimately optimizing the three-dimensional point cloud features of the corresponding decoding layer.
In the local spatial encoding of step two (1), finding the points of the local region: within a local region controlled by the scene scale and based on the Euclidean distance d, the KNN algorithm collects the neighboring points, ordered around the center point of each local region, indexed $i$; the K neighboring points of each local region are

$$P_i^K = \{ p_i^1, \dots, p_i^K \}$$

where the subscript $i$ denotes the index of the center point and the superscript $k$ denotes the index of a point within the local region of center point $i$; a KD-tree index is constructed for the KNN search;
(2) Relative position encoding: to let the whole network structure understand local features better, the encoding concatenates four parts: the coordinates of each local region's center point, the coordinates of the neighboring points in the local region, the coordinate difference between the neighboring points and the center point, and the Euclidean distance between the neighboring points and the center point; an MLP operation follows; this associates every point of each local region with the center point of that region to improve the local performance of the whole network structure;
(3) Point feature enhancement: the features of the K neighboring points in each local region are $F = \{f_i^1, \dots, f_i^K\}$; the relative position codes of the neighboring points and the original point features are concatenated, so that position codes and features construct a new feature; the concatenated features are

$$\hat{F}_i = \{ \hat{f}_i^1, \dots, \hat{f}_i^K \}$$

where the subscript $i$ denotes the index of the center point and the superscript $k$ denotes the index of a point in the neighborhood of center point $i$.
The invention has the following advantages and positive effects:
the invention uses an original three-dimensional point cloud method based on deep learning, compared with other deep learning methods, the time complexity of the method in a three-dimensional point cloud sampling mode is O (1), and a large-scale three-dimensional point cloud task can be processed at a higher speed. When the local features are extracted, a local feature expansion module is used, the receptive field of each point is improved, and therefore the neighborhood range is expanded. The 6-layer down-sampling structure and the contrast boundary optimization provided by the method are used for greatly sampling the three-dimensional point cloud and simultaneously performing the boundary optimization under different scales, so that the boundary segmentation of the three-dimensional point cloud semantic segmentation network model is clearer, and effective help is provided for the processing of tasks such as target detection and the like.
Drawings
FIG. 1 is a general structure diagram of a three-dimensional point cloud semantic segmentation method for optimizing boundaries according to the present invention;
FIG. 2 is an explanatory diagram of a 1-dimensional feature extraction module CBL1d in the boundary-optimized three-dimensional point cloud semantic segmentation method of the present invention;
FIG. 3 is an explanatory diagram of a local feature fusion module in the boundary-optimized three-dimensional point cloud semantic segmentation method of the present invention;
FIG. 4 is an explanatory diagram of the 2-dimensional feature extraction module CBL2d in the boundary-optimized three-dimensional point cloud semantic segmentation method of the present invention;
FIG. 5 is an explanatory diagram of the local spatial encoding module in the boundary-optimized three-dimensional point cloud semantic segmentation method of the present invention;
FIG. 6 is an explanatory diagram of the attention pooling module in the boundary-optimized three-dimensional point cloud semantic segmentation method of the present invention.
Detailed Description
To improve the accuracy of large-scale three-dimensional point cloud semantic segmentation, and to address the facts that some deep-learning-based methods cannot process large-scale point clouds while other methods are time-inefficient, a boundary-optimized three-dimensional point cloud semantic segmentation method is provided that improves both the accuracy and the processing efficiency of semantic segmentation at different point cloud scales. The method uses random sampling to reduce resource consumption during sampling; it adds attention pooling to the local aggregation to enlarge the receptive field of local regions; and it introduces contrastive boundary learning into network training, constructing neighborhoods only in edge regions to reduce resource consumption while using the results of contrastive learning on boundary points to optimize the model, thereby improving the network's ability to handle boundaries. With this method, scene semantic segmentation at different levels can be achieved, the boundary accuracy of three-dimensional point cloud semantic segmentation is improved, and the processing efficiency of the whole network model is increased.
The invention is further described below with reference to the accompanying drawings:
A three-dimensional point cloud semantic segmentation method for optimizing boundaries comprises three-dimensional point cloud preprocessing, an encoding layer, a decoding layer, and contrastive boundary optimization, where the encoding layer comprises a local feature fusion module and a random sampling module, and the contrastive boundary optimization comprises contrastive boundary learning and boundary sub-scene mining. The processing flow of the method is shown in fig. 1 and specifically comprises the following steps:
Step one: three-dimensional point cloud preprocessing: because different data sets provide different raw feature dimensions (such as R, G, B, illumination, coordinates, channel counts, normal vectors, and so on), the data sets are preprocessed. Three-dimensional point cloud preprocessing is performed before the encoding layer: the feature of each point is first extracted from the acquired raw point cloud (N points in total) through a shared MLP, implemented as a CBL1d module consisting of a one-dimensional convolution (Conv1d), batch normalization (BatchNorm), and a leaky rectified linear unit (LeakyReLU), as shown in FIG. 2, so that the feature dimension of the output point cloud is 8.
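As a concrete illustration, the following is a minimal PyTorch sketch of such a CBL1d block (Conv1d + BatchNorm + LeakyReLU acting as a shared MLP); the class name, the 0.2 negative slope, and the 6-dimensional input are assumptions for the demo, not values fixed by the patent.

```python
import torch
import torch.nn as nn

class CBL1d(nn.Module):
    """Shared MLP over points: Conv1d + BatchNorm1d + LeakyReLU."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.conv = nn.Conv1d(in_dim, out_dim, kernel_size=1)
        self.bn = nn.BatchNorm1d(out_dim)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_dim, N) per-point features of N points
        return self.act(self.bn(self.conv(x)))

# Usage: lift raw per-point features (e.g. xyz + RGB = 6 dims) to 8 dims.
points = torch.randn(2, 6, 4096)       # (batch, feature dim, N)
features8 = CBL1d(6, 8)(points)        # -> (2, 8, 4096)
```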
Step two: encoding layer: as shown in fig. 1, the point cloud with the initial 8-dimensional features is fed into 6 consecutive encoding layers, with random sampling between layers. In the first 3 of the 6 encoding layers, the number of randomly sampled points is 1/4 of the previous layer; in the last 3 layers it is 1/2 of the previous layer. A local feature fusion module performs a dimension-raising operation between layers, doubling the feature dimension each time, so that the feature dimensions of the preprocessed, batch-normalized point cloud become (8, 16, 32, 64, 128, 256, 512). After the local feature fusion operation of each layer, random sampling is applied again, keeping 25% of the points in each of the first three layers and 50% in each of the last three, i.e. the number of three-dimensional points is

$$N \rightarrow \frac{N}{4} \rightarrow \frac{N}{16} \rightarrow \frac{N}{64} \rightarrow \frac{N}{128} \rightarrow \frac{N}{256} \rightarrow \frac{N}{512}$$
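The sampling schedule above can be sketched as follows; `random_sample` and all sizes are illustrative assumptions, with random index selection standing in for the patent's random sampling.

```python
import torch

def random_sample(xyz: torch.Tensor, feats: torch.Tensor, ratio: float):
    """Randomly keep `ratio` of the points. xyz: (N, 3); feats: (N, C)."""
    n_keep = max(1, int(xyz.shape[0] * ratio))
    idx = torch.randperm(xyz.shape[0])[:n_keep]      # constant cost per point
    return xyz[idx], feats[idx]

xyz, feats = torch.randn(65536, 3), torch.randn(65536, 8)
for ratio in (0.25, 0.25, 0.25, 0.5, 0.5, 0.5):      # the six encoding layers
    xyz, feats = random_sample(xyz, feats, ratio)
print(xyz.shape[0])   # 65536 / 512 = 128 points remain after layer 6
```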
Local feature fusion module: as shown in fig. 3, it stacks two local spatial encodings and two attention poolings. The three-dimensional point cloud features first enter a CBL2d module that halves the dimension; the CBL2d module consists of a two-dimensional convolution (Conv2d), batch normalization (BatchNorm), and a leaky rectified linear unit (LeakyReLU), as shown in FIG. 4. The local spatial coordinates are then concatenated with the local features and passed through the local spatial encoding module and attention pooling to obtain a local attention feature; this is concatenated with the local spatial coordinates again and passed through a second round of local spatial encoding and attention pooling. The resulting local attention feature is raised in dimension by a CB2d module, the input point cloud features are raised in dimension by another CB2d module, and the two features of equal dimension are finally summed and activated by LeakyReLU. The CB2d module consists of a two-dimensional convolution (Conv2d) and batch normalization (BatchNorm).
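For concreteness, a hedged PyTorch sketch of the CB2d and CBL2d blocks and the residual sum-then-activate pattern of the fusion module follows; the two rounds of local spatial encoding and attention pooling are replaced by a stand-in tensor here, and all class names are assumptions.

```python
import torch
import torch.nn as nn

class CB2d(nn.Module):
    """Conv2d + BatchNorm2d over tensors shaped (batch, channels, N, K)."""
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, kernel_size=1)
        self.bn = nn.BatchNorm2d(c_out)
    def forward(self, x):
        return self.bn(self.conv(x))

class CBL2d(nn.Module):
    """CB2d followed by LeakyReLU (the block of FIG. 4)."""
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        self.cb, self.act = CB2d(c_in, c_out), nn.LeakyReLU(0.2)
    def forward(self, x):
        return self.act(self.cb(x))

# Residual pattern of the fusion module: the attention branch and the input
# are each raised in dimension by a CB2d, summed, and activated.
x = torch.randn(2, 16, 1024, 1)               # input point features
attn_branch = torch.randn(2, 16, 1024, 1)     # stand-in for LocSE + attention pooling
out = nn.LeakyReLU(0.2)(CB2d(16, 32)(attn_branch) + CB2d(16, 32)(x))
```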
(1) Local spatial coding:
as shown in fig. 5. The local feature fusion part firstly collects the three-dimensional point cloud features with coordinate information through KNN and local region points based on Euclidean distance range, and the central point of each local region is i th Sequencing is carried out, and K points in each local area are
Figure BDA0003858791460000072
And P is i ∈χ n Wherein the subscript indicates the center point number and the superscript indicates the number of points within the local area. Splicing five parts including the center point coordinate of a local area, the coordinates of adjacent points in the local area, the coordinate difference between the coordinates of the adjacent points in the local area and the center point coordinate of the local area, the Euclidean distance between the adjacent points in the local area and the center point of the local area and the normal vector estimation of each point, entering a CBL2d module, and obtaining the relative position code of the local area, namely r i k Can be expressed as formula (1):
Figure BDA0003858791460000081
wherein
Figure BDA0003858791460000082
Indicating a stitching operation, a multilayer perceptron whose MLP is CBL2d,
Figure BDA0003858791460000083
representing the difference between the coordinates of the neighboring points in the local area and the coordinates of the center point of the local area,
Figure BDA00038587914600000812
representing the euclidean distance from the extraction neighborhood point to the central point,
Figure BDA00038587914600000813
representing the normal vector estimate for each point in the local region.
Each local areaInner K adjacent points characteristic is F = { F = { (F) i 1 …f i k F represents a feature set of adjacent points in the local area; f. of i k The index indicates the feature of the adjacent point in the local area, the superscript indicates the serial number of the adjacent point in the local area, and the subscript indicates the serial number of the local area. Splicing the relative position codes of the adjacent points in the local area and the characteristics of the adjacent points in the local area to form new characteristics constructed by splicing the characteristics and the point position coordinate codes, namely the characteristics
Figure BDA0003858791460000084
Wherein
Figure BDA0003858791460000085
Representing a new feature set of adjacent points in the constructed local area;
Figure BDA0003858791460000086
the method is characterized in that the method represents new characteristics of the near points in the constructed local area, superscripts represent serial numbers of the near points in the local area, and subscripts represent serial numbers of the local area. The expression formula is (2):
Figure BDA0003858791460000087
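A hedged sketch of the neighbor gathering and five-part concatenation that feed the shared MLP of formula (1); the brute-force KNN (in place of the KD-tree index) and all function names are assumptions for illustration.

```python
import torch

def knn(xyz: torch.Tensor, k: int) -> torch.Tensor:
    """Indices of the k nearest neighbors of every point. xyz: (N, 3)."""
    dist = torch.cdist(xyz, xyz)                     # brute force; the patent
    return dist.topk(k, largest=False).indices       # builds a KD-tree instead

def relative_position_parts(xyz, normals, k=16):
    idx = knn(xyz, k)                                # (N, K); includes the point itself
    neighbors = xyz[idx]                             # (N, K, 3) neighbor coordinates
    center = xyz.unsqueeze(1).expand_as(neighbors)   # (N, K, 3) center coordinates
    offset = center - neighbors                      # coordinate difference
    dist = offset.norm(dim=-1, keepdim=True)         # Euclidean distance
    nrm = normals[idx]                               # normal vector estimates
    # the five concatenated parts of formula (1); a shared CBL2d MLP would map
    # the resulting 13 channels to the relative position code r_i^k
    return torch.cat([center, neighbors, offset, dist, nrm], dim=-1)

xyz, normals = torch.randn(1024, 3), torch.randn(1024, 3)
parts = relative_position_parts(xyz, normals)        # (1024, 16, 13)
```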
(2) Attention pooling:
as shown in FIG. 6, the attention score of each point is first determined, and the new feature is constructed
Figure BDA0003858791460000088
The shared weight W is input into an attention score calculation formula as a hyper-parameter to calculate the attention score of each point
Figure BDA0003858791460000089
The superscript indicates the local area proximity point number and the subscript indicates the local area number. The expression formula is (3):
Figure BDA00038587914600000810
where the g () function is a shared function consisting of a two-dimensional convolution and a softmax.
The obtained attention score and the constructed new characteristics
Figure BDA00038587914600000811
Performing dot product operation to obtain attention characteristics of each point, summing the attention characteristics of local each point, and inputting the summed attention characteristics into a CBL2d module as the attention characteristics of the local area
Figure BDA0003858791460000091
The expression formula is (4):
Figure BDA0003858791460000092
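A compact sketch of attention pooling per formulas (3) and (4): per-neighbor scores from a Conv2d followed by softmax, a weighted sum over the K neighbors, then a shared MLP; shapes and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AttentivePooling(nn.Module):
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        self.score_fn = nn.Conv2d(c_in, c_in, kernel_size=1)  # g(., W) pre-softmax
        self.mlp = nn.Sequential(                              # shared CBL2d MLP
            nn.Conv2d(c_in, c_out, kernel_size=1),
            nn.BatchNorm2d(c_out),
            nn.LeakyReLU(0.2),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, N, K) augmented features of K neighbors per point
        scores = torch.softmax(self.score_fn(feats), dim=-1)   # formula (3)
        pooled = (feats * scores).sum(dim=-1, keepdim=True)    # formula (4) sum
        return self.mlp(pooled)                                # (B, c_out, N, 1)

out = AttentivePooling(16, 32)(torch.randn(2, 16, 1024, 16))   # (2, 32, 1024, 1)
```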
step three: a decoding layer: as shown in fig. 1, after 6 coding layers, 10 decoding layers are performed. The first layer of the decoding layer is to splice the feature codes output by the last layer of the coding layer with the features optimized by the comparison boundary learning, and then input the feature codes into a CBL2d module, and the dimension is still maintained to be 512 dimensions through the module. The mode from the layer 2 of the decoding layer to the layer 7 of the decoding layer is as follows: searching nearest neighbor points through a KNN algorithm, then performing up-sampling on the point cloud number through nearest neighbor interpolation (nearest neighbor), wherein the number of the three-dimensional point clouds from the 2 nd layer to the 4 th layer is increased by 2 times, the number of the three-dimensional point clouds from the 5 th layer to the 7 th layer is increased by 4 times, and the number of the three-dimensional point clouds is changed into the number of the three-dimensional point clouds
Figure BDA0003858791460000093
And performing nearest neighbor interpolation and then performing shared MLP. After nearest neighbor interpolation is carried out on each layer from the layer 2 to the layer 7, three-dimensional point cloud features obtained by the previous layer of decoding layer, three-dimensional point cloud features obtained after comparison boundary learning optimization of the corresponding layer and three-dimensional point cloud features of the coding layer corresponding to the point cloud number are spliced, and then the three-dimensional point cloud features are input into a CBL2d module, the feature dimension is reduced to half of that of the previous layer, and the expression is (5):
Figure BDA0003858791460000094
wherein f is i d Representing the three-dimensional point cloud characteristic of the current decoding layer,
Figure BDA0003858791460000095
representing the characteristics of the three-dimensional point cloud of the previous layer,
Figure BDA0003858791460000096
three-dimensional point cloud characteristics of the coding layer representing the corresponding point cloud number, f i C And representing the three-dimensional point cloud characteristics after the comparison boundary learning optimization. Layer 8 of decoding the full number of three-dimensional point clouds with 8-dimensional features are input into the CBL2d module to increase the dimensions to 64. And randomly inactivating the 9 th layer of the decoding layer, setting the inactivation rate to be 50%, and inputting the inactivation rate to the CBL2d module to reduce the three-dimensional point cloud characteristic dimension to be 32 dimensions. The 10 th layer of the decoding layer sets the dimension to be the same as the number of the semantic tags through CBL2d and finally predicts the semantic segmentation result.
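The nearest-neighbor interpolation of the decoder can be sketched as below: each dense point copies the features of its nearest sparse point before the concatenation and shared MLP of formula (5); names are assumptions.

```python
import torch

def nearest_upsample(sparse_xyz, sparse_feats, dense_xyz):
    """sparse_xyz: (M, 3); sparse_feats: (M, C); dense_xyz: (N, 3), N > M."""
    dist = torch.cdist(dense_xyz, sparse_xyz)     # (N, M) pairwise distances
    nearest = dist.argmin(dim=1)                  # index of each point's 1-NN
    return sparse_feats[nearest]                  # (N, C) copied features

sparse_xyz, sparse_feats = torch.randn(128, 3), torch.randn(128, 512)
dense_xyz = torch.randn(256, 3)                   # twice the points (layers 2-4)
up = nearest_upsample(sparse_xyz, sparse_feats, dense_xyz)    # (256, 512)
# formula (5) would then concatenate `up` with the skip and boundary features
# and fuse them through a CBL2d shared MLP
```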
Step four: contrastive boundary optimization:
The structure is shown in figure 1. The algorithm adopts the CBL contrastive boundary framework for three-dimensional point cloud semantic segmentation, consisting of two parts: sub-scene boundary mining and contrastive boundary learning.
1. Sub-scene boundary mining: its purpose is to find the true boundary parts during training; the boundary mining result of each layer can serve as the ground truth for the next layer. Sub-scene boundary mining has the same number of layers as the encoder, and each of its layers shares the random sampling of the corresponding encoding layer. The ground-truth labels of the data set are introduced during model training for contrastive boundary learning. Boundary points are the points that distinguish different objects; they are found by locating points whose semantic labels differ from those of their neighbors in the point cloud, so that different objects are separated more clearly in the end-to-end network model and the semantic segmentation accuracy of the three-dimensional point cloud is improved. The boundary neighborhood is a sphere whose radius is 0.1% of the overall point cloud scale, expressed as formula (6):

$$B = \left\{ x_i \in \chi \;\middle|\; \exists\, x_j \in N_i,\; l_j \neq l_i \right\} \tag{6}$$

where $\chi$ denotes the entire three-dimensional point cloud, $B$ the set of boundary points, $N_i$ the set of neighborhood points of $x_i$, $x_j$ a point within the neighborhood $N_i$ whose semantic label is inconsistent with that of $x_i$, and $l$ a semantic label, with $B^n \subseteq \chi^n$ where $n$ denotes the index of the corresponding encoding layer. Superscripts denote the sampling layer index and subscripts the positional ordering of points within the neighborhood; this convention is not repeated below.
The points sampled at each layer are the same as the corresponding randomly sampled points, and the first layer directly uses the ground-truth labels of the data set, i.e. $l_i^1 = l_i$ and $B^1 \subseteq \chi^1$. From the second layer on, the ground-truth label of a boundary point is obtained by average-pooling the labels of all points in the local region centered on the boundary point of the previous layer, as expressed in formula (7):

$$l_i^{\,n} = \mathrm{AVG}\left( \left\{ l_j^{\,n-1} \;\middle|\; x_j \in N_i^{\,n-1} \right\} \right) \tag{7}$$

where $\mathrm{AVG}()$ denotes average pooling and $N_i^{\,n-1}$ denotes the neighborhood point set of the previous sampling layer centered on the boundary point.
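A hedged sketch of the boundary-point test of formula (6): a point is a boundary point if any neighbor inside the sphere carries a different semantic label; the brute-force distance matrix and the absolute radius are assumptions for the demo.

```python
import torch

def mine_boundary(xyz: torch.Tensor, labels: torch.Tensor, radius: float):
    """xyz: (N, 3); labels: (N,) integer semantic labels -> (N,) boolean mask."""
    dist = torch.cdist(xyz, xyz)                      # (N, N) pairwise distances
    in_ball = (dist < radius) & (dist > 0)            # spherical neighborhood N_i
    differs = labels.unsqueeze(0) != labels.unsqueeze(1)
    return (in_ball & differs).any(dim=1)             # formula (6) membership

xyz = torch.randn(2048, 3)
labels = torch.randint(0, 13, (2048,))
# the patent uses a radius of 0.1% of the overall scene scale; 0.1 here is an
# arbitrary stand-in for this synthetic demo
boundary_mask = mine_boundary(xyz, labels, radius=0.1)
```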
2. Contrastive boundary learning: first, the boundary points for contrastive learning are determined according to formula (6); in the neighborhood centered on a boundary point, positive pairs are $\{ x_j \in N_i \wedge l_i = l_j \}$ and negative pairs are $\{ x_j \in N_i \wedge l_i \neq l_j \}$. Contrastive training uses a generalization of the InfoNCE (Info Noise-Contrastive Estimation) loss with an introduced temperature coefficient $\tau$ as a hyperparameter. The aim is that the optimization draws each boundary point closer to points of its own category; the boundary features obtained after contrastive learning are used to optimize the boundary features of the upsampling stage, strengthening the delimitation of the boundary region while giving the network model the ability to recognize boundary features. The generalized InfoNCE loss is expressed as formula (8):

$$\mathcal{L}_B = \frac{1}{|B|} \sum_{x_i \in B} -\log \frac{\sum_{x_j \in N_i,\, l_j = l_i} \exp\!\left( -d(f_i, f_j)/\tau \right)}{\sum_{x_k \in N_i} \exp\!\left( -d(f_i, f_k)/\tau \right)} \tag{8}$$

where $\mathcal{L}_B$ denotes the contrastive boundary learning loss, $f_i$ the feature of the corresponding boundary point $x_i$, $f_j$ the feature of a neighboring point whose semantic label is compared with that of $x_i$, the index $k$ runs over all points of the local region, $\tau$ is the temperature coefficient serving as a hyperparameter, and $d(\cdot,\cdot)$ denotes the distance measure.
Over the entire training process, the global loss function is calculated as formula (9):

$$\mathcal{L} = \mathcal{L}_{CE} + \lambda \mathcal{L}_B \tag{9}$$

where $\mathcal{L}_{CE}$ denotes the cross-entropy loss and $\lambda$ the weight of the contrastive boundary learning loss.
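Finally, a hedged sketch of the generalized InfoNCE boundary loss of formula (8) and the global loss of formula (9); the masking details, the temperature value, and the choice of $\lambda$ are assumptions, not the patent's exact implementation.

```python
import torch
import torch.nn.functional as F

def contrastive_boundary_loss(feats, nbr_feats, labels, nbr_labels, tau=0.1):
    """feats: (B, C) features of boundary points; nbr_feats: (B, K, C) features
    of their K neighbors; labels: (B,); nbr_labels: (B, K) semantic labels."""
    d = torch.cdist(feats.unsqueeze(1), nbr_feats).squeeze(1)   # (B, K) distances
    logits = -d / tau                                           # exp(-d/tau) terms
    pos = (nbr_labels == labels.unsqueeze(1)).float()           # same-label mask
    # -log( sum over same-label neighbors / sum over all K neighbors )
    num = torch.logsumexp(logits + torch.log(pos + 1e-12), dim=1)
    den = torch.logsumexp(logits, dim=1)
    return (den - num).mean()

def total_loss(pred, target, boundary_loss, lam=0.1):
    """Formula (9): cross-entropy plus weighted contrastive boundary loss."""
    return F.cross_entropy(pred, target) + lam * boundary_loss

l_b = contrastive_boundary_loss(torch.randn(32, 64), torch.randn(32, 16, 64),
                                torch.randint(0, 13, (32,)),
                                torch.randint(0, 13, (32, 16)))
```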

Claims (3)

1. A three-dimensional point cloud semantic segmentation method for optimizing boundaries, characterized in that the segmentation method comprises the following steps:
step one: three-dimensional point cloud preprocessing: preprocessing is performed only before the encoding layer, defining the feature dimension of the input three-dimensional point cloud as 8 through a shared MLP;
step two: the encoding layer consists of a local feature fusion module and a random sampling module; the local feature fusion module consists of two consecutive groups of local spatial encoding and attention pooling parts;
1. local spatial encoding: the local feature encoding is divided into three parts: finding the points of the local region, relative position encoding, and point feature enhancement;
2. attention pooling;
the attention score of each local feature point is learned through a function consisting of Conv2d and softmax; the feature of each local region point is then combined with its attention score by dot product and summed, and the result is input into a shared MLP consisting of BatchNorm and LeakyReLU, yielding the feature vector $\tilde{f}_i$;
step three: a decoding layer: taking the feature codes obtained in step two as the input of the decoding layer, upsampling the number of three-dimensional points by nearest-neighbor interpolation, concatenating the feature dimensions with the contrastive-boundary-learning feature dimensions and with the reverse-order skip connections from the encoding layers, and feeding them into a shared MLP for feature fusion; after feature fusion, a fully connected layer raises the dimension, a dropout operation follows, and a shared MLP finally reduces the dimension to the number of semantic labels and outputs the semantic segmentation result of the three-dimensional point cloud;
step four: contrastive boundary optimization: the contrastive boundary optimization adopts a contrastive boundary learning framework, mainly divided into two modules:
1. boundary sub-scene mining: ground-truth labels are extracted from the three-dimensional point cloud sampled at different scales, where the number of points sampled at each layer equals the random sampling result of the corresponding encoding layer; since the specific labels of points sampled at the boundary become uncertain as the point cloud scale decreases, boundary sub-scene mining in the CBL (contrastive boundary learning) framework is adopted to determine the semantic labels of each layer's point cloud;
2. contrastive boundary learning: during model training, the boundary points of each layer are contrastively learned against the semantic labels of the corresponding layer obtained by boundary sub-scene mining; the InfoNCE loss function and its generalization are adopted in contrastive learning to optimize the objective function defined on the boundary points, so that the final result of a boundary point is drawn closer to neighboring points of the same category, ultimately optimizing the three-dimensional point cloud features of the corresponding decoding layer.
2. The boundary-optimized three-dimensional point cloud semantic segmentation method according to claim 1, characterized in that: in the local spatial encoding of step two (1), finding the points of the local region: within a local region controlled by the scene scale and based on the Euclidean distance d, the KNN algorithm collects the neighboring points, ordered around the center point of each local region, indexed $i$; the K neighboring points of each local region are

$$P_i^K = \{ p_i^1, \dots, p_i^K \}$$

where the subscript $i$ denotes the index of the center point and the superscript $k$ denotes the index of a point within the local region of center point $i$; a KD-tree index is constructed for the KNN search;
(2) Relative position encoding: to let the whole network structure understand local features better, the encoding concatenates four parts: the coordinates of each local region's center point, the coordinates of the neighboring points in the local region, the coordinate difference between the neighboring points and the center point, and the Euclidean distance between the neighboring points and the center point; an MLP operation follows; this associates every point of each local region with the center point of that region to improve the local performance of the whole network structure;
(3) Point feature enhancement: the features of the K neighboring points in each local region are $F = \{f_i^1, \dots, f_i^K\}$, where the subscript $i$ denotes the index of the center point and the superscript $k$ the index of a point in the local region of the center point; the relative position codes of the neighboring points and the original point features are concatenated, so that position codes and features construct a new feature; the concatenated features are

$$\hat{F}_i = \{ \hat{f}_i^1, \dots, \hat{f}_i^K \}$$

where the subscript $i$ denotes the index of the center point and the superscript $k$ denotes the index of a point in the neighborhood of center point $i$.
3. The boundary-optimized three-dimensional point cloud semantic segmentation method according to claim 1, characterized in that: in step four, sub-scene boundary mining has the same number of layers as the encoding layer, and each layer of sub-scene boundary mining shares the random sampling of the corresponding encoding layer; the ground truth of the data set is introduced during model training for contrastive boundary learning; boundary points are the points that distinguish different objects, found by locating points whose semantic labels differ from those of their neighbors in the point cloud, so that different objects are separated more clearly in the end-to-end network model and the semantic segmentation accuracy of the three-dimensional point cloud is improved; the boundary neighborhood is a sphere whose radius is 0.1% of the overall point cloud scale, expressed as formula (6):

$$B = \left\{ x_i \in \chi \;\middle|\; \exists\, x_j \in N_i,\; l_j \neq l_i \right\} \tag{6}$$

where $\chi$ denotes the entire three-dimensional point cloud, $B$ the set of boundary points, $N_i$ the set of neighborhood points of $x_i$, $x_j$ a point within the neighborhood $N_i$ whose semantic label is inconsistent with that of $x_i$, $l$ a semantic label, and $B^n \subseteq \chi^n$ with $n$ denoting the index of the corresponding encoding layer; superscripts denote the sampling layer index and subscripts the positional ordering of points within the neighborhood.
CN202211156241.6A, priority and filed 2022-09-22: Three-dimensional point cloud semantic segmentation method for optimizing boundary (pending, published as CN115409989A)

Priority Applications (1)

CN202211156241.6A, priority and filing date 2022-09-22: Three-dimensional point cloud semantic segmentation method for optimizing boundary

Publications (1)

CN115409989A, published 2022-11-29

Family

Family ID: 84165415

Family Applications (1)

CN202211156241.6A (pending): CN115409989A

Country Status (1)

CN: CN115409989A

Cited By (1)

CN116468892A (priority 2023-04-24, published 2023-07-21, 北京中科睿途科技有限公司): Semantic segmentation method and device of three-dimensional point cloud, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110335337B (en) Method for generating visual odometer of antagonistic network based on end-to-end semi-supervision
Melekhov et al. Dgc-net: Dense geometric correspondence network
Zhou et al. To learn or not to learn: Visual localization from essential matrices
Xiang et al. Subcategory-aware convolutional neural networks for object proposals and detection
EP3822910A1 (en) Depth image generation method and device
Von Stumberg et al. Gn-net: The gauss-newton loss for multi-weather relocalization
CN111832655B (en) Multi-scale three-dimensional target detection method based on characteristic pyramid network
Lin et al. RSCM: Region selection and concurrency model for multi-class weather recognition
CN111027377B (en) Double-flow neural network time sequence action positioning method
CN113744311A (en) Twin neural network moving target tracking method based on full-connection attention module
Chen et al. Background-aware 3-D point cloud segmentation with dynamic point feature aggregation
CN110942471A (en) Long-term target tracking method based on space-time constraint
Ahmed et al. Density-based clustering for 3d object detection in point clouds
CN105574545B (en) The semantic cutting method of street environment image various visual angles and device
CN110598711B (en) Target segmentation method combined with classification task
Nguyen Fast traffic sign detection approach based on lightweight network and multilayer proposal network
Chuah et al. Deep learning-based incorporation of planar constraints for robust stereo depth estimation in autonomous vehicle applications
CN115409989A (en) Three-dimensional point cloud semantic segmentation method for optimizing boundary
Zeng et al. Deep confidence propagation stereo network
Zhou et al. Retrieval and localization with observation constraints
Li et al. Pole-like street furniture decomposition in mobile laser scanning data
Rituerto et al. Label propagation in videos indoors with an incremental non-parametric model update
Ammar et al. Comparative Study of latest CNN based Optical Flow Estimation
CN113313091B (en) Density estimation method based on multiple attention and topological constraints under warehouse logistics
Liu et al. Enhancing point features with spatial information for point-based 3D object detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination