CN116543153A

CN116543153A - Semi-supervised point cloud semantic segmentation method based on selective active learning

Info

Publication number: CN116543153A
Application number: CN202310497117.4A
Authority: CN
Inventors: 徐宗懿; 袁波; 刘继逍; 高新波
Original assignee: Chongqing University of Post and Telecommunications
Current assignee: Chongqing University of Post and Telecommunications
Priority date: 2023-05-05
Filing date: 2023-05-05
Publication date: 2023-08-04

Abstract

The invention belongs to the field of point cloud data processing, and particularly relates to a semi-supervised point cloud semantic segmentation method based on selective active learning. The segmentation method comprises: obtaining target point cloud data; inputting the target point cloud data into a trained point cloud semantic segmentation neural network; and outputting a segmentation result of the target point cloud data. Training the point cloud semantic segmentation neural network in a semi-supervised manner reduces the labeling cost of the point cloud data. Through active learning, uncertain points are selected from the unlabeled point cloud data set by a specific sampling strategy, so that the selected points are important, information-rich and non-redundant. The method can improve the semi-supervised learning effect and the model capacity, and ultimately improves the segmentation performance on three-dimensional point cloud data.

Description

Semi-supervised point cloud semantic segmentation method based on selective active learning

Technical Field

The invention belongs to the field of point cloud data processing, and particularly relates to a semi-supervised point cloud semantic segmentation method based on selective active learning.

Background

The purpose of point cloud semantic segmentation is to assign a class label to each 3D point; it can be applied in scenarios such as robotics, autonomous driving and augmented reality. Recently, deep learning-based approaches have achieved impressive performance. These high-performance methods typically rely on large amounts of data with point-level labels, which leads to high annotation cost and label redundancy.

To reduce the labor and cost of annotation, many methods have been proposed that learn three-dimensional segmentation models in a semi-supervised way from a limited number of labeled points, which are typically randomly sampled. These approaches attempt to develop efficient strategies to propagate label information to unlabeled points. While these semi-supervised approaches can greatly reduce labeling cost, their performance may be limited: since the labeled points are randomly selected, the labels may contain redundant information, while important points may be omitted.

Active learning has recently been studied as an alternative learning strategy to alleviate these limitations in three-dimensional segmentation. Lin et al. divide the entire point cloud into segments, each segment being the basic query unit of sample selection. ReDAL proposes selecting informative and diverse sub-scene regions for label acquisition, using entropy, color discontinuity and structural complexity to measure the information of sub-scene regions. Subsequently, SSDR-AL groups the original point cloud into superpoints and progressively selects the most informative regions for annotation. However, the performance of region-based active learning depends largely on the region-partitioning strategy. Furthermore, since a point cloud exhibits strong semantic similarity within a local region, selecting all points in a local region spends the label budget on redundant labels.

In summary, although semi-supervised training and active learning each achieve good results when used independently in the point cloud semantic segmentation task, how to combine the two effectively and fully exploit the advantages of both remains a problem to be solved by those skilled in the art.

Disclosure of Invention

Based on the problems existing in the prior art, the invention provides a semi-supervised point cloud semantic segmentation method based on selective active learning, which comprises the steps of acquiring target point cloud data; inputting the target point cloud data into a trained point cloud semantic segmentation neural network, and outputting a segmentation result of the target point cloud data;

the training process of the point cloud semantic segmentation neural network specifically comprises the following steps:

101. acquiring a point cloud data set; the point cloud data set comprises a marked data set and an unmarked data set, the marked data set and the unmarked data set have the same distribution, each point cloud data in the marked data set has a label, and each point cloud data in the unmarked data set does not have a label;

102. training a point cloud semantic segmentation model in a semi-supervised mode by adopting a marked data set and an unmarked data set, and predicting to obtain a pseudo tag of unmarked data;

103. calculating the minimum margin score of the point cloud according to the point cloud data labels in the point cloud data set by adopting the hierarchical minimum margin uncertainty measure; progressively capturing local contexts at different scales through multiple downsampling passes, and calculating the context uncertainty score of the point cloud;

104. calculating an uncertainty result of the point cloud by adopting the minimum margin score and the context uncertainty score of the point cloud;

105. and selecting a plurality of point clouds based on the uncertainty result of the point clouds, marking the selected plurality of point clouds, sending the marked point clouds into a marked data set, removing the marked point clouds from the unmarked data set, and returning to the step 101 until the training of the point cloud semantic segmentation model is completed.
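The training steps 101 to 105 above can be read as an outer loop. The following is a minimal sketch of that loop under stated assumptions, not the implementation of the invention: every callable (`train_semi_supervised`, `score_uncertainty`, `select_points`, `annotate`) is a hypothetical placeholder, points are identified by integer ids, and a fixed number of rounds stands in for the unspecified completion condition.

```python
from typing import Callable, Dict, List, Set


def active_learning_loop(
    labeled: Set[int],
    unlabeled: Set[int],
    train_semi_supervised: Callable[[Set[int], Set[int]], None],
    score_uncertainty: Callable[[Set[int]], Dict[int, float]],
    select_points: Callable[[Dict[int, float], int], List[int]],
    annotate: Callable[[List[int]], None],
    budget_per_round: int,
    num_rounds: int,
) -> None:
    """Steps 101-105 as a loop; every callable is a hypothetical placeholder."""
    for _ in range(num_rounds):
        if not unlabeled:
            break  # nothing left to query
        # Step 102: semi-supervised training on marked + unmarked points.
        train_semi_supervised(labeled, unlabeled)
        # Steps 103-104: per-point uncertainty for the unmarked pool.
        scores = score_uncertainty(unlabeled)
        # Step 105: pick points, label them, move them between the two sets.
        chosen = select_points(scores, budget_per_round)
        annotate(chosen)
        labeled.update(chosen)
        unlabeled.difference_update(chosen)
```

The two sets are updated in place, which mirrors step 105: labeled points enter the marked data set and leave the unmarked data set before the next round starts.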

The invention has the beneficial effects that:

the invention integrates semi-supervised learning with an improved active learning method, which solves the problem that, under the original semi-supervised framework, the learning performance of the semi-supervised model is limited by random point selection. It further provides a novel active learning point selection strategy that combines a hierarchical uncertainty measure with feature distance suppression, which solves the problems of information redundancy and dependence on region division.

Drawings

FIG. 1 is a flow chart of a semi-supervised point cloud semantic segmentation method based on selective active learning according to an embodiment of the invention;

FIG. 2 is a schematic diagram of a point cloud semantic segmentation neural network according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a training process of a point cloud semantic segmentation neural network according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a semi-supervised point cloud semantic segmentation effect according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

FIG. 1 is a flow chart of a semi-supervised point cloud semantic segmentation method based on selective active learning according to an embodiment of the invention. As shown in FIG. 1, the segmentation method specifically comprises the following steps:

S1, acquiring target point cloud data;

in the embodiment of the invention, the target point cloud data is a point cloud data set to be segmented, the category to which each point cloud data in the set belongs needs to be marked, and the target point cloud data is segmented according to the category. The point cloud data set to be segmented may be acquired, for example, by an acquisition device on an autonomous vehicle.

S2, inputting the target point cloud data into a trained point cloud semantic segmentation neural network, and outputting a segmentation result of the target point cloud data;

in the embodiment of the invention, existing semantic segmentation neural networks rely on a large amount of data with point-level labels, which brings high cost and label redundancy; the invention therefore adopts a semi-supervised point cloud semantic segmentation neural network to reduce the labeling cost. The semi-supervised point cloud semantic segmentation neural network is trained in advance using deep learning, and predicts a class and a prediction probability for each point, wherein the prediction probability represents the confidence of the corresponding predicted class. The segmentation of the target point cloud data can be obtained from the predicted classes and prediction probabilities.

In order to better explain the semi-supervised point cloud semantic segmentation neural network, the semi-supervised point cloud semantic segmentation neural network and the corresponding training process thereof are detailed.

FIG. 2 is a schematic diagram of a point cloud semantic segmentation neural network according to an embodiment of the present invention. As shown in FIG. 2, the semi-supervised point cloud semantic segmentation neural network adopts a Teacher-Student model. Specifically, in the embodiment of the invention, two segmentation networks are constructed using the Minkowski space and are denoted the teacher model (Teacher Network) and the student model (Student Network) respectively. Point cloud data of the point cloud data set is input into the teacher model, and augmented data of the point cloud data set is input into the student model; the teacher model and the student model are trained with a consistency constraint between them, using a cross-entropy loss; the student network is optimized by gradient descent; and the parameters of the student model are transferred to the teacher model by an exponential moving average (EMA).

The formula for transferring the parameters of the student model to the teacher model by the exponential moving average is expressed as:

$$\theta^{t}_{j} = \alpha\,\theta^{t}_{j-1} + (1-\alpha)\,\theta^{s}_{j}$$

wherein $\theta^{t}_{j}$ denotes the parameters of the teacher model in the j-th training iteration, $\theta^{t}_{j-1}$ denotes the parameters of the teacher model in the (j-1)-th training iteration, and $\theta^{s}_{j}$ denotes the parameters of the student model in the j-th training iteration; $\alpha$ is a hyperparameter that determines the rate of parameter transfer and is typically close to 1. In each iteration, the teacher model outputs predicted labels for the marked data and pseudo labels for the unmarked data; these outputs can then be used to sample and select unmarked points, so that better-chosen labeled data is obtained through active learning to optimize the point cloud semantic segmentation neural network.
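As an illustration of the exponential moving average transfer described above, the sketch below assumes the common form in which the new teacher parameters are `alpha` times the previous teacher parameters plus `(1 - alpha)` times the current student parameters. Parameters are represented as flat lists of floats, and the default `alpha = 0.99` is an assumed value chosen only because the text says alpha is close to 1.

```python
from typing import List


def ema_update(teacher: List[float], student: List[float], alpha: float = 0.99) -> List[float]:
    """Return alpha * teacher + (1 - alpha) * student, element-wise."""
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same number of parameters")
    # A larger alpha keeps the teacher closer to its previous value.
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]
```

With `alpha` near 1 the teacher changes slowly, so its pseudo labels vary less from one iteration to the next than the student's predictions do.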

Fig. 3 is a schematic diagram of a training process of a point cloud semantic segmentation neural network according to an embodiment of the present invention, as shown in fig. 3, where the training process includes:

101. acquiring a point cloud data set;

in consideration of the cost problem and redundancy problem of the point cloud data marking, the method and the device select better point cloud data to mark in an active learning mode to ensure the precision of the point cloud semantic segmentation neural network, wherein the point cloud data set comprises a marked data set and an unmarked data set, the marked data set and the unmarked data set have the same distribution, each point cloud data in the marked data set is provided with a label, and each point cloud data in the unmarked data set is not provided with a label;

in the embodiment of the invention, the point cloud data in the marked data set and the unmarked data set are input into the teacher model, and the augmented data of the point cloud data in the marked data set and the unmarked data set are input into the student model; the teacher model and the student model are trained with a consistency constraint between them, using a cross-entropy loss; the student network is optimized by gradient descent; and the parameters of the student model are transferred to the teacher model by an exponential moving average. In each training round, the teacher model outputs predicted labels for the marked data and pseudo labels for the unmarked data. Because only the marked data has real labels, and real labels are more reliable than pseudo labels, the subsequent steps use the real labels of the marked data together with the pseudo labels of the unmarked data to select the more important unmarked data; after the selected unmarked data is manually labeled, the next round of training is carried out, until the corresponding training requirement is met.

In some embodiments of the invention, semi-supervised learning refers to machine learning methods that use both labeled and unlabeled data for training. The whole semi-supervised training process of this embodiment can be described as follows:

1. Preparing a data set: the labeled point cloud data and the unlabeled point cloud data are prepared separately, and each is divided into a training set and a validation set according to a certain proportion.

2. Building a Teacher network: a deep neural network is trained using the labeled point cloud data to serve as the Teacher network. The training goal of this network is to minimize the loss function on the labeled data, i.e., to minimize the gap between the predicted and true label values.

3. Building a Student network: a deep neural network is trained using both the labeled and the unlabeled point cloud data to serve as the Student network. When training the Student network, the output of the Teacher network is used as the label of the unlabeled data, so that the unlabeled data is utilized.

4. Training the Student network: the parameters of the Student network are adjusted iteratively to reduce the loss function over all data, both labeled and unlabeled, as much as possible. Specifically, in each iteration period, the parameters of the Student network are first updated using the labeled point cloud data and then updated using the unlabeled point cloud data. The loss function of the unlabeled point cloud data is formed by the difference between the prediction of the Student network and the output of the Teacher network.

5. Evaluating model performance: after each iteration period is completed, the validation set is used to evaluate the performance of the Student network. If the performance meets the expected requirement, training stops; otherwise, the parameters of the Student network continue to be adjusted until the performance requirement is met.

Through the steps, the Teacher-Student model can fully utilize the unlabeled point cloud data to improve the learning effect, so that better performance is obtained under the same labeled point cloud data, and the segmentation performance of the point cloud semantic segmentation is improved in the subsequent process.
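The loss described in steps 3 and 4 can be sketched as follows. This is only one plausible reading: it assumes the teacher's arg-max class is used as a hard pseudo label for unlabeled points and that both terms are cross-entropy losses; the function names and the `unlabeled_weight` parameter are hypothetical and do not appear in the text.

```python
import math
from typing import List, Optional, Sequence


def cross_entropy(probs: Sequence[float], target: int, eps: float = 1e-12) -> float:
    """Negative log-probability assigned to the target class."""
    return -math.log(max(probs[target], eps))


def student_loss(
    student_probs: List[Sequence[float]],
    teacher_probs: List[Sequence[float]],
    labels: List[Optional[int]],
    unlabeled_weight: float = 1.0,
) -> float:
    """Average loss over points; labels[i] is None for unlabeled points."""
    if not student_probs:
        raise ValueError("at least one point is required")
    total = 0.0
    for s, t, y in zip(student_probs, teacher_probs, labels):
        if y is not None:
            # Labeled point: compare the student with the ground-truth label.
            total += cross_entropy(s, y)
        else:
            # Unlabeled point: the teacher's arg-max class is the pseudo label.
            pseudo = max(range(len(t)), key=t.__getitem__)
            total += unlabeled_weight * cross_entropy(s, pseudo)
    return total / len(student_probs)
```

In a real training setup the student would receive an augmented copy of each point cloud, so the unlabeled term also acts as the consistency constraint between the two networks.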

in the embodiment of the invention, existing approaches that use active learning for point cloud segmentation are generally region-based, and the performance of region-based active learning depends largely on the region division strategy. Furthermore, since a point cloud exhibits strong semantic similarity within a local region, selecting all points in a local region spends the label budget on redundant labels. Therefore, in the active learning adopted by the invention, the uncertainty of a point itself and of the neighbor points around it is predicted through the hierarchical minimum margin uncertainty measure, and local contexts at different scales are captured progressively through multiple downsampling passes. The uncertainty information of the neighbor points is accumulated at each downsampling to obtain the final uncertainty result.

For the hierarchical minimum margin uncertainty measure, specifically, a point considered without its surrounding context information cannot reflect the true importance of that point. The hierarchical minimum margin uncertainty measure calculates an uncertainty score for each point by grouping points at multiple scales and progressively perceiving the context information of the unlabeled points over increasingly large ranges. The minimum margin score $U_x$ of each unlabeled point, which contains only the point's own information, can be calculated by the following formula:

$$U_x = h\left(x^u;\, p_1(x^u)\right) - h\left(x^u;\, p_2(x^u)\right)$$

wherein $U_x$ represents the minimum margin score of the point $x^u$, and $x^u$ represents a candidate point that can be selected for labeling; $p_1(x^u)$ represents the label class probability with the highest score predicted by the point cloud semantic segmentation model, and $p_2(x^u)$ represents the label class probability with the second-highest score predicted by the point cloud semantic segmentation model; $h(\cdot)$ represents the segmentation predictor of the point cloud semantic segmentation model.
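A minimal sketch of the point-level minimum margin score follows. It assumes the score is simply the gap between the highest and second-highest softmax class probabilities of one point; the function name `min_margin_score` is hypothetical.

```python
from typing import Sequence


def min_margin_score(probs: Sequence[float]) -> float:
    """Gap between the highest and second-highest class probabilities."""
    if len(probs) < 2:
        raise ValueError("need at least two classes")
    top1, top2 = sorted(probs, reverse=True)[:2]
    return top1 - top2
```

For example, a prediction of `[0.7, 0.2, 0.1]` gives a margin of about 0.5, while `[0.4, 0.35, 0.25]` gives about 0.05; the second point sits closer to a decision boundary.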

In addition to the minimum margin score, the context uncertainty of the point cloud is needed, and wider neighborhoods are grouped by downsampling. The softmax prediction of each downsampled point is obtained by averaging the softmax predictions of its neighboring points in the original point cloud. The softmax label of each downsampled point represents the predicted distribution of a local region and is denoted $\bar{p}_i(x^u)$. Voxel downsampling is performed for N layers, and the point to be labeled $x^u$ can be represented by the downsampled prediction average of the i-th layer, as shown in the following formula:

$$\bar{p}_i(x^u) = \frac{1}{K_i}\sum_{j=1}^{K_i} p\left(x^{u}_{i,j}\right)$$

wherein $\bar{p}_i(x^u)$ represents the local context information of the point $x^u$ at the i-th downsampling layer; $x^{u}_{i,j}$ denotes the j-th neighbor point of the unlabeled point $x^u$ at the i-th downsampling layer; $K_i$ is the total number of neighbor points within the voxel radius at the i-th downsampling layer; and $p(\cdot)$ represents the prediction probability. For the downsampling of each layer, the context uncertainty score $U^{c}_{i}(x^u)$ of the unlabeled point $x^u$ can be obtained by the following formula:

$$U^{c}_{i}(x^u) = h\left(x^u;\, \bar{p}_{i,1}(x^u)\right) - h\left(x^u;\, \bar{p}_{i,2}(x^u)\right)$$

wherein $U^{c}_{i}(x^u)$ represents the context uncertainty score of the point $x^u$ at the i-th downsampling; $\bar{p}_{i,1}(x^u)$ represents the highest-scoring local context information at the i-th downsampling predicted by the point cloud semantic segmentation model, and $\bar{p}_{i,2}(x^u)$ represents the second-highest-scoring local context information at the i-th downsampling predicted by the point cloud semantic segmentation model.
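The context uncertainty of one downsampling layer can be sketched as below. This is a simplified illustration under assumptions: neighbors are taken to be the points that fall into the same cubic grid cell of side `voxel_size`, the averaged softmax distribution of a cell stands in for the local context information, and the context score is the margin between its two largest entries. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, ...]


def voxel_key(point: Point, voxel_size: float) -> Voxel:
    """Integer grid cell that contains the point."""
    return tuple(int(c // voxel_size) for c in point)


def context_margin_scores(
    points: List[Point],
    probs: List[Sequence[float]],
    voxel_size: float,
) -> List[float]:
    """Per-point margin of the mean softmax prediction inside its voxel."""
    groups: Dict[Voxel, List[int]] = defaultdict(list)
    for idx, p in enumerate(points):
        groups[voxel_key(p, voxel_size)].append(idx)

    scores = [0.0] * len(points)
    for members in groups.values():
        k = len(members)  # number of neighbor points in this voxel
        num_classes = len(probs[members[0]])
        # Average the class distributions of the points in this voxel.
        mean = [sum(probs[m][c] for m in members) / k for c in range(num_classes)]
        top1, top2 = sorted(mean, reverse=True)[:2]
        for m in members:
            scores[m] = top1 - top2
    return scores
```

Calling the function with successively larger `voxel_size` values would give one context score per layer, which is how the N downsampling layers could be emulated in this sketch.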

104. Calculating an uncertainty result of the point cloud by adopting the minimum margin score and the context uncertainty score of the point cloud;

in the embodiment of the invention, the point-level uncertainty score $U_x$ of the unlabeled point $x^u$ and its context uncertainty scores $U^{c}_{i}(x^u)$ are integrated to obtain the final uncertainty score $v^u$, expressed as:

$$v^u = U_x + \sum_{i=1}^{N} w^{i}\, U^{c}_{i}(x^u)$$

wherein $v^u$ represents the uncertainty result of the point $x^u$; $U_x$ represents the minimum margin score of the point $x^u$; $U^{c}_{i}(x^u)$ represents the context uncertainty score of the point $x^u$ at the i-th downsampling; $x^u$ represents a candidate point that can be selected for labeling; $w^i$ is the hyperparameter (weight) of the i-th downsampling; and N is the number of downsampling layers. For example, when N = 3, $w^i \in \{0.1, 0.01, 0.001\}$, whose values from left to right are the values of $w^1$, $w^2$ and $w^3$ for the 1st, 2nd and 3rd downsampling.
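The combination of the point-level score with the per-layer context scores can be illustrated as follows. The additive form (point score plus a weighted sum of context scores) is an assumption drawn from the description of accumulating uncertainty over the downsampling layers; the default weights are the example values given for N = 3, and the function name is hypothetical.

```python
from typing import Sequence


def final_uncertainty(
    point_score: float,
    context_scores: Sequence[float],
    weights: Sequence[float] = (0.1, 0.01, 0.001),
) -> float:
    """Point-level score plus the weighted context scores of each layer."""
    if len(context_scores) != len(weights):
        raise ValueError("one weight is required per downsampling layer")
    return point_score + sum(w * u for w, u in zip(weights, context_scores))
```

With a point score of 0.2 and context scores of 0.5, 0.4 and 0.3, the sketch yields 0.2 + 0.05 + 0.004 + 0.0003 = 0.2543, which shows how the coarser layers contribute progressively less.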

105. Selecting a plurality of point clouds based on the uncertainty result of the point clouds, labeling the selected point clouds, adding the labeled point clouds to the marked data set, removing them from the unmarked data set, and returning to step 101 until the training of the point cloud semantic segmentation model is completed.

In the embodiment of the invention, after the hierarchical minimum margin uncertainty measure is applied, a ranking of the final uncertainty scores is obtained, and the first K points can be selected directly for labeling according to this ranking.
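Direct selection of the first K points from the ranking can be sketched with the standard library. The text does not state the ranking direction, so this sketch assumes by default that a smaller margin-based score means higher uncertainty; the `smallest_first` flag and the function name are hypothetical.

```python
import heapq
from typing import Dict, List


def select_top_k(scores: Dict[int, float], k: int, smallest_first: bool = True) -> List[int]:
    """Return the ids of the k most uncertain points, most uncertain first."""
    pick = heapq.nsmallest if smallest_first else heapq.nlargest
    # Iterating a dict yields its keys, so point ids are ranked by their score.
    return pick(k, scores, key=scores.__getitem__)
```

If the scores were defined so that larger values mean higher uncertainty, passing `smallest_first=False` would give the corresponding ranking.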

In a preferred embodiment of the present invention, considering that some of the top K points may come from the same local region, which would cause information redundancy, a feature distance suppression method is designed to ensure that the finally selected points are evenly distributed. The feature distance suppression method comprises: determining a distance suppression radius and a feature similarity threshold; judging whether other selected points exist within the distance suppression radius of a candidate point, and if no other selected point exists, taking the candidate point as a selected point; if other selected points exist, calculating the similarity between the candidate point and the other selected points, and if the similarity exceeds the feature similarity threshold, treating the candidate point as a redundant point and suppressing it, and if the similarity does not exceed the feature similarity threshold, taking the candidate point as a selected point.

Specifically, for each candidate point $x_i$ taken from the ranking, a distance suppression radius $r$ and a feature similarity threshold $\tau$ are given. First, it is determined whether other selected points exist within the distance suppression radius of the point. $D^i$ is a set, initialized as the empty set, that stores all already-selected points $x_j$ lying within the suppression radius of $x_i$:

$$D^i = \{\, x_j \mid d_{ij} < r \,\}$$

wherein $d_{ij}$ represents the distance from $x_i$ to $x_j$.

If $D^i$ is still the empty set, there is no other selected point within the distance suppression radius of $x_i$; $x_i$ then represents this local range and is labeled in the next step. If $D^i$ is not empty, the similarity between $x_i$ and the points $x_j$ in $D^i$ must be judged; the similarity $\mathrm{Sim}(x_i, x_j)$ is computed from the features $f_i$ and $f_j$,

wherein $f_i$ and $f_j$ respectively represent the features of $x_i$ and $x_j$. If $\mathrm{Sim}(x_i, x_j) > \tau$, $x_i$ is considered a redundant point and is removed from the set of points that need to be labeled.
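The feature distance suppression described above can be sketched as a greedy pass over the ranked candidates. Several details are assumptions rather than statements of the text: cosine similarity is used as a stand-in for the similarity function, Euclidean distance is used for the suppression radius, a candidate is suppressed if it is too similar to any nearby selected point, and a `budget` caps the number of selected points. All function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def feature_distance_suppression(
    ranked: List[int],
    coords: Sequence[Sequence[float]],
    feats: Sequence[Sequence[float]],
    radius: float,
    tau: float,
    budget: int,
) -> List[int]:
    """Greedily keep ranked candidates that are not redundant with kept ones."""
    selected: List[int] = []
    for i in ranked:
        if len(selected) >= budget:
            break
        # Already-selected points within the suppression radius of candidate i.
        near = [j for j in selected if math.dist(coords[i], coords[j]) < radius]
        # Candidate i is redundant if it is too similar to any nearby selected point.
        if any(cosine_similarity(feats[i], feats[j]) > tau for j in near):
            continue
        selected.append(i)
    return selected
```

Because candidates are visited in ranking order, the most uncertain point of each local neighborhood is kept and later, similar neighbors are dropped, which matches the stated goal of spreading the label budget evenly.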

Finally, after feature distance suppression, the points that need to be labeled are output. These points can then be labeled manually or by other means; the newly labeled points are put back into the data set, and the marked data set portion and the unmarked data set portion are updated accordingly. The next iteration may then be performed, until a preset termination condition is met.

FIG. 4 is a schematic diagram of a semi-supervised point cloud semantic segmentation effect according to an embodiment of the present invention. As shown in FIG. 4, the figure compares the segmentation results of the method with those of a fully supervised method on the S3DIS Area 5 test set. The method of the present invention achieves results comparable to, or even better than, the fully supervised baseline (MinkowskiNet) when only 0.43% of the data is labeled.

Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be implemented by a program to instruct related hardware, the program may be stored in a computer readable storage medium, and the storage medium may include: ROM, RAM, magnetic or optical disks, etc.

Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. The semi-supervised point cloud semantic segmentation method based on selective active learning is characterized by comprising the steps of acquiring target point cloud data; inputting the target point cloud data into a trained point cloud semantic segmentation neural network, and outputting a segmentation result of the target point cloud data;

2. The semi-supervised point cloud semantic segmentation method based on selective active learning of claim 1, wherein the point cloud semantic segmentation neural network comprises two point cloud semantic segmentation networks constructed using the Minkowski space, which are denoted a teacher model and a student model respectively; point cloud data of the point cloud data set is input into the teacher model, and augmented data of the point cloud data set is input into the student model; the teacher model and the student model are trained with a consistency constraint between them, using a cross-entropy loss; the student network is optimized by gradient descent; and the parameters of the student model are transferred to the teacher model by an exponential moving average.

3. The semi-supervised point cloud semantic segmentation method based on selective active learning of claim 2, wherein the formula for transferring the parameters of the student model to the teacher model by the exponential moving average is expressed as:

$$\theta^{t}_{j} = \alpha\,\theta^{t}_{j-1} + (1-\alpha)\,\theta^{s}_{j}$$

wherein $\theta^{t}_{j}$ denotes the parameters of the teacher model in the j-th training iteration, $\theta^{t}_{j-1}$ denotes the parameters of the teacher model in the (j-1)-th training iteration, and $\theta^{s}_{j}$ denotes the parameters of the student model in the j-th training iteration; $\alpha$ is a hyperparameter that determines the parameter transfer rate.

4. The semi-supervised point cloud semantic segmentation method based on selective active learning of claim 1, wherein the minimum margin score of the point cloud is expressed as:

$$U_x = h\left(x^u;\, p_1(x^u)\right) - h\left(x^u;\, p_2(x^u)\right)$$

5. The semi-supervised point cloud semantic segmentation method based on selective active learning of claim 1, wherein the point cloud context uncertainty score is expressed as:

$$U^{c}_{i}(x^u) = h\left(x^u;\, \bar{p}_{i,1}(x^u)\right) - h\left(x^u;\, \bar{p}_{i,2}(x^u)\right)$$

wherein $U^{c}_{i}(x^u)$ represents the context uncertainty score of the point cloud $x^u$ at the i-th downsampling, and $x^u$ represents a candidate point cloud that can be selected for labeling; $\bar{p}_{i,1}(x^u)$ represents the highest-scoring local context information at the i-th downsampling predicted by the point cloud semantic segmentation model, and $\bar{p}_{i,2}(x^u)$ represents the second-highest-scoring local context information at the i-th downsampling predicted by the point cloud semantic segmentation model; $h(\cdot)$ represents the segmentation predictor of the point cloud semantic segmentation model.

6. The semi-supervised point cloud semantic segmentation method based on selective active learning of claim 1, wherein the uncertainty result of the point cloud is expressed as:

$$v^u = U_x + \sum_{i=1}^{N} w^{i}\, U^{c}_{i}(x^u)$$

wherein $v^u$ represents the uncertainty result of the point cloud $x^u$; $U_x$ represents the minimum margin score of the point cloud $x^u$; $U^{c}_{i}(x^u)$ represents the context uncertainty score of the point cloud $x^u$ at the i-th downsampling; $x^u$ represents a candidate point cloud that can be selected for labeling; $w^i$ is the hyperparameter of the i-th downsampling; and N is the number of downsampling layers.

7. The semi-supervised point cloud semantic segmentation method based on selective active learning of claim 1, wherein in step 105, the selecting a plurality of point clouds based on the uncertainty result of the point clouds includes selecting a plurality of top-ranked point clouds according to the ranking of the uncertainty result of the point clouds; or selecting a plurality of point clouds with different similarity by adopting a characteristic distance suppression mode.

8. The semi-supervised point cloud semantic segmentation method based on selective active learning of claim 7, wherein the feature distance suppression mode comprises determining a distance suppression radius and a feature similarity threshold; judging whether other selected point clouds exist in the distance inhibition radius of the candidate point cloud, and if the other selected point clouds do not exist, taking the candidate point cloud as the selected point cloud; and if other selected point clouds exist, calculating the similarity distance between the candidate point clouds and the other selected point clouds, if the similarity distance exceeds a characteristic similarity threshold, taking the candidate point clouds as redundant points, and inhibiting the redundant points, and if the similarity distance does not exceed the characteristic similarity threshold, taking the candidate point clouds as the selected point clouds.