CN117523549B - Three-dimensional point cloud object identification method based on deep and wide knowledge distillation - Google Patents

Three-dimensional point cloud object identification method based on deep and wide knowledge distillation Download PDF

Info

Publication number
CN117523549B
CN117523549B (application CN202410009182.2A)
Authority
CN
China
Prior art keywords
model
point cloud
deep
output
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410009182.2A
Other languages
Chinese (zh)
Other versions
CN117523549A (en
Inventor
田逸非
陈敏
李朋阳
尹捷明
吕梦婕
周剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202410009182.2A priority Critical patent/CN117523549B/en
Publication of CN117523549A publication Critical patent/CN117523549A/en
Application granted granted Critical
Publication of CN117523549B publication Critical patent/CN117523549B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/64Three-dimensional objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/096Transfer learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention belongs to the field of three-dimensional point cloud object identification and discloses a three-dimensional point cloud object identification method based on deep and wide knowledge distillation. First, a deep learning model is selected as the teacher model, and raw point cloud data are input into the teacher model for pre-training and testing, obtaining feature nodes, enhancement nodes and prediction results from training and testing respectively. Next, the data obtained from training are knowledge-distilled and used as training samples for a stacked width learning model, yielding a width learning classifier. Finally, the sample data obtained by knowledge-distilling the teacher model's test outputs are input into the trained width learning classifier to obtain class labels. By transferring the advantages of the teacher model to the stacked width learning model, the knowledge-distilled student model attains better classification ability, while the stacked width model greatly reduces the computation of the model and improves classification speed.

Description

Three-dimensional point cloud object identification method based on deep and wide knowledge distillation
Technical Field
The invention belongs to the field of three-dimensional point cloud object identification, and particularly relates to a three-dimensional point cloud object identification method based on deep and wide knowledge distillation.
Background
A point cloud is a three-dimensional dataset made up of a large number of discrete points that can represent objects and scenes in three-dimensional space. Each point contains spatial coordinate information and sometimes other information such as color, normal vector, etc. Point clouds are commonly used to capture geometric and visual information of objects or scenes in the real world, while the identification of three-dimensional point cloud objects is a classical task in the field of computer vision, and has wide application in the fields of automatic driving, industrial part production, and the like.
One of the most challenging tasks in three-dimensional point cloud object recognition is feature extraction and structural information analysis, particularly when dealing with unique properties of the point cloud, such as disorder, irregularity, and the like. To overcome these challenges, powerful methods are needed to efficiently handle the complexity inherent in point cloud data.
Inspired by the success of deep learning models in image processing, many researchers have used convolutional neural networks to identify three-dimensional objects from point clouds. However, because of the special nature of the point cloud, it cannot be fed directly into a traditional deep convolutional network and must first be preprocessed. Some network models translate the point cloud into a 2D/3D regular grid, e.g., multi-view images or voxels, in order to apply existing deep learning algorithms directly to three-dimensional object identification. However, these preprocessing steps may cause a loss of the original geometric details, so grid-based algorithms are only suitable for objects with distinct distinguishing features. To avoid losing features during rasterization, some researchers use multi-layer perceptrons (MLPs) to simulate convolution kernels for point cloud feature extraction; by stacking weight-sharing MLPs, high-dimensional point-wise features and their neighborhood information can be obtained. In addition, to handle the unstructured nature of point clouds, point convolution kernels that directly extract point cloud features have also been studied; although these kernels have stronger feature extraction capability than MLPs, their parameters require considerable time and memory to train and fine-tune. Moreover, most of these neural networks focus on operator design and ignore the influence of nonlinear classification at the fully connected layer on object recognition performance.
While a deep network structure gives a network strong learning ability, such structures suffer from a large number of hyper-parameters and the corresponding propagation processes, which make training time-consuming. The width learning system (BLS) is a shallow neural network structure that reduces layer-to-layer coupling compared with a deep structure, making the network more compact. The width learning system generates feature nodes and enhancement nodes from the input; the feature nodes and enhancement nodes are connected to the output layer, and their weights are obtained by computing a pseudo-inverse. In addition, the width learning system supports incremental learning: when feature nodes, enhancement nodes or input data are newly added, the network need not be retrained from scratch, and only the weights of the newly added part are calculated. Compared with a deep network, the width learning system is therefore fast and efficient.
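The BLS construction described above (random feature and enhancement nodes, output weights obtained in closed form via a ridge-regression pseudo-inverse) can be sketched as follows. The node counts, tanh activations and regularization value are illustrative assumptions, not taken from the invention:

```python
# Minimal one-module width learning system (BLS) sketch.
import numpy as np

rng = np.random.default_rng(0)

def bls_fit(x, y, n_feat=20, n_enh=40, lam=1e-3):
    """Fit a single BLS module: random feature nodes Z, random enhancement
    nodes H, then solve the output weights W by ridge regression."""
    We = rng.standard_normal((x.shape[1], n_feat))   # input -> feature nodes
    Z = np.tanh(x @ We)                              # feature nodes
    Wd = rng.standard_normal((n_feat, n_enh))        # feature -> enhancement nodes
    H = np.tanh(Z @ Wd)                              # enhancement nodes
    A = np.hstack([Z, H])                            # concatenated state matrix
    # Ridge-regression pseudo-inverse: W = (A^T A + lam*I)^-1 A^T y
    W = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)
    return We, Wd, W

def bls_predict(x, We, Wd, W):
    Z = np.tanh(x @ We)
    H = np.tanh(Z @ Wd)
    return np.hstack([Z, H]) @ W

x = rng.standard_normal((100, 8))
y = (x[:, :1] > 0).astype(float)                     # toy binary target
We, Wd, W = bls_fit(x, y)
pred = bls_predict(x, We, Wd, W)
print(pred.shape)  # (100, 1)
```

Because the readout weights come from a single linear solve rather than backpropagation, adding nodes or data only requires updating this solve, which is the incremental property described above.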
However, the learning ability of the width learning system as a shallow neural network is relatively limited, and the accuracy of the width learning system cannot be well ensured when facing complex tasks.
Disclosure of Invention
In order to solve the technical problems, the invention provides a three-dimensional point cloud object identification method based on deep and wide knowledge distillation.
In order to achieve the above purpose, the invention is realized by the following technical scheme:
the invention relates to a three-dimensional point cloud object identification method based on deep and wide knowledge distillation, which specifically comprises the following steps:
step 1, selecting a deep learning network as a teacher model, training the teacher model based on a training point cloud data set, extracting distinguishing features of original point cloud data by using the teacher model, and obtaining a soft label generated by the model;
step 2, constructing a stackKDBLS model with an n-layer width learning network, taking the stackKDBLS model as the student model, and concatenating the features acquired in step 1 as input to the student model;
step 3, training a stackKDBLS model by utilizing the soft tag information and the real tag of the point cloud data acquired by the deep learning network;
step 4, if the final training result in step 3 exceeds a preset threshold, further stacking a width learning network on the basis of the original stackKDBLS model.
The invention further improves that: the distinguishing features to be obtained in step 1 are global features and local features. The global features are obtained by performing a series of spatial transformation, feature extraction and pooling operations on the input point cloud data; they represent the global properties of the point cloud, summarizing its geometric structure and characteristics. The local features are obtained by first determining the local neighborhood range of each point, then normalizing the coordinates of each selected local neighborhood to reduce the effects of transformations such as rotation and translation, and finally applying convolution or pooling. Unlike the global features, they capture the more specific and abstract feature information learned in the intermediate layers of the deep neural network, which helps the network understand the structure and features of the point cloud data more fully.
The invention further improves that: the student model constructed in step 2 comprises n width learning system modules stacked through residual connection; the output of the (i−1)-th width learning system module is used as the input of the i-th width learning system module, the desired output of the i-th width learning system module is the residual of the 1st, 2nd, …, (i−1)-th width learning system modules, i ≤ n, and the final output is the sum of the outputs of the n width learning system modules; each width learning system module comprises feature nodes, feature node weights, enhancement nodes and enhancement node weights.
Assuming that the input data is x and the output data is y, the output of the i-th width learning system module is u_i:

u_i = Z_i·W_z^i + H_i·W_h^i,  with Z_i = P(v_i·W_e^i) and H_i = Q(Z_i·W_δ^i)

wherein W_z^i and W_h^i are the connection weights of the feature nodes and the enhancement nodes to the output layer, W_e^i is the randomly generated weight between the input and the feature nodes, and W_δ^i is the randomly generated connection weight between the feature nodes and the enhancement nodes; Q_p(·) is the composite mapping of P(·) and Q(·), wherein P(·) is the generalized function of the feature nodes and Q(·) is the generalized function of the enhancement nodes; v_i = g(u_{i−1}) with v_1 = x, and g(·) is a mapping function. The final output of the system is:

y = Σ_{i=1}^{n} u_i
W_z^i and W_h^i are obtained by solving an optimization problem:

min_{W_i} ||A_i·W_i − y_i||² + λ||W_i||²

wherein y_i is the desired output for the training data v_i in the i-th width learning system module, and λ is a balance coefficient.

The optimization problem is solved by a ridge regression approximation:

W_i = (A_i^T·A_i + λI)^(−1)·A_i^T·y_i

wherein A_i = [Z_i | H_i]; W_i denotes the connection weights of the feature nodes and enhancement nodes to the output layer, and I is an identity matrix.
The invention further improves that: typically, in a classification task, the object labels are represented by one-hot coding, where each class is represented by a vector in which exactly one element is 1 and the remaining elements are 0. The output of the neural network is a set of scores called logits, which have not been normalized by the softmax function and therefore may contain more information, particularly the similarities and correlations between the target class and the other classes. Thus, knowledge distillation is adopted in step 3, and the teacher model θ is used to assist in training the stackKDBLS model.
The logits and predicted output of the teacher model θ are:

Z_t = θ(X)

Y_t = softmax(θ(X))

wherein X is the input data, Z_t is the logits of the teacher model, and Y_t is the predicted output of the teacher model θ.
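The relation between logits and the softened teacher prediction can be illustrated with a temperature-scaled softmax; the toy logits and the temperature value below are hypothetical, not from the patent:

```python
# Temperature-scaled softmax over teacher logits Z_t, giving soft labels Y_t.
import numpy as np

def softmax(z, t=1.0):
    """Row-wise softmax with distillation temperature t."""
    z = z / t
    z = z - z.max(axis=1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

logits = np.array([[4.0, 1.0, 0.5],
                   [0.2, 3.0, 0.1]])       # Z_t: raw teacher scores (toy values)
hard = softmax(logits)                      # Y_t at t = 1: nearly one-hot
soft = softmax(logits, t=4.0)               # higher t: smoother, keeps class similarities
print(hard.round(3))
print(soft.round(3))
```

Raising the temperature flattens the distribution, which is why the softened output carries more inter-class similarity information than the t = 1 prediction.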
If the teacher model θ is not considered, then W_k = (A_k)^+·Y_k, wherein (A_k)^+ is the pseudo-inverse matrix. When the teacher model θ is used to assist in training the stackKDBLS model, the calculation of the target output Y_k changes to:

Y_k = (1 − 1/t_k)·Y_GT + (1/t_k)·(Y_t + Σ_{l=1}^{k−1} Y_l)/k

wherein k is the number of stacked stackKDBLS modules, t_k is the distillation temperature, and Y_l is the predicted output of the l-th stackKDBLS module. This calculation takes into account both the output of the teacher model and the outputs of the previous stackKDBLS modules, which helps convey more information for training the stackKDBLS model.

If k = 1, then Y_1 = (1 − 1/t)·Y_GT + (1/t)·Y_t, and the summary of the outputs of the first k stackKDBLS modules need not be considered.
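A minimal sketch of the distilled target computation, assuming the averaged form that reduces to (1 − 1/t)·Y_GT + (1/t)·Y_t when k = 1; the function name and toy values are illustrative:

```python
# Distilled target: mix ground truth with teacher output and previous module outputs.
import numpy as np

def distilled_target(y_gt, y_teacher, prev_outputs, t=2.0):
    """Hypothetical form of Y_k: blend one-hot labels Y_GT with the teacher
    prediction Y_t and the outputs of previously stacked modules."""
    k = len(prev_outputs) + 1
    summary = (y_teacher + sum(prev_outputs)) / k if prev_outputs else y_teacher
    return (1 - 1 / t) * y_gt + (1 / t) * summary

y_gt = np.array([[1.0, 0.0], [0.0, 1.0]])   # ground-truth one-hot labels
y_t = np.array([[0.8, 0.2], [0.3, 0.7]])    # teacher soft predictions
y1 = distilled_target(y_gt, y_t, [], t=2.0)     # k = 1: only teacher is mixed in
y2 = distilled_target(y_gt, y_t, [y1], t=2.0)   # k = 2: averages in module 1's output
print(y1)  # [[0.9  0.1 ] [0.15 0.85]]
```

Each target stays a valid probability distribution because it is a convex combination of distributions.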
The invention further improves that: in step 4, to determine whether to stack more width learning networks into the stackKDBLS model, the KL divergence is used to measure the difference between the predicted output Y_k and the target output. The KL divergence is calculated as:

L_k = D_KL(Y_k || Y_GT)

If L_k is above a predetermined threshold ε, a BLS network is added to improve the performance of the model.
The beneficial effects of the invention are as follows:
1. the invention allows knowledge distillation between different types of models, namely, knowledge is transferred from a complex deep neural network to a lightweight width learning network, so that knowledge sharing and migration between models are promoted;
2. the invention can effectively utilize the width learning network as the student model to learn more knowledge from the teacher model more quickly and directly;
3. the stackKDBLS model improves the overall performance of three-dimensional shape recognition on raw point clouds through the knowledge transfer framework, and can achieve higher classification accuracy than the original deep learning network with less time and resource expenditure.
Drawings
FIG. 1 is a flow chart of a three-dimensional point cloud object identification method based on deep and wide knowledge distillation.
Fig. 2 is a schematic diagram of the deep and wide knowledge distillation model, namely stackKDBLS.
Detailed Description
Embodiments of the invention are disclosed in the drawings, and for purposes of explanation, numerous practical details are set forth in the following description. However, it should be understood that these practical details are not to be taken as limiting the invention. That is, in some embodiments of the invention, these practical details are unnecessary.
As shown in fig. 1, the invention relates to a three-dimensional point cloud object identification method based on deep and wide knowledge distillation, which specifically comprises the following steps:
step 1, selecting a deep learning network as a teacher model, training the teacher model based on a training point cloud data set, extracting distinguishing features of original point cloud data by using the teacher model, and obtaining a soft label generated by the model.
The specific operation is as follows:
(1) Selecting a deep learning network model, and taking original point cloud data as the input of the model;
(2) Performing a series of operations such as spatial transformation, feature extraction and pooling on the original point cloud data through the deep network model to capture the global and local features of the original point cloud;
(3) And obtaining the soft tag information by using the trained deep learning network.
Step 2, constructing a stackKDBLS model with a three-layer width learning network, taking the stackKDBLS model as the student model, and concatenating the features acquired in step 1 as input to the student model. The specific operations are as follows:

Step 21, for the first width learning system module, randomly initialize the weight matrices W_e^1 and W_δ^1, and use x, W_e^1 and W_δ^1 to compute the feature nodes and enhancement nodes Z_1, H_1 by the formulas:

Z_1 = P(x·W_e^1),  H_1 = Q(Z_1·W_δ^1)

Step 22, compute the weights W_z^1, W_h^1 between the input data x and the desired output y, and then obtain the predicted output u_1 by the formula:

u_1 = Z_1·W_z^1 + H_1·W_h^1

Step 23, stack a new width learning module on the basis of the first width learning system module. The input of the i-th (i = 2, 3) width learning system module is v_i = g(u_{i−1}), i.e., a mapping of the output of the previous width learning system module, and its desired output is the residual:

y_i = y − Σ_{l=1}^{i−1} u_l

Likewise, randomly initialize the weight matrices W_e^i and W_δ^i, and use v_i, W_e^i and W_δ^i to compute the feature nodes and enhancement nodes Z_i, H_i by the formulas:

Z_i = P(v_i·W_e^i),  H_i = Q(Z_i·W_δ^i)

then compute the weights W_z^i, W_h^i between the input v_i and the desired output y_i.

Step 24, thereby obtain the predicted output by the formula:

u_i = Z_i·W_z^i + H_i·W_h^i

Step 25, repeat step 23 until the number of stacked width learning system modules equals n; the final predicted output is:

y = Σ_{i=1}^{n} u_i
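Steps 21-25 can be sketched as the following residual stacking loop; the module sizes, the tanh mapping used for g(·), and the toy data are assumptions for illustration:

```python
# Residual stacking of BLS modules: each module fits the remaining residual,
# and the final prediction is the sum of all module outputs.
import numpy as np

rng = np.random.default_rng(1)

def fit_module(v, target, n_feat=16, n_enh=32, lam=1e-3):
    """One BLS module: random feature/enhancement nodes, ridge-regression readout."""
    We = rng.standard_normal((v.shape[1], n_feat))   # input -> feature nodes
    Wd = rng.standard_normal((n_feat, n_enh))        # feature -> enhancement nodes
    Z = np.tanh(v @ We)
    H = np.tanh(Z @ Wd)
    A = np.hstack([Z, H])
    W = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ target)
    return A @ W                                     # module output u_i

x = rng.standard_normal((64, 6))                     # toy input features
y = rng.standard_normal((64, 3))                     # toy targets

v, total = x, np.zeros_like(y)
for i in range(3):                                   # n = 3 stacked modules
    u = fit_module(v, y - total)                     # module i fits the running residual
    total = total + u                                # final output: sum of module outputs
    v = np.tanh(u)                                   # v_{i+1} = g(u_i) feeds the next module

print(total.shape)  # (64, 3)
```

Fitting each module to the residual of the accumulated prediction is what makes the sum of module outputs a progressively refined estimate.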
and step 3, training a stackKDBLS model by utilizing the soft tag information and the real tag of the point cloud data acquired by the deep learning network. The specific operation is as follows:
the knowledge distillation mode is adopted: the deep learning network selected in step 1 is used as the teacher model θ to assist in training the stackKDBLS model, and the logits and predicted output of the teacher model θ are:

Z_t = θ(X)

Y_t = softmax(θ(X))

When the teacher model θ is employed to assist in training the stackKDBLS model, the calculation of the desired output Y_k changes to:

Y_k = (1 − 1/t_k)·Y_GT + (1/t_k)·(Y_t + Σ_{l=1}^{k−1} Y_l)/k

wherein t_k is the distillation temperature;
step 4, if the final training result in step 3 exceeds a preset threshold, further stacking a width learning network on the basis of the original stackKDBLS model. The specific operation is as follows:

to determine whether to stack more width learning systems into the model, the KL divergence is used to measure the difference between the predicted output Y_k and the target output. The KL divergence is calculated as:

L_k = D_KL(Y_k || Y_GT)

If L_k is above the predetermined threshold ε, an additional width learning system is added to improve the performance of the model.
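The step-4 stopping rule can be sketched as follows; the clipping constant guards against the zeros of one-hot targets, and the threshold value is illustrative:

```python
# KL-divergence stopping rule: stack another BLS module while L_k exceeds eps.
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Mean row-wise KL divergence D_KL(p || q); inputs are clipped so that
    one-hot targets with exact zeros do not produce infinities."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.mean(np.sum(p * np.log(p / q), axis=1)))

y_pred = np.array([[0.7, 0.3], [0.4, 0.6]])   # predicted output Y_k (toy values)
y_gt = np.array([[1.0, 0.0], [0.0, 1.0]])     # target output Y_GT (one-hot)
L_k = kl_divergence(y_pred, y_gt)
threshold = 0.1                               # illustrative epsilon
print("add another BLS module:", L_k > threshold)
```

A larger divergence means the current stack still disagrees with the targets, triggering one more width learning module.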
According to the invention, the advantages of the teacher model are transferred to the stacking width learning model, so that better classification capability can be obtained by using the student model after knowledge distillation, the calculation amount of the model is greatly reduced by using the stacking width model, and the classification speed is improved.
The foregoing description is only illustrative of the invention and is not to be construed as limiting the invention. Various modifications and variations of the present invention will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, or the like, which is within the spirit and principles of the present invention, should be included in the scope of the claims of the present invention.

Claims (6)

1. A three-dimensional point cloud object identification method based on deep and wide knowledge distillation is characterized in that: the three-dimensional point cloud object identification method comprises the following steps:
step 1, selecting a deep learning network model as a teacher model, training the teacher model based on a training point cloud data set, extracting distinguishing characteristics of original point cloud data by using the teacher model, and acquiring a soft label generated by the teacher model;
step 2, constructing a stackKDBLS model with an n-layer width learning network, taking the stackKDBLS model as the student model, and concatenating the distinguishing features acquired in step 1 as input to the student model;
step 3, training the stackKDBLS model constructed in the step 2 by utilizing the soft tag information and the real tag of the point cloud data acquired by the deep learning network in the step 1;
step 4, if the final training result in step 3 exceeds a predetermined threshold, stacking a width learning network based on the original stackKDBLS model, wherein:
the stackKDBLS model constructed in step 2, namely the student model, comprises n width learning system modules stacked through residual connection, wherein the output of the (i−1)-th width learning system module is used as the input of the i-th width learning system module, the desired output of the i-th width learning system module is the residual of the 1st, 2nd, …, (i−1)-th width learning system modules, i ≤ n, the final output of the model is the sum of the outputs of the n width learning system modules, and each width learning system module comprises feature nodes, feature node weights, enhancement nodes and enhancement node weights,
the step 2 specifically includes:
assuming that the input data is x and the output data is y, the output of the ith width learning system module is u i The method comprises the following steps:
wherein,and->For the connection weight of the feature node to the output layer, < ->For the weight between the randomly generated input and the feature node, < ->The connection weight between the randomly generated characteristic node and the enhancement node is used; q (Q) p (. Cndot.) is a composite mapping of P (-) and Q (-) where P (-) is the generalized function of the feature node and Q (-) is the generalized function of the enhancement node, vi=g(u i -1), g (·) is a mapping function, the final output of the system being:
2. the three-dimensional point cloud object identification method based on deep and wide knowledge distillation according to claim 1, wherein the method comprises the following steps: the step 1 specifically comprises the following steps:
step 1.1, selecting a deep learning network model, and taking original point cloud data as input of the deep learning network model;
step 1.2, performing space transformation, feature extraction and pooling operation on original point cloud data through a depth network model to capture global features and local features of the original point cloud data;
and 1.3, acquiring soft tag information of the point cloud data by using the trained deep learning network.
3. The three-dimensional point cloud object identification method based on deep and wide knowledge distillation according to claim 2, wherein the method comprises the following steps: the distinguishing features in the step 1 are global features and local features respectively, the global features are obtained by performing spatial transformation, feature extraction and pooling operation on input original point cloud data, the global features represent global properties of point clouds, the geometric structures and features of the point clouds are summarized and extracted, the local features are obtained by determining a local neighborhood range of each point, normalizing coordinates of each selected local neighborhood, and finally obtaining the local features through convolution or pooling, so that more local and abstract feature information learned in a deep neural network middle layer is captured.
4. The three-dimensional point cloud object identification method based on deep and wide knowledge distillation according to claim 1, wherein: the W_z^i and W_h^i are obtained by solving an optimization problem:

min_{W_i} ||A_i·W_i − y_i||² + λ||W_i||²

wherein y_i is the desired output for the training data v_i in the i-th width learning system module, and λ is the balance coefficient;

the optimization problem is solved by a ridge regression approximation:

W_i = (A_i^T·A_i + λI)^(−1)·A_i^T·y_i

wherein A_i = [Z_i | H_i], W_i denotes the connection weights of the feature nodes and the enhancement nodes to the output layer, and I is an identity matrix.
5. The three-dimensional point cloud object identification method based on deep and wide knowledge distillation according to claim 1, wherein the method comprises the following steps: the step 3 specifically includes:
step 3.1, adopting a knowledge distillation mode, using the teacher model θ to assist in training the stackKDBLS model, wherein the logits and the predicted output of the teacher model θ are:

Z_t = θ(X)

Y_t = softmax(θ(X))

wherein X is the input data, Z_t is the logits of the teacher model, and Y_t is the predicted output of the teacher model θ;

step 3.2, if the teacher model θ is not considered, then W_k = (A_k)^+·Y_k, wherein (A_k)^+ is a pseudo-inverse matrix; when the teacher model θ is used to assist in training the stackKDBLS model, the calculation of the target output Y_k changes to:

Y_k = (1 − 1/t_k)·Y_GT + (1/t_k)·(Y_t + Σ_{l=1}^{k−1} Y_l)/k

wherein k is the number of stacked stackKDBLS modules, t_k is the distillation temperature, and Y_l is the predicted output of the l-th width learning system module;

if k = 1, Y_1 = (1 − 1/t)·Y_GT + (1/t)·Y_t, and the summary of the outputs of the first k stackKDBLS modules need not be considered.
6. The three-dimensional point cloud object identification method based on deep and wide knowledge distillation according to claim 1, wherein: in step 4, to determine whether to stack more width learning networks into the stackKDBLS model, the KL divergence is used to measure the difference between the predicted output Y_k and the target output, wherein the KL divergence is calculated as:

L_k = D_KL(Y_k || Y_GT)

if L_k is above a predetermined threshold ε, a width learning network is added to improve the performance of the model.
CN202410009182.2A 2024-01-04 2024-01-04 Three-dimensional point cloud object identification method based on deep and wide knowledge distillation Active CN117523549B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410009182.2A CN117523549B (en) 2024-01-04 2024-01-04 Three-dimensional point cloud object identification method based on deep and wide knowledge distillation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410009182.2A CN117523549B (en) 2024-01-04 2024-01-04 Three-dimensional point cloud object identification method based on deep and wide knowledge distillation

Publications (2)

Publication Number Publication Date
CN117523549A CN117523549A (en) 2024-02-06
CN117523549B true CN117523549B (en) 2024-03-29

Family

ID=89764844

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410009182.2A Active CN117523549B (en) 2024-01-04 2024-01-04 Three-dimensional point cloud object identification method based on deep and wide knowledge distillation

Country Status (1)

Country Link
CN (1) CN117523549B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115035595A (en) * 2022-06-02 2022-09-09 西北大学 3D model compression method based on spatio-temporal information transfer knowledge distillation technology
CN115690708A (en) * 2022-10-21 2023-02-03 苏州轻棹科技有限公司 Method and device for training three-dimensional target detection model based on cross-modal knowledge distillation
CN116189172A (en) * 2023-04-20 2023-05-30 福建环宇通信息科技股份公司 3D target detection method, device, storage medium and chip

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11587291B2 (en) * 2021-06-30 2023-02-21 Tencent America LLC Systems and methods of contrastive point completion with fine-to-coarse refinement


Also Published As

Publication number Publication date
CN117523549A (en) 2024-02-06

Similar Documents

Publication Publication Date Title
CN111489358B (en) Three-dimensional point cloud semantic segmentation method based on deep learning
CN110414432B (en) Training method of object recognition model, object recognition method and corresponding device
CN108491880B (en) Object classification and pose estimation method based on neural network
TWI742382B (en) Neural network system for vehicle parts recognition executed by computer, method for vehicle part recognition through neural network system, device and computing equipment for vehicle part recognition
CN111583263B (en) Point cloud segmentation method based on joint dynamic graph convolution
CN109948475B (en) Human body action recognition method based on skeleton features and deep learning
CN111507378A (en) Method and apparatus for training image processing model
US20220215227A1 (en) Neural Architecture Search Method, Image Processing Method And Apparatus, And Storage Medium
CN110222718B (en) Image processing method and device
CN113705769A (en) Neural network training method and device
CN111476806B (en) Image processing method, image processing device, computer equipment and storage medium
CN111008618B (en) Self-attention deep learning end-to-end pedestrian re-identification method
CN113780292A (en) Semantic segmentation network model uncertainty quantification method based on evidence reasoning
CN110281949B (en) Unified hierarchical decision-making method for automatic driving
CN114359631A (en) Target classification and positioning method based on coding-decoding weak supervision network model
CN116704431A (en) On-line monitoring system and method for water pollution
CN113870160A (en) Point cloud data processing method based on converter neural network
CN116258990A (en) Cross-modal affinity-based small sample reference video target segmentation method
CN115797629A (en) Example segmentation method based on detection enhancement and multi-stage bounding box feature refinement
CN110111365B (en) Training method and device based on deep learning and target tracking method and device
CN114973031A (en) Visible light-thermal infrared image target detection method under view angle of unmanned aerial vehicle
CN114626476A (en) Bird fine-grained image recognition method and device based on Transformer and component feature fusion
CN114445816A (en) Pollen classification method based on two-dimensional image and three-dimensional point cloud
CN113128564B (en) Typical target detection method and system based on deep learning under complex background
CN114492634A (en) Fine-grained equipment image classification and identification method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant