CN113408651B - Unsupervised three-dimensional object classification method based on local discriminant enhancement - Google Patents

Unsupervised three-dimensional object classification method based on local discriminant enhancement

Info

Publication number
CN113408651B
CN113408651B (application number CN202110784487.7A)
Authority
CN
China
Prior art keywords
point cloud
features
local
discriminant
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110784487.7A
Other languages
Chinese (zh)
Other versions
CN113408651A (en)
Inventor
黄宇楠
雷蕴奇
王其聪
陈伶俐
蔡珊珊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN202110784487.7A priority Critical patent/CN113408651B/en
Publication of CN113408651A publication Critical patent/CN113408651A/en
Application granted granted Critical
Publication of CN113408651B publication Critical patent/CN113408651B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Abstract

An unsupervised three-dimensional object classification method based on local discriminant enhancement relates to computer vision technology. The method comprises the following steps: A. preparing a point cloud data set for three-dimensional object classification; B. applying data enhancement to the original point cloud samples and retaining both the original and the enhanced samples; C. comparing the original version of each point cloud sample with its data-enhanced version to extract a high-dimensional global feature for each sample; D. mining the correlation and discriminability of different local structures using the high-dimensional global features, thereby enhancing the discriminability of the samples' local features; E. fusing the enhanced local features with the high-dimensional global features to obtain discriminability-enhanced fusion features; F. training a linear support vector machine and performing unsupervised classification with the fusion features of the point cloud samples. The method effectively learns discriminative representation features of point cloud objects, thereby improving the performance of unsupervised three-dimensional object classification.

Description

Unsupervised three-dimensional object classification method based on local discriminant enhancement
Technical Field
The invention relates to computer vision technology, and in particular to an unsupervised three-dimensional object classification method based on local discriminant enhancement.
Background
Traditional unsupervised three-dimensional object classification methods use a neural network to extract high-dimensional global features of point cloud objects and classify the objects according to these global features, ignoring the importance of the objects' local structural features, so the learned features have weak discriminability. For example, Achlioptas et al. (Achlioptas, Panos, O. Diamanti, Ioannis Mitliagkas and L. Guibas. "Learning Representations and Generative Models for 3D Point Clouds." ICML (2018)) independently learn the feature of each point in a point cloud sample during feature extraction and merge the per-point features to obtain the object's high-dimensional global feature.
The success of convolutional neural networks demonstrates the importance of objects' local structural features, and some methods account for local features and combine them to improve the discriminability of the learned object representation. For example, Rao et al. (Rao, Yongming, Jiwen Lu and J. Zhou. "Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds." 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020): 5375-5384) divide each point cloud sample into subsets during feature extraction, treat each subset as a local structure, and extract the features of each local structure with a neural network. These methods, however, weigh the importance of the object's local structures equally and do not consider that different local structures contribute differently to the object representation. You et al. (You, Haoxuan, Y. Feng, R. Ji and Yue Gao. "PVNet: A Joint Convolutional Network of Point Cloud and Multi-View for 3D Shape Recognition." Proceedings of the 26th ACM International Conference on Multimedia (2018): n. pag.) do consider the importance of different local structures and use features extracted from multi-view data to mine correlations between different local structures of the raw point cloud data; however, labeled multi-view data paired with raw point cloud data are difficult to acquire, making the approach unsuitable for scenes with only unlabeled point cloud data of a single modality.
Disclosure of Invention
The invention provides an unsupervised three-dimensional object classification method based on local discriminant enhancement. A deep network framework is built to learn high-dimensional global features of point cloud samples and to enhance the discriminability of local features using those global features. First, random data enhancement is applied to each original point cloud sample, and the similarity between features extracted from the original and enhanced samples is constrained so as to learn the sample's high-dimensional global feature. The correlation between the high-dimensional global feature and the local features is then computed to enhance the discriminability of the local features; the most discriminative local features are fused with the high-dimensional global feature; and finally a linear support vector machine is trained on the fused features to perform unsupervised three-dimensional object classification.
The invention comprises the following steps:
A. preparing a point cloud data set for three-dimensional object classification;
B. carrying out data enhancement on the original point cloud sample, and retaining the original point cloud sample and the enhanced point cloud sample;
C. comparing the original version of each point cloud sample with the data-enhanced version, and extracting the high-dimensional global characteristics of each point cloud sample;
D. mining the correlation and discriminability of different local structures using the high-dimensional global features of the point cloud samples, thereby enhancing the discriminability of the local features of the point cloud samples;
E. fusing the enhanced local features of each point cloud sample with its high-dimensional global features to obtain discriminability-enhanced fusion features;
F. and training a linear support vector machine, and performing unsupervised classification by using fusion characteristics of the point cloud samples.
In step A, the preparing of the point cloud data set for three-dimensional object classification further comprises the following sub-steps:
A1. a common dataset for unsupervised three-dimensional point cloud object classification, ModelNet (Wu, Zhirong, Shuran Song, A. Khosla, et al. "3D ShapeNets: A Deep Representation for Volumetric Shapes." 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015): 1912-1920.), is used, including ModelNet40 and ModelNet10; ModelNet40 contains 12,311 manually designed CAD models in 40 categories, with 9,843 models in the training set and 2,468 in the test set; ModelNet10 contains 4,900 CAD models in 10 categories, with 3,991 models in the training set and 909 in the test set;
A2. the point cloud data are preprocessed by uniformly sampling 1,024 points from the surface of each point cloud sample, where each point contains xyz coordinate information.
In step B, the data enhancement on the original point cloud sample further includes the following sub-steps:
B1. after the point cloud data set is prepared, random data enhancement operations are applied to the point cloud samples; possible data enhancements include translation and scaling, rotation, jittering, and random removal of part of the points; both the original samples and the enhanced samples are retained.
In step C, extracting the high-dimensional global features of each point cloud sample further includes the following sub-steps:
C1. the input point cloud sample is an unordered point set of n points, P = {p_1, p_2, ..., p_n}, where each point contains three-dimensional coordinates; the input point cloud sample and its corresponding randomly enhanced point cloud sample are processed by two parameter-sharing encoders implemented with deep convolutional networks; feature extraction on the original point cloud yields the local features L = {l_1, l_2, ..., l_n} and the global feature g, where each l_i represents a local feature at a different abstraction level, and feature extraction on the enhanced point cloud yields the local features L̂ = {l̂_1, l̂_2, ..., l̂_n} and the global feature ĝ;
C2. the global feature g or ĝ extracted by one of the encoders undergoes feature transformation through a feature transformation network implemented with a multi-layer perceptron, yielding the transformed feature z or ẑ;
C3. in order for the network to effectively learn high-dimensional global features that are invariant to point cloud enhancement, i.e., learned global features that are unaffected by different data enhancements, a symmetric loss function constrains the similarity between the global feature extracted by one encoder and the transformed feature obtained from the global feature extracted by the other encoder; the loss function is:

Loss = -(1/2) · S(z, stopgrad(ĝ)) - (1/2) · S(ẑ, stopgrad(g))
where S is a similarity function that computes the similarity between two features, and stopgrad is the stop-gradient operation, which halts back-propagation of the encoder gradient; through network training, a high-dimensional global feature can be extracted for each point cloud sample, and this feature preserves information unique to each sample that is unaffected by data enhancement.
In step D, enhancing the discriminability of the local features of the point cloud sample further includes the following sub-steps:
D1. different local structures of a point cloud contribute differently to the object representation, and more discriminative local structures represent the object better; the point cloud's highly discriminative global feature is therefore used to mine the more discriminative features among the local structures; first, the correlation score between each local feature l_i and the global feature g is computed:

Score(l_i, g) = φ(θ(l_i, σ(g)))

where the operation σ copies the global feature n times and splices a copy onto the local feature of each point; θ is a relation function implemented with a multi-layer perceptron that infers the relationship between the local features and the global feature; and φ is a regularization function that normalizes the correlation scores into the range [0, 1]; the correlation score represents the correlation between each local feature and the global feature: the higher the score, the greater the correlation, indicating a more discriminative local feature;
D2. the correlation scores between the global feature and the local features are used to enhance the discriminability of each local feature l_i:

l'_i = l_i * (1 + Score(l_i, g))

so that local features that are already strongly discriminative are further enhanced, while weakly discriminative local features are relatively weakened.
In step E, fusing the enhanced local features of the point cloud sample with the high-dimensional global features to obtain the discriminability-enhanced fusion features further includes the following sub-steps:
E1. fusing the enhanced local features of the point cloud sample with the high-dimensional global feature yields the discriminability-enhanced fusion feature:

f = ρ(γ(l'_1), γ(l'_2), ..., γ(l'_n), γ(g))

where γ is the max pooling operation and ρ is the feature splicing operation; the most discriminative responses among the point cloud sample's local features l'_i and the global feature g are selected and spliced, so that the fused feature f has strong discriminability after fusion;
E2. a decoder implemented with a deep convolutional network reconstructs the point cloud sample from the fusion feature, and a similarity function constrains the similarity between the reconstructed point cloud sample and the original input point cloud sample, thereby optimizing the fusion feature.
In step F, training the linear support vector machine and performing unsupervised classification with the fusion features of the point cloud samples means that a linear support vector machine is trained with features extracted from the training set data, and the trained machine predicts the classes of the test set data, thereby implementing unsupervised three-dimensional object classification.
According to the method, the high-dimensional global feature is first learned by comparing the feature similarity of each original point cloud sample with its data-enhanced version; the correlation between the point cloud's local features and the high-dimensional global feature is then computed to enhance the discriminability of the local features. The object representation feature obtained by fusing the most discriminative local features with the high-dimensional global feature achieves good unsupervised classification performance.
Compared with the prior art, the invention has the following outstanding advantages:
the invention only uses the original point cloud data, and the previous unsupervised three-dimensional object classification method based on the point cloud considers the importance of different local structures equally. The high-dimensional global feature of the point cloud sample is learned first, and the feature contains unique information of each point cloud sample and has high discriminant. And then, the relevance between the high-dimensional global features and the local features is calculated to enhance the discriminant of the local features, and finally, the enhanced local features and the high-dimensional global features are fused to obtain fusion features with stronger discriminant, and the fusion features are used for three-dimensional object classification to obtain remarkable performance effects.
Drawings
FIG. 1 is a schematic diagram of an unsupervised three-dimensional object classification framework according to an embodiment of the present invention.
Detailed Description
The following embodiment illustrates the method of the present invention in detail with reference to the accompanying drawings. It provides an implementation and specific operating procedure premised on the technical scheme of the invention, but the scope of protection of the invention is not limited to this embodiment.
Referring to fig. 1, the implementation of the embodiment of the present invention includes the following steps:
1. A point cloud dataset is prepared.
A. A common dataset for unsupervised three-dimensional point cloud object classification, ModelNet (Wu, Zhirong, Shuran Song, A. Khosla, et al. "3D ShapeNets: A Deep Representation for Volumetric Shapes." 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015): 1912-1920.), is used, including ModelNet40 and ModelNet10. ModelNet40 contains 12,311 manually designed CAD models in 40 categories, with 9,843 models in the training set and 2,468 in the test set. ModelNet10 contains 4,900 CAD models in 10 categories, with 3,991 models in the training set and 909 in the test set.
B. The point cloud data are preprocessed by uniformly sampling 1,024 points from the surface of each point cloud sample, where each point contains xyz coordinate information.
2. Apply data enhancement to the point cloud samples. After the point cloud data set is prepared, random data enhancement operations are applied to the point cloud samples; possible data enhancements include translation and scaling, rotation, jittering, and random removal of part of the points. Both the original samples and the enhanced samples are preserved.
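As an illustration, this step can be sketched in Python as follows. The augmentation magnitudes and the dropout ratio are assumptions chosen for illustration, since the embodiment does not specify exact parameter ranges:

import numpy as np

def augment_point_cloud(points: np.ndarray) -> np.ndarray:
    """Randomly translate/scale, rotate, jitter, and drop points of an (n, 3) cloud."""
    pc = points.copy()
    # Random anisotropic scaling and translation (assumed ranges).
    pc = pc * np.random.uniform(0.8, 1.25, (1, 3)) + np.random.uniform(-0.1, 0.1, (1, 3))
    # Random rotation about the vertical axis.
    t = np.random.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(t), np.sin(t)
    pc = pc @ np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]).T
    # Jittering: small clipped Gaussian noise per coordinate.
    pc = pc + np.clip(0.01 * np.random.randn(*pc.shape), -0.05, 0.05)
    # Randomly remove part of the points, resampling survivors to keep n fixed.
    n = pc.shape[0]
    kept = np.random.choice(n, size=int(n * np.random.uniform(0.85, 1.0)), replace=False)
    return pc[np.random.choice(kept, size=n, replace=True)]

The original sample and its enhanced version are then both kept and fed to the two encoders of step 3.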
3. Extract the high-dimensional global features of the point cloud samples.
A. The input point cloud sample is an unordered point set of n points, P = {p_1, p_2, ..., p_n}, where each point contains three-dimensional coordinates. The input point cloud sample and its corresponding randomly enhanced point cloud sample are processed by two parameter-sharing encoders implemented with deep convolutional networks. Feature extraction on the original point cloud yields the local features L = {l_1, l_2, ..., l_n} and the global feature g, where each l_i represents a local feature at a different abstraction level; feature extraction on the enhanced point cloud yields the local features L̂ = {l̂_1, l̂_2, ..., l̂_n} and the global feature ĝ.
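A minimal PointNet-style encoder sketch in Python (PyTorch) is given below. The embodiment only specifies deep convolutional encoders with shared parameters that produce per-point local features and a global feature, so the layer widths and the max-pooling readout here are assumptions:

import torch
import torch.nn as nn

class PointEncoder(nn.Module):
    def __init__(self, feat_dim: int = 1024):
        super().__init__()
        # 1x1 convolutions act as a shared per-point MLP over the (B, 3, n) input.
        self.net = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, feat_dim, 1), nn.BatchNorm1d(feat_dim), nn.ReLU(),
        )

    def forward(self, pc: torch.Tensor):
        # pc: (B, n, 3) -> local features (B, n, feat_dim), global feature (B, feat_dim)
        local = self.net(pc.transpose(1, 2)).transpose(1, 2)
        return local, local.max(dim=1).values  # symmetric max-pool over the points

Parameter sharing is obtained by passing both views through the same module instance, e.g. L, g = encoder(pc) and L_hat, g_hat = encoder(pc_aug).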
B. The global feature g or ĝ extracted by one of the encoders undergoes feature transformation through a feature transformation network implemented with a multi-layer perceptron, yielding the transformed feature z or ẑ.
C. In order for the network to effectively learn high-dimensional global features that are invariant to point cloud enhancement, i.e., learned global features that are unaffected by different data enhancements, a symmetric loss function constrains the similarity between the global feature extracted by one encoder and the transformed feature obtained from the global feature extracted by the other encoder. The loss function is:

Loss = -(1/2) · S(z, stopgrad(ĝ)) - (1/2) · S(ẑ, stopgrad(g))
where stopgrad stops the gradient operation, halting back-propagation of the encoder gradient, and S is the cosine similarity function, which computes the similarity between two features:

S(z, ĝ) = (z / ||z||_2) · (ĝ / ||ĝ||_2)

where ||·||_2 is the l_2 norm. Through network training, a high-dimensional global feature can be extracted for each point cloud sample; this feature preserves information unique to each sample that is unaffected by data enhancement.
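Under the assumption that the loss takes the standard stop-gradient form reconstructed above, it can be sketched in Python (PyTorch) as:

import torch
import torch.nn.functional as F

def cosine_sim(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # S(a, b): cosine similarity of l2-normalized features, averaged over the batch.
    return (F.normalize(a, dim=-1) * F.normalize(b, dim=-1)).sum(-1).mean()

def symmetric_loss(z, g_hat, z_hat, g):
    # .detach() realizes stopgrad, halting back-propagation through the target branch;
    # maximizing similarity is implemented as minimizing its negative.
    return -0.5 * cosine_sim(z, g_hat.detach()) - 0.5 * cosine_sim(z_hat, g.detach())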
4. Enhance the discriminability of the local features of the point cloud sample.
A. Different local structures of a point cloud contribute differently to the object representation, and more discriminative local structures represent the object better. The point cloud's highly discriminative global feature is therefore used to mine the more discriminative features among the local structures. First, the correlation score between each local feature l_i and the global feature g is computed:

Score(l_i, g) = φ(θ(l_i, σ(g)))

where the operation σ copies the global feature n times and splices a copy onto the local feature of each point; θ is a relation function implemented with a multi-layer perceptron that infers the relationship between the local features and the global feature; and φ is a regularization function that normalizes the correlation scores into the range [0, 1]. The correlation score represents the correlation between each local feature and the global feature: the higher the score, the greater the correlation, indicating a more discriminative local feature.
B. The correlation scores between the global feature and the local features are used to enhance the discriminability of each local feature l_i:

l'_i = l_i * (1 + Score(l_i, g))

Local features that are already strongly discriminative are thereby further enhanced, while weakly discriminative local features are relatively weakened.
5. Fuse the enhanced local features and the high-dimensional global features of the point cloud sample to obtain the fusion feature.
A. Fusing the enhanced local features of the point cloud sample with the high-dimensional global feature yields the discriminability-enhanced fusion feature:

f = ρ(γ(l'_1), γ(l'_2), ..., γ(l'_n), γ(g))

where γ is the max pooling operation and ρ is the feature splicing operation; the most discriminative responses among the point cloud sample's local features l'_i and the global feature g are selected and spliced, so that the fused feature f has strong discriminability after fusion.
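One plausible reading of this fusion, with γ realized as channel-wise max pooling over the n enhanced local features, can be sketched in Python (PyTorch) as:

import torch

def fuse(enhanced_local: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
    # enhanced_local: (B, n, d_l); g: (B, d_g)
    pooled = enhanced_local.max(dim=1).values  # gamma: strongest response per channel
    return torch.cat([pooled, g], dim=-1)      # rho: feature splicing -> fused feature f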
B. A decoder implemented with a deep convolutional network reconstructs the point cloud sample from the fusion feature, and a Chamfer Distance function constrains the similarity between the reconstructed point cloud sample and the original input point cloud sample, thereby optimizing the fusion feature:

d_CD(X, X̂) = (1/|X|) Σ_{x ∈ X} min_{x̂ ∈ X̂} ||x − x̂||_2² + (1/|X̂|) Σ_{x̂ ∈ X̂} min_{x ∈ X} ||x − x̂||_2²

where X is the original point cloud and X̂ is the reconstructed point cloud.
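A compact Python (PyTorch) sketch of this Chamfer Distance, matching the symmetric nearest-neighbor form above, is:

import torch

def chamfer_distance(x: torch.Tensor, x_hat: torch.Tensor) -> torch.Tensor:
    # x: (B, n, 3) original clouds; x_hat: (B, m, 3) reconstructions.
    d = torch.cdist(x, x_hat) ** 2  # (B, n, m) pairwise squared distances
    return d.min(dim=2).values.mean(dim=1).mean() + d.min(dim=1).values.mean(dim=1).mean()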
6. Train a linear support vector machine and perform unsupervised classification with the fusion features of the point cloud samples. A linear support vector machine is trained with the features extracted from the training set data and the corresponding data labels; the trained machine predicts the classes of the test set data, and the classification accuracy is computed by comparing the predicted classes with the ground-truth labels.
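The evaluation protocol can be sketched in Python with scikit-learn; the regularization constant C is an assumed default, and the feature arrays are taken from the frozen network above:

from sklearn.svm import LinearSVC

def evaluate(train_feats, train_labels, test_feats, test_labels) -> float:
    svm = LinearSVC(C=1.0)              # linear support vector machine
    svm.fit(train_feats, train_labels)  # train on fused features of the training set
    return svm.score(test_feats, test_labels)  # classification accuracy on the test set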
Table 1 compares the method with other unsupervised three-dimensional point cloud object classification methods on the ModelNet40/10 datasets; the evaluation criterion is classification accuracy.
TABLE 1
LGAN corresponds to the method proposed by Achlioptas et al. (Achlioptas, Panos, O. Diamanti, Ioannis Mitliagkas and L. Guibas. "Learning Representations and Generative Models for 3D Point Clouds." ICML (2018));
FoldingNet corresponds to the method proposed by Yang et al. (Yang, Y., Chen Feng, Y. Shen and Dong Tian. "FoldingNet: Point Cloud Auto-Encoder via Deep Grid Deformation." 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (2018): 206-215.);
ClusterNet corresponds to the method proposed by Zhang et al. (Zhang, L. and Z. Zhu. "Unsupervised Feature Learning for Point Cloud Understanding by Contrasting and Clustering Using Graph Convolutional Neural Networks." 2019 International Conference on 3D Vision (3DV) (2019): 395-404.);
3D-PointCapsNet corresponds to the method proposed by Zhao et al. (Zhao, Y., Tolga Birdal, Haowen Deng and Federico Tombari. "3D Point Capsule Networks." 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019): 1009-1018.);
MAP-VAE corresponds to the method proposed by Han et al. (Han, Z., Xiyang Wang, Yu-Shen Liu and Matthias Zwicker. "Multi-Angle Point Cloud-VAE: Unsupervised Feature Learning for 3D Point Clouds From Multiple Angles by Joint Self-Reconstruction and Half-to-Half Prediction." 2019 IEEE/CVF International Conference on Computer Vision (ICCV) (2019): 10441-10450.);
PointGLR corresponds to the method proposed by Rao et al. (Rao, Yongming, Jiwen Lu and J. Zhou. "Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds." 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020): 5375-5384.).

Claims (6)

1. An unsupervised three-dimensional object classification method based on local discriminant enhancement, characterized by comprising the following steps:
A. preparing a point cloud data set for three-dimensional object classification;
B. carrying out data enhancement on the original point cloud sample, and retaining the original point cloud sample and the enhanced point cloud sample;
C. comparing the original version of each point cloud sample with the data-enhanced version, and extracting the high-dimensional global characteristics of each point cloud sample;
D. mining the correlation and discriminability of different local structures using the high-dimensional global features of the point cloud sample, thereby enhancing the discriminability of the point cloud sample's local features;
wherein enhancing the discriminability of the local features of the point cloud sample further comprises the following sub-steps:
D1. different local structures of a point cloud contribute differently to the object representation, and more discriminative local structures represent the object better; the point cloud's highly discriminative global feature is used to mine the more discriminative features among the local structures; first, the correlation score between each local feature l_i and the global feature g is computed:
Score(l_i, g) = φ(θ(l_i, σ(g)))
where the operation σ copies the global feature n times and splices a copy onto the local feature of each point; θ is a relation function implemented with a multi-layer perceptron that infers the relationship between the local features and the global feature; and φ is a regularization function that normalizes the correlation scores into the range [0, 1]; the correlation score represents the correlation between each local feature and the global feature: the higher the score, the greater the correlation, indicating a more discriminative local feature;
D2. the correlation scores between the global feature and the local features are used to enhance the discriminability of each local feature l_i:
l'_i = l_i * (1 + Score(l_i, g))
so that local features that are already strongly discriminative are further enhanced, while weakly discriminative local features are relatively weakened;
E. fusing the enhanced local features of the point cloud sample with the high-dimensional global features to obtain a discriminability-enhanced fusion feature;
F. and training a linear support vector machine, and performing unsupervised classification by using fusion characteristics of the point cloud samples.
2. The unsupervised three-dimensional object classification method based on local discriminant enhancement as recited in claim 1, wherein in step A, the preparing of the point cloud data set for three-dimensional object classification further comprises the following sub-steps:
A1. a common dataset for unsupervised three-dimensional point cloud object classification, ModelNet, is adopted, including ModelNet40 and ModelNet10; ModelNet40 contains 12,311 manually designed CAD models in 40 categories, with 9,843 models in the training set and 2,468 in the test set; ModelNet10 contains 4,900 CAD models in 10 categories, with 3,991 models in the training set and 909 in the test set;
A2. the point cloud data are preprocessed by uniformly sampling 1,024 points from the surface of each point cloud sample, where each point contains xyz coordinate information.
3. The unsupervised three-dimensional object classification method based on local discriminant enhancement as recited in claim 1, wherein in step B, said data enhancement of the original point cloud sample further comprises the following sub-step:
B1. after the point cloud data set is prepared, random data enhancement operations are performed on the point cloud samples, the data enhancement comprising translation and scale transformation, rotation, jittering, and random removal of part of the points; the original samples and the enhanced samples are retained.
4. The unsupervised three-dimensional object classification method based on local discriminant enhancement as recited in claim 1, wherein in step C, said extracting the high-dimensional global features of each point cloud sample further comprises the following sub-steps:
C1. the input point cloud sample is an unordered point set of n points, P = {p_1, p_2, ..., p_n}, where each point contains three-dimensional coordinates; the input point cloud sample and its corresponding randomly enhanced point cloud sample are processed by two parameter-sharing encoders implemented with deep convolutional networks; feature extraction on the original point cloud yields the local features L = {l_1, l_2, ..., l_n} and the global feature g, where each l_i represents a local feature at a different abstraction level, and feature extraction on the enhanced point cloud yields the local features L̂ = {l̂_1, l̂_2, ..., l̂_n} and the global feature ĝ;
C2. the global feature g or ĝ extracted by one of the encoders undergoes feature transformation through a feature transformation network implemented with a multi-layer perceptron, yielding the transformed feature z or ẑ;
C3. in order for the network to effectively learn high-dimensional global features that are invariant to point cloud enhancement, i.e., learned global features that are unaffected by different data enhancements, a symmetric loss function constrains the similarity between the global feature extracted by one encoder and the transformed feature obtained from the global feature extracted by the other encoder; the loss function is:
Loss = -(1/2) · S(z, stopgrad(ĝ)) - (1/2) · S(ẑ, stopgrad(g))
where S is a similarity function that computes the similarity between two features, and stopgrad is the stop-gradient operation, which halts back-propagation of the encoder gradient; through network training, a high-dimensional global feature can be extracted for each point cloud sample, and this feature preserves information unique to each sample that is unaffected by data enhancement.
5. The unsupervised three-dimensional object classification method based on local discriminant enhancement as recited in claim 1, wherein in step E, fusing the enhanced local features of the point cloud sample with the high-dimensional global features to obtain the discriminability-enhanced fusion feature further comprises the following sub-steps:
E1. fusing the enhanced local features of the point cloud sample with the high-dimensional global feature yields the discriminability-enhanced fusion feature:
f = ρ(γ(l'_1), γ(l'_2), ..., γ(l'_n), γ(g))
where γ is the max pooling operation and ρ is the feature splicing operation; the most discriminative responses among the point cloud sample's local features l'_i and the global feature g are selected and spliced, so that the fused feature f has strong discriminability after fusion;
E2. a decoder implemented with a deep convolutional network reconstructs the point cloud sample from the fusion feature, and a similarity function constrains the similarity between the reconstructed point cloud sample and the original input point cloud sample, thereby optimizing the fusion feature.
6. The unsupervised three-dimensional object classification method based on local discriminant enhancement as recited in claim 1, wherein in step F, training the linear support vector machine and performing unsupervised classification with the fusion features of the point cloud samples means that a linear support vector machine is trained with features extracted from the training set data, and the trained machine predicts the classes of the test set data, thereby implementing unsupervised three-dimensional object classification.
CN202110784487.7A 2021-07-12 2021-07-12 Unsupervised three-dimensional object classification method based on local discriminant enhancement Active CN113408651B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110784487.7A CN113408651B (en) 2021-07-12 2021-07-12 Unsupervised three-dimensional object classification method based on local discriminant enhancement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110784487.7A CN113408651B (en) 2021-07-12 2021-07-12 Unsupervised three-dimensional object classification method based on local discriminant enhancement

Publications (2)

Publication Number Publication Date
CN113408651A CN113408651A (en) 2021-09-17
CN113408651B (en) 2024-01-23

Family

ID=77685872

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110784487.7A Active CN113408651B (en) 2021-07-12 2021-07-12 Unsupervised three-dimensional object classification method based on local discriminant enhancement

Country Status (1)

Country Link
CN (1) CN113408651B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114005073B (en) * 2021-12-24 2022-04-08 东莞理工学院 Upper limb mirror image rehabilitation training and recognition method and device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103942568A (en) * 2014-04-22 2014-07-23 浙江大学 Sorting method based on non-supervision feature selection
CN110827260A (en) * 2019-11-04 2020-02-21 燕山大学 Cloth defect classification method based on LBP (local binary pattern) features and convolutional neural network
CN110930456A (en) * 2019-12-11 2020-03-27 北京工业大学 Three-dimensional identification and positioning method of sheet metal part based on PCL point cloud library
WO2020066662A1 (en) * 2018-09-25 2020-04-02 日本電信電話株式会社 Shape supplementation device, shape supplementation learning device, method, and program
CN111783885A (en) * 2020-07-01 2020-10-16 中国电子科技集团公司第三十八研究所 Millimeter wave image quality classification model construction method based on local enhancement
CN112435239A (en) * 2020-11-25 2021-03-02 南京农业大学 Scindapsus aureus leaf shape parameter estimation method based on MRE-PointNet and self-encoder model
CN112488210A (en) * 2020-12-02 2021-03-12 北京工业大学 Three-dimensional point cloud automatic classification method based on graph convolution neural network
DE102019127282A1 (en) * 2019-10-10 2021-04-15 Valeo Schalter Und Sensoren Gmbh System and method for analyzing a three-dimensional environment through deep learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107273872B (en) * 2017-07-13 2020-05-05 北京大学深圳研究生院 Depth discrimination network model method for re-identification of pedestrians in image or video

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103942568A (en) * 2014-04-22 2014-07-23 浙江大学 Sorting method based on non-supervision feature selection
WO2020066662A1 (en) * 2018-09-25 2020-04-02 日本電信電話株式会社 Shape supplementation device, shape supplementation learning device, method, and program
DE102019127282A1 (en) * 2019-10-10 2021-04-15 Valeo Schalter Und Sensoren Gmbh System and method for analyzing a three-dimensional environment through deep learning
CN110827260A (en) * 2019-11-04 2020-02-21 燕山大学 Cloth defect classification method based on LBP (local binary pattern) features and convolutional neural network
CN110930456A (en) * 2019-12-11 2020-03-27 北京工业大学 Three-dimensional identification and positioning method of sheet metal part based on PCL point cloud library
CN111783885A (en) * 2020-07-01 2020-10-16 中国电子科技集团公司第三十八研究所 Millimeter wave image quality classification model construction method based on local enhancement
CN112435239A (en) * 2020-11-25 2021-03-02 南京农业大学 Scindapsus aureus leaf shape parameter estimation method based on MRE-PointNet and self-encoder model
CN112488210A (en) * 2020-12-02 2021-03-12 北京工业大学 Three-dimensional point cloud automatic classification method based on graph convolution neural network

Also Published As

Publication number Publication date
CN113408651A (en) 2021-09-17

Similar Documents

Publication Publication Date Title
Han et al. Multi-angle point cloud-VAE: Unsupervised feature learning for 3D point clouds from multiple angles by joint self-reconstruction and half-to-half prediction
Xie et al. Multilevel cloud detection in remote sensing images based on deep learning
Zhi et al. LightNet: A Lightweight 3D Convolutional Neural Network for Real-Time 3D Object Recognition.
CN107742102B (en) Gesture recognition method based on depth sensor
Yun et al. Focal loss in 3d object detection
Lin et al. Discriminatively trained and-or graph models for object shape detection
CN109711416B (en) Target identification method and device, computer equipment and storage medium
CN111339935B (en) Optical remote sensing picture classification method based on interpretable CNN image classification model
Shen et al. Vehicle detection in aerial images based on lightweight deep convolutional network and generative adversarial network
CN105868706A (en) Method for identifying 3D model based on sparse coding
CN112905828B (en) Image retriever, database and retrieval method combining significant features
Xia et al. Weakly supervised multimodal kernel for categorizing aerial photographs
Hou et al. ADMorph: a 3D digital microfossil morphology dataset for deep learning
CN112257665A (en) Image content recognition method, image recognition model training method, and medium
CN105809113A (en) Three-dimensional human face identification method and data processing apparatus using the same
CN113269224A (en) Scene image classification method, system and storage medium
Bu et al. Multimodal feature fusion for 3D shape recognition and retrieval
Chen et al. Mesh convolution: a novel feature extraction method for 3d nonrigid object classification
Muzahid et al. Progressive conditional GAN-based augmentation for 3D object recognition
CN113408651B (en) Unsupervised three-dimensional object classification method based on local discriminant enhancement
Yang Visual Transformer for Object Detection
CN114283326A (en) Underwater target re-identification method combining local perception and high-order feature reconstruction
Guo et al. SGLBP: Subgraph‐based local binary patterns for feature extraction on point clouds
Ouadiay et al. Simultaneous object detection and localization using convolutional neural networks
CN117011274A (en) Automatic glass bottle detection system and method thereof

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant