CN112785526A - Three-dimensional point cloud repairing method for graphic processing - Google Patents
Three-dimensional point cloud repairing method for graphic processing
- Publication number: CN112785526A (application CN202110116229.1A)
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T5/77 — Retouching; Inpainting; Scratch removal (under G06T5/00, Image enhancement or restoration)
- G06N3/084 — Backpropagation, e.g. using gradient descent (under G06N3/02, Neural networks; G06N3/08, Learning methods)
- G06T19/20 — Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts (under G06T19/00, Manipulating 3D models or images for computer graphics)
- G06T2207/10028 — Range image; Depth image; 3D point clouds (image acquisition modality)
- G06T2207/20081 — Training; Learning (special algorithmic details)
- G06T2207/20084 — Artificial neural networks [ANN] (special algorithmic details)
Abstract
The invention provides a three-dimensional point cloud repairing method for graphic processing, which comprises the following steps: step 1, collecting data from an input point cloud model dataset; step 2, combining a Self-Attention mechanism with a multi-layer perceptron (MLP) to obtain a long-distance dependency extraction network, mapping the input point cloud into a global feature vector with this network, and generating the missing part of the incomplete point cloud with a decoder of topological root-tree structure; and step 3, synthesizing the incomplete point cloud and the generated missing-part point cloud together to obtain the finally repaired complete point cloud model.
Description
Technical Field
The invention belongs to the field of computer three-dimensional model processing and computer graphics, and particularly relates to a three-dimensional point cloud repairing method for graphic processing.
Background
In recent years, large amounts of three-dimensional data have been acquired directly from the real world using LiDAR scanners, depth sensors such as the Kinect, stereo cameras, and the like.
However, the 3D data obtained with these instruments is often incomplete, mainly because the scanning view angle of the scanner is limited and because of occlusion by non-target objects and of light refraction and reflection. As a result, geometric and semantic information of the target object is frequently lost. Therefore, studying how to repair an incomplete 3D model for further applications is a necessary research topic. In addition, 3D models come in a number of representations, such as point clouds, voxels, patches, distance fields, and the like. Using point clouds to represent and process 3D data has received increasing attention because of their lower storage cost compared to other representations (e.g., 3D voxel grids) while still allowing a more refined representation of 3D models. Document 1 (C.R. Qi, H. Su, K. Mo, and L.J. Guibas. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. CVPR 2017) made it possible to process unordered point sets directly, which greatly facilitated the development of deep learning architectures for point clouds and of related research such as 3D scene reconstruction, 3D model segmentation, and 3D model repair.
Document 2 (W. Yuan, T. Khot, D. Held, C. Mertz, and M. Hebert. PCN: Point Completion Network. International Conference on 3D Vision 2018), Document 3 (Z. Huang, Y. Yu, J. Xu, F. Ni, and X. Le. PF-Net: Point Fractal Network for 3D Point Cloud Completion. Conference on Computer Vision and Pattern Recognition 2020), and Document 4 (L.P. Tchapmi, V. Kosaraju, S.H. Rezatofighi, I. Reid, and S. Savarese. TopNet: Structural Point Cloud Decoder. Conference on Computer Vision and Pattern Recognition 2019) use multi-layer perceptrons of different dimensions to extract a final global feature vector from the input incomplete point cloud. Meanwhile, since there is currently no generally accepted way to define the local neighborhood of a point cloud, it is difficult to extract features through convolution operations as with 2D images. Thus, these methods rely heavily on multiple fully connected layers with similar architectures to capture the features of the input model and the dependencies between different points of the input point cloud. Furthermore, PF-Net points out that the low and middle layers of an MLP mostly extract local information, and that this local information cannot be combined into global features simply by passing it to higher layers through shared fully connected layers. This means that such methods cannot efficiently extract enough long-distance dependency information and embed it into the final global feature vector. Another problem is that even if limited long-distance dependency information can be captured, it usually takes several fully connected layers to learn it. This is detrimental to the efficient capture of long-range dependencies, for several reasons:
(1) more targeted models may be needed to represent these long-range dependencies;
(2) it may be difficult for an optimization algorithm to find parameter values that coordinate multiple layers to capture these long-range dependencies;
(3) these parameter settings may be statistically fragile when applied to new models not seen by the network.
In recent years, the attention mechanism has been combined with various methods (such as RNN and GAN methods) to capture long-distance dependency information. It first appeared in the field of computer vision and has since developed considerably in natural language processing (NLP). Document 5 (V. Mnih, N. Heess, A. Graves, and K. Kavukcuoglu. Recurrent Models of Visual Attention. Conference on Neural Information Processing Systems 2014) combined this mechanism with an RNN for image classification and obtained excellent performance. Document 6 (D. Bahdanau, K. Cho, and Y. Bengio. Neural Machine Translation by Jointly Learning to Align and Translate. International Conference on Learning Representations 2015) applied the attention mechanism to NLP, i.e., used it to perform translation and alignment simultaneously in a machine translation task. Self-attention allows the input elements of a set to interact with each other to compute weights or responses and to find out which elements a given element should pay more attention to. Document 7 (A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention Is All You Need. Conference on Neural Information Processing Systems 2017) showed that applying the self-attention mechanism to machine translation achieved the best performance at the time. Document 8 (H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena. Self-Attention Generative Adversarial Networks. International Conference on Machine Learning 2019) integrated the self-attention mechanism into the GAN framework and achieved the best class-conditional image generation performance on ImageNet at the time.
Disclosure of Invention
The purpose of the invention is as follows: in view of the defects of the prior art, the invention aims to solve the technical problem of providing a three-dimensional point cloud repairing method for graphic processing, and in particular discloses a three-dimensional point cloud repairing method with long-distance dependency extraction based on a self-attention mechanism, which is used for repairing an incomplete 3D model and comprises the following steps:
step 1, inputting a point cloud model data set and collecting data;
step 2, combining a Self-Attention mechanism with a multi-layer perceptron (MLP) to obtain a long-distance dependency extraction network, mapping the input point cloud into a global feature vector with the long-distance dependency extraction network, and generating the missing part of the incomplete point cloud with a decoder of topological root-tree structure;
and 3, synthesizing the incomplete point clouds and the generated missing part point clouds together to obtain a finally repaired complete point cloud model.
The step 1 comprises the following steps:
step 1-1, a single three-dimensional point cloud model s is given as input, and viewpoints are preset, namely (1, 0, 0), (0, 0, 1), (1, 0, 1), (-1, 0, 0), (-1, 1, 0) and (1, 1, 0), which ensures that the missing part of an incomplete model is random when training and testing data are collected;
step 1-2, a viewpoint is randomly selected as the center point p and a radius r is preset (the radius is specified by the number of points to remove and is not a specific length in the mathematical sense: if the number of points to remove is set to 25% of the original point cloud, then the 25% of points nearest to p are removed, with p as the center);
step 1-3, for the three-dimensional point cloud model s, the points within the preset radius r around the randomly selected viewpoint p are removed to obtain an incomplete point cloud model; the set of removed points is the missing-part point cloud corresponding to the incomplete point cloud model.
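The data-collection procedure of steps 1-1 to 1-3 can be sketched in a few lines of numpy. This is an illustrative reconstruction (the function and variable names are not from the patent), assuming the "radius" is realized as a count of nearest points, as the parenthetical in step 1-2 describes:

```python
import numpy as np

# Preset viewpoints from step 1-1.
VIEWPOINTS = [(1, 0, 0), (0, 0, 1), (1, 0, 1), (-1, 0, 0), (-1, 1, 0), (1, 1, 0)]

def make_incomplete(points, viewpoints=VIEWPOINTS, missing_ratio=0.25, seed=None):
    """Split a point cloud (N, 3) into an incomplete model and its missing part.

    A viewpoint is chosen at random as center p, and the missing_ratio fraction
    of points nearest to p is removed (the "radius" of steps 1-2/1-3)."""
    rng = np.random.default_rng(seed)
    p = np.asarray(viewpoints[rng.integers(len(viewpoints))], dtype=float)
    dist = np.linalg.norm(points - p, axis=1)   # distance of every point to p
    k = int(len(points) * missing_ratio)        # number of points to remove
    order = np.argsort(dist)
    missing = points[order[:k]]                 # k points nearest to p
    incomplete = points[order[k:]]              # the incomplete point cloud
    return incomplete, missing
```

For a 2048-point model with a 25% removal ratio this returns a 1536-point incomplete cloud and the 512 removed points, which serve as the ground-truth missing part during training.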
The step 2 comprises the following steps:
step 2-1, the input three-dimensional point cloud model data set S = {S_Train, S_Test} is divided into a training set S_Train = {s_1, s_2, ..., s_i, ..., s_n} and a test set S_Test = {s_(n+1), s_(n+2), ..., s_(n+j), ..., s_(n+m)}, where s_i denotes the i-th three-dimensional point cloud model in the training set and s_(n+j) the j-th three-dimensional point cloud model in the test set; i ranges from 1 to n and j from 1 to m;
step 2-2, for the training set S_Train, the incomplete point cloud models P_Train = {p_1, p_2, ..., p_i, ..., p_n} of each three-dimensional point cloud model under random viewpoints and the corresponding missing-part point cloud models G_Train = {g_1, g_2, ..., g_i, ..., g_n} are acquired and used as input to train the whole network, obtaining the trained long-distance dependency extraction network and the decoder of topological root-tree structure; here p_i is the incomplete point cloud model corresponding to the i-th three-dimensional point cloud model s_i in S_Train, and g_i is the missing-part point cloud model corresponding to s_i;
step 2-3, for the test set S_Test, the incomplete point cloud models P_Test = {p_(n+1), p_(n+2), ..., p_(n+j), ..., p_(n+m)} of each three-dimensional point cloud model under random viewpoints are collected and input into the trained network to obtain the missing-part point cloud corresponding to each incomplete input point cloud; here p_(n+j) is the incomplete point cloud model corresponding to the j-th three-dimensional point cloud model s_(n+j) in S_Test.
Step 2-2 comprises the following steps:
step 2-2-1, the incomplete point clouds P_Train in the training set S_Train are taken as input and the corresponding missing-part point clouds G_Train are used for supervised training; after forward propagation through the first-stage shared multi-layer perceptron (shared MLP), each point in the incomplete point cloud is mapped into a 256-dimensional feature vector; the first-stage shared MLP consists of two shared fully connected layers, the first mapping each point into a 128-dimensional feature vector and the second into a 256-dimensional feature vector, so that the whole input point cloud is mapped into a matrix of dimension 2048 × 256;
step 2-2-2, the 2048 × 256 matrix obtained in step 2-2-1 is denoted x = (x_1, x_2, x_3, ..., x_N) and serves as the input of the self-attention module, where x_i is the feature vector corresponding to the i-th point of the input point cloud; to calculate the attention scores of the input point cloud, x is mapped onto two feature spaces Q and K through two 1 × 1 convolutional networks, obtained by the functions h(x) and v(x) respectively, where Q = (h(x_1), h(x_2), h(x_3), ..., h(x_N)) = (w_h x_1, w_h x_2, w_h x_3, ..., w_h x_N) and K = (v(x_1), v(x_2), v(x_3), ..., v(x_N)) = (w_v x_1, w_v x_2, w_v x_3, ..., w_v x_N); w_h and w_v are weight matrices to be learned, corresponding to h(x) and v(x) respectively and realized by 1 × 1 convolution, and both have dimension 32 × 256; Q is a 2048 × 32 query matrix: the input point cloud has 2048 points and each point is represented by a 32-dimensional feature vector, i.e., the query value of each point has length 32; K is the key value matrix corresponding to the input point cloud, likewise of dimension 2048 × 32, i.e., the key value of each point has dimension 32; Q and K will be used to calculate the attention score values of the input point cloud;
step 2-2-3, a function s(q_i, k_j) = q_i · k_j^T is defined to calculate a scalar that represents the dependency of each point in the input point cloud on the other points, where i is the point index in the matrix Q obtained in step 2-2-2 and j is the point index in the matrix K obtained in step 2-2-2; for each point in Q (represented by a 32-dimensional vector), the dot product is taken with the key values of all points including the point itself, i.e., with the 32-dimensional vector of each point in the matrix K; since the input point cloud has 2048 points, each point yields 2048 scalars, and combining the scalars of all points gives a matrix of dimension 2048 × 2048, the attention score map of the input point cloud;
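In matrix form, steps 2-2-2 and 2-2-3 amount to two learned linear maps followed by a single matrix product. A minimal numpy sketch with random stand-in weights (in the patent, w_h and w_v are learned 1 × 1 convolutions, not random matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
N, C, C_qk = 2048, 256, 32            # points, feature dim, query/key dim

x = rng.standard_normal((N, C))       # stand-in for the first-stage shared MLP output
w_h = rng.standard_normal((C_qk, C))  # 32 x 256 query weights (learned in practice)
w_v = rng.standard_normal((C_qk, C))  # 32 x 256 key weights (learned in practice)

Q = x @ w_h.T                         # (2048, 32) query matrix, step 2-2-2
K = x @ w_v.T                         # (2048, 32) key matrix, step 2-2-2
scores = Q @ K.T                      # (2048, 2048) score map: scores[i, j] = q_i . k_j
```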
step 2-2-4, each point in the input point cloud is mapped into a value matrix V to calculate the input signal at point j;
Step 2-2-5, performing Softmax operation on the attention score map obtained in the step 2-2-3;
step 2-2-6, setting the output of the attention module as y, and mapping the input x to the output y by using a function phi;
step 2-2-7, performing maximum pooling;
step 2-2-8, finally generating a missing point cloud corresponding to the incomplete point cloud model;
and 2-2-9, obtaining the trained long-distance dependency relationship extraction network and the topology root tree generator.
Step 2-2-4 comprises: defining a function f(x_j) = w_f x_j that maps each point in the input point cloud into a value matrix V to compute the input signal at point j, where w_f is a weight matrix to be learned, realized by 1 × 1 convolution; the points in the value matrix V correspond one to one with the key values in the key value matrix K, i.e., the input signal at point j corresponds to the key value at point j; here V = (f(x_1), f(x_2), f(x_3), ..., f(x_N)) = (w_f x_1, w_f x_2, w_f x_3, ..., w_f x_N), with dimension 2048 × 128.
The steps 2-2-5 comprise: defining the formula q_(i,j) = exp(s(q_i, k_j)) / Σ_(j=1..N) exp(s(q_i, k_j)) and performing this Softmax operation on the attention score map obtained in step 2-2-3, where q_(i,j) represents the attention score value of point i in the query matrix Q with respect to point j in the key value matrix K; after the operation, the attention scores of each point with respect to all other points sum to 1.
The steps 2-2-6 comprise: defining y = φ(x) = (y_1, y_2, ..., y_i, ..., y_N) = (φ(x_1), φ(x_2), ..., φ(x_i), ..., φ(x_N)), where N is the number of points in the input point cloud and x_N is the feature vector corresponding to the N-th point of the input point cloud; the i-th output point y_i is computed by the formula y_i = g(Σ_(j=1..N) q_(i,j) f(x_j)), where g(x) = w_g x and w_g is a weight matrix to be learned, realized by 1 × 1 convolution (it maps the 128-dimensional weighted value vector back to 256 dimensions); the final output matrix y has the same dimensions as the input feature matrix x, i.e., 2048 × 256.
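Steps 2-2-4 to 2-2-6 then reduce to a row-wise softmax and a weighted sum of values. In the sketch below, applying g after the weighted sum (so that w_g maps the 128-dimensional values back to 256 dimensions) is an assumption chosen so that the stated matrix dimensions agree, and the weights are random stand-ins for learned 1 × 1 convolutions:

```python
import numpy as np

rng = np.random.default_rng(0)
N, C, C_v = 2048, 256, 128
x = rng.standard_normal((N, C))        # attention-module input (step 2-2-2)
scores = rng.standard_normal((N, N))   # stand-in for the Q K^T score map (step 2-2-3)

# Step 2-2-5: row-wise softmax, so each point's scores over all points sum to 1.
e = np.exp(scores - scores.max(axis=1, keepdims=True))   # max-shifted for stability
attn = e / e.sum(axis=1, keepdims=True)                  # (2048, 2048)

# Step 2-2-4: value matrix V = f(x) = w_f x, one 128-d value per key.
w_f = rng.standard_normal((C_v, C))
V = x @ w_f.T                                            # (2048, 128)

# Step 2-2-6: attention-weighted sum of values, mapped back to 256 dims by g.
w_g = rng.standard_normal((C, C_v))
y = (attn @ V) @ w_g.T                                   # (2048, 256), same shape as x
```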
The steps 2-2-7 comprise: performing max pooling on the 2048 × 256 matrix obtained in step 2-2-6, i.e., selecting the maximum value over all points in each dimension to form a 256-dimensional feature vector; stacking this vector into a 2048 × 256 feature matrix of the same shape; and concatenating this matrix with the 2048 × 256 matrix obtained in step 2-2-6 to form a 2048 × 512 feature matrix fused with the long-distance dependency information.
The steps 2-2-8 comprise: the 2048 × 512 feature matrix fused with the long-distance dependency information is forward-propagated through the second-stage shared multi-layer perceptron, which maps each point of the input incomplete point cloud model into a 1024-dimensional vector and the whole incomplete point cloud into a 2048 × 1024 matrix; max pooling is then applied to this matrix, i.e., the maximum value over all points in each dimension is selected, yielding a 1024-dimensional global feature vector; this global feature vector is input into the decoder of topological root-tree structure, which finally generates the missing point cloud corresponding to the incomplete point cloud model.
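Steps 2-2-7 and 2-2-8 are plain tensor operations. A numpy sketch, with the second-stage shared MLP simplified to one random-weight linear layer plus ReLU (the patent does not specify the activation, so that choice is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2048
y = rng.standard_normal((N, 256))          # attention-module output (step 2-2-6)

# Step 2-2-7: max-pool over points, stack the result, and concatenate.
pooled = y.max(axis=0)                     # (256,) per-dimension maximum
stacked = np.tile(pooled, (N, 1))          # (2048, 256) repeated for every point
fused = np.concatenate([y, stacked], axis=1)   # (2048, 512) fused feature matrix

# Step 2-2-8: second-stage shared MLP, then max pooling to a global vector.
w2 = rng.standard_normal((1024, 512)) * 0.01
h = np.maximum(fused @ w2.T, 0.0)          # (2048, 1024) per-point features
global_feature = h.max(axis=0)             # (1024,) vector fed to the tree decoder
```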
The steps 2-2-9 comprise: comparing the generated missing point cloud with the real missing point cloud corresponding to the incomplete point cloud, calculating the Loss function, and performing back propagation, finally obtaining the trained long-distance dependency extraction network and topological root-tree generator.
The step 3 comprises: synthesizing the incomplete point cloud and the generated missing-part point cloud together to obtain the finally repaired complete point cloud model.
the method of the present invention addresses the problem of three-dimensional model repair. Sensors can be used to acquire a large amount of three-dimensional data quickly, but it is often difficult to acquire complete three-dimensional data. Repairing and conjecturing the complete model based on the partial incomplete model are also widely applied to the fields of computer vision, robots, virtual reality and the like, such as mixed model analysis, target detection and tracking, 3D reconstruction, style migration, robot roaming and grabbing and the like, and the work is made to be very meaningful.
Beneficial effects: the method introduces a self-attention mechanism into the problem of three-dimensional point cloud repair instead of relying solely on shared fully connected layers for feature extraction, which helps model the long-distance dependencies among the points of the input point cloud. As can be seen from the visualization of the self-attention score maps in fig. 4 and the comparison results in fig. 5, with the self-attention mechanism the points of the missing-part point cloud generated by the network model of the invention can be finely coordinated with other distant points, and the feature extractor can build global features from information at distant points rather than only local positions, so that the prediction results show less noise and deformation and the 3D model repairing effect is improved. The whole system is efficient and practical. Meanwhile, as can be seen from tables 1 and 2, compared with other methods for repairing three-dimensional point cloud models, the CD (Chamfer distance) value of the proposed method is remarkably reduced and the repair performance is remarkably improved.
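For reference, the CD metric cited above can be computed as follows; this is one common symmetric formulation (variants use squared distances, or average rather than sum the two directional terms):

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a: (n, 3) and b: (m, 3)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)  # (n, m) pairwise
    return d.min(axis=1).mean() + d.min(axis=0).mean()
```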
Drawings
The foregoing and/or other advantages of the invention will become further apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
FIG. 1a is an incomplete point cloud model before repair.
FIG. 1b is the repaired point cloud model.
FIG. 2 is a block diagram of a self-attention module of the method of the present invention.
FIG. 3 is a block diagram of a feature extraction module of the method of the present invention.
FIG. 4 is a visualization of the attention scores corresponding to the input point cloud model in the method of the present invention.
FIG. 5 is a comparison of the repair effect of the method of the present invention with other methods.
FIG. 6 is a flow chart of the present invention.
Detailed Description
As shown in fig. 6, the invention discloses a three-dimensional point cloud repairing method for extracting long-distance dependency relationship based on a self-attention mechanism, which randomly selects a viewpoint from a plurality of preset viewpoints as a central point, and removes all points in a preset radius range to acquire an incomplete model under the viewpoint; inputting incomplete models and corresponding missing parts of a model training set into the network of the method for training, and inputting incomplete models of a model test set into the trained network to obtain the missing parts corresponding to the incomplete models; and then synthesizing the incomplete model and the missing part together to obtain the finally repaired model.
For a given set of 3D models of a certain class, S = {S_Train, S_Test} is divided into a training set S_Train = {s_1, s_2, ..., s_i, ..., s_n} and a test set S_Test = {s_(n+1), s_(n+2), ..., s_(n+j), ..., s_(n+m)}, where s_i denotes the i-th model in the training set and s_(n+j) the j-th model in the test set. The invention repairs the models of the test set S_Test through the following steps; the target task is shown in fig. 1a and fig. 1b, and the flow is shown in fig. 2, fig. 3 and fig. 6:
the method specifically comprises the following steps:
step 1, inputting a point cloud model data set and collecting data;
step 2, combining a Self-Attention mechanism with a multi-layer perceptron (MLP) to obtain a long-distance dependency extraction network, mapping the input point cloud into a global feature vector with the long-distance dependency extraction network, and generating the missing part of the incomplete point cloud with a decoder of topological root-tree structure;
and 3, synthesizing the incomplete point clouds and the generated missing part point clouds together to obtain a finally repaired complete point cloud model.
The step 1 comprises the following steps:
step 1-1, a single three-dimensional point cloud model s is given as input, and viewpoints are preset, namely (1, 0, 0), (0, 0, 1), (1, 0, 1), (-1, 0, 0), (-1, 1, 0) and (1, 1, 0), which ensures that the missing part of an incomplete model is random when training and testing data are collected;
step 1-2, a viewpoint is randomly selected as the center point p and a radius r is preset (the radius is specified by the number of points to remove and is not a specific length in the mathematical sense: if the number of points to remove is set to 25% of the original point cloud, then the 25% of points nearest to p are removed, with p as the center);
step 1-3, for the three-dimensional point cloud model s, the points within the preset radius r around the randomly selected viewpoint p are removed to obtain an incomplete point cloud model; the set of removed points is the missing-part point cloud corresponding to the incomplete point cloud model.
The step 2 comprises the following steps:
step 2-1, the input three-dimensional point cloud model data set S = {S_Train, S_Test} is divided into a training set S_Train = {s_1, s_2, ..., s_i, ..., s_n} and a test set S_Test = {s_(n+1), s_(n+2), ..., s_(n+j), ..., s_(n+m)}, where s_i denotes the i-th three-dimensional point cloud model in the training set and s_(n+j) the j-th three-dimensional point cloud model in the test set; i ranges from 1 to n and j from 1 to m;
step 2-2, for the training set S_Train, the incomplete point cloud models P_Train = {p_1, p_2, ..., p_i, ..., p_n} of each three-dimensional point cloud model under random viewpoints and the corresponding missing-part point cloud models G_Train = {g_1, g_2, ..., g_i, ..., g_n} are acquired and used as input to train the whole network, obtaining the trained long-distance dependency extraction network and the decoder of topological root-tree structure; here p_i is the incomplete point cloud model corresponding to the i-th three-dimensional point cloud model s_i in S_Train, and g_i is the missing-part point cloud model corresponding to s_i;
step 2-3, for the test set S_Test, the incomplete point cloud models P_Test = {p_(n+1), p_(n+2), ..., p_(n+j), ..., p_(n+m)} of each three-dimensional point cloud model under random viewpoints are collected and input into the trained network to obtain the missing-part point cloud corresponding to each incomplete input point cloud; here p_(n+j) is the incomplete point cloud model corresponding to the j-th three-dimensional point cloud model s_(n+j) in S_Test.
Step 2-2 comprises the following steps:
step 2-2-1, the incomplete point clouds P_Train in the training set S_Train are taken as input and the corresponding missing-part point clouds G_Train are used for supervised training; after forward propagation through the first-stage shared multi-layer perceptron (shared MLP), each point in the incomplete point cloud is mapped into a 256-dimensional feature vector; the first-stage shared MLP consists of two shared fully connected layers, the first mapping each point into a 128-dimensional feature vector and the second into a 256-dimensional feature vector, so that the whole input point cloud is mapped into a matrix of dimension 2048 × 256;
step 2-2-2, the 2048 × 256 matrix obtained in step 2-2-1 is denoted x = (x_1, x_2, x_3, ..., x_N) and serves as the input of the self-attention module, where x_i is the feature vector corresponding to the i-th point of the input point cloud; to calculate the attention scores of the input point cloud, x is mapped onto two feature spaces Q and K through two 1 × 1 convolutional networks, obtained by the functions h(x) and v(x) respectively, where Q = (h(x_1), h(x_2), h(x_3), ..., h(x_N)) = (w_h x_1, w_h x_2, w_h x_3, ..., w_h x_N) and K = (v(x_1), v(x_2), v(x_3), ..., v(x_N)) = (w_v x_1, w_v x_2, w_v x_3, ..., w_v x_N); w_h and w_v are weight matrices to be learned, corresponding to h(x) and v(x) respectively and realized by 1 × 1 convolution, and both have dimension 32 × 256; Q is a 2048 × 32 query matrix: the input point cloud has 2048 points and each point is represented by a 32-dimensional feature vector, i.e., the query value of each point has length 32; K is the key value matrix corresponding to the input point cloud, likewise of dimension 2048 × 32, i.e., the key value of each point has dimension 32; Q and K will be used to calculate the attention score values of the input point cloud;
step 2-2-3, define a score function s(i, j) = h(x_i) · v(x_j) that calculates a scalar representing the dependency of each point in the input point cloud on the other points, where i is the point index in the matrix Q obtained in step 2-2-2 and j is the point index in the matrix K obtained in step 2-2-2. For each point in Q (represented by a 32-dimensional vector), the key values of all points, including the point itself, are multiplied one by one, i.e. the point is multiplied with the 32-dimensional vector of every point in the matrix K. Since the input point cloud has 2048 points, each point yields 2048 scalars; combining the scalars of all points gives a 2048 × 2048 matrix, the attention score map of the input point cloud;
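Steps 2-2-2 and 2-2-3 reduce to two learned linear projections followed by a matrix product, since a 1 × 1 convolution over a point set is just a shared per-point linear map. A NumPy sketch (random weights and scale factors are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
N, C = 2048, 256
x = rng.normal(size=(N, C))              # per-point features from step 2-2-1

# A 1x1 convolution over a point set is a shared linear map per point,
# so h(x_i) = w_h x_i and v(x_j) = w_v x_j reduce to matrix products.
w_h = 0.05 * rng.normal(size=(32, C))    # query weights, 32 x 256
w_v = 0.05 * rng.normal(size=(32, C))    # key weights,   32 x 256

Q = x @ w_h.T                            # (2048, 32) query matrix
K = x @ w_v.T                            # (2048, 32) key matrix
scores = Q @ K.T                         # (2048, 2048) attention score map
print(scores.shape)
```

Entry (i, j) of `scores` is the dot product of the query of point i with the key of point j, i.e. the scalar s(i, j) of step 2-2-3.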
step 2-2-4, define a function f(x_j) = w_f x_j that maps each point of the input point cloud into a value matrix V representing the input signal at point j (the value matrix will be multiplied with the attention score map obtained in step 2-2-3 to obtain the weighted vectors), where w_f is a weight matrix to be learned, implemented as a 1 × 1 convolution. The points of the value matrix V correspond one-to-one to the key values of the key matrix K, i.e. the input signal at point j corresponds to the key value at point j. V = (f(x_1), f(x_2), f(x_3), ..., f(x_N)) = (w_f x_1, w_f x_2, w_f x_3, ..., w_f x_N) and has dimension 2048 × 128;
step 2-2-5, define the formula q_{i,j} = exp(s(i, j)) / Σ_{j=1}^{N} exp(s(i, j)) and apply this Softmax operation to the attention score map obtained in step 2-2-3 (i.e. a Softmax over all attention score values of each point, so that the attention scores of each point with respect to all other points sum to 1), where q_{i,j} is the attention score value of point i of the query matrix Q with respect to point j of the key matrix K;
step 2-2-6, denote the output of the attention module by y and map the input x to the output y with a function φ; define y = φ(x) = (y_1, y_2, ..., y_i, ..., y_N) = (φ(x_1), φ(x_2), ..., φ(x_i), ..., φ(x_N)), where N is the number of points of the input point cloud and x_N is the feature vector of the N-th point. The i-th output point y_i is computed as y_i = g(Σ_{j=1}^{N} q_{i,j} f(x_j)), where g(x_i) = w_g x_i and w_g is a weight matrix to be learned, implemented as a 1 × 1 convolution. The output matrix y finally obtained has the same dimension as the input feature matrix x, i.e. 2048 × 256.
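Steps 2-2-3 to 2-2-6 can be sketched end to end in NumPy. The max-subtraction inside the Softmax and the random weights are assumptions for numerical illustration, and the sketch omits any residual connection since the text does not specify one:

```python
import numpy as np

def self_attention(x, w_h, w_v, w_f, w_g):
    """Map per-point features x (N, C) to an output y of the same shape."""
    scores = (x @ w_h.T) @ (x @ w_v.T).T         # s(i, j), shape (N, N)
    scores -= scores.max(axis=1, keepdims=True)  # stabilise the Softmax
    q = np.exp(scores)
    q /= q.sum(axis=1, keepdims=True)            # rows of q sum to 1
    V = x @ w_f.T                                # value matrix, (N, 128)
    return (q @ V) @ w_g.T                       # y_i = g(sum_j q_ij f(x_j))

rng = np.random.default_rng(2)
N, C = 2048, 256
x = rng.normal(size=(N, C))
w_h = 0.05 * rng.normal(size=(32, C))    # query projection
w_v = 0.05 * rng.normal(size=(32, C))    # key projection
w_f = 0.05 * rng.normal(size=(128, C))   # value projection, 128 x 256
w_g = 0.05 * rng.normal(size=(C, 128))   # output projection, 256 x 128
y = self_attention(x, w_h, w_v, w_f, w_g)
print(y.shape)
```

The dimensions match the text: V is 2048 × 128, so w_g must map 128 back to 256 for the output y to have the same 2048 × 256 shape as the input x.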
step 2-2-7, apply max pooling to the 2048 × 256 matrix obtained in step 2-2-6, i.e. take the maximum value over all points in each dimension to form a 256-dimensional feature vector; stack this feature vector into a 2048 × 256 feature matrix of the same shape, and concatenate it with the 2048 × 256 matrix from step 2-2-6 to form a 2048 × 512 feature matrix fused with the long-distance dependency information;
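The max-pool, stack, and concatenate of step 2-2-7 is a three-line operation; a sketch with random stand-in features (the variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(size=(2048, 256))              # output of the attention module

g = y.max(axis=0)                             # max-pool: one 256-dim vector
g_tiled = np.tile(g, (y.shape[0], 1))         # stack back to (2048, 256)
fused = np.concatenate([y, g_tiled], axis=1)  # (2048, 512) fused features
print(fused.shape)
```

Every row of the result carries both that point's own feature and the shared pooled summary, which is what lets later layers mix local and global information.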
step 2-2-8, propagate the 2048 × 512 feature matrix fused with the long-distance dependency information forward through the second-stage shared multi-layer perceptron (shared MLP), mapping each point of the input incomplete point cloud model to a 1024-dimensional vector, so the whole incomplete point cloud is mapped to a 2048 × 1024 matrix; then apply max pooling to the 2048 × 1024 matrix, i.e. take the maximum of each dimension over all points, to obtain a 1024-dimensional global feature vector. Input the global feature vector into the decoder of the topological root tree structure, which finally generates the missing point cloud corresponding to the incomplete point cloud model;
step 2-2-9, compare the generated missing-part point cloud with the real missing point cloud corresponding to the incomplete point cloud, calculate the Loss function, and perform back propagation, finally obtaining the trained long-distance dependency extraction network and topological root tree generator.
The step 3 comprises the following steps:
step 3, synthesize the missing-part point cloud obtained in step 2 and the incomplete point cloud together to obtain the final repaired complete point cloud model.
Examples
The target task of the present invention is shown in Fig. 1a and Fig. 1b: Fig. 1a is an original model to be repaired and Fig. 1b the repaired model. The structure of the self-attention module of the method of the present invention is shown in Fig. 2, and the structure of the whole global feature extractor in Fig. 3. The steps of the present invention are described below by way of example.
Step (1), collecting data of an input point cloud model data set;
step (1.1), a single three-dimensional point cloud model s is input, and 5 viewpoints are preset: (1, 0, 0), (0, 0, 1), (1, 0, 1), (-1, 0, 0), (-1, 1, 0); using multiple different viewpoints ensures that the missing parts of the incomplete models have randomness when the training and test data are collected;
step (1.2), randomly select a viewpoint as the center point p and preset a radius r (the radius is determined by the number of points to remove, not by a specific length in the mathematical sense; e.g. if the number of removed points is set to 25% of the original point cloud, then with p as the center the 25% of points nearest to p are removed);
step (1.3), regarding a three-dimensional point cloud model s, removing points within a preset radius r by taking a randomly selected viewpoint p as a center to obtain an incomplete point cloud model; the set of points removed is the missing portion of the point cloud corresponding to the incomplete point cloud model.
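Steps (1.1) to (1.3) can be sketched as follows; the function name, the uniform sample cloud, and the 25% fraction used for the demonstration are illustrative assumptions (the patent leaves the removal fraction configurable):

```python
import numpy as np

def split_by_viewpoint(cloud, viewpoint, frac=0.25):
    """Remove the fraction of points nearest to the viewpoint.

    Returns (incomplete, missing); the radius r is defined implicitly
    by the point count, not as a metric length.
    """
    d = np.linalg.norm(cloud - viewpoint, axis=1)
    k = int(len(cloud) * frac)
    order = np.argsort(d)
    missing = cloud[order[:k]]     # k nearest points: the missing part
    incomplete = cloud[order[k:]]  # remaining points: incomplete model
    return incomplete, missing

rng = np.random.default_rng(4)
cloud = rng.uniform(-1, 1, size=(2048, 3))
viewpoints = np.array([(1, 0, 0), (0, 0, 1), (1, 0, 1),
                       (-1, 0, 0), (-1, 1, 0)], dtype=float)
vp = viewpoints[rng.integers(len(viewpoints))]   # random preset viewpoint
incomplete, missing = split_by_viewpoint(cloud, vp)
print(incomplete.shape, missing.shape)  # (1536, 3) (512, 3)
```

The missing set doubles as the ground truth G used for supervised training in step 2.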
Step (2), combine the self-attention (Self-Attention) mechanism with the multi-layer perceptron MLP to obtain a long-distance dependency extraction network, map the input point cloud into a global feature vector with this network, and generate the missing part of the incomplete point cloud with a decoder of the topological root tree structure;
step (2.1), the input three-dimensional point cloud model data set S = {S_Train, S_Test} is divided into a training set S_Train = {s_1, s_2, ..., s_i, ..., s_n} and a test set S_Test = {s_{n+1}, s_{n+2}, ..., s_{n+j}, ..., s_{n+m}}, where s_i is the i-th three-dimensional point cloud model of the training set and s_{n+j} the j-th three-dimensional point cloud model of the test set; i ranges from 1 to n and j from 1 to m;
step (2.2), for the training set S_Train, acquire the incomplete point cloud model P_Train = {p_1, p_2, ..., p_i, ..., p_n} of each three-dimensional point cloud model under a random viewpoint and the corresponding missing-part point cloud models G_Train = {g_1, g_2, ..., g_i, ..., g_n}, and use them as input of the whole network for training, obtaining the trained long-distance dependency extraction network and the decoder with the topological root tree structure, where p_i is the incomplete point cloud model corresponding to the i-th three-dimensional point cloud model s_i of the training set S_Train and g_i is the missing-part point cloud model corresponding to s_i;
Steps (2.2.1) to (2.2.9) are identical to steps 2-2-1 to 2-2-9 described above.
Step (2.3), for test set STestAcquiring an incomplete point cloud model P of each three-dimensional point cloud model under a random viewpointTest={pn+1,pn+2,...,pn+j,...,pn+mAnd inputting the data into a trained network to obtain a missing part point cloud corresponding to the incomplete point cloud input model, wherein p isn+jRefers to test set STestTo (1)j three-dimensional point cloud models sn+jA corresponding incomplete point cloud model. The test process mainly comprises the following steps:
step (2.3.1), input the incomplete point cloud models P_Test of the test model set under random viewpoints into the generator network trained together with the long-distance dependency extraction network;
step (2.3.2), output the missing-part point clouds corresponding to the incomplete models of the test model set under the random viewpoints.
And (3) synthesizing the incomplete point cloud and the generated missing part point cloud together to obtain a finally repaired complete point cloud model.
Analysis of results
The experimental environmental parameters of the method of the invention are as follows:
(1) The experimental platform for collecting the model data has the following parameters: Ubuntu 16.04.4 LTS operating system, Intel(R) Core(TM) i7-6850K CPU @ 3.60 GHz, 32 GB memory; the Python programming language is used, with PyCharm 2019 as the programming development environment;
(2) The experimental platform for training and testing the long-distance dependency extraction network based on the Self-Attention mechanism has the following parameters: Ubuntu 16.04.4 LTS operating system, Intel(R) Core(TM) i7-6850K CPU @ 3.60 GHz, 32 GB memory, TITAN RTX GPU with 24 GB of video memory; the Python programming language is used, implemented with the TensorFlow third-party open-source library.
The results of the comparative experiments (shown in Tables 1 and 2) of the method of the present invention against TopNet, Folding, PCN, AtlasNet, and PointNetFCAE are analyzed as follows:
The experiments were performed on a subset of the accepted benchmark data set ShapeNet containing 8 different model categories; the category names Airplane, Lamp, Cabinet, Car, Chair, Couch, Table, and Watercraft are shown in the first column of Table 1, and the division into training and test sets in the second column of Table 1.
The final metric is the average Chamfer Distance (CD) of the repaired complete model. The CD comparisons are shown in Tables 1 and 2 (Table 1 compares the CD of the method of the present invention with the other methods on the 8 model categories of the ShapeNet data set; Table 2 compares the category-average CD over all classes of the ShapeNet data set). All CD values shown in the tables are multiplied by 10^5 after calculation. As can be seen from Tables 1 and 2, all per-category CD values and the category-average CD value of the method of the present invention are lower than those of the other methods. Fig. 5 compares the repair results of the method of the present invention with the other methods; the method of the present invention significantly reduces noise and deformation in the repaired missing part of the incomplete point cloud and significantly improves the repair effect.
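A brute-force Chamfer Distance for small point sets might look as follows (the use of squared nearest-neighbour distances is an assumption; both the squared and unsquared conventions appear in the literature, and the patent does not specify which is used):

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer Distance between point sets a (N, 3) and b (M, 3).

    Averages, for each point, the squared distance to its nearest
    neighbour in the other set, in both directions.
    """
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)  # (N, M) pairwise
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

a = np.array([[0.0, 0, 0], [1, 0, 0]])
b = np.array([[0.0, 0, 0], [1, 0, 0]])
print(chamfer_distance(a, b))  # 0.0 for identical sets
```

The O(N·M) pairwise matrix is fine for 2048-point clouds; larger sets would need a KD-tree nearest-neighbour query instead.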
TABLE 1
TABLE 2
AtlasNet | Folding | PCN | TopNet | PointNetFCAE | The method of the invention |
Category Avg. | 94.4 | 74.6 | 67.1 | 63.9 | 97.6 | 55.8 |
In the self-comparison experiment, the self-attention module that extracts the long-distance dependency information was removed; the CD comparison of the final results is shown in Table 3, which shows that the optimization of extracting long-distance dependency information significantly reduces the final CD value of the repaired complete model.
In addition, the method of the present invention visualizes the learned long-distance dependency information; the visualization results are shown in Fig. 4. Each point of the incomplete 3D point cloud has a corresponding long-distance dependency relationship. In each row of Fig. 4, the first picture marks three representative points, and the other three pictures show the attention score maps corresponding to these points. Because a more targeted method is used to extract the long-distance dependency information, instead of only a fully connected shared multi-layer perceptron, sufficient long-distance dependency information can be learned, thereby significantly reducing the final CD value of the repaired complete model. Table 3 compares the final results of the method of the present invention with the results obtained without the self-attention optimization for long-distance dependency extraction.
TABLE 3
The present invention provides a three-dimensional point cloud repairing method for graphic processing, and there are many methods and approaches for implementing this technical solution. The above description is only a preferred embodiment of the present invention; it should be noted that those skilled in the art can make several improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as within the protection scope of the present invention. All components not specified in the embodiment can be realized by the prior art.
Claims (10)
1. A three-dimensional point cloud repairing method for graphic processing is characterized by comprising the following steps:
step 1, inputting a point cloud model data set and collecting data;
step 2, combining the self-attention (Self-Attention) mechanism with the multi-layer perceptron MLP to obtain a long-distance dependency extraction network, mapping the input point cloud into a global feature vector with this network, and generating the missing part of the incomplete point cloud with a decoder of the topological root tree structure;
and 3, synthesizing the incomplete point clouds and the generated missing part point clouds together to obtain a finally repaired complete point cloud model.
2. The method of claim 1, wherein step 1 comprises the steps of:
step 1-1, setting and inputting a single three-dimensional point cloud model s, and presetting 5 viewpoints which are (1, 0, 0), (0, 0, 1), (1, 0, 1), (-1, 0, 0) and (-1, 1, 0);
step 1-2, randomly selecting a viewpoint as a central point p, and presetting a radius r;
step 1-3, regarding a three-dimensional point cloud model s, removing points within a preset radius r by taking a randomly selected viewpoint p as a center to obtain an incomplete point cloud model; the set of points removed is the missing portion of the point cloud corresponding to the incomplete point cloud model.
3. The method of claim 2, wherein step 2 comprises the steps of:
step 2-1, the input three-dimensional point cloud model data set S = {S_Train, S_Test} is divided into a training set S_Train = {s_1, s_2, ..., s_i, ..., s_n} and a test set S_Test = {s_{n+1}, s_{n+2}, ..., s_{n+j}, ..., s_{n+m}}, where s_i is the i-th three-dimensional point cloud model of the training set and s_{n+j} the j-th three-dimensional point cloud model of the test set; i ranges from 1 to n and j from 1 to m;
step 2-2, for the training set S_Train, acquire the incomplete point cloud model P_Train = {p_1, p_2, ..., p_i, ..., p_n} of each three-dimensional point cloud model at a random viewpoint and the corresponding missing-part point cloud models G_Train = {g_1, g_2, ..., g_i, ..., g_n}, and use them as input of the whole network for training, obtaining the trained long-distance dependency extraction network and the decoder with the topological root tree structure, where p_i is the incomplete point cloud model corresponding to the i-th three-dimensional point cloud model s_i of the training set S_Train and g_i is the missing-part point cloud model corresponding to s_i;
step 2-3, for the test set S_Test, collect the incomplete point cloud model P_Test = {p_{n+1}, p_{n+2}, ..., p_{n+j}, ..., p_{n+m}} of each three-dimensional point cloud model under a random viewpoint and input it into the trained network to obtain the missing-part point cloud corresponding to the incomplete input model, where p_{n+j} is the incomplete point cloud model corresponding to the j-th three-dimensional point cloud model s_{n+j} of the test set S_Test.
4. A method according to claim 3, characterized in that step 2-2 comprises the steps of:
step 2-2-1, take the incomplete point clouds P_Train in the training set S_Train as input and perform supervised training with the corresponding missing-part point clouds G_Train; after forward propagation through the first-stage shared multi-layer perceptron, each point in the incomplete point cloud is mapped to a 256-dimensional feature vector, wherein the first-stage shared MLP consists of two shared fully connected layers: the first layer maps each point to a 128-dimensional feature vector, the second layer maps each point to a 256-dimensional feature vector, and the whole input point cloud is mapped to a matrix of dimension 2048 × 256;
step 2-2-2, denote the 2048 × 256 matrix obtained in step 2-2-1 by x = (x_1, x_2, x_3, ..., x_N); it is the input of the self-attention module, where x_i is the feature vector of the i-th point of the input point cloud; the attention scores of the input point cloud are calculated by mapping x onto two feature spaces Q and K through two 1 × 1 convolutional networks, given by the functions h(x) and v(x) respectively, where Q = (h(x_1), h(x_2), h(x_3), ..., h(x_N)) = (w_h x_1, w_h x_2, w_h x_3, ..., w_h x_N) and K = (v(x_1), v(x_2), v(x_3), ..., v(x_N)) = (w_v x_1, w_v x_2, w_v x_3, ..., w_v x_N); w_h and w_v are the weight matrices to be learned for h(x) and v(x) respectively, implemented as 1 × 1 convolutions, and both have dimension 32 × 256; Q is the query matrix with dimension 2048 × 32: the input point cloud has 2048 points and each point is represented by a 32-dimensional feature vector, i.e. the query value of each point has length 32; K is the key matrix of the input point cloud, also of dimension 2048 × 32, i.e. the key value of each point is 32-dimensional; Q and K are used to calculate the attention score values of the input point cloud;
step 2-2-3, define a score function s(i, j) = h(x_i) · v(x_j) that calculates a scalar representing the dependency of each point in the input point cloud on the other points, where i is the point index in the matrix Q obtained in step 2-2-2 and j is the point index in the matrix K obtained in step 2-2-2; for each point in Q, the key values of all points, including the point itself, are multiplied one by one, i.e. the point is multiplied with the 32-dimensional vector of every point in the matrix K; since the input point cloud has 2048 points, each point yields 2048 scalars, and combining the scalars of all points gives a 2048 × 2048 matrix, the attention score map of the input point cloud;
step 2-2-4, map each point in the input point cloud into a value matrix V to calculate the input signal at point j;
Step 2-2-5, performing Softmax operation on the attention score map obtained in the step 2-2-3;
step 2-2-6, setting the output of the attention module as y, and mapping the input x to the output y by using a function phi;
step 2-2-7, performing maximum pooling;
step 2-2-8, finally generating a missing point cloud corresponding to the incomplete point cloud model;
and 2-2-9, obtaining the trained long-distance dependency relationship extraction network and the topology root tree generator.
5. The method of claim 4, wherein step 2-2-4 comprises: defining a function f(x_j) = w_f x_j that maps each point of the input point cloud into a value matrix V to calculate the input signal at point j, where w_f is a weight matrix to be learned, implemented as a 1 × 1 convolution; the points of the value matrix V correspond one-to-one to the key values of the key matrix K, i.e. the input signal at point j corresponds one-to-one to the key value at point j; V = (f(x_1), f(x_2), f(x_3), ..., f(x_N)) = (w_f x_1, w_f x_2, w_f x_3, ..., w_f x_N), with dimension 2048 × 128.
7. The method of claim 6, wherein step 2-2-6 comprises: defining y = φ(x) = (y_1, y_2, ..., y_i, ..., y_N) = (φ(x_1), φ(x_2), ..., φ(x_i), ..., φ(x_N)), where N is the number of points of the input point cloud and x_N the feature vector of the N-th point; the i-th output point y_i is computed as y_i = g(Σ_{j=1}^{N} q_{i,j} f(x_j)), where g(x_i) = w_g x_i and w_g is a weight matrix to be learned, implemented as a 1 × 1 convolution; the output matrix y finally obtained has the same dimension as the input feature matrix x, i.e. 2048 × 256.
8. The method of claim 7, wherein step 2-2-7 comprises: applying max pooling to the 2048 × 256 matrix obtained in step 2-2-6, i.e. taking the maximum value over all points in each dimension to form a 256-dimensional feature vector; stacking this feature vector into a 2048 × 256 feature matrix of the same shape, and concatenating it with the 2048 × 256 matrix from step 2-2-6 to form the 2048 × 512 feature matrix fused with the long-distance dependency information.
9. The method of claim 8, wherein step 2-2-8 comprises: propagating the 2048 × 512 feature matrix fused with the long-distance dependency information forward through the second-stage shared multi-layer perceptron, mapping each point of the input incomplete point cloud model to a 1024-dimensional vector, so the whole incomplete point cloud is mapped to a 2048 × 1024 matrix; then applying max pooling to the 2048 × 1024 matrix, i.e. taking the maximum of each dimension over all points, to obtain a 1024-dimensional global feature vector; inputting the global feature vector into the decoder of the topological root tree structure, which finally generates the missing point cloud corresponding to the incomplete point cloud model.
10. The method of claim 9, wherein step 2-2-9 comprises: comparing the generated missing point cloud with the real missing point cloud corresponding to the incomplete point cloud, calculating the Loss function, performing back propagation, and finally obtaining the trained long-distance dependency extraction network and topological root tree generator.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110116229.1A CN112785526B (en) | 2021-01-28 | 2021-01-28 | Three-dimensional point cloud restoration method for graphic processing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110116229.1A CN112785526B (en) | 2021-01-28 | 2021-01-28 | Three-dimensional point cloud restoration method for graphic processing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112785526A true CN112785526A (en) | 2021-05-11 |
CN112785526B CN112785526B (en) | 2023-12-05 |
Family
ID=75759307
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110116229.1A Active CN112785526B (en) | 2021-01-28 | 2021-01-28 | Three-dimensional point cloud restoration method for graphic processing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112785526B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113298952A (en) * | 2021-06-11 | 2021-08-24 | 哈尔滨工程大学 | Incomplete point cloud classification network based on data expansion and similarity measurement |
CN113379646A (en) * | 2021-07-07 | 2021-09-10 | 厦门大学 | Algorithm for performing dense point cloud completion by using generated countermeasure network |
CN113486988A (en) * | 2021-08-04 | 2021-10-08 | 广东工业大学 | Point cloud completion device and method based on adaptive self-attention transformation network |
CN114663619A (en) * | 2022-02-24 | 2022-06-24 | 清华大学 | Three-dimensional point cloud object prediction method and device based on self-attention mechanism |
CN116051633A (en) * | 2022-12-15 | 2023-05-02 | 清华大学 | 3D point cloud target detection method and device based on weighted relation perception |
CN117671131A (en) * | 2023-10-20 | 2024-03-08 | 南京邮电大学 | Industrial part three-dimensional point cloud repairing method and device based on deep learning |
Application Events

- 2021-01-28: Application filed in China as CN202110116229.1A; published as CN112785526A and granted as CN112785526B (status: Active)
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190147245A1 (en) * | 2017-11-14 | 2019-05-16 | Nuro, Inc. | Three-dimensional object detection for autonomous robotic systems using image proposals |
CN109389671A (en) * | 2018-09-25 | 2019-02-26 | 南京大学 | A kind of single image three-dimensional rebuilding method based on multistage neural network |
EP3767521A1 (en) * | 2019-07-15 | 2021-01-20 | Promaton Holding B.V. | Object detection and instance segmentation of 3d point clouds based on deep learning |
CN112241997A (en) * | 2020-09-14 | 2021-01-19 | 西北大学 | Three-dimensional model establishing and repairing method and system based on multi-scale point cloud up-sampling |
CN112070054A (en) * | 2020-09-17 | 2020-12-11 | 福州大学 | Vehicle-mounted laser point cloud marking classification method based on graph structure and attention mechanism |
CN112257637A (en) * | 2020-10-30 | 2021-01-22 | 福州大学 | Vehicle-mounted laser point cloud multi-target identification method integrating point cloud and multiple views |
Non-Patent Citations (4)
Title |
---|
FENGGEN YU et al.: "PartNet: A Recursive Part Decomposition Network for Fine-Grained and Hierarchical Shape Segmentation", IEEE * |
卿都 et al.: "Research Progress on Neural-Network-Based 3D Point Cloud Generation Models", Robot Technology and Application, no. 06 * |
牛辰庚 et al.: "3D Object Recognition and Model Segmentation Method Based on Point Cloud Data", Journal of Graphics, no. 02 * |
贝子勒 et al.: "A Deep-Learning-Based Point Cloud Repair Model", Wireless Communication Technology, no. 02 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113298952A (en) * | 2021-06-11 | 2021-08-24 | 哈尔滨工程大学 | Incomplete point cloud classification network based on data expansion and similarity measurement |
CN113298952B (en) * | 2021-06-11 | 2022-07-15 | 哈尔滨工程大学 | Incomplete point cloud classification method based on data expansion and similarity measurement |
CN113379646A (en) * | 2021-07-07 | 2021-09-10 | 厦门大学 | Algorithm for performing dense point cloud completion by using generated countermeasure network |
CN113379646B (en) * | 2021-07-07 | 2022-06-21 | 厦门大学 | Algorithm for performing dense point cloud completion by using generated countermeasure network |
CN113486988A (en) * | 2021-08-04 | 2021-10-08 | 广东工业大学 | Point cloud completion device and method based on adaptive self-attention transformation network |
CN113486988B (en) * | 2021-08-04 | 2022-02-15 | 广东工业大学 | Point cloud completion device and method based on adaptive self-attention transformation network |
CN114663619A (en) * | 2022-02-24 | 2022-06-24 | 清华大学 | Three-dimensional point cloud object prediction method and device based on self-attention mechanism |
CN116051633A (en) * | 2022-12-15 | 2023-05-02 | 清华大学 | 3D point cloud target detection method and device based on weighted relation perception |
CN116051633B (en) * | 2022-12-15 | 2024-02-13 | 清华大学 | 3D point cloud target detection method and device based on weighted relation perception |
CN117671131A (en) * | 2023-10-20 | 2024-03-08 | 南京邮电大学 | Industrial part three-dimensional point cloud repairing method and device based on deep learning |
Also Published As
Publication number | Publication date |
---|---|
CN112785526B (en) | 2023-12-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Lee et al. | Context-aware synthesis and placement of object instances | |
CN112785526B (en) | Three-dimensional point cloud repairing method for graphic processing | |
CN109377448B (en) | Face image restoration method based on generation countermeasure network | |
CN111063021B (en) | Method and device for establishing three-dimensional reconstruction model of space moving target | |
Chen et al. | The face image super-resolution algorithm based on combined representation learning | |
US11328172B2 (en) | Method for fine-grained sketch-based scene image retrieval | |
CN110032925B (en) | Gesture image segmentation and recognition method based on improved capsule network and algorithm | |
Lee et al. | Deep architecture with cross guidance between single image and sparse lidar data for depth completion | |
CN112991350B (en) | RGB-T image semantic segmentation method based on modal difference reduction | |
CN111553869B (en) | Method for complementing generated confrontation network image under space-based view angle | |
CN111612008A (en) | Image segmentation method based on convolution network | |
CN115690522B (en) | Target detection method based on multi-pooling fusion channel attention and application thereof | |
CN111696196B (en) | Three-dimensional face model reconstruction method and device | |
CN109740539B (en) | 3D object identification method based on ultralimit learning machine and fusion convolution network | |
CN111768415A (en) | Image instance segmentation method without quantization pooling | |
CN112767478B (en) | Appearance guidance-based six-degree-of-freedom pose estimation method | |
Goncalves et al. | Deepdive: An end-to-end dehazing method using deep learning | |
CN112801945A (en) | Depth Gaussian mixture model skull registration method based on dual attention mechanism feature extraction | |
CN114782417A (en) | Real-time detection method for digital twin characteristics of fan based on edge enhanced image segmentation | |
CN112668662B (en) | Outdoor mountain forest environment target detection method based on improved YOLOv3 network | |
Yin et al. | [Retracted] Virtual Reconstruction Method of Regional 3D Image Based on Visual Transmission Effect | |
CN112149528A (en) | Panorama target detection method, system, medium and equipment | |
Dinh et al. | Feature engineering and deep learning for stereo matching under adverse driving conditions | |
Ogura et al. | Improving the visibility of nighttime images for pedestrian recognition using in-vehicle camera | |
CN112767539B (en) | Image three-dimensional reconstruction method and system based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||