CN114972695A

CN114972695A - Point cloud generation method and device, electronic equipment and storage medium

Info

Publication number: CN114972695A
Application number: CN202210555391.8A
Authority: CN
Inventors: 李革; 陈婧怡; 李宏; 高伟
Original assignee: Peking University Shenzhen Graduate School
Current assignee: Peking University Shenzhen Graduate School
Priority date: 2022-05-20
Filing date: 2022-05-20
Publication date: 2022-08-30
Anticipated expiration: 2042-05-20
Also published as: CN114972695B

Abstract

The application provides a point cloud generating method, a point cloud generating device, electronic equipment and a storage medium, wherein the method comprises the following steps: s1, acquiring point cloud data of the target type; s2, for each point cloud data, processing the point cloud data by using an encoder to obtain a mean vector and a variance vector corresponding to the point cloud data; s3, obtaining an implicit code vector corresponding to the point cloud data based on the mean vector, the variance vector and the Gaussian distribution vector; s4, inputting the first hidden code matrix into the forward process of the point cloud normalized stream to obtain a first matrix; s5, inputting each point cloud data and the first implicit code matrix into a reversible decoder, and carrying out the forward process of the target process to obtain a second matrix; s6, calculating a first loss value based on the first implicit code matrix and the first matrix, and calculating a second loss value based on the first gaussian distribution matrix and the second matrix. The method and the device can enable the point cloud generated by the reversible point cloud decoder after training is finished to have richer details.

Description

Point cloud generation method and device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of point cloud generation, and in particular, to a point cloud generation method and apparatus, an electronic device, and a storage medium.

Background

Point cloud generation aims at generating a point cloud from a specific distribution. This is a challenging task due to the disorder of the point cloud.

Common methods of point cloud generation include generating a countermeasure network, an autoregressive model, a variational autoencoder, and a normalized stream. However, these methods have inherent drawbacks, such as unstable generation of robust network training, sequential generation of autoregressive models, fuzzy point clouds generated by variational autocoders, and long time required for normalized stream training. Therefore, there have been some studies to combine a variational auto-encoder with a normalized stream to improve the quality of the generated point cloud. However, this part of the method rarely considers network design. Specifically, for the point cloud encoder part of the variational automatic encoder, a widely adopted structure is difficult to capture local information, and the defect can cause that hidden codes are not represented in high frequency, which is not beneficial to the generation of point cloud. Direct replacement with a complex point cloud encoder destroys the robustness of the generation and brings unacceptable computational costs.

Disclosure of Invention

In view of the above, an object of the present application is to provide a point cloud generating method, an apparatus, an electronic device and a storage medium, which can make the point cloud generated by the reversible point cloud decoder after training have richer details.

In a first aspect, an embodiment of the present application provides a point cloud generating method, where the method includes:

s101, acquiring point cloud data of m objects of a target type, wherein m is an integer larger than 1;

s102, for each newly acquired point cloud data, performing a target processing process on the point cloud data by using a point cloud encoder to obtain a mean vector corresponding to the point cloud data and a variance vector corresponding to the point cloud data, wherein the target processing process comprises the following steps: performing first convolution processing on the point cloud data to obtain an initial characteristic matrix of the point cloud data; based on the initial feature matrix, obtaining a first spatial feature matrix for representing the mutual position information of each point and each first neighbor point in the point cloud data in a feature space, and obtaining a second spatial feature matrix for representing the mutual position information of each point and each second neighbor point in the point cloud data in a Cartesian space, wherein for each point in the point cloud data, the first neighbor point of the point is k points in the point cloud data which are closer to the point in the feature space, the second neighbor node of the point is k points in the point cloud data which are closer to the point in the Cartesian space, and k is an integer greater than 1; performing first pooling processing on an integrated spatial feature matrix to obtain a first pooled feature matrix, and performing second pooling processing on the integrated spatial feature matrix to obtain a second pooled feature matrix, wherein the integrated spatial feature matrix is obtained by adding the first spatial feature matrix and the second spatial feature matrix, or the integrated spatial feature matrix is obtained by combining the first spatial feature matrix and the second spatial feature matrix; converting the first pooled feature matrix into the mean vector by a first fully-connected layer and converting the second pooled feature matrix into the variance vector by a second fully-connected layer;

s103, obtaining an implicit code vector corresponding to the point cloud data based on the mean vector, the variance vector and a Gaussian distribution vector formed by first target sampling points, wherein the first target sampling points are obtained by randomly sampling noise of which the probability density function accords with standard Gaussian distribution;

s104, a first hidden code matrix obtained by combining hidden code vectors corresponding to each point cloud data obtained latest is input into the forward process of a first point cloud normalization stream, and a first target matrix with the same dimension as the first hidden code matrix is obtained;

s105, inputting each newly acquired point cloud data and the first hidden code matrix into a reversible point cloud decoder, and performing a forward process of a target process to obtain a second target matrix, wherein the target process is the diffusion of a second point cloud normalized stream or a Markov chain;

s106, calculating to obtain a first loss value based on the first implicit code matrix and the first target matrix, and calculating to obtain a second loss value based on a first Gaussian distribution matrix and a second target matrix, wherein the first Gaussian distribution matrix and the second target matrix are the same in dimension as the second target matrix and are formed by first target three-dimensional sampling points, and the first target three-dimensional sampling points are obtained by randomly sampling three-dimensional noise of which the probability density function meets standard Gaussian distribution;

s107, if the latest first loss value is larger than a first preset loss value and/or the latest second loss value is larger than a second preset loss value, optimizing at least one of the first point cloud normalized stream, the reversible point cloud decoder and the point cloud encoder based on a gradient descent method, and repeating the steps S101-S106 until the latest first loss value is smaller than or equal to the first preset loss value and the latest second loss value is smaller than or equal to the second preset loss value, so that the current first point cloud normalized stream is used as the trained first point cloud normalized stream, and the current reversible point cloud decoder is used as the trained reversible point cloud decoder;

s108, inputting a second Gaussian distribution matrix which has the same dimension as the first target matrix and is formed by second target sampling points into the reverse process of the trained first point cloud normalized flow to obtain a second implicit code matrix which has the same dimension as the first target matrix, wherein the second target sampling points are obtained by randomly sampling noise of which the probability density function conforms to standard Gaussian distribution;

s109, inputting a third Gaussian distribution matrix which has the same dimension with the first Gaussian distribution matrix and is formed by second target three-dimensional sampling points and the second hidden code matrix into the reversible point cloud decoder after training is completed, and performing a reverse process of the target process to generate new point cloud of the target type, wherein the second target three-dimensional sampling points are obtained by randomly sampling three-dimensional noise of which the probability density function conforms to standard Gaussian distribution.

In a possible implementation, performing a first pooling process on the integrated spatial feature matrix to obtain a first pooled feature matrix includes:

rearranging the numerical values in each row in the comprehensive space characteristic matrix according to the sequence from small to large, or rearranging the numerical values in each row in the comprehensive space characteristic matrix according to the sequence from large to small to obtain a first candidate characteristic matrix;

performing second convolution processing on the first candidate feature matrix to obtain a second candidate feature matrix;

calculating to obtain a first selection probability of the numerical value of each position in the second candidate feature matrix based on a softmax function, and replacing the numerical value of the position in the second candidate feature matrix with the first selection probability to obtain a first weight matrix;

multiplying the transposed matrix of the first weight matrix with the first candidate feature matrix to obtain the first pooled feature matrix;

performing second pooling treatment on the comprehensive spatial feature matrix to obtain a second pooled feature matrix, which comprises:

rearranging the numerical values in each row in the comprehensive space characteristic matrix according to the sequence from small to large, or rearranging the numerical values in each row in the comprehensive space characteristic matrix according to the sequence from large to small to obtain a third candidate characteristic matrix;

performing third convolution processing on the third candidate feature matrix to obtain a fourth candidate feature matrix;

calculating a second selection probability of the numerical value of each position in the fourth candidate feature matrix based on a softmax function, and replacing the numerical value of the position in the fourth candidate feature matrix with the second selection probability to obtain a second weight matrix;

and multiplying the transposed matrix of the second weight matrix with the third candidate feature matrix to obtain the second pooled feature matrix.

In a possible embodiment, the first point cloud normalized stream includes n reversible residual coupled blocks, where n is an integer greater than 1, and for each of the reversible residual coupled blocks, the reversible residual coupled block includes a first target layer, a second target layer, and a third target layer, where the first target layer is a convolutional layer or a fully-connected layer, the second target layer is a convolutional layer or a fully-connected layer, and the third target layer is a convolutional layer or a fully-connected layer; inputting a first hidden code matrix obtained by combining hidden code vectors corresponding to each point cloud data obtained latest into a forward process of a first point cloud normalized stream to obtain a first target matrix with the same dimension as the first hidden code matrix, wherein the first hidden code matrix comprises:

splitting the first implicit code matrix into a first sub implicit code matrix and a second sub implicit code matrix which have the same dimension, taking the first sub implicit code matrix as a first input matrix of a first reversible residual coupling block, and taking the second sub implicit code matrix as a second input matrix of the first reversible residual coupling block;

calculating a first output matrix of an ith reversible residual coupling block and a second output matrix of the ith reversible residual coupling block through the following formula, and when i is smaller than n, taking the first output matrix of the ith reversible residual coupling block as a first input matrix of an (i + 1) th reversible residual coupling block, and taking the second output matrix of the ith reversible residual coupling block as a second input matrix of the (i + 1) th reversible residual coupling block, wherein the initial value of i is 1;

wherein,

is the first output matrix of the ith reversible residual coupling block,

is the second output matrix of the ith invertible residual coupling block,

is composed of

A first reference matrix obtained after passing through the first target layer in the ith reversible residual coupling block,

is composed of

A second reference matrix obtained after passing through a second target layer in the ith reversible residual coupling block,

is composed of

A third reference matrix obtained after passing through a third target layer in the ith reversible residual coupling block,

is the second input matrix of the ith reversible residual coupling block,

a first input matrix being an ith reversible residual coupling block;

calculating a first output matrix of an ith reversible residual coupling block and a second output matrix of the ith reversible residual coupling block by using the following formulas after the i +1 is returned, and when i is smaller than n, using the first output matrix of the ith reversible residual coupling block as a first input matrix of the (i + 1) th reversible residual coupling block, and using the second output matrix of the ith reversible residual coupling block as a second input matrix of the (i + 1) th reversible residual coupling block until i is equal to n;

and combining the first output matrix of the nth reversible residual coupling block and the second output matrix of the nth reversible residual coupling block to obtain the first target matrix.

In a possible embodiment, inputting a second gaussian distribution matrix composed of second target sampling points and having the same dimension as the first target matrix into the inverse process of the trained first point cloud normalized stream to obtain a second hidden code matrix having the same dimension as the first target matrix, includes:

splitting the second Gaussian distribution matrix into a first sub-Gaussian distribution matrix and a second sub-Gaussian distribution matrix with the same dimension, taking the first sub-Gaussian distribution matrix as a third output matrix of the nth reversible residual coupling block, and taking the second sub-Gaussian distribution matrix as a fourth output matrix of the nth reversible residual coupling block;

calculating to obtain a third input matrix of a jth reversible residual coupling block and a fourth input matrix of the jth reversible residual coupling block through the following formulas, and when j is larger than 1, taking the third input matrix of the jth reversible residual coupling block as a third output matrix of a jth-1 reversible residual coupling block, and taking the fourth input matrix of the jth reversible residual coupling block as a fourth output matrix of the jth-1 reversible residual coupling block, wherein the initial value of j is n;

wherein,

a third output matrix for the jth reversible residual coupling block,

is composed of

A fourth reference matrix obtained after passing through a third target layer in the jth reversible residual coupling block,

is the fourth output matrix of the jth reversible residual coupling block,

is the fourth input matrix of the jth invertible residual coupling block,

is composed of

A fifth reference matrix obtained after passing through a second target layer in the jth reversible residual coupling block,

is composed of

A sixth reference matrix obtained after passing through the first target layer in the jth reversible residual coupling block,

a third input matrix being a jth reversible residual coupling block;

calculating a third input matrix of a jth reversible residual coupling block and a fourth input matrix of the jth reversible residual coupling block according to the following formula in the step j-1 and returning to the step, and when j is larger than 1, taking the third input matrix of the jth reversible residual coupling block as a third output matrix of the jth reversible residual coupling block and taking the fourth input matrix of the jth reversible residual coupling block as a fourth output matrix of the jth reversible residual coupling block until j is equal to 1;

and combining the third input matrix of the first reversible residual coupling block and the fourth input matrix of the first reversible residual coupling block to obtain the second implicit code matrix.

In a second aspect, an embodiment of the present application further provides a point cloud generating apparatus, where the apparatus includes:

the acquisition module is used for acquiring point cloud data of m objects of a target type, wherein m is an integer larger than 1;

the first processing module is used for performing a target processing process on each newly acquired point cloud data by using a point cloud encoder to obtain a mean vector corresponding to the point cloud data and a variance vector corresponding to the point cloud data, wherein the target processing process comprises the following steps: performing first convolution processing on the point cloud data to obtain an initial characteristic matrix of the point cloud data; based on the initial feature matrix, obtaining a first spatial feature matrix for representing the mutual position information of each point and each first neighbor point in the point cloud data in a feature space, and obtaining a second spatial feature matrix for representing the mutual position information of each point and each second neighbor point in the point cloud data in a Cartesian space, wherein for each point in the point cloud data, the first neighbor point of the point is k points in the point cloud data which are closer to the point in the feature space, the second neighbor node of the point is k points in the point cloud data which are closer to the point in the Cartesian space, and k is an integer greater than 1; performing first pooling processing on an integrated spatial feature matrix to obtain a first pooled feature matrix, and performing second pooling processing on the integrated spatial feature matrix to obtain a second pooled feature matrix, wherein the integrated spatial feature matrix is obtained by adding the first spatial feature matrix and the second spatial feature matrix, or the integrated spatial feature matrix is obtained by combining the first spatial feature matrix and the second spatial feature matrix; converting the first pooled feature matrix into the mean vector by a first fully-connected layer and converting the second pooled feature matrix into the variance vector by a second fully-connected layer;

the second processing module is used for obtaining an implicit code vector corresponding to the point cloud data based on the mean vector, the variance vector and a Gaussian distribution vector formed by first target sampling points, wherein the first target sampling points are obtained by randomly sampling noise of which the probability density function meets standard Gaussian distribution;

the third processing module is used for inputting a first hidden code matrix obtained by combining hidden code vectors corresponding to each point cloud data which is obtained latest into the forward process of the first point cloud normalization stream to obtain a first target matrix with the same dimension as the first hidden code matrix;

the fourth processing module is used for inputting each newly acquired point cloud data and the first hidden code matrix into the reversible point cloud decoder, and performing a forward process of a target process to obtain a second target matrix, wherein the target process is the diffusion of a second point cloud normalized stream or a Markov chain;

the calculation module is used for calculating to obtain a first loss value based on the first hidden code matrix and the first target matrix, and calculating to obtain a second loss value based on a first Gaussian distribution matrix and a second target matrix, wherein the first Gaussian distribution matrix and the second target matrix are the same in dimension as the second target matrix and are formed by first target three-dimensional sampling points, and the first target three-dimensional sampling points are obtained by randomly sampling three-dimensional noise of which the probability density function conforms to standard Gaussian distribution;

an optimization module, configured to optimize at least one of the first point cloud normalized stream, the reversible point cloud decoder, and the point cloud encoder based on a gradient descent method if the latest first loss value is greater than a first preset loss value and/or the latest second loss value is greater than a second preset loss value, and re-submit the optimized first point cloud normalized stream, the reversible point cloud decoder, and the point cloud encoder to the acquisition module for processing until the latest first loss value is less than or equal to the first preset loss value and the latest second loss value is less than or equal to the second preset loss value, so as to use the current first point cloud normalized stream as the trained first point cloud normalized stream, and use the current reversible point cloud decoder as the trained reversible point cloud decoder;

a fifth processing module, configured to input a second gaussian distribution matrix having the same dimension as the first target matrix and formed by second target sampling points into a reverse process of the trained first point cloud normalized stream, so as to obtain a second hidden code matrix having the same dimension as the first target matrix, where the second target sampling points are obtained by randomly sampling noise whose probability density function meets standard gaussian distribution;

and the point cloud generating module is used for inputting a third Gaussian distribution matrix which has the same dimension as the first Gaussian distribution matrix and is formed by second target three-dimensional sampling points and the second implicit code matrix into the reversible point cloud decoder after training is finished, and performing the reverse process of the target process to generate new point cloud of the target type, wherein the second target three-dimensional sampling points are obtained by randomly sampling three-dimensional noise of which the probability density function conforms to standard Gaussian distribution.

In a possible embodiment, performing a first pooling process on the integrated spatial feature matrix to obtain a first pooled feature matrix includes:

calculating a first selection probability of the numerical value of each position in the second candidate feature matrix based on a softmax function, and replacing the numerical value of the position in the second candidate feature matrix with the first selection probability to obtain a first weight matrix;

In a possible embodiment, the first point cloud normalized stream includes n reversible residual coupled blocks, where n is an integer greater than 1, and for each of the reversible residual coupled blocks, the reversible residual coupled block includes a first target layer, a second target layer, and a third target layer, where the first target layer is a convolutional layer or a fully-connected layer, the second target layer is a convolutional layer or a fully-connected layer, and the third target layer is a convolutional layer or a fully-connected layer; the third processing module is specifically configured to:

wherein,

is the first output matrix of the ith reversible residual coupling block,

is the second output matrix of the ith reversible residual coupling block,

is composed of

is composed of

is composed of

Passing through a third target in the ith reversible residual coupling blockA third reference matrix obtained after the layer,

is the second input matrix of the ith invertible residual coupling block,

a first input matrix being an ith reversible residual coupling block;

In a possible implementation manner, the fifth processing module is specifically configured to:

calculating a third input matrix of a jth reversible residual coupling block and a fourth input matrix of the jth reversible residual coupling block through the following formula, and when j is greater than 1, taking the third input matrix of the jth reversible residual coupling block as a third output matrix of a jth-1 reversible residual coupling block, and taking the fourth input matrix of the jth reversible residual coupling block as a fourth output matrix of the jth-1 reversible residual coupling block, wherein the initial value of j is n;

wherein,

a third output matrix for the jth reversible residual coupling block,

is composed of

is the fourth output matrix of the jth reversible residual coupling block,

is the fourth input matrix of the jth reversible residual coupling block,

is composed of

is composed of

a third input matrix being a jth reversible residual coupling block;

In a third aspect, an embodiment of the present application further provides an electronic device, including: a processor, a storage medium and a bus, wherein the storage medium stores machine-readable instructions executable by the processor, when an electronic device runs, the processor and the storage medium communicate through the bus, and the processor executes the machine-readable instructions to execute the steps of the point cloud generating method according to any one of the first aspect.

In a fourth aspect, the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the point cloud generating method according to any one of the first aspect.

According to the point cloud generating method and device, the electronic equipment and the storage medium, the point cloud generated by the reversible point cloud decoder after training is enabled to have richer details.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.

Fig. 1 shows a flowchart of a point cloud generation method provided in an embodiment of the present application;

FIG. 2 is a schematic diagram illustrating a comparison of a generated point cloud provided by an embodiment of the present application;

fig. 3 is a schematic structural diagram of a point cloud generating apparatus provided in an embodiment of the present application;

fig. 4 shows a schematic structural diagram of an electronic device provided in an embodiment of the present application.

Detailed Description

In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it should be understood that the drawings in the present application are for illustrative and descriptive purposes only and are not used to limit the scope of protection of the present application. Additionally, it should be understood that the schematic drawings are not necessarily drawn to scale. The flowcharts used in this application illustrate operations implemented according to some embodiments of the present application. It should be understood that the operations of the flow diagrams may be performed out of order, and steps without logical context may be performed in reverse order or simultaneously. In addition, one skilled in the art, under the guidance of the present disclosure, may add one or more other operations to the flowchart, or may remove one or more operations from the flowchart.

In addition, the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.

It should be noted that in the embodiments of the present application, the term "comprising" is used to indicate the presence of the features stated hereinafter, but does not exclude the addition of further features.

For facilitating understanding of the present embodiment, a point cloud generating method, an apparatus, an electronic device, and a storage medium provided in the embodiments of the present application are described in detail.

Referring to fig. 1, a flowchart of a point cloud generating method provided in an embodiment of the present application is shown, where the method includes:

s101, point cloud data of m objects of a target type are obtained, wherein m is an integer larger than 1.

For example, the target class is an airplane class, a table class, a chair class, or the like, and then, when the target class is an airplane class, the m objects of the target class may be m same or different airplanes.

Illustratively, m may be 8, and when m is 8, it means that the subsequent training process is training with 8 point cloud data as a set.

S102, for each newly acquired point cloud data, performing a target processing process on the point cloud data by using a point cloud encoder to obtain a mean vector corresponding to the point cloud data and a variance vector corresponding to the point cloud data, wherein the target processing process comprises the following steps: performing first convolution processing on the point cloud data to obtain an initial characteristic matrix of the point cloud data; based on the initial feature matrix, obtaining a first spatial feature matrix for representing the mutual position information of each point and each first neighbor point in the point cloud data in a feature space, and obtaining a second spatial feature matrix for representing the mutual position information of each point and each second neighbor point in the point cloud data in a Cartesian space, wherein for each point in the point cloud data, the first neighbor point of the point is k points in the point cloud data which are closer to the point in the feature space, the second neighbor node of the point is k points in the point cloud data which are closer to the point in the Cartesian space, and k is an integer greater than 1; performing first pooling processing on an integrated spatial feature matrix to obtain a first pooled feature matrix, and performing second pooling processing on the integrated spatial feature matrix to obtain a second pooled feature matrix, wherein the integrated spatial feature matrix is obtained by adding the first spatial feature matrix and the second spatial feature matrix, or the integrated spatial feature matrix is obtained by combining the first spatial feature matrix and the second spatial feature matrix; converting the first pooled feature matrix into the mean vector through a first fully-connected layer, and converting the second pooled feature matrix into the variance vector through a second fully-connected layer.

Here, the first modification of the present application with respect to the prior art is that, in the prior art, the step of generating the first spatial feature matrix and the second spatial feature matrix according to the initial feature matrix is not included, but the initial feature matrix is directly subjected to pooling processing, and by means of the present application, the finally generated point cloud can have richer details.

Preferably, k may be 10, and when k is 10, the finally generated point cloud has a better balance between generation quality and calculation resources.

Illustratively, the pooling process may be a maximum pooling process or an average pooling process, or the like.

S103, obtaining an implicit code vector corresponding to the point cloud data based on the mean vector, the variance vector and a Gaussian distribution vector formed by first target sampling points, wherein the first target sampling points are obtained by randomly sampling noise of which the probability density function accords with standard Gaussian distribution.

This step is the process of reparameterization.

S104, a first hidden code matrix obtained by combining the hidden code vectors corresponding to each point cloud data obtained latest is input into the forward process of the first point cloud normalization stream, and a first target matrix with the same dimension as the first hidden code matrix is obtained.

For example, there are 3 hidden code vectors, each [ 12 ]3]、[4 5 6]And [ 789 ]]Then, the first implicit code matrix can be

And S105, inputting each newly acquired point cloud data and the first hidden code matrix into a reversible point cloud decoder, and performing a forward process of a target process to obtain a second target matrix, wherein the target process is the diffusion of a second point cloud normalized stream or a Markov chain.

S106, calculating to obtain a first loss value based on the first hidden code matrix and the first target matrix, and calculating to obtain a second loss value based on a first Gaussian distribution matrix and a second target matrix, wherein the first Gaussian distribution matrix and the second Gaussian distribution matrix are the same in dimension as the second target matrix and are formed by first target three-dimensional sampling points, and the first target three-dimensional sampling points are obtained by randomly sampling three-dimensional noise of which the probability density function conforms to standard Gaussian distribution.

Illustratively, the first loss value and the second loss value may be calculated based on a mean square error or the like.

S107, if the latest first loss value is larger than a first preset loss value and/or the latest second loss value is larger than a second preset loss value, optimizing at least one of the first point cloud normalized stream, the reversible point cloud decoder and the point cloud encoder based on a gradient descent method, and repeating the steps S101-S106 until the latest first loss value is smaller than or equal to the first preset loss value and the latest second loss value is smaller than or equal to the second preset loss value, so that the current first point cloud normalized stream is used as the trained first point cloud normalized stream, and the current reversible point cloud decoder is used as the trained reversible point cloud decoder.

And (4) training one group of point cloud data of each m point, and optimizing at least one of the point cloud encoder, the first point cloud normalization stream and the reversible point cloud decoder based on a gradient descent method until a loss value (namely a latest loss value) corresponding to the latest group of point cloud data is less than or equal to a preset loss value, so that the point cloud encoder after the training is finished and the reversible point cloud decoder after the training is finished are obtained.

And S108, inputting a second Gaussian distribution matrix which has the same dimension as the first target matrix and is formed by second target sampling points into the reverse process of the trained first point cloud normalized flow to obtain a second implicit code matrix which has the same dimension as the first target matrix, wherein the second target sampling points are obtained by randomly sampling noise with a probability density function conforming to standard Gaussian distribution.

Step S108 to step S109 are testing (using) processes after the training is completed, in the training process (i.e., step S101 to step S107), a forward process of the first point cloud normalization stream and a forward process of the target process are used, and in the testing process (i.e., the above two steps), a reverse process of the first point cloud normalization stream (after the training) and a reverse process of the target process (performed by the reversible point cloud decoder after the training) are used.

Referring to fig. 2, a schematic diagram of comparing the generated point cloud provided in the embodiment of the present application with the point cloud generated by the prior art (diffusion pm) is shown, and it is obvious that the point cloud generated by the method of the present application is far superior to the point cloud generated by the prior art (diffusion pm) in terms of both whole and details.

illustratively, if the first spatial signature matrix is

Then the first candidate feature matrix is

Or,

assume the second candidate feature matrix is [ 28 ]]Then, correspondingly, the first weight matrix is

Multiplying the transposed matrix of the first weight matrix by the first candidate feature matrix to obtain the first pooled feature matrix;

here, the second modification of the present application with respect to the prior art is that the present application proposes a completely new approach of adaptive weighted pooling compared to the prior art approaches of maximum pooling and average pooling.

the second pooling process is the same as the first pooling process in principle, and is not repeated herein.

wherein,

is the first output matrix of the ith reversible residual coupling block,

is the second output matrix of the ith reversible residual coupling block,

is composed of

is composed of

is composed of

is the second input matrix of the ith reversible residual coupling block,

a first input matrix being an ith reversible residual coupling block;

Here, the present application is a third modification of the present application with respect to the prior art, and specifically, the present application changes the coupling manner (modifies a specific formula) of the coupling layer (i.e., the inverse residual coupling block) in the first point cloud normalized stream.

Illustratively, it has been experimented that the coupling layer (i.e., the inverse residual coupling block) coupling effect is excellent when n is 14.

wherein,

a third output matrix for the jth invertible residual coupling block,

is composed of

is the fourth output matrix of the jth reversible residual coupling block,

is the fourth input matrix of the jth reversible residual coupling block,

is composed of

is composed of

a third input matrix being a jth reversible residual coupling block;

Here, the reverse process of the first point cloud normalized flow corresponding to the forward process of the first point cloud normalized flow related to the foregoing steps is used, and in the testing process, the reverse process of the trained first point cloud normalized flow is used.

According to the point cloud generating method provided by the embodiment of the application, the point cloud generated by the reversible point cloud decoder after training has richer details.

Based on the same inventive concept, the embodiment of the present application further provides a point cloud generating device corresponding to the point cloud generating method in the embodiment of the present application, and as the principle of solving the problem of the device in the embodiment of the present application is similar to the point cloud generating method in the embodiment of the present application, the implementation of the device may refer to the implementation of the method, and repeated details are not repeated.

Referring to fig. 3, a schematic structural diagram of a point cloud generating apparatus provided in an embodiment of the present application is shown, where the apparatus includes:

an obtaining module 301, configured to obtain point cloud data of m objects of a target category, where m is an integer greater than 1;

a first processing module 302, configured to perform a target processing procedure on each newly acquired point cloud data by using a point cloud encoder, so as to obtain a mean vector corresponding to the point cloud data and a variance vector corresponding to the point cloud data, where the target processing procedure includes: performing first convolution processing on the point cloud data to obtain an initial characteristic matrix of the point cloud data; based on the initial feature matrix, obtaining a first spatial feature matrix for representing the mutual position information of each point and each first neighbor point in the point cloud data in a feature space, and obtaining a second spatial feature matrix for representing the mutual position information of each point and each second neighbor point in the point cloud data in a Cartesian space, wherein for each point in the point cloud data, the first neighbor point of the point is k points in the point cloud data which are closer to the point in the feature space, the second neighbor node of the point is k points in the point cloud data which are closer to the point in the Cartesian space, and k is an integer greater than 1; performing first pooling processing on an integrated spatial feature matrix to obtain a first pooled feature matrix, and performing second pooling processing on the integrated spatial feature matrix to obtain a second pooled feature matrix, wherein the integrated spatial feature matrix is obtained by adding the first spatial feature matrix and the second spatial feature matrix, or the integrated spatial feature matrix is obtained by combining the first spatial feature matrix and the second spatial feature matrix; converting the first pooled feature matrix into the mean vector by a first fully-connected layer and converting the second pooled feature matrix into the variance vector by a second fully-connected layer;

a second processing module 303, configured to obtain an implicit code vector corresponding to the point cloud data based on the mean vector, the variance vector, and a gaussian distribution vector formed by first target sampling points, where the first target sampling points are obtained by randomly sampling noise whose probability density function meets a standard gaussian distribution;

a third processing module 304, configured to input a first hidden code matrix obtained by merging hidden code vectors corresponding to each point cloud data obtained newly into a forward process of a first point cloud normalization stream, so as to obtain a first target matrix having the same dimension as the first hidden code matrix;

a fourth processing module 305, configured to input each newly acquired point cloud data and the first hidden code matrix into a reversible point cloud decoder, and perform a forward process of a target process to obtain a second target matrix, where the target process is diffusion of a second point cloud normalized stream or a markov chain;

a calculating module 306, configured to calculate a first loss value based on the first implicit code matrix and the first target matrix, and calculate a second loss value based on a first gaussian distribution matrix and a second target matrix, which have the same dimension as the second target matrix and are formed by first target three-dimensional sampling points, where the first target three-dimensional sampling points are obtained by randomly sampling three-dimensional noise whose probability density function conforms to a standard gaussian distribution;

an optimizing module 307, configured to optimize at least one of the first point cloud normalized stream, the reversible point cloud decoder, and the point cloud encoder based on a gradient descent method if the latest first loss value is greater than a first preset loss value and/or the latest second loss value is greater than a second preset loss value, and re-submit the optimized first point cloud normalized stream, the reversible point cloud decoder, and the point cloud encoder to the obtaining module 301 for processing until the latest first loss value is less than or equal to the first preset loss value and the latest second loss value is less than or equal to the second preset loss value, so as to use the current first point cloud normalized stream as the trained first point cloud normalized stream, and use the current reversible point cloud decoder as the trained reversible decoder;

a fifth processing module 308, configured to input a second gaussian distribution matrix having the same dimension as the first target matrix and formed by second target sampling points into a reverse process of the trained first point cloud normalized stream, so as to obtain a second hidden code matrix having the same dimension as the first target matrix, where the second target sampling points are obtained by randomly sampling noise whose probability density function meets standard gaussian distribution;

the point cloud generating module 309 is configured to input a third gaussian distribution matrix having the same dimension as the first gaussian distribution matrix and formed by a second target three-dimensional sampling point, and the second implicit code matrix into the reversible point cloud decoder after training is completed, perform a reverse process of the target process, and generate a new point cloud of the target type, where the second target three-dimensional sampling point is obtained by randomly sampling three-dimensional noise whose probability density function conforms to standard gaussian distribution.

In a possible embodiment, the first point cloud normalized stream includes n reversible residual coupled blocks, where n is an integer greater than 1, and for each of the reversible residual coupled blocks, the reversible residual coupled block includes a first target layer, a second target layer, and a third target layer, where the first target layer is a convolutional layer or a fully-connected layer, the second target layer is a convolutional layer or a fully-connected layer, and the third target layer is a convolutional layer or a fully-connected layer; the third processing module 304 is specifically configured to:

splitting the first hidden code matrix into a first sub hidden code matrix and a second sub hidden code matrix which have the same dimension, taking the first sub hidden code matrix as a first input matrix of a first reversible residual coupling block, and taking the second sub hidden code matrix as a second input matrix of the first reversible residual coupling block;

wherein,

is the first output matrix of the ith reversible residual coupling block,

is the second output matrix of the ith reversible residual coupling block,

is composed of

is composed of

is composed of

is the second input matrix of the ith reversible residual coupling block,

a first input matrix being an ith reversible residual coupling block;

calculating to obtain a first output matrix of an ith reversible residual coupling block and a second output matrix of the ith reversible residual coupling block by using the following formulas after returning to the step i +1, and when i is less than n, taking the first output matrix of the ith reversible residual coupling block as a first input matrix of the (i + 1) th reversible residual coupling block, and taking the second output matrix of the ith reversible residual coupling block as a second input matrix of the (i + 1) th reversible residual coupling block until i is equal to n;

In a possible implementation manner, the fifth processing module 308 is specifically configured to:

wherein,

a third output matrix for the jth reversible residual coupling block,

is composed of

is the fourth output matrix of the jth reversible residual coupling block,

is the fourth input matrix of the jth reversible residual coupling block,

is composed of

is composed of

a third input matrix being a jth reversible residual coupling block;

The point cloud generating device provided by the embodiment of the application can enable the point cloud generated by the reversible point cloud decoder after training to have richer details.

Referring to fig. 4, an electronic device 400 provided in an embodiment of the present application includes: a processor 401, a memory 402 and a bus, the memory 402 storing machine-readable instructions executable by the processor 401, the processor 401 and the memory 402 communicating via the bus when the electronic device is running, the processor 401 executing the machine-readable instructions to perform the steps of the method as described above for point cloud generation.

Specifically, the memory 402 and the processor 401 can be general-purpose memory and processor, which are not limited in particular, and the processor 401 can execute the method of point cloud generation when executing the computer program stored in the memory 402.

Corresponding to the above method for generating point cloud, the present application further provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the computer program performs the steps of the above method for generating point cloud.

It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working process of the system and the apparatus described above may refer to the corresponding process in the method embodiment, and is not described in detail in this application. In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and there may be other divisions in actual implementation, and for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed coupling or direct coupling or communication connection between each other may be through some communication interfaces, indirect coupling or communication connection between devices or modules, and may be in an electrical, mechanical or other form.

The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The functions, if implemented in software functional units and sold or used as a stand-alone product, may be stored in a non-transitory computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.

The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A method of point cloud generation, the method comprising:

s102, for each newly acquired point cloud data, performing a target processing process on the point cloud data by using a point cloud encoder to obtain a mean vector corresponding to the point cloud data and a variance vector corresponding to the point cloud data, wherein the target processing process comprises the following steps: performing first convolution processing on the point cloud data to obtain an initial characteristic matrix of the point cloud data; based on the initial feature matrix, obtaining a first space feature matrix for representing the mutual position information of each point and each first neighbor point in the point cloud data in a feature space, and obtaining a second space feature matrix for representing the mutual position information of each point and each second neighbor point in the point cloud data in a Cartesian space, wherein for each point in the point cloud data, the first neighbor point of the point is k points, which are closer to the point in the feature space, in the point cloud data, the second neighbor node of the point is k points, which are closer to the point in the Cartesian space, in the point cloud data, and k is an integer greater than 1; performing first pooling processing on an integrated spatial feature matrix to obtain a first pooled feature matrix, and performing second pooling processing on the integrated spatial feature matrix to obtain a second pooled feature matrix, wherein the integrated spatial feature matrix is obtained by adding the first spatial feature matrix and the second spatial feature matrix, or the integrated spatial feature matrix is obtained by combining the first spatial feature matrix and the second spatial feature matrix; converting the first pooled feature matrix into the mean vector by a first fully-connected layer and converting the second pooled feature matrix into the variance vector by a second fully-connected layer;

s106, calculating to obtain a first loss value based on the first hidden code matrix and the first target matrix, and calculating to obtain a second loss value based on a first Gaussian distribution matrix and a second target matrix, wherein the first Gaussian distribution matrix and the second target matrix are the same in dimension as the second target matrix and are formed by first target three-dimensional sampling points, and the first target three-dimensional sampling points are obtained by randomly sampling three-dimensional noise of which the probability density function conforms to standard Gaussian distribution;

2. The point cloud generation method of claim 1, wherein performing a first pooling process on the integrated spatial feature matrix to obtain a first pooled feature matrix comprises:

3. The point cloud generation method of claim 1, wherein the first point cloud normalized stream includes n reversible residual coupled blocks, where n is an integer greater than 1, and for each of the reversible residual coupled blocks, the reversible residual coupled block includes a first target layer, a second target layer, and a third target layer, the first target layer is a convolutional layer or a fully-connected layer, the second target layer is a convolutional layer or a fully-connected layer, and the third target layer is a convolutional layer or a fully-connected layer; inputting a first hidden code matrix obtained by combining hidden code vectors corresponding to each point cloud data obtained latest into a forward process of a first point cloud normalized stream to obtain a first target matrix with the same dimension as the first hidden code matrix, wherein the first hidden code matrix comprises:

wherein,

is the first output matrix of the ith reversible residual coupling block,

is the second output matrix of the ith reversible residual coupling block,

is composed of

is composed of

is composed of

is the second input matrix of the ith reversible residual coupling block,

a first input matrix being an ith reversible residual coupling block;

4. The method of claim 3, wherein inputting a second Gaussian distribution matrix having the same dimension as the first target matrix and comprising second target sample points into the inverse process of the trained first point cloud normalization stream to obtain a second implicit code matrix having the same dimension as the first target matrix comprises:

wherein,

a third output matrix for the jth reversible residual coupling block,

is composed of

is the fourth output matrix of the jth reversible residual coupling block,

is the fourth input matrix of the jth invertible residual coupling block,

is composed of

is composed of

A sixth reference matrix obtained after the first target layer in the jth invertible residual coupling block,

a third input matrix being a jth reversible residual coupling block;

5. A point cloud generating apparatus, characterized in that the apparatus comprises:

the first processing module is used for performing a target processing process on each newly acquired point cloud data by using a point cloud encoder to obtain a mean vector corresponding to the point cloud data and a variance vector corresponding to the point cloud data, wherein the target processing process comprises the following steps: performing first convolution processing on the point cloud data to obtain an initial characteristic matrix of the point cloud data; based on the initial feature matrix, obtaining a first space feature matrix for representing the mutual position information of each point and each first neighbor point in the point cloud data in a feature space, and obtaining a second space feature matrix for representing the mutual position information of each point and each second neighbor point in the point cloud data in a Cartesian space, wherein for each point in the point cloud data, the first neighbor point of the point is k points, which are closer to the point in the feature space, in the point cloud data, the second neighbor node of the point is k points, which are closer to the point in the Cartesian space, in the point cloud data, and k is an integer greater than 1; performing first pooling processing on an integrated spatial feature matrix to obtain a first pooled feature matrix, and performing second pooling processing on the integrated spatial feature matrix to obtain a second pooled feature matrix, wherein the integrated spatial feature matrix is obtained by adding the first spatial feature matrix and the second spatial feature matrix, or the integrated spatial feature matrix is obtained by combining the first spatial feature matrix and the second spatial feature matrix; converting the first pooled feature matrix into the mean vector by a first fully-connected layer and converting the second pooled feature matrix into the variance vector by a second fully-connected layer;

the calculating module is used for calculating to obtain a first loss value based on the first implicit code matrix and the first target matrix, and calculating to obtain a second loss value based on a first Gaussian distribution matrix and a second target matrix, wherein the first Gaussian distribution matrix and the second target matrix are the same in dimension as the second target matrix and are formed by first target three-dimensional sampling points, and the first target three-dimensional sampling points are obtained by randomly sampling three-dimensional noise of which the probability density function accords with standard Gaussian distribution;

a fifth processing module, configured to input a second gaussian distribution matrix having the same dimension as the first target matrix and formed by second target sampling points into a reverse process of the trained first point cloud normalized stream, so as to obtain a second implicit code matrix having the same dimension as the first target matrix, where the second target sampling points are obtained by randomly sampling noise whose probability density function meets standard gaussian distribution;

6. The point cloud generating device of claim 5, wherein the first pooling process is performed on the integrated spatial feature matrix to obtain a first pooled feature matrix, comprising:

7. The point cloud generation device according to claim 5, wherein the first point cloud normalized stream includes n reversible residual coupled blocks, where n is an integer greater than 1, and for each of the reversible residual coupled blocks, the reversible residual coupled block includes a first target layer, a second target layer, and a third target layer, the first target layer is a convolutional layer or a fully-connected layer, the second target layer is a convolutional layer or a fully-connected layer, and the third target layer is a convolutional layer or a fully-connected layer; the third processing module is specifically configured to:

wherein,

is the first output matrix of the ith reversible residual coupling block,

is the second output matrix of the ith reversible residual coupling block,

is composed of

is composed of

is composed of

is the second input matrix of the ith reversible residual coupling block,

a first input matrix being an ith reversible residual coupling block;

8. The point cloud generation apparatus of claim 7, wherein the fifth processing module is specifically configured to:

wherein,

a third output matrix for the jth reversible residual coupling block,

is composed of

is the fourth output matrix of the jth reversible residual coupling block,

is the fourth input matrix of the jth reversible residual coupling block,

is composed of

is composed of

a third input matrix being a jth reversible residual coupling block;

9. An electronic device, comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating over the bus when the electronic device is run, the processor executing the machine-readable instructions to perform the steps of the point cloud generation method of any of claims 1 to 4.

10. A computer-readable storage medium, characterized in that a computer program is stored thereon which, when being executed by a processor, carries out the steps of the point cloud generation method of any one of claims 1 to 4.