CN115330938B - Method for generating three-dimensional point cloud based on sketch of projection density map sampling - Google Patents


Info

Publication number
CN115330938B
CN115330938B (application number CN202210938411.XA)
Authority
CN
China
Prior art keywords
point cloud
dimensional point
sampling
dimensional
projection density
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210938411.XA
Other languages
Chinese (zh)
Other versions
CN115330938A (en)
Inventor
于茜
高宸健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN202210938411.XA
Publication of CN115330938A
Application granted
Publication of CN115330938B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation


Abstract

The invention discloses a method for generating a three-dimensional point cloud from a sketch based on projection density map sampling, comprising the following steps: a sketch conversion module extracts shape features from the hand-drawn sketch and outputs a feature map with a given number of channels; the feature map is input into a point cloud generation module, which predicts a projection density map, samples a two-dimensional point cloud from it, and infers the depth of each two-dimensional point; the (x, y) coordinates obtained from the projection density map sampling are combined with the z coordinates obtained from the depth sampling to generate the overall three-dimensional point cloud. The method does not require hand-drawn sketches from multiple viewing angles and can generate a three-dimensional point cloud with relatively complete details from a single sketch; compared with prior-art three-dimensional point cloud generation, it expresses more detailed shape information and produces visibly clearer results.

Description

Method for generating three-dimensional point cloud based on sketch of projection density map sampling
Technical Field
The invention relates to the technical field of sketch-based three-dimensional generation, and in particular to a method for generating a three-dimensional point cloud from a sketch based on projection density map sampling.
Background
In recent years, three-dimensional reconstruction from sketches has made great progress. However, most methods rely on hand-drawn sketches from multiple viewing angles, which are often difficult to obtain in practical application scenarios.
In addition, existing methods typically use a single feature vector as the intermediary between the two modalities of hand-drawn sketch and three-dimensional model, so the detailed characteristics of the sketch cannot be fully reflected in the three-dimensional model.
Two problems therefore arise:
1) A hand-drawn sketch is a very sparse and abstract shape representation, and sketches from multiple viewing angles are often unavailable, which makes three-dimensional reconstruction difficult.
2) Existing methods that simply use a feature vector as the intermediary between the hand-drawn sketch and the three-dimensional model can only generate a rough three-dimensional shape and lack the ability to recover fine-grained details.
Disclosure of Invention
The invention aims to provide a method for generating a three-dimensional point cloud from a sketch based on projection density map sampling, which at least partially solves the above technical problems.
To achieve the above purpose, the invention adopts the following technical solution:
the invention provides a method for generating a three-dimensional point cloud from a sketch based on projection density map sampling, comprising the following steps:
S10, extracting shape features from a hand-drawn sketch using a sketch conversion module, and outputting a feature map with a given number of channels;
S20, inputting the feature map into a point cloud generation module, predicting a projection density map, sampling a two-dimensional point cloud from the projection density map, and inferring the depth of each two-dimensional point;
S30, combining the (x, y) coordinates from the projection density map sampling and the z coordinates from the depth sampling to generate the overall three-dimensional point cloud.
Further, the sketch conversion module in step S10 adopts an encoder-decoder CNN network structure;
the CNN network structure comprises an encoder network, a residual block network and a decoder network connected in sequence; the encoder network comprises 4 downsampling blocks and is used for extracting high-level abstract information from the input sketch as the input of the residual block network; the residual block network comprises 9 residual blocks and is used for supplementing and refining the sparse shape features;
the decoder network contains 4 upsampling blocks for gradually inferring three-dimensional shape information at increasing spatial resolution.
Further, each downsampling block consists of a convolution layer, a normalization layer and a ReLU function connected in sequence; the stride of the convolution layer in the downsampling block is 2;
each residual block consists of a convolution layer, a normalization layer, a ReLU function, a convolution layer and a normalization layer connected in sequence; the stride of the convolution layers in the residual block is 1;
each upsampling block consists of a transposed convolution layer, a normalization layer and a ReLU function connected in sequence; the stride of the transposed convolution layer in the upsampling block is 2.
Further, the point cloud generation module comprises a projection density map prediction sub-module and a depth sampling sub-module;
the projection density map prediction sub-module consists of 3 convolution layers alternating with 3 ReLU functions; the 3rd ReLU function is followed by a normalization layer; the strides of the 1st and 3rd convolution layers are 1, and the stride of the 2nd convolution layer is 3; the final normalization layer ensures that the values at all positions of the projection density map sum to 1;
the depth sampling sub-module first raises the dimension of random noise with three residual MLPs, concatenates the raised random vector with the feature vector sampled at the given position of the feature map along the channel dimension, and then obtains the depth through three residual MLP layers and one fully connected layer.
Further, step S20 includes:
S201, taking the feature map as input, and predicting the joint distribution P(X, Y | I) of the two-dimensional coordinates with the projection density map prediction sub-module, where X and Y are the random variables corresponding to the x and y axes; sampling from P(X, Y | I) produces a two-dimensional point cloud;
S202, predicting a depth distribution p(Z_i | x_i, y_i, I) for each point in the two-dimensional point cloud, where (x_i, y_i) are the two-dimensional coordinates of the i-th point; the depth of each two-dimensional point is obtained by sampling from p(Z_i | x_i, y_i, I).
Further, step S201 includes:
1) Inputting the feature map F into the projection density map prediction sub-module to obtain the output projection density map M̂;
2) Using the two-dimensional single-channel projection density map M̂ to define a two-dimensional multinomial distribution P(X, Y | I), whose probability mass function is P(X = x_i, Y = y_i | I) = M̂(x_i, y_i);
3) Sampling N samples from the two-dimensional multinomial distribution P(X, Y | I) to form a two-dimensional point cloud Ŝ_2D, which contains N elements, each representing a two-dimensional point.
Further, step S202 includes:
a) Initializing the set Ŝ as an empty set;
b) Taking a two-dimensional point (x_i, y_i) from the two-dimensional point cloud Ŝ_2D and removing it from Ŝ_2D;
c) Sampling the feature map F at position (x_i, y_i) to obtain a C-channel feature vector, where (x_i, y_i) are the two-dimensional coordinates of the i-th point;
d) Sampling a scalar random noise from a uniform distribution;
e) Inputting the feature vector and the scalar random noise into the depth sampling sub-module to obtain the depth z_i;
f) Adding (x_i, y_i, z_i) to Ŝ;
g) Repeating steps b) - f); since Ŝ_2D initially contains N elements, Ŝ finally contains N elements.
Compared with the prior art, the invention has the following beneficial effects:
The method for generating a three-dimensional point cloud from a sketch based on projection density map sampling comprises: extracting shape features from the hand-drawn sketch with the sketch conversion module and outputting a feature map with a given number of channels; inputting the feature map into the point cloud generation module, predicting a projection density map, sampling a two-dimensional point cloud from it, and inferring the depth of each two-dimensional point; and combining the (x, y) coordinates from the projection density map sampling with the z coordinates from the depth sampling to generate the overall three-dimensional point cloud. The method does not require hand-drawn sketches from multiple viewing angles and can generate a three-dimensional point cloud with relatively complete details from a single sketch; compared with prior-art three-dimensional point cloud generation, it expresses more detailed shape information and produces visibly clearer results.
Drawings
FIG. 1 is a flow chart of a method of generating a three-dimensional point cloud based on a sketch of projection density map sampling of the present invention.
FIG. 2 is a schematic block diagram of the present invention relating to generating a three-dimensional point cloud based on a sketch of projection density map sampling.
Fig. 3 is a block diagram of a sketch conversion module according to the present invention.
Fig. 4 is a block diagram of a projection density map prediction sub-module of the present invention.
Fig. 5 is a block diagram of a depth sampling sub-module of the present invention.
Fig. 6 is a projection density map representation of the present invention.
FIG. 7 shows visualization results of the comparison on the Synthetic-LineDrawing dataset.
FIG. 8 shows visualization results of the comparison on the ShapeNet-Sketch, AmateurSketch and ProSketch-3DChair datasets.
Detailed Description
The invention is further described in connection with the following detailed description, in order to make the technical means, the creation characteristics, the achievement of the purpose and the effect of the invention easy to understand.
In the description of the present invention, it should be noted that the directions or positional relationships indicated by the terms "upper", "lower", "inner", "outer", "front", "rear", "both ends", "one end", "the other end", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific direction, be configured and operated in the specific direction, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless explicitly specified and limited otherwise, the terms "mounted," "provided," "connected," and the like are to be construed broadly, and may be fixedly connected, detachably connected, or integrally connected, for example; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.
Referring to fig. 1, the method for generating a three-dimensional point cloud from a sketch based on projection density map sampling provided by the invention comprises the following steps:
S10, extracting shape features from a hand-drawn sketch using a sketch conversion module, and outputting a feature map with a given number of channels;
S20, inputting the feature map into a point cloud generation module, predicting a projection density map, sampling a two-dimensional point cloud from the projection density map, and inferring the depth of each two-dimensional point;
S30, combining the (x, y) coordinates from the projection density map sampling and the z coordinates from the depth sampling to generate the overall three-dimensional point cloud.
In the present embodiment, a feature map F (a two-dimensional C-channel signal) is extracted from the hand-drawn sketch I, and a three-dimensional point cloud Ŝ is generated based on the feature map F. The input hand-drawn sketch I is an H_I × W_I tensor, where H_I and W_I are the height and width of the sketch; the value of each pixel lies in the range 0 to 1, where 0 indicates that the pixel belongs to the background region and 1 indicates that the pixel belongs to a hand-drawn stroke. The feature map F is an H_F × W_F × C tensor, where H_F and W_F are the height and width of the feature map, and C is the number of channels. The three-dimensional point cloud Ŝ is a set of N elements, each element being a three-dimensional point, and N is the number of points in the point cloud.
The modules involved in the implementation of the invention are shown in fig. 2. The sketch conversion module and the point cloud generation module are both learnable modules; the point cloud generation module comprises a projection density map prediction sub-module and a depth sampling sub-module. The hand-drawn sketch is the input of the sketch conversion module, and the feature map is its output; the feature map is then the input of the point cloud generation module, in which the projection density map prediction sub-module produces the projection density map and the depth sampling sub-module produces the depth information, and their outputs jointly generate the three-dimensional point cloud.
1. A sketch conversion module:
the CNN network architecture of the encoder-decoder is employed, the overall architecture is shown in fig. 3. The encoder network contains 4 downsampled blocks in order to increase the receptive field of neurons to extract the high level abstract information of the input sketch. The decoder network contains 4 up-sampling blocks to gradually infer the spatial resolution-enhanced three-dimensional shape information. The residual block network contains 9 residual blocks for supplementing the sparse shape features.
After sketch conversion, the response of the feature map F is much denser than the input sketch, while the spatial structure and resolution remain approximately unchanged.
The downsampling block consists of a convolution layer, a normalization layer and a ReLU function connected in sequence; the stride of the convolution layer in the downsampling block is 2. The residual block consists of a convolution layer, a normalization layer, a ReLU function, a convolution layer and a normalization layer connected in sequence; the stride of the convolution layers in the residual block is 1. The upsampling block consists of a transposed convolution layer, a normalization layer and a ReLU function connected in sequence; the stride of the transposed convolution layer in the upsampling block is 2.
The feature map F extracted by the encoder-decoder is a two-dimensional C-channel signal defined in the spatial domain; different spatial positions have different responses, so shape details that exist only at local positions can be expressed, which enables the generation of a three-dimensional point cloud with relatively complete details.
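The block structure above fixes the block counts and strides but not the kernel sizes, channel widths or normalization type. The following PyTorch sketch is therefore only an illustrative reconstruction of the sketch conversion module under those assumptions; the class and argument names are not taken from the patent.

```python
# Illustrative sketch of the encoder-decoder sketch conversion module.
# Kernel sizes, channel widths and InstanceNorm are assumptions; only the
# block counts (4 down, 9 residual, 4 up) and strides follow the text above.
import torch
import torch.nn as nn

def down_block(c_in, c_out):
    # downsampling block: conv (stride 2) -> normalization -> ReLU
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                         nn.InstanceNorm2d(c_out), nn.ReLU(inplace=True))

class ResBlock(nn.Module):
    # residual block: conv -> norm -> ReLU -> conv -> norm, stride 1, skip connection
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 3, stride=1, padding=1), nn.InstanceNorm2d(c),
            nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, stride=1, padding=1), nn.InstanceNorm2d(c))
    def forward(self, x):
        return x + self.body(x)

def up_block(c_in, c_out):
    # upsampling block: transposed conv (stride 2) -> normalization -> ReLU
    return nn.Sequential(
        nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1),
        nn.InstanceNorm2d(c_out), nn.ReLU(inplace=True))

class SketchTranslator(nn.Module):
    """Maps a 1-channel sketch (B, 1, H_I, W_I) to a C-channel feature map F."""
    def __init__(self, base=64, feat_channels=32):
        super().__init__()
        chans = [1, base, base * 2, base * 4, base * 8]
        self.encoder = nn.Sequential(*[down_block(chans[i], chans[i + 1]) for i in range(4)])
        self.residual = nn.Sequential(*[ResBlock(chans[-1]) for _ in range(9)])
        self.decoder = nn.Sequential(*[up_block(chans[4 - i], chans[3 - i]) for i in range(3)],
                                     up_block(chans[1], feat_channels))
    def forward(self, sketch):
        return self.decoder(self.residual(self.encoder(sketch)))
```

With a 256 × 256 input, this produces a 256 × 256 × 32 feature map under the assumed hyper-parameters, consistent with the statement above that the spatial structure and resolution remain approximately unchanged.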
2. And the point cloud generation module is used for:
the point cloud generation module comprises a projection density map prediction sub-module and a depth sampling sub-module;
1) As shown in fig. 4, the projection density map prediction sub-module consists of 3 convolution layers alternating with 3 ReLU functions; the 3rd ReLU function is followed by a normalization layer; the strides of the 1st and 3rd convolution layers are 1, and the stride of the 2nd convolution layer is 3; the final normalization layer ensures that the values at all positions of the projection density map sum to 1.
2) As shown in fig. 5, the depth sampling sub-module first raises the dimension of random noise with three residual MLPs, concatenates the raised random vector with the feature vector sampled at the given position of the feature map along the channel dimension, and then obtains the depth information through three residual MLP layers and one fully connected layer. (An illustrative code sketch of both sub-modules is given below.)
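As a hedged illustration only, the following sketch shows one possible PyTorch realization of the two sub-modules. The text above fixes the layer counts, the strides of the three convolutions, the normalization of the density map to a sum of 1, and the noise/feature concatenation; the kernel sizes, hidden widths, the initial linear layers and the exact residual-MLP layout below are assumptions.

```python
import torch
import torch.nn as nn

class DensityMapPredictor(nn.Module):
    """3 conv layers alternating with 3 ReLUs; strides 1, 3, 1; output normalized to sum to 1."""
    def __init__(self, c_in=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(c_in, 64, 3, stride=1, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, stride=3, padding=0), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 3, stride=1, padding=1), nn.ReLU(inplace=True))
    def forward(self, feat):
        m = self.net(feat)                                  # (B, 1, h, w), non-negative
        total = m.flatten(1).sum(dim=1).view(-1, 1, 1, 1)   # per-map sum
        return m / total.clamp_min(1e-8)                    # values at all positions sum to 1

class ResMLP(nn.Module):
    # residual MLP layer: linear -> ReLU with a skip connection (assumed layout)
    def __init__(self, dim):
        super().__init__()
        self.fc = nn.Linear(dim, dim)
    def forward(self, x):
        return x + torch.relu(self.fc(x))

class DepthSampler(nn.Module):
    """Lifts scalar noise with 3 residual MLPs, concatenates it with the sampled
    C-channel feature vector along the channel dimension, then 3 residual MLP
    layers + 1 fully connected layer produce the depth."""
    def __init__(self, c_feat=32, noise_dim=16):
        super().__init__()
        self.lift = nn.Sequential(nn.Linear(1, noise_dim), ResMLP(noise_dim),
                                  ResMLP(noise_dim), ResMLP(noise_dim))
        self.head = nn.Sequential(nn.Linear(c_feat + noise_dim, 128),
                                  ResMLP(128), ResMLP(128), ResMLP(128),
                                  nn.Linear(128, 1))
    def forward(self, feat_vec, noise):
        z = torch.cat([feat_vec, self.lift(noise)], dim=-1)  # concat in channel dimension
        return self.head(z)                                  # (B, 1) predicted depth
```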
Based on the above description of the whole frame module, the following details of S10 to S30 will be described respectively:
the purpose of the above step S10 is to extract valid shape features from the combination of abstract and sparse hand-drawn strokes, which finally take the feature map as output form. The step is that the hand drawing I is input into a sketch conversion module to obtain a result characteristic diagram F.
The purpose of the step S20 is to generate a three-dimensional point cloud from the sketched transformed feature map. The step is to input a feature map F into a point cloud generation module to obtain a result three-dimensional point cloud
Figure GDA0004232968030000071
The point cloud generation module is the core module of this step. The point cloud generation module comprises two sub-modules: the projection density map prediction sub-module and the depth sampling sub-module are respectively a core module for predicting two-dimensional point clouds and deducing the depth of each two-dimensional point for two sub-steps of the point cloud generation process. The two sub-steps are described as follows:
s201, predicting a two-dimensional point cloud.
This sub-step predicts the joint distribution P(X, Y | I) of the two-dimensional coordinates using the projection density map prediction sub-module, where X and Y are the random variables corresponding to the x and y axes. Sampling from P(X, Y | I) produces a two-dimensional point cloud. Specifically (an illustrative code sketch of the sampling follows this list):
1) Input the feature map F into the projection density map prediction sub-module to obtain the output projection density map M̂;
2) Use the two-dimensional single-channel projection density map M̂ to define a two-dimensional multinomial distribution P(X, Y | I), whose probability mass function (PMF) is P(X = x_i, Y = y_i | I) = M̂(x_i, y_i);
3) Sample N samples from the two-dimensional multinomial distribution P(X, Y | I) to form the two-dimensional point cloud Ŝ_2D, which contains N elements, each element being a two-dimensional point.
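A minimal sketch of sub-step S201's sampling, assuming the predicted density map is an (h, w) tensor that already sums to 1; torch.multinomial is used here as one possible way to draw from the two-dimensional multinomial distribution, and the coordinate convention (row = y, column = x) is an assumption.

```python
import torch

def sample_2d_points(density_map: torch.Tensor, n_points: int) -> torch.Tensor:
    """Draw n_points (x, y) samples from an (h, w) projection density map whose entries sum to 1."""
    h, w = density_map.shape
    probs = density_map.reshape(-1)                              # flatten the PMF over all pixels
    idx = torch.multinomial(probs, n_points, replacement=True)   # sample pixel indices
    ys = idx.div(w, rounding_mode='floor')                       # recover row (y) coordinates
    xs = idx % w                                                 # recover column (x) coordinates
    return torch.stack([xs, ys], dim=1).float()                  # (n_points, 2) 2D point cloud
```

Sampling, for example, N = 2048 points in this way yields the two-dimensional point cloud Ŝ_2D used in S202; the value 2048 is only an example and is not specified in the text.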
In step 1), the resulting projection density map is shown in fig. 6, where blue represents a low response value and red represents a high response value. The projection density map M serves as the core intermediate variable; M is a two-dimensional single-channel signal defined on the projection of the three-dimensional model under a specific viewing angle.
As shown in fig. 6, when a three-dimensional point cloud is projected onto a two-dimensional plane at a given viewing angle, several three-dimensional points may be projected onto the same position of the plane, so the density of the projected points differs from position to position. In other words, one projected point corresponds to a variable number of three-dimensional points, and this number measures the local density of the projected points. The invention therefore introduces the projection density map M to record the density of the projected points at every spatial position.
The process of obtaining the projection density map from a three-dimensional model can be expressed as follows (a reference code sketch follows this list):
(a) Take a pixel (x_i, y_i) from M, where (x_i, y_i) are the two-dimensional coordinates of the i-th pixel;
(b) Search the three-dimensional model for points whose x, y coordinates are (x_i, y_i) and record their number n_i;
(c) Let M(x_i, y_i) = n_i;
(d) Repeat (a) - (c) until all pixels in M have been traversed;
(e) Normalize M so that its values sum to 1.
For better performance, processes (a) - (c) may be executed in parallel on a graphics processor.
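As an illustration of processes (a)-(e), the sketch below builds a projection density map from a projected point cloud by counting, for each pixel, the three-dimensional points whose projected (x, y) coordinates fall into it, and then normalizing. Quantizing real-valued coordinates onto a pixel grid and the map resolution are assumptions not fixed by the text.

```python
import numpy as np

def projection_density_map(points_xy: np.ndarray, resolution: int = 64) -> np.ndarray:
    """points_xy: (N, 2) projected x, y coordinates, assumed already normalized to [0, 1)."""
    # (a)-(c): count the number of 3D points projected into each pixel, vectorized over all pixels
    cols = np.clip((points_xy[:, 0] * resolution).astype(int), 0, resolution - 1)
    rows = np.clip((points_xy[:, 1] * resolution).astype(int), 0, resolution - 1)
    density = np.zeros((resolution, resolution), dtype=np.float64)
    np.add.at(density, (rows, cols), 1.0)   # n_i for every pixel at once
    # (e): normalize so that the values of the map sum to 1
    return density / density.sum()
```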
S202, inferring the depth of each two-dimensional point. This sub-step predicts a depth distribution p(Z_i | x_i, y_i, I) for each point in the two-dimensional point cloud, where (x_i, y_i) are the two-dimensional coordinates of the i-th point. Sampling from p(Z_i | x_i, y_i, I) yields the depth of each point. Specifically (see the code sketch after this list):
a) Initialize the set Ŝ as an empty set;
b) Take a two-dimensional point (x_i, y_i) from Ŝ_2D and remove it from Ŝ_2D;
c) Sample the feature map F (a two-dimensional C-channel signal) at position (x_i, y_i) to obtain a C-channel feature vector;
d) Sample a scalar random noise from a uniform distribution;
e) Input the feature vector and the random noise into the depth sampling sub-module to obtain the depth z_i;
f) Add (x_i, y_i, z_i) to Ŝ;
g) Repeat steps b) - f); since Ŝ_2D initially contains N elements, Ŝ finally contains N elements.
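Putting sub-steps a)-g) together with step S30 (which simply stacks the coordinates), a hedged sketch of the per-point depth inference loop is shown below. It assumes the DepthSampler interface sketched earlier, integer pixel coordinates coming from S201, and a uniform noise range of [0, 1); none of these details are stated explicitly in the text, and the loop is written per-point for clarity although it could be batched.

```python
import torch

def generate_point_cloud(feature_map: torch.Tensor, points_2d: torch.Tensor,
                         depth_sampler) -> torch.Tensor:
    """feature_map: (C, H_F, W_F); points_2d: (N, 2) integer (x, y) samples from S201.
    Returns an (N, 3) three-dimensional point cloud (steps S202 and S30)."""
    result = []                                           # a) start from an empty set
    for x, y in points_2d.long():                         # b) take each 2D point in turn
        feat_vec = feature_map[:, y, x].unsqueeze(0)      # c) C-channel feature vector at (x, y)
        noise = torch.rand(1, 1)                          # d) scalar random noise (assumed uniform on [0, 1))
        z = depth_sampler(feat_vec, noise).squeeze()      # e) depth from the depth sampling sub-module
        result.append(torch.tensor([float(x), float(y), float(z)]))  # f) add (x_i, y_i, z_i)
    return torch.stack(result)                            # g)/S30: N points form the 3D point cloud
```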
In step S30, the (x_i, y_i) coordinates obtained from the projection density map sampling and the z_i coordinates obtained from the depth sampling are combined into points (x_i, y_i, z_i), which form the set Ŝ; this set contains N elements, and together they constitute the generated overall three-dimensional point cloud.
The sketch conversion module and the point cloud generation module involved in steps S10 and S20 are both learnable deep-learning modules and can be used after training. The training process is described as follows:
(1) Generate a paired hand-drawn sketch / projection density map / point cloud dataset D from an existing three-dimensional model dataset. For each three-dimensional model, the following sub-process is performed:
1.1) Perform farthest point sampling (FPS) on the model surface to produce a point cloud;
1.2) Randomly generate v viewing angles;
1.3) Render the three-dimensional model at each viewing angle to obtain v hand-drawn sketches and v corresponding projection density maps;
1.4) Combine the three-dimensional point cloud with the v hand-drawn sketches and the v corresponding projection density maps to obtain v samples, and add them to D.
(2) Split the samples in D into a training set D_train and a validation set D_val, in units of three-dimensional point clouds (all samples derived from the same point cloud fall into the same split).
(3) Optimize the model on the training set D_train with the Adam optimizer (learning rate 1e-3). Specifically (an illustrative code sketch of one training step follows this list):
3.1) Randomly take a sample from D_train, comprising a hand-drawn sketch I, a projection density map M and a three-dimensional point cloud S;
3.2) Execute steps S10, S20 and S30 in sequence on the hand-drawn sketch I to obtain the generated point cloud Ŝ and the predicted projection density map M̂;
3.3) Compute the loss function L = λ_1 L_CD + λ_2 L_D, where L_CD is the Chamfer distance between the generated overall three-dimensional point cloud Ŝ and the real point cloud S (p and q denote points of Ŝ and S, respectively), and L_D measures the difference between the predicted projection density map M̂ and the actual projection density map M at the two-dimensional coordinates (x_i, y_i) of the points; λ_1 and λ_2 are the weights of L_CD and L_D, set to 1 and 1e4 respectively;
3.4) Back-propagate the gradients and update the trainable parameters with the Adam optimizer;
3.5) Repeat steps 3.1) to 3.4) until L converges.
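The exact loss formulas appear only as images in the original publication. The sketch below therefore shows one plausible training step under common choices: a symmetric Chamfer distance for L_CD and a mean squared error between predicted and ground-truth density maps for L_D; both of these forms, and the model interface, are assumptions.

```python
import torch

def chamfer_distance(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """Symmetric Chamfer distance between (N, 3) and (M, 3) point clouds."""
    d = torch.cdist(pred, gt)                        # (N, M) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

def training_step(model, optimizer, sketch, gt_density, gt_points,
                  lambda_cd: float = 1.0, lambda_d: float = 1e4):
    """One iteration of 3.1)-3.4); `model` is assumed to return (pred_points, pred_density)."""
    optimizer.zero_grad()
    pred_points, pred_density = model(sketch)        # steps S10-S30 on the hand-drawn sketch
    loss_cd = chamfer_distance(pred_points, gt_points)
    loss_d = torch.mean((pred_density - gt_density) ** 2)   # assumed MSE form of L_D
    loss = lambda_cd * loss_cd + lambda_d * loss_d   # L = lambda_1 * L_CD + lambda_2 * L_D
    loss.backward()                                  # 3.4) back-propagate gradients
    optimizer.step()                                 # update trainable parameters (Adam, lr 1e-3)
    return loss.item()
```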
The similarity between the point cloud generated by the method and the real point cloud is evaluated with four metrics: CD, EMD, FPD and Vox-IOU. The two point clouds under comparison are subsets of R^3 = R × R × R, the three-dimensional real space, where R is the real number field.
CD is the Chamfer Distance; the smaller the CD, the higher the similarity of the two point clouds.
EMD is the Earth Mover's Distance; the smaller the EMD, the higher the similarity of the two point clouds.
FPD is the Frechet Point cloud Distance; rather than evaluating the generation quality of a single sample, it evaluates the overall quality of the generated distribution; the smaller the FPD, the closer the generated distribution is to the real one.
Vox-IOU is the Voxel IOU; the larger the Vox-IOU, the higher the similarity of the two point clouds. It is defined as
Vox-IOU = |V(S) ∩ V(Ŝ)| / |V(S) ∪ V(Ŝ)|,
where V(·) denotes the operation that voxelizes a point cloud; the invention uses a voxel resolution of 32 × 32 × 32 for evaluation.
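A sketch of how the Vox-IOU metric can be computed, assuming the voxelization simply marks every 32 × 32 × 32 cell that contains at least one point of a point cloud normalized to the unit cube; the normalization convention is an assumption.

```python
import numpy as np

def voxelize(points: np.ndarray, resolution: int = 32) -> np.ndarray:
    """Occupancy grid of an (N, 3) point cloud assumed to lie in [0, 1)^3."""
    idx = np.clip((points * resolution).astype(int), 0, resolution - 1)
    grid = np.zeros((resolution,) * 3, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid

def vox_iou(points_a: np.ndarray, points_b: np.ndarray, resolution: int = 32) -> float:
    """Voxel IoU between two point clouds: |V(a) & V(b)| / |V(a) | V(b)|."""
    va, vb = voxelize(points_a, resolution), voxelize(points_b, resolution)
    union = np.logical_or(va, vb).sum()
    return float(np.logical_and(va, vb).sum() / union) if union > 0 else 1.0
```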
(I) On the Synthetic-LineDrawing dataset, the invention is compared with the existing single-sketch three-dimensional reconstruction methods Sketch2Mesh, Sketch2Model and Sketch2Point, and with the single-image three-dimensional reconstruction methods pcdnat and DISN.
Numerical results on four indicators of CD, EMD, FPD, vox-IOU are shown in Table 1:
TABLE 1 (reproduced as an image in the original document)
The visualization results are shown in fig. 7.
(II) On the ShapeNet-Sketch, AmateurSketch and ProSketch-3DChair datasets, the invention is compared with the existing single-sketch three-dimensional reconstruction methods Sketch2Mesh, Sketch2Model and Sketch2Point.
Numerical results on the four indicators CD, EMD, FPD, vox-IOU are shown in Table 2:
TABLE 2 (reproduced as an image in the original document)
The visualization result is shown in fig. 8.
As is apparent from Tables 1 and 2 above, the invention outperforms the related art in these evaluations. It can also be seen from figs. 7 and 8 that the invention generates three-dimensional point clouds with relatively complete details; compared with prior-art three-dimensional point cloud generation, it expresses more detailed shape information and produces visibly clearer results.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (5)

1. A method for generating a three-dimensional point cloud based on a sketch of projection density map sampling, comprising the steps of:
s10, extracting shape features from a hand drawing based on a sketch conversion module, and outputting a feature drawing containing channel number information;
s20, inputting the feature map into a point cloud generating module, predicting a projection density map, sampling a two-dimensional point cloud from the projection density map, and deducing the depth of each two-dimensional point; the point cloud generating module comprises: a projection density map prediction sub-module and a depth sampling sub-module;
the projection density map prediction submodule is formed by sequentially and alternately connecting 3 convolution layers and 3 ReLU functions; the 3 rd ReLU function is connected with the normalization layer; the step length of the 1 st convolution layer and the 3 rd convolution layer is 1, and the step length of the 2 nd convolution layer is 3; the final normalization layer is used for ensuring that the sum of values of all positions in the projection density map is 1;
the depth sampling submodule firstly carries out dimension lifting on random noise by using three residual error MLPs, the random vector after dimension lifting is connected with the feature vector sampled at a given position of the feature map in the channel dimension, and then depth information is obtained through three layers of residual error MLPs and one layer of full connection;
s30, combining the (x, y) coordinates from the projection density map sampling and the z coordinates from the depth sampling to generate an integral three-dimensional point cloud;
wherein, the step S20 includes:
s201, taking the feature map as input, and predicting joint distribution P (X, y|I) of two-dimensional coordinates by using a projection density map prediction submodule, wherein X and Y respectively correspond to random variables of X and Y axes; sampling from P (I, y|i) will produce a two-dimensional point cloud;
s202, predicting depth distribution p (Z) for each point in the two-dimensional point cloud i |x i ,y i I), wherein x i ,y i Two-dimensional coordinates of the i-th point; from p (Z) i |x i ,y i The depth of each two-dimensional point is obtained by sampling in I).
2. The method for generating a three-dimensional point cloud based on a sketch of projection density map sampling according to claim 1, wherein the sketch conversion module in step S10 adopts an encoder-decoder CNN network structure;
the CNN network structure comprises an encoder network, a residual block network and a decoder network connected in sequence; the encoder network comprises 4 downsampling blocks and is used for extracting high-level abstract information from the input sketch as the input of the residual block network; the residual block network comprises 9 residual blocks and is used for supplementing and refining the sparse shape features;
the decoder network contains 4 upsampling blocks for gradually inferring three-dimensional shape information at increasing spatial resolution.
3. The method for generating a three-dimensional point cloud based on a sketch of projection density map sampling according to claim 2, wherein each downsampling block consists of a convolution layer, a normalization layer and a ReLU function connected in sequence; the stride of the convolution layer in the downsampling block is 2;
each residual block consists of a convolution layer, a normalization layer, a ReLU function, a convolution layer and a normalization layer connected in sequence; the stride of the convolution layers in the residual block is 1;
each upsampling block consists of a transposed convolution layer, a normalization layer and a ReLU function connected in sequence; the stride of the transposed convolution layer in the upsampling block is 2.
4. The method for generating a three-dimensional point cloud based on a sketch of projection density map sampling according to claim 1, wherein step S201 comprises:
1) Inputting the feature map F into the projection density map prediction sub-module to obtain the output projection density map M̂;
2) Using the two-dimensional single-channel projection density map M̂ to define a two-dimensional multinomial distribution P(X, Y | I), whose probability mass function is P(X = x_i, Y = y_i | I) = M̂(x_i, y_i);
3) Sampling N samples from the two-dimensional multinomial distribution P(X, Y | I) to form a two-dimensional point cloud Ŝ_2D, which contains N elements, each representing a two-dimensional point.
5. The method for generating a three-dimensional point cloud based on a sketch of projection density map sampling according to claim 4, wherein step S202 comprises:
a) Initializing the set Ŝ as an empty set;
b) Taking a two-dimensional point (x_i, y_i) from the two-dimensional point cloud Ŝ_2D and removing it from Ŝ_2D;
c) Sampling the feature map F at position (x_i, y_i) to obtain a C-channel feature vector, where (x_i, y_i) are the two-dimensional coordinates of the i-th point;
d) Sampling a scalar random noise from a uniform distribution;
e) Inputting the feature vector and the scalar random noise into the depth sampling sub-module to obtain the depth z_i;
f) Adding (x_i, y_i, z_i) to Ŝ;
g) Repeating steps b) - f); since Ŝ_2D initially contains N elements, Ŝ finally contains N elements.
CN202210938411.XA 2022-08-05 2022-08-05 Method for generating three-dimensional point cloud based on sketch of projection density map sampling Active CN115330938B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210938411.XA CN115330938B (en) 2022-08-05 2022-08-05 Method for generating three-dimensional point cloud based on sketch of projection density map sampling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210938411.XA CN115330938B (en) 2022-08-05 2022-08-05 Method for generating three-dimensional point cloud based on sketch of projection density map sampling

Publications (2)

Publication Number Publication Date
CN115330938A (en) 2022-11-11
CN115330938B (en) 2023-06-20

Family

ID=83922410

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210938411.XA Active CN115330938B (en) 2022-08-05 2022-08-05 Method for generating three-dimensional point cloud based on sketch of projection density map sampling

Country Status (1)

Country Link
CN (1) CN115330938B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101799937A (en) * 2010-03-23 2010-08-11 南京大学 Method for creating three-dimensional model by using sketch

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7583272B2 (en) * 2004-11-29 2009-09-01 Purdue Research Foundation Methods for retrieving shapes and drawings
CN110176058B (en) * 2019-04-25 2020-12-11 浙江大学 Immersive rapid modeling method based on sketch and convolution curved surface
CN111583408B (en) * 2020-05-09 2023-06-20 中国科学技术大学 Human body three-dimensional modeling system based on hand-drawn sketch
CN113129447A (en) * 2021-04-12 2021-07-16 清华大学 Three-dimensional model generation method and device based on single hand-drawn sketch and electronic equipment


Also Published As

Publication number Publication date
CN115330938A (en) 2022-11-11


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant