CN116363329B - Three-dimensional image generation method and system based on CGAN and LeNet-5 - Google Patents


Info

Publication number
CN116363329B
CN116363329B (application CN202310214419.6A)
Authority
CN
China
Prior art keywords: image, normal image, cgan, network, lenet
Legal status (an assumption, not a legal conclusion): Active
Application number
CN202310214419.6A
Other languages
Chinese (zh)
Other versions
CN116363329A (en)
Inventor
刘莉
张军飞
谭文俊
王志非
Current Assignee
Zwcad Software Co ltd
Original Assignee
Zwcad Software Co ltd
Application filed by Zwcad Software Co ltd
Priority to CN202310214419.6A
Publication of CN116363329A
Application granted
Publication of CN116363329B


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/10Constructive solid geometry [CSG] using solid primitives, e.g. cylinders, cubes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0475Generative networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20Finite element generation, e.g. wire-frame surface description, tesselation
    • G06T17/205Re-meshing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The application discloses a three-dimensional image generation method and system based on CGAN and LeNet-5. The method comprises the following steps: acquiring an image dataset to be trained based on the open-source 3D dataset ShapeNet, wherein the image dataset to be trained comprises real normal images and 2D sketch images; performing optimization training on the image dataset to be trained based on the CGAN network, and outputting an intermediate normal image; sequentially rendering the intermediate normal image through a 3D mesh generation model and a differentiable renderer, and outputting a rendered normal image; and inputting the rendered normal image and the real normal image into a LeNet-5 network for training, and outputting a three-dimensional image model. With the application, a high-precision 3D image model is obtained from a single 2D sketch through the CGAN structure and the LeNet-5 structure. The method and system for generating three-dimensional images based on CGAN and LeNet-5 can be widely applied in the technical field of computer-aided design.

Description

Three-dimensional image generation method and system based on CGAN and LeNet-5
Technical Field
The application relates to the technical field of computer aided design, in particular to a three-dimensional image generation method and system based on CGAN and LeNet-5.
Background
Computer-aided design (CAD) refers to the use of a computer and its graphics equipment to assist a designer in performing design tasks. In engineering and product design, computers help designers carry out work such as computation, information storage, and drafting. When designing image data, a designer starts from a sketch, and the computer supports editing, enlarging, reducing, translating, and rotating the graphics. Sketching is an effective and intuitive way of graphically expressing ideas; its compactness and efficiency give it an important role in art creation, product engineering, and industrial design. However, there is a large gap between a sketch and a three-dimensional (3D) product, and building a 3D model from a two-dimensional sketch is the goal of sketch-based 3D shape prediction. A computer cannot, like a human, perceive 3D shape and spatial position from a 2D sketch through prior knowledge. Many studies define additional rules to obtain enough information to convert 2D sketches into 3D models, but these methods are limited to specific shapes and preconditions, and 3D shape generation becomes more cumbersome when there are too many irregular lines. Moreover, traditional deep convolutional neural networks such as AlexNet and VGGNet are composed of convolutional layers and fully connected layers and usually take images of a standard size as input, producing outputs that are not spatially arranged; the purpose of 2D-to-3D conversion is to automatically generate a 3D depth-information image from a single-parallax 2D image of arbitrary size, so outputs without spatial arrangement cannot serve the practical application of 2D-to-3D conversion.
Disclosure of Invention
In order to solve the technical problems, the application aims to provide a three-dimensional image generation method and a three-dimensional image generation system based on CGAN and LeNet-5, which realize that a high-precision 3D image model is obtained based on single 2D sketch conversion through a CGAN structure and a LeNet-5 structure.
The first technical scheme adopted by the application is as follows: the three-dimensional image generation method based on CGAN and LeNet-5 comprises the following steps:
acquiring an image dataset to be trained based on the open-source 3D dataset ShapeNet, wherein the image dataset to be trained comprises a real normal image and a 2D sketch image;
performing optimization training on the image dataset to be trained based on the CGAN network, and outputting an intermediate normal image;
sequentially rendering the intermediate normal image through the 3D grid generation model and the differentiable renderer, and outputting the rendered normal image;
and inputting the rendered normal image and the real normal image into a LeNet-5 network for training, and outputting a three-dimensional image model.
Further, the step of optimally training the image dataset to be trained based on the CGAN network and outputting the intermediate normal image specifically comprises the following steps:
inputting an image data set to be trained into a CGAN network for training, wherein the CGAN network comprises a generator and a discriminator;
mapping a random vector, conditioned on the 2D sketch image, through the generator of the CGAN network to generate a preliminary intermediate normal image;
the discriminator based on the CGAN network performs discrimination processing on the preliminary intermediate normal image and the real normal image, and outputs discrimination results;
optimizing the generator of the CGAN network according to the discrimination result to obtain an optimized generator;
and carrying out optimization generation on the 2D sketch image based on the optimized generator to obtain an intermediate normal image.
Further, the CGAN network-based discriminator performs a discrimination process on the preliminary intermediate normal image and the real normal image, and outputs a discrimination result, which specifically includes:
calculating a global distance between the preliminary intermediate normal image and the real normal image based on a discriminator of the CGAN network;
performing pixel sampling processing on the preliminary intermediate normal image through a Sobel filter to obtain locally sharp features of the preliminary intermediate normal image;
and integrating the global distance between the preliminary intermediate normal image and the real normal image with the locally sharp features of the preliminary intermediate normal image to construct the discrimination result.
Further, the expression for optimizing the generator of the CGAN network according to the discrimination result is specifically as follows:
$$g^{*}=\arg\min_{G}\max_{D}\mathcal{L}_{CGAN}(G,D)+\lambda_{g}\mathcal{L}_{global}(G)+\lambda_{l}\mathcal{L}_{local}(G)$$
In the above formula, g* denotes the optimized generator, \mathcal{L}_{global} represents the global distance between the preliminary intermediate normal image and the real normal image, \mathcal{L}_{local} represents the locally sharp features of the preliminary intermediate normal image, and λ_g and λ_l represent the weight parameters of the loss terms \mathcal{L}_{global} and \mathcal{L}_{local}, respectively.
Further, the standard objective function of the CGAN network is specifically as follows:
$$\mathcal{L}_{CGAN}(G,D)=\mathbb{E}_{x,y}[\log D(x,y)]+\mathbb{E}_{x,z}[\log(1-D(x,G(x,z)))]$$
In the above formula, G represents the generator of the CGAN network, D represents the discriminator of the CGAN network, x represents the input 2D sketch image, y represents the normal image, z represents the random vector, D(x, y) represents the probability that D judges y to be real under the condition x, D(x, G(x, z)) represents the probability that D judges the picture generated by G under the condition x to be real, and \mathbb{E} denotes the mathematical expectation.
Further, the step of sequentially rendering the intermediate normal image by the 3D mesh generation model and the differentiable renderer, and outputting the rendered normal image specifically includes:
deforming the intermediate normal image by deforming the predefined sphere mesh based on the 3D mesh generation model, generating an intermediate normal image with the 3D mesh;
and rendering the intermediate normal image with the 3D grid to a two-dimensional image by using a differential renderer, wherein the two-dimensional image is an image obtained by projecting the intermediate normal image with the 3D grid on a two-dimensional plane, and outputting the rendered normal image.
Further, the step of inputting the rendered normal image and the real normal image into a LeNet-5 network for training and outputting a three-dimensional image model specifically comprises the following steps:
performing downsampling operation on the rendered normal image to obtain a compressed normal image;
constructing a LeNet-5 network, and inputting the compressed normal image and the real normal image into the LeNet-5 network, wherein the LeNet-5 network comprises a convolutional layer C1, a pooling layer S2, a convolutional layer C3, a pooling layer S4, a convolutional layer C5, a fully connected layer F6 and an output layer;
calculating the surface normals of the compressed normal image mesh and mapping them to the RGB range [0, 1];
acquiring the ground truth of the real normal image, and calculating the normal loss between the real normal image and the compressed normal image by combining the RGB-mapped surface normals of the compressed normal image mesh;
calculating the contour loss between the real normal image and the compressed normal image through Intersection-over-Union;
using As-Rigid-As-Possible energy as the edge loss to regularize the edges of the real normal image and the compressed normal image;
introducing smoothness loss, integrating normal loss, contour loss and edge loss, carrying out weighted calculation, optimizing the LeNet-5 network, and outputting a three-dimensional image model by the optimized LeNet-5 network.
Further, the formula for weighting calculation by integrating the normal loss, the contour loss and the edge loss is as follows:
$$\mathcal{L}=\lambda_{n}\mathcal{L}_{n}+\lambda_{b}\mathcal{L}_{b}+\lambda_{p}\mathcal{L}_{p}+\lambda_{s}\mathcal{L}_{s}$$
In the above formula, \mathcal{L} represents the final loss of the LeNet-5 network, \mathcal{L}_{n} represents the normal loss, \mathcal{L}_{b} represents the edge loss, \mathcal{L}_{p} represents the contour loss, \mathcal{L}_{s} represents the smoothness loss, and λ_n, λ_b, λ_p and λ_s represent the weight parameters corresponding to the respective loss terms.
The second technical scheme adopted by the application is as follows: a three-dimensional image generation system based on CGAN and LeNet-5, comprising:
the acquisition module is used for acquiring an image data set to be trained based on an open-source 3D data set ShapeNet, wherein the image data set to be trained comprises a real normal image and a 2D sketch image;
the optimization module performs optimization training on the image data set to be trained based on the CGAN network and outputs an intermediate normal image;
the rendering module is used for sequentially rendering the intermediate normal image through the 3D mesh generation model and the differentiable renderer and outputting the rendered normal image;
and the training module is used for inputting the rendered normal image and the real normal image into the LeNet-5 network for training and outputting a three-dimensional image model.
The method and the system have the beneficial effects that: according to the application, a 2D sketch generated from the ShapeNet dataset is input into the CGAN structure and the LeNet-5 structure to predict and generate a 3D model. The 2D sketch image is first converted into its normal image through the CGAN network structure; the normal image provides geometric surface information that helps recover concave features. The CGAN network comprises two modules, a generator model and a discriminator model, whose mutual game learning produces comparatively good graphic output with specific attributes. The 3D model is then generated through the LeNet-5 network structure; the LeNet-5 network has feature-learning capability and can perform translation-invariant classification of input information according to its hierarchical structure. The application can thus obtain a high-precision 3D image model from a single 2D sketch.
Drawings
FIG. 1 is a flow chart of the steps of the CGAN and LeNet-5 based three-dimensional image generation method of the present application;
FIG. 2 is a block diagram of a three-dimensional image generation system based on CGAN and LeNet-5 of the present application;
FIG. 3 is a schematic diagram of the structure of the CGAN and LeNet-5 neural network of the present application;
FIG. 4 is a schematic flow chart of three-dimensional image generation in accordance with an embodiment of the present application;
FIG. 5 is a graph of experimental results obtained in accordance with an embodiment of the present application.
Detailed Description
The application will now be described in further detail with reference to the drawings and to specific examples. The step numbers in the following embodiments are set for convenience of illustration only, and the order between the steps is not limited in any way, and the execution order of the steps in the embodiments may be adaptively adjusted according to the understanding of those skilled in the art.
Because of the sparsity and irregularity of sketches, extracting complete 3D shapes from a sketch remains a problem that is highly limited to specific shapes and preconditions. The development of deep neural networks provides new ideas for generating 3D models from sketches: neural networks have the potential to learn 3D model priors to predict the 3D shape corresponding to a sketch, by internally extracting a set of patterns or associations between the different variables/parameters in a dataset and then using these patterns to predict the output of interest for a given set of input variables;
the generated countermeasure network (GAN) is a deep learning model, and is one of the most promising methods for unsupervised learning on complex distribution in recent years; the model is built up of two modules in the frame: the mutual game learning of the generator model and the discriminator model produces relatively good output, and since GAN cannot generate pictures with specific attributes, a conditional generation countermeasure network (CGAN) is a network optimized for the problem, and the core of the method is that attribute information y is integrated into the generator G and the discriminator D, and the attribute y can be any label information, such as the category of images, facial expression of face images, and the like; convolutional Neural Networks (CNNs) are a type of feed-forward neural network comprising convolutional computation and having a depth structure, the convolutional neural networks have a characteristic learning capability, and can perform translation invariant classification on input information according to a hierarchical structure of the convolutional neural networks, leNet-5 is one type of CNNs, the structure of the convolutional neural networks is 7 layers in total, each layer is provided with a plurality of Feature maps, and each Feature Map is a Feature of an input extracted through a convolutional filter.
The application provides a framework based on the CGAN and LeNet-5 models: an end-to-end learning framework that can generate a 3D model from a single 2D sketch image. The learning framework first converts the sketch image into its normal image using the CGAN, and then constructs the 3D shape from the normal image using LeNet-5, so that it can cope with the challenges of 3D shape modeling from a single sketch; applying the CGAN and LeNet-5 neural networks to the field of 3D model generation is therefore of great significance. The key point of this work is that the 2D sketch is input to the CGAN network, which converts it into a normal image; the normal image is then input to the LeNet-5 network to generate a high-precision 3D model. The CGAN network includes two modules: mutual game learning between the generator model and the discriminator model produces comparatively good graphical outputs with specific attributes. LeNet-5 is a feedforward neural network that includes convolutional computation and has a deep structure; such a convolutional neural network has feature-learning capability and can perform translation-invariant classification of input information according to its hierarchical structure. The application combines the CGAN structure and the LeNet-5 structure into an end-to-end learning framework for generating a 3D model from a single 2D sketch image. The learning proceeds in two steps: converting the 2D sketch image into its normal image, and then restoring the 3D shape.
Referring to fig. 1, the present application provides a three-dimensional image generation method based on CGAN and LeNet-5, the method comprising the steps of:
S1, acquiring an image dataset to be trained based on the open-source 3D dataset ShapeNet, wherein the image dataset to be trained comprises a real normal image and a 2D sketch image;
Specifically, the application focuses on using a data-driven method to improve the accuracy of generating a 3D model from a single 2D sketch, using the open-source 3D dataset ShapeNet, an annotated 3D shape dataset commonly used to support computer graphics research. In the application, each 3D model in the dataset generates a set Y of 256×256-pixel real normal images and a dataset X of sketch images from 24 azimuth angles, and the data are divided into a training set, a validation set, and a test set in the proportion 8:1:1.
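The per-model 8:1:1 split and 24-azimuth rendering described above can be sketched as follows. The helper names (`split_dataset`, `render_pairs`), the file-naming scheme, and the 15-degree azimuth step are illustrative assumptions, not details taken from the patent.

```python
import random

def split_dataset(model_ids, seed=0):
    """Shuffle ShapeNet model ids and split them 8:1:1 into train/val/test."""
    ids = list(model_ids)
    random.Random(seed).shuffle(ids)  # seeded for a reproducible split
    n_train = int(len(ids) * 0.8)
    n_val = int(len(ids) * 0.1)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

def render_pairs(model_id, n_views=24):
    """Each model yields 24 (sketch, normal) image pairs, one per azimuth step."""
    return [(f"{model_id}_az{k * 15:03d}_sketch.png",
             f"{model_id}_az{k * 15:03d}_normal.png") for k in range(n_views)]

train, val, test = split_dataset([f"model_{i:04d}" for i in range(100)])
```

With 100 models this yields 80/10/10 models, i.e. 1920/240/240 image pairs at 24 views each.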
S2, optimizing training is carried out on the image data set to be trained based on the CGAN network, and an intermediate normal image is output;
Specifically, normal image generation from a single sketch image is treated as image-to-image translation. For the normal image generator, the CGAN architecture is a common option for image-to-image conversion. The normal image generator is optimized using a CGAN that mixes a global L1 distance with a regularizer sampling locally sharp features. In each training iteration, the 256×256 sketch images X = {X1, X2, …, Xn} from step S1 are fed to the generator G of the CGAN structure, and the generator produces an intermediate normal image set; the intermediate normal images and the real normal image set Y = {Y1, Y2, …, Yn} from step S1 are then fed to the discriminator, whose task is to receive real data or generated data and try to predict whether the input is real (true) or fake (false). After training, the CGAN can generate a good normal image N from an input sketch;
the generator G maps the random vector z to an image y, namely G is x-y, and the CGANs condition corresponds to additional information x, wherein x is a sketch image in the application; the standard objective function of CGANs is defined as:
in the above formula, G represents a generator of the CGAN network, D represents a discriminator of the CGAN network, x represents an input 2D sketch image, y represents a normal image, z represents a random vector, D (x, y) represents a probability that D judges whether y is true under the condition of x, D (x, G (x, z))) represents a probability that D judges whether a picture generated by G under the condition of x is true,representing mathematical expectations;
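As a purely numerical illustration (not the patent's implementation), this objective can be estimated from batches of discriminator scores. An uninformative discriminator that outputs 0.5 everywhere gives 2·log 0.5 ≈ −1.386, the classic GAN equilibrium value; a discriminator that separates real from fake well pushes the value toward 0.

```python
import math

def cgan_objective(d_real, d_fake):
    """Monte-Carlo estimate of E[log D(x,y)] + E[log(1 - D(x,G(x,z)))]
    from discriminator scores on real pairs (d_real) and generated pairs (d_fake)."""
    term_real = sum(math.log(p) for p in d_real) / len(d_real)
    term_fake = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return term_real + term_fake

# D maximizes this value; G minimizes the second term.
confident = cgan_objective([0.99, 0.98], [0.01, 0.02])   # D separates well: near 0
equilibrium = cgan_objective([0.5, 0.5], [0.5, 0.5])     # 2 * log(0.5)
```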
the distance between the generated normal image and the real image is calculated to regularize the global image distribution; compared with the L1 norm, the L2 norm encourages image blurring, so the global distance uses the L1 norm:
$$\mathcal{L}_{global}(G)=\mathbb{E}_{x,y,z}\left[\,\lVert y-G(x,z)\rVert_{1}\,\right]$$
furthermore, in each training iteration, some pixels are sampled from regions with sharp geometric features (such as corners and edges) by applying a Sobel filter to the normal image, to strengthen the constraint on local features:
$$\mathcal{L}_{local}(G)=\mathbb{E}\left[\,\lVert\hat{\omega}-\omega\rVert_{1}\,\right]$$
In the above formula, \hat{\omega} is a pixel sampled from the generated normal image and ω is the corresponding pixel of the real normal image;
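A minimal sketch of this local-feature term, under two assumptions the patent leaves open: the Sobel response of the real normal image selects which pixels are sampled, and the loss is a mean L1 over those pixels.

```python
def sobel_magnitude(img):
    """Gradient magnitude of a 2-D grayscale image (list of lists) via 3x3 Sobel."""
    h, w = len(img), len(img[0])
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
    out = [[0.0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            gx = sum(kx[a][b] * img[i - 1 + a][j - 1 + b]
                     for a in range(3) for b in range(3))
            gy = sum(ky[a][b] * img[i - 1 + a][j - 1 + b]
                     for a in range(3) for b in range(3))
            out[i][j] = (gx * gx + gy * gy) ** 0.5
    return out

def local_feature_loss(gen, real, thresh=1.0):
    """Mean L1 distance over pixels whose Sobel response in `real` exceeds thresh."""
    mag = sobel_magnitude(real)
    picks = [(i, j) for i, row in enumerate(mag) for j, m in enumerate(row) if m > thresh]
    if not picks:
        return 0.0
    return sum(abs(gen[i][j] - real[i][j]) for i, j in picks) / len(picks)
```

On a vertical step edge the filter fires only along the edge columns, so the loss is evaluated exactly where geometry is sharp.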
the normal image generator is obtained by optimizing the final objective, whose expression is:
$$g^{*}=\arg\min_{G}\max_{D}\mathcal{L}_{CGAN}(G,D)+\lambda_{g}\mathcal{L}_{global}(G)+\lambda_{l}\mathcal{L}_{local}(G)$$
In the above formula, g* denotes the optimized normal image generator, and λ_g and λ_l are the weight parameters of \mathcal{L}_{global} and \mathcal{L}_{local}, respectively.
S3, sequentially rendering the intermediate normal image through the 3D mesh generation model and the differentiable renderer, and outputting the rendered normal image;
S31, deforming the predefined sphere mesh to generate a 3D mesh;
Specifically, S3 generates a 3D mesh from the normal image output by the CGAN structure in S2 by deforming a predefined sphere mesh; the predefined sphere mesh has genus-0 topology and can be deformed into any shape of the same genus. The deformation process is expressed as
$$\hat{v}_{i}=v_{i}+\Delta v_{i}+\Delta v_{g}$$
where v_i is a vertex of the mesh, Δv_i is the local offset of each vertex, and Δv_g is a global offset; these two offset vectors are the outputs of the mesh predictor.
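The deformation step can be sketched as adding a per-vertex local offset and one shared global offset to each vertex of the base sphere. In practice both offsets come from the mesh predictor; the values below are placeholders.

```python
def deform(vertices, local_offsets, global_offset):
    """Apply v_hat_i = v_i + dv_i + dv_g to every vertex of the base mesh."""
    return [tuple(v[k] + d[k] + global_offset[k] for k in range(3))
            for v, d in zip(vertices, local_offsets)]

base = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]        # two vertices of the base sphere
local = [(0.1, 0.0, 0.0), (0.0, -0.2, 0.0)]      # per-vertex offsets (placeholder)
shifted = deform(base, local, (0.0, 0.0, 0.5))   # plus one global offset
```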
S32, providing a differentiable renderer;
Specifically, a differentiable renderer is constructed to render the 3D mesh model generated in step S31 back into a 2D normal image. Standard real-time rendering is the process of drawing a 3D model onto a computer display, going from 3D vertices to a 2D rendering. The vertex and fragment shaders are easily defined in a fully differentiable manner, but because of its discrete sampling operation, rasterization is not differentiable; a differentiable rasterization formulation is therefore proposed so that the 3D mesh model can be rendered back into a 2D normal image;
let A_i(x_i, y_i) be a single pixel of the image, and denote its color by I_i; the gradient can be expressed as ∂I_i/∂x_i. Suppose pixel A_i lies off the projected face f_j; when f_j moves to A_i, the pixel's color becomes I'_i. Let D(A_i, f_j) denote the displacement between pixel A_i and the projected face f_j along the x and y coordinates; the derivative of I_i is then defined as:
$$\frac{\partial I_{i}}{\partial x_{i}}=\frac{I'_{i}-I_{i}}{D_{x}(A_{i},f_{j})},\qquad \frac{\partial I_{i}}{\partial y_{i}}=\frac{I'_{i}-I_{i}}{D_{y}(A_{i},f_{j})}$$
S4, inputting the rendered normal image and the real normal image into a LeNet-5 network for training, and outputting a three-dimensional image model.
Specifically, a LeNet-5 neural network is constructed to optimize the 3D mesh model predictor. The shape predictor is an encoder-decoder architecture using a predefined sphere with 642 vertices as the base mesh. In the training phase, the 2D normal image rendered in step S3 and the real normal image generated in step S1 are first downsampled to 64×64; the resized images are then fed to the LeNet-5 network, the LeNet-5 network is trained, and the parameters of the 3D mesh model predictor are continuously optimized. The optimized 3D mesh model predictor can finally generate a high-precision 3D model.
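The 256×256 → 64×64 resizing can be done with 4×4 average pooling, sketched below. The patent only states the target size, so average pooling is an assumption about the resampling method.

```python
def downsample(img, factor=4):
    """Average-pool a 2-D image (list of lists) by `factor` per axis,
    e.g. 256x256 -> 64x64 when factor=4."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(0, h, factor):
        row = []
        for j in range(0, w, factor):
            block = [img[i + a][j + b] for a in range(factor) for b in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out
```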
In each iteration of the training process, the surface normals of the mesh are first calculated and mapped to the RGB range [0, 1]; these values are then rendered into a normal image \hat{N} by the differentiable renderer, while the ground-truth normal image is N. Considering that the normal image contains both 2D contour information and 3D mesh surface details, and that the L1 distance preserves more sharp features than the L2 distance, the L1 distance between \hat{N} and N is calculated as the normal loss:
$$\mathcal{L}_{n}=\lVert\hat{N}-N\rVert_{1}$$
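The normal-to-RGB mapping and the L1 normal loss can be sketched as follows; taking the per-pixel mean of the L1 distance is an assumption, since the patent only specifies the L1 distance itself.

```python
def normal_to_rgb(n):
    """Map a unit surface normal with components in [-1, 1] to the RGB range [0, 1]."""
    return tuple(0.5 * (c + 1.0) for c in n)

def normal_loss(pred, true):
    """Mean per-pixel L1 distance between rendered (pred) and ground-truth (true)
    normal images, each given as a flat list of RGB triples."""
    total = sum(abs(p - t) for pp, tt in zip(pred, true) for p, t in zip(pp, tt))
    return total / len(pred)
```

For example, the "straight up" normal (0, 0, 1) maps to the RGB triple (0.5, 0.5, 1.0), the familiar light-blue tint of normal maps.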
then let \hat{P} and P represent the predicted and true contours, respectively; Intersection-over-Union (IoU) is used as the contour loss, which is defined as:
$$\mathcal{L}_{p}=1-\frac{\lVert\hat{P}\otimes P\rVert_{1}}{\lVert\hat{P}+P-\hat{P}\otimes P\rVert_{1}}$$
In the above formula, the symbol ⊗ represents the element-wise product.
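The contour loss, 1 − IoU with the element-wise product as the intersection, can be sketched over flattened silhouette masks (soft values in [0, 1] also work):

```python
def contour_loss(pred, true):
    """1 - IoU between predicted and true silhouettes, given as flat lists of
    mask values in [0, 1]; p*t acts as the element-wise intersection and
    p + t - p*t as the element-wise union."""
    inter = sum(p * t for p, t in zip(pred, true))
    union = sum(p + t - p * t for p, t in zip(pred, true))
    return 1.0 - inter / union
```

Identical masks give 0, disjoint masks give 1, and a half-overlap gives 0.5.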
In addition, the edge is normalized using As-rgid-As-poisable energy As edge loss:
in the above, b i Representing an original edge in edge set B of the gridIs with b i The corresponding current edge, n, is the number of edges;
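A sketch of the edge loss as the mean squared change in edge length between the base mesh and the deformed mesh — a simplified stand-in for the full As-Rigid-As-Possible energy, which the patent does not spell out:

```python
def length(v0, v1):
    """Euclidean length of the edge between vertices v0 and v1."""
    return sum((a - b) ** 2 for a, b in zip(v0, v1)) ** 0.5

def edge_loss(orig_edges, curr_edges):
    """Mean of (|b_hat_i| - |b_i|)^2 over corresponding edge pairs, penalizing
    edges that stretch or shrink during deformation."""
    return sum((length(*c) - length(*o)) ** 2
               for o, c in zip(orig_edges, curr_edges)) / len(orig_edges)
```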
a smoothness loss is introduced that acts directly on the predicted mesh and ensures consistency of the surface:
$$\mathcal{L}_{s}=\sum_{\langle f_{i},f_{j}\rangle\in F}\left(\cos\theta_{ij}+1\right)^{2}$$
In the above formula, θ_ij denotes the dihedral angle ⟨f_i, f_j⟩ between two adjacent faces, and F represents the set of all pairs of adjacent faces;
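A sketch of the smoothness term using unit face normals: with θ the dihedral angle between two adjacent faces, cos θ = −n_i · n_j, so coplanar neighbors (parallel normals) contribute zero and sharply folded ones are penalized. Expressing the term through normals is an assumption of this sketch.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def smoothness_loss(normal_pairs):
    """Sum of (cos(theta) + 1)^2 over adjacent face pairs, where theta is the
    dihedral angle and each pair is given as the two unit face normals."""
    return sum((-dot(ni, nj) + 1.0) ** 2 for ni, nj in normal_pairs)
```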
the final loss of the mesh prediction is the weighted sum of the above losses:
$$\mathcal{L}=\lambda_{n}\mathcal{L}_{n}+\lambda_{b}\mathcal{L}_{b}+\lambda_{p}\mathcal{L}_{p}+\lambda_{s}\mathcal{L}_{s}$$
In the above formula, \mathcal{L} represents the final loss of the LeNet-5 network, \mathcal{L}_{n} the normal loss, \mathcal{L}_{b} the edge loss, \mathcal{L}_{p} the contour loss, and \mathcal{L}_{s} the smoothness loss; λ_n, λ_b, λ_p and λ_s represent the weight parameters corresponding to the respective loss terms.
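Combining the four terms is then a direct weighted sum; the default weight values below are placeholders, since the patent does not disclose the λ values.

```python
def total_loss(l_n, l_b, l_p, l_s, lam_n=1.0, lam_b=0.1, lam_p=1.0, lam_s=0.01):
    """L = lam_n*L_n + lam_b*L_b + lam_p*L_p + lam_s*L_s (illustrative weights)."""
    return lam_n * l_n + lam_b * l_b + lam_p * l_p + lam_s * l_s
```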
In summary, a 2D sketch generated from the ShapeNet dataset is input into the CGAN structure and the LeNet-5 structure to predict and generate a 3D model. The learning framework has two steps: the 2D sketch image is first converted into a normal image through the CGAN network structure, and the 3D model is then generated through the LeNet-5 network structure.
Referring to fig. 2, a CGAN and LeNet-5 based three-dimensional image generation system comprising:
the acquisition module is used for acquiring an image data set to be trained based on an open-source 3D data set ShapeNet, wherein the image data set to be trained comprises a real normal image and a 2D sketch image;
the optimization module performs optimization training on the image data set to be trained based on the CGAN network and outputs an intermediate normal image;
the rendering module is used for sequentially rendering the intermediate normal image through the 3D mesh generation model and the differentiable renderer and outputting the rendered normal image;
and the training module is used for inputting the rendered normal image and the real normal image into the LeNet-5 network for training and outputting a three-dimensional image model.
The drawings of the present application are explained:
FIG. 3 shows the CGAN neural network structure, mainly comprising a generator and a discriminator, whose input is a sketch and whose output is a normal image; the LeNet-5 neural network structure comprises 3 convolutional layers, 2 pooling layers and 2 fully connected layers, whose input is a normal image and whose output is a 3D model;
FIG. 4 is the system flow diagram showing the steps of the overall system: extracting a 2D sketch and a real normal image from the ShapeNet three-dimensional dataset; inputting the 2D sketch and the real normal image into the CGAN neural network, which generates a normal image; and inputting the generated normal image and the real normal image into the LeNet-5 neural network structure to generate a 3D model;
FIG. 5 is a diagram showing the experimental results of a 2D sketch to generate a 3D model through a CGAN structure and a LeNet-5 structure; where (a) is an input 2D sketch and a generated normal image, and (b) and (c) are 3D models generated from two perspective renderings, the normal image providing geometric surface information that facilitates the restoration of the concave features. Experimental results show that the CGAN structure and the LeNet-5 structure have better performance in the experiment, and the 3D model generated by the 2D sketch shows higher precision.
The content in the method embodiment is applicable to the system embodiment, the functions specifically realized by the system embodiment are the same as those of the method embodiment, and the achieved beneficial effects are the same as those of the method embodiment.
While the preferred embodiment of the present application has been described in detail, the application is not limited to the embodiment, and various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the application, and these equivalent modifications and substitutions are intended to be included in the scope of the present application as defined in the appended claims.

Claims (7)

1. A three-dimensional image generation method based on CGAN and LeNet-5, characterized by comprising the following steps:
acquiring an image dataset to be trained based on the open-source 3D dataset ShapeNet, wherein the image dataset to be trained comprises a real normal image and a 2D sketch image;
performing optimization training on the image dataset to be trained based on the CGAN network, and outputting an intermediate normal image;
sequentially rendering the intermediate normal image through a 3D mesh generation model and a differentiable renderer, and outputting a rendered normal image;
inputting the rendered normal image and the real normal image into a LeNet-5 network for training, and outputting a three-dimensional image model;
the step of performing optimization training on the image dataset to be trained based on the CGAN network and outputting the intermediate normal image specifically comprises the following steps:
inputting an image data set to be trained into a CGAN network for training, wherein the CGAN network comprises a generator and a discriminator;
mapping, by the generator of the CGAN network, a random vector onto the 2D sketch image to generate a preliminary intermediate normal image;
the discriminator based on the CGAN network performs discrimination processing on the preliminary intermediate normal image and the real normal image, and outputs discrimination results;
optimizing the generator of the CGAN network according to the discrimination result to obtain an optimized generator;
optimizing and generating the 2D sketch image based on the optimized generator to obtain an intermediate normal image;
the step of the discriminator based on the CGAN network for discriminating the preliminary intermediate normal image and the real normal image and outputting the discrimination result specifically comprises the following steps:
calculating a global distance between the preliminary intermediate normal image and the real normal image based on a discriminator of the CGAN network;
performing pixel sampling processing on the preliminary intermediate normal image through a Sobel filter to obtain the locally sharp features of the preliminary intermediate normal image;
and integrating the global distance between the preliminary intermediate normal image and the real normal image with the locally sharp features of the preliminary intermediate normal image to construct the discrimination result.
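Claim 1's local term relies on a Sobel filter to expose locally sharp features. A minimal NumPy sketch of Sobel edge-magnitude extraction follows; the exact sampling scheme used by the discriminator is not specified in the claim, so this shows only the standard 3×3 Sobel operator:

```python
import numpy as np

def sobel_edges(img):
    """Return the Sobel gradient magnitude of a 2D grayscale image."""
    kx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=np.float32)  # horizontal gradient kernel
    ky = kx.T                                      # vertical gradient kernel
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")           # replicate border pixels
    gx = np.zeros_like(img, dtype=np.float32)
    gy = np.zeros_like(img, dtype=np.float32)
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(patch * kx)
            gy[i, j] = np.sum(patch * ky)
    return np.sqrt(gx ** 2 + gy ** 2)

# A vertical step edge produces a strong response along the boundary
# and none in flat regions.
img = np.zeros((8, 8), dtype=np.float32)
img[:, 4:] = 1.0
edges = sobel_edges(img)
```

In practice this per-pixel loop would be replaced by a vectorized convolution, but the kernel and magnitude computation are the same.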
2. The three-dimensional image generation method based on CGAN and LeNet-5 according to claim 1, wherein the expression for optimizing the generator of the CGAN network according to the discrimination result is as follows:
L_G = λ_g·L_g + λ_l·L_l
In the above formula, L_g represents the global distance between the preliminary intermediate normal image and the real normal image, L_l represents the locally sharp features of the preliminary intermediate normal image, and λ_g and λ_l represent the weight parameters of the loss functions L_g and L_l, respectively.
3. The three-dimensional image generating method based on CGAN and LeNet-5 according to claim 2, wherein the standard objective function of the CGAN network is specifically as follows:
in the above formula, G represents a generator of the CGAN network, D represents a discriminator of the CGAN network, x represents an input 2D sketch image, y represents a normal image, z represents a random vector, D (x, y) represents a probability that D judges whether y is true under the condition of x, D (x, G (x, z))) represents a probability that D judges whether a picture generated by G under the condition of x is true,representing mathematical expectations.
4. The three-dimensional image generation method based on CGAN and LeNet-5 according to claim 3, wherein the step of sequentially rendering the intermediate normal image through the 3D mesh generation model and the differentiable renderer and outputting the rendered normal image specifically comprises the following steps:
generating an intermediate normal image with a 3D mesh by deforming a predefined sphere mesh based on the 3D mesh generation model;
and rendering the intermediate normal image with the 3D mesh into a two-dimensional image by using a differentiable renderer, wherein the two-dimensional image is obtained by projecting the intermediate normal image with the 3D mesh onto a two-dimensional plane, and outputting the rendered normal image.
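Claim 4's renderer projects the deformed 3D mesh onto a two-dimensional plane. As a non-differentiable stand-in that shows only the projection geometry, here is an orthographic point-splat of mesh vertices into a binary image; a real differentiable renderer would rasterize faces with soft visibility, so this is a deliberately simplified sketch:

```python
import numpy as np

def orthographic_silhouette(vertices, size=32):
    """Project 3D vertices with x, y in [-1, 1] onto a size x size binary
    mask by dropping z (orthographic projection along the view axis)."""
    mask = np.zeros((size, size), dtype=np.uint8)
    xy = vertices[:, :2]
    # Map [-1, 1] coordinates to pixel indices [0, size - 1].
    px = np.clip(((xy + 1.0) * 0.5 * (size - 1)).round().astype(int),
                 0, size - 1)
    mask[px[:, 1], px[:, 0]] = 1
    return mask

# The eight corners of a cube collapse onto the four image corners
# under orthographic projection.
cube = np.array([[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)],
                dtype=float)
mask = orthographic_silhouette(cube, size=8)
```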
5. The three-dimensional image generation method based on CGAN and LeNet-5 according to claim 4, wherein the step of inputting the rendered normal image and the real normal image into the LeNet-5 network for training and outputting the three-dimensional image model specifically comprises the following steps:
performing downsampling operation on the rendered normal image to obtain a compressed normal image;
constructing a LeNet-5 network, and inputting the compressed normal image and the real normal image into the LeNet-5 network, wherein the LeNet-5 network comprises a convolution layer C1, a pooling layer S2, a convolution layer C3, a pooling layer S4, a convolution layer C5, a fully-connected layer F6 and an output layer;
calculating the surface normal of the compressed normal image grid and mapping the surface normal to RGB range [0,1];
acquiring the true value of the real normal image, and calculating the normal loss between the real normal image and the compressed normal image by combining the RGB-range mapping of the surface normals of the compressed normal image mesh;
calculating the contour loss of the real normal image and the compressed normal image through Intersection-Over-Union;
using As-Rigid-As-Possible energy as the edge loss to regularize the edge difference between the real normal image and the compressed normal image;
introducing smoothness loss, integrating normal loss, contour loss and edge loss, carrying out weighted calculation, optimizing the LeNet-5 network, and outputting a three-dimensional image model by the optimized LeNet-5 network.
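Claim 5 combines several losses, including a normal loss computed after mapping unit surface normals into the RGB range [0, 1] and a contour loss based on Intersection-over-Union. A NumPy sketch of those two ingredients; the claim does not fix the exact distance measures, so L1 for the normal term and 1 − IoU for the contour term are assumptions here:

```python
import numpy as np

def normals_to_rgb(normals):
    """Map unit surface normals in [-1, 1]^3 to RGB values in [0, 1]."""
    return (normals + 1.0) * 0.5

def normal_loss(pred_normals, true_normals):
    """Mean absolute difference of the RGB-mapped normal maps (L1, assumed)."""
    return np.mean(np.abs(normals_to_rgb(pred_normals)
                          - normals_to_rgb(true_normals)))

def contour_loss(pred_mask, true_mask, eps=1e-8):
    """1 - IoU between binary silhouettes (Intersection-over-Union)."""
    inter = np.logical_and(pred_mask, true_mask).sum()
    union = np.logical_or(pred_mask, true_mask).sum()
    return 1.0 - inter / (union + eps)

# Identical inputs give (near-)zero loss; disjoint silhouettes give 1.
n = np.array([[0.0, 0.0, 1.0]])
m = np.array([[1, 1], [0, 0]], dtype=bool)
ln = normal_loss(n, n)
lp = contour_loss(m, m)
lp2 = contour_loss(m, ~m)
```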
6. The three-dimensional image generation method based on CGAN and LeNet-5 according to claim 5, wherein the formula for introducing the smoothness loss and integrating the normal loss, contour loss and edge loss for weighted calculation is as follows:
L = λ_n·L_n + λ_b·L_b + λ_p·L_p + λ_s·L_s
In the above formula, L represents the final loss of the LeNet-5 network, L_n represents the normal loss, L_b represents the edge loss, L_p represents the contour loss, L_s represents the smoothness loss, and λ_n, λ_b, λ_p and λ_s represent the weight parameters corresponding to each loss function.
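The claim-6 total loss is a straightforward weighted sum of the four terms. As a one-line illustration (the weight values below are arbitrary placeholders, not values from the patent):

```python
def total_loss(l_n, l_b, l_p, l_s,
               lam_n=1.0, lam_b=0.1, lam_p=1.0, lam_s=0.01):
    """L = λ_n·L_n + λ_b·L_b + λ_p·L_p + λ_s·L_s (claim-6 weighted sum)."""
    return lam_n * l_n + lam_b * l_b + lam_p * l_p + lam_s * l_s

t = total_loss(0.5, 1.0, 0.2, 4.0)  # 0.5 + 0.1 + 0.2 + 0.04
```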
7. A three-dimensional image generation system based on CGAN and LeNet-5, characterized by comprising the following modules:
the acquisition module is used for acquiring an image data set to be trained based on an open-source 3D data set ShapeNet, wherein the image data set to be trained comprises a real normal image and a 2D sketch image;
the optimization module is used for performing optimization training on the image dataset to be trained based on the CGAN network and outputting an intermediate normal image;
the rendering module is used for sequentially rendering the intermediate normal image through the 3D mesh generation model and the differentiable renderer and outputting the rendered normal image;
the training module is used for inputting the rendered normal image and the real normal image into a LeNet-5 network for training and outputting a three-dimensional image model;
the performing optimization training on the image dataset to be trained based on the CGAN network and outputting an intermediate normal image specifically comprises: inputting the image dataset to be trained into a CGAN network for training, wherein the CGAN network comprises a generator and a discriminator; mapping, by the generator of the CGAN network, a random vector onto the 2D sketch image to generate a preliminary intermediate normal image; performing, by the discriminator of the CGAN network, discrimination processing on the preliminary intermediate normal image and the real normal image, and outputting a discrimination result; optimizing the generator of the CGAN network according to the discrimination result to obtain an optimized generator; and optimizing and generating the 2D sketch image based on the optimized generator to obtain an intermediate normal image;
the performing, by the discriminator of the CGAN network, discrimination processing on the preliminary intermediate normal image and the real normal image and outputting a discrimination result specifically comprises: calculating a global distance between the preliminary intermediate normal image and the real normal image by the discriminator of the CGAN network; performing pixel sampling processing on the preliminary intermediate normal image through a Sobel filter to obtain the locally sharp features of the preliminary intermediate normal image; and integrating the global distance between the preliminary intermediate normal image and the real normal image with the locally sharp features of the preliminary intermediate normal image to construct the discrimination result.
CN202310214419.6A 2023-03-08 2023-03-08 Three-dimensional image generation method and system based on CGAN and LeNet-5 Active CN116363329B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310214419.6A CN116363329B (en) 2023-03-08 2023-03-08 Three-dimensional image generation method and system based on CGAN and LeNet-5


Publications (2)

Publication Number Publication Date
CN116363329A CN116363329A (en) 2023-06-30
CN116363329B (en) 2023-11-03

Family

ID=86932592

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310214419.6A Active CN116363329B (en) 2023-03-08 2023-03-08 Three-dimensional image generation method and system based on CGAN and LeNet-5

Country Status (1)

Country Link
CN (1) CN116363329B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113537031A (en) * 2021-07-12 2021-10-22 电子科技大学 Radar image target identification method for generating countermeasure network based on condition of multiple discriminators
CN114842136A (en) * 2022-04-08 2022-08-02 华南理工大学 Single-image three-dimensional face reconstruction method based on differentiable renderer
WO2022250401A1 (en) * 2021-05-24 2022-12-01 Samsung Electronics Co., Ltd. Methods and systems for generating three dimensional (3d) models of objects
CN115457197A (en) * 2022-08-29 2022-12-09 北京邮电大学 Face three-dimensional reconstruction model training method, reconstruction method and device based on sketch


Non-Patent Citations (4)

Title
Image-based 3D Object Reconstruction: State-of-the-Art and Trends in the Deep Learning Era; Xian-Feng Han, Hamid Laga; IEEE; full text *
3D human body modeling method based on 2D point cloud images; Zhang Guangpian, Ji Zhongping; Computer Engineering and Applications (No. 19); full text *
Research on image translation methods based on conditional generative adversarial networks; Leng Jiaming, Zeng Zhen, Liu Guangyuan, Zheng Xinyang, Liu Yinghui; Digital World (No. 09); full text *
Reconstructing a 3D human body from a single clothed image with LeNet-5; Xu Haocan, Li Jituo, Lu Guodong; Journal of Zhejiang University; Vol. 55, No. 1; full text *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant