CN116363329B - Three-dimensional image generation method and system based on CGAN and LeNet-5 - Google Patents


Info

Publication number
CN116363329B
CN116363329B (application CN202310214419.6A)
Authority
CN
China
Prior art keywords: image, normal image, cgan, network, lenet
Legal status (an assumption, not a legal conclusion): Active
Application number
CN202310214419.6A
Other languages
Chinese (zh)
Other versions
CN116363329A (en)
Inventor
刘莉
张军飞
谭文俊
王志非
Current Assignee
Zwcad Software Co ltd
Original Assignee
Zwcad Software Co ltd
Application filed by Zwcad Software Co ltd
Priority to CN202310214419.6A
Publication of CN116363329A
Application granted
Publication of CN116363329B


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/10Constructive solid geometry [CSG] using solid primitives, e.g. cylinders, cubes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0475Generative networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20Finite element generation, e.g. wire-frame surface description, tesselation
    • G06T17/205Re-meshing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The application discloses a three-dimensional image generation method and system based on CGAN and LeNet-5. The method comprises the following steps: acquiring an image dataset to be trained based on the open-source 3D dataset ShapeNet, wherein the image dataset to be trained comprises real normal images and 2D sketch images; performing optimization training on the image dataset to be trained based on the CGAN network, and outputting an intermediate normal image; sequentially rendering the intermediate normal image through a 3D mesh generation model and a differentiable renderer, and outputting a rendered normal image; and inputting the rendered normal image and the real normal image into a LeNet-5 network for training, and outputting a three-dimensional image model. With the application, a high-precision 3D image model is obtained from a single 2D sketch through the CGAN structure and the LeNet-5 structure. The method and system for generating three-dimensional images based on CGAN and LeNet-5 can be widely applied in the technical field of computer-aided design.

Description

Three-dimensional image generation method and system based on CGAN and LeNet-5
Technical Field
The application relates to the technical field of computer aided design, in particular to a three-dimensional image generation method and system based on CGAN and LeNet-5.
Background
Computer-aided design (CAD) refers to the use of a computer and its graphics equipment to assist a designer in performing design tasks. In engineering and product design, computers help designers carry out work such as computation, information storage, and drafting. When designing image data, a designer starts from a sketch, and the computer supports editing, enlarging, reducing, translating, and rotating the graphics. Sketching is an effective and intuitive way of graphically expressing ideas; its compactness and efficiency give it an important role in art creation, product engineering, and industrial design. However, there is a large gap between a sketch and a three-dimensional (3D) product, and building a 3D model from a two-dimensional sketch is the goal of sketch-based 3D shape prediction. A computer cannot, like a human, perceive 3D shape and spatial position from a 2D sketch through prior knowledge. Many studies define additional rules to obtain enough information to convert 2D sketches into 3D models, but these methods are limited to specific shapes and preconditions, and 3D shape generation becomes more cumbersome when there are too many irregular lines. Moreover, traditional deep convolutional neural networks such as AlexNet and VGGNet are composed of convolutional layers and fully connected layers and usually take images of a standard size as input, producing outputs that are not spatially arranged; the purpose of 2D-to-3D conversion is to automatically generate a 3D depth-information image from a single-parallax 2D image of arbitrary size, so outputs without spatial arrangement cannot serve the practical application of 2D-to-3D conversion.
Disclosure of Invention
In order to solve the technical problems, the application aims to provide a three-dimensional image generation method and a three-dimensional image generation system based on CGAN and LeNet-5, which realize that a high-precision 3D image model is obtained based on single 2D sketch conversion through a CGAN structure and a LeNet-5 structure.
The first technical scheme adopted by the application is as follows: the three-dimensional image generation method based on CGAN and LeNet-5 comprises the following steps:
acquiring an image dataset to be trained based on the open-source 3D dataset ShapeNet, wherein the image dataset to be trained comprises a real normal image and a 2D sketch image;
performing optimization training on the image dataset to be trained based on the CGAN network, and outputting an intermediate normal image;
sequentially rendering the intermediate normal image through the 3D grid generation model and the differentiable renderer, and outputting the rendered normal image;
and inputting the rendered normal image and the real normal image into a LeNet-5 network for training, and outputting a three-dimensional image model.
Further, the step of optimally training the image dataset to be trained based on the CGAN network and outputting the intermediate normal image specifically comprises the following steps:
inputting an image data set to be trained into a CGAN network for training, wherein the CGAN network comprises a generator and a discriminator;
mapping a random vector, conditioned on the 2D sketch image, through the generator of the CGAN network to generate a preliminary intermediate normal image;
the discriminator based on the CGAN network performs discrimination processing on the preliminary intermediate normal image and the real normal image, and outputs discrimination results;
optimizing the generator of the CGAN network according to the discrimination result to obtain an optimized generator;
and carrying out optimization generation on the 2D sketch image based on the optimized generator to obtain an intermediate normal image.
Further, the CGAN network-based discriminator performs a discrimination process on the preliminary intermediate normal image and the real normal image, and outputs a discrimination result, which specifically includes:
calculating a global distance between the preliminary intermediate normal image and the real normal image based on a discriminator of the CGAN network;
performing pixel sampling processing on the preliminary intermediate normal image through a Sobel filter to obtain locally sharp features of the preliminary intermediate normal image;
and integrating the global distance between the preliminary intermediate normal image and the real normal image with the locally sharp features of the preliminary intermediate normal image to construct the discrimination result.
Further, the expression for optimizing the generator of the CGAN network according to the discrimination result is specifically as follows:
$$g^{*}=\arg\min_{G}\max_{D}\mathcal{L}_{CGAN}(G,D)+\lambda_{g}\mathcal{L}_{global}(G)+\lambda_{l}\mathcal{L}_{local}(G)$$
In the above formula, g* denotes the optimized generator, \mathcal{L}_{global} represents the global distance between the preliminary intermediate normal image and the real normal image, \mathcal{L}_{local} represents the locally sharp features of the preliminary intermediate normal image, and λ_g and λ_l represent the weight parameters of the loss terms \mathcal{L}_{global} and \mathcal{L}_{local}, respectively.
Further, the standard objective function of the CGAN network is specifically as follows:
$$\mathcal{L}_{CGAN}(G,D)=\mathbb{E}_{x,y}[\log D(x,y)]+\mathbb{E}_{x,z}[\log(1-D(x,G(x,z)))]$$
In the above formula, G represents the generator of the CGAN network, D represents the discriminator of the CGAN network, x represents the input 2D sketch image, y represents the normal image, z represents the random vector, D(x, y) represents the probability that D judges y to be real under the condition x, D(x, G(x, z)) represents the probability that D judges the picture generated by G under the condition x to be real, and \mathbb{E} denotes the mathematical expectation.
Further, the step of sequentially rendering the intermediate normal image by the 3D mesh generation model and the differentiable renderer, and outputting the rendered normal image specifically includes:
deforming the intermediate normal image by deforming the predefined sphere mesh based on the 3D mesh generation model, generating an intermediate normal image with the 3D mesh;
and rendering the intermediate normal image with the 3D grid to a two-dimensional image by using a differential renderer, wherein the two-dimensional image is an image obtained by projecting the intermediate normal image with the 3D grid on a two-dimensional plane, and outputting the rendered normal image.
Further, the step of inputting the rendered normal image and the real normal image into a LeNet-5 network for training and outputting a three-dimensional image model specifically comprises the following steps:
performing downsampling operation on the rendered normal image to obtain a compressed normal image;
constructing a LeNet-5 network, and inputting the compressed normal image and the real normal image into the LeNet-5 network, wherein the LeNet-5 network comprises a convolutional layer C1, a pooling layer S2, a convolutional layer C3, a pooling layer S4, a convolutional layer C5, a fully connected layer F6 and an output layer;
calculating the surface normals of the compressed normal image mesh and mapping them to the RGB range [0, 1];
acquiring the ground truth of the real normal image, and calculating the normal loss between the real normal image and the compressed normal image by combining the RGB-mapped surface normals of the compressed normal image mesh;
calculating the contour loss between the real normal image and the compressed normal image through Intersection-over-Union;
using As-Rigid-As-Possible energy as the edge loss to regularize the edges of the real normal image and the compressed normal image;
introducing smoothness loss, integrating normal loss, contour loss and edge loss, carrying out weighted calculation, optimizing the LeNet-5 network, and outputting a three-dimensional image model by the optimized LeNet-5 network.
Further, the formula for weighting calculation by integrating the normal loss, the contour loss and the edge loss is as follows:
$$\mathcal{L}=\lambda_{n}\mathcal{L}_{n}+\lambda_{b}\mathcal{L}_{b}+\lambda_{p}\mathcal{L}_{p}+\lambda_{s}\mathcal{L}_{s}$$
In the above formula, \mathcal{L} represents the final loss of the LeNet-5 network, \mathcal{L}_{n} represents the normal loss, \mathcal{L}_{b} represents the edge loss, \mathcal{L}_{p} represents the contour loss, \mathcal{L}_{s} represents the smoothness loss, and λ_n, λ_b, λ_p and λ_s represent the weight parameters corresponding to the respective loss terms.
The second technical scheme adopted by the application is as follows: a three-dimensional image generation system based on CGAN and LeNet-5, comprising:
the acquisition module is used for acquiring an image data set to be trained based on an open-source 3D data set ShapeNet, wherein the image data set to be trained comprises a real normal image and a 2D sketch image;
the optimization module performs optimization training on the image data set to be trained based on the CGAN network and outputs an intermediate normal image;
the rendering module is used for sequentially rendering the intermediate normal image through the 3D mesh generation model and the differentiable renderer and outputting the rendered normal image;
and the training module is used for inputting the rendered normal image and the real normal image into the LeNet-5 network for training and outputting a three-dimensional image model.
The method and the system have the beneficial effects that: according to the application, a 2D sketch generated from the ShapeNet dataset is input into the CGAN structure and the LeNet-5 structure to predict and generate a 3D model. The 2D sketch image is first converted into its normal image through the CGAN network structure; the normal image provides geometric surface information that helps recover concave features. The CGAN network comprises two modules, a generator model and a discriminator model, whose mutual game learning produces comparatively good graphic output with specific attributes. The 3D model is then generated through the LeNet-5 network structure; the LeNet-5 network has feature-learning capability and can perform translation-invariant classification of input information according to its hierarchical structure. The application can thus obtain a high-precision 3D image model from a single 2D sketch.
Drawings
FIG. 1 is a flow chart of the steps of the CGAN and LeNet-5 based three-dimensional image generation method of the present application;
FIG. 2 is a block diagram of a three-dimensional image generation system based on CGAN and LeNet-5 of the present application;
FIG. 3 is a schematic diagram of the structure of the CGAN and LeNet-5 neural network of the present application;
FIG. 4 is a schematic flow chart of three-dimensional image generation in accordance with an embodiment of the present application;
FIG. 5 is a graph of experimental results obtained in accordance with an embodiment of the present application.
Detailed Description
The application will now be described in further detail with reference to the drawings and to specific examples. The step numbers in the following embodiments are set for convenience of illustration only, and the order between the steps is not limited in any way, and the execution order of the steps in the embodiments may be adaptively adjusted according to the understanding of those skilled in the art.
Because of the sparsity and irregularity of sketches, extracting complete 3D shapes from a sketch remains a problem that is highly limited to specific shapes and preconditions. The development of deep neural networks provides new ideas for generating 3D models from sketches: neural networks have the potential to learn 3D model priors to predict the 3D shape corresponding to a sketch, by internally extracting a set of patterns or associations between the different variables/parameters in a dataset and then using these patterns to predict the output of interest for a given set of input variables;
the generated countermeasure network (GAN) is a deep learning model, and is one of the most promising methods for unsupervised learning on complex distribution in recent years; the model is built up of two modules in the frame: the mutual game learning of the generator model and the discriminator model produces relatively good output, and since GAN cannot generate pictures with specific attributes, a conditional generation countermeasure network (CGAN) is a network optimized for the problem, and the core of the method is that attribute information y is integrated into the generator G and the discriminator D, and the attribute y can be any label information, such as the category of images, facial expression of face images, and the like; convolutional Neural Networks (CNNs) are a type of feed-forward neural network comprising convolutional computation and having a depth structure, the convolutional neural networks have a characteristic learning capability, and can perform translation invariant classification on input information according to a hierarchical structure of the convolutional neural networks, leNet-5 is one type of CNNs, the structure of the convolutional neural networks is 7 layers in total, each layer is provided with a plurality of Feature maps, and each Feature Map is a Feature of an input extracted through a convolutional filter.
The application provides a framework based on the CGAN and LeNet-5 models: an end-to-end learning framework that can generate a 3D model from a single 2D sketch image. The learning framework first converts the sketch image into its normal image using the CGAN, and then constructs the 3D shape from the normal image using LeNet-5, so that it can cope with the challenges of 3D shape modeling from a single sketch; applying the CGAN and LeNet-5 neural networks to the field of 3D model generation is therefore of great significance. The key point of this work is that the 2D sketch is input to the CGAN network, which converts it into a normal image; the normal image is then input to the LeNet-5 network to generate a high-precision 3D model. The CGAN network includes two modules: mutual game learning between the generator model and the discriminator model produces comparatively good graphical outputs with specific attributes. LeNet-5 is a feedforward neural network that includes convolutional computation and has a deep structure; such a convolutional neural network has feature-learning capability and can perform translation-invariant classification of input information according to its hierarchical structure. The application combines the CGAN structure and the LeNet-5 structure into an end-to-end learning framework for generating a 3D model from a single 2D sketch image. The learning proceeds in two steps: converting the 2D sketch image into its normal image, and then restoring the 3D shape.
Referring to fig. 1, the present application provides a three-dimensional image generation method based on CGAN and LeNet-5, the method comprising the steps of:
S1, acquiring an image dataset to be trained based on the open-source 3D dataset ShapeNet, wherein the image dataset to be trained comprises a real normal image and a 2D sketch image;
Specifically, the application focuses on using a data-driven method to improve the accuracy of generating a 3D model from a single 2D sketch, using the open-source 3D dataset ShapeNet, an annotated 3D shape dataset commonly used to support computer graphics research. In the application, each 3D model in the dataset generates a set Y of 256×256-pixel real normal images and a dataset X of sketch images from 24 azimuth angles, and the data are divided into a training set, a validation set, and a test set in the proportion 8:1:1.
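The per-model 8:1:1 split and 24-azimuth rendering described above can be sketched as follows. The helper names (`split_dataset`, `render_pairs`), the file-naming scheme, and the 15-degree azimuth step are illustrative assumptions, not details taken from the patent.

```python
import random

def split_dataset(model_ids, seed=0):
    """Shuffle ShapeNet model ids and split them 8:1:1 into train/val/test."""
    ids = list(model_ids)
    random.Random(seed).shuffle(ids)  # seeded for a reproducible split
    n_train = int(len(ids) * 0.8)
    n_val = int(len(ids) * 0.1)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

def render_pairs(model_id, n_views=24):
    """Each model yields 24 (sketch, normal) image pairs, one per azimuth step."""
    return [(f"{model_id}_az{k * 15:03d}_sketch.png",
             f"{model_id}_az{k * 15:03d}_normal.png") for k in range(n_views)]

train, val, test = split_dataset([f"model_{i:04d}" for i in range(100)])
```

With 100 models this yields 80/10/10 models, i.e. 1920/240/240 image pairs at 24 views each.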
S2, optimizing training is carried out on the image data set to be trained based on the CGAN network, and an intermediate normal image is output;
Specifically, normal image generation from a single sketch image is treated as image-to-image translation. For the normal image generator, the CGAN architecture is a common option for image-to-image conversion. The normal image generator is optimized using a CGAN that mixes a global L1 distance with a regularizer sampling locally sharp features. In each training iteration, the 256×256 sketch images X = {X1, X2, …, Xn} from step S1 are fed to the generator G of the CGAN structure, and the generator produces an intermediate normal image set; the intermediate normal images and the real normal image set Y = {Y1, Y2, …, Yn} from step S1 are then fed to the discriminator, whose task is to receive real data or generated data and try to predict whether the input is real (true) or fake (false). After training, the CGAN can generate a good normal image N from an input sketch;
the generator G maps the random vector z to an image y, namely G is x-y, and the CGANs condition corresponds to additional information x, wherein x is a sketch image in the application; the standard objective function of CGANs is defined as:
in the above formula, G represents a generator of the CGAN network, D represents a discriminator of the CGAN network, x represents an input 2D sketch image, y represents a normal image, z represents a random vector, D (x, y) represents a probability that D judges whether y is true under the condition of x, D (x, G (x, z))) represents a probability that D judges whether a picture generated by G under the condition of x is true,representing mathematical expectations;
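As a purely numerical illustration (not the patent's implementation), this objective can be estimated from batches of discriminator scores. An uninformative discriminator that outputs 0.5 everywhere gives 2·log 0.5 ≈ −1.386, the classic GAN equilibrium value; a discriminator that separates real from fake well pushes the value toward 0.

```python
import math

def cgan_objective(d_real, d_fake):
    """Monte-Carlo estimate of E[log D(x,y)] + E[log(1 - D(x,G(x,z)))]
    from discriminator scores on real pairs (d_real) and generated pairs (d_fake)."""
    term_real = sum(math.log(p) for p in d_real) / len(d_real)
    term_fake = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return term_real + term_fake

# D maximizes this value; G minimizes the second term.
confident = cgan_objective([0.99, 0.98], [0.01, 0.02])   # D separates well: near 0
equilibrium = cgan_objective([0.5, 0.5], [0.5, 0.5])     # 2 * log(0.5)
```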
the distance between the generated normal image and the real image is calculated to regularize the global image distribution; compared with the L1 norm, the L2 norm encourages image blurring, so the global distance uses the L1 norm:
$$\mathcal{L}_{global}(G)=\mathbb{E}_{x,y,z}\left[\,\lVert y-G(x,z)\rVert_{1}\,\right]$$
furthermore, in each training iteration, some pixels are sampled from regions with sharp geometric features (such as corners and edges) by applying a Sobel filter to the normal image, to strengthen the constraint on local features:
$$\mathcal{L}_{local}(G)=\mathbb{E}\left[\,\lVert\hat{\omega}-\omega\rVert_{1}\,\right]$$
In the above formula, \hat{\omega} is a pixel sampled from the generated normal image and ω is the corresponding pixel of the real normal image;
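A minimal sketch of this local-feature term, under two assumptions the patent leaves open: the Sobel response of the real normal image selects which pixels are sampled, and the loss is a mean L1 over those pixels.

```python
def sobel_magnitude(img):
    """Gradient magnitude of a 2-D grayscale image (list of lists) via 3x3 Sobel."""
    h, w = len(img), len(img[0])
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
    out = [[0.0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            gx = sum(kx[a][b] * img[i - 1 + a][j - 1 + b]
                     for a in range(3) for b in range(3))
            gy = sum(ky[a][b] * img[i - 1 + a][j - 1 + b]
                     for a in range(3) for b in range(3))
            out[i][j] = (gx * gx + gy * gy) ** 0.5
    return out

def local_feature_loss(gen, real, thresh=1.0):
    """Mean L1 distance over pixels whose Sobel response in `real` exceeds thresh."""
    mag = sobel_magnitude(real)
    picks = [(i, j) for i, row in enumerate(mag) for j, m in enumerate(row) if m > thresh]
    if not picks:
        return 0.0
    return sum(abs(gen[i][j] - real[i][j]) for i, j in picks) / len(picks)
```

On a vertical step edge the filter fires only along the edge columns, so the loss is evaluated exactly where geometry is sharp.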
the normal image generator is obtained by optimizing the final objective, whose expression is:
$$g^{*}=\arg\min_{G}\max_{D}\mathcal{L}_{CGAN}(G,D)+\lambda_{g}\mathcal{L}_{global}(G)+\lambda_{l}\mathcal{L}_{local}(G)$$
In the above formula, g* denotes the optimized normal image generator, and λ_g and λ_l are the weight parameters of \mathcal{L}_{global} and \mathcal{L}_{local}, respectively.
S3, sequentially rendering the intermediate normal image through the 3D mesh generation model and the differentiable renderer, and outputting the rendered normal image;
S31, deforming the predefined sphere mesh to generate a 3D mesh;
Specifically, S3 generates a 3D mesh from the normal image output by the CGAN structure in S2 by deforming a predefined sphere mesh; the predefined sphere mesh has genus-0 topology and can be deformed into any shape of the same genus. The deformation process is expressed as
$$\hat{v}_{i}=v_{i}+\Delta v_{i}+\Delta v_{g}$$
where v_i is a vertex of the mesh, Δv_i is the local offset of each vertex, and Δv_g is a global offset; these two offset vectors are the outputs of the mesh predictor.
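The deformation step can be sketched as adding a per-vertex local offset and one shared global offset to each vertex of the base sphere. In practice both offsets come from the mesh predictor; the values below are placeholders.

```python
def deform(vertices, local_offsets, global_offset):
    """Apply v_hat_i = v_i + dv_i + dv_g to every vertex of the base mesh."""
    return [tuple(v[k] + d[k] + global_offset[k] for k in range(3))
            for v, d in zip(vertices, local_offsets)]

base = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]        # two vertices of the base sphere
local = [(0.1, 0.0, 0.0), (0.0, -0.2, 0.0)]      # per-vertex offsets (placeholder)
shifted = deform(base, local, (0.0, 0.0, 0.5))   # plus one global offset
```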
S32, providing a differentiable renderer;
Specifically, a differentiable renderer is constructed to render the 3D mesh model generated in step S31 back into a 2D normal image. Standard real-time rendering is the process of drawing a 3D model onto a computer display, going from 3D vertices to a 2D rendering. The vertex and fragment shaders are easily defined in a fully differentiable manner, but because of its discrete sampling operation, rasterization is not differentiable; a differentiable rasterization formulation is therefore proposed so that the 3D mesh model can be rendered back into a 2D normal image;
let A_i(x_i, y_i) be a single pixel of the image, and denote its color by I_i; the gradient can be expressed as ∂I_i/∂x_i. Suppose pixel A_i lies off the projected face f_j; when f_j moves to A_i, the pixel's color becomes I'_i. Let D(A_i, f_j) denote the displacement between pixel A_i and the projected face f_j along the x and y coordinates; the derivative of I_i is then defined as:
$$\frac{\partial I_{i}}{\partial x_{i}}=\frac{I'_{i}-I_{i}}{D_{x}(A_{i},f_{j})},\qquad \frac{\partial I_{i}}{\partial y_{i}}=\frac{I'_{i}-I_{i}}{D_{y}(A_{i},f_{j})}$$
S4, inputting the rendered normal image and the real normal image into a LeNet-5 network for training, and outputting a three-dimensional image model.
Specifically, a LeNet-5 neural network is constructed to optimize the 3D mesh model predictor. The shape predictor is an encoder-decoder architecture using a predefined sphere with 642 vertices as the base mesh. In the training phase, the 2D normal image rendered in step S3 and the real normal image generated in step S1 are first downsampled to 64×64; the resized images are then fed to the LeNet-5 network, the LeNet-5 network is trained, and the parameters of the 3D mesh model predictor are continuously optimized. The optimized 3D mesh model predictor can finally generate a high-precision 3D model.
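The 256×256 → 64×64 resizing can be done with 4×4 average pooling, sketched below. The patent only states the target size, so average pooling is an assumption about the resampling method.

```python
def downsample(img, factor=4):
    """Average-pool a 2-D image (list of lists) by `factor` per axis,
    e.g. 256x256 -> 64x64 when factor=4."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(0, h, factor):
        row = []
        for j in range(0, w, factor):
            block = [img[i + a][j + b] for a in range(factor) for b in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out
```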
In each iteration of the training process, the surface normals of the mesh are first calculated and mapped to the RGB range [0, 1]; these values are then rendered into a normal image \hat{N} by the differentiable renderer, while the ground-truth normal image is N. Considering that the normal image contains both 2D contour information and 3D mesh surface details, and that the L1 distance preserves more sharp features than the L2 distance, the L1 distance between \hat{N} and N is calculated as the normal loss:
$$\mathcal{L}_{n}=\lVert\hat{N}-N\rVert_{1}$$
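The normal-to-RGB mapping and the L1 normal loss can be sketched as follows; taking the per-pixel mean of the L1 distance is an assumption, since the patent only specifies the L1 distance itself.

```python
def normal_to_rgb(n):
    """Map a unit surface normal with components in [-1, 1] to the RGB range [0, 1]."""
    return tuple(0.5 * (c + 1.0) for c in n)

def normal_loss(pred, true):
    """Mean per-pixel L1 distance between rendered (pred) and ground-truth (true)
    normal images, each given as a flat list of RGB triples."""
    total = sum(abs(p - t) for pp, tt in zip(pred, true) for p, t in zip(pp, tt))
    return total / len(pred)
```

For example, the "straight up" normal (0, 0, 1) maps to the RGB triple (0.5, 0.5, 1.0), the familiar light-blue tint of normal maps.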
then let \hat{P} and P represent the predicted and true contours, respectively; Intersection-over-Union (IoU) is used as the contour loss, which is defined as:
$$\mathcal{L}_{p}=1-\frac{\lVert\hat{P}\otimes P\rVert_{1}}{\lVert\hat{P}+P-\hat{P}\otimes P\rVert_{1}}$$
In the above formula, the symbol ⊗ represents the element-wise product.
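The contour loss, 1 − IoU with the element-wise product as the intersection, can be sketched over flattened silhouette masks (soft values in [0, 1] also work):

```python
def contour_loss(pred, true):
    """1 - IoU between predicted and true silhouettes, given as flat lists of
    mask values in [0, 1]; p*t acts as the element-wise intersection and
    p + t - p*t as the element-wise union."""
    inter = sum(p * t for p, t in zip(pred, true))
    union = sum(p + t - p * t for p, t in zip(pred, true))
    return 1.0 - inter / union
```

Identical masks give 0, disjoint masks give 1, and a half-overlap gives 0.5.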
In addition, the edge is normalized using As-rgid-As-poisable energy As edge loss:
in the above, b i Representing an original edge in edge set B of the gridIs with b i The corresponding current edge, n, is the number of edges;
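A sketch of the edge loss as the mean squared change in edge length between the base mesh and the deformed mesh — a simplified stand-in for the full As-Rigid-As-Possible energy, which the patent does not spell out:

```python
def length(v0, v1):
    """Euclidean length of the edge between vertices v0 and v1."""
    return sum((a - b) ** 2 for a, b in zip(v0, v1)) ** 0.5

def edge_loss(orig_edges, curr_edges):
    """Mean of (|b_hat_i| - |b_i|)^2 over corresponding edge pairs, penalizing
    edges that stretch or shrink during deformation."""
    return sum((length(*c) - length(*o)) ** 2
               for o, c in zip(orig_edges, curr_edges)) / len(orig_edges)
```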
a smoothness loss is introduced that acts directly on the predicted mesh and ensures consistency of the surface:
$$\mathcal{L}_{s}=\sum_{\langle f_{i},f_{j}\rangle\in F}\left(\cos\theta_{ij}+1\right)^{2}$$
In the above formula, θ_ij denotes the dihedral angle ⟨f_i, f_j⟩ between two adjacent faces, and F represents the set of all pairs of adjacent faces;
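A sketch of the smoothness term using unit face normals: with θ the dihedral angle between two adjacent faces, cos θ = −n_i · n_j, so coplanar neighbors (parallel normals) contribute zero and sharply folded ones are penalized. Expressing the term through normals is an assumption of this sketch.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def smoothness_loss(normal_pairs):
    """Sum of (cos(theta) + 1)^2 over adjacent face pairs, where theta is the
    dihedral angle and each pair is given as the two unit face normals."""
    return sum((-dot(ni, nj) + 1.0) ** 2 for ni, nj in normal_pairs)
```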
the final loss of the mesh prediction is the weighted sum of the above losses:
$$\mathcal{L}=\lambda_{n}\mathcal{L}_{n}+\lambda_{b}\mathcal{L}_{b}+\lambda_{p}\mathcal{L}_{p}+\lambda_{s}\mathcal{L}_{s}$$
In the above formula, \mathcal{L} represents the final loss of the LeNet-5 network, \mathcal{L}_{n} the normal loss, \mathcal{L}_{b} the edge loss, \mathcal{L}_{p} the contour loss, and \mathcal{L}_{s} the smoothness loss; λ_n, λ_b, λ_p and λ_s represent the weight parameters corresponding to the respective loss terms.
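Combining the four terms is then a direct weighted sum; the default weight values below are placeholders, since the patent does not disclose the λ values.

```python
def total_loss(l_n, l_b, l_p, l_s, lam_n=1.0, lam_b=0.1, lam_p=1.0, lam_s=0.01):
    """L = lam_n*L_n + lam_b*L_b + lam_p*L_p + lam_s*L_s (illustrative weights)."""
    return lam_n * l_n + lam_b * l_b + lam_p * l_p + lam_s * l_s
```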
In summary, a 2D sketch generated from the ShapeNet dataset is input into the CGAN structure and the LeNet-5 structure to predict and generate a 3D model. The learning framework has two steps: the 2D sketch image is first converted into a normal image through the CGAN network structure, and the 3D model is then generated through the LeNet-5 network structure.
Referring to fig. 2, a CGAN and LeNet-5 based three-dimensional image generation system comprising:
the acquisition module is used for acquiring an image data set to be trained based on an open-source 3D data set ShapeNet, wherein the image data set to be trained comprises a real normal image and a 2D sketch image;
the optimization module performs optimization training on the image data set to be trained based on the CGAN network and outputs an intermediate normal image;
the rendering module is used for sequentially rendering the intermediate normal image through the 3D mesh generation model and the differentiable renderer and outputting the rendered normal image;
and the training module is used for inputting the rendered normal image and the real normal image into the LeNet-5 network for training and outputting a three-dimensional image model.
The drawings of the present application are explained:
FIG. 3 shows the CGAN neural network structure, mainly comprising a generator and a discriminator, whose input is a sketch and whose output is a normal image; the LeNet-5 neural network structure comprises 3 convolutional layers, 2 pooling layers and 2 fully connected layers, whose input is a normal image and whose output is a 3D model;
FIG. 4 is the system flow diagram showing the steps of the overall system: extracting a 2D sketch and a real normal image from the ShapeNet three-dimensional dataset; inputting the 2D sketch and the real normal image into the CGAN neural network, which generates a normal image; and inputting the generated normal image and the real normal image into the LeNet-5 neural network structure to generate a 3D model;
FIG. 5 is a diagram showing the experimental results of a 2D sketch to generate a 3D model through a CGAN structure and a LeNet-5 structure; where (a) is an input 2D sketch and a generated normal image, and (b) and (c) are 3D models generated from two perspective renderings, the normal image providing geometric surface information that facilitates the restoration of the concave features. Experimental results show that the CGAN structure and the LeNet-5 structure have better performance in the experiment, and the 3D model generated by the 2D sketch shows higher precision.
The content in the method embodiment is applicable to the system embodiment, the functions specifically realized by the system embodiment are the same as those of the method embodiment, and the achieved beneficial effects are the same as those of the method embodiment.
While the preferred embodiment of the present application has been described in detail, the application is not limited to the embodiment, and various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the application, and these equivalent modifications and substitutions are intended to be included in the scope of the present application as defined in the appended claims.

Claims (7)

1. A three-dimensional image generation method based on CGAN and LeNet-5, characterized by comprising the following steps:
acquiring an image dataset to be trained based on the open-source 3D dataset ShapeNet, wherein the image dataset to be trained comprises a real normal image and a 2D sketch image;
performing optimization training on the image dataset to be trained based on the CGAN network, and outputting an intermediate normal image;
sequentially rendering the intermediate normal image through a 3D mesh generation model and a differentiable renderer, and outputting a rendered normal image;
inputting the rendered normal image and the real normal image into a LeNet-5 network for training, and outputting a three-dimensional image model;
the step of performing optimization training on the image dataset to be trained based on the CGAN network and outputting the intermediate normal image specifically comprises the following steps:
inputting an image data set to be trained into a CGAN network for training, wherein the CGAN network comprises a generator and a discriminator;
mapping, by the generator of the CGAN network, a random vector onto the 2D sketch image to generate a preliminary intermediate normal image;
the discriminator based on the CGAN network performs discrimination processing on the preliminary intermediate normal image and the real normal image, and outputs discrimination results;
optimizing the generator of the CGAN network according to the discrimination result to obtain an optimized generator;
optimizing and generating the 2D sketch image based on the optimized generator to obtain an intermediate normal image;
the step of the discriminator based on the CGAN network for discriminating the preliminary intermediate normal image and the real normal image and outputting the discrimination result specifically comprises the following steps:
calculating a global distance between the preliminary intermediate normal image and the real normal image based on a discriminator of the CGAN network;
performing pixel sampling processing on the preliminary intermediate normal image through a Sobel filter to obtain the locally sharp features of the preliminary intermediate normal image;
and integrating the global distance between the preliminary intermediate normal image and the real normal image with the locally sharp features of the preliminary intermediate normal image to construct the discrimination result.
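Claim 1's local term relies on a Sobel filter to expose locally sharp features. A minimal NumPy sketch of Sobel edge-magnitude extraction follows; the exact sampling scheme used by the discriminator is not specified in the claim, so this shows only the standard 3×3 Sobel operator:

```python
import numpy as np

def sobel_edges(img):
    """Return the Sobel gradient magnitude of a 2D grayscale image."""
    kx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=np.float32)  # horizontal gradient kernel
    ky = kx.T                                      # vertical gradient kernel
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")           # replicate border pixels
    gx = np.zeros_like(img, dtype=np.float32)
    gy = np.zeros_like(img, dtype=np.float32)
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(patch * kx)
            gy[i, j] = np.sum(patch * ky)
    return np.sqrt(gx ** 2 + gy ** 2)

# A vertical step edge produces a strong response along the boundary
# and none in flat regions.
img = np.zeros((8, 8), dtype=np.float32)
img[:, 4:] = 1.0
edges = sobel_edges(img)
```

In practice this per-pixel loop would be replaced by a vectorized convolution, but the kernel and magnitude computation are the same.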
2. The three-dimensional image generation method based on CGAN and LeNet-5 according to claim 1, wherein the expression for optimizing the generator of the CGAN network according to the discrimination result is as follows:
L_G = λ_g·L_g + λ_l·L_l
In the above formula, L_g represents the global distance between the preliminary intermediate normal image and the real normal image, L_l represents the locally sharp features of the preliminary intermediate normal image, and λ_g and λ_l represent the weight parameters of the loss functions L_g and L_l, respectively.
3. The three-dimensional image generating method based on CGAN and LeNet-5 according to claim 2, wherein the standard objective function of the CGAN network is specifically as follows:
in the above formula, G represents a generator of the CGAN network, D represents a discriminator of the CGAN network, x represents an input 2D sketch image, y represents a normal image, z represents a random vector, D (x, y) represents a probability that D judges whether y is true under the condition of x, D (x, G (x, z))) represents a probability that D judges whether a picture generated by G under the condition of x is true,representing mathematical expectations.
4. The three-dimensional image generation method based on CGAN and LeNet-5 according to claim 3, wherein the step of sequentially rendering the intermediate normal image through the 3D mesh generation model and the differentiable renderer and outputting the rendered normal image specifically comprises the following steps:
generating an intermediate normal image with a 3D mesh by deforming a predefined sphere mesh based on the 3D mesh generation model;
and rendering the intermediate normal image with the 3D mesh into a two-dimensional image by using a differentiable renderer, wherein the two-dimensional image is obtained by projecting the intermediate normal image with the 3D mesh onto a two-dimensional plane, and outputting the rendered normal image.
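Claim 4's renderer projects the deformed 3D mesh onto a two-dimensional plane. As a non-differentiable stand-in that shows only the projection geometry, here is an orthographic point-splat of mesh vertices into a binary image; a real differentiable renderer would rasterize faces with soft visibility, so this is a deliberately simplified sketch:

```python
import numpy as np

def orthographic_silhouette(vertices, size=32):
    """Project 3D vertices with x, y in [-1, 1] onto a size x size binary
    mask by dropping z (orthographic projection along the view axis)."""
    mask = np.zeros((size, size), dtype=np.uint8)
    xy = vertices[:, :2]
    # Map [-1, 1] coordinates to pixel indices [0, size - 1].
    px = np.clip(((xy + 1.0) * 0.5 * (size - 1)).round().astype(int),
                 0, size - 1)
    mask[px[:, 1], px[:, 0]] = 1
    return mask

# The eight corners of a cube collapse onto the four image corners
# under orthographic projection.
cube = np.array([[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)],
                dtype=float)
mask = orthographic_silhouette(cube, size=8)
```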
5. The three-dimensional image generation method based on CGAN and LeNet-5 according to claim 4, wherein the step of inputting the rendered normal image and the real normal image into the LeNet-5 network for training and outputting the three-dimensional image model specifically comprises the following steps:
performing downsampling operation on the rendered normal image to obtain a compressed normal image;
constructing a LeNet-5 network, and inputting the compressed normal image and the real normal image into the LeNet-5 network, wherein the LeNet-5 network comprises a convolution layer C1, a pooling layer S2, a convolution layer C3, a pooling layer S4, a convolution layer C5, a fully-connected layer F6 and an output layer;
calculating the surface normal of the compressed normal image grid and mapping the surface normal to RGB range [0,1];
acquiring the true value of the real normal image, and calculating the normal loss between the real normal image and the compressed normal image by combining the RGB-range mapping of the surface normals of the compressed normal image mesh;
calculating the contour loss of the real normal image and the compressed normal image through Intersection-Over-Union;
using As-Rigid-As-Possible energy as the edge loss to regularize the edge difference between the real normal image and the compressed normal image;
introducing smoothness loss, integrating normal loss, contour loss and edge loss, carrying out weighted calculation, optimizing the LeNet-5 network, and outputting a three-dimensional image model by the optimized LeNet-5 network.
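Claim 5 combines several losses, including a normal loss computed after mapping unit surface normals into the RGB range [0, 1] and a contour loss based on Intersection-over-Union. A NumPy sketch of those two ingredients; the claim does not fix the exact distance measures, so L1 for the normal term and 1 − IoU for the contour term are assumptions here:

```python
import numpy as np

def normals_to_rgb(normals):
    """Map unit surface normals in [-1, 1]^3 to RGB values in [0, 1]."""
    return (normals + 1.0) * 0.5

def normal_loss(pred_normals, true_normals):
    """Mean absolute difference of the RGB-mapped normal maps (L1, assumed)."""
    return np.mean(np.abs(normals_to_rgb(pred_normals)
                          - normals_to_rgb(true_normals)))

def contour_loss(pred_mask, true_mask, eps=1e-8):
    """1 - IoU between binary silhouettes (Intersection-over-Union)."""
    inter = np.logical_and(pred_mask, true_mask).sum()
    union = np.logical_or(pred_mask, true_mask).sum()
    return 1.0 - inter / (union + eps)

# Identical inputs give (near-)zero loss; disjoint silhouettes give 1.
n = np.array([[0.0, 0.0, 1.0]])
m = np.array([[1, 1], [0, 0]], dtype=bool)
ln = normal_loss(n, n)
lp = contour_loss(m, m)
lp2 = contour_loss(m, ~m)
```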
6. The three-dimensional image generation method based on CGAN and LeNet-5 according to claim 5, wherein the formula for introducing the smoothness loss and integrating the normal loss, contour loss and edge loss for weighted calculation is as follows:
L = λ_n·L_n + λ_b·L_b + λ_p·L_p + λ_s·L_s
In the above formula, L represents the final loss of the LeNet-5 network, L_n represents the normal loss, L_b represents the edge loss, L_p represents the contour loss, L_s represents the smoothness loss, and λ_n, λ_b, λ_p and λ_s represent the weight parameters corresponding to each loss function.
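The claim-6 total loss is a straightforward weighted sum of the four terms. As a one-line illustration (the weight values below are arbitrary placeholders, not values from the patent):

```python
def total_loss(l_n, l_b, l_p, l_s,
               lam_n=1.0, lam_b=0.1, lam_p=1.0, lam_s=0.01):
    """L = λ_n·L_n + λ_b·L_b + λ_p·L_p + λ_s·L_s (claim-6 weighted sum)."""
    return lam_n * l_n + lam_b * l_b + lam_p * l_p + lam_s * l_s

t = total_loss(0.5, 1.0, 0.2, 4.0)  # 0.5 + 0.1 + 0.2 + 0.04
```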
7. A three-dimensional image generation system based on CGAN and LeNet-5, characterized by comprising the following modules:
the acquisition module is used for acquiring an image data set to be trained based on an open-source 3D data set ShapeNet, wherein the image data set to be trained comprises a real normal image and a 2D sketch image;
the optimization module is used for performing optimization training on the image dataset to be trained based on the CGAN network and outputting an intermediate normal image;
the rendering module is used for sequentially rendering the intermediate normal image through the 3D mesh generation model and the differentiable renderer and outputting the rendered normal image;
the training module is used for inputting the rendered normal image and the real normal image into a LeNet-5 network for training and outputting a three-dimensional image model;
the performing optimization training on the image dataset to be trained based on the CGAN network and outputting an intermediate normal image specifically comprises: inputting the image dataset to be trained into a CGAN network for training, wherein the CGAN network comprises a generator and a discriminator; mapping, by the generator of the CGAN network, a random vector onto the 2D sketch image to generate a preliminary intermediate normal image; performing, by the discriminator of the CGAN network, discrimination processing on the preliminary intermediate normal image and the real normal image, and outputting a discrimination result; optimizing the generator of the CGAN network according to the discrimination result to obtain an optimized generator; and optimizing and generating the 2D sketch image based on the optimized generator to obtain an intermediate normal image;
the performing, by the discriminator of the CGAN network, discrimination processing on the preliminary intermediate normal image and the real normal image and outputting a discrimination result specifically comprises: calculating a global distance between the preliminary intermediate normal image and the real normal image by the discriminator of the CGAN network; performing pixel sampling processing on the preliminary intermediate normal image through a Sobel filter to obtain the locally sharp features of the preliminary intermediate normal image; and integrating the global distance between the preliminary intermediate normal image and the real normal image with the locally sharp features of the preliminary intermediate normal image to construct the discrimination result.
CN202310214419.6A 2023-03-08 2023-03-08 Three-dimensional image generation method and system based on CGAN and LeNet-5 Active CN116363329B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310214419.6A CN116363329B (en) 2023-03-08 2023-03-08 Three-dimensional image generation method and system based on CGAN and LeNet-5


Publications (2)

Publication Number Publication Date
CN116363329A CN116363329A (en) 2023-06-30
CN116363329B (en) 2023-11-03

Family

ID=86932592

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310214419.6A Active CN116363329B (en) 2023-03-08 2023-03-08 Three-dimensional image generation method and system based on CGAN and LeNet-5

Country Status (1)

Country Link
CN (1) CN116363329B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113537031A (en) * 2021-07-12 2021-10-22 电子科技大学 Radar image target identification method for generating countermeasure network based on condition of multiple discriminators
CN114842136A (en) * 2022-04-08 2022-08-02 华南理工大学 Single-image three-dimensional face reconstruction method based on differentiable renderer
WO2022250401A1 (en) * 2021-05-24 2022-12-01 Samsung Electronics Co., Ltd. Methods and systems for generating three dimensional (3d) models of objects
CN115457197A (en) * 2022-08-29 2022-12-09 北京邮电大学 Face three-dimensional reconstruction model training method, reconstruction method and device based on sketch


Non-Patent Citations (4)

Title
Image-based 3D Object Reconstruction: State-of-the-Art and Trends in the Deep Learning Era; Xian-Feng Han, Hamid Laga; IEEE; full text *
3D human body modeling method based on 2D point cloud images; Zhang Guangpian, Ji Zhongping; Computer Engineering and Applications (No. 19); full text *
Research on image translation methods based on conditional generative adversarial networks; Leng Jiaming, Zeng Zhen, Liu Guangyuan, Zheng Xinyang, Liu Yinghui; Digital World (No. 09); full text *
Reconstructing a 3D human body from a single clothed image with LeNet-5; Xu Haocan, Li Jituo, Lu Guodong; Journal of Zhejiang University; Vol. 55, No. 1; full text *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant