CN112489198A - Three-dimensional reconstruction system and method based on adversarial learning - Google Patents
Three-dimensional reconstruction system and method based on adversarial learning
- Publication number: CN112489198A
- Application number: CN202011371730.4A
- Authority: CN (China)
- Legal status: Pending
Classifications
- G06T17/00: Three-dimensional [3D] modelling, e.g. data description of 3D objects
- G06N3/02: Neural networks
- G06N3/04: Architecture, e.g. interconnection topology
- G06N3/045: Combinations of networks
Abstract
The invention discloses a three-dimensional reconstruction system and method based on adversarial learning. The invention adopts the GAN principle to achieve high-quality three-dimensional reconstruction, provides a new adversarial-learning three-dimensional reconstruction framework, and iteratively improves and converges an original three-dimensional reconstruction model by training a GAN model. The model uses only real-time two-dimensional observation images as weak supervision and depends on neither prior knowledge of a shape model nor any three-dimensional reference data. The invention provides a contactless, convenient technique for rapidly reconstructing the three-dimensional shape of an object from views; it is applicable to many fields, such as integrated ship support, virtual equipment maintenance, interactive electronic technical manuals, film, animation, virtual reality, augmented reality and industrial manufacturing, and has broad market prospects.
Description
Technical Field
The invention belongs to the technical field of computer three-dimensional reconstruction, and particularly relates to a three-dimensional reconstruction system and method based on adversarial learning.
Background Art
In the fields of computer graphics and computer vision, three-dimensional reconstruction is a technique for recovering the shape, structure and appearance of real objects. Owing to its rich and intuitive expressive power, three-dimensional reconstruction is widely applied in equipment support, virtual maintenance, construction, geology, archaeology, games, virtual reality and other fields. In the past, researchers have made significant progress in three-dimensional reconstruction. Traditional three-dimensional reconstruction methods such as SFM (Structure from Motion) and MVS (Multi-View Stereo) proceed as follows: first, seek feature matches between two images and estimate an initial two-view three-dimensional reconstruction; then iteratively add new images on the basis of the two-view result, matching features between each newly added image and the previous ones; finally, reconstruct the three-dimensional model by means of triangulation, structure and motion, bundle adjustment and the like. However, the time complexity of traditional SFM and MVS methods is typically high. In addition, when the surface of the reconstructed object lacks texture or exhibits specular reflection, holes, deformation and blurred parts are often produced, or only simple isolated objects can be reconstructed as voxelized three-dimensional models, which fails to meet the requirements of practical applications. The newly developed Generative Adversarial Network (GAN) is a highly influential deep neural network approach that has succeeded in many fields of image processing, and some scholars have recently applied GANs to three-dimensional reconstruction. Representative work is 3D-GAN [Jiajun Wu, Chengkai Zhang, Tianfan Xue, Bill Freeman, and Josh Tenenbaum. Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In Advances in Neural Information Processing Systems, pages 82-90, 2016]. 3D-GAN introduces a generative-adversarial loss to decide whether an object is real or reconstructed. Since three-dimensional objects are highly structured, the generative-adversarial criterion captures structural differences of three-dimensional objects better than traditional methods. Current GAN-based three-dimensional reconstruction methods still have many shortcomings, such as low precision and poor stability of the training process.
Disclosure of Invention
The invention aims to address the defects of existing single-view three-dimensional reconstruction techniques by providing a three-dimensional reconstruction system and method based on adversarial learning, using GAN network technology and a mapping from a low-dimensional probability space to the three-dimensional object space. The reconstruction process does not depend on three-dimensional CAD models or a training data set of corresponding two-dimensional images.
In order to solve the technical problems, the invention adopts the following technical scheme.
The invention relates to a three-dimensional reconstruction system based on adversarial learning, which comprises:
a three-dimensional generation network and a three-dimensional discrimination network;
the three-dimensional discrimination network is used for distinguishing the three-dimensional scene model reconstructed by the three-dimensional generation network from the real three-dimensional scene; final output: a classification probability value for the image;
the three-dimensional generation network is used for reconstructing a three-dimensional scene model consistent with the real three-dimensional scene and trying to confuse the three-dimensional discrimination network, so that it cannot distinguish the real three-dimensional scene from the reconstructed three-dimensional model scene; final output: a three-dimensional mesh model with a resolution of 64 × 64 × 64 × 1.
Further, the three-dimensional generation network includes:
1 two-dimensional convolution layer, denoted Conv; 2 dense connection modules; 3 fully connected layers, denoted FC; 4 three-dimensional transposed convolution layers, denoted ConvT;
the convolution kernel size of the two-dimensional convolution layer is 3 × 3, the stride (denoted Stride) is 2, and it outputs 16 feature maps (denoted FM);
each of the 2 dense connection modules comprises 4 two-dimensional convolution layers; the convolution kernel size of the first 3 convolution layers of each module is 3 × 3 and that of the last is 1 × 1; the stride is 1; each two-dimensional convolution layer is followed by 1 batch normalization layer (denoted BN) and 1 ReLU activation function; 1 average pooling layer (denoted Avg Pool) is set after the last convolution layer; each two-dimensional convolution layer in the first dense connection module outputs 32 feature maps, and each in the second outputs 64 feature maps;
the outputs of the 3 fully connected layers are 2048, 1024 and 256 × 4 × 4 × 4 respectively, and each fully connected layer is followed by 1 BN layer and 1 ReLU activation function;
each of the 4 three-dimensional transposed convolution layers has kernel size 3 × 3 × 3, stride 2 and output channels 256, 128, 64 and 16 respectively, and each is followed by 1 BN layer and a ReLU activation function.
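As an illustrative sketch only, the layer-by-layer tensor sizes of this generator can be checked with the standard convolution arithmetic. The input image resolution (128 × 128) is an assumption, since the patent does not state it, and the mapping from the final 16 channels to the single-channel 64 × 64 × 64 × 1 output is likewise not specified in the text:

```python
# Illustrative shape walkthrough for the generator described above.
# The 128x128 input size is an assumption; the patent does not state it.

def conv2d_out(size, kernel, stride, padding):
    """Standard convolution output-size formula."""
    return (size + 2 * padding - kernel) // stride + 1

def convT3d_out(size, kernel, stride, padding, output_padding):
    """Transposed-convolution output-size formula."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

# 2D encoder: one 3x3/stride-2 conv, then two dense blocks (stride-1 convs
# keep the spatial size; each block ends with an average pool halving it).
s = 128                       # assumed input resolution
s = conv2d_out(s, 3, 2, 1)    # initial Conv: 128 -> 64, 16 feature maps
s = s // 2                    # dense block 1 + Avg Pool: 64 -> 32, 32 maps
s = s // 2                    # dense block 2 + Avg Pool: 32 -> 16, 64 maps

# FC layers: flatten -> 2048 -> 1024 -> 256*4*4*4, reshaped to a 4^3 volume,
# then four ConvT3d layers (kernel 3, stride 2) double the volume each time.
v = 4
for _ in range(4):
    v = convT3d_out(v, 3, 2, 1, 1)
print(s, v)                   # encoder spatial size, decoder volume size
```

With these assumed padding settings the decoder volume grows 4 → 8 → 16 → 32 → 64, matching the stated 64-resolution output grid.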
Further, the three-dimensional discrimination network includes:
1 two-dimensional convolution layer, 2 dense connection modules and 2 fully connected layers;
the convolution kernel size of the two-dimensional convolution layer is 3 × 3, the stride is 2, and it outputs 64 feature maps;
each of the 2 dense connection modules comprises 4 two-dimensional convolution layers; the convolution kernel size of the first 3 convolution layers of each module is 3 × 3 and that of the last is 1 × 1; the stride is 1; each two-dimensional convolution layer is followed by 1 BN layer and 1 ReLU activation function; 1 Avg Pool layer is set after the last convolution layer; each two-dimensional convolution layer in the first dense connection module outputs 128 feature maps, and each in the second outputs 256 feature maps;
the outputs of the 2 fully connected layers are 2048 and 1 respectively; the first fully connected layer is followed by 1 BN layer and 1 ReLU activation function, and the second is followed by a Sigmoid function.
The invention also relates to a three-dimensional reconstruction method based on adversarial learning, comprising: designing a loss function L_Overall for training the deep neural networks; the three-dimensional generation network and the three-dimensional discrimination network are trained adversarially, and when the network model reaches Nash equilibrium, the three-dimensional generation network can reconstruct a three-dimensional scene model whose features and distribution are fully consistent with the real scene; for observation images of the reconstructed three-dimensional scene model and observation views of the real three-dimensional scene, the classification probability of the three-dimensional discrimination network is 0.5;
the adversarial training comprises the following processes:
a. generating an initial three-dimensional scene model, and initializing a three-dimensional generation network; the specific process is as follows: shooting a video by using a camera, and generating a real reference image data set, camera parameters and a motion pose T according to the video; estimating image depth information by comparing differences between adjacent image frames; generating an initial three-dimensional scene model by adopting a space mapping method;
b. placing the reconstructed three-dimensional scene model in a three-dimensional virtual environment, setting a virtual camera with the same parameters as the real camera in the three-dimensional virtual environment, and acquiring a rendering image stream of the three-dimensional scene model by using the virtual camera; the specific process is as follows: moving the virtual camera along a camera track T recorded in the process of acquiring the reference video; projecting the reconstructed three-dimensional scene model to a two-dimensional image by using a pseudo renderer at the same position and view point as the real observation scene to generate rendered images with the same number as the reference images;
c. after the reference images and rendered images are ready, adversarial training of the network models can be carried out; the three-dimensional discrimination network distinguishes the reference images from the rendered images; a total loss value is calculated through the loss function, the networks are fine-tuned, and a new three-dimensional generation network and a new three-dimensional discrimination network are formed;
d. iteratively train the three-dimensional generation network and the three-dimensional discrimination network; the specific process is as follows: generate a new three-dimensional reconstruction model with the new three-dimensional generation network from the reference images; return to step (b): place the three-dimensional reconstruction model in the virtual environment, observe it with the virtual camera, and input the newly observed rendered images and the reference images into the three-dimensional discrimination network for discrimination; then repeat steps (b) to (d), iteratively training and creating new three-dimensional generation and discrimination networks until the total loss converges to the desired value.
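As a hedged illustration only, the processes (a) to (d) above can be sketched as a single loop. The networks, pseudo renderer and loss below are numeric stubs standing in for the real components, so only the control flow is shown:

```python
# Skeleton of the adversarial training loop in steps (a)-(d). Each stub is a
# placeholder: in the real system, `model` is the reconstructed 3D scene,
# rendering uses the virtual camera, and the loss is L_Overall.

def train_adversarial(reference_images, target_loss=0.1, max_iters=100):
    model = 0.0  # stub for the initial 3D scene model (step a)
    for it in range(max_iters):
        rendered = [model * x for x in reference_images]          # step b: render
        loss = sum(abs(r - x) for r, x in zip(rendered, reference_images)) \
               / len(reference_images)                            # step c: total loss
        if loss <= target_loss:                                   # step d: converged
            return it, loss
        model += 0.5 * (1.0 - model)  # stub fine-tuning update toward the target
    return max_iters, loss
```

The loop terminates once the stub loss drops below the desired value, mirroring the "repeat until the total loss converges" criterion of step (d).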
Further, the loss function L_Overall for training the deep neural networks comprises a reconstruction loss function L_Recons and a cross-entropy loss function L_GAN, defined as follows:
L_Overall = λ·L_Recons + (1 - λ)·L_GAN (1)
wherein, λ is a parameter for adjusting the weight between the reconstruction loss and the cross entropy loss;
phi rebuilding loss function LRecons
The reconstruction loss function L_Recons may be defined by the difference between the reference image and the rendered image as calculated by the three-dimensional discriminator. Two indexes are used. The structural similarity SSIM is an image quality evaluation index based on the human visual system; the SSIM value of a reference image and a rendered image lies between 0 and 1, and a value close to 1 indicates a small difference between images x and y. The peak signal-to-noise ratio PSNR evaluates image differences from the viewpoint of gray-level fidelity; its common value lies between 20 and 70 dB, and a Sigmoid function is used to adjust the PSNR value to the range 0 to 1:
wherein E _ Sigm () represents a Sigmoid function;
reconstruction of the loss function L according to the inventionReconsIs defined as:
wherein α and β are parameters adjusting the PSNR and SSIM weights; the subscript G_j F_j denotes a reference image and rendered image pair; N is the total number of image pairs;
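The exact form of E_Sigm is not shown in the text (the formula was an image). A plausible sketch, assuming a logistic squash centered in the stated 20-70 dB range, with the center and scale as illustrative parameters only:

```python
import math

# Plausible sketch of the E_Sigm normalization: squash a PSNR value (typically
# 20-70 dB) into (0, 1) with a logistic function. The center (45 dB) and scale
# (10) are assumptions; the patent does not give the exact parameters.

def e_sigm(psnr_db, center=45.0, scale=10.0):
    return 1.0 / (1.0 + math.exp(-(psnr_db - center) / scale))

def psnr(mse, max_val=1.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(max_val ** 2 / mse)
```

Under this assumption, a PSNR at the center of the range maps to 0.5, and values toward 20 dB and 70 dB map toward 0 and 1 respectively, matching the stated 0-to-1 adjustment.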
② Cross-entropy loss function L_GAN
The cross-entropy loss value L_GAN quantitatively reflects the training process of the three-dimensional generation network and the three-dimensional discrimination network; a WGAN with gradient penalty is more effective for training the complex three-dimensional generation and discrimination networks; the cross-entropy loss function L_GAN is therefore designed with the WGAN training method:
wherein P_r is the real reference image distribution and P_g the rendered image distribution; x denotes a reference image and x̃ a rendered image implicitly generated by the three-dimensional generation network; the gradient-penalty term regularizes the network; θ is a parameter adjusting the gradient-penalty weight in the cross-entropy loss; x̂ is implicitly defined by uniformly sampling along straight lines between pairs of points drawn from the distributions P_r and P_g; E denotes mathematical expectation.
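The loss formulas themselves were rendered as images and lost in extraction. A plausible reconstruction consistent with the surrounding "wherein" definitions, assuming the standard Wasserstein gradient-penalty form, is:

```latex
% Plausible reconstructions of the elided formulas (the originals were images).
% L_Recons: weighted PSNR/SSIM image-difference loss over the N image pairs.
L_{\mathrm{Recons}} = \frac{1}{N}\sum_{j=1}^{N}
  \Big[\, \alpha\big(1 - E\_Sigm(\mathrm{PSNR}_{G_j F_j})\big)
        + \beta\big(1 - \mathrm{SSIM}_{G_j F_j}\big) \Big]

% L_GAN: Wasserstein loss with gradient penalty (weight \theta), with
% \hat{x} sampled uniformly on lines between pairs drawn from P_r and P_g.
L_{\mathrm{GAN}} =
  \mathbb{E}_{\tilde{x}\sim P_g}\big[D(\tilde{x})\big]
  - \mathbb{E}_{x\sim P_r}\big[D(x)\big]
  + \theta\,\mathbb{E}_{\hat{x}\sim P_{\hat{x}}}
      \Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big]
```

These are sketches under stated assumptions, not the patent's exact formulas; every symbol (α, β, θ, P_r, P_g, x, x̃, x̂, N, G_j F_j) is taken from the "wherein" clauses above.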
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) The method of the invention belongs to a weakly supervised learning framework: only the collected two-dimensional observation images serve as supervision, without depending on three-dimensional shape priors, a CAD model base or other reference data. Because three-dimensional annotation must be acquired by designing a three-dimensional CAD model or by three-dimensional scanning with special equipment such as a three-dimensional scanner, the workload is huge; for some application scenarios a three-dimensional reference shape cannot be acquired at all. Under such conditions, the weakly supervised learning framework of the invention greatly reduces the workload of acquiring annotated data and effectively realizes three-dimensional reconstruction.
(2) The method of the invention provides a new solution for deep-learning-based three-dimensional reconstruction, comprising modules that generate an initial three-dimensional scene model with a space mapping method, acquire rendered image streams of the three-dimensional scene model with a virtual camera, and train the network model in an adversarial manner. Through the cooperation of these modules, the method can effectively improve reconstruction precision and the stability of the training process.
Drawings
FIG. 1 is a system block diagram of an embodiment of a three-dimensional reconstruction system based on antagonistic learning according to the present invention.
FIG. 2 is a block diagram of a three-dimensional generation network in accordance with one embodiment of the present invention.
Fig. 3 is a block diagram of a three-dimensional discrimination network according to an embodiment of the present invention.
Detailed Description
The invention discloses a three-dimensional reconstruction system based on adversarial learning and a method thereof, which adopt the GAN principle to realize high-quality three-dimensional reconstruction and provide a novel adversarial-learning three-dimensional reconstruction framework that iteratively improves and converges any original three-dimensional reconstruction model by training a GAN model. The model uses only real-time two-dimensional observation images as weak supervision and depends on neither prior knowledge of a shape model nor any three-dimensional reference data. The method is a contactless, convenient technique for rapidly reconstructing the three-dimensional shape of an object from views, can be widely applied in integrated ship support, virtual equipment maintenance, interactive electronic technical manuals, film, animation, virtual reality, augmented reality, industrial manufacturing and other fields, and has broad market prospects.
The present invention will be described in further detail with reference to the accompanying drawings.
FIG. 1 is a system block diagram of an embodiment of a three-dimensional reconstruction system based on antagonistic learning according to the present invention.
As shown in fig. 1, the system of this embodiment of the present invention includes: three-dimensional generation network and three-dimensional discrimination network.
The three-dimensional generation network is used to reconstruct a three-dimensional scene model consistent with the real three-dimensional scene and to try to confuse the three-dimensional discrimination network, so that the latter cannot distinguish the real three-dimensional scene from the reconstructed three-dimensional model scene. Its final output is a three-dimensional mesh model with a resolution of 64 × 64 × 64 × 1.
The three-dimensional discrimination network is used to distinguish the three-dimensional scene model reconstructed by the three-dimensional generation network from the real three-dimensional scene. Its final output is a classification probability value for the image.
The method comprises the following steps: design a loss function L_Overall for training the deep neural networks, and train the three-dimensional generation network and the three-dimensional discrimination network adversarially; when the network model reaches Nash equilibrium, the three-dimensional generation network can reconstruct a three-dimensional scene model whose features and distribution are fully consistent with the real scene; for observation images of the reconstructed three-dimensional scene model and observation views of the real three-dimensional scene, the classification probability of the three-dimensional discrimination network is 0.5.
The adversarial training comprises the following processes:
Step 1, generate an initial three-dimensional scene model and initialize the three-dimensional generation network. The specific process is as follows: shoot a video with a camera and generate a real reference image data set, camera parameters and a motion pose T from the video; estimate image depth information by comparing differences between adjacent image frames; generate an initial three-dimensional scene model with a space mapping method.
Step 2, place the reconstructed three-dimensional scene model in a three-dimensional virtual environment, set a virtual camera with the same parameters as the real camera in the three-dimensional virtual environment, and acquire a rendered image stream of the three-dimensional scene model with the virtual camera. The specific process is as follows: move the virtual camera along the camera track T recorded while acquiring the reference video; project the reconstructed three-dimensional scene model to two-dimensional images with a pseudo renderer at the same positions and viewpoints as the real observation scene, generating rendered images equal in number to the reference images.
Step 3, after the reference images and rendered images are ready, carry out adversarial training of the network models. The three-dimensional discrimination network distinguishes the reference images from the rendered images; a total loss value is calculated through the loss function, the networks are fine-tuned, and a new three-dimensional generation network and a new three-dimensional discrimination network are formed. The loss function L_Overall for training the deep neural networks comprises the reconstruction loss function L_Recons and the cross-entropy loss function L_GAN, and is expressed as follows:
L_Overall = λ·L_Recons + (1 - λ)·L_GAN (1)
where λ is a parameter that adjusts the weight between the reconstruction loss and the cross-entropy loss.
① Reconstruction loss function L_Recons
The reconstruction loss function L_Recons may be defined by the difference between the reference image and the rendered image as calculated by the three-dimensional discriminator. The invention adopts two indexes. The Structural SIMilarity (SSIM) is an image quality evaluation index based on the human visual system; the SSIM value of two images lies between 0 and 1, and a value close to 1 indicates a small difference between images x and y. The Peak Signal-to-Noise Ratio (PSNR) evaluates image differences from the viewpoint of gray-level fidelity; its common value lies between 20 and 70 dB, and a Sigmoid function is used to adjust the PSNR value to the range 0 to 1:
where E _ Sigm () represents a Sigmoid function.
The reconstruction loss function L_Recons of the invention is defined as:
wherein α and β are parameters adjusting the PSNR and SSIM weights; the subscript G_j F_j denotes a reference image and rendered image pair; N is the total number of image pairs.
② Cross-entropy loss function L_GAN
The cross-entropy loss value L_GAN quantitatively reflects the training process of the three-dimensional generation network and the three-dimensional discrimination network. A WGAN (Wasserstein GAN) with gradient penalty is more effective for training the complex three-dimensional generation network and three-dimensional discrimination network. Therefore, the invention adopts the WGAN training method to design the cross-entropy loss function L_GAN as follows:
wherein P_r is the real reference image distribution and P_g the rendered image distribution; x denotes a reference image and x̃ a rendered image implicitly generated by the three-dimensional generation network; the gradient-penalty term regularizes the network; θ is a parameter adjusting the gradient-penalty weight in the cross-entropy loss; x̂ is implicitly defined by uniformly sampling along straight lines between pairs of points drawn from the distributions P_r and P_g; E denotes mathematical expectation.
Step 4, iteratively train the three-dimensional generation network and the three-dimensional discrimination network. The specific process is as follows: generate a new three-dimensional reconstruction model with the new three-dimensional generation network from the reference images; return to Step 2: place the three-dimensional reconstruction model in the virtual environment, observe it with the virtual camera, and input the newly observed rendered images and the reference images into the three-dimensional discrimination network for discrimination. Repeat Steps 2 to 4, iteratively training and creating new three-dimensional generation and discrimination networks until the total loss converges to the desired value.
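The pseudo-rendering used in the steps above can be illustrated with a toy orthographic projection. This is a minimal stand-in, assuming a binary occupancy grid; the real pseudo renderer also uses the recorded camera pose T and the camera parameters, which are omitted here:

```python
# Toy stand-in for the pseudo renderer: orthographic silhouette projection of
# a binary voxel grid along the depth axis. The patent's renderer additionally
# applies the recorded camera trajectory T and intrinsics; those are omitted.

def project_silhouette(voxels):
    """Max-project a 3D occupancy grid (list of z-slices) onto the x-y plane."""
    depth = len(voxels)
    h, w = len(voxels[0]), len(voxels[0][0])
    return [[max(voxels[z][y][x] for z in range(depth)) for x in range(w)]
            for y in range(h)]

# A 2x2x2 grid with a single occupied voxel at (z=1, y=0, x=1).
grid = [[[0, 0], [0, 0]],
        [[0, 1], [0, 0]]]
image = project_silhouette(grid)
print(image)  # -> [[0, 1], [0, 0]]
```

A rendered image produced this way, one per reference viewpoint, is what gets compared against the real reference images by the discrimination network.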
FIG. 2 is a block diagram of a three-dimensional generation network in accordance with one embodiment of the present invention. As shown in fig. 2, it includes:
1 two-dimensional convolutional layer (denoted Conv), 2 dense-connection blocks, 3 fully-connected layers (denoted FC), and 4 three-dimensional transposed convolutional layers (denoted ConvT).
The convolution kernel size of the two-dimensional convolution layer is 3 × 3, the step (denoted as Stride) is 2, and 16 feature maps (denoted as FM) are output.
The 2 dense connection modules are composed of 4 two-dimensional convolution layers, the convolution kernel size of the first 3 convolution layers of each dense connection module is 3 multiplied by 3, and the convolution kernel size of the last convolution kernel is 1 multiplied by 1; the stride is 1; setting 1 batch normalization layer (recorded as BN layer) and 1 ReLU activation function after each two-dimensional convolution layer; after the last convolutional layer, 1 average pooling layer (denoted as Avg Pool) was placed. Outputting 32 feature maps for each two-dimensional convolutional layer in the first dense connection module; each two-dimensional convolutional layer in the second dense-connected module outputs 64 feature maps.
The outputs of the 3 fully-connected layers are 2048, 1024 and 256 × 4 × 4 × 4, respectively, and 1 BN layer and 1 ReLU activation function are set after each fully-connected layer.
Each of the 4 three-dimensional transposed convolution layers has kernel size 3 × 3 × 3, stride 2 and output channels 256, 128, 64 and 16 respectively, and each is followed by 1 BN layer and a ReLU activation function.
Fig. 3 is a block diagram of a three-dimensional discrimination network according to an embodiment of the present invention. As shown in fig. 3, it includes:
1 two-dimensional convolutional layer, 2 dense connection modules, and 2 full connection layers.
The convolution kernel size of the two-dimensional convolution layer is 3 multiplied by 3, the step length is 2, and 64 characteristic graphs are output.
The 2 dense connection modules are composed of 4 two-dimensional convolution layers, the convolution kernel size of the first 3 convolution layers of each dense connection module is 3 multiplied by 3, and the convolution kernel size of the last convolution kernel is 1 multiplied by 1; the stride is 1; setting 1 BN layer and 1 ReLU activation function behind each two-dimensional convolution layer; and 1 Avg Pool is arranged behind the last convolution layer. Outputting 128 feature maps per two-dimensional convolutional layer in the first dense connection module; each two-dimensional convolutional layer in the second dense-connected module outputs 256 feature maps.
The outputs of the 2 full-connection layers are 2048 and 1 respectively, and 1 BN layer and 1 ReLU activation function are arranged behind the previous full-connection layer; the latter full connectivity layer is followed by a Sigmoid function.
In conclusion, the invention combines the principle of the latest generative adversarial networks (GAN) with the advantages of multi-view stereo three-dimensional reconstruction, and iteratively improves reconstruction quality and stability through adversarial training of the three-dimensional generation model and the three-dimensional discrimination model. The method belongs to a weakly supervised learning framework: it takes only the collected two-dimensional observation images as supervision, does not depend on reference data such as three-dimensional shape priors or a CAD model base, greatly reduces the workload of acquiring annotated data, and effectively realizes three-dimensional reconstruction. The method provides a new solution for deep-learning-based three-dimensional reconstruction, comprising modules that generate an initial three-dimensional scene model with a space mapping method, acquire rendered image streams of the three-dimensional scene model with a virtual camera, and train the network model in an adversarial manner. Through the cooperation of these modules, the method effectively improves reconstruction precision and the stability of the training process. The invention provides a contactless, convenient technique for rapidly reconstructing the three-dimensional shape of an object from views, can be used in integrated ship support, virtual equipment maintenance, interactive electronic technical manuals, film, animation, virtual reality, augmented reality, industrial manufacturing and other fields, and has broad market prospects.
Claims (5)
1. A three-dimensional reconstruction system based on adversarial learning, comprising:
a three-dimensional generation network and a three-dimensional discrimination network;
the three-dimensional discrimination network is configured to distinguish the three-dimensional scene model reconstructed by the three-dimensional generation network from the real three-dimensional scene; its final output is a classification probability value for an input image;
the three-dimensional generation network is configured to reconstruct a three-dimensional scene model consistent with the real three-dimensional scene and to confuse the three-dimensional discrimination network, so that the three-dimensional discrimination network cannot distinguish the real three-dimensional scene from the reconstructed model scene; its final output is a three-dimensional mesh model with a resolution of 64 × 64 × 64 × 1.
2. The system of claim 1, wherein the three-dimensional generation network comprises:
1 two-dimensional convolution layer, denoted Conv; 2 dense connection modules; 3 fully-connected layers, denoted FC; and 4 three-dimensional transposed convolution layers, denoted ConvT;
the convolution kernel size of the two-dimensional convolution layer is 3 × 3 with a stride, denoted Stride, of 2, and it outputs 16 feature maps, denoted FM;
the 2 dense connection modules each comprise 4 two-dimensional convolution layers; in each dense connection module the kernel size of the first 3 convolution layers is 3 × 3 and that of the last is 1 × 1, all with stride 1; 1 batch normalization layer, denoted BN, and 1 ReLU activation function are set after each two-dimensional convolution layer; 1 average pooling layer, denoted Avg Pool, is set after the last convolution layer; each two-dimensional convolution layer in the first dense connection module outputs 32 feature maps, and each two-dimensional convolution layer in the second dense connection module outputs 64 feature maps;
the outputs of the 3 fully-connected layers are 2048, 1024 and 256 × 4 × 4 × 4, respectively, and each fully-connected layer is followed by 1 BN layer and 1 ReLU activation function;
each of the 4 three-dimensional transposed convolution layers has a kernel size of 3 × 3 × 3 and a stride of 2, with output channels of 256, 128, 64 and 16, respectively; each three-dimensional transposed convolution layer is followed by 1 BN layer and a ReLU activation function.
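As an illustrative sketch (not part of the claimed subject matter), the shape propagation through the generator of claim 2 can be checked with a few lines of Python. The padding and output-padding values below are assumptions chosen so that each stride-2 transposed convolution exactly doubles the spatial size; the patent does not state them.

```python
# Sketch: propagate spatial sizes through the generator layers of claim 2.
# Padding/output_padding are assumed values, not stated in the patent.

def conv2d_out(size, kernel=3, stride=2, pad=1):
    """Spatial size after a 2-D convolution."""
    return (size + 2 * pad - kernel) // stride + 1

def convT3d_out(size, kernel=3, stride=2, pad=1, out_pad=1):
    """Spatial size after a 3-D transposed convolution."""
    return (size - 1) * stride - 2 * pad + kernel + out_pad

# The third fully-connected layer outputs 256 x 4 x 4 x 4, i.e. a
# 256-channel 4x4x4 volume; the four transposed convolutions
# (channels 256 -> 128 -> 64 -> 16) then double the grid each time:
# 4 -> 8 -> 16 -> 32 -> 64.
grid = 4
for channels in (256, 128, 64, 16):
    grid = convT3d_out(grid)
# grid is now 64, matching the claimed 64 x 64 x 64 output resolution.
```

Under these assumptions the decoder arithmetic is consistent with the claimed 64 × 64 × 64 × 1 mesh resolution.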
3. The system of claim 1, wherein the three-dimensional discrimination network comprises:
1 two-dimensional convolution layer, 2 dense connection modules and 2 fully-connected layers;
the convolution kernel size of the two-dimensional convolution layer is 3 × 3 with a stride of 2, and it outputs 64 feature maps;
the 2 dense connection modules each comprise 4 two-dimensional convolution layers; in each dense connection module the kernel size of the first 3 convolution layers is 3 × 3 and that of the last is 1 × 1, all with stride 1; 1 BN layer and 1 ReLU activation function are set after each two-dimensional convolution layer; 1 Avg Pool layer is set after the last convolution layer; each two-dimensional convolution layer in the first dense connection module outputs 128 feature maps, and each two-dimensional convolution layer in the second dense connection module outputs 256 feature maps;
the outputs of the 2 fully-connected layers are 2048 and 1, respectively; the first fully-connected layer is followed by 1 BN layer and 1 ReLU activation function, and the second fully-connected layer is followed by a Sigmoid function.
4. A three-dimensional reconstruction method based on adversarial learning, characterized in that:
a three-dimensional reconstruction system based on adversarial learning is adopted, the system comprising: a three-dimensional generation network and a three-dimensional discrimination network;
the three-dimensional discrimination network is configured to distinguish the three-dimensional scene model reconstructed by the three-dimensional generation network from the real three-dimensional scene, its final output being a classification probability value for an input image; the three-dimensional generation network is configured to reconstruct a three-dimensional scene model consistent with the real three-dimensional scene and to confuse the three-dimensional discrimination network so that it cannot distinguish the real three-dimensional scene from the reconstructed model scene, its final output being a three-dimensional mesh model with a resolution of 64 × 64 × 64 × 1;
the method comprises the following steps: designing a loss function L_Overall for training the deep neural network, and adversarially training the three-dimensional generation network and the three-dimensional discrimination network; when the network model reaches Nash equilibrium, the three-dimensional generation network can reconstruct a three-dimensional scene model fully consistent with the features and distribution of the real scene, and for observation images of the reconstructed three-dimensional scene model and observation views of the real three-dimensional scene the classification probability output by the three-dimensional discrimination network is 0.5;
the adversarial training comprises the following processes:
step 1, generating an initial three-dimensional scene model and initializing the three-dimensional generation network; the specific process is: shooting a video with a camera and generating a real reference image data set, camera parameters and a motion pose T from the video; estimating image depth information by comparing differences between adjacent image frames; and generating the initial three-dimensional scene model by a space mapping method;
step 2, placing the reconstructed three-dimensional scene model in a three-dimensional virtual environment, setting in that environment a virtual camera with the same parameters as the real camera, and collecting a rendered image stream of the three-dimensional scene model with the virtual camera; the specific process is: moving the virtual camera along the camera trajectory T recorded during acquisition of the reference video; and projecting the reconstructed three-dimensional scene model to two-dimensional images with a pseudo-renderer at the same positions and viewpoints as the real observation scene, generating the same number of rendered images as reference images;
step 3, once the reference images and rendered images are ready, carrying out adversarial training of the network model; distinguishing the reference images from the rendered images with the three-dimensional discrimination network; and calculating a total loss value through the loss function and fine-tuning the networks, forming a new three-dimensional generation network and a new three-dimensional discrimination network;
step 4, iteratively training the three-dimensional generation network and the three-dimensional discrimination network; the specific process is: generating a new three-dimensional reconstruction model with the new three-dimensional generation network according to the reference images; returning to step 2: placing the three-dimensional reconstruction model in the virtual environment, observing it with the virtual camera, and inputting the newly observed rendered images and the reference images into the three-dimensional discrimination network for discrimination; then repeating steps 2 to 4, iteratively training and creating new three-dimensional generation and discrimination networks, until the total loss converges to the desired value.
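The control flow of steps 1-4 above can be sketched as a simple iterate-render-compare-refine loop. The sketch below is a toy stand-in, not the patent's networks: the "model" is a single scalar, "rendering" is the identity, the "loss" is the absolute gap to a reference value, and the learning rate and threshold are illustrative assumptions.

```python
# Toy control-flow sketch of the iterative adversarial loop (steps 1-4).
# All names, values and update rules here are illustrative assumptions.

REFERENCE = 1.0   # stands in for the reference image stream
LR = 0.5          # illustrative refinement step size
EPS = 1e-3        # desired total-loss threshold

def render(model):
    """Step 2: observe the reconstructed model with a virtual camera."""
    return model

def total_loss(rendered, reference):
    """Step 3: compare rendered and reference observations."""
    return abs(reference - rendered)

def refine(model, rendered, reference):
    """Steps 3-4: fine-tune the generator toward the reference."""
    return model + LR * (reference - rendered)

model = 0.0       # step 1: initial reconstruction
iterations = 0
while total_loss(render(model), REFERENCE) > EPS:
    model = refine(model, render(model), REFERENCE)
    iterations += 1
# The loop halves the gap each pass, so it converges below EPS.
```

The real method replaces each stub with a network pass (rendering a view stream, discriminating, back-propagating the total loss), but the iterate-until-convergence structure is the same.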
5. The three-dimensional reconstruction method based on adversarial learning as claimed in claim 4, wherein the loss function L_Overall for training the deep neural network comprises a reconstruction loss function L_Recons and a cross-entropy loss function L_GAN, defined as follows:
L_Overall = λ · L_Recons + (1 − λ) · L_GAN (1)
wherein λ is a parameter adjusting the weight between the reconstruction loss and the cross-entropy loss;
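Equation (1) is a convex combination of the two terms; a minimal illustration (the numeric values are arbitrary placeholders, not from the patent):

```python
# Minimal illustration of equation (1): L_Overall is a convex
# combination of the reconstruction and cross-entropy losses.

def overall_loss(l_recons, l_gan, lam=0.5):
    """L_Overall = lambda * L_Recons + (1 - lambda) * L_GAN."""
    assert 0.0 <= lam <= 1.0
    return lam * l_recons + (1.0 - lam) * l_gan

# lam = 1 uses only the reconstruction loss; lam = 0 only the GAN loss.
combined = overall_loss(0.8, 0.2, lam=0.75)
```

Tuning λ trades off pixel-level fidelity (L_Recons) against the adversarial signal (L_GAN).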
① Reconstruction loss function L_Recons:
the reconstruction loss function L_Recons may be defined by the difference between the reference image and the rendered image as calculated by the three-dimensional discriminator; two criteria are used: the structural similarity SSIM, an image quality evaluation index based on the human visual system, whose value for a reference image and a rendered image lies between 0 and 1, a value close to 1 indicating a small difference between image x and image y; and the peak signal-to-noise ratio PSNR, an index that evaluates image differences from the viewpoint of gray-level fidelity and commonly takes values of 20-70 dB; the PSNR value is mapped into the range 0 to 1 by a Sigmoid function:
wherein E_Sigm( ) denotes the Sigmoid function;
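A sketch of the PSNR term follows; since the patent's exact formula is shown only as an image, the dB definition and the squashing factor below are assumptions, but they match the claim's description (PSNR in the common 20-70 dB range, mapped into 0-1 by a Sigmoid).

```python
# Assumed form of the PSNR criterion: PSNR in dB from the mean squared
# error, then squashed into (0, 1) with a Sigmoid, as the claim describes.
import math

def psnr_db(x, y, peak=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized images."""
    mse = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
    return 10.0 * math.log10(peak ** 2 / mse)

def sigm(v):
    return 1.0 / (1.0 + math.exp(-v))

def psnr_unit(x, y, scale=0.1):
    """Map PSNR into (0, 1); `scale` is an assumed squashing factor."""
    return sigm(scale * psnr_db(x, y))

# Flattened 8-bit pixel values of a tiny reference/rendered pair:
ref = [10.0, 20.0, 30.0, 40.0]
ren = [12.0, 18.0, 33.0, 39.0]
score = psnr_unit(ref, ren)  # lies strictly between 0 and 1
```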
reconstruction of the loss function L according to the inventionReconsIs defined as:
wherein, alpha and beta are parameters for adjusting PSNR and SSIM weight; subscript GjFjRepresenting a reference image and rendered image pair; n represents the total number of image pairs;
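The defining equation of L_Recons appears only as an image in the source; one plausible reading, consistent with the symbols α, β, the per-pair subscript and the averaging over N pairs, is sketched below. This form is hypothetical and is labeled as such.

```python
# Hypothetical reading of L_Recons (the patent's exact formula is only
# shown as an image): average, over the N image pairs, of a weighted
# sum of the normalized PSNR and SSIM dissimilarities.

def recons_loss(psnr_scores, ssim_scores, alpha=0.5, beta=0.5):
    """psnr_scores and ssim_scores are per-pair similarities in [0, 1];
    higher means more similar, so (1 - score) is a dissimilarity."""
    assert len(psnr_scores) == len(ssim_scores)
    n = len(psnr_scores)
    return sum(alpha * (1 - p) + beta * (1 - s)
               for p, s in zip(psnr_scores, ssim_scores)) / n

# Perfectly matching pairs give zero reconstruction loss.
zero = recons_loss([1.0, 1.0], [1.0, 1.0])
```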
② Cross-entropy loss function L_GAN:
the cross-entropy loss value L_GAN quantitatively reflects the training progress of the three-dimensional generation network and the three-dimensional discrimination network; a WGAN with a gradient penalty is more effective for training the complex three-dimensional generation and discrimination networks; therefore, the WGAN training method is adopted here to design the cross-entropy loss function L_GAN:
wherein P_r is the distribution of real reference images and P_g is the distribution of rendered images; the symbol x denotes a reference image, and x̃ denotes a rendered image implicitly generated by the three-dimensional generation network; a gradient-penalty term constrains the network gradients, with θ a parameter adjusting the gradient-penalty weight in the cross-entropy loss; x̂ is implicitly defined by uniform sampling along straight lines between pairs of points drawn from the distributions P_r and P_g; and E denotes the mathematical expectation.
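The gradient-penalty mechanism can be illustrated with a critic simple enough that its input gradient is available in closed form. The sketch below assumes a linear critic D(x) = w · x, so the gradient with respect to the input is just w; the critic, θ and the sample values are illustrative assumptions, not the patent's networks.

```python
# Sketch of the WGAN gradient penalty, theta * (||grad D(xhat)|| - 1)^2,
# for an assumed linear critic D(x) = w . x whose input gradient is w.
import math
import random

def grad_penalty_linear(w, theta=10.0):
    """Penalty pushing the critic's gradient norm toward 1."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return theta * (norm - 1.0) ** 2

def interpolate(x_real, x_fake, eps):
    """xhat: a point sampled uniformly on the line between a
    real/rendered pair, as the claim describes."""
    return [eps * r + (1 - eps) * f for r, f in zip(x_real, x_fake)]

# A unit-norm critic gradient incurs no penalty; larger gradients
# are penalized back toward norm 1.
p_unit = grad_penalty_linear([1.0, 0.0])
p_big = grad_penalty_linear([2.0, 0.0])

xhat = interpolate([1.0, 2.0], [0.0, 0.0], eps=random.random())
```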
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011371730.4A CN112489198A (en) | 2020-11-30 | 2020-11-30 | Three-dimensional reconstruction system and method based on counterstudy |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112489198A true CN112489198A (en) | 2021-03-12 |
Family
ID=74937234
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011371730.4A Pending CN112489198A (en) | 2020-11-30 | 2020-11-30 | Three-dimensional reconstruction system and method based on counterstudy |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112489198A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019214381A1 (en) * | 2018-05-09 | 2019-11-14 | 腾讯科技(深圳)有限公司 | Video deblurring method and apparatus, and storage medium and electronic apparatus |
US20200111194A1 (en) * | 2018-10-08 | 2020-04-09 | Rensselaer Polytechnic Institute | Ct super-resolution gan constrained by the identical, residual and cycle learning ensemble (gan-circle) |
WO2020172838A1 (en) * | 2019-02-26 | 2020-09-03 | 长沙理工大学 | Image classification method for improvement of auxiliary classifier gan |
Non-Patent Citations (2)
Title |
---|
KAIMING HE: "Deep Residual Learning for Image Recognition", 《2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 * |
YU CHONG (余翀): "Three-Dimensional Reconstruction Cloud Studio Based on Semi-Supervised Generative Adversarial Networks", 《Chinese Journal of Intelligent Science and Technology (智能科学与技术学报)》, pages 1 - 6 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114004948A (en) * | 2021-07-30 | 2022-02-01 | 华东师范大学 | Creative product solver based on generation network and application thereof |
CN114004948B (en) * | 2021-07-30 | 2022-05-06 | 华东师范大学 | Creative product solver based on generation network and application thereof |
CN113763536A (en) * | 2021-09-03 | 2021-12-07 | 济南大学 | Three-dimensional reconstruction method based on RGB image |
CN116723305A (en) * | 2023-04-24 | 2023-09-08 | 南通大学 | Virtual viewpoint quality enhancement method based on generation type countermeasure network |
CN116723305B (en) * | 2023-04-24 | 2024-05-03 | 南通大学 | Virtual viewpoint quality enhancement method based on generation type countermeasure network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111325794B (en) | Visual simultaneous localization and map construction method based on depth convolution self-encoder | |
CN111462329B (en) | Three-dimensional reconstruction method of unmanned aerial vehicle aerial image based on deep learning | |
CN112489198A (en) | Three-dimensional reconstruction system and method based on counterstudy | |
CN110163974B (en) | Single-image picture reconstruction method based on undirected graph learning model | |
CN108648161A (en) | The binocular vision obstacle detection system and method for asymmetric nuclear convolutional neural networks | |
CN114666564B (en) | Method for synthesizing virtual viewpoint image based on implicit neural scene representation | |
CN112598775B (en) | Multi-view generation method based on contrast learning | |
WO2022198684A1 (en) | Methods and systems for training quantized neural radiance field | |
Sun et al. | Ssl-net: Point-cloud generation network with self-supervised learning | |
CN111914618A (en) | Three-dimensional human body posture estimation method based on countermeasure type relative depth constraint network | |
CN115239870A (en) | Multi-view stereo network three-dimensional reconstruction method based on attention cost body pyramid | |
CN110889868B (en) | Monocular image depth estimation method combining gradient and texture features | |
CN114996814A (en) | Furniture design system based on deep learning and three-dimensional reconstruction | |
CN116912405A (en) | Three-dimensional reconstruction method and system based on improved MVSNet | |
CN116958262A (en) | 6dof object pose estimation method based on single RGB image | |
CN116721210A (en) | Real-time efficient three-dimensional reconstruction method and device based on neurosigned distance field | |
Xu et al. | Wavenerf: Wavelet-based generalizable neural radiance fields | |
CN112927348B (en) | High-resolution human body three-dimensional reconstruction method based on multi-viewpoint RGBD camera | |
Chao et al. | Skeleton-based motion estimation for Point Cloud Compression | |
CN113096239B (en) | Three-dimensional point cloud reconstruction method based on deep learning | |
Hou et al. | Joint learning of image deblurring and depth estimation through adversarial multi-task network | |
CN112862946A (en) | Gray rock core image three-dimensional reconstruction method for generating countermeasure network based on cascade condition | |
CN115496859A (en) | Three-dimensional scene motion trend estimation method based on scattered point cloud cross attention learning | |
CN115131245A (en) | Point cloud completion method based on attention mechanism | |
CN115512039A (en) | 3D face construction method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||