CN112489198A - Three-dimensional reconstruction system and method based on adversarial learning - Google Patents

Three-dimensional reconstruction system and method based on adversarial learning

Info

Publication number: CN112489198A
Application number: CN202011371730.4A
Authority: CN (China)
Family ID: 74937234
Prior art keywords: dimensional, network, image, layer, model
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 史金龙, 白素琴, 周志强, 钱强, 郭凌, 欧镇, 田朝晖, 钱萍
Current and original assignee: Jiangsu University of Science and Technology
Application filed by Jiangsu University of Science and Technology
Priority date / filing date: 2020-11-30
Publication date: 2021-03-12

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 — Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks


Abstract

The invention discloses a three-dimensional reconstruction system based on adversarial learning and a method thereof. The invention adopts the GAN principle to achieve high-quality three-dimensional reconstruction, provides a new adversarial-learning three-dimensional reconstruction framework, and iteratively improves an initial three-dimensional reconstruction model until convergence by training a GAN model. The model uses only real-time two-dimensional observation images as weak supervision and depends neither on prior knowledge of a shape model nor on any three-dimensional reference data. The invention provides a contactless and convenient technique for rapidly reconstructing the three-dimensional shape of an object from views, applicable to many fields such as integrated ship support, virtual equipment maintenance, interactive electronic technical manuals, film, animation, virtual reality, augmented reality and industrial manufacturing, with broad market prospects.

Description

Three-dimensional reconstruction system and method based on adversarial learning
Technical Field
The invention belongs to the technical field of computer three-dimensional reconstruction, and particularly relates to a three-dimensional reconstruction system and a three-dimensional reconstruction method based on adversarial learning.
Background Art
In the fields of computer graphics and computer vision, three-dimensional reconstruction is a technique for recovering the shape, structure and appearance of real objects. Owing to its rich and intuitive expressive power, three-dimensional reconstruction is widely applied in equipment support, virtual maintenance, construction, geology, archaeology, games, virtual reality and other fields. Researchers have made significant progress in three-dimensional reconstruction. Traditional methods such as SFM (Structure from Motion) and MVS (Multi View Stereo) proceed as follows: first, feature matches are sought between two images and an initial two-view three-dimensional reconstruction is estimated; then new images are added iteratively on the basis of the two-view result, with feature matching between each newly added image and the previous ones; finally the three-dimensional model is reconstructed by triangulation, structure-from-motion and bundle adjustment. However, the time complexity of conventional SFM and MVS methods is typically high; moreover, when the surface of the reconstructed object lacks texture or exhibits specular reflection, holes, distortions and blurred regions often appear, or only voxelized three-dimensional models of simple isolated objects can be reconstructed, which does not meet the requirements of practical applications. The recently developed Generative Adversarial Network (GAN) is a highly influential deep neural network approach that has succeeded in many areas of image processing, and some scholars have recently applied GANs to three-dimensional reconstruction. The representative work is 3D-GAN [Jiajun Wu, Chengkai Zhang, Tianfan Xue, Bill Freeman, and Josh Tenenbaum. Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In Advances in Neural Information Processing Systems, pages 82-90, 2016]. 3D-GAN introduces a generative-adversarial loss to decide whether an object is real or reconstructed. Since three-dimensional objects are highly structured, the generative-adversarial criterion captures structural differences between three-dimensional objects better than traditional methods. Nevertheless, current GAN-based three-dimensional reconstruction methods still have many shortcomings, such as low accuracy and poor stability of the training process.
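By way of illustration, a minimal sketch of the classical two-view initialization step described above follows, using OpenCV; the ORB features, the RANSAC setting, the function name and its inputs (two image paths and the camera intrinsic matrix K) are illustrative assumptions and not part of the patent:

    import cv2
    import numpy as np

    def two_view_init(img1_path, img2_path, K):
        # Feature detection and matching between the two images.
        img1 = cv2.imread(img1_path, cv2.IMREAD_GRAYSCALE)
        img2 = cv2.imread(img2_path, cv2.IMREAD_GRAYSCALE)
        orb = cv2.ORB_create(4000)
        k1, d1 = orb.detectAndCompute(img1, None)
        k2, d2 = orb.detectAndCompute(img2, None)
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)
        p1 = np.float32([k1[m.queryIdx].pt for m in matches])
        p2 = np.float32([k2[m.trainIdx].pt for m in matches])
        # Relative pose from the essential matrix, then triangulation.
        E, mask = cv2.findEssentialMat(p1, p2, K, method=cv2.RANSAC)
        _, R, t, mask = cv2.recoverPose(E, p1, p2, K, mask=mask)
        P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
        P2 = K @ np.hstack([R, t])
        pts4 = cv2.triangulatePoints(P1, P2, p1.T, p2.T)
        return (pts4[:3] / pts4[3]).T   # initial sparse 3D points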
Disclosure of Invention
The invention aims to overcome the shortcomings of existing single-view three-dimensional reconstruction techniques by providing a three-dimensional reconstruction system and method based on adversarial learning, using GAN network technology and the mapping relation from a low-dimensional probability space to the space of three-dimensional objects. The reconstruction process of the method depends neither on three-dimensional CAD models nor on a training data set of corresponding two-dimensional images.
In order to solve the technical problems, the invention adopts the following technical scheme.
The invention relates to a three-dimensional reconstruction system based on adversarial learning, which comprises:
a three-dimensional generation network and a three-dimensional discrimination network;
the three-dimensional discrimination network is used to distinguish the three-dimensional scene model reconstructed by the three-dimensional generation network from the real three-dimensional scene; its final output is a classification probability value for the input image;
the three-dimensional generation network is used to reconstruct a three-dimensional scene model consistent with the real three-dimensional scene and to try to confuse the three-dimensional discrimination network, so that the discrimination network cannot distinguish the real three-dimensional scene from the reconstructed model scene; its final output is a three-dimensional grid model with a resolution of 64 × 64 × 64 × 1.
Further, the three-dimensional generation network comprises:
1 two-dimensional convolutional layer, denoted Conv; 2 densely connected modules; 3 fully connected layers, denoted FC; and 4 three-dimensional transposed convolutional layers, denoted ConvT;
the two-dimensional convolutional layer has a 3 × 3 convolution kernel and a stride (denoted Stride) of 2, and outputs 16 feature maps (denoted FM);
each of the 2 densely connected modules contains 4 two-dimensional convolutional layers; in each module the first 3 convolutional layers have 3 × 3 kernels and the last has a 1 × 1 kernel; the stride is 1; each two-dimensional convolutional layer is followed by 1 batch normalization layer (denoted BN) and 1 ReLU activation function; the last convolutional layer is followed by 1 average pooling layer (denoted Avg Pool); each two-dimensional convolutional layer in the first densely connected module outputs 32 feature maps, and each in the second outputs 64 feature maps;
the outputs of the 3 fully connected layers are 2048, 1024 and 256 × 4 × 4 × 4, respectively, and each fully connected layer is followed by 1 BN layer and 1 ReLU activation function;
each of the 4 three-dimensional transposed convolutional layers has a 3 × 3 × 3 kernel and a stride of 2; their output channels are 256, 128, 64 and 16, respectively, and each is followed by 1 BN layer and a ReLU activation function.
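For illustration only, a minimal PyTorch sketch of a generator with this layout follows. The 3-channel input image, the exact dense-block wiring and the final 1-channel output convolution are assumptions; the patent fixes only the layer list above.

    import torch
    import torch.nn as nn

    class DenseBlock(nn.Module):
        # Four 2D conv layers (3x3, 3x3, 3x3, 1x1), each followed by BN + ReLU,
        # with one average-pooling layer after the last conv, as specified above.
        def __init__(self, in_ch, out_ch):
            super().__init__()
            layers, ch = [], in_ch
            for k in (3, 3, 3, 1):
                layers += [nn.Conv2d(ch, out_ch, k, stride=1, padding=k // 2),
                           nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True)]
                ch = out_ch
            layers.append(nn.AvgPool2d(2))
            self.body = nn.Sequential(*layers)

        def forward(self, x):
            return self.body(x)

    class Generator3D(nn.Module):
        # 2D conv encoder -> FC bottleneck -> 3D transposed-conv decoder.
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1),  # Conv: 3x3, stride 2, 16 FM
                DenseBlock(16, 32),                        # dense module 1: 32 FM
                DenseBlock(32, 64),                        # dense module 2: 64 FM
            )
            self.fc = nn.Sequential(                       # FC: 2048 -> 1024 -> 256*4*4*4
                nn.LazyLinear(2048), nn.BatchNorm1d(2048), nn.ReLU(inplace=True),
                nn.Linear(2048, 1024), nn.BatchNorm1d(1024), nn.ReLU(inplace=True),
                nn.Linear(1024, 256 * 4 * 4 * 4),
                nn.BatchNorm1d(256 * 4 * 4 * 4), nn.ReLU(inplace=True),
            )
            def convT(i, o):                               # ConvT: 3x3x3, stride 2, BN + ReLU
                return nn.Sequential(
                    nn.ConvTranspose3d(i, o, 3, stride=2, padding=1, output_padding=1),
                    nn.BatchNorm3d(o), nn.ReLU(inplace=True))
            self.decoder = nn.Sequential(convT(256, 256), convT(256, 128),
                                         convT(128, 64), convT(64, 16))
            self.head = nn.Conv3d(16, 1, 1)                # assumed: maps 16 channels to 1

        def forward(self, img):
            z = self.fc(self.encoder(img).flatten(1))
            vol = self.decoder(z.view(-1, 256, 4, 4, 4))   # 4^3 -> 64^3 over four upsamplings
            return torch.sigmoid(self.head(vol))           # 64 x 64 x 64 x 1 occupancy grid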
Further, the three-dimensional discrimination network comprises:
1 two-dimensional convolutional layer, 2 densely connected modules and 2 fully connected layers;
the two-dimensional convolutional layer has a 3 × 3 convolution kernel and a stride of 2, and outputs 64 feature maps;
each of the 2 densely connected modules contains 4 two-dimensional convolutional layers; in each module the first 3 convolutional layers have 3 × 3 kernels and the last has a 1 × 1 kernel; the stride is 1; each two-dimensional convolutional layer is followed by 1 BN layer and 1 ReLU activation function; the last convolutional layer is followed by 1 Avg Pool layer; each two-dimensional convolutional layer in the first densely connected module outputs 128 feature maps, and each in the second outputs 256 feature maps;
the outputs of the 2 fully connected layers are 2048 and 1, respectively; the first fully connected layer is followed by 1 BN layer and 1 ReLU activation function, and the second is followed by a Sigmoid function.
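A matching PyTorch sketch of the three-dimensional discrimination network, reusing the DenseBlock from the generator sketch; again the 3-channel input is an assumption:

    import torch.nn as nn

    class Discriminator3D(nn.Module):
        # One 2D conv layer, two densely connected modules, two FC layers + Sigmoid.
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 64, 3, stride=2, padding=1),  # 3x3, stride 2, 64 FM
                DenseBlock(64, 128),                       # module 1: 128 FM per conv
                DenseBlock(128, 256),                      # module 2: 256 FM per conv
            )
            self.classifier = nn.Sequential(
                nn.Flatten(),
                nn.LazyLinear(2048), nn.BatchNorm1d(2048), nn.ReLU(inplace=True),
                nn.Linear(2048, 1),                        # second FC outputs one value
                nn.Sigmoid(),                              # classification probability
            )

        def forward(self, img):
            return self.classifier(self.features(img))

Note that for the WGAN-with-gradient-penalty training described below, the Sigmoid output (and usually batch normalization) would normally be dropped from the critic; the sketch simply follows the layer list as written.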
The three-dimensional reconstruction system based on adversarial learning and its method comprise: designing a loss function L_Overall for training the deep neural networks, and training the three-dimensional generation network and the three-dimensional discrimination network adversarially. When the network model reaches Nash equilibrium, the three-dimensional generation network can reconstruct a three-dimensional scene model fully consistent with the features and distribution of the real scene; for observation images of the reconstructed three-dimensional scene model and observation views of the real three-dimensional scene, the classification probability of the three-dimensional discrimination network is 0.5;
the adversarial training comprises the following process:
a. generating an initial three-dimensional scene model and initializing the three-dimensional generation network; specifically: shooting a video with a camera and generating from it a real reference image data set, the camera parameters and the motion pose T; estimating image depth information by comparing the differences between adjacent image frames; and generating the initial three-dimensional scene model by a space-mapping method;
b. placing the reconstructed three-dimensional scene model in a three-dimensional virtual environment, setting up in it a virtual camera with the same parameters as the real camera, and acquiring a stream of rendered images of the three-dimensional scene model with the virtual camera; specifically: moving the virtual camera along the camera trajectory T recorded while acquiring the reference video, and projecting the reconstructed model to two-dimensional images with a pseudo-renderer at the same positions and viewpoints as the real observations, generating as many rendered images as there are reference images;
c. once the reference images and rendered images are ready, adversarial training of the network model can proceed: the three-dimensional discrimination network distinguishes reference images from rendered images; the total loss value is computed from the loss function, the networks are fine-tuned, and a new three-dimensional generation network and a new three-dimensional discrimination network are formed;
d. iteratively training the three-dimensional generation network and the three-dimensional discrimination network; specifically: generating a new three-dimensional reconstruction model from the reference images with the new generation network, then returning to step (b): placing the reconstruction in the virtual environment, observing it with the virtual camera, and feeding the newly observed rendered images together with the reference images into the discrimination network; steps (b)-(d) are then repeated, iteratively training and creating new generation and discrimination networks, until the total loss converges to the desired value (a sketch of this loop follows below).
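The loop in steps (a)-(d) can be sketched as follows, assuming the Generator3D/Discriminator3D sketches above, the loss sketches given after the formulas below (recons_loss and wgan_gp_d_loss), and a hypothetical pseudo_render(volume, poses) helper for the projection step; none of these names are specified as code in the patent:

    import torch

    def train(generator, discriminator, reference_images, poses,
              lam=0.5, epochs=100, lr=1e-4, target_loss=0.05):
        g_opt = torch.optim.Adam(generator.parameters(), lr=lr)
        d_opt = torch.optim.Adam(discriminator.parameters(), lr=lr)
        for epoch in range(epochs):
            # (b) render the current reconstruction along the recorded trajectory T
            volume = generator(reference_images)
            rendered = pseudo_render(volume, poses)        # hypothetical helper
            # (c) discriminate reference vs rendered images and fine-tune D
            d_loss = wgan_gp_d_loss(discriminator, reference_images, rendered.detach())
            d_opt.zero_grad()
            d_loss.backward()
            d_opt.step()
            # (c) total loss L_Overall = lam*L_Recons + (1 - lam)*L_GAN drives G
            g_loss = (lam * recons_loss(reference_images, rendered)
                      - (1 - lam) * discriminator(rendered).mean())
            g_opt.zero_grad()
            g_loss.backward()
            g_opt.step()
            # (d) iterate until the total loss converges to the desired value
            if g_loss.item() < target_loss:
                break
        return generator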
Further, the loss function L_Overall for training the deep neural networks comprises a reconstruction loss function L_Recons and a cross-entropy loss function L_GAN, defined as follows:
L_Overall = λ·L_Recons + (1 − λ)·L_GAN (1)
where λ is a parameter that adjusts the weight between the reconstruction loss and the cross-entropy loss;
① Reconstruction loss function L_Recons:
the reconstruction loss function L_Recons may be defined by the difference between the reference image and the rendered image as computed by the three-dimensional discriminator; two indices are used: structural similarity (SSIM), an image quality evaluation index based on the human visual system, whose value for a reference image and a rendered image lies between 0 and 1, the difference between images x and y being small when the value is close to 1; and peak signal-to-noise ratio (PSNR), which evaluates the image difference from the viewpoint of grayscale fidelity, with typical values between 20 and 70 dB, and which is mapped into the range 0 to 1 with a Sigmoid function:
PSNR* = E_Sigm(PSNR) (2)
where E_Sigm( ) represents the Sigmoid function;
the reconstruction loss function L_Recons of the invention is defined as:
L_Recons = (1/N)·Σ_{j=1..N} (α·PSNR*_{GjFj} + β·SSIM_{GjFj}) (3)
where α and β are parameters that adjust the weights of PSNR and SSIM; the subscript GjFj denotes the j-th reference image / rendered image pair; and N is the total number of image pairs;
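A sketch of this reconstruction loss follows, under the assumption that formula (3) combines the Sigmoid-normalized PSNR with SSIM as a weighted mean over the N image pairs and is turned into a loss as 1 minus each similarity score (the source gives the formula only as an image); SSIM is taken from the third-party pytorch-msssim package:

    import torch
    from pytorch_msssim import ssim

    def psnr(ref, ren, max_val=1.0):
        # Peak signal-to-noise ratio per image pair, in dB.
        mse = ((ref - ren) ** 2).flatten(1).mean(dim=1).clamp_min(1e-12)
        return 10.0 * torch.log10(max_val ** 2 / mse)

    def recons_loss(ref, ren, alpha=0.5, beta=0.5):
        # E_Sigm maps PSNR (typically 20-70 dB) into (0, 1); the /20 scale is an
        # assumption, since the source does not give the exact normalization.
        psnr01 = torch.sigmoid(psnr(ref, ren) / 20.0)
        ssim01 = ssim(ref, ren, data_range=1.0)            # already in [0, 1]
        # Higher PSNR/SSIM mean a better reconstruction, so minimize 1 - score.
        return alpha * (1.0 - psnr01).mean() + beta * (1.0 - ssim01)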
② Cross-entropy loss function L_GAN:
the cross-entropy loss value L_GAN quantitatively reflects the training process of the three-dimensional generation network and the three-dimensional discrimination network; WGAN with a gradient penalty is more effective for training the complex three-dimensional generation and discrimination networks; the cross-entropy loss function L_GAN is therefore designed with the WGAN training method:
L_GAN = E_{x̃∼P_g}[D(x̃)] − E_{x∼P_r}[D(x)] + θ·E_{x̂∼P_x̂}[(‖∇_x̂ D(x̂)‖₂ − 1)²] (4)
where P_r is the true reference image distribution; P_g is the rendered image distribution; the symbol x denotes a reference image and x̃ a rendered image implicitly generated by the three-dimensional generation network; D( ) is the three-dimensional discrimination network; the last term is the gradient penalty, θ being a parameter that adjusts the gradient-penalty weight within the cross-entropy loss; P_x̂ is implicitly defined by sampling uniformly along straight lines between pairs of points drawn from the distributions P_r and P_g; and E denotes the mathematical expectation.
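A sketch of formula (4) as a critic objective with gradient penalty, in the standard WGAN-GP form that the text describes; the function name and the default θ = 10 are illustrative:

    import torch

    def wgan_gp_d_loss(D, real, fake, theta=10.0):
        # Wasserstein terms: E[D(fake)] - E[D(real)].
        loss = D(fake).mean() - D(real).mean()
        # Sample x_hat uniformly along straight lines between P_r / P_g pairs.
        eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
        x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
        d_hat = D(x_hat)
        grads = torch.autograd.grad(d_hat, x_hat,
                                    grad_outputs=torch.ones_like(d_hat),
                                    create_graph=True)[0]
        # Gradient penalty: theta * E[(||grad_x_hat D(x_hat)||_2 - 1)^2].
        gp = ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()
        return loss + theta * gp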
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) The method of the invention belongs to a weakly supervised learning framework: it uses only the collected two-dimensional observation images as supervision and does not depend on three-dimensional shape priors, a CAD model library or other reference data. Because three-dimensional annotations must be obtained by designing three-dimensional CAD models or by scanning with special equipment such as three-dimensional scanners, the workload is enormous, and for some specific application scenarios a three-dimensional reference shape cannot be acquired at all. Under such conditions, the proposed weakly supervised framework greatly reduces the workload of acquiring annotated data and effectively realizes three-dimensional reconstruction.
(2) The method of the invention provides a new solution for deep-learning-based three-dimensional reconstruction, comprising modules that generate the initial three-dimensional scene model by space mapping, acquire rendered image streams of the model with a virtual camera, and train the network model adversarially. Through the cooperation of these modules, the method effectively improves reconstruction accuracy and the stability of the training process.
Drawings
FIG. 1 is a system block diagram of an embodiment of the three-dimensional reconstruction system based on adversarial learning according to the invention.
FIG. 2 is a block diagram of a three-dimensional generation network in accordance with one embodiment of the present invention.
Fig. 3 is a block diagram of a three-dimensional discrimination network according to an embodiment of the present invention.
Detailed Description
The invention discloses a three-dimensional reconstruction system based on adversarial learning and a method thereof, which adopt the GAN principle to realize high-quality three-dimensional reconstruction and provide a novel adversarial-learning three-dimensional reconstruction framework that iteratively improves any initial three-dimensional reconstruction model until convergence by training a GAN model. The model uses only real-time two-dimensional observation images as weak supervision and depends neither on prior knowledge of a shape model nor on any three-dimensional reference data. The method is a contactless, convenient technique for rapidly reconstructing the three-dimensional shape of an object from views; it can be widely applied in integrated ship support, virtual equipment maintenance, interactive electronic technical manuals, film, animation, virtual reality, augmented reality, industrial manufacturing and other fields, and has broad market prospects.
The present invention will be described in further detail with reference to the accompanying drawings.
FIG. 1 is a system block diagram of an embodiment of the three-dimensional reconstruction system based on adversarial learning according to the invention.
As shown in FIG. 1, the system of this embodiment of the invention includes a three-dimensional generation network and a three-dimensional discrimination network.
The three-dimensional generation network is used to reconstruct a three-dimensional scene model consistent with the real three-dimensional scene and to try to confuse the three-dimensional discrimination network, so that the discrimination network cannot distinguish the real three-dimensional scene from the reconstructed model scene. Its final output is a three-dimensional grid model with a resolution of 64 × 64 × 64 × 1.
The three-dimensional discrimination network is used to distinguish the three-dimensional scene model reconstructed by the three-dimensional generation network from the real three-dimensional scene. Its final output is a classification probability value for the input image.
The method comprises: designing a loss function L_Overall for training the deep neural networks, and training the three-dimensional generation network and the three-dimensional discrimination network adversarially. When the network model reaches Nash equilibrium, the three-dimensional generation network can reconstruct a three-dimensional scene model fully consistent with the features and distribution of the real scene; for observation images of the reconstructed three-dimensional scene model and observation views of the real three-dimensional scene, the classification probability of the three-dimensional discrimination network is 0.5.
The adversarial training comprises the following process:
step 1, generating an initial three-dimensional scene model and initializing a three-dimensional generation network. The specific process is as follows: shooting a video by using a camera, and generating a real reference image data set, camera parameters and a motion pose T according to the video; estimating image depth information by comparing differences between adjacent image frames; and generating an initial three-dimensional scene model by adopting a space mapping method.
And 2, placing the reconstructed three-dimensional scene model in a three-dimensional virtual environment, setting a virtual camera with the same parameters as the real camera in the three-dimensional virtual environment, and acquiring a rendering image stream of the three-dimensional scene model by using the virtual camera. The specific process is as follows: moving the virtual camera along a camera track T recorded in the process of acquiring the reference video; and projecting the reconstructed three-dimensional scene model to the two-dimensional image by using a pseudo renderer at the same position and view point as the real observation scene, and generating the rendered images with the same number as the reference images.
Step 3: once the reference images and rendered images are ready, carry out adversarial training of the network model. The three-dimensional discrimination network distinguishes reference images from rendered images; the total loss value is computed from the loss function, the networks are fine-tuned, and a new three-dimensional generation network and a new three-dimensional discrimination network are formed. The loss function L_Overall for training the deep neural networks comprises a reconstruction loss function L_Recons and a cross-entropy loss function L_GAN, and is expressed as follows:
L_Overall = λ·L_Recons + (1 − λ)·L_GAN (1)
where λ is a parameter that adjusts the weight between the reconstruction loss and the cross-entropy loss.
① Reconstruction loss function L_Recons:
The reconstruction loss function L_Recons may be defined by the difference between the reference image and the rendered image as computed by the three-dimensional discriminator. The invention adopts two indices. Structural SIMilarity (SSIM) is an image quality evaluation index based on the human visual system; the SSIM value of two images lies between 0 and 1, and when it is close to 1 the difference between images x and y is small. Peak Signal-to-Noise Ratio (PSNR) evaluates the image difference from the viewpoint of grayscale fidelity; its typical values lie between 20 and 70 dB, and it is mapped into the range 0 to 1 with a Sigmoid function:
PSNR* = E_Sigm(PSNR) (2)
where E_Sigm( ) represents the Sigmoid function.
The reconstruction loss function L_Recons of the invention is defined as:
L_Recons = (1/N)·Σ_{j=1..N} (α·PSNR*_{GjFj} + β·SSIM_{GjFj}) (3)
where α and β are parameters that adjust the weights of PSNR and SSIM; the subscript GjFj denotes the j-th reference image / rendered image pair; and N is the total number of image pairs.
② Cross-entropy loss function L_GAN:
The cross-entropy loss value L_GAN quantitatively reflects the training process of the three-dimensional generation network and the three-dimensional discrimination network. WGAN (Wasserstein GAN) with a gradient penalty is more effective for training the complex three-dimensional generation and discrimination networks. Therefore, the invention adopts the WGAN training method to design the cross-entropy loss function L_GAN as follows:
L_GAN = E_{x̃∼P_g}[D(x̃)] − E_{x∼P_r}[D(x)] + θ·E_{x̂∼P_x̂}[(‖∇_x̂ D(x̂)‖₂ − 1)²] (4)
where P_r is the true reference image distribution; P_g is the rendered image distribution; the symbol x denotes a reference image and x̃ a rendered image implicitly generated by the three-dimensional generation network; D( ) is the three-dimensional discrimination network; the last term is the gradient penalty, θ being a parameter that adjusts the gradient-penalty weight within the cross-entropy loss; P_x̂ is implicitly defined by sampling uniformly along straight lines between pairs of points drawn from the distributions P_r and P_g; and E denotes the mathematical expectation.
Step 4: iteratively train the three-dimensional generation network and the three-dimensional discrimination network. Specifically: generate a new three-dimensional reconstruction model from the reference images with the new generation network; then return to step 2: place the three-dimensional reconstruction model in the virtual environment, observe it with the virtual camera, and feed the newly observed rendered images together with the reference images into the three-dimensional discrimination network for discrimination. Steps 2 to 4 are repeated, iteratively training and creating new three-dimensional generation and discrimination networks, until the total loss converges to the desired value.
FIG. 2 is a block diagram of a three-dimensional generation network in accordance with one embodiment of the present invention. As shown in FIG. 2, it includes:
1 two-dimensional convolutional layer (denoted Conv), 2 densely connected modules, 3 fully connected layers (denoted FC), and 4 three-dimensional transposed convolutional layers (denoted ConvT).
The two-dimensional convolutional layer has a 3 × 3 convolution kernel and a stride (denoted Stride) of 2, and outputs 16 feature maps (denoted FM).
Each of the 2 densely connected modules consists of 4 two-dimensional convolutional layers; in each module the first 3 convolutional layers have 3 × 3 kernels and the last has a 1 × 1 kernel; the stride is 1. Each two-dimensional convolutional layer is followed by 1 batch normalization layer (denoted BN) and 1 ReLU activation function, and the last convolutional layer is followed by 1 average pooling layer (denoted Avg Pool). Each two-dimensional convolutional layer in the first densely connected module outputs 32 feature maps; each in the second outputs 64 feature maps.
The outputs of the 3 fully connected layers are 2048, 1024 and 256 × 4 × 4 × 4, respectively, and each fully connected layer is followed by 1 BN layer and 1 ReLU activation function.
Each of the 4 three-dimensional transposed convolutional layers has a 3 × 3 × 3 kernel and a stride of 2; their output channels are 256, 128, 64 and 16, respectively, and each is followed by 1 BN layer and a ReLU activation function.
FIG. 3 is a block diagram of a three-dimensional discrimination network according to an embodiment of the present invention. As shown in FIG. 3, it includes:
1 two-dimensional convolutional layer, 2 densely connected modules, and 2 fully connected layers.
The two-dimensional convolutional layer has a 3 × 3 convolution kernel and a stride of 2, and outputs 64 feature maps.
Each of the 2 densely connected modules consists of 4 two-dimensional convolutional layers; in each module the first 3 convolutional layers have 3 × 3 kernels and the last has a 1 × 1 kernel; the stride is 1. Each two-dimensional convolutional layer is followed by 1 BN layer and 1 ReLU activation function, and the last convolutional layer is followed by 1 Avg Pool layer. Each two-dimensional convolutional layer in the first densely connected module outputs 128 feature maps; each in the second outputs 256 feature maps.
The outputs of the 2 fully connected layers are 2048 and 1, respectively; the first fully connected layer is followed by 1 BN layer and 1 ReLU activation function, and the second is followed by a Sigmoid function.
In conclusion, the invention combines the principle of the generative adversarial network (GAN) with the advantages of multi-view stereo three-dimensional reconstruction, and iteratively improves reconstruction quality and stability through adversarial training of the three-dimensional generation model and the three-dimensional discrimination model. The method belongs to a weakly supervised learning framework: it uses only the collected two-dimensional observation images as supervision and does not depend on reference data such as three-dimensional shape priors or a CAD model library, greatly reducing the workload of obtaining annotated data while effectively realizing three-dimensional reconstruction. The method provides a new solution for deep-learning-based three-dimensional reconstruction, comprising modules that generate the initial three-dimensional scene model by space mapping, acquire rendered image streams of the model with a virtual camera, and train the network model adversarially; through the cooperation of these modules, it effectively improves reconstruction accuracy and the stability of the training process. The invention provides a contactless and convenient technique for rapidly reconstructing the three-dimensional shape of an object from views, can be used in integrated ship support, virtual equipment maintenance, interactive electronic technical manuals, film, animation, virtual reality, augmented reality, industrial manufacturing and other fields, and has broad market prospects.

Claims (5)

1. A three-dimensional reconstruction system based on adversarial learning, comprising:
a three-dimensional generation network and a three-dimensional discrimination network;
the three-dimensional discrimination network is used to distinguish the three-dimensional scene model reconstructed by the three-dimensional generation network from the real three-dimensional scene, and its final output is a classification probability value for the input image;
the three-dimensional generation network is used to reconstruct a three-dimensional scene model consistent with the real three-dimensional scene and to try to confuse the three-dimensional discrimination network, so that the discrimination network cannot distinguish the real three-dimensional scene from the reconstructed model scene, and its final output is a three-dimensional grid model with a resolution of 64 × 64 × 64 × 1.
2. The system of claim 1, wherein the three-dimensional generation network comprises:
1 two-dimensional convolutional layer, denoted Conv; 2 densely connected modules; 3 fully connected layers, denoted FC; and 4 three-dimensional transposed convolutional layers, denoted ConvT;
the two-dimensional convolutional layer has a 3 × 3 convolution kernel and a stride (denoted Stride) of 2, and outputs 16 feature maps (denoted FM);
each of the 2 densely connected modules contains 4 two-dimensional convolutional layers; in each module the first 3 convolutional layers have 3 × 3 kernels and the last has a 1 × 1 kernel; the stride is 1; each two-dimensional convolutional layer is followed by 1 batch normalization layer (denoted BN) and 1 ReLU activation function; the last convolutional layer is followed by 1 average pooling layer (denoted Avg Pool); each two-dimensional convolutional layer in the first densely connected module outputs 32 feature maps, and each in the second outputs 64 feature maps;
the outputs of the 3 fully connected layers are 2048, 1024 and 256 × 4 × 4 × 4, respectively, and each fully connected layer is followed by 1 BN layer and 1 ReLU activation function;
each of the 4 three-dimensional transposed convolutional layers has a 3 × 3 × 3 kernel and a stride of 2; their output channels are 256, 128, 64 and 16, respectively, and each is followed by 1 BN layer and a ReLU activation function.
3. The system of claim 1, wherein the three-dimensional discrimination network comprises:
1 two-dimensional convolutional layer, 2 densely connected modules and 2 fully connected layers;
the two-dimensional convolutional layer has a 3 × 3 convolution kernel and a stride of 2, and outputs 64 feature maps;
each of the 2 densely connected modules contains 4 two-dimensional convolutional layers; in each module the first 3 convolutional layers have 3 × 3 kernels and the last has a 1 × 1 kernel; the stride is 1; each two-dimensional convolutional layer is followed by 1 BN layer and 1 ReLU activation function; the last convolutional layer is followed by 1 Avg Pool layer; each two-dimensional convolutional layer in the first densely connected module outputs 128 feature maps, and each in the second outputs 256 feature maps;
the outputs of the 2 fully connected layers are 2048 and 1, respectively; the first fully connected layer is followed by 1 BN layer and 1 ReLU activation function, and the second is followed by a Sigmoid function.
4. A three-dimensional reconstruction method based on adversarial learning, characterized in that:
the method adopts a three-dimensional reconstruction system based on adversarial learning, the system comprising a three-dimensional generation network and a three-dimensional discrimination network;
the three-dimensional discrimination network is used to distinguish the three-dimensional scene model reconstructed by the three-dimensional generation network from the real three-dimensional scene, and its final output is a classification probability value for the input image; the three-dimensional generation network is used to reconstruct a three-dimensional scene model consistent with the real three-dimensional scene and to try to confuse the three-dimensional discrimination network, so that the discrimination network cannot distinguish the real three-dimensional scene from the reconstructed model scene, and its final output is a three-dimensional grid model with a resolution of 64 × 64 × 64 × 1;
the method comprises: designing a loss function L_Overall for training the deep neural networks, and training the three-dimensional generation network and the three-dimensional discrimination network adversarially; when the network model reaches Nash equilibrium, the three-dimensional generation network can reconstruct a three-dimensional scene model fully consistent with the features and distribution of the real scene; for observation images of the reconstructed three-dimensional scene model and observation views of the real three-dimensional scene, the classification probability of the three-dimensional discrimination network is 0.5;
the adversarial training comprises the following process:
step 1: generating an initial three-dimensional scene model and initializing the three-dimensional generation network; specifically: shooting a video with a camera and generating from it a real reference image data set, the camera parameters and the motion pose T; estimating image depth information by comparing the differences between adjacent image frames; and generating the initial three-dimensional scene model by a space-mapping method;
step 2: placing the reconstructed three-dimensional scene model in a three-dimensional virtual environment, setting up in it a virtual camera with the same parameters as the real camera, and acquiring a stream of rendered images of the three-dimensional scene model with the virtual camera; specifically: moving the virtual camera along the camera trajectory T recorded while acquiring the reference video, and projecting the reconstructed model to two-dimensional images with a pseudo-renderer at the same positions and viewpoints as the real observations, generating as many rendered images as there are reference images;
step 3: once the reference images and rendered images are ready, carrying out adversarial training of the network model: distinguishing reference images from rendered images with the three-dimensional discrimination network; computing the total loss value from the loss function, fine-tuning the networks, and forming a new three-dimensional generation network and a new three-dimensional discrimination network;
step 4: iteratively training the three-dimensional generation network and the three-dimensional discrimination network; specifically: generating a new three-dimensional reconstruction model from the reference images with the new generation network; returning to step 2: placing the reconstruction in the virtual environment, observing it with the virtual camera, and feeding the newly observed rendered images together with the reference images into the discrimination network for discrimination; then repeating steps 2 to 4, iteratively training and creating new generation and discrimination networks until the total loss converges to the desired value.
5. The three-dimensional reconstruction method based on adversarial learning as claimed in claim 4, wherein the loss function L_Overall for training the deep neural networks comprises a reconstruction loss function L_Recons and a cross-entropy loss function L_GAN, defined as follows:
L_Overall = λ·L_Recons + (1 − λ)·L_GAN (1)
where λ is a parameter that adjusts the weight between the reconstruction loss and the cross-entropy loss;
① Reconstruction loss function L_Recons:
the reconstruction loss function L_Recons may be defined by the difference between the reference image and the rendered image as computed by the three-dimensional discriminator; two indices are used: structural similarity (SSIM), an image quality evaluation index based on the human visual system, whose value for a reference image and a rendered image lies between 0 and 1, the difference between images x and y being small when the value is close to 1; and peak signal-to-noise ratio (PSNR), which evaluates the image difference from the viewpoint of grayscale fidelity, with typical values between 20 and 70 dB, and which is mapped into the range 0 to 1 with a Sigmoid function:
PSNR* = E_Sigm(PSNR) (2)
where E_Sigm( ) represents the Sigmoid function;
the reconstruction loss function L_Recons is defined as:
L_Recons = (1/N)·Σ_{j=1..N} (α·PSNR*_{GjFj} + β·SSIM_{GjFj}) (3)
where α and β are parameters that adjust the weights of PSNR and SSIM; the subscript GjFj denotes the j-th reference image / rendered image pair; and N is the total number of image pairs;
② Cross-entropy loss function L_GAN:
the cross-entropy loss value L_GAN quantitatively reflects the training process of the three-dimensional generation network and the three-dimensional discrimination network; WGAN with a gradient penalty is more effective for training the complex three-dimensional generation and discrimination networks; the WGAN training method is therefore adopted to design the cross-entropy loss function L_GAN:
L_GAN = E_{x̃∼P_g}[D(x̃)] − E_{x∼P_r}[D(x)] + θ·E_{x̂∼P_x̂}[(‖∇_x̂ D(x̂)‖₂ − 1)²] (4)
where P_r is the true reference image distribution; P_g is the rendered image distribution; the symbol x denotes a reference image and x̃ a rendered image implicitly generated by the three-dimensional generation network; D( ) is the three-dimensional discrimination network; the last term is the gradient penalty, θ being a parameter that adjusts the gradient-penalty weight within the cross-entropy loss; P_x̂ is implicitly defined by sampling uniformly along straight lines between pairs of points drawn from the distributions P_r and P_g; and E denotes the mathematical expectation.
CN202011371730.4A 2020-11-30 2020-11-30 Three-dimensional reconstruction system and method based on adversarial learning Pending CN112489198A (en)





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination