CN112489198A - Three-dimensional reconstruction system and method based on adversarial learning - Google Patents

Three-dimensional reconstruction system and method based on adversarial learning

Info

Publication number: CN112489198A
Application number: CN202011371730.4A
Authority: CN (China)
Family ID: 74937234
Prior art keywords: dimensional, network, image, layer, model
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 史金龙, 白素琴, 周志强, 钱强, 郭凌, 欧镇, 田朝晖, 钱萍
Current and original assignee: Jiangsu University of Science and Technology
Application filed by Jiangsu University of Science and Technology
Priority date / filing date: 2020-11-30
Publication date: 2021-03-12

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 — Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks


Abstract

The invention discloses a three-dimensional reconstruction system based on adversarial learning and a method thereof. The invention adopts the GAN principle to achieve high-quality three-dimensional reconstruction, provides a new adversarial-learning three-dimensional reconstruction framework, and iteratively improves an initial three-dimensional reconstruction model until convergence by training a GAN model. The model uses only real-time two-dimensional observation images as weak supervision and depends neither on prior knowledge of a shape model nor on any three-dimensional reference data. The invention provides a contactless and convenient technique for rapidly reconstructing the three-dimensional shape of an object from views, applicable to many fields such as integrated ship support, virtual equipment maintenance, interactive electronic technical manuals, film, animation, virtual reality, augmented reality and industrial manufacturing, with broad market prospects.

Description

Three-dimensional reconstruction system and method based on adversarial learning
Technical Field
The invention belongs to the technical field of computer three-dimensional reconstruction, and particularly relates to a three-dimensional reconstruction system and a three-dimensional reconstruction method based on adversarial learning.
Background Art
In the fields of computer graphics and computer vision, three-dimensional reconstruction is a technique for recovering the shape, structure and appearance of real objects. Owing to its rich and intuitive expressive power, three-dimensional reconstruction is widely applied in equipment support, virtual maintenance, construction, geology, archaeology, games, virtual reality and other fields. Researchers have made significant progress in three-dimensional reconstruction. Traditional methods such as SFM (Structure from Motion) and MVS (Multi View Stereo) proceed as follows: first, feature matches are sought between two images and an initial two-view three-dimensional reconstruction is estimated; then new images are added iteratively on the basis of the two-view result, with feature matching between each newly added image and the previous ones; finally the three-dimensional model is reconstructed by triangulation, structure-from-motion and bundle adjustment. However, the time complexity of conventional SFM and MVS methods is typically high; moreover, when the surface of the reconstructed object lacks texture or exhibits specular reflection, holes, distortions and blurred regions often appear, or only voxelized three-dimensional models of simple isolated objects can be reconstructed, which does not meet the requirements of practical applications. The recently developed Generative Adversarial Network (GAN) is a highly influential deep neural network approach that has succeeded in many areas of image processing, and some scholars have recently applied GANs to three-dimensional reconstruction. The representative work is 3D-GAN [Jiajun Wu, Chengkai Zhang, Tianfan Xue, Bill Freeman, and Josh Tenenbaum. Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In Advances in Neural Information Processing Systems, pages 82-90, 2016]. 3D-GAN introduces a generative-adversarial loss to decide whether an object is real or reconstructed. Since three-dimensional objects are highly structured, the generative-adversarial criterion captures structural differences between three-dimensional objects better than traditional methods. Nevertheless, current GAN-based three-dimensional reconstruction methods still have many shortcomings, such as low accuracy and poor stability of the training process.
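By way of illustration, a minimal sketch of the classical two-view initialization step described above follows, using OpenCV; the ORB features, the RANSAC setting, the function name and its inputs (two image paths and the camera intrinsic matrix K) are illustrative assumptions and not part of the patent:

    import cv2
    import numpy as np

    def two_view_init(img1_path, img2_path, K):
        # Feature detection and matching between the two images.
        img1 = cv2.imread(img1_path, cv2.IMREAD_GRAYSCALE)
        img2 = cv2.imread(img2_path, cv2.IMREAD_GRAYSCALE)
        orb = cv2.ORB_create(4000)
        k1, d1 = orb.detectAndCompute(img1, None)
        k2, d2 = orb.detectAndCompute(img2, None)
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)
        p1 = np.float32([k1[m.queryIdx].pt for m in matches])
        p2 = np.float32([k2[m.trainIdx].pt for m in matches])
        # Relative pose from the essential matrix, then triangulation.
        E, mask = cv2.findEssentialMat(p1, p2, K, method=cv2.RANSAC)
        _, R, t, mask = cv2.recoverPose(E, p1, p2, K, mask=mask)
        P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
        P2 = K @ np.hstack([R, t])
        pts4 = cv2.triangulatePoints(P1, P2, p1.T, p2.T)
        return (pts4[:3] / pts4[3]).T   # initial sparse 3D points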
Disclosure of Invention
The invention aims to overcome the shortcomings of existing single-view three-dimensional reconstruction techniques by providing a three-dimensional reconstruction system and method based on adversarial learning, using GAN network technology and the mapping relation from a low-dimensional probability space to the space of three-dimensional objects. The reconstruction process of the method depends neither on three-dimensional CAD models nor on a training data set of corresponding two-dimensional images.
In order to solve the technical problems, the invention adopts the following technical scheme.
The invention relates to a three-dimensional reconstruction system based on adversarial learning, which comprises:
a three-dimensional generation network and a three-dimensional discrimination network;
the three-dimensional discrimination network is used to distinguish the three-dimensional scene model reconstructed by the three-dimensional generation network from the real three-dimensional scene; its final output is a classification probability value for the input image;
the three-dimensional generation network is used to reconstruct a three-dimensional scene model consistent with the real three-dimensional scene and to try to confuse the three-dimensional discrimination network, so that the discrimination network cannot distinguish the real three-dimensional scene from the reconstructed model scene; its final output is a three-dimensional grid model with a resolution of 64 × 64 × 64 × 1.
Further, the three-dimensional generation network comprises:
1 two-dimensional convolutional layer, denoted Conv; 2 densely connected modules; 3 fully connected layers, denoted FC; and 4 three-dimensional transposed convolutional layers, denoted ConvT;
the two-dimensional convolutional layer has a 3 × 3 convolution kernel and a stride (denoted Stride) of 2, and outputs 16 feature maps (denoted FM);
each of the 2 densely connected modules contains 4 two-dimensional convolutional layers; in each module the first 3 convolutional layers have 3 × 3 kernels and the last has a 1 × 1 kernel; the stride is 1; each two-dimensional convolutional layer is followed by 1 batch normalization layer (denoted BN) and 1 ReLU activation function; the last convolutional layer is followed by 1 average pooling layer (denoted Avg Pool); each two-dimensional convolutional layer in the first densely connected module outputs 32 feature maps, and each in the second outputs 64 feature maps;
the outputs of the 3 fully connected layers are 2048, 1024 and 256 × 4 × 4 × 4, respectively, and each fully connected layer is followed by 1 BN layer and 1 ReLU activation function;
each of the 4 three-dimensional transposed convolutional layers has a 3 × 3 × 3 kernel and a stride of 2; their output channels are 256, 128, 64 and 16, respectively, and each is followed by 1 BN layer and a ReLU activation function.
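For illustration only, a minimal PyTorch sketch of a generator with this layout follows. The 3-channel input image, the exact dense-block wiring and the final 1-channel output convolution are assumptions; the patent fixes only the layer list above.

    import torch
    import torch.nn as nn

    class DenseBlock(nn.Module):
        # Four 2D conv layers (3x3, 3x3, 3x3, 1x1), each followed by BN + ReLU,
        # with one average-pooling layer after the last conv, as specified above.
        def __init__(self, in_ch, out_ch):
            super().__init__()
            layers, ch = [], in_ch
            for k in (3, 3, 3, 1):
                layers += [nn.Conv2d(ch, out_ch, k, stride=1, padding=k // 2),
                           nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True)]
                ch = out_ch
            layers.append(nn.AvgPool2d(2))
            self.body = nn.Sequential(*layers)

        def forward(self, x):
            return self.body(x)

    class Generator3D(nn.Module):
        # 2D conv encoder -> FC bottleneck -> 3D transposed-conv decoder.
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1),  # Conv: 3x3, stride 2, 16 FM
                DenseBlock(16, 32),                        # dense module 1: 32 FM
                DenseBlock(32, 64),                        # dense module 2: 64 FM
            )
            self.fc = nn.Sequential(                       # FC: 2048 -> 1024 -> 256*4*4*4
                nn.LazyLinear(2048), nn.BatchNorm1d(2048), nn.ReLU(inplace=True),
                nn.Linear(2048, 1024), nn.BatchNorm1d(1024), nn.ReLU(inplace=True),
                nn.Linear(1024, 256 * 4 * 4 * 4),
                nn.BatchNorm1d(256 * 4 * 4 * 4), nn.ReLU(inplace=True),
            )
            def convT(i, o):                               # ConvT: 3x3x3, stride 2, BN + ReLU
                return nn.Sequential(
                    nn.ConvTranspose3d(i, o, 3, stride=2, padding=1, output_padding=1),
                    nn.BatchNorm3d(o), nn.ReLU(inplace=True))
            self.decoder = nn.Sequential(convT(256, 256), convT(256, 128),
                                         convT(128, 64), convT(64, 16))
            self.head = nn.Conv3d(16, 1, 1)                # assumed: maps 16 channels to 1

        def forward(self, img):
            z = self.fc(self.encoder(img).flatten(1))
            vol = self.decoder(z.view(-1, 256, 4, 4, 4))   # 4^3 -> 64^3 over four upsamplings
            return torch.sigmoid(self.head(vol))           # 64 x 64 x 64 x 1 occupancy grid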
Further, the three-dimensional discrimination network comprises:
1 two-dimensional convolutional layer, 2 densely connected modules and 2 fully connected layers;
the two-dimensional convolutional layer has a 3 × 3 convolution kernel and a stride of 2, and outputs 64 feature maps;
each of the 2 densely connected modules contains 4 two-dimensional convolutional layers; in each module the first 3 convolutional layers have 3 × 3 kernels and the last has a 1 × 1 kernel; the stride is 1; each two-dimensional convolutional layer is followed by 1 BN layer and 1 ReLU activation function; the last convolutional layer is followed by 1 Avg Pool layer; each two-dimensional convolutional layer in the first densely connected module outputs 128 feature maps, and each in the second outputs 256 feature maps;
the outputs of the 2 fully connected layers are 2048 and 1, respectively; the first fully connected layer is followed by 1 BN layer and 1 ReLU activation function, and the second is followed by a Sigmoid function.
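A matching PyTorch sketch of the three-dimensional discrimination network, reusing the DenseBlock from the generator sketch; again the 3-channel input is an assumption:

    import torch.nn as nn

    class Discriminator3D(nn.Module):
        # One 2D conv layer, two densely connected modules, two FC layers + Sigmoid.
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 64, 3, stride=2, padding=1),  # 3x3, stride 2, 64 FM
                DenseBlock(64, 128),                       # module 1: 128 FM per conv
                DenseBlock(128, 256),                      # module 2: 256 FM per conv
            )
            self.classifier = nn.Sequential(
                nn.Flatten(),
                nn.LazyLinear(2048), nn.BatchNorm1d(2048), nn.ReLU(inplace=True),
                nn.Linear(2048, 1),                        # second FC outputs one value
                nn.Sigmoid(),                              # classification probability
            )

        def forward(self, img):
            return self.classifier(self.features(img))

Note that for the WGAN-with-gradient-penalty training described below, the Sigmoid output (and usually batch normalization) would normally be dropped from the critic; the sketch simply follows the layer list as written.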
The three-dimensional reconstruction system based on adversarial learning and its method comprise: designing a loss function L_Overall for training the deep neural networks, and training the three-dimensional generation network and the three-dimensional discrimination network adversarially. When the network model reaches Nash equilibrium, the three-dimensional generation network can reconstruct a three-dimensional scene model fully consistent with the features and distribution of the real scene; for observation images of the reconstructed three-dimensional scene model and observation views of the real three-dimensional scene, the classification probability of the three-dimensional discrimination network is 0.5;
the adversarial training comprises the following process:
a. generating an initial three-dimensional scene model and initializing the three-dimensional generation network; specifically: shooting a video with a camera and generating from it a real reference image data set, the camera parameters and the motion pose T; estimating image depth information by comparing the differences between adjacent image frames; and generating the initial three-dimensional scene model by a space-mapping method;
b. placing the reconstructed three-dimensional scene model in a three-dimensional virtual environment, setting up in it a virtual camera with the same parameters as the real camera, and acquiring a stream of rendered images of the three-dimensional scene model with the virtual camera; specifically: moving the virtual camera along the camera trajectory T recorded while acquiring the reference video, and projecting the reconstructed model to two-dimensional images with a pseudo-renderer at the same positions and viewpoints as the real observations, generating as many rendered images as there are reference images;
c. once the reference images and rendered images are ready, adversarial training of the network model can proceed: the three-dimensional discrimination network distinguishes reference images from rendered images; the total loss value is computed from the loss function, the networks are fine-tuned, and a new three-dimensional generation network and a new three-dimensional discrimination network are formed;
d. iteratively training the three-dimensional generation network and the three-dimensional discrimination network; specifically: generating a new three-dimensional reconstruction model from the reference images with the new generation network, then returning to step (b): placing the reconstruction in the virtual environment, observing it with the virtual camera, and feeding the newly observed rendered images together with the reference images into the discrimination network; steps (b)-(d) are then repeated, iteratively training and creating new generation and discrimination networks, until the total loss converges to the desired value (a sketch of this loop follows below).
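The loop in steps (a)-(d) can be sketched as follows, assuming the Generator3D/Discriminator3D sketches above, the loss sketches given after the formulas below (recons_loss and wgan_gp_d_loss), and a hypothetical pseudo_render(volume, poses) helper for the projection step; none of these names are specified as code in the patent:

    import torch

    def train(generator, discriminator, reference_images, poses,
              lam=0.5, epochs=100, lr=1e-4, target_loss=0.05):
        g_opt = torch.optim.Adam(generator.parameters(), lr=lr)
        d_opt = torch.optim.Adam(discriminator.parameters(), lr=lr)
        for epoch in range(epochs):
            # (b) render the current reconstruction along the recorded trajectory T
            volume = generator(reference_images)
            rendered = pseudo_render(volume, poses)        # hypothetical helper
            # (c) discriminate reference vs rendered images and fine-tune D
            d_loss = wgan_gp_d_loss(discriminator, reference_images, rendered.detach())
            d_opt.zero_grad()
            d_loss.backward()
            d_opt.step()
            # (c) total loss L_Overall = lam*L_Recons + (1 - lam)*L_GAN drives G
            g_loss = (lam * recons_loss(reference_images, rendered)
                      - (1 - lam) * discriminator(rendered).mean())
            g_opt.zero_grad()
            g_loss.backward()
            g_opt.step()
            # (d) iterate until the total loss converges to the desired value
            if g_loss.item() < target_loss:
                break
        return generator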
Further, the loss function L_Overall for training the deep neural networks comprises a reconstruction loss function L_Recons and a cross-entropy loss function L_GAN, defined as follows:
L_Overall = λ·L_Recons + (1 − λ)·L_GAN (1)
where λ is a parameter that adjusts the weight between the reconstruction loss and the cross-entropy loss;
① Reconstruction loss function L_Recons:
the reconstruction loss function L_Recons may be defined by the difference between the reference image and the rendered image as computed by the three-dimensional discriminator; two indices are used: structural similarity (SSIM), an image quality evaluation index based on the human visual system, whose value for a reference image and a rendered image lies between 0 and 1, the difference between images x and y being small when the value is close to 1; and peak signal-to-noise ratio (PSNR), which evaluates the image difference from the viewpoint of grayscale fidelity, with typical values between 20 and 70 dB, and which is mapped into the range 0 to 1 with a Sigmoid function:
PSNR* = E_Sigm(PSNR) (2)
where E_Sigm( ) represents the Sigmoid function;
the reconstruction loss function L_Recons of the invention is defined as:
L_Recons = (1/N)·Σ_{j=1..N} (α·PSNR*_{GjFj} + β·SSIM_{GjFj}) (3)
where α and β are parameters that adjust the weights of PSNR and SSIM; the subscript GjFj denotes the j-th reference image / rendered image pair; and N is the total number of image pairs;
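A sketch of this reconstruction loss follows, under the assumption that formula (3) combines the Sigmoid-normalized PSNR with SSIM as a weighted mean over the N image pairs and is turned into a loss as 1 minus each similarity score (the source gives the formula only as an image); SSIM is taken from the third-party pytorch-msssim package:

    import torch
    from pytorch_msssim import ssim

    def psnr(ref, ren, max_val=1.0):
        # Peak signal-to-noise ratio per image pair, in dB.
        mse = ((ref - ren) ** 2).flatten(1).mean(dim=1).clamp_min(1e-12)
        return 10.0 * torch.log10(max_val ** 2 / mse)

    def recons_loss(ref, ren, alpha=0.5, beta=0.5):
        # E_Sigm maps PSNR (typically 20-70 dB) into (0, 1); the /20 scale is an
        # assumption, since the source does not give the exact normalization.
        psnr01 = torch.sigmoid(psnr(ref, ren) / 20.0)
        ssim01 = ssim(ref, ren, data_range=1.0)            # already in [0, 1]
        # Higher PSNR/SSIM mean a better reconstruction, so minimize 1 - score.
        return alpha * (1.0 - psnr01).mean() + beta * (1.0 - ssim01)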
② Cross-entropy loss function L_GAN:
the cross-entropy loss value L_GAN quantitatively reflects the training process of the three-dimensional generation network and the three-dimensional discrimination network; WGAN with a gradient penalty is more effective for training the complex three-dimensional generation and discrimination networks; the cross-entropy loss function L_GAN is therefore designed with the WGAN training method:
L_GAN = E_{x̃∼P_g}[D(x̃)] − E_{x∼P_r}[D(x)] + θ·E_{x̂∼P_x̂}[(‖∇_x̂ D(x̂)‖₂ − 1)²] (4)
where P_r is the true reference image distribution; P_g is the rendered image distribution; the symbol x denotes a reference image and x̃ a rendered image implicitly generated by the three-dimensional generation network; D( ) is the three-dimensional discrimination network; the last term is the gradient penalty, θ being a parameter that adjusts the gradient-penalty weight within the cross-entropy loss; P_x̂ is implicitly defined by sampling uniformly along straight lines between pairs of points drawn from the distributions P_r and P_g; and E denotes the mathematical expectation.
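A sketch of formula (4) as a critic objective with gradient penalty, in the standard WGAN-GP form that the text describes; the function name and the default θ = 10 are illustrative:

    import torch

    def wgan_gp_d_loss(D, real, fake, theta=10.0):
        # Wasserstein terms: E[D(fake)] - E[D(real)].
        loss = D(fake).mean() - D(real).mean()
        # Sample x_hat uniformly along straight lines between P_r / P_g pairs.
        eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
        x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
        d_hat = D(x_hat)
        grads = torch.autograd.grad(d_hat, x_hat,
                                    grad_outputs=torch.ones_like(d_hat),
                                    create_graph=True)[0]
        # Gradient penalty: theta * E[(||grad_x_hat D(x_hat)||_2 - 1)^2].
        gp = ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()
        return loss + theta * gp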
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) The method of the invention belongs to a weakly supervised learning framework: it uses only the collected two-dimensional observation images as supervision and does not depend on three-dimensional shape priors, a CAD model library or other reference data. Because three-dimensional annotations must be obtained by designing three-dimensional CAD models or by scanning with special equipment such as three-dimensional scanners, the workload is enormous, and for some specific application scenarios a three-dimensional reference shape cannot be acquired at all. Under such conditions, the proposed weakly supervised framework greatly reduces the workload of acquiring annotated data and effectively realizes three-dimensional reconstruction.
(2) The method of the invention provides a new solution for deep-learning-based three-dimensional reconstruction, comprising modules that generate the initial three-dimensional scene model by space mapping, acquire rendered image streams of the model with a virtual camera, and train the network model adversarially. Through the cooperation of these modules, the method effectively improves reconstruction accuracy and the stability of the training process.
Drawings
FIG. 1 is a system block diagram of an embodiment of the three-dimensional reconstruction system based on adversarial learning according to the invention.
FIG. 2 is a block diagram of a three-dimensional generation network in accordance with one embodiment of the present invention.
Fig. 3 is a block diagram of a three-dimensional discrimination network according to an embodiment of the present invention.
Detailed Description
The invention discloses a three-dimensional reconstruction system based on adversarial learning and a method thereof, which adopt the GAN principle to realize high-quality three-dimensional reconstruction and provide a novel adversarial-learning three-dimensional reconstruction framework that iteratively improves any initial three-dimensional reconstruction model until convergence by training a GAN model. The model uses only real-time two-dimensional observation images as weak supervision and depends neither on prior knowledge of a shape model nor on any three-dimensional reference data. The method is a contactless, convenient technique for rapidly reconstructing the three-dimensional shape of an object from views; it can be widely applied in integrated ship support, virtual equipment maintenance, interactive electronic technical manuals, film, animation, virtual reality, augmented reality, industrial manufacturing and other fields, and has broad market prospects.
The present invention will be described in further detail with reference to the accompanying drawings.
FIG. 1 is a system block diagram of an embodiment of the three-dimensional reconstruction system based on adversarial learning according to the invention.
As shown in FIG. 1, the system of this embodiment of the invention includes a three-dimensional generation network and a three-dimensional discrimination network.
The three-dimensional generation network is used to reconstruct a three-dimensional scene model consistent with the real three-dimensional scene and to try to confuse the three-dimensional discrimination network, so that the discrimination network cannot distinguish the real three-dimensional scene from the reconstructed model scene. Its final output is a three-dimensional grid model with a resolution of 64 × 64 × 64 × 1.
The three-dimensional discrimination network is used to distinguish the three-dimensional scene model reconstructed by the three-dimensional generation network from the real three-dimensional scene. Its final output is a classification probability value for the input image.
The method comprises: designing a loss function L_Overall for training the deep neural networks, and training the three-dimensional generation network and the three-dimensional discrimination network adversarially. When the network model reaches Nash equilibrium, the three-dimensional generation network can reconstruct a three-dimensional scene model fully consistent with the features and distribution of the real scene; for observation images of the reconstructed three-dimensional scene model and observation views of the real three-dimensional scene, the classification probability of the three-dimensional discrimination network is 0.5.
The adversarial training comprises the following process:
step 1, generating an initial three-dimensional scene model and initializing a three-dimensional generation network. The specific process is as follows: shooting a video by using a camera, and generating a real reference image data set, camera parameters and a motion pose T according to the video; estimating image depth information by comparing differences between adjacent image frames; and generating an initial three-dimensional scene model by adopting a space mapping method.
And 2, placing the reconstructed three-dimensional scene model in a three-dimensional virtual environment, setting a virtual camera with the same parameters as the real camera in the three-dimensional virtual environment, and acquiring a rendering image stream of the three-dimensional scene model by using the virtual camera. The specific process is as follows: moving the virtual camera along a camera track T recorded in the process of acquiring the reference video; and projecting the reconstructed three-dimensional scene model to the two-dimensional image by using a pseudo renderer at the same position and view point as the real observation scene, and generating the rendered images with the same number as the reference images.
Step 3: once the reference images and rendered images are ready, carry out adversarial training of the network model. The three-dimensional discrimination network distinguishes reference images from rendered images; the total loss value is computed from the loss function, the networks are fine-tuned, and a new three-dimensional generation network and a new three-dimensional discrimination network are formed. The loss function L_Overall for training the deep neural networks comprises a reconstruction loss function L_Recons and a cross-entropy loss function L_GAN, and is expressed as follows:
L_Overall = λ·L_Recons + (1 − λ)·L_GAN (1)
where λ is a parameter that adjusts the weight between the reconstruction loss and the cross-entropy loss.
① Reconstruction loss function L_Recons:
The reconstruction loss function L_Recons may be defined by the difference between the reference image and the rendered image as computed by the three-dimensional discriminator. The invention adopts two indices. Structural SIMilarity (SSIM) is an image quality evaluation index based on the human visual system; the SSIM value of two images lies between 0 and 1, and when it is close to 1 the difference between images x and y is small. Peak Signal-to-Noise Ratio (PSNR) evaluates the image difference from the viewpoint of grayscale fidelity; its typical values lie between 20 and 70 dB, and it is mapped into the range 0 to 1 with a Sigmoid function:
PSNR* = E_Sigm(PSNR) (2)
where E_Sigm( ) represents the Sigmoid function.
The reconstruction loss function L_Recons of the invention is defined as:
L_Recons = (1/N)·Σ_{j=1..N} (α·PSNR*_{GjFj} + β·SSIM_{GjFj}) (3)
where α and β are parameters that adjust the weights of PSNR and SSIM; the subscript GjFj denotes the j-th reference image / rendered image pair; and N is the total number of image pairs.
② Cross-entropy loss function L_GAN:
The cross-entropy loss value L_GAN quantitatively reflects the training process of the three-dimensional generation network and the three-dimensional discrimination network. WGAN (Wasserstein GAN) with a gradient penalty is more effective for training the complex three-dimensional generation and discrimination networks. Therefore, the invention adopts the WGAN training method to design the cross-entropy loss function L_GAN as follows:
L_GAN = E_{x̃∼P_g}[D(x̃)] − E_{x∼P_r}[D(x)] + θ·E_{x̂∼P_x̂}[(‖∇_x̂ D(x̂)‖₂ − 1)²] (4)
where P_r is the true reference image distribution; P_g is the rendered image distribution; the symbol x denotes a reference image and x̃ a rendered image implicitly generated by the three-dimensional generation network; D( ) is the three-dimensional discrimination network; the last term is the gradient penalty, θ being a parameter that adjusts the gradient-penalty weight within the cross-entropy loss; P_x̂ is implicitly defined by sampling uniformly along straight lines between pairs of points drawn from the distributions P_r and P_g; and E denotes the mathematical expectation.
Step 4: iteratively train the three-dimensional generation network and the three-dimensional discrimination network. Specifically: generate a new three-dimensional reconstruction model from the reference images with the new generation network; then return to step 2: place the three-dimensional reconstruction model in the virtual environment, observe it with the virtual camera, and feed the newly observed rendered images together with the reference images into the three-dimensional discrimination network for discrimination. Steps 2 to 4 are repeated, iteratively training and creating new three-dimensional generation and discrimination networks, until the total loss converges to the desired value.
FIG. 2 is a block diagram of a three-dimensional generation network in accordance with one embodiment of the present invention. As shown in FIG. 2, it includes:
1 two-dimensional convolutional layer (denoted Conv), 2 densely connected modules, 3 fully connected layers (denoted FC), and 4 three-dimensional transposed convolutional layers (denoted ConvT).
The two-dimensional convolutional layer has a 3 × 3 convolution kernel and a stride (denoted Stride) of 2, and outputs 16 feature maps (denoted FM).
Each of the 2 densely connected modules consists of 4 two-dimensional convolutional layers; in each module the first 3 convolutional layers have 3 × 3 kernels and the last has a 1 × 1 kernel; the stride is 1. Each two-dimensional convolutional layer is followed by 1 batch normalization layer (denoted BN) and 1 ReLU activation function, and the last convolutional layer is followed by 1 average pooling layer (denoted Avg Pool). Each two-dimensional convolutional layer in the first densely connected module outputs 32 feature maps; each in the second outputs 64 feature maps.
The outputs of the 3 fully connected layers are 2048, 1024 and 256 × 4 × 4 × 4, respectively, and each fully connected layer is followed by 1 BN layer and 1 ReLU activation function.
Each of the 4 three-dimensional transposed convolutional layers has a 3 × 3 × 3 kernel and a stride of 2; their output channels are 256, 128, 64 and 16, respectively, and each is followed by 1 BN layer and a ReLU activation function.
FIG. 3 is a block diagram of a three-dimensional discrimination network according to an embodiment of the present invention. As shown in FIG. 3, it includes:
1 two-dimensional convolutional layer, 2 densely connected modules, and 2 fully connected layers.
The two-dimensional convolutional layer has a 3 × 3 convolution kernel and a stride of 2, and outputs 64 feature maps.
Each of the 2 densely connected modules consists of 4 two-dimensional convolutional layers; in each module the first 3 convolutional layers have 3 × 3 kernels and the last has a 1 × 1 kernel; the stride is 1. Each two-dimensional convolutional layer is followed by 1 BN layer and 1 ReLU activation function, and the last convolutional layer is followed by 1 Avg Pool layer. Each two-dimensional convolutional layer in the first densely connected module outputs 128 feature maps; each in the second outputs 256 feature maps.
The outputs of the 2 fully connected layers are 2048 and 1, respectively; the first fully connected layer is followed by 1 BN layer and 1 ReLU activation function, and the second is followed by a Sigmoid function.
In conclusion, the invention combines the principle of the generative adversarial network (GAN) with the advantages of multi-view stereo three-dimensional reconstruction, and iteratively improves reconstruction quality and stability through adversarial training of the three-dimensional generation model and the three-dimensional discrimination model. The method belongs to a weakly supervised learning framework: it uses only the collected two-dimensional observation images as supervision and does not depend on reference data such as three-dimensional shape priors or a CAD model library, greatly reducing the workload of obtaining annotated data while effectively realizing three-dimensional reconstruction. The method provides a new solution for deep-learning-based three-dimensional reconstruction, comprising modules that generate the initial three-dimensional scene model by space mapping, acquire rendered image streams of the model with a virtual camera, and train the network model adversarially; through the cooperation of these modules, it effectively improves reconstruction accuracy and the stability of the training process. The invention provides a contactless and convenient technique for rapidly reconstructing the three-dimensional shape of an object from views, can be used in integrated ship support, virtual equipment maintenance, interactive electronic technical manuals, film, animation, virtual reality, augmented reality, industrial manufacturing and other fields, and has broad market prospects.

Claims (5)

1. A three-dimensional reconstruction system based on adversarial learning, comprising:
a three-dimensional generation network and a three-dimensional discrimination network;
the three-dimensional discrimination network is used to distinguish the three-dimensional scene model reconstructed by the three-dimensional generation network from the real three-dimensional scene, and its final output is a classification probability value for the input image;
the three-dimensional generation network is used to reconstruct a three-dimensional scene model consistent with the real three-dimensional scene and to try to confuse the three-dimensional discrimination network, so that the discrimination network cannot distinguish the real three-dimensional scene from the reconstructed model scene, and its final output is a three-dimensional grid model with a resolution of 64 × 64 × 64 × 1.
2. The system of claim 1, wherein the three-dimensional generation network comprises:
1 two-dimensional convolutional layer, denoted Conv; 2 densely connected modules; 3 fully connected layers, denoted FC; and 4 three-dimensional transposed convolutional layers, denoted ConvT;
the two-dimensional convolutional layer has a 3 × 3 convolution kernel and a stride (denoted Stride) of 2, and outputs 16 feature maps (denoted FM);
each of the 2 densely connected modules contains 4 two-dimensional convolutional layers; in each module the first 3 convolutional layers have 3 × 3 kernels and the last has a 1 × 1 kernel; the stride is 1; each two-dimensional convolutional layer is followed by 1 batch normalization layer (denoted BN) and 1 ReLU activation function; the last convolutional layer is followed by 1 average pooling layer (denoted Avg Pool); each two-dimensional convolutional layer in the first densely connected module outputs 32 feature maps, and each in the second outputs 64 feature maps;
the outputs of the 3 fully connected layers are 2048, 1024 and 256 × 4 × 4 × 4, respectively, and each fully connected layer is followed by 1 BN layer and 1 ReLU activation function;
each of the 4 three-dimensional transposed convolutional layers has a 3 × 3 × 3 kernel and a stride of 2; their output channels are 256, 128, 64 and 16, respectively, and each is followed by 1 BN layer and a ReLU activation function.
3. The system of claim 1, wherein the three-dimensional discrimination network comprises:
1 two-dimensional convolutional layer, 2 densely connected modules and 2 fully connected layers;
the two-dimensional convolutional layer has a 3 × 3 convolution kernel and a stride of 2, and outputs 64 feature maps;
each of the 2 densely connected modules contains 4 two-dimensional convolutional layers; in each module the first 3 convolutional layers have 3 × 3 kernels and the last has a 1 × 1 kernel; the stride is 1; each two-dimensional convolutional layer is followed by 1 BN layer and 1 ReLU activation function; the last convolutional layer is followed by 1 Avg Pool layer; each two-dimensional convolutional layer in the first densely connected module outputs 128 feature maps, and each in the second outputs 256 feature maps;
the outputs of the 2 fully connected layers are 2048 and 1, respectively; the first fully connected layer is followed by 1 BN layer and 1 ReLU activation function, and the second is followed by a Sigmoid function.
4. A three-dimensional reconstruction method based on adversarial learning, characterized in that:
the method adopts a three-dimensional reconstruction system based on adversarial learning, the system comprising a three-dimensional generation network and a three-dimensional discrimination network;
the three-dimensional discrimination network is used to distinguish the three-dimensional scene model reconstructed by the three-dimensional generation network from the real three-dimensional scene, and its final output is a classification probability value for the input image; the three-dimensional generation network is used to reconstruct a three-dimensional scene model consistent with the real three-dimensional scene and to try to confuse the three-dimensional discrimination network, so that the discrimination network cannot distinguish the real three-dimensional scene from the reconstructed model scene, and its final output is a three-dimensional grid model with a resolution of 64 × 64 × 64 × 1;
the method comprises: designing a loss function L_Overall for training the deep neural networks, and training the three-dimensional generation network and the three-dimensional discrimination network adversarially; when the network model reaches Nash equilibrium, the three-dimensional generation network can reconstruct a three-dimensional scene model fully consistent with the features and distribution of the real scene; for observation images of the reconstructed three-dimensional scene model and observation views of the real three-dimensional scene, the classification probability of the three-dimensional discrimination network is 0.5;
the adversarial training comprises the following process:
step 1: generating an initial three-dimensional scene model and initializing the three-dimensional generation network; specifically: shooting a video with a camera and generating from it a real reference image data set, the camera parameters and the motion pose T; estimating image depth information by comparing the differences between adjacent image frames; and generating the initial three-dimensional scene model by a space-mapping method;
step 2: placing the reconstructed three-dimensional scene model in a three-dimensional virtual environment, setting up in it a virtual camera with the same parameters as the real camera, and acquiring a stream of rendered images of the three-dimensional scene model with the virtual camera; specifically: moving the virtual camera along the camera trajectory T recorded while acquiring the reference video, and projecting the reconstructed model to two-dimensional images with a pseudo-renderer at the same positions and viewpoints as the real observations, generating as many rendered images as there are reference images;
step 3: once the reference images and rendered images are ready, carrying out adversarial training of the network model: distinguishing reference images from rendered images with the three-dimensional discrimination network; computing the total loss value from the loss function, fine-tuning the networks, and forming a new three-dimensional generation network and a new three-dimensional discrimination network;
step 4: iteratively training the three-dimensional generation network and the three-dimensional discrimination network; specifically: generating a new three-dimensional reconstruction model from the reference images with the new generation network; returning to step 2: placing the reconstruction in the virtual environment, observing it with the virtual camera, and feeding the newly observed rendered images together with the reference images into the discrimination network for discrimination; then repeating steps 2 to 4, iteratively training and creating new generation and discrimination networks until the total loss converges to the desired value.
5. The three-dimensional reconstruction method based on adversarial learning as claimed in claim 4, wherein the loss function L_Overall for training the deep neural networks comprises a reconstruction loss function L_Recons and a cross-entropy loss function L_GAN, defined as follows:
L_Overall = λ·L_Recons + (1 − λ)·L_GAN (1)
where λ is a parameter that adjusts the weight between the reconstruction loss and the cross-entropy loss;
① Reconstruction loss function L_Recons:
the reconstruction loss function L_Recons may be defined by the difference between the reference image and the rendered image as computed by the three-dimensional discriminator; two indices are used: structural similarity (SSIM), an image quality evaluation index based on the human visual system, whose value for a reference image and a rendered image lies between 0 and 1, the difference between images x and y being small when the value is close to 1; and peak signal-to-noise ratio (PSNR), which evaluates the image difference from the viewpoint of grayscale fidelity, with typical values between 20 and 70 dB, and which is mapped into the range 0 to 1 with a Sigmoid function:
PSNR* = E_Sigm(PSNR) (2)
where E_Sigm( ) represents the Sigmoid function;
the reconstruction loss function L_Recons is defined as:
L_Recons = (1/N)·Σ_{j=1..N} (α·PSNR*_{GjFj} + β·SSIM_{GjFj}) (3)
where α and β are parameters that adjust the weights of PSNR and SSIM; the subscript GjFj denotes the j-th reference image / rendered image pair; and N is the total number of image pairs;
② Cross-entropy loss function L_GAN:
the cross-entropy loss value L_GAN quantitatively reflects the training process of the three-dimensional generation network and the three-dimensional discrimination network; WGAN with a gradient penalty is more effective for training the complex three-dimensional generation and discrimination networks; the WGAN training method is therefore adopted to design the cross-entropy loss function L_GAN:
L_GAN = E_{x̃∼P_g}[D(x̃)] − E_{x∼P_r}[D(x)] + θ·E_{x̂∼P_x̂}[(‖∇_x̂ D(x̂)‖₂ − 1)²] (4)
where P_r is the true reference image distribution; P_g is the rendered image distribution; the symbol x denotes a reference image and x̃ a rendered image implicitly generated by the three-dimensional generation network; D( ) is the three-dimensional discrimination network; the last term is the gradient penalty, θ being a parameter that adjusts the gradient-penalty weight within the cross-entropy loss; P_x̂ is implicitly defined by sampling uniformly along straight lines between pairs of points drawn from the distributions P_r and P_g; and E denotes the mathematical expectation.
CN202011371730.4A 2020-11-30 2020-11-30 Three-dimensional reconstruction system and method based on adversarial learning Pending CN112489198A (en)





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination