CN110517352B

CN110517352B - Three-dimensional reconstruction method, storage medium, terminal and system of object

Info

Publication number: CN110517352B
Application number: CN201910797141.3A
Authority: CN
Inventors: 匡平; 李凡; 何明耘; 彭亮
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2019-08-27
Filing date: 2019-08-27
Publication date: 2022-06-03
Anticipated expiration: 2039-08-27
Also published as: CN110517352A

Abstract

The invention discloses a three-dimensional reconstruction method, a storage medium, a terminal and a system of an object, belonging to the technical field of image reconstruction 3D models, wherein the method comprises the following steps: extracting high-dimensional characteristics of a single picture at any angle, and restoring a first fixed visual angle image of the object according to the high-dimensional characteristics; a shape mask is generated from the first fixed perspective image to generate a 3D model of the object. The system comprises a U-shaped generation type countermeasure network and a 3D condition generation type countermeasure network. According to the invention, the high-dimensional characteristics of a single picture are extracted to restore the object fixed view angle, namely the first fixed view angle view, so that information disturbance can be reduced; the shape mask is generated according to the fixed view angle view, the three-dimensional reconstruction efficiency and accuracy are improved, the method is suitable for the characteristics of pictures with any view angles and vivid effect, and the requirement of reconstructing the 3D model in real time by using a single object picture with any view angle is met.

Description

Three-dimensional reconstruction method, storage medium, terminal and system of object

Technical Field

The invention relates to the technical field of single-picture reconstruction of 3D models, in particular to a three-dimensional reconstruction method, a storage medium, a terminal and a system of an object.

Background

Three-dimensional reconstruction has wide application in the fields of computer vision and models. In the past, researchers have often used multiple pictures from different perspectives to solve three-dimensional reconstruction, and it is still difficult to achieve three-dimensional reconstruction from a single picture, because it requires a strong model understanding ability to predict its shape information from a low-dimensional space.

Recently, researchers have made great progress in voxel predictive 3D reconstruction using CNN. This type of method usually considers using a fixed view or a few views, which is not suitable for practical applications, because in practical applications the object can usually be viewed at any angle, and the taken picture is also at any angle. However, good effect is difficult to obtain by using pictures at any angle for learning and training, because the difference caused by pictures at different view angles disturbs the network to extract the shape characteristics of the pictures, namely, the information disturbance is large. How to avoid the difference caused by different view angle pictures is an urgent problem to be solved in order to currently reconstruct a 3D model of an object by using a single arbitrary view angle picture.

Disclosure of Invention

The invention aims to overcome the defect that the three-dimensional reconstruction of an object cannot be realized according to a single picture with any visual angle in the prior art, and provides a three-dimensional reconstruction method, a storage medium, a terminal and a system of the object.

The invention aims to realize the following technical scheme that the method for reconstructing the three-dimensional object specifically comprises the steps of extracting high-dimensional characteristics of a single picture with any angle by using a fixed view angle image, restoring the high-dimensional characteristics to obtain a first fixed view angle image of the object, and generating a shape mask according to the first fixed view angle image so as to generate a 3D model of the object.

Specifically, the first fixed perspective image generation shape mask specifically includes: extracting the shape contour binary image features of the object according to the first fixed view image to further generate a shape mask in a 3D space, wherein a calculation formula of the shape mask specifically comprises the following steps:

P_valid＝P{v＝1|mask＝1}＝1

P_invalid＝P{v＝1|mask＝0}＝0

wherein, P represents the expectation that the model and the shape contour binary image of the object have voxels in the corresponding positions in the 3D space, P _ valid represents the valid expectation that the model and the shape contour binary image of the object have voxels in the corresponding positions in the 3D space, P _ invalid represents the invalid expectation that the model and the shape contour binary image of the object have no voxels in the corresponding positions in the 3D space, mask represents a certain pixel value in the shape contour binary image, and v represents a three-dimensional pixel value in the 3D space.

Specifically, the first fixed viewing angle includes a side viewing angle, a top viewing angle, and a front viewing angle.

Specifically, obtaining a first fixed-view image of the object is realized through a U-shaped generation type countermeasure network; generating the shape mask from the first fixed perspective image and thereby generating the 3D model of the object is achieved through a 3D conditional generative confrontation network.

Specifically, the shape mask capable of being mapped with the first fixed view image is added to the judger in the 3D condition generation countermeasure network, and the shape mask can help the judger to judge the truth of the 3D model so as to reversely influence the generator to learn the contour picture information.

Specifically, the step of obtaining the first fixed-view image of the object further includes training a generative confrontation network:

preprocessing the data set to obtain a training data set of a plurality of pictures of each angle of each object;

and training a generator and a discriminator in the generative confrontation network model alternately according to the training data set, and adjusting the weights of each layer in the generator and the discriminator so as to obtain the generative confrontation network with stable performance.

Specifically, before the step of obtaining the first fixed-view image of the object by restoring, the method further includes: and carrying out random region clipping, inversion and color regularization on a single picture at any angle.

The invention also includes a storage medium having stored thereon computer instructions which, when executed, perform the steps of a method for three-dimensional reconstruction of an object.

The invention also comprises a terminal comprising a memory and a processor, the memory having stored thereon computer instructions executable on the processor, characterized in that the processor, when executing the computer instructions, performs the steps of a method for three-dimensional reconstruction of an object.

The invention also includes a system for three-dimensional reconstruction of an object, the system comprising: the U-shaped generation type countermeasure network is used for extracting high-dimensional characteristics of a single picture at any angle and restoring a first fixed view image of an object according to the high-dimensional characteristics; a 3D conditional generative confrontation network for generating a 3D model of the object from the first fixed perspective image.

Compared with the prior art, the invention has the beneficial effects that:

(1) according to the method, the high-dimensional characteristics of the single picture are extracted to restore the fixed visual angle (first fixed visual angle) view of the object, so that information disturbance can be reduced, the shape mask is generated according to the restored fixed visual angle (first fixed visual angle) view, the efficiency and the accuracy of three-dimensional reconstruction are improved, and finally the 3D model reconstruction of the object is completed according to the shape mask. The method and the device can generate the corresponding 3D model of the object by simply inputting the image of the object with any visual angle by the user. On one hand, a modeling worker can rapidly generate the 3D model of the object through a simple object picture, so that the workload is greatly reduced, and meanwhile, each view angle figure of the object picture with any view angle can be predicted. In addition, the method can also be applied to rapid scene demonstration, and in some simulation scenes, the required 3D model precision is not high, and model objects and scenes need to be rapidly generated to perform timely demonstration.

(2) The U-shaped generation type countermeasure network is used for processing the problem of large difference of reconstruction effects generated by object images with different visual angles, wherein the trained U-shaped generation type countermeasure network can predict an object fixed visual angle (first fixed visual angle) view of the object image with any visual angle so as to solve information interference caused by different object irradiation visual angles; the 3D conditional generative countermeasure network generates a corresponding 3D model using a fixed perspective (first fixed perspective) view generated by the U-shaped generative countermeasure network as a condition.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the figure:

FIG. 1 is a flowchart of a method of example 1 of the present invention;

FIG. 2 is a schematic diagram of a 3D model generated from a single picture;

FIG. 3 is a graph showing the effect of the method of the present invention;

FIG. 4 is a system framework diagram according to embodiment 2 of the present invention;

FIG. 5 is a schematic diagram of a 3D conditional generation countermeasure network structure for shape masking;

FIG. 6 is a diagram of a U-shaped generation countermeasure network structure according to the present invention.

Detailed Description

The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it is to be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.

In the description of the present invention, it should be noted that directions or positional relationships indicated by "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", and the like are based on the directions or positional relationships of the drawings, and are only for convenience of describing the present invention and simplifying the description, but do not indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

In the description of the present invention, it should be noted that, unless otherwise explicitly stated or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.

In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

Example 1

As shown in fig. 1, in embodiment 1, a method for three-dimensional reconstruction of an object specifically includes the following steps:

s01: training a network model; in this embodiment, the generative network includes a U-generative countermeasure network and a 3D conditional generative countermeasure network, and both the U-generative countermeasure network and the 3D conditional generative countermeasure network are based on the generative countermeasure network, when different from the original generative countermeasure network. In the U-shaped generation type countermeasure network, an input generator G is not a random vector but an object picture with an arbitrary view angle. Input to generator G in the 3D conditional-generation countermeasure network is a profile side view of the corresponding object.

Further, training the whole network specifically includes the following sub-steps:

s011: preprocessing the data set; based on a public data set ShapeNet, which includes 20 types of model data, 10 pictures at various angles were generated as a training data set of a U-shaped generated countermeasure network by randomly setting an irradiation angle for each object. Since the model provided by the data set sharenet is a grid model, for a 3D model of the grid model, we need to convert it into a voxel model of 64 × 64 space size as a training data set of the 3D conditional generation countermeasure network.

S012: training a generator G and a discriminator D alternately, and adjusting the weights of each layer in the generator and the discriminator to further obtain a generating network with stable performance; specifically, the preprocessed data set is input into a generator G for prediction to obtain an output image G (X), the image G (X) and the original real image X are respectively sent into a discriminator D for discrimination, and the result guides the training generator G and the discriminator D. More specifically, in training the discriminator D, the generator G is required to generate an image and output it to the discriminator D; from the input/target image pair (X, Y) and the input/output image pair (X, G (X)), the discriminator D identifies the probability that the image given by the generator G is a true image; the discriminator D adjusts the weight of each layer in the discriminator D according to the classification error of the input/target image pair and the input/output image pair, and the specific formula is as follows:

V_CGAN(G,D)＝E_(X,Y)[logD(X,Y)]+

E_X[log(1-D(X,G(X)))]

when training the generator G, the weights of the respective layers in the generator G are adjusted based on the classification error, which is the result of discrimination by the discriminator D, and the difference between the output image and the target image, which is calculated from the following equation:

further, a random gradient descent of Batch size 8 was used, and Adam optimizer was employed, and Batch normalization was used. The present embodiment alternates between the gradient update of the generator G and the gradient update of the discriminator D. When 80 iterations are trained, the network performance is stable.

S02: preprocessing data; specifically, an object target picture at a random visual angle is cut into pictures with fixed sizes, the color of a background area is removed, and the background area is filled with green to form a whole to-be-processed image which is used as the input of a generation network for processing any visual angle.

S03: extracting high-dimensional characteristics of a single picture at any angle, and restoring a first fixed visual angle image of the object according to the high-dimensional characteristics; as an option, the first fixed viewing angle includes, but is not limited to, a side viewing angle, a front viewing angle, a top viewing angle, and the like.

And further, extracting high-dimensional features of the preprocessed single image at any angle, restoring a side view image of the object image at any view angle according to the high-dimensional features, and extracting a shape profile binary image according to the side view object boundary.

Furthermore, the U-shaped generation type countermeasure network for processing any view angle diagram into a fixed view angle diagram comprises a generator G and a discriminator D; wherein the generator G is trained from a feature/true image pair (X, Y), where X is a random view picture of an object and Y is a side view of the object corresponding to X; the trained generator G transforms the input X to obtain an object side view G (X); the trained discriminator D is used for judging whether an unknown image is an image G (X) produced by the generator or not, wherein the unknown image comprises a real target image Y from the data set or an output image G (X) from the generator G;

the objective function for generating the network is:

wherein:

V_CGAN(G,D)＝E_(X,Y)[logD(X,Y)]+

E_X[log(1-D(X,G(X)))]；

wherein, D (X, Y) and D (X, G (X)) are the discrimination results of discriminator D on different image pairs, and represent the probability of judging true; and E_(X,Y)Means for accumulating discrimination calculations for all feature/true image pairs (X, Y) from the sample and further writing out using an expected form of probability distribution; e_XThis means that the feature/generation image pair (X, g (X)) is subjected to the above-described corresponding processing;

during the training process, the generator G aims to generate a real picture as much as possible to cheat the discriminant discriminator D. The aim of the discriminator D is to separate the picture generated by the generator G from the actual picture as much as possible. Thus, the generator G and the discriminator D constitute a dynamic minmax game, in which case, in the most ideal situation, the generator G can generate enough pictures G (x) to be "fake". For authentication D, it is difficult to determine whether the picture generated by generator G is authentic, so that D (G (x)) is 0.5.

Further, in the step of processing a single picture with any view angle into a picture with a fixed view angle (side view angle) by the U-shaped generation type countermeasure network, in order to bring more variability to the training data, random region clipping, inversion and color regularization are carried out on the input data.

S04: a shape mask is generated from the first fixed perspective image to generate a 3D model of the object. Specifically, the 3D countermeasure generation network that generates the model includes a 3D generator G₁And a 3D discriminator D₁；

Further, the step of generating the shape Mask information is as follows:

s041: to be input to a 3D generator G₁A shape Mask covering the model voxel is generated on the basis of the 2D object profile map. The calculation formula is as follows:

P_valid＝P{v＝1|mask＝1}＝1

P_invalid＝P{v＝1|mask＝0}＝0

wherein, P represents the expectation that the model and the shape contour binary image of the object have voxels in the corresponding positions in the 3D space, P _ valid represents the valid expectation that the model and the shape contour binary image of the object have voxels in the corresponding positions in the 3D space, P _ invalid represents the invalid expectation that the model and the shape contour binary image of the object have no voxels in the corresponding positions in the 3D space, mask represents a certain pixel value in the shape contour binary image, and v represents a three-dimensional pixel value in the 3D space. In a 2D object contour map, if the pixel value at a certain position is 1 and pixel (x, y) ═ 1, we will set voxels (from (x, y,0) to (x, y,63)) to 1 in 3D voxel space. Mask (y) generates a shape Mask covering the model voxels based on the 2D object contour map, as shown in fig. 4.

S042: generating a 3D model of the object from the shape mask; 3D Generator G₁From a feature/true image pair (X)₁,Y₁) Training formationWherein X is₁Is a profile side view of an object to be treated, the object area of which is filled with black, the periphery of the profile is filled with white, Y₁Is a reaction of with X₁In a three-dimensional space, an object coverage area is filled with 1, and the rest areas are filled with 0; trained 3D generator G₁For inputted X₁Feature extraction and 3D model generation to obtain a reconstructed voxel model G (X)₁) (ii) a The trained 3D discriminator D is used to discriminate whether the unknown model is the model G (X) produced by the 3D generator₁) The unknown model comprises a true object model Y from the data set₁Or from a 3D generator G₁Output model G (X)₁)。

Further, the objective function of the 3D countermeasure generation network is:

in the above formula, D (x | Mask (y)) and D (G (z | y) | Mask (y)) are both 3D discriminators D₁The judgment results of different model pairs represent the rate of true judgment; and E_x～PdataAnd E_z～PnoiseRepresentation all image and model pairs from true sample and generation, respectively, (X)₁,Y₁) The results of the discrimination calculations are accumulated and further written out using the expected form of the probability distribution.

S05: theme rendering; specifically, after the 3D model is generated, rendering and displaying of the model are performed based on a uniform rendering engine. In order to realize modeling rendering and display of an output model, a model renderer based on a uniform rendering engine is built in the embodiment, after a voxel model obtained in the previous section is obtained, the rendering engine generates small cubes according to the positions of the voxel model, and each cube represents that a corresponding position belongs to a model coverage area, so that a 3D voxel model is built. In addition, an illumination relation is established, and as shown in fig. 2, a rendering result of the voxel model in the renderer is obtained.

S7: terminal application communication; the client requests the server to process through the HTTP protocol, and a return result is obtained. In order to realize more convenient application, the application is divided into a server side and a client side. The client requests the service of the server through the HTTP protocol. The client side is mainly responsible for processing user interaction response, UI display and 3D model rendering functions. The server side runs a main network which is a U-shaped countermeasure generation network and a 3D condition generation countermeasure network respectively and is responsible for processing core computing functions, including responding to the client side, preprocessing an original picture, generating a contour side view, generating an object model and returning a result to the client side.

The present embodiment further provides a storage medium, on which computer instructions are stored, and when the computer instructions are executed, the steps of the method for three-dimensional reconstruction of an object in embodiment 1 are executed.

Based on such understanding, the technical solution of the present embodiment or parts of the technical solution may be essentially implemented in the form of a software product, which is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method of the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

The present embodiment also provides a terminal, which includes a memory and a processor, where the memory stores computer instructions executable on the processor, and the processor executes the steps of the method for three-dimensional reconstruction of an object in embodiment 1 when executing the computer instructions. The processor may be a single or multi-core central processing unit or a specific integrated circuit, or one or more integrated circuits configured to implement the present invention.

Each functional unit in the embodiments provided by the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The embodiment further includes a device, which is configured to receive the 3D model obtained by the above three-dimensional reconstruction method of the object, and perform displaying or other purposes.

According to the method, the high-dimensional characteristics of the single picture are extracted to restore the fixed view angle (first fixed view angle) view of the object, so that information disturbance can be reduced, the shape mask is generated according to the fixed view angle (first fixed view angle) view, the efficiency and the accuracy of three-dimensional reconstruction are improved, the method has the characteristics of high robustness, suitability for the picture with any view angle and vivid effect, the requirement of real-time reconstruction of a 3D model of the single picture of the object with any view angle is met, as shown in figure 3, the effect diagram in the third row in the figure is the 3D model of the object generated by the method, and the second row is the 3D model diagram of the object which is not generated by the method (unreduced object fixed view angle). The method and the device can generate the corresponding 3D model of the object by simply inputting the image of the object with any visual angle by the user. On one hand, a modeling worker can rapidly generate the 3D model of the object through a simple object picture, so that the workload is greatly reduced, and meanwhile, each view angle figure of the object picture with any view angle can be predicted. In addition, the method can also be applied to rapid scene demonstration, and in some simulation scenes, the required 3D model precision is not high, and model objects and scenes need to be rapidly generated to perform timely demonstration.

Example 2

The present embodiment is based on the same inventive concept as embodiment 1, and provides a three-dimensional reconstruction system of an object on the basis of embodiment 1, as shown in fig. 4, the system specifically includes a U-shaped generation countermeasure network and a 3D condition generation countermeasure network, the U-shaped generation countermeasure network extracts high-order features of a single arbitrary angle picture to restore a first fixed view image of the object, and the first fixed view image of the object is input to the 3D condition generation countermeasure network; the 3D condition generation type countermeasure network generates a shape mask according to the shape outline binary image characteristics of the first fixed view angle image so as to generate a 3D model of the object. The U-shaped generation type countermeasure network and the 3D condition generation type countermeasure network both include a generator and a discriminator, fig. 5 is a schematic diagram of a 3D condition generation type countermeasure network structure of a shape mask, and fig. 6 is a structural diagram of the U-shaped generation type countermeasure network of the present invention.

Further, the architecture of the generator G is an encoder-decoder network, the encoder part of which consists of a series of full convolution layers (convolution size 3 × 3) and resolution reduction, and the decoder consists of a series of deconvolution/upsampling. In addition, in the decoding part, each layer is thus connected to a layer of lower resolution and an additional skip connection connects it to the encoder layer of the same resolution (U-net) as it. These additional connections allow bypassing the bottleneck of the encoder-decoder by passing low-level information from the input directly to the output.

Further, the generator G comprises an m-layer encoder and an m-layer decoder which are sequentially connected, wherein an image X is input at an input end of the encoder, and an image G (X) is output at an output end of the decoder; each encoder comprises a convolution layer, a Batch Norm layer and a ReLU layer which are sequentially connected, and each decoder comprises a deconvolution/up-sampling layer, a Batch Norm layer and a ReLU layer; and the output end of the convolution layer of the nth layer is in jump connection with the input end of the deconvolution layer of the (m-n) th layer, wherein m is the number of layers.

Further, the encoder gradually reduces the spatial dimension of the pooling layer, and the decoder gradually restores the details and spatial dimension of the object. A quick connection usually exists between the encoder and the decoder, so that the decoder can be helped to better repair the details of the target and extract the high-dimensional features of the image. Since much of the information in the network is shared between the input and output, the information in the encoder needs to be passed directly to the decoder. To achieve information contribution, a hopping connection between the nth layer and the m-nth layers is added to the network. Where m is the number of network layers, i.e. each hop connection directly passes the nth layer (encoder) information to the m-nth layer (decoder).

Further, discriminator D includes a plurality of convolutional layers connected in sequence, including a Batch Norm layer and a ReLU layer between adjacent convolutional layers.

Further, each layer of the network in the generator G and the discriminator D includes a number of training optimized parameter weights, whose values are dynamically updated by the training.

Furthermore, in this embodiment, the size of the random view picture X of an object is 512 × 512 × 3, where the input channel is 3, which means that the input image is RGB three channels, and 512 × 512 means that the image resolution is 512 × 512 pixels; the size of an output image G (X) is 512 multiplied by 3, wherein an input channel is 3, which means that the input image is an RGB three channel; where 256 × 256 × 3 resolution images may also be used. The size of the image obtained by each layer of encoder is: 256 × 256 × 64, 128 × 128 × 128, 64 × 64 × 256, 32 × 32 × 512, 16 × 16 × 512, 8 × 8 × 512, 4 × 4 × 512, 2 × 2 × 512, and the feature size of the image output by the output end of the encoder is 1 × 1 × 1024; the image sizes obtained by the decoders for each layer are 2 × 2 × 512, 4 × 4 × 512, 8 × 8 × 512, 16 × 16 × 512, 32 × 32 × 512, 64 × 64 × 256, 128 × 128 × 128, 256 × 256 × 64, and 512 × 512 × 3, respectively.

The embodiment also comprises equipment, wherein the three-dimensional reconstruction system for a single object picture with any visual angle is adopted to realize the three-dimensional reconstruction of the picture, and the equipment is used for model display and other purposes.

The U-shaped generation type countermeasure network is used for processing the problem of large difference of reconstruction effects generated by object images with different visual angles, wherein the trained U-shaped generation type countermeasure network can predict an object fixed visual angle (first fixed visual angle) view of the object image with any visual angle so as to solve information interference caused by different object irradiation visual angles; the 3D conditional generative countermeasure network generates a corresponding 3D model using a fixed perspective (first fixed perspective) view generated by the U-shaped generative countermeasure network as a condition.

The above detailed description is for the purpose of describing the invention in detail, and it should not be construed that the detailed description is limited to the description, and it will be apparent to those skilled in the art that various modifications and substitutions can be made without departing from the spirit of the invention.

Claims

1. A method of three-dimensional reconstruction of an object, characterized by: the method comprises the following steps:

extracting high-dimensional features of a single picture at any angle by using a fixed view image, restoring the high-dimensional features to obtain a first fixed view image of the object according to the high-dimensional features, and generating a shape mask according to the first fixed view image so as to generate a 3D model of the object;

the first fixed view angle image of the object is obtained through a U-shaped generation type countermeasure network;

generating a shape mask according to the first fixed view image and further generating a 3D model of the object is realized through a 3D condition generating type countermeasure network;

the U-shaped generation type countermeasure network and the 3D condition generation type countermeasure network both comprise generators and discriminators; the generator G comprises an m-layer encoder and an m-layer decoder which are sequentially connected, wherein an image X is input at the input end of the encoder, and an image G (X) is output at the output end of the decoder; each encoder comprises a convolution layer, a Batch Norm layer and a ReLU layer which are sequentially connected, and each decoder comprises a deconvolution/up-sampling layer, a Batch Norm layer and a ReLU layer; the output end of the convolution layer of the nth layer is in jumping connection with the input end of the deconvolution layer of the (m-n) th layer, wherein m is the number of layers;

the step of obtaining the first fixed perspective image of the object further comprises training a generative confrontation network:

alternately training a generator and a discriminator in the generative confrontation network model according to a training data set, and adjusting the weight of each layer in the generator and the discriminator so as to obtain the generative confrontation network with stable performance;

the weights of each layer in the discriminator are adjusted according to the classification errors of the input/target image pair and the input/output image pair, and the specific formula is as follows:

V_CGAN(G,D)＝E_(X,Y)[log D(X,Y)]+E_X[log(1-D(X,G(X)))]

wherein D represents a discriminator; x represents an input image; y represents a target image; (X, Y) represents an input/target image pair; (X, g (X)) represents an input/output image pair; g (X) represents an output image of the generator; d (X, Y) and D (X, G (X)) are the discrimination results of the discriminator D on different image pairs;

during training of the generator, the weights of the respective layers in the generator G are adjusted based on the classification error, which is the result of discrimination by the discriminator, and the difference between the output image and the target image, which is calculated from the following equation:

wherein E is_(X,Y)The representation accumulates discrimination calculations from all feature/true image pairs (X, Y) of the sample and is written using the expected form of the probability distribution.

2. A method of three-dimensional reconstruction of an object according to claim 1, characterized by: generating a shape mask from the first fixed perspective image specifically includes: extracting the shape contour binary image features of the object according to the first fixed view image to further generate a shape mask in a 3D space, wherein a shape mask calculation formula specifically comprises the following steps:

P_valid＝P{v＝1|mask＝1}＝1

P_invalid＝P{v＝1|mask＝0}＝0

3. A method of three-dimensional reconstruction of an object according to claim 1, characterized by: the first fixed viewing angle comprises a side viewing angle, a top viewing angle and a front viewing angle.

4. A method of three-dimensional reconstruction of an object according to claim 1, characterized by: the discriminators in the 3D condition generating countermeasure network incorporate a shape mask that can be mapped with the first fixed perspective image.

5. A method of three-dimensional reconstruction of an object according to claim 1, characterized by: before the step of obtaining the first fixed-view image of the object by restoring, the method further comprises:

and carrying out random region clipping, inversion and color regularization on a single picture with any angle.

6. A storage medium having stored thereon computer instructions, characterized in that: the computer instructions when executed perform the steps of a method of three-dimensional reconstruction of an object as claimed in any one of claims 1 to 5.

7. A terminal comprising a memory and a processor, said memory having stored thereon computer instructions executable on said processor, wherein said processor, when executing said computer instructions, performs the steps of a method for three-dimensional reconstruction of an object according to any of claims 1-5.

8. A system for three-dimensional reconstruction of an object, characterized by: the system is applied based on the method of any one of claims 1-5, and comprises:

the U-shaped generation type countermeasure network is used for extracting high-dimensional characteristics of a single picture at any angle and restoring a first fixed view image of an object according to the high-dimensional characteristics;

a 3D conditional generative confrontation network for generating a 3D model of an object from the first fixed perspective image.