CN111192305A - Method and apparatus for generating three-dimensional image

Method and apparatus for generating three-dimensional image

Info

Publication number
CN111192305A
Authority
CN
China
Prior art keywords
image
target
depth
network model
training
Prior art date
Legal status
Granted
Application number
CN201811359444.9A
Other languages
Chinese (zh)
Other versions
CN111192305B (en)
Inventor
彭明浩
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201811359444.9A priority Critical patent/CN111192305B/en
Publication of CN111192305A publication Critical patent/CN111192305A/en
Application granted granted Critical
Publication of CN111192305B publication Critical patent/CN111192305B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/10Geometric effects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The embodiment of the application discloses a method and a device for generating a three-dimensional image. One embodiment of the method includes: obtaining a target color image; inputting the target color image into a pre-trained depth network model to obtain a target depth image corresponding to the target color image, wherein the depth network model is used for generating depth images; and generating a target three-dimensional image based on the target depth image. This implementation removes the dependence on a three-dimensional camera and reduces implementation cost; and because color images are very easy to acquire, the method is applicable to many different scenes and has wide applicability.

Description

Method and apparatus for generating three-dimensional image
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a method and a device for generating a three-dimensional image.
Background
A three-dimensional image is also called a 3D stereogram. A 3D stereogram has clear visual layers and vivid colors, delivers a strong visual impact, and lets viewers linger over the scene, leaving a deep impression. A 3D stereogram is realistic and vivid, evokes associations, gives a sense of being present in the scene, and has high artistic appreciation value.
Currently, 3D cameras are typically used to acquire 3D stereograms. A 3D camera is a camera whose stereoscopic still pictures or moving pictures can be viewed with the naked eye. The advent of the 3D camera means that people can enjoy stereoscopic images with the naked eye without wearing special glasses. A 3D camera is generally equipped with two lenses so that stereoscopic images can be reproduced, and 3D cameras are generally expensive.
Disclosure of Invention
The embodiment of the application provides a method and a device for generating a three-dimensional image.
In a first aspect, an embodiment of the present application provides a method for generating a three-dimensional image, including: acquiring a target color image; inputting the target color image into a depth network model trained in advance to obtain a target depth image corresponding to the target color image, wherein the depth network model is used for generating a depth image; and generating a target three-dimensional image based on the target depth image.
In some embodiments, the deep network model is trained by: acquiring a training sample set, wherein training samples in the training sample set comprise sample color images and corresponding sample depth images; and training the initial deep network model based on the training sample set by using a machine learning method to obtain the deep network model.
In some embodiments, the initial deep network model is a generative adversarial network, the generative adversarial network including a generative model and a discriminative model; and training the initial deep network model based on the training sample set by using a machine learning method to obtain the deep network model includes: for a training sample in the training sample set, inputting the sample color image in the training sample into the generative model to obtain a generated depth image; inputting the generated depth image and the sample depth image in the training sample into the discriminative model to obtain a discrimination result, wherein the discrimination result is used for representing the probability that the generated depth image and the sample depth image in the training sample are real depth images; and adjusting parameters of the generative model and the discriminative model based on the discrimination result.
In some embodiments, the training sample set is acquired using a Kinect device or an RGB-D camera.
In some embodiments, generating the target three-dimensional image based on the target depth image comprises: generating the target three-dimensional image by combining the target depth image and the target color image.
In a second aspect, an embodiment of the present application provides an apparatus for generating a three-dimensional image, including: an acquisition unit configured to acquire a target color image; the input unit is configured to input the target color image into a depth network model trained in advance to obtain a target depth image corresponding to the target color image, wherein the depth network model is used for generating a depth image; a generating unit configured to generate a target three-dimensional image based on the target depth image.
In some embodiments, the deep network model is trained by: acquiring a training sample set, wherein training samples in the training sample set comprise sample color images and corresponding sample depth images; and training the initial deep network model based on the training sample set by using a machine learning method to obtain the deep network model.
In some embodiments, the initial deep network model is a generative adversarial network, the generative adversarial network including a generative model and a discriminative model; and training the initial deep network model based on the training sample set by using a machine learning method to obtain the deep network model includes: for a training sample in the training sample set, inputting the sample color image in the training sample into the generative model to obtain a generated depth image; inputting the generated depth image and the sample depth image in the training sample into the discriminative model to obtain a discrimination result, wherein the discrimination result is used for representing the probability that the generated depth image and the sample depth image in the training sample are real depth images; and adjusting parameters of the generative model and the discriminative model based on the discrimination result.
In some embodiments, the training sample set is acquired using a Kinect device or an RGB-D camera.
In some embodiments, the generating unit is further configured to: generate the target three-dimensional image by combining the target depth image and the target color image.
In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; a storage device having one or more programs stored thereon; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method as described in any implementation of the first aspect.
In a fourth aspect, the present application provides a computer-readable medium, on which a computer program is stored, which, when executed by a processor, implements the method as described in any implementation manner of the first aspect.
According to the method and apparatus for generating a three-dimensional image provided by the embodiments of the present application, a target color image is first acquired, the target color image is then input into a pre-trained depth network model to obtain a target depth image corresponding to the target color image, and a target three-dimensional image is then generated based on the target depth image. This removes the dependence on a three-dimensional camera and reduces implementation cost; and because color images are very easy to acquire, the method and apparatus are applicable to many different scenes and have wide applicability.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture to which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for generating a three-dimensional image according to the present application;
FIG. 3 is a schematic diagram of an application scenario of the method for generating a three-dimensional image provided in FIG. 2;
FIG. 4 is a flow diagram of one embodiment of a method for training a deep network model according to the present application;
FIG. 5 is a schematic diagram of an embodiment of an apparatus for generating three-dimensional images according to the present application;
FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing an electronic device according to embodiments of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the method for generating a three-dimensional image or the apparatus for generating a three-dimensional image of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include photographing devices 101, 102, 103, a network 104, and a server 105. The network 104 is used to provide a medium of communication links between the photographing devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The photographing devices 101, 102, 103 may interact with the server 105 through the network 104 to receive or transmit messages and the like. The photographing devices 101, 102, 103 may be hardware or software. When the photographing devices 101, 102, 103 are hardware, they may be various electronic devices supporting a color image photographing function, including but not limited to a video camera, a still camera, a smart phone, and the like. When the photographing devices 101, 102, 103 are software, they may be installed in the above-described electronic devices, and may be implemented as multiple pieces of software or software modules, or as a single piece of software or software module. No specific limitation is imposed here.
The server 105 may provide various services, and for example, the server 105 may analyze and process data such as target color images acquired from the photographing apparatuses 101, 102, and 103 to generate a processing result (e.g., a target three-dimensional image).
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. No specific limitation is imposed here.
It should be noted that the method for generating a three-dimensional image provided in the embodiment of the present application is generally performed by the server 105, and accordingly, the apparatus for generating a three-dimensional image is generally disposed in the server 105.
It should be understood that the number of photographing devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of cameras, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for generating a three-dimensional image in accordance with the present application is shown. The method for generating a three-dimensional image comprises the following steps:
step 201, acquiring a target color image.
In the present embodiment, the execution subject (e.g., the server 105 shown in fig. 1) of the method for generating a three-dimensional image may acquire a target color image from a photographing device (e.g., the photographing devices 101, 102, 103 shown in fig. 1) through a wired or wireless connection. In general, a photographing device supports a color image photographing function and can photograph a target to obtain a color image of the target. The target may include, but is not limited to, a human, an animal, a plant, an item, and the like. The target color image is also called a target RGB image; the pixel value of each pixel point of the target color image may be the color value of the corresponding point on the photographed target surface. Generally, all colors perceivable by human vision are obtained by varying the three color channels red (R), green (G), and blue (B) and superimposing them on one another.
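For illustration only, the following is a minimal sketch of how the execution subject might decode a received target color image into an RGB array, assuming a Python implementation; the transport mechanism and the helper name decode_target_color_image are assumptions, not details specified by this application.

```python
# Minimal sketch: decoding received image bytes into an H x W x 3 RGB array.
# How the bytes arrive (HTTP upload, message queue, etc.) is not specified by
# the application; this helper and its name are illustrative only.
import io

import numpy as np
from PIL import Image


def decode_target_color_image(image_bytes: bytes) -> np.ndarray:
    """Decode uploaded bytes into an H x W x 3 uint8 RGB array."""
    image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
    return np.asarray(image)
```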
Step 202, inputting the target color image into a depth network model trained in advance to obtain a target depth image corresponding to the target color image.
In this embodiment, the executing entity may input the target color image into a depth network model trained in advance, so as to obtain a target depth image corresponding to the target color image. The pixel value of each pixel point of the target depth image may be the distance between the photographing device and the corresponding point on the photographed target surface. In general, the target color image and its corresponding target depth image are registered, so there is a one-to-one correspondence between their pixel points.
In this embodiment, the depth network model may be used to generate depth images, and it characterizes the correspondence between a color image and its depth image.
In some optional implementations of this embodiment, the depth network model may be a correspondence table obtained by a person skilled in the art through statistical analysis of a large number of sample color images and corresponding sample depth images, storing correspondences between a plurality of sample color images and their corresponding sample depth images. In this case, the execution subject may calculate the similarity between the target color image and each sample color image in the correspondence table, and obtain the target depth image corresponding to the target color image from the correspondence table based on the similarity results. For example, the execution subject may look up, from the correspondence table, the sample depth image corresponding to the sample color image with the highest similarity to the target color image and use it as the target depth image.
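For illustration only, the following minimal sketch shows one way such a correspondence-table lookup could be implemented in Python; the similarity measure (negative pixel-wise mean squared error after downscaling) is an assumption, since the application leaves the metric open.

```python
# Minimal sketch of the correspondence-table variant: return the sample depth
# image whose sample color image is most similar to the target color image.
# The similarity measure (negative MSE after downscaling) is an assumption.
import numpy as np
from PIL import Image


def _thumbnail(img: np.ndarray, size=(64, 64)) -> np.ndarray:
    return np.asarray(Image.fromarray(img).resize(size), dtype=np.float32)


def lookup_depth_image(target_color, correspondence_table):
    """correspondence_table: list of (sample_color, sample_depth) pairs,
    where each sample_color is a uint8 H x W x 3 array."""
    target = _thumbnail(target_color)
    best_depth, best_score = None, float("-inf")
    for sample_color, sample_depth in correspondence_table:
        score = -np.mean((target - _thumbnail(sample_color)) ** 2)  # higher = more similar
        if score > best_score:
            best_score, best_depth = score, sample_depth
    return best_depth
```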
In some optional implementations of the present embodiment, the depth network model may be obtained by performing supervised or unsupervised training on an existing machine learning model (e.g., various artificial neural networks) using various machine learning methods and training samples. In this case, the execution subject may input the target color image into the depth network model, which performs vector conversion, convolution calculation, and the like, and outputs the target depth image corresponding to the target color image.
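For illustration only, the following minimal sketch shows inference with a trained depth network model, assuming a PyTorch implementation; the small encoder-decoder architecture and the names DepthNet and predict_depth are assumptions, as the application does not fix a particular network structure.

```python
# Minimal sketch of a trained depth network model and its inference step,
# assuming PyTorch. The architecture is an illustrative stand-in only.
import torch
import torch.nn as nn


class DepthNet(nn.Module):
    """Fully convolutional network mapping an RGB image to a 1-channel depth map."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.ReLU(),  # depth >= 0
        )

    def forward(self, x):
        # Assumes H and W are divisible by 4 so the output matches the input size.
        return self.decoder(self.encoder(x))


def predict_depth(model: DepthNet, color_image: torch.Tensor) -> torch.Tensor:
    """color_image: 1 x 3 x H x W tensor in [0, 1]; returns a 1 x 1 x H x W depth map."""
    model.eval()
    with torch.no_grad():
        return model(color_image)
```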
Step 203, generating a target three-dimensional image based on the target depth image.
In this embodiment, the execution subject described above may generate a target three-dimensional image based on the target depth image. The target three-dimensional image is also called a target 3D image. A three-dimensional stereo image exploits the parallax between a person's two eyes and the principle of optical refraction so that, within a single plane, the viewer directly perceives a three-dimensional scene in which objects can appear to protrude out of the picture or recede into it.
In some optional implementations of the present embodiment, the executing subject may generate the target three-dimensional image by combining the target depth image and the target color image. Since the pixel value of each pixel point of the target depth image may be the distance between the photographing device and the corresponding point on the photographed target surface, the pixel value of each pixel point of the target color image may be the color value of that point, and the pixel points of the two images are in one-to-one correspondence, the target three-dimensional image can be generated by combining the two.
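For illustration only, the following minimal sketch shows one common way of combining a registered depth image and color image into a three-dimensional representation: back-projecting each pixel into a colored point cloud with a pinhole camera model. The camera intrinsics (fx, fy, cx, cy) and the point-cloud output format are assumptions; the application only states that the two images are combined pixel by pixel.

```python
# Minimal sketch: combine a registered depth image and color image into a
# colored point cloud via pinhole back-projection. Intrinsics are assumed known.
import numpy as np


def depth_color_to_point_cloud(depth: np.ndarray, color: np.ndarray,
                               fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """depth: H x W (metres), color: H x W x 3 (uint8). Returns N x 6 array (X, Y, Z, R, G, B)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    z = depth.astype(np.float32)
    x = (u - cx) * z / fx                            # back-projection to camera coordinates
    y = (v - cy) * z / fy
    valid = z > 0                                    # drop pixels with no depth reading
    points = np.stack([x[valid], y[valid], z[valid]], axis=1)
    colors = color[valid].astype(np.float32)
    return np.concatenate([points, colors], axis=1)
```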
With continued reference to fig. 3, fig. 3 is a schematic illustration of an application scenario of the method for generating a three-dimensional image provided in fig. 2. In the application scenario shown in fig. 3, when a three-dimensional image of a human body 301 is to be obtained, the human body 301 may first be photographed by a mobile phone 302 to obtain a color image 304 of the human body 301, which is transmitted to a server 303. Then, the server 303 inputs the color image 304 of the human body 301 into the trained depth network model 305 to obtain a depth image 306 corresponding to the color image 304 of the human body 301. Finally, the server 303 combines the depth image 306 and the color image 304 of the human body to generate a three-dimensional image 307 of the human body 301, and sends it to the mobile phone 302 for display.
According to the method for generating a three-dimensional image provided by this embodiment, a target color image is first acquired, the target color image is then input into a pre-trained depth network model to obtain a target depth image corresponding to the target color image, and a target three-dimensional image is generated based on the target depth image. This removes the dependence on a three-dimensional camera and reduces implementation cost; and because color images are very easy to acquire, the method is applicable to many different scenes and has wide applicability.
With further reference to FIG. 4, a flow 400 of one embodiment of a method for training a deep network model in accordance with the present application is illustrated. The method for training the deep network model comprises the following steps:
step 401, a training sample set is obtained.
In this embodiment, an executing agent (e.g., the server 105 shown in fig. 1) of the method for training the deep network model may obtain a set of training samples. Wherein each training sample in the set of training samples may comprise a sample color image and a corresponding sample depth image. Since the sample color image and the corresponding sample depth image are usually obtained by shooting the sample at the same time, the sample color image and the corresponding sample depth image are registered, and there is a one-to-one correspondence between pixel points.
In some optional implementations of the present embodiment, the training samples may be acquired using a Kinect device or an RGB-D camera. In general, a Kinect device has three "eyes": an infrared projector, an infrared camera, and a color camera. Apart from voice instructions and somatosensory (motion-sensing) operation instructions, the Kinect device has no other form of user input; the key of its input system is a sensor system consisting of the microphone and the cameras. The infrared projector actively projects a near-infrared spectrum; when this spectrum hits a rough object or passes through frosted glass, it is distorted and forms random reflection spots (speckles), which can then be read by the infrared camera. The infrared camera analyzes the infrared spectrum and creates depth images of human bodies and objects within the visible range. The color camera captures color images within the viewing angle. An RGB-D camera, also called a depth camera, can be used to capture RGB-D images. An RGB-D image includes a color image and a depth image; because the color image and the depth image are acquired at the same time, their pixel points are in one-to-one correspondence.
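For illustration only, the following minimal sketch assembles a training sample set from registered color/depth image pairs captured by a Kinect device or RGB-D camera, assuming the capture software has already written the pairs as color_<id>.png and depth_<id>.png files; the file layout and the millimetre-to-metre conversion are assumptions.

```python
# Minimal sketch: build (sample color image, sample depth image) training pairs
# from registered PNG files written by an RGB-D capture tool. The file naming
# scheme and the millimetre depth encoding are illustrative assumptions.
from pathlib import Path

import numpy as np
from PIL import Image


def load_training_samples(capture_dir):
    """Return a list of (H x W x 3 uint8 color, H x W float32 depth-in-metres) pairs."""
    samples = []
    for color_path in sorted(Path(capture_dir).glob("color_*.png")):
        depth_path = color_path.with_name(color_path.name.replace("color_", "depth_"))
        color = np.asarray(Image.open(color_path).convert("RGB"))
        depth = np.asarray(Image.open(depth_path), dtype=np.float32) / 1000.0  # mm -> m (assumed)
        samples.append((color, depth))
    return samples
```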
Step 402, training the initial deep network model based on the training sample set by using a machine learning method to obtain the deep network model.
In this embodiment, the executing entity may use a machine learning method to train the initial deep network model based on the training sample set, so as to obtain the deep network model. The initial deep network model may be an untrained deep network model or an incompletely trained deep network model. For an untrained deep network model, its parameters (e.g., weight parameters and bias parameters) are initialized with different small random numbers. Using small random numbers ensures that the model does not enter a saturated state because the weights are too large, which would cause training to fail; using different values ensures that the model can learn normally. For an incompletely trained deep network model, its parameters may already have been adjusted, but the output quality of the deep network model does not yet meet a preset constraint condition.
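For illustration only, the following minimal sketch initializes an untrained PyTorch model with small, distinct random numbers as described above; the standard deviation of 0.02 is an illustrative choice, not a value fixed by this application.

```python
# Minimal sketch: initialize every learnable layer with small, distinct random
# numbers so training neither saturates (weights too large) nor collapses
# (identical units). The std of 0.02 is an illustrative assumption.
import torch.nn as nn


def init_small_random(module: nn.Module) -> None:
    if isinstance(module, (nn.Conv2d, nn.ConvTranspose2d, nn.Linear)):
        nn.init.normal_(module.weight, mean=0.0, std=0.02)
        if module.bias is not None:
            nn.init.normal_(module.bias, mean=0.0, std=0.02)


# Usage: model.apply(init_small_random)
```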
In some optional implementations of the present embodiment, the deep network model may be obtained by supervised training of an existing machine learning model using various machine learning methods and training samples. In this case, the executing entity may train the initial depth network model by using the sample color images as input and the corresponding sample depth images as the expected output, so as to obtain the depth network model.
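For illustration only, the following minimal sketch shows the supervised variant, reusing the hypothetical PyTorch DepthNet from the earlier inference sketch: the sample color image is the input, the sample depth image is the expected output, and an L1 regression loss (an illustrative choice) drives the parameter updates.

```python
# Minimal sketch of supervised training: sample color image in, sample depth
# image as target, L1 (mean absolute error) regression loss. Model and data
# shapes follow the hypothetical DepthNet sketch above.
import torch
import torch.nn.functional as F


def train_supervised(model, sample_pairs, epochs=10, lr=1e-4):
    """sample_pairs: iterable of (color 1x3xHxW, depth 1x1xHxW) float tensors."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for color, depth in sample_pairs:
            optimizer.zero_grad()
            loss = F.l1_loss(model(color), depth)  # pixel-wise regression to the sample depth
            loss.backward()
            optimizer.step()
    return model
```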
In some optional implementations of the present embodiment, the deep network model may be obtained by unsupervised training of an existing machine learning model using various machine learning methods and training samples. Typically, the initial deep network model may be a Generative Adversarial Network (GAN). A GAN may include a generative model (Generative Model) and a discriminative model (Discriminative Model). The generative model mainly learns the distribution of real images, so that the images it generates look more realistic and can fool the discriminative model; the discriminative model needs to judge whether a received image is real or generated. Throughout the process, the generative model strives to make its generated images more realistic, while the discriminative model strives to tell real images from generated ones; this is equivalent to a two-player game. As training proceeds, the generative model and the discriminative model keep contending with each other until the two networks reach a dynamic equilibrium: the images produced by the generative model are close to the distribution of real images, and the discriminative model can no longer distinguish real images from generated ones. At this time, for each training sample in the training sample set, the executing subject may first input the sample color image in the training sample into the generative model to obtain a generated depth image; then input the generated depth image and the sample depth image in the training sample into the discriminative model to obtain a discrimination result; and finally adjust parameters of the generative model and the discriminative model based on the discrimination result. The discrimination result may be used to characterize the probability that the generated depth image and the sample depth image in the training sample are real depth images.
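For illustration only, the following minimal sketch shows one adversarial update of this kind, assuming PyTorch and a discriminator that ends in a sigmoid; the binary cross-entropy objective and the alternating update order are assumptions, not details fixed by this application.

```python
# Minimal sketch of one adversarial update. The generator maps a sample color
# image to a generated depth image; the discriminator (assumed to end in a
# sigmoid, outputting a probability of "real") scores both the generated depth
# image and the real sample depth image, and both models are updated from the
# discrimination result. BCE loss is an illustrative choice.
import torch
import torch.nn.functional as F


def gan_step(generator, discriminator, g_opt, d_opt, color, real_depth):
    """color: 1x3xHxW tensor, real_depth: 1x1xHxW tensor."""
    # 1. Update the discriminative model: real sample depth -> 1, generated depth -> 0.
    fake_depth = generator(color).detach()
    real_pred = discriminator(real_depth)
    fake_pred = discriminator(fake_depth)
    d_loss = (F.binary_cross_entropy(real_pred, torch.ones_like(real_pred)) +
              F.binary_cross_entropy(fake_pred, torch.zeros_like(fake_pred)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # 2. Update the generative model: push the discriminator to score the
    #    generated depth image as real.
    fake_pred = discriminator(generator(color))
    g_loss = F.binary_cross_entropy(fake_pred, torch.ones_like(fake_pred))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```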
With further reference to fig. 5, as an implementation of the method shown in the above figures, the present application provides an embodiment of an apparatus for generating a three-dimensional image, which corresponds to the embodiment of the method shown in fig. 2, and which is particularly applicable in various electronic devices.
As shown in fig. 5, the apparatus 500 for generating a three-dimensional image of the present embodiment may include: an acquisition unit 501, an input unit 502, and a generation unit 503. Wherein, the acquiring unit 501 is configured to acquire a target color image; an input unit 502 configured to input the target color image into a depth network model trained in advance, to obtain a target depth image corresponding to the target color image, where the depth network model is used to generate a depth image; a generating unit 503 configured to generate a target three-dimensional image based on the target depth image.
In the present embodiment, in the apparatus 500 for generating a three-dimensional image: the specific processing of the obtaining unit 501, the input unit 502 and the generating unit 503 and the technical effects thereof can refer to the related descriptions of step 201, step 202 and step 203 in the corresponding embodiment of fig. 2, which are not repeated herein.
In some optional implementations of this embodiment, the deep network model is trained by the following steps: acquiring a training sample set, wherein training samples in the training sample set comprise sample color images and corresponding sample depth images; and training the initial deep network model based on the training sample set by using a machine learning method to obtain the deep network model.
In some optional implementations of this embodiment, the initial deep network model is a generative adversarial network, and the generative adversarial network includes a generative model and a discriminative model; training the initial deep network model based on the training sample set by using a machine learning method to obtain the deep network model includes: for a training sample in the training sample set, inputting the sample color image in the training sample into the generative model to obtain a generated depth image; inputting the generated depth image and the sample depth image in the training sample into the discriminative model to obtain a discrimination result, wherein the discrimination result is used for representing the probability that the generated depth image and the sample depth image in the training sample are real depth images; and adjusting parameters of the generative model and the discriminative model based on the discrimination result.
In some optional implementations of the present embodiment, the set of training samples is acquired using a Kinect device or an RGB-D camera.
In some optional implementations of this embodiment, the generating unit 503 is further configured to: and generating a target three-dimensional image by combining the target depth image and the target color image.
Referring now to FIG. 6, a block diagram of a computer system 600 suitable for use in implementing an electronic device (e.g., server 105 of FIG. 1) of an embodiment of the present application is shown. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU)601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. The driver 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that a computer program read out therefrom is mounted in the storage section 608 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The computer program, when executed by the Central Processing Unit (CPU) 601, performs the above-described functions defined in the method of the present application. It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes an acquisition unit, an input unit, and a generation unit. The names of these units do not in some cases constitute a limitation on the unit itself, and for example, the acquisition unit may also be described as a "unit that acquires a target color image".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring a target color image; inputting the target color image into a depth network model trained in advance to obtain a target depth image corresponding to the target color image, wherein the depth network model is used for generating a depth image; and generating a target three-dimensional image based on the target depth image.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (12)

1. A method for generating a three-dimensional image, comprising:
acquiring a target color image;
inputting the target color image into a depth network model trained in advance to obtain a target depth image corresponding to the target color image, wherein the depth network model is used for generating a depth image;
generating a target three-dimensional image based on the target depth image.
2. The method of claim 1, wherein the deep network model is trained by:
acquiring a training sample set, wherein training samples in the training sample set comprise sample color images and corresponding sample depth images;
and training an initial deep network model based on the training sample set by using a machine learning method to obtain the deep network model.
3. The method of claim 2, wherein the initial deep network model is a generative adversarial network comprising a generative model and a discriminative model; and
the training an initial deep network model based on the training sample set by using a machine learning method to obtain the deep network model comprises:
for a training sample in the training sample set, inputting the sample color image in the training sample into the generative model to obtain a generated depth image;
inputting the generated depth image and the sample depth image in the training sample into the discriminative model to obtain a discrimination result, wherein the discrimination result is used for representing the probability that the generated depth image and the sample depth image in the training sample are real depth images;
and adjusting parameters of the generative model and the discriminative model based on the discrimination result.
4. A method as claimed in claim 2 or 3, wherein the set of training samples is acquired using a Kinect device or an RGB-D camera.
5. The method of one of claims 1 to 3, wherein the generating a target three-dimensional image based on the target depth image comprises:
and generating the target three-dimensional image by combining the target depth image and the target color image.
6. An apparatus for generating a three-dimensional image, comprising:
an acquisition unit configured to acquire a target color image;
the input unit is configured to input the target color image into a depth network model trained in advance, and a target depth image corresponding to the target color image is obtained, wherein the depth network model is used for generating a depth image;
a generating unit configured to generate a target three-dimensional image based on the target depth image.
7. The apparatus of claim 6, wherein the deep network model is trained by:
acquiring a training sample set, wherein training samples in the training sample set comprise sample color images and corresponding sample depth images;
and training an initial deep network model based on the training sample set by using a machine learning method to obtain the deep network model.
8. The apparatus of claim 7, wherein the initial deep network model is a generative adversarial network comprising a generative model and a discriminative model; and
the training an initial deep network model based on the training sample set by using a machine learning method to obtain the deep network model comprises:
for a training sample in the training sample set, inputting the sample color image in the training sample into the generative model to obtain a generated depth image;
inputting the generated depth image and the sample depth image in the training sample into the discriminative model to obtain a discrimination result, wherein the discrimination result is used for representing the probability that the generated depth image and the sample depth image in the training sample are real depth images;
and adjusting parameters of the generative model and the discriminative model based on the discrimination result.
9. The apparatus of claim 7 or 8, wherein the set of training samples is acquired with a Kinect device or an RGB-D camera.
10. The apparatus according to one of claims 6-8, wherein the generating unit is further configured to:
and generating the target three-dimensional image by combining the target depth image and the target color image.
11. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.
12. A computer-readable medium, on which a computer program is stored, wherein the computer program, when being executed by a processor, carries out the method according to any one of claims 1-5.
CN201811359444.9A 2018-11-15 2018-11-15 Method and apparatus for generating three-dimensional image Active CN111192305B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811359444.9A CN111192305B (en) 2018-11-15 2018-11-15 Method and apparatus for generating three-dimensional image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811359444.9A CN111192305B (en) 2018-11-15 2018-11-15 Method and apparatus for generating three-dimensional image

Publications (2)

Publication Number Publication Date
CN111192305A true CN111192305A (en) 2020-05-22
CN111192305B CN111192305B (en) 2023-11-21

Family

ID=70709312

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811359444.9A Active CN111192305B (en) 2018-11-15 2018-11-15 Method and apparatus for generating three-dimensional image

Country Status (1)

Country Link
CN (1) CN111192305B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112801907A (en) * 2021-02-03 2021-05-14 北京字节跳动网络技术有限公司 Depth image processing method, device, equipment and storage medium
WO2023051237A1 (en) * 2021-09-29 2023-04-06 华为技术有限公司 Depth estimation method, method for training depth estimation model, and apparatus and system thereof

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080247670A1 (en) * 2007-04-03 2008-10-09 Wa James Tam Generation of a depth map from a monoscopic color image for rendering stereoscopic still and video images
US20130259315A1 (en) * 2010-12-08 2013-10-03 Industrial Technology Research Institute Methods for generating stereoscopic views from monoscopic endoscope images and systems using the same
US20150248765A1 (en) * 2014-02-28 2015-09-03 Microsoft Corporation Depth sensing using an rgb camera
CN108171212A (en) * 2018-01-19 2018-06-15 百度在线网络技术(北京)有限公司 For detecting the method and apparatus of target
US20180234671A1 (en) * 2017-02-15 2018-08-16 Adobe Systems Incorporated Generating novel views of a three-dimensional object based on a single two-dimensional image
CN108492364A (en) * 2018-03-27 2018-09-04 百度在线网络技术(北京)有限公司 The method and apparatus for generating model for generating image
CN108510454A (en) * 2018-03-21 2018-09-07 百度在线网络技术(北京)有限公司 Method and apparatus for generating depth image
CN108564611A (en) * 2018-03-09 2018-09-21 天津大学 A kind of monocular image depth estimation method generating confrontation network based on condition
CN108805978A (en) * 2018-06-12 2018-11-13 江西师范大学 A kind of automatically generating device and method based on deep learning threedimensional model

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080247670A1 (en) * 2007-04-03 2008-10-09 Wa James Tam Generation of a depth map from a monoscopic color image for rendering stereoscopic still and video images
US20130259315A1 (en) * 2010-12-08 2013-10-03 Industrial Technology Research Institute Methods for generating stereoscopic views from monoscopic endoscope images and systems using the same
US20150248765A1 (en) * 2014-02-28 2015-09-03 Microsoft Corporation Depth sensing using an rgb camera
US20180234671A1 (en) * 2017-02-15 2018-08-16 Adobe Systems Incorporated Generating novel views of a three-dimensional object based on a single two-dimensional image
CN108171212A (en) * 2018-01-19 2018-06-15 百度在线网络技术(北京)有限公司 For detecting the method and apparatus of target
CN108564611A (en) * 2018-03-09 2018-09-21 天津大学 A kind of monocular image depth estimation method generating confrontation network based on condition
CN108510454A (en) * 2018-03-21 2018-09-07 百度在线网络技术(北京)有限公司 Method and apparatus for generating depth image
CN108492364A (en) * 2018-03-27 2018-09-04 百度在线网络技术(北京)有限公司 The method and apparatus for generating model for generating image
CN108805978A (en) * 2018-06-12 2018-11-13 江西师范大学 A kind of automatically generating device and method based on deep learning threedimensional model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
梁礼明: "《优化方法导论》" (Introduction to Optimization Methods), pages 194-199 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112801907A (en) * 2021-02-03 2021-05-14 北京字节跳动网络技术有限公司 Depth image processing method, device, equipment and storage medium
CN112801907B (en) * 2021-02-03 2024-04-16 北京字节跳动网络技术有限公司 Depth image processing method, device, equipment and storage medium
WO2023051237A1 (en) * 2021-09-29 2023-04-06 华为技术有限公司 Depth estimation method, method for training depth estimation model, and apparatus and system thereof

Also Published As

Publication number Publication date
CN111192305B (en) 2023-11-21

Similar Documents

Publication Publication Date Title
CN109191514B (en) Method and apparatus for generating a depth detection model
CN107578017B (en) Method and apparatus for generating image
CN106910247B (en) Method and apparatus for generating three-dimensional avatar model
CN108492364B (en) Method and apparatus for generating image generation model
CN105704479B (en) The method and system and display equipment of the measurement human eye interpupillary distance of 3D display system
CN108776786A (en) Method and apparatus for generating user's truth identification model
CN107507153B (en) Image denoising method and device
CN109997175B (en) Determining the size of a virtual object
CN112989904A (en) Method for generating style image, method, device, equipment and medium for training model
CN107798932A (en) A kind of early education training system based on AR technologies
CN111654746B (en) Video frame insertion method and device, electronic equipment and storage medium
CN108363995A (en) Method and apparatus for generating data
CN111985281B (en) Image generation model generation method and device and image generation method and device
EP3547672A1 (en) Data processing method, device, and apparatus
CN108074241B (en) Quality scoring method and device for target image, terminal and storage medium
CN108388889B (en) Method and device for analyzing face image
CN108520510B (en) No-reference stereo image quality evaluation method based on overall and local analysis
US10955911B2 (en) Gazed virtual object identification module, a system for implementing gaze translucency, and a related method
CN110458924B (en) Three-dimensional face model establishing method and device and electronic equipment
CN112562056A (en) Control method, device, medium and equipment for virtual light in virtual studio
CN111340865B (en) Method and apparatus for generating image
CN111192305B (en) Method and apparatus for generating three-dimensional image
CN113989717A (en) Video image processing method and device, electronic equipment and storage medium
CN111754622B (en) Face three-dimensional image generation method and related equipment
CN109816791B (en) Method and apparatus for generating information

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant