CN111192305B - Method and apparatus for generating three-dimensional image

Method and apparatus for generating three-dimensional image

Info

Publication number
CN111192305B
CN111192305B (application CN201811359444.9A)
Authority
CN
China
Prior art keywords
depth
image
target
training
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811359444.9A
Other languages
Chinese (zh)
Other versions
CN111192305A (en)
Inventor
彭明浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201811359444.9A priority Critical patent/CN111192305B/en
Publication of CN111192305A publication Critical patent/CN111192305A/en
Application granted granted Critical
Publication of CN111192305B publication Critical patent/CN111192305B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/10 Geometric effects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

The embodiment of the application discloses a method and a device for generating a three-dimensional image. One embodiment of the method includes: acquiring a target color image; inputting the target color image into a pre-trained depth network model to obtain a target depth image corresponding to the target color image, wherein the depth network model is used for generating depth images; and generating a target three-dimensional image based on the target depth image. This implementation removes the dependence on a three-dimensional camera and reduces the implementation cost, and because color images are very convenient to acquire, it is applicable to many different scenes and has wide applicability.

Description

Method and apparatus for generating three-dimensional image
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a method and a device for generating a three-dimensional image.
Background
Three-dimensional images are also called 3D stereograms or 3D perspective views. A 3D stereogram is visually striking and vivid in color, has strong visual impact, holds the viewer's attention for a long time, and leaves a deep impression. It gives people a realistic, lifelike feeling, as if they were personally on the scene, and has high artistic appreciation value.
Currently, 3D cameras are commonly employed to acquire 3D stereograms. A 3D camera is a camera whose stereoscopic images or videos can be viewed with the naked eye. The advent of 3D cameras means that people can enjoy stereoscopic images with the naked eye, without wearing professional glasses. A 3D camera is generally equipped with two lenses so that stereoscopic images can be reproduced, and its price is generally high.
Disclosure of Invention
The embodiment of the application provides a method and a device for generating a three-dimensional image.
In a first aspect, an embodiment of the present application provides a method for generating a three-dimensional image, including: acquiring a target color image; inputting the target color image into a pre-trained depth network model to obtain a target depth image corresponding to the target color image, wherein the depth network model is used for generating depth images; and generating a target three-dimensional image based on the target depth image.
In some embodiments, the depth network model is trained by: acquiring a training sample set, wherein training samples in the training sample set comprise sample color images and corresponding sample depth images; and training an initial depth network model based on the training sample set by using a machine learning method to obtain the depth network model.
In some embodiments, the initial depth network model is a generative adversarial network, the generative adversarial network comprising a generation model and a discrimination model; training the initial depth network model based on the training sample set by using a machine learning method to obtain the depth network model comprises the following steps: for a training sample in the training sample set, inputting the sample color image in the training sample into the generation model to obtain a generated depth image; inputting the generated depth image and the sample depth image in the training sample into the discrimination model to obtain a discrimination result, wherein the discrimination result is used for representing the probability that the generated depth image and the sample depth image in the training sample are real depth images; and adjusting parameters of the generation model and the discrimination model based on the discrimination result.
In some embodiments, the training sample set is acquired using a Kinect apparatus or an RGB-D camera.
In some embodiments, generating the target three-dimensional image based on the target depth image includes: generating the target three-dimensional image by combining the target depth image and the target color image.
In a second aspect, an embodiment of the present application provides an apparatus for generating a three-dimensional image, including: an acquisition unit configured to acquire a target color image; an input unit configured to input the target color image into a pre-trained depth network model to obtain a target depth image corresponding to the target color image, wherein the depth network model is used for generating depth images; and a generation unit configured to generate a target three-dimensional image based on the target depth image.
In some embodiments, the depth network model is trained by: acquiring a training sample set, wherein training samples in the training sample set comprise sample color images and corresponding sample depth images; and training an initial depth network model based on the training sample set by using a machine learning method to obtain the depth network model.
In some embodiments, the initial depth network model is a generative adversarial network, the generative adversarial network comprising a generation model and a discrimination model; training the initial depth network model based on the training sample set by using a machine learning method to obtain the depth network model comprises the following steps: for a training sample in the training sample set, inputting the sample color image in the training sample into the generation model to obtain a generated depth image; inputting the generated depth image and the sample depth image in the training sample into the discrimination model to obtain a discrimination result, wherein the discrimination result is used for representing the probability that the generated depth image and the sample depth image in the training sample are real depth images; and adjusting parameters of the generation model and the discrimination model based on the discrimination result.
In some embodiments, the training sample set is acquired using a Kinect apparatus or an RGB-D camera.
In some embodiments, the generating unit is further configured to generate the target three-dimensional image by combining the target depth image and the target color image.
In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; a storage device having one or more programs stored thereon; the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as described in any of the implementations of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer readable medium having stored thereon a computer program which, when executed by a processor, implements a method as described in any of the implementations of the first aspect.
According to the method and the device for generating a three-dimensional image provided by the embodiments of the application, a target color image is first acquired, the target color image is then input into a pre-trained depth network model to obtain a target depth image corresponding to the target color image, and a target three-dimensional image is further generated based on the target depth image. This removes the dependence on a three-dimensional camera and reduces the implementation cost, and because color images are very convenient to acquire, the method is applicable to many different scenes and has wide applicability.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture in which the present application may be applied;
FIG. 2 is a flow chart of one embodiment of a method for generating a three-dimensional image according to the present application;
FIG. 3 is a schematic illustration of an application scenario of the method for generating a three-dimensional image provided in FIG. 2;
FIG. 4 is a flow chart of one embodiment of a method for training a depth network model in accordance with the present application;
FIG. 5 is a schematic structural view of one embodiment of an apparatus for generating three-dimensional images according to the present application;
FIG. 6 is a schematic diagram of a computer system suitable for implementing an embodiment of the application.
Detailed Description
The application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be noted that, for convenience of description, only the portions related to the present application are shown in the drawings.
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
Fig. 1 shows an exemplary system architecture 100 to which an embodiment of the method for generating a three-dimensional image or the apparatus for generating a three-dimensional image of the present application may be applied.
As shown in fig. 1, a photographing apparatus 101, 102, 103, a network 104, and a server 105 may be included in a system architecture 100. The network 104 is a medium for providing a communication link between the photographing apparatuses 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The photographing devices 101, 102, 103 may interact with the server 105 through the network 104 to receive or transmit messages and the like. The photographing devices 101, 102, 103 may be hardware or software. When they are hardware, they may be various electronic devices supporting a color image photographing function, including, but not limited to, video cameras, still cameras, smart phones, and the like. When they are software, they may be installed in the above-described electronic devices, and may be implemented as a plurality of pieces of software or software modules, or as a single piece of software or software module. The present application is not specifically limited herein.
The server 105 may provide various services, for example, the server 105 may perform processing such as analysis on data of a target color image or the like acquired from the photographing devices 101, 102, 103, and generate a processing result (e.g., a target three-dimensional image).
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster formed by a plurality of servers, or as a single server. When the server 105 is software, it may be implemented as a plurality of pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. The present application is not specifically limited herein.
It should be noted that, the method for generating a three-dimensional image according to the embodiment of the present application is generally performed by the server 105, and accordingly, the apparatus for generating a three-dimensional image is generally disposed in the server 105.
It should be understood that the number of photographing apparatuses, networks, and servers in fig. 1 is merely illustrative. There may be any number of photographing devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for generating a three-dimensional image in accordance with the present application is shown. The method for generating a three-dimensional image comprises the steps of:
Step 201, acquiring a target color image.
In the present embodiment, the execution subject of the method for generating a three-dimensional image (e.g., the server 105 shown in fig. 1) may acquire a target color image from a photographing apparatus (e.g., the photographing apparatuses 101, 102, 103 shown in fig. 1) through a wired connection or a wireless connection. In general, a photographing apparatus supports a color image photographing function and may photograph a target to obtain a target color image. The target may include, but is not limited to, a human, an animal, a plant, an item, and the like. In the target color image, also called a target RGB image, the pixel value of each pixel point may be the color value of the corresponding point on the photographed target surface. In general, all colors perceived by human vision are obtained by varying the three color channels of red (R), green (G), and blue (B) and superposing them on each other.
Step 202, inputting the target color image into a pre-trained depth network model to obtain a target depth image corresponding to the target color image.
In this embodiment, the execution subject may input the target color image into a pre-trained depth network model to obtain a target depth image corresponding to the target color image. In the target depth image, also called a target Depth map, the pixel value of each pixel point may be the distance between the photographing device and the corresponding point on the photographed target surface. Typically, the target color image and its corresponding target depth image are registered, so there is a one-to-one correspondence between the pixel points of the target color image and those of the target depth image.
In this embodiment, the depth network model may be used to generate a depth image, characterizing the correspondence between the target color image and the target depth image.
In some alternative implementations of the present embodiment, the depth network model may be a correspondence table obtained by a person skilled in the art through statistical analysis of a large number of sample color images and corresponding sample depth images, in which the correspondences between a plurality of sample color images and corresponding sample depth images are stored. In this case, the execution subject may calculate the similarity between the target color image and each sample color image in the correspondence table, and obtain the target depth image corresponding to the target color image from the correspondence table based on the similarity results. For example, the execution subject may take, from the correspondence table, the sample depth image corresponding to the sample color image having the highest similarity to the target color image as the target depth image.
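As an illustration of this table-lookup implementation, the following sketch returns the sample depth image whose sample color image is most similar to the target color image. The cosine similarity over raw pixels and the list-of-pairs table format are assumptions made only for this example; the embodiment does not prescribe a particular similarity measure or table layout.

```python
import numpy as np

def lookup_depth_by_similarity(target_rgb, correspondence_table):
    """Return the sample depth image whose sample color image is most
    similar to target_rgb.

    correspondence_table: list of (sample_rgb, sample_depth) pairs,
    where every sample_rgb has the same shape as target_rgb.
    Cosine similarity over flattened pixels is used purely as an
    illustrative metric.
    """
    t = target_rgb.astype(np.float64).ravel()
    t /= (np.linalg.norm(t) + 1e-12)
    best_depth, best_sim = None, -np.inf
    for sample_rgb, sample_depth in correspondence_table:
        s = sample_rgb.astype(np.float64).ravel()
        s /= (np.linalg.norm(s) + 1e-12)
        sim = float(t @ s)                 # cosine similarity in [-1, 1]
        if sim > best_sim:
            best_sim, best_depth = sim, sample_depth
    return best_depth
```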
In some alternative implementations of the present embodiment, the depth network model may be obtained by supervised or unsupervised training of an existing machine learning model (e.g., various artificial neural networks) using various machine learning methods and training samples. In this case, the execution subject may input the target color image into the depth network model and, through vector conversion, convolution calculation, and the like, output the target depth image corresponding to the target color image.
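The following sketch illustrates this inference step in Python. The encoder-decoder architecture, the input size, and the commented-out checkpoint path are assumptions made for the example only; the embodiment does not fix a particular network structure.

```python
import torch
import torch.nn as nn

class DepthNet(nn.Module):
    """Minimal encoder-decoder used only to illustrate the inference step;
    the embodiment does not prescribe a specific architecture."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, rgb):                      # rgb: (N, 3, H, W) in [0, 1]
        return self.decoder(self.encoder(rgb))   # depth: (N, 1, H, W)

model = DepthNet()
# In practice the pre-trained weights would be loaded here, e.g.:
# model.load_state_dict(torch.load("depth_net.pt"))  # hypothetical checkpoint path
model.eval()
target_rgb_tensor = torch.rand(1, 3, 240, 320)        # stand-in for the acquired color image
with torch.no_grad():
    target_depth = model(target_rgb_tensor)           # (1, 1, 240, 320)
```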
Step 203, generating a target three-dimensional image based on the target depth image.
In this embodiment, the execution subject may generate the target three-dimensional image based on the target depth image. The target three-dimensional image, also called a target 3D image, lets people see a three-dimensional stereogram directly in a plane by exploiting the visual difference between the two eyes and the principle of optical refraction; things in the picture can appear to project out of the picture or to be hidden deep inside it.
In some optional implementations of this embodiment, the execution subject may generate the target three-dimensional image by combining the target depth image and the target color image. Since the pixel value of each pixel point of the target depth image may be the distance between the photographing apparatus and the corresponding point on the photographed target surface, the pixel value of each pixel point of the target color image may be the color value of that point, and the pixel points of the two images are in one-to-one correspondence, the target three-dimensional image may be generated by combining the two.
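One common way to combine a registered depth image and color image into a three-dimensional representation is to back-project each pixel into a colored 3-D point using pinhole-camera intrinsics. The sketch below assumes such intrinsics (fx, fy, cx, cy) are known, which the embodiment does not state; it is only one possible realization of the combination step.

```python
import numpy as np

def rgbd_to_point_cloud(depth, rgb, fx, fy, cx, cy):
    """Back-project a registered depth/color pair into a colored 3-D point set.

    depth: (H, W) distances; rgb: (H, W, 3) colors; fx, fy, cx, cy are
    pinhole-camera intrinsics, assumed known for this illustration.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    colors = rgb.reshape(-1, 3)
    valid = points[:, 2] > 0                 # drop pixels with no depth reading
    return points[valid], colors[valid]
```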
With continued reference to fig. 3, fig. 3 is a schematic illustration of one application scenario of the method for generating a three-dimensional image provided in fig. 2. In the application scenario shown in fig. 3, if a three-dimensional image of the human body 301 is desired, the human body 301 may first be photographed by the mobile phone 302 to obtain a color image 304 of the human body 301, which is sent to the server 303. Then, the server 303 inputs the color image 304 of the human body 301 into the pre-trained depth network model 305 and obtains a depth image 306 corresponding to the color image 304 of the human body 301. Finally, the server 303 generates a three-dimensional image 307 of the human body 301 by combining the depth image 306 of the human body and the color image 304 of the human body, and sends the three-dimensional image 307 to the mobile phone 302 for display.
According to the method for generating a three-dimensional image provided by the above embodiment of the application, a target color image is first acquired, the target color image is then input into a pre-trained depth network model to obtain a target depth image corresponding to the target color image, and a target three-dimensional image is generated based on the target depth image. This removes the dependence on a three-dimensional camera and reduces the implementation cost, and because color images are very convenient to acquire, the method is applicable to many different scenes and has wide applicability.
With further reference to fig. 4, a flow 400 of one embodiment of a method for training a depth network model in accordance with the present application is shown. The method for training the depth network model comprises the following steps:
step 401, a training sample set is obtained.
In this embodiment, the execution body of the method for training a depth network model (e.g., the server 105 shown in fig. 1) may obtain a training sample set. Each training sample in the training sample set may include a sample color image and a corresponding sample depth image. Since the sample color image and the corresponding sample depth image are usually obtained by photographing the same sample at the same time, they are registered, and there is a one-to-one correspondence between their pixel points.
In some alternative implementations of the present embodiment, the training samples may be acquired using a Kinect device or an RGB-D camera. Typically, a Kinect device has three "eyes", from left to right: an infrared projector, a color camera, and an infrared camera. It also hides four "ears", namely an L-shaped microphone array. Apart from voice commands and somatosensory (motion-sensing) commands, the Kinect device has no other form of user input; the key to the input system is the sensor system consisting of the microphones and cameras. The infrared projector actively projects a near-infrared spectrum; when it illuminates a rough object or passes through frosted glass, the spectrum is distorted and forms random reflection spots, called speckles, which can then be read by the infrared camera. The infrared camera is used for analyzing the infrared spectrum and creating depth images of human bodies and objects within the visible range. The color camera is used for capturing color images within the viewing angle. An RGB-D camera, also known as a depth camera, may be used to capture RGB-D images. An RGB-D image may include a color image and a depth image. Since the color image and the depth image are acquired simultaneously, there is a one-to-one correspondence between their pixel points.
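A minimal sketch of assembling such training samples from registered RGB-D captures is shown below. The directory layout and file format (paired PNG files with matching names, as a Kinect or RGB-D capture pipeline might export) are assumptions for the example, not part of the embodiment.

```python
import glob
import os
import cv2  # OpenCV, used here only to read image files

def load_rgbd_training_pairs(color_dir, depth_dir):
    """Collect (sample color image, sample depth image) training pairs.

    Assumes a hypothetical layout in which color_dir/0001.png pairs with
    depth_dir/0001.png and the two images are already registered.
    """
    samples = []
    for color_path in sorted(glob.glob(os.path.join(color_dir, "*.png"))):
        depth_path = os.path.join(depth_dir, os.path.basename(color_path))
        color = cv2.imread(color_path, cv2.IMREAD_COLOR)        # (H, W, 3), BGR
        depth = cv2.imread(depth_path, cv2.IMREAD_UNCHANGED)    # (H, W), e.g. 16-bit depth
        if color is not None and depth is not None:
            samples.append((color, depth))
    return samples
```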
Step 402, training the initial depth network model based on the training sample set by using a machine learning method to obtain the depth network model.
In this embodiment, the execution body may train the initial depth network model based on the training sample set by using a machine learning method to obtain the depth network model. The initial depth network model may be a depth network model that is untrained or whose training has not been completed. Here, for an untrained depth network model, its various parameters (e.g., weight parameters and bias parameters) are initialized with different small random numbers. The small magnitudes ensure that the model does not enter a saturated state because of overly large weights, which would cause training to fail, and the different random numbers ensure that the model can learn normally. For a depth network model whose training has not been completed, its parameters may already have been adjusted, but the generation effect of the model generally does not yet meet the preset constraint condition.
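The following sketch illustrates initializing the parameters of an initial depth network model with different small random numbers. The PyTorch framework, the stand-in network layers, and the standard deviation are illustrative assumptions only; the embodiment only requires that the numbers be small and mutually different.

```python
import torch.nn as nn

def init_with_small_random_numbers(module, std=0.02):
    """Initialize weights and biases with different small random numbers.
    The std value is an illustrative choice."""
    if isinstance(module, (nn.Conv2d, nn.ConvTranspose2d, nn.Linear)):
        nn.init.normal_(module.weight, mean=0.0, std=std)
        if module.bias is not None:
            nn.init.normal_(module.bias, mean=0.0, std=std)

# A stand-in initial depth network model (any untrained network would do).
initial_model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),
)
initial_model.apply(init_with_small_random_numbers)
```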
In some alternative implementations of the present embodiment, the depth network model may be obtained by supervised training of an existing machine learning model using various machine learning methods and training samples. In this case, the execution body may train the initial depth network model by taking the sample color image as input and the corresponding sample depth image as the expected output, thereby obtaining the depth network model.
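A minimal supervised-training sketch consistent with this implementation is given below. The L1 loss, the Adam optimizer, and the hyperparameters are illustrative assumptions, and `loader` is assumed to yield batched (sample color image, sample depth image) tensor pairs.

```python
import torch
import torch.nn as nn

def train_supervised(model, loader, epochs=10, lr=1e-4):
    """Supervised training sketch: sample color image as input, corresponding
    sample depth image as the target output."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.L1Loss()
    model.train()
    for _ in range(epochs):
        for sample_color, sample_depth in loader:   # (N, 3, H, W), (N, 1, H, W)
            predicted_depth = model(sample_color)
            loss = criterion(predicted_depth, sample_depth)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```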
In some alternative implementations of the present embodiment, the depth network model may be obtained by unsupervised training of an existing machine learning model using various machine learning methods and training samples. In general, the initial depth network model may be a generative adversarial network (Generative Adversarial Networks, GAN). A GAN may include a generation model (Generative Model) and a discrimination model (Discriminative Model). The generation model mainly learns the real image distribution so that its generated images become more realistic and deceive the discrimination model, while the discrimination model needs to judge whether the images it receives are real or generated. Throughout the process, the generation model strives to make the generated images more realistic, and the discrimination model strives to identify whether an image is real or fake; the process is equivalent to a two-player game. Over time the generation model and the discrimination model keep confronting each other until the two networks reach a dynamic balance: the images produced by the generation model are close to the real image distribution, and the discrimination model can no longer distinguish real images from generated ones. In this case, for each training sample in the training sample set, the execution body may first input the sample color image in the training sample into the generation model to obtain a generated depth image; then input the generated depth image and the sample depth image in the training sample into the discrimination model to obtain a discrimination result; and finally adjust the parameters of the generation model and the discrimination model based on the discrimination result. The discrimination result may be used to characterize the probability that the generated depth image and the sample depth image in the training sample are real depth images.
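The sketch below illustrates one adversarial training step along these lines: the generation model produces a depth image from the sample color image, the discrimination model scores the generated depth image and the sample depth image, and the parameters of both models are adjusted based on the discrimination results. The discriminator architecture, binary cross-entropy losses, and optimizers are illustrative assumptions; `generator` can be any color-to-depth network such as the encoder-decoder sketched earlier.

```python
import torch
import torch.nn as nn

class DepthDiscriminator(nn.Module):
    """Minimal discrimination model: outputs the probability that a depth
    image is a real (sample) depth image. Architecture is illustrative only."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

    def forward(self, depth):
        return self.net(depth)

def gan_training_step(generator, discriminator, g_opt, d_opt, sample_color, sample_depth):
    """One adversarial step over a training sample, following the flow in the
    text: generate a depth image, discriminate both images, adjust both models."""
    bce = nn.BCELoss()
    real_label = torch.ones(sample_depth.size(0), 1)
    fake_label = torch.zeros(sample_depth.size(0), 1)

    # 1) Generate a depth image from the sample color image.
    generated_depth = generator(sample_color)

    # 2) Discriminate: adjust the discrimination model so it assigns a high
    #    probability to the sample depth image and a low one to the generated image.
    d_loss = bce(discriminator(sample_depth), real_label) + \
             bce(discriminator(generated_depth.detach()), fake_label)
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # 3) Adjust the generation model so its output is judged "real".
    g_loss = bce(discriminator(generated_depth), real_label)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```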
With further reference to fig. 5, as an implementation of the method shown in the above figures, the present application provides an embodiment of an apparatus for generating a three-dimensional image, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 5, the apparatus 500 for generating a three-dimensional image of the present embodiment may include: an acquisition unit 501, an input unit 502, and a generation unit 503. Wherein the acquisition unit 501 is configured to acquire a target color image; the input unit 502 is configured to input the target color image into a pre-trained depth network model to obtain a target depth image corresponding to the target color image, wherein the depth network model is used for generating the depth image; the generating unit 503 is configured to generate a target three-dimensional image based on the target depth image.
In the present embodiment, in the apparatus 500 for generating a three-dimensional image, the specific processing of the acquisition unit 501, the input unit 502, and the generation unit 503 and the technical effects thereof may refer to the descriptions of step 201, step 202, and step 203 in the embodiment corresponding to fig. 2, respectively, and are not repeated herein.
In some optional implementations of the present embodiment, the depth network model is trained by: acquiring a training sample set, wherein training samples in the training sample set comprise sample color images and corresponding sample depth images; and training an initial depth network model based on the training sample set by using a machine learning method to obtain the depth network model.
In some optional implementations of the present embodiment, the initial depth network model is a generative adversarial network, the generative adversarial network comprising a generation model and a discrimination model; training the initial depth network model based on the training sample set by using a machine learning method to obtain the depth network model comprises the following steps: for a training sample in the training sample set, inputting the sample color image in the training sample into the generation model to obtain a generated depth image; inputting the generated depth image and the sample depth image in the training sample into the discrimination model to obtain a discrimination result, wherein the discrimination result is used for representing the probability that the generated depth image and the sample depth image in the training sample are real depth images; and adjusting parameters of the generation model and the discrimination model based on the discrimination result.
In some alternative implementations of the present embodiment, the training sample set is acquired using a Kinect device or an RGB-D camera.
In some optional implementations of the present embodiment, the generating unit 503 is further configured to generate the target three-dimensional image by combining the target depth image and the target color image.
Referring now to FIG. 6, a schematic diagram of a computer system 600 suitable for implementing an electronic device (e.g., the server 105 of FIG. 1) of an embodiment of the present application is shown. The electronic device shown in fig. 6 is only an example and should not be construed as limiting the functionality and scope of use of the embodiments of the application.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, mouse, etc.; an output portion 607 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. The drive 610 is also connected to the I/O interface 605 as needed. Removable media 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed as needed on drive 610 so that a computer program read therefrom is installed as needed into storage section 608.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611. The above-described functions defined in the method of the present application are performed when the computer program is executed by the Central Processing Unit (CPU) 601. It should be noted that the computer readable medium according to the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, a computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with computer readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium other than a computer readable storage medium, and may send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor, for example, described as: a processor includes an acquisition unit, an input unit, and a generation unit. The names of these units do not constitute limitations on the unit itself in some cases, and the acquisition unit may also be described as "a unit that acquires a target color image", for example.
As another aspect, the present application also provides a computer-readable medium that may be contained in the electronic device described in the above embodiment; or may exist alone without being incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring a target color image; inputting the target color image into a pre-trained depth network model to obtain a target depth image corresponding to the target color image, wherein the depth network model is used for generating the depth image; a target three-dimensional image is generated based on the target depth image.
The above description is only a description of the preferred embodiments of the present application and of the principles of the technology employed. It will be appreciated by persons skilled in the art that the scope of the invention referred to in the present application is not limited to technical solutions formed by the specific combinations of the technical features described above, but also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present application.

Claims (8)

1. A method for generating a three-dimensional image, comprising:
acquiring a target color image;
inputting the target color image into a pre-trained depth network model to obtain a target depth image corresponding to the target color image, wherein the depth network model is used for generating depth images, and the depth network model is obtained through training by the following steps: acquiring a training sample set, wherein training samples in the training sample set comprise sample color images and corresponding sample depth images; and training an initial depth network model based on the training sample set by using a machine learning method to obtain the depth network model, wherein each parameter of the initial depth network model is initialized with different small random numbers, and the sample color image and the corresponding sample depth image are registered, with a one-to-one correspondence between their pixel points;
generating a target three-dimensional image based on the target depth image;
wherein the initial depth network model is a generative adversarial network comprising a generation model and a discrimination model; and
the training the initial depth network model based on the training sample set by using the machine learning method to obtain the depth network model comprises the following steps:
for a training sample in the training sample set, inputting the sample color image in the training sample into the generation model to obtain a generated depth image;
inputting the generated depth image and the sample depth image in the training sample into the discrimination model to obtain a discrimination result, wherein the discrimination result is used for representing the probability that the generated depth image and the sample depth image in the training sample are real depth images;
and adjusting parameters of the generation model and the discrimination model based on the discrimination result.
2. The method of claim 1, wherein the set of training samples is acquired using a Kinect apparatus or an RGB-D camera.
3. The method of one of claims 1-2, wherein the generating a target three-dimensional image based on the target depth image comprises:
generating the target three-dimensional image by combining the target depth image and the target color image.
4. An apparatus for generating a three-dimensional image, comprising:
an acquisition unit configured to acquire a target color image;
an input unit configured to input the target color image into a pre-trained depth network model to obtain a target depth image corresponding to the target color image, wherein the depth network model is used for generating depth images, and the depth network model is obtained through training by the following steps: acquiring a training sample set, wherein training samples in the training sample set comprise sample color images and corresponding sample depth images; and training an initial depth network model based on the training sample set by using a machine learning method to obtain the depth network model, wherein each parameter of the initial depth network model is initialized with different small random numbers, and the sample color image and the corresponding sample depth image are registered, with a one-to-one correspondence between their pixel points;
a generation unit configured to generate a target three-dimensional image based on the target depth image;
wherein the initial depth network model is a generative adversarial network comprising a generation model and a discrimination model; and
the training the initial depth network model based on the training sample set by using the machine learning method to obtain the depth network model comprises the following steps:
for a training sample in the training sample set, inputting the sample color image in the training sample into the generation model to obtain a generated depth image;
inputting the generated depth image and the sample depth image in the training sample into the discrimination model to obtain a discrimination result, wherein the discrimination result is used for representing the probability that the generated depth image and the sample depth image in the training sample are real depth images;
and adjusting parameters of the generation model and the discrimination model based on the discrimination result.
5. The apparatus of claim 4, wherein the set of training samples is acquired using a Kinect device or an RGB-D camera.
6. The apparatus according to one of claims 4-5, wherein the generating unit is further configured to:
generate the target three-dimensional image by combining the target depth image and the target color image.
7. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-3.
8. A computer readable medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the method of any of claims 1-3.
CN201811359444.9A 2018-11-15 2018-11-15 Method and apparatus for generating three-dimensional image Active CN111192305B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811359444.9A CN111192305B (en) 2018-11-15 2018-11-15 Method and apparatus for generating three-dimensional image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811359444.9A CN111192305B (en) 2018-11-15 2018-11-15 Method and apparatus for generating three-dimensional image

Publications (2)

Publication Number Publication Date
CN111192305A CN111192305A (en) 2020-05-22
CN111192305B true CN111192305B (en) 2023-11-21

Family

ID=70709312

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811359444.9A Active CN111192305B (en) 2018-11-15 2018-11-15 Method and apparatus for generating three-dimensional image

Country Status (1)

Country Link
CN (1) CN111192305B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112801907B (en) * 2021-02-03 2024-04-16 北京字节跳动网络技术有限公司 Depth image processing method, device, equipment and storage medium
CN115908520A (en) * 2021-09-29 2023-04-04 华为技术有限公司 Depth estimation method, training method of depth estimation model, and devices and systems thereof

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171212A (en) * 2018-01-19 2018-06-15 百度在线网络技术(北京)有限公司 For detecting the method and apparatus of target
CN108492364A (en) * 2018-03-27 2018-09-04 百度在线网络技术(北京)有限公司 The method and apparatus for generating model for generating image
CN108510454A (en) * 2018-03-21 2018-09-07 百度在线网络技术(北京)有限公司 Method and apparatus for generating depth image
CN108564611A (en) * 2018-03-09 2018-09-21 天津大学 A kind of monocular image depth estimation method generating confrontation network based on condition
CN108805978A (en) * 2018-06-12 2018-11-13 江西师范大学 A kind of automatically generating device and method based on deep learning threedimensional model

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8488868B2 (en) * 2007-04-03 2013-07-16 Her Majesty The Queen In Right Of Canada, As Represented By The Minister Of Industry, Through The Communications Research Centre Canada Generation of a depth map from a monoscopic color image for rendering stereoscopic still and video images
DE112010006052T5 (en) * 2010-12-08 2013-10-10 Industrial Technology Research Institute Method for generating stereoscopic views of monoscopic endoscopic images and systems using them
US9626766B2 (en) * 2014-02-28 2017-04-18 Microsoft Technology Licensing, Llc Depth sensing using an RGB camera
US10165259B2 (en) * 2017-02-15 2018-12-25 Adobe Systems Incorporated Generating novel views of a three-dimensional object based on a single two-dimensional image

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171212A (en) * 2018-01-19 2018-06-15 百度在线网络技术(北京)有限公司 For detecting the method and apparatus of target
CN108564611A (en) * 2018-03-09 2018-09-21 天津大学 A kind of monocular image depth estimation method generating confrontation network based on condition
CN108510454A (en) * 2018-03-21 2018-09-07 百度在线网络技术(北京)有限公司 Method and apparatus for generating depth image
CN108492364A (en) * 2018-03-27 2018-09-04 百度在线网络技术(北京)有限公司 The method and apparatus for generating model for generating image
CN108805978A (en) * 2018-06-12 2018-11-13 江西师范大学 A kind of automatically generating device and method based on deep learning threedimensional model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
梁礼明 (Liang Liming). 《优化方法导论》 (Introduction to Optimization Methods). 2017, pp. 194-199. *

Also Published As

Publication number Publication date
CN111192305A (en) 2020-05-22

Similar Documents

Publication Publication Date Title
CN110427917B (en) Method and device for detecting key points
KR102120046B1 (en) How to display objects
US10609332B1 (en) Video conferencing supporting a composite video stream
CN109997175B (en) Determining the size of a virtual object
US20210014408A1 (en) Imaging system and method for producing images via gaze-based control
CN107798932A (en) A kind of early education training system based on AR technologies
CN112989904A (en) Method for generating style image, method, device, equipment and medium for training model
US11087140B2 (en) Information generating method and apparatus applied to terminal device
CN111985281B (en) Image generation model generation method and device and image generation method and device
CN113822977A (en) Image rendering method, device, equipment and storage medium
CN108363995A (en) Method and apparatus for generating data
CN107293236A (en) The intelligent display device of adaptive different user
CN108388889B (en) Method and device for analyzing face image
US10955911B2 (en) Gazed virtual object identification module, a system for implementing gaze translucency, and a related method
CN111192305B (en) Method and apparatus for generating three-dimensional image
CN112562056A (en) Control method, device, medium and equipment for virtual light in virtual studio
JP2019219928A (en) Image processing device, image processing method, and image processing program
CN112598780A (en) Instance object model construction method and device, readable medium and electronic equipment
CN109285181A (en) The method and apparatus of image for identification
CN114358112A (en) Video fusion method, computer program product, client and storage medium
CN113989717A (en) Video image processing method and device, electronic equipment and storage medium
CN105893452B (en) Method and device for presenting multimedia information
CN111754622A (en) Face three-dimensional image generation method and related equipment
CN109816791A (en) Method and apparatus for generating information
KR20230043741A (en) Feedback using coverage for object scanning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant