CN107578435B - Image depth prediction method and apparatus - Google Patents
- Publication number: CN107578435B
- Application number: CN201710811182.4A
- Authority
- CN
- China
- Prior art keywords
- network
- image
- depth
- test
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The embodiments of the invention disclose an image depth prediction method and apparatus. The method includes: performing depth prediction processing on an image to be processed based on a first neural network to generate an overall depth map; performing depth prediction processing on at least one local image of the image to be processed based on at least one second neural network to generate at least one local depth map; and weighting the overall depth map and the at least one local depth map according to preset fusion weights to generate a fused depth map. The embodiments of the invention solve the prior-art problems of low image depth prediction accuracy and complicated operation, and achieve high-accuracy depth prediction for images.
Description
Technical field
The embodiments of the present invention relate to image processing technology, and in particular to an image depth prediction method and apparatus.
Background technique
Depth prediction is a common problem in the fields of computer vision and image processing. Depth information can be used to convey 3D (Three Dimensions, three-dimensional) information and, further, to solve machine vision tasks such as scene understanding and object recognition. Traditional approaches to depth information extraction generally require multiple input images, such as multi-view images for structure from motion, multi-view images for photometric stereo, and multi-focus images. Existing methods learn the correlation between 2D images and 3D images and then obtain the depth information of an image. However, real images cover a large number of different scenes, and the gap between 2D and 3D images is wide, resulting in low depth prediction accuracy and poor results.
Summary of the invention
The present invention provides an image depth prediction method and apparatus to achieve high-accuracy depth prediction for images.
In a first aspect, an embodiment of the invention provides an image depth prediction method, comprising:
performing depth prediction processing on an image to be processed based on a first neural network to generate an overall depth map;
performing depth prediction processing on at least one local image of the image to be processed based on at least one second neural network to generate at least one local depth map; and
weighting the overall depth map and the at least one local depth map according to preset fusion weights to generate a fused depth map.
Further, before performing depth prediction processing on the at least one local image of the image to be processed based on the at least one second neural network, the method further includes:
performing feature recognition on the image to be processed based on a third neural network to generate a feature point image; and
determining local image regions according to the feature point image, and cropping the local image regions to generate the at least one local image.
Further, when construction of the first neural network, the third neural network, and the second neural network is completed, network parameter initialization is performed on the first neural network, the third neural network, and the second neural network, and the initialized first, third, and second neural networks are trained by minimizing a loss function, wherein the initial network parameters are set according to a one-dimensional Gaussian distribution.
Further, the preset fusion weights are set in advance, and the method of setting them includes:
performing depth prediction processing on a test sample with the first neural network and the second neural network to generate a test overall depth map and at least one test local depth map;
obtaining the first test error of each test local depth map and the second test error of the corresponding region of the test overall depth map; and
determining the preset fusion weights according to the first test error and the second test error.
Further, the first neural network and the third neural network form a multi-task neural network. The front part of the multi-task neural network includes a first preset number of convolutional layers for extracting the feature information of an input image; the rear part includes a first branch and a second branch. The first branch includes a second preset number of deconvolution layers for generating the overall depth map of the input image; the second branch includes a pooling layer and a fully connected layer for cropping the input image to generate at least one local image, wherein each convolutional layer is followed by a pooling layer, a normalization layer, and an activation function layer, and the fully connected layer is followed by an activation function layer.
Further, the second neural network includes a third preset number of convolutional layers and a fourth preset number of deconvolution layers, wherein each convolutional layer is followed by a pooling layer, a normalization layer, and an activation function layer.
Further, the image to be processed is a face image, and the local image includes at least one of the following: a left-eye image, a right-eye image, a nose image, and a mouth image.
In a second aspect, an embodiment of the invention further provides an image depth prediction apparatus, comprising:
an overall depth map generation module, configured to perform depth prediction processing on an image to be processed based on a first neural network to generate an overall depth map;
a local depth map generation module, configured to perform depth prediction processing on at least one local image of the image to be processed based on at least one second neural network to generate at least one local depth map; and
a fused depth map generation module, configured to weight the overall depth map and the at least one local depth map according to preset fusion weights to generate a fused depth map.
Further, the apparatus also includes:
a feature point determination module, configured to perform feature recognition on the image to be processed based on a third neural network to generate a feature point image, before depth prediction processing is performed on the at least one local image of the image to be processed based on the at least one second neural network; and
a local image generation module, configured to determine local image regions according to the feature point image and crop the local image regions to generate the at least one local image.
Further, when construction of the first neural network, the third neural network, and the second neural network is completed, network parameter initialization is performed on the first neural network, the third neural network, and the second neural network, and the initialized first, third, and second neural networks are trained by minimizing a loss function, wherein the initial network parameters are set according to a one-dimensional Gaussian distribution.
Further, the preset fusion weights are set in advance, and a weight setting module includes:
a depth map determination unit, configured to perform depth prediction processing on a test sample with the first neural network and the second neural network to generate a test overall depth map and at least one test local depth map;
a test error determination unit, configured to obtain the first test error of each test local depth map and the second test error of the corresponding region of the test overall depth map; and
a weight determination unit, configured to determine the preset fusion weights according to the first test error and the second test error.
Further, the first neural network and the third neural network form a multi-task neural network. The front part of the multi-task neural network includes a first preset number of convolutional layers for extracting the feature information of an input image; the rear part includes a first branch and a second branch. The first branch includes a second preset number of deconvolution layers for generating the overall depth map of the input image; the second branch includes a pooling layer and a fully connected layer for cropping the input image to generate at least one local image, wherein each convolutional layer is followed by a pooling layer, a normalization layer, and an activation function layer, and the fully connected layer is followed by an activation function layer.
Further, the second neural network includes a third preset number of convolutional layers and a fourth preset number of deconvolution layers, wherein each convolutional layer is followed by a pooling layer, a normalization layer, and an activation function layer.
Further, the image to be processed is a face image, and the local image includes at least one of the following: a left-eye image, a right-eye image, a nose image, and a mouth image.
In the embodiments of the present invention, depth prediction processing is performed by preset neural networks on an image to be processed and on at least one local image of that image to generate an overall depth map and local depth maps, and the overall depth map and the local depth maps are weighted and fused according to preset fusion weights to generate a high-accuracy fused depth map. This solves the prior-art problem of low depth prediction accuracy and achieves high-accuracy depth prediction for images.
Brief description of the drawings
Fig. 1 is a flowchart of an image depth prediction method provided by Embodiment 1 of the present invention;
Fig. 2A is a face image to be processed provided by Embodiment 1 of the present invention;
Fig. 2B is the overall depth map generated from the image to be processed by the first neural network in Embodiment 1;
Fig. 2C is a schematic diagram of the feature point image generated from the image to be processed by the third neural network in Embodiment 1;
Fig. 2D is a schematic diagram of a multi-task neural network provided by Embodiment 1;
Fig. 3 is a structural schematic diagram of an image depth prediction apparatus provided by Embodiment 2 of the present invention.
Specific embodiments
The present invention is described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described here are intended only to explain the present invention, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment 1
Fig. 1 is a flowchart of an image depth prediction method provided by Embodiment 1 of the present invention. This embodiment is applicable to automatically performing high-accuracy depth prediction on an image. The method can be executed by an image depth prediction apparatus provided by an embodiment of the present invention, and the apparatus can be implemented in software and/or hardware. The method specifically includes:
S110. Perform depth prediction processing on an image to be processed based on a first neural network to generate an overall depth map.
Here, depth prediction processing refers to extracting depth information from the image to be processed. Depth information refers to the actual hierarchical information of each object in the image, or near-far distance information; an image with depth information has a sense of depth and three-dimensionality and a better visual effect.
The overall depth map is an image containing the depth information of the image to be processed. The overall depth map is a grayscale image in which the gray value of each pixel characterizes depth; illustratively, the larger the gray value of a pixel, the farther the actual object, and the smaller the gray value, the closer the actual object.
In this embodiment, before depth prediction processing, the image to be processed is preprocessed. Preprocessing includes enlarging, shrinking, or segmenting the image, where image segmentation refers to deleting the background of the image to be processed to simplify the input to the first neural network. Optionally, the resolution of the image input to the first neural network is fixed; illustratively, the resolution of the image to be processed may be 384x384. Preprocessing simplifies the image and unifies its size, which helps the first neural network rapidly extract useful feature information and avoids interference from background information.
In this embodiment, the first neural network is obtained by training in advance. The first neural network may include convolutional layers, deconvolution layers, and pooling layers, and each convolutional layer may be followed by a pooling layer, an activation function layer, and a batch normalization (BN) layer, where the connection order of the pooling, activation, and normalization layers is not limited. Illustratively, there may be 4 convolutional layers and 5 deconvolution layers, with the 5 deconvolution layers connected after the 4 convolutional layers; the activation function may be, for example, a ReLU, PReLU, or RReLU function.
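The spatial sizes produced by such a convolution/deconvolution stack can be checked with the standard output-size formulas. A brief sketch; the kernel size, stride, and padding values below are illustrative assumptions, not parameters given in this embodiment:

```python
def conv_out(size, kernel, stride, padding):
    # Output size of a convolution: floor((in + 2p - k) / s) + 1
    return (size + 2 * padding - kernel) // stride + 1

def deconv_out(size, kernel, stride, padding, output_padding=0):
    # Output size of a transposed convolution (deconvolution):
    # (in - 1) * s - 2p + k + output_padding
    return (size - 1) * stride - 2 * padding + kernel + output_padding

side = 384  # fixed input resolution used in this embodiment
side = conv_out(side, kernel=3, stride=2, padding=1)  # 384 -> 192
restored = deconv_out(side, kernel=3, stride=2, padding=1, output_padding=1)
print(side, restored)  # 192 384
```

Note that a stride-2 deconvolution needs an output padding of 1 here to restore the exact input size, a common detail when pairing downsampling and upsampling layers.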
S120. Perform depth prediction processing on at least one local image of the image to be processed based on at least one second neural network to generate at least one local depth map.
In this embodiment, the topology of the second neural network may be the same as or different from that of the first neural network.
Optionally, there may be multiple second neural networks, which may have the same topology but different network parameters, being trained on different training samples. Different local images correspond to different second neural networks.
In this embodiment, the multiple local images may be generated by manually determining and cropping local image regions, or by automatically recognizing the local image regions of the image to be processed and cropping them automatically according to the recognition result. Optionally, the image to be processed is cropped based on a third neural network to generate at least one local image.
Here, cropping refers to identifying key information in the image to be processed and extracting the regions where that key information is located to form local images; a local image is an image containing local information of the image to be processed. Illustratively, if the image to be processed is an image of a person, the key information may be limb information; if the image to be processed is a face image, the key information may be facial feature information.
In this embodiment, the third neural network is obtained by training in advance. The third neural network may include convolutional layers, pooling layers, and a fully connected layer, and each convolutional layer may be followed by an activation function layer and a batch normalization (BN) layer. Illustratively, there may be 4 convolutional layers, 1 fully connected layer, and 1 pooling layer, where the pooling layer and the fully connected layer are connected in turn after the 4 convolutional layers; the activation function may be, for example, a ReLU, PReLU, or RReLU function.
Optionally, cropping the image to be processed based on the third neural network to generate at least one local image includes: performing feature recognition on the image to be processed based on the third neural network to generate a feature point image; determining local image regions according to the feature point image; and cropping the local image regions to generate at least one local image.
In this embodiment, feature points in the feature point image characterize the key information in the image to be processed. Illustratively, multiple feature points may form the outline of a piece of key information, or multiple feature points may cover the key information; the region enclosed by the feature point connecting lines, or the region covered by the feature points, is determined as a local image region, which is then cropped to generate a local image. Optionally, the feature points corresponding to different pieces of key information may differ; illustratively, different key information may use feature points of different colors or shapes.
Optionally, the image to be processed is a face image, and the local images include at least one of the following: a left-eye image, a right-eye image, a nose image, and a mouth image. Optionally, the local images may also include a left-eyebrow image and a right-eyebrow image.
Illustratively, referring to Fig. 2A, Fig. 2B, and Fig. 2C: Fig. 2A is a face image to be processed provided by Embodiment 1 of the present invention, where the face image has been obtained by preprocessing; Fig. 2B is the overall depth map generated from the image to be processed by the first neural network; Fig. 2C is a schematic diagram of the feature point image generated from the image to be processed by the third neural network, where the dots in Fig. 2C are the feature points, and feature points are present in the eye region, nose region, mouth region, and eyebrow region. According to these feature points, the image to be processed can be cropped to generate multiple local images.
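As a minimal sketch of this cropping step, the bounding box of the feature points of one facial part can be taken, expanded by a small margin, and sliced out of the image array. The margin value and the landmark coordinates below are hypothetical, not values from the patent:

```python
import numpy as np

def crop_local_image(image, points, margin=4):
    """Crop the region enclosing a set of feature points, with a margin."""
    h, w = image.shape[:2]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, x1 = max(min(xs) - margin, 0), min(max(xs) + margin, w)
    y0, y1 = max(min(ys) - margin, 0), min(max(ys) + margin, h)
    return image[y0:y1, x0:x1]

# Hypothetical left-eye landmarks on a 384x384 face image
face = np.zeros((384, 384), dtype=np.uint8)
left_eye_points = [(120, 150), (150, 145), (145, 160)]
eye_patch = crop_local_image(face, left_eye_points)
print(eye_patch.shape)  # (23, 38)
```

Clamping the box to the image borders keeps the crop valid for landmarks near the edge of the face image.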
Optionally, before a local image is input to the second neural network, the resolution of the local image is adjusted to a preset resolution; illustratively, the preset resolution may be 384x384. Correspondingly, the resolution of the local depth map obtained by the depth prediction processing of the second neural network is restored to the initial resolution of the local image. In this embodiment, increasing the resolution of the local image helps improve the accuracy of the local depth map.
Optionally, the second neural network includes a third preset number of convolutional layers and a fourth preset number of deconvolution layers, where each convolutional layer is followed by a pooling layer, a normalization layer, and an activation function layer, the connection order of which is not limited. Illustratively, the third preset number may be 4 and the fourth preset number may be 5; the activation function may be, for example, a ReLU, PReLU, or RReLU function.
It should be noted that step S110 and step S120 can be performed simultaneously; there is no required order between them.
S130. Weight the overall depth map and the at least one local depth map according to preset fusion weights to generate a fused depth map.
In this embodiment, both the overall depth map and the local depth maps characterize depth information by gray values, and the same gray value corresponds to the same depth value. Weighting the overall depth map and the at least one local depth map means weighting the gray value of each pixel of a local depth map with the corresponding gray value in the overall depth map according to the preset fusion weights to determine the fused depth map. The preset fusion weights corresponding to different local images may be the same or different. Illustratively, a left-eye local depth map is weighted pixel-by-pixel with the left-eye region of the overall depth map, and a mouth local depth map is weighted pixel-by-pixel with the mouth region of the overall depth map. Optionally, the preset fusion weights corresponding to these two local depth maps may be the same or different.
It should be noted that the accuracy of a local depth map is higher than that of the corresponding region of the overall depth map. In this embodiment, high-accuracy local depth maps are obtained and fused with the overall depth map according to the preset fusion weights to generate a high-accuracy fused depth map, which solves the prior-art problem of low depth prediction accuracy. Moreover, depth prediction processing based on neural networks is an end-to-end processing method and is easy to operate.
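The weighting of S130 can be sketched as follows: within each facial region, the fused value is a weighted average of the local depth map and the corresponding patch of the overall depth map, while pixels outside all regions keep the overall depth map's value. The region coordinates and the 0.5 weight below are hypothetical:

```python
import numpy as np

def fuse_depth_maps(overall, local_maps):
    """local_maps: list of (top, left, local_depth_map, local_weight) tuples."""
    fused = overall.astype(float).copy()
    for top, left, local, weight in local_maps:
        h, w = local.shape
        region = fused[top:top + h, left:left + w]
        # Weighted average of the local depth map and the overall map's region
        fused[top:top + h, left:left + w] = weight * local + (1.0 - weight) * region
    return fused

overall = np.full((8, 8), 100.0)   # toy overall depth map
left_eye = np.full((2, 3), 130.0)  # toy local depth map for one region
fused = fuse_depth_maps(overall, [(2, 1, left_eye, 0.5)])
print(fused[2, 1], fused[0, 0])  # 115.0 100.0
```

Per-region weights can differ, so a more trusted local depth map can dominate its region of the fused result.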
Optionally, three-dimensional image reconstruction is performed according to the fused depth map, which can be applied to video conferencing, video telephony, virtual games, face recognition, or film and animation production, and helps improve the clarity of subsequent image or video production.
Optionally, the preset fusion weights are set in advance, and the method of setting them includes: performing depth prediction processing on a test sample with the first neural network and the second neural network to generate a test overall depth map and at least one test local depth map; obtaining the first test error of each test local depth map and the second test error of the corresponding region of the test overall depth map; and determining the preset fusion weights according to the first test error and the second test error.
In this embodiment, on the basis that training of the first neural network and the second neural networks is complete, the test sample is input to the first neural network to obtain a test overall depth map, and the local images of the test sample are input to the corresponding second neural networks to generate the corresponding test local depth maps. The second test error of the test overall depth map is determined based on a standard overall depth map; illustratively, the error of each pixel is determined as the difference between the gray values of corresponding pixels in the standard overall depth map and the test overall depth map, and the mean of the per-pixel errors is taken as the second test error. Optionally, the second test error is determined for the region of the test overall depth map corresponding to the local image. Similarly, the first test error of each test local depth map is determined based on the standard local depth map.
Optionally, the weight of the overall depth map and the weight of a local depth map are each inversely proportional to the corresponding test error. Illustratively, taking the left-eye image as an example, a second depth error and a first depth error are determined from the second test error and the first test error of the left-eye image; for example, if the second depth error is 0.1mm and the first depth error is 0.2mm, the preset fusion weights of the corresponding region of the overall depth map and of the left-eye depth map may be 2/3 and 1/3. Optionally, different regions of the overall depth map may be set to different fusion weights.
The preset fusion weights are determined over a large number of test samples; illustratively, the number of test samples may be 1900.
In this embodiment, determining the preset fusion weights according to the test errors of the overall depth map and the local depth maps increases the weight of the high-accuracy depth map while reducing the weight of the low-accuracy depth map, further improving the depth prediction accuracy of the fused depth map.
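Under the inverse-proportionality rule above, a pair of fusion weights can be obtained by normalizing the reciprocals of the two test errors, so the depth map with the smaller error receives the larger weight. A minimal sketch, reusing the 0.1mm/0.2mm errors from the illustration above:

```python
def fusion_weights(error_a, error_b):
    """Weights inversely proportional to the test errors, normalized to sum to 1."""
    inv_a, inv_b = 1.0 / error_a, 1.0 / error_b
    total = inv_a + inv_b
    return inv_a / total, inv_b / total

w_a, w_b = fusion_weights(0.1, 0.2)
print(round(w_a, 4), round(w_b, 4))  # 0.6667 0.3333
```

The 0.1mm-error map receives weight 2/3 and the 0.2mm-error map receives weight 1/3, matching the example in this embodiment.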
In the technical solution of this embodiment, depth prediction processing is performed by preset neural networks on the image to be processed and on at least one local image of the image to be processed to generate an overall depth map and local depth maps, and the overall depth map and the local depth maps are weighted and fused according to preset fusion weights to generate a high-accuracy fused depth map, which solves the prior-art problem of low depth prediction accuracy and achieves high-accuracy depth prediction for images.
On the basis of the above embodiment, before depth prediction processing is performed on the image to be processed, the first neural network, the third neural network, and the second neural network are built and trained. When construction of the first, third, and second neural networks is completed, network parameter initialization is performed on them, and the initialized first, third, and second neural networks are trained by minimizing a loss function, wherein the initial network parameters are set according to a one-dimensional Gaussian distribution.
Here, network parameter initialization refers to setting initial network parameters for a neural network. In this embodiment, the initial network parameters of each neural network are set according to a one-dimensional Gaussian distribution instead of the random initialization of the prior art, which helps improve training efficiency and avoids the slow or failed convergence that random initialization can cause.
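Such an initialization can be sketched by drawing every weight from a zero-mean Gaussian. The mean and standard deviation below are assumed values; the embodiment only specifies that a one-dimensional Gaussian distribution is used:

```python
import numpy as np

def init_weights(shape, mean=0.0, std=0.01, seed=0):
    """Draw initial network parameters from a one-dimensional Gaussian distribution."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc=mean, scale=std, size=shape)

# Initialize a 3x3 convolution kernel with 16 output and 8 input channels
w = init_weights((16, 8, 3, 3))
print(w.shape)  # (16, 8, 3, 3)
```

In practice the standard deviation is often scaled to the layer's fan-in; the fixed value here is only a placeholder.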
Optionally, the training method of the first neural network includes: performing depth prediction processing on a first sample image with the first neural network to be trained to generate a first training image; generating a first loss function from the first training image and the standard overall depth map corresponding to the first sample image; and adjusting the network parameters of the first neural network to be trained according to the first loss function.
In this embodiment, the first sample images may be, for example, a large number of face images; illustratively, they may include 2500 color face images, of which 1100 are male face images and 1400 are female face images. Optionally, the first sample images are unified in size.
In this embodiment, the standard overall depth map may be set in advance or extracted during the training of the first neural network. For example, the depth information of a training image is extracted by a preset depth map extraction model; illustratively, the preset depth map extraction model may be an HourGlass model, which is obtained in advance. The first loss function characterizes the degree of inconsistency between the feature information of the training image generated by the neural network and the standard feature information of the standard overall depth map; the smaller the value of the first loss function, the better the robustness of the first neural network usually is. Illustratively, the first loss function may take the form of a mean squared error (Mean Squared Error, MSE).
In this embodiment, the gradient of the first loss function is back-propagated, and the network parameters of the first neural network are adjusted according to the first loss function. Optionally, the network parameters include, but are not limited to, weights and offset values.
Optionally, the training method of the second neural network includes: performing depth prediction processing on a second sample image with the second neural network to be trained to generate a second training image; generating a second loss function from the second training image and the standard local depth map corresponding to the second sample image; and adjusting the network parameters of the second neural network to be trained according to the second loss function.
In this embodiment, second sample images are set, where the second sample images match the first sample images. Illustratively, if the second neural network is used to perform depth prediction processing on left-eye images, the second sample images are the left-eye local images corresponding to the first sample images. The standard local depth map may be set in advance or extracted during the training of the second neural network. The second loss function characterizes the degree of inconsistency between the feature information of the second training image generated by the neural network and the standard feature information of the standard local depth map; the smaller the value of the second loss function, the better the robustness of the second neural network usually is. Illustratively, the second loss function may take the form of a mean squared error.
In this embodiment, the gradient of the second loss function is back-propagated, and the network parameters of the second neural network are adjusted according to the second loss function. Optionally, the network parameters include, but are not limited to, weights and offset values.
Optionally, the training method of the third neural network includes: cropping a third sample image with the third neural network to be trained to generate at least one training local image; obtaining the boundary coordinate information of each training local image and of the corresponding standard local image; determining a third loss function from the boundary coordinate information of the training local image and that of the standard local image; and adjusting the network parameters of the third neural network according to the third loss function.
In this embodiment, the third sample images may be identical to the first sample images, reducing the number of samples to collect. The third loss function characterizes the degree of inconsistency between the boundary coordinate information of the training local image generated by the neural network and that of the standard local image; illustratively, the mean error over the boundary pixels may be taken as the value of the third loss function. The gradient of the third loss function is back-propagated, and the network parameters of the third neural network are adjusted according to the third loss function. Optionally, the network parameters include, but are not limited to, weights and offset values.
Optionally, the first neural network and the third neural network form a multi-task neural network. Illustratively, referring to Fig. 2D, Fig. 2D is a schematic diagram of a multi-task neural network provided by Embodiment 1 of the present invention. The front part of the multi-task neural network includes a first preset number of convolutional layers for extracting the feature information of the input image; the rear part includes a first branch and a second branch. The first branch includes a second preset number of deconvolution layers for generating the overall depth map of the input image; the second branch includes a pooling layer and a fully connected layer for cropping the input image to generate at least one local image, where each convolutional layer is followed by a pooling layer, a normalization layer, and an activation function layer, and the fully connected layer is followed by an activation function layer.
In the present embodiment, depth prediction processing is carried out to image to be processed simultaneously by multitask neural network and cuts out place
Reason generates overall depth figure and at least one topography, realizes and be completed at the same time multiple tasks, instead of a neural network
It can be only done a task, simplify the training process of neural network.
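The layer counts ("first preset number", "second preset number") and strides are left open in the patent, so the following is only a shape-level sketch of the shared-trunk, two-branch layout, with illustrative assumed values (3x3 convolutions at stride 1, 2x pooling after each convolution, 2x upsampling per deconvolution):

```python
# Spatial size after one conv (stride 1, padding `pad`) followed by 2x pooling.
def conv_pool(size, k=3, pad=1, pool=2):
    return ((size + 2 * pad - k) + 1) // pool

# Spatial size after one transposed-convolution (deconvolution) upsampling step.
def deconv(size, stride=2):
    return size * stride

# Trace the multitask network: shared conv trunk, then branch 1 (deconvs
# restoring a full-resolution depth map) and branch 2 (pooling + fully
# connected layer emitting crop boundary coordinates). n_conv/n_deconv/n_boxes
# are illustrative stand-ins for the patent's "preset numbers".
def multitask_shapes(input_size=224, n_conv=4, n_deconv=4, n_boxes=4):
    s = input_size
    for _ in range(n_conv):        # shared front part: conv + pool halves size
        s = conv_pool(s)
    trunk = s
    depth = trunk
    for _ in range(n_deconv):      # branch 1: deconvolutions restore resolution
        depth = deconv(depth)
    boxes = (n_boxes, 4)           # branch 2: one (x1, y1, x2, y2) box per crop
    return trunk, depth, boxes
```

With these assumed values a 224x224 input shrinks to a 14x14 trunk feature map, and the first branch's four deconvolutions bring the depth map back to 224x224, matching the input.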
Embodiment two
Fig. 3 is a structural schematic diagram of an image depth prediction apparatus provided by Embodiment Two of the present invention. The apparatus specifically includes:
a global depth map generation module 210, configured to perform depth prediction processing on an image to be processed based on a first neural network to generate a global depth map;
a local depth map generation module 220, configured to perform depth prediction processing on at least one local image of the image to be processed based on at least one second neural network to generate at least one local depth map;
a fusion depth map generation module 230, configured to weight the global depth map and the at least one local depth map according to a preset fusion weight to generate a fusion depth map.
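The weighting performed by module 230 can be sketched as pasting each local depth map back at its crop location and blending it with the corresponding region of the global map. A scalar per-region weight `w_local` is an assumption here; the patent only states that a preset fusion weight is applied:

```python
import numpy as np

# Fuse a global depth map with local depth maps at their crop locations.
# boxes: (x1, y1, x2, y2) crop boundaries matching each local map's size.
def fuse_depth(global_map, local_maps, boxes, w_local=0.6):
    fused = global_map.copy()
    for local, (x1, y1, x2, y2) in zip(local_maps, boxes):
        region = fused[y1:y2, x1:x2]
        # Weighted combination inside the crop; global values elsewhere.
        fused[y1:y2, x1:x2] = w_local * local + (1.0 - w_local) * region
    return fused
```

Outside the crop regions the fused map equals the global prediction; inside each region the local prediction dominates to the extent set by the fusion weight.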
Optionally, the apparatus further includes:
a feature point determining module, configured to perform feature identification on the image to be processed based on a third neural network to generate a feature point image, before depth prediction processing is performed on the at least one local image of the image to be processed based on the at least one second neural network;
a local image generation module, configured to determine local image regions according to the feature point image and to crop the local image regions to generate at least one local image, wherein different image regions correspond to different feature point images.
Optionally, when the construction of the first neural network, the third neural network and the second neural network is completed, network parameter initialization is performed on the first neural network, the third neural network and the second neural network, and the initialized first neural network, third neural network and second neural network are trained by minimizing their loss functions, wherein the initialized network parameters are set according to a one-dimensional Gaussian distribution.
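A minimal sketch of this initialization, drawing weights from a one-dimensional Gaussian distribution (the standard deviation of 0.01 and zero-initialized biases are assumptions; the patent specifies only the Gaussian form):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

# Initialize one layer's parameters: Gaussian weights, zero biases.
def init_layer(fan_in, fan_out, std=0.01):
    w = rng.normal(loc=0.0, scale=std, size=(fan_in, fan_out))
    b = np.zeros(fan_out)
    return w, b
```

Each layer of the three networks would be initialized this way before the loss-minimization training described above begins.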
Optionally, the preset fusion weight is set in advance, and the weight setting module includes:
a depth map determining unit, configured to process a test sample through the first neural network and the second neural network to generate a test global depth map and at least one test local depth map;
a test error determining unit, configured to obtain, respectively, a first test error of the test local depth map and a second test error of the region of the test global depth map corresponding to the test local depth map;
a weight determining unit, configured to determine the preset fusion weight according to the first test error and the second test error.
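The patent does not state the rule the weight determining unit uses to turn the two test errors into a fusion weight. One plausible, purely assumed rule is inverse-error normalization, which assigns more weight to whichever depth map had the lower test error on the shared region:

```python
# Assumed weighting rule (not specified in the patent): normalize the
# inverse errors so the lower-error source receives the larger weight.
# e1: first test error (local depth map); e2: second test error (global
# depth map on the corresponding region). Returns the local map's weight.
def fusion_weight(e1, e2, eps=1e-8):
    inv1, inv2 = 1.0 / (e1 + eps), 1.0 / (e2 + eps)
    return inv1 / (inv1 + inv2)
```

Equal errors yield a weight of 0.5; a local error three times smaller than the global error pushes the local weight toward 0.75.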
Optionally, the first neural network and the third neural network form a multitask neural network. The front part of the multitask neural network includes a first preset number of convolutional layers for extracting the feature information of the input image;
the rear part of the multitask neural network includes a first branch and a second branch. The first branch includes a second preset number of deconvolution layers for generating the global depth map of the input image; the second branch includes a pooling layer and a fully connected layer for cropping the input image to generate at least one local image, wherein each convolutional layer is followed by a pooling layer, a normalization layer and an activation function layer, and the fully connected layer is followed by an activation function layer.
Optionally, the second neural network includes a third preset number of convolutional layers and a fourth preset number of deconvolution layers, wherein each convolutional layer is followed by a pooling layer, a normalization layer and an activation function layer.
Optionally, the image to be processed is a face image, and the local images include at least one of the following: a left-eye image, a right-eye image, a nose image and a mouth image.
The image depth prediction apparatus provided in the embodiments of the present invention can perform the image depth prediction method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to performing that method.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will appreciate that the invention is not limited to the specific embodiments described herein; various obvious changes, readjustments and substitutions can be made by those skilled in the art without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, the present invention is not limited to the above embodiments only, and may include more other equivalent embodiments without departing from the inventive concept, the scope of the invention being determined by the scope of the appended claims.
Claims (10)
1. An image depth prediction method, characterized by comprising:
performing depth prediction processing on an image to be processed based on a first neural network to generate a global depth map;
performing feature identification on the image to be processed based on a third neural network to generate a feature point image;
determining local image regions according to the feature point image, and cropping the local image regions to generate at least one local image;
performing depth prediction processing on the at least one local image of the image to be processed based on at least one second neural network to generate at least one local depth map;
weighting the global depth map and the at least one local depth map according to a preset fusion weight to generate a fusion depth map;
wherein the first neural network and the third neural network form a multitask neural network, a front part of the multitask neural network comprising a first preset number of convolutional layers for extracting feature information of an input image;
a rear part of the multitask neural network comprises a first branch and a second branch, the first branch comprising a second preset number of deconvolution layers for generating the global depth map of the input image, and the second branch comprising a pooling layer and a fully connected layer for cropping the input image to generate the at least one local image, wherein each convolutional layer is followed by a pooling layer, a normalization layer and an activation function layer, and the fully connected layer is followed by an activation function layer.
2. The method according to claim 1, characterized in that when the construction of the first neural network, the third neural network and the second neural network is completed, network parameter initialization is performed on the first neural network, the third neural network and the second neural network, and the initialized first neural network, third neural network and second neural network are trained by minimizing their loss functions, wherein the initialized network parameters are set according to a one-dimensional Gaussian distribution.
3. The method according to claim 1, characterized in that the preset fusion weight is set in advance, and the method of setting the preset fusion weight comprises:
performing depth prediction processing on a test sample based on the first neural network and the second neural network to generate a test global depth map and at least one test local depth map;
obtaining, respectively, a first test error of the test local depth map and a second test error of the region of the test global depth map corresponding to the test local depth map;
determining the preset fusion weight according to the first test error and the second test error.
4. The method according to claim 1, characterized in that the second neural network comprises a third preset number of convolutional layers and a fourth preset number of deconvolution layers, wherein each convolutional layer is followed by a pooling layer, a normalization layer and an activation function layer.
5. The method according to any one of claims 1-4, characterized in that the image to be processed is a face image, and the local image comprises at least one of the following: a left-eye image, a right-eye image, a nose image and a mouth image.
6. An image depth prediction apparatus, characterized by comprising:
a global depth map generation module, configured to perform depth prediction processing on an image to be processed based on a first neural network to generate a global depth map;
a feature point determining module, configured to perform feature identification on the image to be processed based on a third neural network to generate a feature point image;
a local image generation module, configured to determine local image regions according to the feature point image, and to crop the local image regions to generate at least one local image;
a local depth map generation module, configured to perform depth prediction processing on the at least one local image of the image to be processed based on at least one second neural network to generate at least one local depth map;
a fusion depth map generation module, configured to weight the global depth map and the at least one local depth map according to a preset fusion weight to generate a fusion depth map;
wherein the first neural network and the third neural network form a multitask neural network, a front part of the multitask neural network comprising a first preset number of convolutional layers for extracting feature information of an input image;
a rear part of the multitask neural network comprises a first branch and a second branch, the first branch comprising a second preset number of deconvolution layers for generating the global depth map of the input image, and the second branch comprising a pooling layer and a fully connected layer for cropping the input image to generate the at least one local image, wherein each convolutional layer is followed by a pooling layer, a normalization layer and an activation function layer, and the fully connected layer is followed by an activation function layer.
7. The apparatus according to claim 6, characterized in that when the construction of the first neural network, the third neural network and the second neural network is completed, network parameter initialization is performed on the first neural network, the third neural network and the second neural network, and the initialized first neural network, third neural network and second neural network are trained by minimizing their loss functions, wherein the initialized network parameters are set according to a one-dimensional Gaussian distribution.
8. The apparatus according to claim 6, characterized in that the preset fusion weight is set in advance, and the weight setting module comprises:
a depth map determining unit, configured to perform depth prediction processing on a test sample through the first neural network and the second neural network to generate a test global depth map and at least one test local depth map;
a test error determining unit, configured to obtain, respectively, a first test error of the test local depth map and a second test error of the region of the test global depth map corresponding to the test local depth map;
a weight determining unit, configured to determine the preset fusion weight according to the first test error and the second test error.
9. The apparatus according to claim 6, characterized in that the second neural network comprises a third preset number of convolutional layers and a fourth preset number of deconvolution layers, wherein each convolutional layer is followed by a pooling layer, a normalization layer and an activation function layer.
10. The apparatus according to any one of claims 6-9, characterized in that the image to be processed is a face image, and the local image comprises at least one of the following: a left-eye image, a right-eye image, a nose image and a mouth image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710811182.4A CN107578435B (en) | 2017-09-11 | 2017-09-11 | A kind of picture depth prediction technique and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710811182.4A CN107578435B (en) | 2017-09-11 | 2017-09-11 | A kind of picture depth prediction technique and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107578435A CN107578435A (en) | 2018-01-12 |
CN107578435B true CN107578435B (en) | 2019-11-29 |
Family
ID=61033100
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710811182.4A Active CN107578435B (en) | 2017-09-11 | 2017-09-11 | A kind of picture depth prediction technique and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107578435B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108335322B (en) | 2018-02-01 | 2021-02-12 | 深圳市商汤科技有限公司 | Depth estimation method and apparatus, electronic device, program, and medium |
CN109191514B (en) * | 2018-10-23 | 2020-11-24 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating a depth detection model |
CN109829886A (en) * | 2018-12-25 | 2019-05-31 | 苏州江奥光电科技有限公司 | A kind of pcb board defect inspection method based on depth information |
CN110309706B (en) * | 2019-05-06 | 2023-05-12 | 深圳华付技术股份有限公司 | Face key point detection method and device, computer equipment and storage medium |
CN110363296B (en) * | 2019-06-28 | 2022-02-08 | 腾讯医疗健康(深圳)有限公司 | Task model obtaining method and device, storage medium and electronic device |
CN111428859A (en) * | 2020-03-05 | 2020-07-17 | 北京三快在线科技有限公司 | Depth estimation network training method and device for automatic driving scene and autonomous vehicle |
CN111414923B (en) * | 2020-03-05 | 2022-07-12 | 南昌航空大学 | Indoor scene three-dimensional reconstruction method and system based on single RGB image |
CN112488104B (en) * | 2020-11-30 | 2024-04-09 | 华为技术有限公司 | Depth and confidence estimation system |
CN116721143B (en) * | 2023-08-04 | 2023-10-20 | 南京诺源医疗器械有限公司 | Depth information processing device and method for 3D medical image |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103177440A (en) * | 2012-12-20 | 2013-06-26 | 香港应用科技研究院有限公司 | System and method of generating image depth map |
CN106204522A (en) * | 2015-05-28 | 2016-12-07 | 奥多比公司 | The combined depth of single image is estimated and semantic tagger |
CN107038429A (en) * | 2017-05-03 | 2017-08-11 | 四川云图睿视科技有限公司 | A kind of multitask cascade face alignment method based on deep learning |
Non-Patent Citations (2)
Title |
---|
David Eigen et al.; "Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture"; Proc. IEEE ICCV; 2015; pp. 2650-2658 *
Jose M. Facil et al.; "Single-View and Multi-View Depth Fusion"; IEEE Robotics and Automation Letters; 2017; vol. 2, no. 4; pp. 1994-2001 *
Also Published As
Publication number | Publication date |
---|---|
CN107578435A (en) | 2018-01-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107578435B (en) | A kind of picture depth prediction technique and device | |
CN109255831B (en) | Single-view face three-dimensional reconstruction and texture generation method based on multi-task learning | |
CN110222628A (en) | A kind of face restorative procedure based on production confrontation network | |
CN108495110A (en) | A kind of virtual visual point image generating method fighting network based on production | |
CN110807364B (en) | Modeling and capturing method and system for three-dimensional face and eyeball motion | |
CN107609383A (en) | 3D face identity authentications and device | |
JP2008535116A (en) | Method and apparatus for three-dimensional rendering | |
CN107633165A (en) | 3D face identity authentications and device | |
CN107748869A (en) | 3D face identity authentications and device | |
CN103608847B (en) | A kind of method and apparatus built for iconic model | |
CN109978984A (en) | Face three-dimensional rebuilding method and terminal device | |
CN110197462A (en) | A kind of facial image beautifies in real time and texture synthesis method | |
CN116109798B (en) | Image data processing method, device, equipment and medium | |
EP1150254A3 (en) | Methods for creating an image for a three-dimensional display, for calculating depth information, and for image processing using the depth information | |
KR101759188B1 (en) | the automatic 3D modeliing method using 2D facial image | |
CN109598210A (en) | A kind of image processing method and device | |
CN110909634A (en) | Visible light and double infrared combined rapid in vivo detection method | |
CN111833236A (en) | Method and device for generating three-dimensional face model simulating user | |
CN110175505A (en) | Determination method, apparatus, storage medium and the electronic device of micro- expression type | |
CN106909904B (en) | Human face obverse method based on learnable deformation field | |
Beacco et al. | Automatic 3d character reconstruction from frontal and lateral monocular 2d rgb views | |
CN109218706A (en) | A method of 3 D visual image is generated by single image | |
CN105872516A (en) | Method and device for obtaining parallax parameters of three-dimensional film source | |
CN110602476B (en) | Hole filling method of Gaussian mixture model based on depth information assistance | |
CN116630508A (en) | 3D model processing method and device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
Effective date of registration: 20221123 Address after: 518000 2nd floor, building a, Tsinghua campus, Shenzhen University Town, Xili street, Nanshan District, Shenzhen City, Guangdong Province Patentee after: Shenzhen International Graduate School of Tsinghua University Address before: 518000 Nanshan Zhiyuan 1001, Xue Yuan Avenue, Nanshan District, Shenzhen, Guangdong. Patentee before: TSINGHUA-BERKELEY SHENZHEN INSTITUTE |