CN107578435A - Image depth prediction method and device - Google Patents
Image depth prediction method and device Download PDF Info
- Publication number
- CN107578435A (application number CN201710811182.4A)
- Authority
- CN
- China
- Prior art keywords
- network
- image
- depth
- test
- depth map
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Image Processing (AREA)
Abstract
The embodiments of the invention disclose an image depth prediction method and device. The method includes: performing depth prediction on an image to be processed based on a first neural network, generating a global depth map; performing depth prediction on at least one local image of the image to be processed based on at least one second neural network, generating at least one local depth map; and weighting the global depth map and the at least one local depth map according to preset fusion weights, generating a fused depth map. The embodiments of the invention solve the prior-art problems of low image depth prediction accuracy and complicated operation, achieving high-accuracy depth prediction for images.
Description
Technical field
The embodiments of the present invention relate to image processing technology, and in particular to an image depth prediction method and device.
Background technology
Depth prediction is a common problem in computer vision and image processing. Depth information can be used to convey 3D (three-dimensional) information and, in turn, to solve machine vision tasks such as scene understanding and object recognition.
Traditional approaches to extracting depth information generally require multiple input images, such as multi-view images, multi-view images for structure from motion, or images for photometric stereo and multi-focus imaging. Existing methods learn the correlation between 2D images and 3D images and then obtain the depth information of an image. However, real images cover a large number of different scenes, and 2D images differ widely from 3D images, so depth prediction accuracy is low and the results are poor.
Summary of the invention
The present invention provides an image depth prediction method and device to achieve high-accuracy depth prediction for images.
In a first aspect, an embodiment of the invention provides an image depth prediction method, including:
performing depth prediction on an image to be processed based on a first neural network, generating a global depth map;
performing depth prediction on at least one local image of the image to be processed based on at least one second neural network, generating at least one local depth map;
weighting the global depth map and the at least one local depth map according to preset fusion weights, generating a fused depth map.
Further, before performing depth prediction on the at least one local image of the image to be processed based on the at least one second neural network, the method also includes:
performing feature recognition on the image to be processed based on a third neural network, generating a feature point image;
determining local image regions according to the feature point image, and cropping the local image regions, generating the at least one local image.
Further, when the construction of the first neural network, the third neural network and the second neural network is completed, network parameter initialization is performed on the first neural network, the third neural network and the second neural network, and the initialized first, third and second neural networks are trained by minimizing loss functions, wherein the initialized network parameters are set according to a one-dimensional Gaussian distribution.
Further, the preset fusion weights are set in advance, and the method of setting the preset fusion weights includes:
performing depth prediction on test samples through the first neural network and the second neural network, generating a test global depth map and at least one test local depth map;
obtaining, respectively, a first test error of the region of the test global depth map corresponding to the test local depth map, and a second test error of the test local depth map;
determining the preset fusion weights according to the first test error and the second test error.
Further, the first neural network and the third neural network form a multi-task neural network. The front part of the multi-task neural network includes a first predetermined number of convolutional layers for extracting feature information of an input image; the rear part of the multi-task neural network includes a first branch and a second branch. The first branch includes a second predetermined number of deconvolution layers for generating the global depth map of the input image; the second branch includes a pooling layer and a fully connected layer for cropping the input image, generating the at least one local image, wherein a pooling layer, a normalization layer and an activation function layer are connected after each convolutional layer, and an activation function layer is connected after the fully connected layer.
Further, the second neural network includes a third predetermined number of convolutional layers and a fourth predetermined number of deconvolution layers, wherein a pooling layer, a normalization layer and an activation function layer are connected after each convolutional layer.
Further, the image to be processed is a facial image, and the local images include at least one of the following: a left-eye image, a right-eye image, a nose image and a mouth image.
In a second aspect, an embodiment of the invention further provides an image depth prediction device, including:
a global depth map generation module, configured to perform depth prediction on an image to be processed based on a first neural network, generating a global depth map;
a local depth map generation module, configured to perform depth prediction on at least one local image of the image to be processed based on at least one second neural network, generating at least one local depth map;
a fused depth map generation module, configured to weight the global depth map and the at least one local depth map according to preset fusion weights, generating a fused depth map.
Further, the device also includes:
a feature point determination module, configured to perform feature recognition on the image to be processed based on a third neural network before depth prediction is performed on the at least one local image based on the at least one second neural network, generating a feature point image;
a local image generation module, configured to determine local image regions according to the feature point image and crop the local image regions, generating the at least one local image.
Further, when the construction of the first neural network, the third neural network and the second neural network is completed, network parameter initialization is performed on the first neural network, the third neural network and the second neural network, and the initialized first, third and second neural networks are trained by minimizing loss functions, wherein the initialized network parameters are set according to a one-dimensional Gaussian distribution.
Further, the preset fusion weights are set in advance, and a weight setting module includes:
a depth map determination unit, configured to perform depth prediction on test samples through the first neural network and the second neural network, generating a test global depth map and at least one test local depth map;
a test error determination unit, configured to obtain, respectively, a first test error of the region of the test global depth map corresponding to the test local depth map, and a second test error of the test local depth map;
a weight determination unit, configured to determine the preset fusion weights according to the first test error and the second test error.
Further, the first neural network and the third neural network form a multi-task neural network. The front part of the multi-task neural network includes a first predetermined number of convolutional layers for extracting feature information of an input image; the rear part of the multi-task neural network includes a first branch and a second branch. The first branch includes a second predetermined number of deconvolution layers for generating the global depth map of the input image; the second branch includes a pooling layer and a fully connected layer for cropping the input image, generating the at least one local image, wherein a pooling layer, a normalization layer and an activation function layer are connected after each convolutional layer, and an activation function layer is connected after the fully connected layer.
Further, the second neural network includes a third predetermined number of convolutional layers and a fourth predetermined number of deconvolution layers, wherein a pooling layer, a normalization layer and an activation function layer are connected after each convolutional layer.
Further, the image to be processed is a facial image, and the local images include at least one of the following: a left-eye image, a right-eye image, a nose image and a mouth image.
In the embodiments of the present invention, depth prediction is performed on an image to be processed and on at least one local image of it through preset neural networks, generating a global depth map and local depth maps; the global depth map and the local depth maps are weighted and fused according to preset fusion weights, generating a high-accuracy fused depth map. This solves the prior-art problem of low depth prediction accuracy and achieves high-accuracy depth prediction for images.
Brief description of the drawings
Fig. 1 is a flow chart of an image depth prediction method provided by embodiment one of the present invention;
Fig. 2A is the facial image to be processed provided by embodiment one;
Fig. 2B is the global depth map generated from the image to be processed through the first neural network in embodiment one;
Fig. 2C is a schematic diagram of the feature point image generated from the image to be processed through the third neural network in embodiment one;
Fig. 2D is a schematic diagram of a multi-task neural network provided by embodiment one;
Fig. 3 is a structural diagram of an image depth prediction device provided by embodiment two of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are used only to explain the present invention, not to limit it. It should also be noted that, for convenience of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment one
Fig. 1 is a flow chart of an image depth prediction method provided by embodiment one of the present invention. This embodiment is applicable to situations where high-accuracy depth prediction is performed on images automatically. The method can be executed by an image depth prediction device provided by an embodiment of the present invention, and the device can be implemented in software and/or hardware. The method specifically includes:
S110, performing depth prediction on an image to be processed based on a first neural network, generating a global depth map.
Here, depth prediction refers to extracting depth information from the image to be processed. Depth information refers to the layering or distance information of each actual object in the image; an image with depth information has a sense of depth and dimensionality and a better visual effect.
The global depth map is an image containing the depth information of the image to be processed. The global depth map is a grayscale image in which per-pixel depth is characterized by the gray value. For example, the larger a pixel's gray value, the farther the actual object; the smaller the gray value, the nearer the actual object.
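For illustration only (not part of the claimed method), the gray-value convention just described can be sketched as a linear mapping from depth values to gray levels; the linear scaling and the 0-255 range are assumptions:

```python
def depth_to_gray(depth_rows, max_gray=255):
    # Linearly map per-pixel depth values to gray levels so that, as in
    # this embodiment, a larger gray value denotes a farther object.
    flat = [d for row in depth_rows for d in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # avoid division by zero on a flat depth map
    return [[round((d - lo) / span * max_gray) for d in row]
            for row in depth_rows]

# Nearest point maps to gray 0, farthest to gray 255.
gray = depth_to_gray([[0.5, 1.0], [2.0, 4.5]])
```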
In this embodiment, before depth prediction is performed, the image to be processed is preprocessed. Preprocessing includes enlarging, shrinking or segmenting the image, where segmentation refers to deleting the background of the image to be processed, simplifying the input to the first neural network. Optionally, the resolution of the image input to the first neural network is fixed; for example, the resolution of the image to be processed can be 384x384. In this embodiment, preprocessing simplifies the image and unifies its size, which helps the first neural network quickly extract useful feature information and avoids interference from background information.
In this embodiment, the first neural network is obtained by training in advance. The first neural network can include convolutional layers, deconvolution layers and pooling layers, and a pooling layer, an activation function layer and a batch normalization (BN) layer can be connected after each convolutional layer, where the order of the pooling, activation function and normalization layers is not limited. For example, there can be 4 convolutional layers and 5 deconvolution layers, with the 5 deconvolution layers connected after the 4 convolutional layers; the activation function can be, for example, a ReLU, PReLU or RReLU function.
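As a rough illustration of how the 4-conv / 5-deconv layout described above can preserve the 384x384 input resolution, the spatial size can be traced through the stages; the individual strides below are assumptions, not values given in the embodiment:

```python
def trace_resolution(size, pool_strides, deconv_strides):
    # Each conv+pool stage divides the spatial size by its pooling
    # stride; each deconvolution stage multiplies it by its stride.
    for s in pool_strides:
        size //= s
    for s in deconv_strides:
        size *= s
    return size

# 384x384 input, 4 conv+pool stages, 5 deconv stages (strides assumed):
# 384 -> 24 through the encoder, 24 -> 384 through the decoder.
out_size = trace_resolution(384, [2, 2, 2, 2], [2, 2, 2, 2, 1])
```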
S120, performing depth prediction on at least one local image of the image to be processed based on at least one second neural network, generating at least one local depth map.
In this embodiment, the topology of a second neural network can be the same as or different from that of the first neural network.
Optionally, there can be multiple second neural networks; the multiple second neural networks can have the same topology but different network parameters, obtained from different training samples. Different local images correspond to different second neural networks.
In this embodiment, the multiple local images can be generated by manually determining and cropping local image regions, or by automatically identifying the local image regions of the image to be processed and cropping them according to the recognition result. Optionally, the image to be processed is cropped based on a third neural network, generating the at least one local image.
Here, cropping refers to identifying the key information in the image to be processed and intercepting the regions where the key information is located, forming local images; a local image is an image containing local information of the image to be processed. For example, if the image to be processed is a person image, the key information can be limb information; if the image to be processed is a facial image, the key information can be facial feature information.
In this embodiment, the third neural network is obtained by training in advance. The third neural network can include convolutional layers, pooling layers and a fully connected layer, and an activation function layer and a batch normalization (BN) layer can be connected after each convolutional layer. For example, there can be 4 convolutional layers, 1 fully connected layer and 1 pooling layer, with the pooling layer and the fully connected layer connected in turn after the 4 convolutional layers; the activation function can be, for example, a ReLU, PReLU or RReLU function.
Optionally, cropping the image to be processed based on the third neural network to generate the at least one local image includes: performing feature recognition on the image to be processed based on the third neural network, generating a feature point image; determining local image regions according to the feature point image, and cropping the local image regions, generating the at least one local image.
In this embodiment, the key information in the image to be processed is characterized by the feature points in the feature point image. For example, multiple feature points can form the outline of the key information, or multiple feature points can cover the key information; the region enclosed by the lines connecting the feature points, or the region covered by the feature points, is determined as a local image region, and the local image region is cropped to generate a local image. Optionally, the feature points corresponding to different key information can differ; for example, different key information can be marked with feature points of different colors or shapes.
Optionally, the image to be processed is a facial image, and the local images include at least one of the following: a left-eye image, a right-eye image, a nose image and a mouth image. Optionally, the local images can also include a left-eyebrow image and a right-eyebrow image.
For example, referring to Figs. 2A, 2B and 2C: Fig. 2A is the facial image to be processed provided by embodiment one, where the facial image has been obtained through preprocessing; Fig. 2B is the global depth map generated from the image to be processed through the first neural network; Fig. 2C is a schematic diagram of the feature point image generated through the third neural network, where the dots in Fig. 2C are the feature points, located in the eye, nose, mouth and eyebrow regions. According to these feature points, the image to be processed can be cropped to generate multiple local images.
Optionally, before a local image is input to a second neural network, its resolution is adjusted to a preset resolution; for example, the preset resolution can be 384x384. Correspondingly, the resolution of the local depth map obtained by the depth prediction of the second neural network is restored to the initial resolution of the local image. In this embodiment, increasing the resolution of the local images helps improve the precision of the local depth maps.
Optionally, a second neural network includes a third predetermined number of convolutional layers and a fourth predetermined number of deconvolution layers, where a pooling layer, a normalization layer and an activation function layer are connected after each convolutional layer and the order of the pooling, activation function and normalization layers is not limited. For example, the third predetermined number can be 4 and the fourth predetermined number can be 5; the activation function can be, for example, a ReLU, PReLU or RReLU function.
It should be noted that steps S110 and S120 can be performed simultaneously; there is no required order between them.
S130, weighting the global depth map and the at least one local depth map according to preset fusion weights, generating a fused depth map.
In this embodiment, the global depth map and the local depth maps characterize depth information by gray value, and the same gray value corresponds to the same depth value. Weighting the global depth map and the at least one local depth map means weighting the gray value of each pixel of a local depth map with the corresponding gray value of the global depth map according to the preset fusion weights, determining the fused depth map. The preset fusion weights corresponding to different local images can be the same or different. For example, the left-eye local depth map is weighted pixel by pixel with the left-eye region of the global depth map, and the mouth local depth map is weighted pixel by pixel with the mouth region of the global depth map. Optionally, the preset fusion weights corresponding to these two local maps can be the same or different.
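The per-pixel weighted fusion described above can be sketched as follows; this is an illustrative sketch (the region offset and a single scalar weight per local map are assumptions), not the claimed implementation:

```python
def fuse_region(global_map, local_map, top, left, w_local):
    # Blend a local depth map into the matching region of the global
    # depth map with a preset fusion weight: w_local for the local map,
    # 1 - w_local for the global map. Pixels outside the region keep
    # the global prediction unchanged.
    fused = [row[:] for row in global_map]
    for i, row in enumerate(local_map):
        for j, d in enumerate(row):
            g = fused[top + i][left + j]
            fused[top + i][left + j] = w_local * d + (1 - w_local) * g
    return fused

g = [[10.0] * 4 for _ in range(4)]     # global prediction, depth 10
l = [[4.0, 4.0], [4.0, 4.0]]           # local prediction, depth 4
f = fuse_region(g, l, 1, 1, 2 / 3)     # local map weighted 2/3
```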
It should be noted that the precision of a local depth map is higher than the precision of the corresponding region of the global depth map. In this embodiment, by obtaining high-precision local depth maps and fusing them with the global depth map based on the preset fusion weights, a high-precision fused depth map is generated, solving the prior-art problem of low depth prediction accuracy; at the same time, depth prediction based on neural networks is an end-to-end process and is simple to operate.
Optionally, three-dimensional image reconstruction is performed according to the fused depth map, which can be applied to video conferencing, video telephony, virtual games, face recognition, or film and animation production, helping improve the definition of subsequent image or video production.
Optionally, the preset fusion weights are set in advance, and the method of setting them includes: performing depth prediction on test samples through the first neural network and the second neural networks, generating a test global depth map and at least one test local depth map; obtaining, respectively, a first test error of the region of the test global depth map corresponding to each test local depth map and a second test error of that test local depth map; and determining the preset fusion weights according to the first test error and the second test error.
In this embodiment, on the basis that the training of the first neural network and the second neural networks is complete, the test samples are input to the first neural network to obtain the test global depth map, and the local images of the test samples are input to their corresponding second neural networks to generate the corresponding test local depth maps. The first test error of the test global depth map is determined based on a standard global depth map; for example, the error of each pixel is determined by the difference between the gray values of corresponding pixels in the standard global depth map and the test global depth map, and the mean of the pixel errors is taken as the first test error. Optionally, the first test error is determined for the regions of the test global depth map corresponding to the local images. Similarly, the second test error of each test local depth map is determined based on a standard local depth map.
Optionally, the respective weights of the global depth map and the local depth maps are inversely proportional to their corresponding test errors. Taking the left-eye image as an example of how the preset fusion weights are set: the second test error and the first test error corresponding to the left-eye image determine the second depth error and the first depth error. For example, if the second depth error is 0.1 mm and the first depth error is 0.2 mm, the preset fusion weights of the corresponding region of the global depth map and of the left-eye depth map can be 1/3 and 2/3 respectively. Optionally, different fusion weights can be set for different regions of the global depth map.
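The inverse-proportionality rule above can be sketched by normalizing the reciprocals of the two test errors; normalizing reciprocals is one simple realization (an assumption for illustration, not mandated by the text), and the 0.1 mm / 0.2 mm values come from the example:

```python
def fusion_weights(err_global, err_local):
    # Weights inversely proportional to each map's test error,
    # normalized so they sum to 1.
    inv_g, inv_l = 1.0 / err_global, 1.0 / err_local
    total = inv_g + inv_l
    return inv_g / total, inv_l / total

# Global-region error 0.2 mm, left-eye local error 0.1 mm
# -> weights 1/3 (global) and 2/3 (local).
w_global, w_local = fusion_weights(0.2, 0.1)
```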
The preset fusion weights are determined through a large number of test samples; for example, the number of test samples can be 1900.
In this embodiment, by determining the preset fusion weights according to the test errors of the global depth map and the local depth maps, the weight of a high-precision depth map is increased while the weight of a low-precision depth map is reduced, further improving the depth prediction accuracy of the fused depth map.
In the technical scheme of this embodiment, depth prediction is performed on the image to be processed and on at least one local image of it through preset neural networks, generating a global depth map and local depth maps; the global depth map and the local depth maps are weighted and fused according to preset fusion weights, generating a high-accuracy fused depth map, which solves the prior-art problem of low depth prediction accuracy and achieves high-accuracy depth prediction for images.
On the basis of the above embodiment, before depth prediction is performed on the image to be processed, the first neural network, the third neural network and the second neural networks are established and trained. When the construction of the first, third and second neural networks is completed, network parameter initialization is performed on them, and the initialized first, third and second neural networks are trained by minimizing loss functions, where the initialized network parameters are set according to a one-dimensional Gaussian distribution.
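A minimal sketch of drawing initial network parameters from a one-dimensional Gaussian rather than a uniform random draw; the mean, standard deviation and parameter count are illustrative assumptions:

```python
import random

def init_params(n, mean=0.0, std=0.01, seed=0):
    # Draw n initial network parameters from a one-dimensional
    # Gaussian distribution (mean/std values are illustrative).
    rng = random.Random(seed)
    return [rng.gauss(mean, std) for _ in range(n)]

params = init_params(1000)
```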
Network parameter initialization refers to setting the initial network parameters of a neural network. In this embodiment, the initial network parameters of each neural network are set according to a one-dimensional Gaussian distribution instead of the random initialization of the prior art, which helps improve the training efficiency of the neural networks and avoids the slow convergence or non-convergence caused by random initialization.
Optionally, the training method of the first neural network includes: performing depth prediction on a first sample image through the first neural network to be trained, generating a first training image; generating a first loss function according to the first training image and the standard global depth map corresponding to the first sample image; and adjusting the network parameters of the first neural network to be trained according to the first loss function.
In this embodiment, the first sample images can be, for example, a large number of facial images; for example, 2500 color facial images, of which 1100 are male facial images and 1400 are female facial images. Optionally, the first sample images are unified in size.
In this embodiment, the standard global depth map can be set in advance, or it can be extracted during the training of the first neural network. For example, the depth information of a training image is extracted by a preset depth information extraction model; the preset model can be, for example, an HourGlass model obtained in advance. The first loss function characterizes the degree of inconsistency between the feature information of the training image generated by the network and the standard feature information of the standard global depth map; the smaller the value of the first loss function, generally the better the robustness of the first neural network. For example, the first loss function can take the form of mean squared error (MSE).
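A minimal sketch of the mean-squared-error form suggested for the first loss function, comparing a predicted depth map against the standard (ground-truth) depth map; the toy 2x2 maps are illustrative:

```python
def mse_loss(pred, target):
    # Mean squared error between a predicted depth map and the
    # standard depth map, averaged over all pixels.
    n = len(pred) * len(pred[0])
    return sum((p - t) ** 2
               for pr, tr in zip(pred, target)
               for p, t in zip(pr, tr)) / n

# One pixel off by 2 out of 4 pixels -> loss 4/4 = 1.0.
loss = mse_loss([[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 6.0]])
```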
In this embodiment, the gradient of the first loss function is back-propagated, and the network parameters of the first neural network are adjusted according to the first loss function. Optionally, the network parameters include, but are not limited to, weights and offsets.
Optionally, the training method of a second neural network includes: performing depth prediction on a second sample image through the second neural network to be trained, generating a second training image; generating a second loss function according to the second training image and the standard local depth map corresponding to the second sample image; and adjusting the network parameters of the second neural network to be trained according to the second loss function.
In this embodiment, the second sample images are set so that they match the first sample images. For example, if a second neural network is used to perform depth prediction on left-eye images, its second sample images are the left-eye local images corresponding to the first sample images. The standard local depth map can be set in advance, or it can be extracted during the training of the second neural network. The second loss function characterizes the degree of inconsistency between the feature information of the second training image generated by the network and the standard feature information of the standard local depth map; the smaller its value, generally the better the robustness of the second neural network. For example, the second loss function can be determined in the form of mean squared error.
In the present embodiment, the second loss function is backpropagated by gradient, and the network parameters of the second neural network are adjusted according to the second loss function. Optionally, the network parameters include, but are not limited to, weights and bias values.
Optionally, the training method of the third neural network includes: performing cropping processing on a third sample image through the third neural network to be trained to generate at least one training local image; obtaining the boundary coordinate information of the training local image and of the corresponding standard local image, respectively; determining a third loss function according to the boundary coordinate information of the training local image and the boundary coordinate information of the standard local image; and adjusting the network parameters of the third neural network according to the third loss function.
In the present embodiment, the third sample image may be identical to the first sample image, thereby reducing the number of samples to be collected. The third loss function characterizes the degree of inconsistency between the boundary coordinate information of the training local image generated by the neural network and the boundary coordinate information of the standard local image; exemplarily, the average error of the boundary pixel points may be determined as the value of the third loss function. The third loss function is backpropagated by gradient, and the network parameters of the third neural network are adjusted according to the third loss function. Optionally, the network parameters include, but are not limited to, weights and bias values.
Optionally, the first neural network and the third neural network form a multitask neural network. Exemplarily, referring to Fig. 2D, Fig. 2D is a schematic diagram of a multitask neural network provided in Embodiment 1 of the present invention. The front part of the multitask neural network includes a first predetermined number of convolutional layers for extracting the feature information of the input image; the rear part of the multitask neural network includes a first branch and a second branch. The first branch includes a second predetermined number of deconvolution layers for generating the global depth map of the input image; the second branch includes a pooling layer and a fully connected layer for performing cropping processing on the input image to generate at least one local image, where a pooling layer, a normalization layer and an activation function layer are connected after the convolutional layers, and an activation function layer is connected after the fully connected layer.
In the present embodiment, the multitask neural network performs depth prediction processing and cropping processing on the image to be processed simultaneously, generating the global depth map and at least one local image. Multiple tasks are thus completed at once, instead of each neural network completing only a single task, which simplifies the training process of the neural networks.
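The shared trunk with two branches could be sketched in PyTorch roughly as follows; the layer counts, channel widths and number of cropped regions are illustrative assumptions, not the patent's "predetermined numbers":

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Sketch of the multitask network: a shared convolutional trunk, a
    deconvolution branch emitting a global depth map, and a pooling +
    fully connected branch emitting crop boxes for local images."""
    def __init__(self, num_regions=4):
        super().__init__()
        # Front part: convolutional layers (with normalization and
        # activation function layers after each) extracting features.
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1),
            nn.BatchNorm2d(16), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1),
            nn.BatchNorm2d(32), nn.ReLU(),
        )
        # First branch: deconvolution layers for the global depth map.
        self.depth_branch = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
        )
        # Second branch: pooling + fully connected layer, one assumed
        # (x1, y1, x2, y2) crop box per region, then an activation layer.
        self.crop_branch = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 4 * num_regions), nn.Sigmoid(),
        )

    def forward(self, x):
        feats = self.trunk(x)
        return self.depth_branch(feats), self.crop_branch(feats)

net = MultiTaskNet()
depth, boxes = net(torch.randn(1, 3, 64, 64))
print(depth.shape, boxes.shape)  # (1, 1, 64, 64) and (1, 16)
```

Both branches are trained from the same trunk features, so the first and third loss functions can be backpropagated through shared parameters.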
Embodiment 2
Fig. 3 is a schematic structural diagram of an image depth prediction apparatus provided in Embodiment 2 of the present invention. The apparatus specifically includes:
a global depth map generation module 210, configured to perform depth prediction processing on an image to be processed based on a first neural network to generate a global depth map;
a partial depth map generation module 220, configured to perform depth prediction processing on at least one local image of the image to be processed based on at least one second neural network to generate at least one partial depth map; and
a fused depth map generation module 230, configured to perform weighting processing on the global depth map and the at least one partial depth map according to a preset fusion weight to generate a fused depth map.
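The weighting performed by the fused depth map generation module can be illustrated with a small NumPy sketch; the region format (top, left, height, width) and the per-region blending rule are assumptions for illustration:

```python
import numpy as np

def fuse_depth(global_depth, local_depth, region, weight):
    """Blend a partial depth map into the corresponding region of the
    global depth map using the preset fusion weight."""
    fused = np.asarray(global_depth, dtype=float).copy()
    top, left, h, w = region
    fused[top:top + h, left:left + w] = (
        weight * np.asarray(local_depth, dtype=float)
        + (1.0 - weight) * fused[top:top + h, left:left + w]
    )
    return fused

global_depth = np.ones((4, 4))
local_depth = np.full((2, 2), 3.0)   # e.g. a left-eye partial depth map
fused = fuse_depth(global_depth, local_depth, (1, 1, 2, 2), weight=0.5)
print(fused[1, 1])  # 0.5 * 3.0 + 0.5 * 1.0 = 2.0
```

Pixels outside all local regions keep the global prediction unchanged.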
Optionally, the apparatus further includes:
a feature point determining module, configured to perform feature recognition on the image to be processed based on a third neural network to generate a feature point image, before depth prediction processing is performed on the at least one local image of the image to be processed based on the at least one second neural network; and
a local image generation module, configured to determine local image regions according to the feature point image and perform cropping processing on the local image regions to generate at least one local image, where different image regions correspond to different feature point images.
Optionally, when the construction of the first neural network, the third neural network and the second neural network is completed, network parameter initialization is performed on the first neural network, the third neural network and the second neural network, and the initialized first neural network, third neural network and second neural network are trained in a minimum-loss-function manner, where the initialized network parameters are set according to a one-dimensional Gaussian distribution.
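A sketch of the Gaussian parameter initialization described above (the zero mean and the standard deviation value are assumptions; the patent only specifies a one-dimensional Gaussian distribution):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def init_param(shape, std=0.01):
    """Draw initial network parameters (weights or biases) from a
    zero-mean one-dimensional Gaussian with an assumed std of 0.01."""
    return rng.normal(loc=0.0, scale=std, size=shape)

w = init_param((3, 3))   # a small weight matrix
b = init_param((3,))     # a bias vector
print(w.shape, b.shape)  # (3, 3) (3,)
```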
Optionally, the preset fusion weight is set in advance, and a weight setting module includes:
a depth map determining unit, configured to process a test sample through the first neural network and the second neural network to generate a test global depth map and at least one test partial depth map;
a test error determining unit, configured to obtain a first test error of the test partial depth map and a second test error of the region of the test global depth map corresponding to the test partial depth map, respectively; and
a weight determining unit, configured to determine the preset fusion weight according to the first test error and the second test error.
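The patent does not fix how the two test errors map to a weight; one plausible rule, shown purely as an assumption, gives each map a weight proportional to the inverse of its test error:

```python
def fusion_weight(local_error, global_error):
    """Assumed rule: weight the partial depth map by the inverse of its
    test error, normalized against the global map's inverse error."""
    inv_local = 1.0 / local_error
    inv_global = 1.0 / global_error
    return inv_local / (inv_local + inv_global)

# Local predictions twice as accurate -> local map gets 2/3 of the weight.
w = fusion_weight(local_error=0.1, global_error=0.2)
print(round(w, 4))  # 0.6667
```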
Optionally, the first neural network and the third neural network form a multitask neural network. The front part of the multitask neural network includes a first predetermined number of convolutional layers for extracting the feature information of the input image;
the rear part of the multitask neural network includes a first branch and a second branch. The first branch includes a second predetermined number of deconvolution layers for generating the global depth map of the input image; the second branch includes a pooling layer and a fully connected layer for performing cropping processing on the input image to generate at least one local image, where a pooling layer, a normalization layer and an activation function layer are connected after the convolutional layers, and an activation function layer is connected after the fully connected layer.
Optionally, the second neural network includes a third predetermined number of convolutional layers and a fourth predetermined number of deconvolution layers, where a pooling layer, a normalization layer and an activation function layer are connected after the convolutional layers.
Optionally, the image to be processed is a facial image, and the local image includes at least one of the following: a left-eye image, a right-eye image, a nose image and a mouth image.
The image depth prediction apparatus provided in the embodiments of the present invention can perform the image depth prediction method provided in any embodiment of the present invention, and has the corresponding functional modules and beneficial effects for performing the image depth prediction method.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will appreciate that the invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, the present invention is not limited to the above embodiments; it may also include other equivalent embodiments without departing from the inventive concept, and the scope of the present invention is determined by the scope of the appended claims.
Claims (14)
- 1. An image depth prediction method, characterized by comprising: performing depth prediction processing on an image to be processed based on a first neural network to generate a global depth map; performing depth prediction processing on at least one local image of the image to be processed based on at least one second neural network to generate at least one partial depth map; and performing weighting processing on the global depth map and the at least one partial depth map according to a preset fusion weight to generate a fused depth map.
- 2. The method according to claim 1, characterized in that, before the depth prediction processing is performed on the at least one local image of the image to be processed based on the at least one second neural network, the method further comprises: performing feature recognition on the image to be processed based on a third neural network to generate a feature point image; and determining a local image region according to the feature point image and performing cropping processing on the local image region to generate at least one local image.
- 3. The method according to claim 2, characterized in that, when the construction of the first neural network, the third neural network and the second neural network is completed, network parameter initialization is performed on the first neural network, the third neural network and the second neural network, and the initialized first neural network, third neural network and second neural network are trained in a minimum-loss-function manner, wherein the initialized network parameters are set according to a one-dimensional Gaussian distribution.
- 4. The method according to claim 1, characterized in that the preset fusion weight is set in advance, and the setting method of the preset fusion weight comprises: performing depth prediction processing on a test sample based on the first neural network and the second neural network to generate a test global depth map and at least one test partial depth map; obtaining a first test error of the test partial depth map and a second test error of the region of the test global depth map corresponding to the test partial depth map, respectively; and determining the preset fusion weight according to the first test error and the second test error.
- 5. The method according to claim 2, characterized in that the first neural network and the third neural network form a multitask neural network, the front part of the multitask neural network comprising a first predetermined number of convolutional layers for extracting feature information of an input image; the rear part of the multitask neural network comprises a first branch and a second branch, the first branch comprising a second predetermined number of deconvolution layers for generating the global depth map of the input image, and the second branch comprising a pooling layer and a fully connected layer for performing cropping processing on the input image to generate at least one local image, wherein a pooling layer, a normalization layer and an activation function layer are connected after the convolutional layers, and an activation function layer is connected after the fully connected layer.
- 6. The method according to claim 1, characterized in that the second neural network comprises a third predetermined number of convolutional layers and a fourth predetermined number of deconvolution layers, wherein a pooling layer, a normalization layer and an activation function layer are connected after the convolutional layers.
- 7. The method according to any one of claims 1-6, characterized in that the image to be processed is a facial image, and the local image comprises at least one of the following: a left-eye image, a right-eye image, a nose image and a mouth image.
- 8. An image depth prediction apparatus, characterized by comprising: a global depth map generation module, configured to perform depth prediction processing on an image to be processed based on a first neural network to generate a global depth map; a partial depth map generation module, configured to perform depth prediction processing on at least one local image of the image to be processed based on at least one second neural network to generate at least one partial depth map; and a fused depth map generation module, configured to perform weighting processing on the global depth map and the at least one partial depth map according to a preset fusion weight to generate a fused depth map.
- 9. The apparatus according to claim 8, characterized in that the apparatus further comprises: a feature point determining module, configured to perform feature recognition on the image to be processed based on a third neural network to generate a feature point image, before depth prediction processing is performed on the at least one local image of the image to be processed based on the at least one second neural network; and a local image generation module, configured to determine a local image region according to the feature point image and perform cropping processing on the local image region to generate at least one local image.
- 10. The apparatus according to claim 9, characterized in that, when the construction of the first neural network, the third neural network and the second neural network is completed, network parameter initialization is performed on the first neural network, the third neural network and the second neural network, and the initialized first neural network, third neural network and second neural network are trained in a minimum-loss-function manner, wherein the initialized network parameters are set according to a one-dimensional Gaussian distribution.
- 11. The apparatus according to claim 8, characterized in that the preset fusion weight is set in advance, and a weight setting module comprises: a depth map determining unit, configured to perform depth prediction processing on a test sample through the first neural network and the second neural network to generate a test global depth map and at least one test partial depth map; a test error determining unit, configured to obtain a first test error of the test partial depth map and a second test error of the region of the test global depth map corresponding to the test partial depth map, respectively; and a weight determining unit, configured to determine the preset fusion weight according to the first test error and the second test error.
- 12. The apparatus according to claim 9, characterized in that the first neural network and the third neural network form a multitask neural network, the front part of the multitask neural network comprising a first predetermined number of convolutional layers for extracting feature information of an input image; the rear part of the multitask neural network comprises a first branch and a second branch, the first branch comprising a second predetermined number of deconvolution layers for generating the global depth map of the input image, and the second branch comprising a pooling layer and a fully connected layer for performing cropping processing on the input image to generate at least one local image, wherein a pooling layer, a normalization layer and an activation function layer are connected after the convolutional layers, and an activation function layer is connected after the fully connected layer.
- 13. The apparatus according to claim 8, characterized in that the second neural network comprises a third predetermined number of convolutional layers and a fourth predetermined number of deconvolution layers, wherein a pooling layer, a normalization layer and an activation function layer are connected after the convolutional layers.
- 14. The apparatus according to any one of claims 8-13, characterized in that the image to be processed is a facial image, and the local image comprises at least one of the following: a left-eye image, a right-eye image, a nose image and a mouth image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710811182.4A CN107578435B (en) | 2017-09-11 | 2017-09-11 | A kind of picture depth prediction technique and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710811182.4A CN107578435B (en) | 2017-09-11 | 2017-09-11 | A kind of picture depth prediction technique and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107578435A true CN107578435A (en) | 2018-01-12 |
CN107578435B CN107578435B (en) | 2019-11-29 |
Family
ID=61033100
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710811182.4A Active CN107578435B (en) | 2017-09-11 | 2017-09-11 | A kind of picture depth prediction technique and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107578435B (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109191514A (en) * | 2018-10-23 | 2019-01-11 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating depth detection model |
CN109829886A (en) * | 2018-12-25 | 2019-05-31 | 苏州江奥光电科技有限公司 | A kind of pcb board defect inspection method based on depth information |
WO2019149206A1 (en) * | 2018-02-01 | 2019-08-08 | 深圳市商汤科技有限公司 | Depth estimation method and apparatus, electronic device, program, and medium |
CN110309706A (en) * | 2019-05-06 | 2019-10-08 | 深圳市华付信息技术有限公司 | Face critical point detection method, apparatus, computer equipment and storage medium |
CN110363296A (en) * | 2019-06-28 | 2019-10-22 | 腾讯科技(深圳)有限公司 | Task model acquisition methods and device, storage medium and electronic device |
CN111414923A (en) * | 2020-03-05 | 2020-07-14 | 南昌航空大学 | Indoor scene three-dimensional reconstruction method and system based on single RGB image |
CN111428859A (en) * | 2020-03-05 | 2020-07-17 | 北京三快在线科技有限公司 | Depth estimation network training method and device for automatic driving scene and autonomous vehicle |
CN112488104A (en) * | 2020-11-30 | 2021-03-12 | 华为技术有限公司 | Depth and confidence estimation system |
CN116721143A (en) * | 2023-08-04 | 2023-09-08 | 南京诺源医疗器械有限公司 | Depth information processing device and method for 3D medical image |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103177440A (en) * | 2012-12-20 | 2013-06-26 | 香港应用科技研究院有限公司 | System and method of generating image depth map |
CN106204522A (en) * | 2015-05-28 | 2016-12-07 | 奥多比公司 | The combined depth of single image is estimated and semantic tagger |
CN107038429A (en) * | 2017-05-03 | 2017-08-11 | 四川云图睿视科技有限公司 | A kind of multitask cascade face alignment method based on deep learning |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103177440A (en) * | 2012-12-20 | 2013-06-26 | 香港应用科技研究院有限公司 | System and method of generating image depth map |
CN106204522A (en) * | 2015-05-28 | 2016-12-07 | 奥多比公司 | The combined depth of single image is estimated and semantic tagger |
CN107038429A (en) * | 2017-05-03 | 2017-08-11 | 四川云图睿视科技有限公司 | A kind of multitask cascade face alignment method based on deep learning |
Non-Patent Citations (2)
Title |
---|
DAVID EIGEN 等: "Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture", 《PROC.IEEE ICCV》 * |
JOSE M. FACIL 等: "Single-View and Multi-View Depth Fusion", 《IEEE ROBOTICS AND AUTOMATION LETTERS》 * |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20200049833A (en) * | 2018-02-01 | 2020-05-08 | 선전 센스타임 테크놀로지 컴퍼니 리미티드 | Depth estimation methods and apparatus, electronic devices, programs and media |
US11308638B2 (en) | 2018-02-01 | 2022-04-19 | Shenzhen Sensetime Technology Co., Ltd. | Depth estimation method and apparatus, electronic device, program, and medium |
WO2019149206A1 (en) * | 2018-02-01 | 2019-08-08 | 深圳市商汤科技有限公司 | Depth estimation method and apparatus, electronic device, program, and medium |
KR102295403B1 (en) | 2018-02-01 | 2021-08-31 | 선전 센스타임 테크놀로지 컴퍼니 리미티드 | Depth estimation method and apparatus, electronic device, program and medium |
CN109191514B (en) * | 2018-10-23 | 2020-11-24 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating a depth detection model |
CN109191514A (en) * | 2018-10-23 | 2019-01-11 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating depth detection model |
CN109829886A (en) * | 2018-12-25 | 2019-05-31 | 苏州江奥光电科技有限公司 | A kind of pcb board defect inspection method based on depth information |
CN110309706A (en) * | 2019-05-06 | 2019-10-08 | 深圳市华付信息技术有限公司 | Face critical point detection method, apparatus, computer equipment and storage medium |
CN110363296A (en) * | 2019-06-28 | 2019-10-22 | 腾讯科技(深圳)有限公司 | Task model acquisition methods and device, storage medium and electronic device |
CN110363296B (en) * | 2019-06-28 | 2022-02-08 | 腾讯医疗健康(深圳)有限公司 | Task model obtaining method and device, storage medium and electronic device |
CN111414923A (en) * | 2020-03-05 | 2020-07-14 | 南昌航空大学 | Indoor scene three-dimensional reconstruction method and system based on single RGB image |
CN111428859A (en) * | 2020-03-05 | 2020-07-17 | 北京三快在线科技有限公司 | Depth estimation network training method and device for automatic driving scene and autonomous vehicle |
CN111414923B (en) * | 2020-03-05 | 2022-07-12 | 南昌航空大学 | Indoor scene three-dimensional reconstruction method and system based on single RGB image |
CN112488104A (en) * | 2020-11-30 | 2021-03-12 | 华为技术有限公司 | Depth and confidence estimation system |
CN112488104B (en) * | 2020-11-30 | 2024-04-09 | 华为技术有限公司 | Depth and confidence estimation system |
CN116721143A (en) * | 2023-08-04 | 2023-09-08 | 南京诺源医疗器械有限公司 | Depth information processing device and method for 3D medical image |
CN116721143B (en) * | 2023-08-04 | 2023-10-20 | 南京诺源医疗器械有限公司 | Depth information processing device and method for 3D medical image |
Also Published As
Publication number | Publication date |
---|---|
CN107578435B (en) | 2019-11-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107578435B (en) | A kind of picture depth prediction technique and device | |
CN109255831B (en) | Single-view face three-dimensional reconstruction and texture generation method based on multi-task learning | |
CN113012293B (en) | Stone carving model construction method, device, equipment and storage medium | |
CN105427385B (en) | A kind of high-fidelity face three-dimensional rebuilding method based on multilayer deformation model | |
CN105404392B (en) | Virtual method of wearing and system based on monocular cam | |
CN110084304B (en) | Target detection method based on synthetic data set | |
CN108495110A (en) | A kind of virtual visual point image generating method fighting network based on production | |
WO2015188684A1 (en) | Three-dimensional model reconstruction method and system | |
CN109978984A (en) | Face three-dimensional rebuilding method and terminal device | |
CN107154032B (en) | A kind of image processing method and device | |
CN110223377A (en) | One kind being based on stereo visual system high accuracy three-dimensional method for reconstructing | |
CN110148217A (en) | A kind of real-time three-dimensional method for reconstructing, device and equipment | |
JP2008535116A (en) | Method and apparatus for three-dimensional rendering | |
CN110197462A (en) | A kind of facial image beautifies in real time and texture synthesis method | |
CN104809638A (en) | Virtual glasses trying method and system based on mobile terminal | |
CN110246209B (en) | Image processing method and device | |
CN110189202A (en) | A kind of three-dimensional virtual fitting method and system | |
CN116109798A (en) | Image data processing method, device, equipment and medium | |
CN108520510B (en) | No-reference stereo image quality evaluation method based on overall and local analysis | |
CN104599317A (en) | Mobile terminal and method for achieving 3D (three-dimensional) scanning modeling function | |
CN110517306A (en) | A kind of method and system of the binocular depth vision estimation based on deep learning | |
CN107578469A (en) | A kind of 3D human body modeling methods and device based on single photo | |
CN109218706A (en) | A method of 3 D visual image is generated by single image | |
CN107469355A (en) | Game image creation method and device, terminal device | |
CN113144613B (en) | Model-based method for generating volume cloud |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right |
Effective date of registration: 20221123 Address after: 518000 2nd floor, building a, Tsinghua campus, Shenzhen University Town, Xili street, Nanshan District, Shenzhen City, Guangdong Province Patentee after: Tsinghua Shenzhen International Graduate School Address before: 518000 Nanshan Zhiyuan 1001, Xue Yuan Avenue, Nanshan District, Shenzhen, Guangdong. Patentee before: TSINGHUA-BERKELEY SHENZHEN INSTITUTE |