CN107578435B - Image depth prediction method and device - Google Patents

Image depth prediction method and device

Info

Publication number
CN107578435B
CN107578435B (application CN201710811182.4A)
Authority
CN
China
Prior art keywords
network, image, depth, test, layer
Prior art date
Legal status: Active
Application number
CN201710811182.4A
Other languages
Chinese (zh)
Other versions
CN107578435A (en)
Inventor
戴琼海 (Dai Qionghai)
刘侃 (Liu Kan)
方璐 (Fang Lu)
王好谦 (Wang Haoqian)
Current Assignee
Shenzhen International Graduate School of Tsinghua University
Original Assignee
Tsinghua Berkeley Shenzhen College Preparatory Office
Priority date
Filing date
Publication date
Application filed by Tsinghua Berkeley Shenzhen College Preparatory Office
Priority to CN201710811182.4A
Publication of CN107578435A
Application granted granted Critical
Publication of CN107578435B


Abstract

Embodiments of the invention disclose an image depth prediction method and device. The method includes: performing depth prediction processing on an image to be processed based on a first neural network to generate a global depth map; performing depth prediction processing on at least one local image of the image to be processed based on at least one second neural network to generate at least one local depth map; and weighting the global depth map and the at least one local depth map according to preset fusion weights to generate a fused depth map. The embodiments of the invention solve the problems of low image depth prediction accuracy and complicated operation in the prior art, and achieve high-accuracy depth prediction for images.

Description

Image depth prediction method and device
Technical field
Embodiments of the present invention relate to image processing technology, and in particular to an image depth prediction method and device.
Background art
Depth prediction is a common problem in the fields of computer vision and image processing. Depth information can convey 3D (three-dimensional) structure and further supports machine vision tasks such as scene understanding and object recognition.
Traditional approaches to depth information extraction usually require multiple input images, such as multi-view images, structure-from-motion image sequences, photometric-stereo images and multi-focus images. Existing methods learn the correlation between 2D images and 3D images and then derive the predicted depth information of an image. However, real images cover a large number of different scenes, and the differences between 2D images and 3D images are wide, resulting in low depth prediction accuracy and poor performance.
Summary of the invention
The present invention provides an image depth prediction method and device to achieve high-accuracy depth prediction for images.
In a first aspect, an embodiment of the present invention provides an image depth prediction method, the method comprising:
performing depth prediction processing on an image to be processed based on a first neural network to generate a global depth map;
performing depth prediction processing on at least one local image of the image to be processed based on at least one second neural network to generate at least one local depth map;
weighting the global depth map and the at least one local depth map according to preset fusion weights to generate a fused depth map.
Further, before the depth prediction processing is performed on at least one local image of the image to be processed based on at least one second neural network, the method further includes:
performing feature recognition on the image to be processed based on a third neural network to generate a feature point image;
determining a local image region according to the feature point image, and cropping the local image region to generate at least one local image.
Further, when the construction of the first neural network, the third neural network and the second neural network is completed, network parameter initialization is performed on the first neural network, the third neural network and the second neural network, and the initialized first neural network, third neural network and second neural network are trained by minimizing a loss function, wherein the initialized network parameters are set according to a one-dimensional Gaussian distribution.
Further, the preset fusion weights are set in advance, and the method for setting the preset fusion weights includes:
performing depth prediction processing on a test sample through the first neural network and the second neural network to generate a test global depth map and at least one test local depth map;
respectively obtaining a first test error of the test local depth map and a second test error of the region of the test global depth map corresponding to the test local depth map;
determining the preset fusion weights according to the first test error and the second test error.
Further, the first neural network and the third neural network form a multi-task neural network. The front part of the multi-task neural network includes a first preset number of convolutional layers for extracting feature information of an input image;
the rear part of the multi-task neural network includes a first branch and a second branch, where the first branch includes a second preset number of deconvolution layers for generating the global depth map of the input image, and the second branch includes a pooling layer and a fully connected layer for cropping the input image to generate at least one local image, wherein each convolutional layer is followed by a pooling layer, a normalization layer and an activation function layer, and the fully connected layer is followed by an activation function layer.
Further, the second neural network includes a third preset number of convolutional layers and a fourth preset number of deconvolution layers, wherein each convolutional layer is followed by a pooling layer, a normalization layer and an activation function layer.
Further, the image to be processed is a face image, and the local image includes at least one of the following: a left-eye image, a right-eye image, a nose image and a mouth image.
In a second aspect, an embodiment of the present invention further provides an image depth prediction device, the device comprising:
a global depth map generation module, configured to perform depth prediction processing on an image to be processed based on a first neural network to generate a global depth map;
a local depth map generation module, configured to perform depth prediction processing on at least one local image of the image to be processed based on at least one second neural network to generate at least one local depth map;
a fused depth map generation module, configured to weight the global depth map and the at least one local depth map according to preset fusion weights to generate a fused depth map.
Further, the device further includes:
a feature point determining module, configured to perform feature recognition on the image to be processed based on a third neural network to generate a feature point image, before the depth prediction processing is performed on at least one local image of the image to be processed based on at least one second neural network;
a local image generation module, configured to determine a local image region according to the feature point image and crop the local image region to generate at least one local image.
Further, when the construction of the first neural network, the third neural network and the second neural network is completed, network parameter initialization is performed on the first neural network, the third neural network and the second neural network, and the initialized first neural network, third neural network and second neural network are trained by minimizing a loss function, wherein the initialized network parameters are set according to a one-dimensional Gaussian distribution.
Further, the preset fusion weights are set in advance, and a weight setting module includes:
a depth map determination unit, configured to perform depth prediction processing on a test sample through the first neural network and the second neural network to generate a test global depth map and at least one test local depth map;
a test error determination unit, configured to respectively obtain a first test error of the test local depth map and a second test error of the region of the test global depth map corresponding to the test local depth map;
a weight determining unit, configured to determine the preset fusion weights according to the first test error and the second test error.
Further, the first neural network and the third neural network form a multi-task neural network. The front part of the multi-task neural network includes a first preset number of convolutional layers for extracting feature information of an input image;
the rear part of the multi-task neural network includes a first branch and a second branch, where the first branch includes a second preset number of deconvolution layers for generating the global depth map of the input image, and the second branch includes a pooling layer and a fully connected layer for cropping the input image to generate at least one local image, wherein each convolutional layer is followed by a pooling layer, a normalization layer and an activation function layer, and the fully connected layer is followed by an activation function layer.
Further, the second neural network includes a third preset number of convolutional layers and a fourth preset number of deconvolution layers, wherein each convolutional layer is followed by a pooling layer, a normalization layer and an activation function layer.
Further, the image to be processed is a face image, and the local image includes at least one of the following: a left-eye image, a right-eye image, a nose image and a mouth image.
In the embodiments of the present invention, depth prediction processing is performed by preset neural networks on an image to be processed and on at least one local image of the image to be processed to generate a global depth map and local depth maps, and the global depth map and the local depth maps are weighted and fused according to preset fusion weights to generate a high-accuracy fused depth map. This solves the problem of low depth prediction accuracy in the prior art and achieves high-accuracy depth prediction for images.
Brief description of the drawings
Fig. 1 is a flowchart of an image depth prediction method provided by Embodiment 1 of the present invention;
Fig. 2A is a face image to be processed provided by Embodiment 1 of the present invention;
Fig. 2B is a global depth map generated from the image to be processed through the first neural network, provided by Embodiment 1 of the present invention;
Fig. 2C is a schematic diagram of the feature point image generated from the image to be processed through the third neural network, provided by Embodiment 1 of the present invention;
Fig. 2D is a schematic diagram of a multi-task neural network provided by Embodiment 1 of the present invention;
Fig. 3 is a schematic structural diagram of an image depth prediction device provided by Embodiment 2 of the present invention.
Specific embodiments
The present invention will be described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are used only to explain the present invention, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment 1
Fig. 1 is a flowchart of an image depth prediction method provided by Embodiment 1 of the present invention. This embodiment is applicable to the case of automatically performing high-accuracy depth prediction on an image. The method can be executed by the image depth prediction device provided by an embodiment of the present invention, and the device can be implemented in software and/or hardware. The method specifically includes the following steps:
S110, perform depth prediction processing on an image to be processed based on a first neural network to generate a global depth map.
Here, depth prediction processing refers to extracting depth information from the image to be processed. Depth information refers to the actual hierarchical information or distance information of each object in the image; an image with depth information has a sense of depth and stereoscopy and a better visual effect.
The global depth map is an image containing the depth information of the image to be processed. The global depth map is a grayscale image in which the gray value of a pixel encodes depth: for example, the larger the gray value, the farther the actual object; the smaller the gray value, the closer the actual object.
In this embodiment, before the depth prediction processing, the image to be processed is preprocessed. Preprocessing includes enlarging, shrinking or segmenting the image, where segmentation refers to removing the background of the image to be processed so as to simplify the input of the first neural network. Optionally, the resolution of the image to be processed input into the first neural network is fixed; for example, it may be 384x384. Preprocessing thus simplifies the image and unifies its size, which helps the first neural network quickly extract useful feature information and avoids interference from background information.
In this embodiment, the first neural network is trained in advance. The first neural network may include convolutional layers, deconvolution layers and pooling layers, and each convolutional layer may be followed by a pooling layer, an activation function layer and a batch normalization (BN) layer, where the connection order of the pooling layer, activation function layer and normalization layer is not limited. For example, there may be 4 convolutional layers and 5 deconvolution layers, with the 5 deconvolution layers connected after the 4 convolutional layers; the activation function may be, for example, a ReLU, PReLU or RReLU function.
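As a concrete illustration, a minimal PyTorch sketch of the 4-convolution / 5-deconvolution example above might look as follows; the framework choice, channel widths, kernel sizes and strides are assumptions for illustration, since the embodiment fixes only the layer counts and layer types.

    # Minimal sketch of the first (global) depth prediction network: 4 conv
    # layers, each followed by pooling, batch normalization and ReLU, then
    # 5 deconvolution layers producing a single-channel depth map.
    import torch
    import torch.nn as nn

    class GlobalDepthNet(nn.Module):
        def __init__(self):
            super().__init__()
            convs = []
            channels = [3, 32, 64, 128, 256]
            for c_in, c_out in zip(channels[:-1], channels[1:]):
                convs += [nn.Conv2d(c_in, c_out, 3, padding=1),
                          nn.MaxPool2d(2),          # pooling layer after the conv
                          nn.BatchNorm2d(c_out),    # normalization (BN) layer
                          nn.ReLU(inplace=True)]    # activation function layer
            self.encoder = nn.Sequential(*convs)    # 4 convolutional layers
            self.decoder = nn.Sequential(           # 5 deconvolution layers
                nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ReLU(inplace=True),
                nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(inplace=True),
                nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(inplace=True),
                nn.ConvTranspose2d(32, 16, 4, 2, 1), nn.ReLU(inplace=True),
                nn.ConvTranspose2d(16, 1, 3, 1, 1), # single-channel depth map
            )

        def forward(self, x):                       # x: (N, 3, 384, 384)
            return self.decoder(self.encoder(x))    # (N, 1, 384, 384) depth map

    depth_map = GlobalDepthNet()(torch.randn(1, 3, 384, 384))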
S120, perform depth prediction processing on at least one local image of the image to be processed based on at least one second neural network to generate at least one local depth map.
In this embodiment, the topology of a second neural network may be the same as or different from that of the first neural network.
Optionally, there may be multiple second neural networks, which may share the same topology but have different network parameters, being trained on different training samples. Different local images correspond to different second neural networks.
In this embodiment, the local images may be generated by manually determining local image regions and cropping them, or by automatically identifying the local image regions of the image to be processed and cropping them automatically according to the recognition result. Optionally, the image to be processed is cropped based on a third neural network to generate at least one local image.
Here, cropping refers to identifying key information in the image to be processed and intercepting the region where the key information is located to form a local image; a local image is an image containing local information of the image to be processed. For example, if the image to be processed is a person image, the key information may be limb information; if the image to be processed is a face image, the key information may be facial information.
In this embodiment, the third neural network is trained in advance. The third neural network may include convolutional layers, a pooling layer and a fully connected layer, and each convolutional layer may be followed by an activation function layer and a batch normalization (BN) layer. For example, there may be 4 convolutional layers, 1 fully connected layer and 1 pooling layer, where the pooling layer and the fully connected layer are connected in turn after the 4 convolutional layers; the activation function may be, for example, a ReLU, PReLU or RReLU function.
Optionally, cropping the image to be processed based on the third neural network to generate at least one local image includes: performing feature recognition on the image to be processed based on the third neural network to generate a feature point image; determining a local image region according to the feature point image, and cropping the local image region to generate at least one local image.
In this embodiment, the key information in the image to be processed is characterized by the feature points in the feature point image. For example, multiple feature points may form the contour of a piece of key information, or multiple feature points may cover it; the region enclosed by the lines connecting the feature points, or the region covered by the feature points, is determined as the local image region, which is then cropped to generate a local image. Optionally, the feature points corresponding to different pieces of key information may differ; for example, different key information may be marked with feature points of different colors or shapes.
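One plausible reading of this step, sketched minimally below, is to take the bounding box of the feature points belonging to one piece of key information, expand it by a margin, and crop; the margin value and the helper names are assumptions.

    # Minimal sketch: crop a local image region from the feature points of one
    # key region (bounding box plus margin is an assumed construction).
    import numpy as np

    def crop_local_image(image: np.ndarray, points: np.ndarray, margin: int = 8) -> np.ndarray:
        """image: (H, W, 3) array; points: (K, 2) array of (x, y) feature points
        belonging to one key region, e.g. the left eye."""
        h, w = image.shape[:2]
        x0 = max(int(points[:, 0].min()) - margin, 0)
        y0 = max(int(points[:, 1].min()) - margin, 0)
        x1 = min(int(points[:, 0].max()) + margin, w)
        y1 = min(int(points[:, 1].max()) + margin, h)
        return image[y0:y1, x0:x1]

    # e.g. left_eye_image = crop_local_image(face_image, left_eye_points)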
Optionally, the image to be processed is a face image, and the local image includes at least one of the following: a left-eye image, a right-eye image, a nose image and a mouth image. Optionally, the local image may also include a left-eyebrow image and a right-eyebrow image.
For example, referring to Fig. 2A, Fig. 2B and Fig. 2C: Fig. 2A is the face image to be processed provided by Embodiment 1, obtained by preprocessing; Fig. 2B is the global depth map generated from the image to be processed through the first neural network; Fig. 2C is a schematic diagram of the feature point image generated from the image to be processed through the third neural network, where the dots in Fig. 2C are the feature points, and there are feature points in the eye regions, the nose region, the mouth region and the eyebrow regions. According to these feature points, the image to be processed can be cropped to generate multiple local images.
Optionally, before a local image is input into the second neural network, its resolution is adjusted to a preset resolution; for example, the preset resolution may be 384x384. Correspondingly, the resolution of the local depth map obtained by the depth prediction processing of the second neural network is restored to the initial resolution of the local image. Increasing the resolution of the local image in this way helps improve the accuracy of the local depth map.
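A minimal sketch of this resize-and-restore step, assuming bilinear resampling (the embodiment fixes only the preset resolution):

    # Minimal sketch: upsample the local image to the preset resolution before
    # the second network, then restore the depth map to the crop's original size.
    import torch.nn.functional as F

    def predict_local_depth(local_net, patch):
        """patch: (N, 3, h, w) local image tensor; returns an (N, 1, h, w) depth map."""
        h, w = patch.shape[-2:]
        up = F.interpolate(patch, size=(384, 384), mode='bilinear', align_corners=False)
        depth = local_net(up)   # second neural network
        return F.interpolate(depth, size=(h, w), mode='bilinear', align_corners=False)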
Optionally, the second neural network includes a third preset number of convolutional layers and a fourth preset number of deconvolution layers, where each convolutional layer is followed by a pooling layer, a normalization layer and an activation function layer, the connection order of which is not limited. For example, the third preset number may be 4 and the fourth preset number may be 5; the activation function may be, for example, a ReLU, PReLU or RReLU function.
It should be noted that step S110 and step S120 can be executed simultaneously; there is no required order between them.
S130, weight the global depth map and the at least one local depth map according to preset fusion weights to generate a fused depth map.
In this embodiment, the global depth map and the local depth maps characterize depth information by gray values, and the same gray value corresponds to the same depth value. Weighting the global depth map and the at least one local depth map means weighting the gray value of each pixel in a local depth map against the corresponding gray value in the global depth map based on the preset fusion weights to determine the fused depth map. The preset fusion weights corresponding to different local images may be the same or different. For example, the left-eye local depth map and the left-eye region of the global depth map are weighted pixel by pixel, and the mouth local depth map and the mouth region of the global depth map are weighted pixel by pixel; the preset fusion weights corresponding to these two local depth maps may be the same or different.
It should be noted that the accuracy of a local depth map is higher than the accuracy of the corresponding region of the global depth map. In this embodiment, high-accuracy local depth maps are obtained and fused with the global depth map based on the preset fusion weights, generating a high-accuracy fused depth map and solving the problem of low depth prediction accuracy in the prior art; moreover, depth prediction processing based on neural networks is end-to-end and easy to operate.
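A minimal sketch of this per-region weighted fusion; the region boxes and weight pairs are stand-ins for the values produced by the cropping step and the weight-setting procedure described below:

    # Minimal sketch of S130: blend each local depth map into the matching
    # region of the global depth map with its preset fusion weights.
    import numpy as np

    def fuse_depth_maps(global_depth, local_depths, regions, weights):
        """global_depth: (H, W) array; local_depths: list of (h, w) arrays;
        regions: list of (y0, y1, x0, x1) crop boxes; weights: list of
        (w_global, w_local) pairs with w_global + w_local == 1."""
        fused = global_depth.copy()
        for local, (y0, y1, x0, x1), (w_g, w_l) in zip(local_depths, regions, weights):
            fused[y0:y1, x0:x1] = w_g * fused[y0:y1, x0:x1] + w_l * local
        return fused

    # e.g. weights of (2/3, 1/3) for the left-eye region, as in the worked example below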
Optionally, 3D image reconstruction is performed according to the fused depth map, which can be applied to video conferencing, video telephony, virtual games, face recognition, or film and animation production, helping improve the clarity of subsequent images or video productions.
Optionally, the preset fusion weights are set in advance, and the setting method includes: performing depth prediction processing on a test sample through the first neural network and the second neural network to generate a test global depth map and at least one test local depth map; respectively obtaining a first test error of a test local depth map and a second test error of the region of the test global depth map corresponding to that test local depth map; and determining the preset fusion weights according to the first test error and the second test error.
In this embodiment, after the training of the first neural network and the second neural networks is completed, the test sample is input into the first neural network to obtain the test global depth map, and each local image of the test sample is input into the corresponding second neural network to generate the corresponding test local depth map. A test error is determined against the corresponding standard depth map; for example, the error of each pixel is the difference between the gray values of corresponding pixels of the standard depth map and the test depth map, and the mean of the per-pixel errors is taken as the test error. The first test error of each test local depth map is determined in this way based on the standard local depth map, and the second test error is determined on the region of the test global depth map corresponding to the local image, based on the standard global depth map.
Optionally, the weight of the global depth map or of a local depth map is inversely proportional to the corresponding test error. Taking the left-eye image as an example: the first depth error and the second depth error are determined from the first test error and the second test error corresponding to the left-eye image; for instance, if the second depth error is 0.1mm and the first depth error is 0.2mm, the preset fusion weights of the corresponding region of the global depth map and of the left-eye depth map may be 2/3 and 1/3, respectively. Optionally, different regions of the global depth map may be assigned different fusion weights.
The preset fusion weights are determined from a large number of test samples; for example, the number of test samples may be 1900.
In this embodiment, determining the preset fusion weights according to the test errors of the global depth map and the local depth maps increases the weight of the more accurate depth map while reducing the weight of the less accurate one, further improving the depth prediction accuracy of the fused depth map.
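The inverse-proportionality rule can be reproduced with a small sketch matching the worked example above; the hypothetical fusion_weights helper and its normalization to a sum of 1 are assumptions consistent with the 2/3 and 1/3 figures:

    # Minimal sketch: fusion weights inversely proportional to the test errors.
    # A first (local) error of 0.2mm and a second (global-region) error of
    # 0.1mm yield weights of 2/3 (global region) and 1/3 (local depth map).
    def fusion_weights(local_error: float, global_error: float) -> tuple:
        """Return (w_global, w_local), each inversely proportional to its error."""
        inv_g, inv_l = 1.0 / global_error, 1.0 / local_error
        total = inv_g + inv_l
        return inv_g / total, inv_l / total

    w_global, w_local = fusion_weights(local_error=0.2, global_error=0.1)
    # w_global == 2/3, w_local == 1/3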
In the technical solution of this embodiment, depth prediction processing is performed by preset neural networks on the image to be processed and on at least one local image of it to generate a global depth map and local depth maps, and the global depth map and the local depth maps are weighted and fused according to the preset fusion weights to generate a high-accuracy fused depth map, which solves the problem of low depth prediction accuracy in the prior art and achieves high-accuracy depth prediction for images.
On the basis of the above embodiment, before depth prediction processing is performed on the image to be processed, the first neural network, the third neural network and the second neural network are built and trained. Specifically, when the construction of the first neural network, the third neural network and the second neural network is completed, network parameter initialization is performed on them, and the initialized first neural network, third neural network and second neural network are trained by minimizing a loss function, wherein the initialized network parameters are set according to a one-dimensional Gaussian distribution.
Here, network parameter initialization refers to setting initial network parameters for a neural network. In this embodiment, the initial network parameters of each neural network are set according to a one-dimensional Gaussian distribution, instead of the random initialization of the prior art, which helps improve training efficiency and avoids the slow convergence, or failure to converge, that random initialization can cause.
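For example, a minimal PyTorch sketch of such an initialization; the mean of 0 and standard deviation of 0.01 are assumptions, since the embodiment states only that the initial parameters follow a one-dimensional Gaussian distribution:

    # Minimal sketch: initialize all learnable layers from a 1-D Gaussian.
    import torch.nn as nn

    def gaussian_init(module: nn.Module, std: float = 0.01):
        if isinstance(module, (nn.Conv2d, nn.ConvTranspose2d, nn.Linear)):
            nn.init.normal_(module.weight, mean=0.0, std=std)
            if module.bias is not None:
                nn.init.zeros_(module.bias)

    net = GlobalDepthNet()    # from the sketch above
    net.apply(gaussian_init)  # applies recursively to every submodule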
Optionally, the training method of the first neural network includes: performing depth prediction processing on a first sample image through the first neural network to be trained to generate a first training image; generating a first loss function according to the first training image and the standard global depth map corresponding to the first sample image; and adjusting the network parameters of the first neural network to be trained according to the first loss function.
In this embodiment, the first sample images may be, for example, a large number of face images, e.g. 2500 color face images comprising 1100 male face images and 1400 female face images. Optionally, the first sample images are of a unified size.
In this embodiment, the standard global depth map may be set in advance or extracted during the training of the first neural network, for example by extracting the depth information of the training image with a preset depth map extraction model; the preset depth map extraction model may be, for example, an HourGlass model obtained in advance. The first loss function characterizes the degree of inconsistency between the feature information of the training image generated by the neural network and the standard feature information of the standard global depth map; the smaller the value of the first loss function, the better the robustness of the first neural network usually is. For example, the first loss function may take the form of the mean squared error (MSE).
In this embodiment, the gradient of the first loss function is back-propagated, and the network parameters of the first neural network are adjusted according to the first loss function. Optionally, the network parameters include but are not limited to weights and bias values.
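Put together, one training step under the MSE form of the first loss function might be sketched as follows; the SGD optimizer and learning rate are assumptions, and GlobalDepthNet refers to the sketch above:

    # Minimal sketch: MSE loss against the standard depth map, gradient
    # back-propagation, and a parameter update adjusting weights and biases.
    import torch
    import torch.nn as nn

    net = GlobalDepthNet()
    criterion = nn.MSELoss()                         # first loss function (MSE form)
    optimizer = torch.optim.SGD(net.parameters(), lr=1e-3)

    def train_step(sample: torch.Tensor, standard_depth: torch.Tensor) -> float:
        optimizer.zero_grad()
        predicted = net(sample)                      # first training image
        loss = criterion(predicted, standard_depth)  # inconsistency with the standard map
        loss.backward()                              # gradient back-propagation
        optimizer.step()                             # adjust weights and bias values
        return loss.item()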
Optionally, the training method of a second neural network includes: performing depth prediction processing on a second sample image through the second neural network to be trained to generate a second training image; generating a second loss function according to the second training image and the standard local depth map corresponding to the second sample image; and adjusting the network parameters of the second neural network to be trained according to the second loss function.
In this embodiment, second sample images are set, where the second sample images match the first sample images. For example, if the second neural network is used for depth prediction processing of left-eye images, the second sample images are the left-eye local images corresponding to the first sample images. The standard local depth map may be set in advance or extracted during the training of the second neural network. The second loss function characterizes the degree of inconsistency between the feature information of the second training image generated by the neural network and the standard feature information of the standard local depth map; the smaller its value, the better the robustness of the second neural network usually is. For example, the second loss function may take the form of the mean squared error.
In this embodiment, the gradient of the second loss function is back-propagated, and the network parameters of the second neural network are adjusted according to the second loss function. Optionally, the network parameters include but are not limited to weights and bias values.
Optionally, the training method of the third neural network includes: cropping a third sample image through the third neural network to be trained to generate at least one training local image; respectively obtaining the boundary coordinate information of the training local image and of the corresponding standard local image; determining a third loss function according to the boundary coordinate information of the training local image and the boundary coordinate information of the standard local image; and adjusting the network parameters of the third neural network according to the third loss function.
In this embodiment, the third sample images may be the same as the first sample images, reducing the number of samples to collect. The third loss function characterizes the degree of inconsistency between the boundary coordinate information of the training local image generated by the neural network and the boundary coordinate information of the standard local image; for example, the mean error of the boundary pixels may be taken as the value of the third loss function. The gradient of the third loss function is back-propagated, and the network parameters of the third neural network are adjusted according to the third loss function. Optionally, the network parameters include but are not limited to weights and bias values.
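A minimal sketch of such a third loss function, reading "the mean error of the boundary pixels" as a mean absolute error over the predicted and standard boundary coordinates (the L1 form is an assumption):

    # Minimal sketch: mean absolute error between predicted and standard
    # boundary coordinates of the training local images.
    import torch

    def boundary_loss(pred_boxes: torch.Tensor, standard_boxes: torch.Tensor) -> torch.Tensor:
        """pred_boxes, standard_boxes: (N, num_regions, 4) boundary coordinates."""
        return (pred_boxes - standard_boxes).abs().mean()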
Optionally, the first neural network and the third neural network form a multi-task neural network. For example, referring to Fig. 2D, a schematic diagram of a multi-task neural network provided by Embodiment 1 of the present invention: the front part of the multi-task neural network includes a first preset number of convolutional layers for extracting the feature information of the input image; the rear part includes a first branch and a second branch, where the first branch includes a second preset number of deconvolution layers for generating the global depth map of the input image, and the second branch includes a pooling layer and a fully connected layer for cropping the input image to generate at least one local image; each convolutional layer is followed by a pooling layer, a normalization layer and an activation function layer, and the fully connected layer is followed by an activation function layer.
In this embodiment, the multi-task neural network performs depth prediction processing and cropping on the image to be processed simultaneously, generating the global depth map and at least one local image. It thus completes multiple tasks at once, instead of one neural network completing only one task, which simplifies the training process of the networks.
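A minimal PyTorch sketch of this multi-task layout; the channel widths and the choice to regress four crop boxes of four coordinates each are illustrative assumptions:

    # Minimal sketch of Fig. 2D: a shared convolutional front part, a
    # deconvolution branch for the global depth map, and a pooling + fully
    # connected branch regressing crop-box coordinates for the local images.
    import torch
    import torch.nn as nn

    class MultiTaskNet(nn.Module):
        def __init__(self, num_regions: int = 4):
            super().__init__()
            self.num_regions = num_regions
            trunk = []
            channels = [3, 32, 64, 128, 256]
            for c_in, c_out in zip(channels[:-1], channels[1:]):
                trunk += [nn.Conv2d(c_in, c_out, 3, padding=1),
                          nn.MaxPool2d(2),            # pooling after each conv
                          nn.BatchNorm2d(c_out),      # normalization layer
                          nn.ReLU(inplace=True)]      # activation function layer
            self.trunk = nn.Sequential(*trunk)        # shared front part
            self.depth_branch = nn.Sequential(        # first branch: deconvolutions
                nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ReLU(inplace=True),
                nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(inplace=True),
                nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(inplace=True),
                nn.ConvTranspose2d(32, 16, 4, 2, 1), nn.ReLU(inplace=True),
                nn.ConvTranspose2d(16, 1, 3, 1, 1),
            )
            self.crop_branch = nn.Sequential(         # second branch: pooling + FC
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(256, num_regions * 4),      # fully connected layer
                nn.Sigmoid(),                         # activation after the FC layer
            )

        def forward(self, x):                         # x: (N, 3, 384, 384)
            feat = self.trunk(x)
            depth = self.depth_branch(feat)           # global depth map branch
            boxes = self.crop_branch(feat)            # normalized crop coordinates
            return depth, boxes.view(-1, self.num_regions, 4)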
Embodiment 2
Fig. 3 is a schematic structural diagram of an image depth prediction device provided by Embodiment 2 of the present invention. The device specifically includes:
a global depth map generation module 210, configured to perform depth prediction processing on an image to be processed based on a first neural network to generate a global depth map;
a local depth map generation module 220, configured to perform depth prediction processing on at least one local image of the image to be processed based on at least one second neural network to generate at least one local depth map;
a fused depth map generation module 230, configured to weight the global depth map and the at least one local depth map according to preset fusion weights to generate a fused depth map.
Optionally, the device further includes:
a feature point determining module, configured to perform feature recognition on the image to be processed based on a third neural network to generate a feature point image, before the depth prediction processing is performed on at least one local image of the image to be processed based on at least one second neural network;
a local image generation module, configured to determine a local image region according to the feature point image and crop the local image region to generate at least one local image, wherein the feature point images corresponding to different image regions differ.
Optionally, when the construction of the first neural network, the third neural network and the second neural network is completed, network parameter initialization is performed on the first neural network, the third neural network and the second neural network, and the initialized first neural network, third neural network and second neural network are trained by minimizing a loss function, wherein the initialized network parameters are set according to a one-dimensional Gaussian distribution.
Optionally, the preset fusion weights are set in advance, and a weight setting module includes:
a depth map determination unit, configured to process a test sample through the first neural network and the second neural network to generate a test global depth map and at least one test local depth map;
a test error determination unit, configured to respectively obtain a first test error of the test local depth map and a second test error of the region of the test global depth map corresponding to the test local depth map;
a weight determining unit, configured to determine the preset fusion weights according to the first test error and the second test error.
Optionally, the first neural network and the third neural network form a multi-task neural network, the front part of which includes a first preset number of convolutional layers for extracting the feature information of the input image;
the rear part of the multi-task neural network includes a first branch and a second branch, where the first branch includes a second preset number of deconvolution layers for generating the global depth map of the input image, and the second branch includes a pooling layer and a fully connected layer for cropping the input image to generate at least one local image, wherein each convolutional layer is followed by a pooling layer, a normalization layer and an activation function layer, and the fully connected layer is followed by an activation function layer.
Optionally, the second neural network includes a third preset number of convolutional layers and a fourth preset number of deconvolution layers, wherein each convolutional layer is followed by a pooling layer, a normalization layer and an activation function layer.
Optionally, the image to be processed is a face image, and the local image includes at least one of the following: a left-eye image, a right-eye image, a nose image and a mouth image.
The image depth prediction device provided by the embodiment of the present invention can execute the image depth prediction method provided by any embodiment of the present invention, and has the corresponding functional modules and beneficial effects for executing that method.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here; various obvious changes, readjustments and substitutions can be made by those skilled in the art without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to the above embodiments and may include more other equivalent embodiments without departing from the inventive concept; the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. An image depth prediction method, characterized by comprising:
performing depth prediction processing on an image to be processed based on a first neural network to generate a global depth map;
performing feature recognition on the image to be processed based on a third neural network to generate a feature point image;
determining a local image region according to the feature point image, and cropping the local image region to generate at least one local image;
performing depth prediction processing on at least one local image of the image to be processed based on at least one second neural network to generate at least one local depth map;
weighting the global depth map and the at least one local depth map according to preset fusion weights to generate a fused depth map;
wherein the first neural network and the third neural network form a multi-task neural network, the front part of the multi-task neural network including a first preset number of convolutional layers for extracting feature information of an input image;
the rear part of the multi-task neural network including a first branch and a second branch, the first branch including a second preset number of deconvolution layers for generating the global depth map of the input image, the second branch including a pooling layer and a fully connected layer for cropping the input image to generate at least one local image, wherein each convolutional layer is followed by a pooling layer, a normalization layer and an activation function layer, and the fully connected layer is followed by an activation function layer.
2. The method according to claim 1, characterized in that, when the construction of the first neural network, the third neural network and the second neural network is completed, network parameter initialization is performed on the first neural network, the third neural network and the second neural network, and the initialized first neural network, third neural network and second neural network are trained by minimizing a loss function, wherein the initialized network parameters are set according to a one-dimensional Gaussian distribution.
3. The method according to claim 1, characterized in that the preset fusion weights are set in advance, and the method for setting the preset fusion weights includes:
performing depth prediction processing on a test sample based on the first neural network and the second neural network to generate a test global depth map and at least one test local depth map;
respectively obtaining a first test error of the test local depth map and a second test error of the region of the test global depth map corresponding to the test local depth map;
determining the preset fusion weights according to the first test error and the second test error.
4. The method according to claim 1, characterized in that the second neural network includes a third preset number of convolutional layers and a fourth preset number of deconvolution layers, wherein each convolutional layer is followed by a pooling layer, a normalization layer and an activation function layer.
5. The method according to any one of claims 1 to 4, characterized in that the image to be processed is a face image, and the local image includes at least one of the following: a left-eye image, a right-eye image, a nose image and a mouth image.
6. An image depth prediction device, characterized by comprising:
a global depth map generation module, configured to perform depth prediction processing on an image to be processed based on a first neural network to generate a global depth map;
a feature point determining module, configured to perform feature recognition on the image to be processed based on a third neural network to generate a feature point image;
a local image generation module, configured to determine a local image region according to the feature point image and crop the local image region to generate at least one local image;
a local depth map generation module, configured to perform depth prediction processing on at least one local image of the image to be processed based on at least one second neural network to generate at least one local depth map;
a fused depth map generation module, configured to weight the global depth map and the at least one local depth map according to preset fusion weights to generate a fused depth map;
wherein the first neural network and the third neural network form a multi-task neural network, the front part of the multi-task neural network including a first preset number of convolutional layers for extracting feature information of an input image;
the rear part of the multi-task neural network including a first branch and a second branch, the first branch including a second preset number of deconvolution layers for generating the global depth map of the input image, the second branch including a pooling layer and a fully connected layer for cropping the input image to generate at least one local image, wherein each convolutional layer is followed by a pooling layer, a normalization layer and an activation function layer, and the fully connected layer is followed by an activation function layer.
7. The device according to claim 6, characterized in that, when the construction of the first neural network, the third neural network and the second neural network is completed, network parameter initialization is performed on the first neural network, the third neural network and the second neural network, and the initialized first neural network, third neural network and second neural network are trained by minimizing a loss function, wherein the initialized network parameters are set according to a one-dimensional Gaussian distribution.
8. The device according to claim 6, characterized in that the preset fusion weights are set in advance, and a weight setting module includes:
a depth map determination unit, configured to perform depth prediction processing on a test sample through the first neural network and the second neural network to generate a test global depth map and at least one test local depth map;
a test error determination unit, configured to respectively obtain a first test error of the test local depth map and a second test error of the region of the test global depth map corresponding to the test local depth map;
a weight determining unit, configured to determine the preset fusion weights according to the first test error and the second test error.
9. The device according to claim 6, characterized in that the second neural network includes a third preset number of convolutional layers and a fourth preset number of deconvolution layers, wherein each convolutional layer is followed by a pooling layer, a normalization layer and an activation function layer.
10. The device according to any one of claims 6 to 9, characterized in that the image to be processed is a face image, and the local image includes at least one of the following: a left-eye image, a right-eye image, a nose image and a mouth image.
CN201710811182.4A 2017-09-11 2017-09-11 Image depth prediction method and device Active CN107578435B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710811182.4A CN107578435B (en) Image depth prediction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710811182.4A CN107578435B (en) Image depth prediction method and device

Publications (2)

Publication Number Publication Date
CN107578435A CN107578435A (en) 2018-01-12
CN107578435B true CN107578435B (en) 2019-11-29

Family

ID=61033100

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710811182.4A Active CN107578435B (en) Image depth prediction method and device

Country Status (1)

Country Link
CN (1) CN107578435B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108335322B (en) 2018-02-01 2021-02-12 深圳市商汤科技有限公司 Depth estimation method and apparatus, electronic device, program, and medium
CN109191514B (en) * 2018-10-23 2020-11-24 北京字节跳动网络技术有限公司 Method and apparatus for generating a depth detection model
CN109829886A (en) * 2018-12-25 2019-05-31 苏州江奥光电科技有限公司 A kind of pcb board defect inspection method based on depth information
CN110309706B (en) * 2019-05-06 2023-05-12 深圳华付技术股份有限公司 Face key point detection method and device, computer equipment and storage medium
CN110363296B (en) * 2019-06-28 2022-02-08 腾讯医疗健康(深圳)有限公司 Task model obtaining method and device, storage medium and electronic device
CN111428859A (en) * 2020-03-05 2020-07-17 北京三快在线科技有限公司 Depth estimation network training method and device for automatic driving scene and autonomous vehicle
CN111414923B (en) * 2020-03-05 2022-07-12 南昌航空大学 Indoor scene three-dimensional reconstruction method and system based on single RGB image
CN112488104B (en) * 2020-11-30 2024-04-09 华为技术有限公司 Depth and confidence estimation system
CN116721143B (en) * 2023-08-04 2023-10-20 南京诺源医疗器械有限公司 Depth information processing device and method for 3D medical image

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103177440A (en) * 2012-12-20 2013-06-26 香港应用科技研究院有限公司 System and method of generating image depth map
CN106204522A (en) * 2015-05-28 2016-12-07 奥多比公司 The combined depth of single image is estimated and semantic tagger
CN107038429A (en) * 2017-05-03 2017-08-11 四川云图睿视科技有限公司 A kind of multitask cascade face alignment method based on deep learning


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture; David Eigen et al.; Proc. IEEE ICCV; 2015; pp. 2650-2658 *
Single-View and Multi-View Depth Fusion; Jose M. Facil et al.; IEEE Robotics and Automation Letters; 2017-06-15; Vol. 2, No. 4; pp. 1994-2001 *

Also Published As

Publication number Publication date
CN107578435A (en) 2018-01-12


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20221123

Address after: 518000 2nd floor, building a, Tsinghua campus, Shenzhen University Town, Xili street, Nanshan District, Shenzhen City, Guangdong Province

Patentee after: Shenzhen International Graduate School of Tsinghua University

Address before: 518000 Nanshan Zhiyuan 1001, Xue Yuan Avenue, Nanshan District, Shenzhen, Guangdong.

Patentee before: TSINGHUA-BERKELEY SHENZHEN INSTITUTE