CN107578435A - Image depth prediction method and device - Google Patents
Image depth prediction method and device Download PDF Info
- Publication number
- CN107578435A (application number CN201710811182.4A)
- Authority
- CN
- China
- Prior art keywords
- network
- image
- depth
- test
- depth map
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Image Processing (AREA)
Abstract
The embodiments of the invention disclose an image depth prediction method and device. The method includes: performing depth prediction on an image to be processed based on a first neural network, generating a global depth map; performing depth prediction on at least one local image of the image to be processed based on at least one second neural network, generating at least one local depth map; and weighting the global depth map and the at least one local depth map according to preset fusion weights, generating a fused depth map. The embodiments of the invention solve the prior-art problems of low image depth prediction accuracy and complicated operation, achieving high-accuracy depth prediction for images.
Description
Technical field
The embodiments of the present invention relate to image processing technology, and in particular to an image depth prediction method and device.
Background technology
Depth prediction is a common problem in computer vision and image processing. Depth information can be used to convey 3D (three-dimensional) information and, in turn, to solve machine vision tasks such as scene understanding and object recognition.
Traditional approaches to extracting depth information generally require multiple input images, such as multi-view images, multi-view images for structure from motion, or images for photometric stereo and multi-focus imaging. Existing methods learn the correlation between 2D images and 3D images and then obtain the depth information of an image. However, real images cover a large number of different scenes, and 2D images differ widely from 3D images, so depth prediction accuracy is low and the results are poor.
Summary of the invention
The present invention provides an image depth prediction method and device to achieve high-accuracy depth prediction for images.
In a first aspect, an embodiment of the invention provides an image depth prediction method, including:
performing depth prediction on an image to be processed based on a first neural network, generating a global depth map;
performing depth prediction on at least one local image of the image to be processed based on at least one second neural network, generating at least one local depth map;
weighting the global depth map and the at least one local depth map according to preset fusion weights, generating a fused depth map.
Further, before performing depth prediction on the at least one local image of the image to be processed based on the at least one second neural network, the method also includes:
performing feature recognition on the image to be processed based on a third neural network, generating a feature point image;
determining local image regions according to the feature point image, and cropping the local image regions, generating the at least one local image.
Further, when the construction of the first neural network, the third neural network and the second neural network is completed, network parameter initialization is performed on the first neural network, the third neural network and the second neural network, and the initialized first, third and second neural networks are trained by minimizing loss functions, wherein the initialized network parameters are set according to a one-dimensional Gaussian distribution.
Further, the preset fusion weights are set in advance, and the method of setting the preset fusion weights includes:
performing depth prediction on test samples through the first neural network and the second neural network, generating a test global depth map and at least one test local depth map;
obtaining, respectively, a first test error of the region of the test global depth map corresponding to the test local depth map, and a second test error of the test local depth map;
determining the preset fusion weights according to the first test error and the second test error.
Further, the first neural network and the third neural network form a multi-task neural network. The front part of the multi-task neural network includes a first predetermined number of convolutional layers for extracting feature information of an input image; the rear part of the multi-task neural network includes a first branch and a second branch. The first branch includes a second predetermined number of deconvolution layers for generating the global depth map of the input image; the second branch includes a pooling layer and a fully connected layer for cropping the input image, generating the at least one local image, wherein a pooling layer, a normalization layer and an activation function layer are connected after each convolutional layer, and an activation function layer is connected after the fully connected layer.
Further, the second neural network includes a third predetermined number of convolutional layers and a fourth predetermined number of deconvolution layers, wherein a pooling layer, a normalization layer and an activation function layer are connected after each convolutional layer.
Further, the image to be processed is a facial image, and the local images include at least one of the following: a left-eye image, a right-eye image, a nose image and a mouth image.
In a second aspect, an embodiment of the invention further provides an image depth prediction device, including:
a global depth map generation module, configured to perform depth prediction on an image to be processed based on a first neural network, generating a global depth map;
a local depth map generation module, configured to perform depth prediction on at least one local image of the image to be processed based on at least one second neural network, generating at least one local depth map;
a fused depth map generation module, configured to weight the global depth map and the at least one local depth map according to preset fusion weights, generating a fused depth map.
Further, the device also includes:
a feature point determination module, configured to perform feature recognition on the image to be processed based on a third neural network before depth prediction is performed on the at least one local image based on the at least one second neural network, generating a feature point image;
a local image generation module, configured to determine local image regions according to the feature point image and crop the local image regions, generating the at least one local image.
Further, when the construction of the first neural network, the third neural network and the second neural network is completed, network parameter initialization is performed on the first neural network, the third neural network and the second neural network, and the initialized first, third and second neural networks are trained by minimizing loss functions, wherein the initialized network parameters are set according to a one-dimensional Gaussian distribution.
Further, the preset fusion weights are set in advance, and a weight setting module includes:
a depth map determination unit, configured to perform depth prediction on test samples through the first neural network and the second neural network, generating a test global depth map and at least one test local depth map;
a test error determination unit, configured to obtain, respectively, a first test error of the region of the test global depth map corresponding to the test local depth map, and a second test error of the test local depth map;
a weight determination unit, configured to determine the preset fusion weights according to the first test error and the second test error.
Further, the first neural network and the third neural network form a multi-task neural network. The front part of the multi-task neural network includes a first predetermined number of convolutional layers for extracting feature information of an input image; the rear part of the multi-task neural network includes a first branch and a second branch. The first branch includes a second predetermined number of deconvolution layers for generating the global depth map of the input image; the second branch includes a pooling layer and a fully connected layer for cropping the input image, generating the at least one local image, wherein a pooling layer, a normalization layer and an activation function layer are connected after each convolutional layer, and an activation function layer is connected after the fully connected layer.
Further, the second neural network includes a third predetermined number of convolutional layers and a fourth predetermined number of deconvolution layers, wherein a pooling layer, a normalization layer and an activation function layer are connected after each convolutional layer.
Further, the image to be processed is a facial image, and the local images include at least one of the following: a left-eye image, a right-eye image, a nose image and a mouth image.
In the embodiments of the present invention, depth prediction is performed on an image to be processed and on at least one local image of it through preset neural networks, generating a global depth map and local depth maps; the global depth map and the local depth maps are weighted and fused according to preset fusion weights, generating a high-accuracy fused depth map. This solves the prior-art problem of low depth prediction accuracy and achieves high-accuracy depth prediction for images.
Brief description of the drawings
Fig. 1 is a flow chart of an image depth prediction method provided by embodiment one of the present invention;
Fig. 2A is the facial image to be processed provided by embodiment one;
Fig. 2B is the global depth map generated from the image to be processed through the first neural network in embodiment one;
Fig. 2C is a schematic diagram of the feature point image generated from the image to be processed through the third neural network in embodiment one;
Fig. 2D is a schematic diagram of a multi-task neural network provided by embodiment one;
Fig. 3 is a structural diagram of an image depth prediction device provided by embodiment two of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are used only to explain the present invention, not to limit it. It should also be noted that, for convenience of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment one
Fig. 1 is a flow chart of an image depth prediction method provided by embodiment one of the present invention. This embodiment is applicable to situations where high-accuracy depth prediction is performed on images automatically. The method can be executed by an image depth prediction device provided by an embodiment of the present invention, and the device can be implemented in software and/or hardware. The method specifically includes:
S110, performing depth prediction on an image to be processed based on a first neural network, generating a global depth map.
Here, depth prediction refers to extracting depth information from the image to be processed. Depth information refers to the layering or distance information of each actual object in the image; an image with depth information has a sense of depth and dimensionality and a better visual effect.
The global depth map is an image containing the depth information of the image to be processed. The global depth map is a grayscale image in which per-pixel depth is characterized by the gray value. For example, the larger a pixel's gray value, the farther the actual object; the smaller the gray value, the nearer the actual object.
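For illustration only (not part of the claimed method), the gray-value convention just described can be sketched as a linear mapping from depth values to gray levels; the linear scaling and the 0-255 range are assumptions:

```python
def depth_to_gray(depth_rows, max_gray=255):
    # Linearly map per-pixel depth values to gray levels so that, as in
    # this embodiment, a larger gray value denotes a farther object.
    flat = [d for row in depth_rows for d in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # avoid division by zero on a flat depth map
    return [[round((d - lo) / span * max_gray) for d in row]
            for row in depth_rows]

# Nearest point maps to gray 0, farthest to gray 255.
gray = depth_to_gray([[0.5, 1.0], [2.0, 4.5]])
```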
In this embodiment, before depth prediction is performed, the image to be processed is preprocessed. Preprocessing includes enlarging, shrinking or segmenting the image, where segmentation refers to deleting the background of the image to be processed, simplifying the input to the first neural network. Optionally, the resolution of the image input to the first neural network is fixed; for example, the resolution of the image to be processed can be 384x384. In this embodiment, preprocessing simplifies the image and unifies its size, which helps the first neural network quickly extract useful feature information and avoids interference from background information.
In this embodiment, the first neural network is obtained by training in advance. The first neural network can include convolutional layers, deconvolution layers and pooling layers, and a pooling layer, an activation function layer and a batch normalization (BN) layer can be connected after each convolutional layer, where the order of the pooling, activation function and normalization layers is not limited. For example, there can be 4 convolutional layers and 5 deconvolution layers, with the 5 deconvolution layers connected after the 4 convolutional layers; the activation function can be, for example, a ReLU, PReLU or RReLU function.
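As a rough illustration of how the 4-conv / 5-deconv layout described above can preserve the 384x384 input resolution, the spatial size can be traced through the stages; the individual strides below are assumptions, not values given in the embodiment:

```python
def trace_resolution(size, pool_strides, deconv_strides):
    # Each conv+pool stage divides the spatial size by its pooling
    # stride; each deconvolution stage multiplies it by its stride.
    for s in pool_strides:
        size //= s
    for s in deconv_strides:
        size *= s
    return size

# 384x384 input, 4 conv+pool stages, 5 deconv stages (strides assumed):
# 384 -> 24 through the encoder, 24 -> 384 through the decoder.
out_size = trace_resolution(384, [2, 2, 2, 2], [2, 2, 2, 2, 1])
```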
S120, performing depth prediction on at least one local image of the image to be processed based on at least one second neural network, generating at least one local depth map.
In this embodiment, the topology of a second neural network can be the same as or different from that of the first neural network.
Optionally, there can be multiple second neural networks; the multiple second neural networks can have the same topology but different network parameters, obtained from different training samples. Different local images correspond to different second neural networks.
In this embodiment, the multiple local images can be generated by manually determining and cropping local image regions, or by automatically identifying the local image regions of the image to be processed and cropping them according to the recognition result. Optionally, the image to be processed is cropped based on a third neural network, generating the at least one local image.
Here, cropping refers to identifying the key information in the image to be processed and intercepting the regions where the key information is located, forming local images; a local image is an image containing local information of the image to be processed. For example, if the image to be processed is a person image, the key information can be limb information; if the image to be processed is a facial image, the key information can be facial feature information.
In this embodiment, the third neural network is obtained by training in advance. The third neural network can include convolutional layers, pooling layers and a fully connected layer, and an activation function layer and a batch normalization (BN) layer can be connected after each convolutional layer. For example, there can be 4 convolutional layers, 1 fully connected layer and 1 pooling layer, with the pooling layer and the fully connected layer connected in turn after the 4 convolutional layers; the activation function can be, for example, a ReLU, PReLU or RReLU function.
Optionally, cropping the image to be processed based on the third neural network to generate the at least one local image includes: performing feature recognition on the image to be processed based on the third neural network, generating a feature point image; determining local image regions according to the feature point image, and cropping the local image regions, generating the at least one local image.
In this embodiment, the key information in the image to be processed is characterized by the feature points in the feature point image. For example, multiple feature points can form the outline of the key information, or multiple feature points can cover the key information; the region enclosed by the lines connecting the feature points, or the region covered by the feature points, is determined as a local image region, and the local image region is cropped to generate a local image. Optionally, the feature points corresponding to different key information can differ; for example, different key information can be marked with feature points of different colors or shapes.
Optionally, the image to be processed is a facial image, and the local images include at least one of the following: a left-eye image, a right-eye image, a nose image and a mouth image. Optionally, the local images can also include a left-eyebrow image and a right-eyebrow image.
For example, referring to Figs. 2A, 2B and 2C: Fig. 2A is the facial image to be processed provided by embodiment one, where the facial image has been obtained through preprocessing; Fig. 2B is the global depth map generated from the image to be processed through the first neural network; Fig. 2C is a schematic diagram of the feature point image generated through the third neural network, where the dots in Fig. 2C are the feature points, located in the eye, nose, mouth and eyebrow regions. According to these feature points, the image to be processed can be cropped to generate multiple local images.
Optionally, before a local image is input to a second neural network, its resolution is adjusted to a preset resolution; for example, the preset resolution can be 384x384. Correspondingly, the resolution of the local depth map obtained by the depth prediction of the second neural network is restored to the initial resolution of the local image. In this embodiment, increasing the resolution of the local images helps improve the precision of the local depth maps.
Optionally, a second neural network includes a third predetermined number of convolutional layers and a fourth predetermined number of deconvolution layers, where a pooling layer, a normalization layer and an activation function layer are connected after each convolutional layer and the order of the pooling, activation function and normalization layers is not limited. For example, the third predetermined number can be 4 and the fourth predetermined number can be 5; the activation function can be, for example, a ReLU, PReLU or RReLU function.
It should be noted that steps S110 and S120 can be performed simultaneously; there is no required order between them.
S130, weighting the global depth map and the at least one local depth map according to preset fusion weights, generating a fused depth map.
In this embodiment, the global depth map and the local depth maps characterize depth information by gray value, and the same gray value corresponds to the same depth value. Weighting the global depth map and the at least one local depth map means weighting the gray value of each pixel of a local depth map with the corresponding gray value of the global depth map according to the preset fusion weights, determining the fused depth map. The preset fusion weights corresponding to different local images can be the same or different. For example, the left-eye local depth map is weighted pixel by pixel with the left-eye region of the global depth map, and the mouth local depth map is weighted pixel by pixel with the mouth region of the global depth map. Optionally, the preset fusion weights corresponding to these two local maps can be the same or different.
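The per-pixel weighted fusion described above can be sketched as follows; this is an illustrative sketch (the region offset and a single scalar weight per local map are assumptions), not the claimed implementation:

```python
def fuse_region(global_map, local_map, top, left, w_local):
    # Blend a local depth map into the matching region of the global
    # depth map with a preset fusion weight: w_local for the local map,
    # 1 - w_local for the global map. Pixels outside the region keep
    # the global prediction unchanged.
    fused = [row[:] for row in global_map]
    for i, row in enumerate(local_map):
        for j, d in enumerate(row):
            g = fused[top + i][left + j]
            fused[top + i][left + j] = w_local * d + (1 - w_local) * g
    return fused

g = [[10.0] * 4 for _ in range(4)]     # global prediction, depth 10
l = [[4.0, 4.0], [4.0, 4.0]]           # local prediction, depth 4
f = fuse_region(g, l, 1, 1, 2 / 3)     # local map weighted 2/3
```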
It should be noted that the precision of a local depth map is higher than the precision of the corresponding region of the global depth map. In this embodiment, by obtaining high-precision local depth maps and fusing them with the global depth map based on the preset fusion weights, a high-precision fused depth map is generated, solving the prior-art problem of low depth prediction accuracy; at the same time, depth prediction based on neural networks is an end-to-end process and is simple to operate.
Optionally, three-dimensional image reconstruction is performed according to the fused depth map, which can be applied to video conferencing, video telephony, virtual games, face recognition, or film and animation production, helping improve the definition of subsequent image or video production.
Optionally, the preset fusion weights are set in advance, and the method of setting them includes: performing depth prediction on test samples through the first neural network and the second neural networks, generating a test global depth map and at least one test local depth map; obtaining, respectively, a first test error of the region of the test global depth map corresponding to each test local depth map and a second test error of that test local depth map; and determining the preset fusion weights according to the first test error and the second test error.
In this embodiment, on the basis that the training of the first neural network and the second neural networks is complete, the test samples are input to the first neural network to obtain the test global depth map, and the local images of the test samples are input to their corresponding second neural networks to generate the corresponding test local depth maps. The first test error of the test global depth map is determined based on a standard global depth map; for example, the error of each pixel is determined by the difference between the gray values of corresponding pixels in the standard global depth map and the test global depth map, and the mean of the pixel errors is taken as the first test error. Optionally, the first test error is determined for the regions of the test global depth map corresponding to the local images. Similarly, the second test error of each test local depth map is determined based on a standard local depth map.
Optionally, the respective weights of the global depth map and the local depth maps are inversely proportional to their corresponding test errors. Taking the left-eye image as an example of how the preset fusion weights are set: the second test error and the first test error corresponding to the left-eye image determine the second depth error and the first depth error. For example, if the second depth error is 0.1 mm and the first depth error is 0.2 mm, the preset fusion weights of the corresponding region of the global depth map and of the left-eye depth map can be 1/3 and 2/3 respectively. Optionally, different fusion weights can be set for different regions of the global depth map.
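The inverse-proportionality rule above can be sketched by normalizing the reciprocals of the two test errors; normalizing reciprocals is one simple realization (an assumption for illustration, not mandated by the text), and the 0.1 mm / 0.2 mm values come from the example:

```python
def fusion_weights(err_global, err_local):
    # Weights inversely proportional to each map's test error,
    # normalized so they sum to 1.
    inv_g, inv_l = 1.0 / err_global, 1.0 / err_local
    total = inv_g + inv_l
    return inv_g / total, inv_l / total

# Global-region error 0.2 mm, left-eye local error 0.1 mm
# -> weights 1/3 (global) and 2/3 (local).
w_global, w_local = fusion_weights(0.2, 0.1)
```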
The preset fusion weights are determined through a large number of test samples; for example, the number of test samples can be 1900.
In this embodiment, by determining the preset fusion weights according to the test errors of the global depth map and the local depth maps, the weight of a high-precision depth map is increased while the weight of a low-precision depth map is reduced, further improving the depth prediction accuracy of the fused depth map.
In the technical scheme of this embodiment, depth prediction is performed on the image to be processed and on at least one local image of it through preset neural networks, generating a global depth map and local depth maps; the global depth map and the local depth maps are weighted and fused according to preset fusion weights, generating a high-accuracy fused depth map, which solves the prior-art problem of low depth prediction accuracy and achieves high-accuracy depth prediction for images.
On the basis of the above embodiment, before depth prediction is performed on the image to be processed, the first neural network, the third neural network and the second neural networks are established and trained. When the construction of the first, third and second neural networks is completed, network parameter initialization is performed on them, and the initialized first, third and second neural networks are trained by minimizing loss functions, where the initialized network parameters are set according to a one-dimensional Gaussian distribution.
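A minimal sketch of drawing initial network parameters from a one-dimensional Gaussian rather than a uniform random draw; the mean, standard deviation and parameter count are illustrative assumptions:

```python
import random

def init_params(n, mean=0.0, std=0.01, seed=0):
    # Draw n initial network parameters from a one-dimensional
    # Gaussian distribution (mean/std values are illustrative).
    rng = random.Random(seed)
    return [rng.gauss(mean, std) for _ in range(n)]

params = init_params(1000)
```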
Network parameter initialization refers to setting the initial network parameters of a neural network. In this embodiment, the initial network parameters of each neural network are set according to a one-dimensional Gaussian distribution instead of the random initialization of the prior art, which helps improve the training efficiency of the neural networks and avoids the slow convergence or non-convergence caused by random initialization.
Optionally, the training method of the first neural network includes: performing depth prediction on a first sample image through the first neural network to be trained, generating a first training image; generating a first loss function according to the first training image and the standard global depth map corresponding to the first sample image; and adjusting the network parameters of the first neural network to be trained according to the first loss function.
In this embodiment, the first sample images can be, for example, a large number of facial images; for example, 2500 color facial images, of which 1100 are male facial images and 1400 are female facial images. Optionally, the first sample images are unified in size.
In this embodiment, the standard global depth map can be set in advance, or it can be extracted during the training of the first neural network. For example, the depth information of a training image is extracted by a preset depth information extraction model; the preset model can be, for example, an HourGlass model obtained in advance. The first loss function characterizes the degree of inconsistency between the feature information of the training image generated by the network and the standard feature information of the standard global depth map; the smaller the value of the first loss function, generally the better the robustness of the first neural network. For example, the first loss function can take the form of mean squared error (MSE).
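A minimal sketch of the mean-squared-error form suggested for the first loss function, comparing a predicted depth map against the standard (ground-truth) depth map; the toy 2x2 maps are illustrative:

```python
def mse_loss(pred, target):
    # Mean squared error between a predicted depth map and the
    # standard depth map, averaged over all pixels.
    n = len(pred) * len(pred[0])
    return sum((p - t) ** 2
               for pr, tr in zip(pred, target)
               for p, t in zip(pr, tr)) / n

# One pixel off by 2 out of 4 pixels -> loss 4/4 = 1.0.
loss = mse_loss([[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 6.0]])
```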
In this embodiment, the gradient of the first loss function is back-propagated, and the network parameters of the first neural network are adjusted according to the first loss function. Optionally, the network parameters include, but are not limited to, weights and offsets.
Optionally, the training method of a second neural network includes: performing depth prediction on a second sample image through the second neural network to be trained, generating a second training image; generating a second loss function according to the second training image and the standard local depth map corresponding to the second sample image; and adjusting the network parameters of the second neural network to be trained according to the second loss function.
In this embodiment, the second sample images are set so that they match the first sample images. For example, if a second neural network is used to perform depth prediction on left-eye images, its second sample images are the left-eye local images corresponding to the first sample images. The standard local depth map can be set in advance, or it can be extracted during the training of the second neural network. The second loss function characterizes the degree of inconsistency between the feature information of the second training image generated by the network and the standard feature information of the standard local depth map; the smaller its value, generally the better the robustness of the second neural network. For example, the second loss function can be determined in the form of mean squared error.
In the present embodiment, the second loss function is backpropagated by gradient, and the network parameters of the second neural network are adjusted according to the second loss function. Optionally, the network parameters include, but are not limited to, weights and bias values.
Optionally, the training method of the third neural network includes: performing cropping processing on a third sample image through the third neural network to be trained to generate at least one training local image; obtaining the boundary coordinate information of the training local image and of the corresponding standard local image, respectively; determining a third loss function according to the boundary coordinate information of the training local image and the boundary coordinate information of the standard local image; and adjusting the network parameters of the third neural network according to the third loss function.
In the present embodiment, the third sample image may be identical to the first sample image, thereby reducing the number of samples to be collected. The third loss function characterizes the degree of inconsistency between the boundary coordinate information of the training local image generated by the neural network and the boundary coordinate information of the standard local image; exemplarily, the average error of the boundary pixel points may be determined as the value of the third loss function. The third loss function is backpropagated by gradient, and the network parameters of the third neural network are adjusted according to the third loss function. Optionally, the network parameters include, but are not limited to, weights and bias values.
Optionally, the first neural network and the third neural network form a multitask neural network. Exemplarily, referring to Fig. 2D, Fig. 2D is a schematic diagram of a multitask neural network provided in Embodiment 1 of the present invention. The front part of the multitask neural network includes a first predetermined number of convolutional layers for extracting the feature information of the input image; the rear part of the multitask neural network includes a first branch and a second branch. The first branch includes a second predetermined number of deconvolution layers for generating the global depth map of the input image; the second branch includes a pooling layer and a fully connected layer for performing cropping processing on the input image to generate at least one local image, where a pooling layer, a normalization layer and an activation function layer are connected after the convolutional layers, and an activation function layer is connected after the fully connected layer.
In the present embodiment, the multitask neural network performs depth prediction processing and cropping processing on the image to be processed simultaneously, generating the global depth map and at least one local image. Multiple tasks are thus completed at once, instead of each neural network completing only a single task, which simplifies the training process of the neural networks.
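The shared trunk with two branches could be sketched in PyTorch roughly as follows; the layer counts, channel widths and number of cropped regions are illustrative assumptions, not the patent's "predetermined numbers":

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Sketch of the multitask network: a shared convolutional trunk, a
    deconvolution branch emitting a global depth map, and a pooling +
    fully connected branch emitting crop boxes for local images."""
    def __init__(self, num_regions=4):
        super().__init__()
        # Front part: convolutional layers (with normalization and
        # activation function layers after each) extracting features.
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1),
            nn.BatchNorm2d(16), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1),
            nn.BatchNorm2d(32), nn.ReLU(),
        )
        # First branch: deconvolution layers for the global depth map.
        self.depth_branch = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
        )
        # Second branch: pooling + fully connected layer, one assumed
        # (x1, y1, x2, y2) crop box per region, then an activation layer.
        self.crop_branch = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 4 * num_regions), nn.Sigmoid(),
        )

    def forward(self, x):
        feats = self.trunk(x)
        return self.depth_branch(feats), self.crop_branch(feats)

net = MultiTaskNet()
depth, boxes = net(torch.randn(1, 3, 64, 64))
print(depth.shape, boxes.shape)  # (1, 1, 64, 64) and (1, 16)
```

Both branches are trained from the same trunk features, so the first and third loss functions can be backpropagated through shared parameters.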
Embodiment 2
Fig. 3 is a schematic structural diagram of an image depth prediction apparatus provided in Embodiment 2 of the present invention. The apparatus specifically includes:
a global depth map generation module 210, configured to perform depth prediction processing on an image to be processed based on a first neural network to generate a global depth map;
a partial depth map generation module 220, configured to perform depth prediction processing on at least one local image of the image to be processed based on at least one second neural network to generate at least one partial depth map; and
a fused depth map generation module 230, configured to perform weighting processing on the global depth map and the at least one partial depth map according to a preset fusion weight to generate a fused depth map.
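The weighting performed by the fused depth map generation module can be illustrated with a small NumPy sketch; the region format (top, left, height, width) and the per-region blending rule are assumptions for illustration:

```python
import numpy as np

def fuse_depth(global_depth, local_depth, region, weight):
    """Blend a partial depth map into the corresponding region of the
    global depth map using the preset fusion weight."""
    fused = np.asarray(global_depth, dtype=float).copy()
    top, left, h, w = region
    fused[top:top + h, left:left + w] = (
        weight * np.asarray(local_depth, dtype=float)
        + (1.0 - weight) * fused[top:top + h, left:left + w]
    )
    return fused

global_depth = np.ones((4, 4))
local_depth = np.full((2, 2), 3.0)   # e.g. a left-eye partial depth map
fused = fuse_depth(global_depth, local_depth, (1, 1, 2, 2), weight=0.5)
print(fused[1, 1])  # 0.5 * 3.0 + 0.5 * 1.0 = 2.0
```

Pixels outside all local regions keep the global prediction unchanged.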
Optionally, the apparatus further includes:
a feature point determining module, configured to perform feature recognition on the image to be processed based on a third neural network to generate a feature point image, before depth prediction processing is performed on the at least one local image of the image to be processed based on the at least one second neural network; and
a local image generation module, configured to determine local image regions according to the feature point image and perform cropping processing on the local image regions to generate at least one local image, where different image regions correspond to different feature point images.
Optionally, when the construction of the first neural network, the third neural network and the second neural network is completed, network parameter initialization is performed on the first neural network, the third neural network and the second neural network, and the initialized first neural network, third neural network and second neural network are trained in a minimum-loss-function manner, where the initialized network parameters are set according to a one-dimensional Gaussian distribution.
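A sketch of the Gaussian parameter initialization described above (the zero mean and the standard deviation value are assumptions; the patent only specifies a one-dimensional Gaussian distribution):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def init_param(shape, std=0.01):
    """Draw initial network parameters (weights or biases) from a
    zero-mean one-dimensional Gaussian with an assumed std of 0.01."""
    return rng.normal(loc=0.0, scale=std, size=shape)

w = init_param((3, 3))   # a small weight matrix
b = init_param((3,))     # a bias vector
print(w.shape, b.shape)  # (3, 3) (3,)
```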
Optionally, the preset fusion weight is set in advance, and a weight setting module includes:
a depth map determining unit, configured to process a test sample through the first neural network and the second neural network to generate a test global depth map and at least one test partial depth map;
a test error determining unit, configured to obtain a first test error of the test partial depth map and a second test error of the region of the test global depth map corresponding to the test partial depth map, respectively; and
a weight determining unit, configured to determine the preset fusion weight according to the first test error and the second test error.
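The patent does not fix how the two test errors map to a weight; one plausible rule, shown purely as an assumption, gives each map a weight proportional to the inverse of its test error:

```python
def fusion_weight(local_error, global_error):
    """Assumed rule: weight the partial depth map by the inverse of its
    test error, normalized against the global map's inverse error."""
    inv_local = 1.0 / local_error
    inv_global = 1.0 / global_error
    return inv_local / (inv_local + inv_global)

# Local predictions twice as accurate -> local map gets 2/3 of the weight.
w = fusion_weight(local_error=0.1, global_error=0.2)
print(round(w, 4))  # 0.6667
```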
Optionally, the first neural network and the third neural network form a multitask neural network. The front part of the multitask neural network includes a first predetermined number of convolutional layers for extracting the feature information of the input image;
the rear part of the multitask neural network includes a first branch and a second branch. The first branch includes a second predetermined number of deconvolution layers for generating the global depth map of the input image; the second branch includes a pooling layer and a fully connected layer for performing cropping processing on the input image to generate at least one local image, where a pooling layer, a normalization layer and an activation function layer are connected after the convolutional layers, and an activation function layer is connected after the fully connected layer.
Optionally, the second neural network includes a third predetermined number of convolutional layers and a fourth predetermined number of deconvolution layers, where a pooling layer, a normalization layer and an activation function layer are connected after the convolutional layers.
Optionally, the image to be processed is a facial image, and the local image includes at least one of the following: a left-eye image, a right-eye image, a nose image and a mouth image.
The image depth prediction apparatus provided in the embodiments of the present invention can perform the image depth prediction method provided in any embodiment of the present invention, and has the corresponding functional modules and beneficial effects for performing the image depth prediction method.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will appreciate that the invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, the present invention is not limited to the above embodiments; it may also include other equivalent embodiments without departing from the inventive concept, and the scope of the present invention is determined by the scope of the appended claims.
Claims (14)
- 1. An image depth prediction method, characterized by comprising: performing depth prediction processing on an image to be processed based on a first neural network to generate a global depth map; performing depth prediction processing on at least one local image of the image to be processed based on at least one second neural network to generate at least one partial depth map; and performing weighting processing on the global depth map and the at least one partial depth map according to a preset fusion weight to generate a fused depth map.
- 2. The method according to claim 1, characterized in that, before the depth prediction processing is performed on the at least one local image of the image to be processed based on the at least one second neural network, the method further comprises: performing feature recognition on the image to be processed based on a third neural network to generate a feature point image; and determining a local image region according to the feature point image and performing cropping processing on the local image region to generate at least one local image.
- 3. The method according to claim 2, characterized in that, when the construction of the first neural network, the third neural network and the second neural network is completed, network parameter initialization is performed on the first neural network, the third neural network and the second neural network, and the initialized first neural network, third neural network and second neural network are trained in a minimum-loss-function manner, wherein the initialized network parameters are set according to a one-dimensional Gaussian distribution.
- 4. The method according to claim 1, characterized in that the preset fusion weight is set in advance, and the setting method of the preset fusion weight comprises: performing depth prediction processing on a test sample based on the first neural network and the second neural network to generate a test global depth map and at least one test partial depth map; obtaining a first test error of the test partial depth map and a second test error of the region of the test global depth map corresponding to the test partial depth map, respectively; and determining the preset fusion weight according to the first test error and the second test error.
- 5. The method according to claim 2, characterized in that the first neural network and the third neural network form a multitask neural network, the front part of the multitask neural network comprising a first predetermined number of convolutional layers for extracting feature information of an input image; the rear part of the multitask neural network comprises a first branch and a second branch, the first branch comprising a second predetermined number of deconvolution layers for generating the global depth map of the input image, and the second branch comprising a pooling layer and a fully connected layer for performing cropping processing on the input image to generate at least one local image, wherein a pooling layer, a normalization layer and an activation function layer are connected after the convolutional layers, and an activation function layer is connected after the fully connected layer.
- 6. The method according to claim 1, characterized in that the second neural network comprises a third predetermined number of convolutional layers and a fourth predetermined number of deconvolution layers, wherein a pooling layer, a normalization layer and an activation function layer are connected after the convolutional layers.
- 7. The method according to any one of claims 1-6, characterized in that the image to be processed is a facial image, and the local image comprises at least one of the following: a left-eye image, a right-eye image, a nose image and a mouth image.
- 8. An image depth prediction apparatus, characterized by comprising: a global depth map generation module, configured to perform depth prediction processing on an image to be processed based on a first neural network to generate a global depth map; a partial depth map generation module, configured to perform depth prediction processing on at least one local image of the image to be processed based on at least one second neural network to generate at least one partial depth map; and a fused depth map generation module, configured to perform weighting processing on the global depth map and the at least one partial depth map according to a preset fusion weight to generate a fused depth map.
- 9. The apparatus according to claim 8, characterized in that the apparatus further comprises: a feature point determining module, configured to perform feature recognition on the image to be processed based on a third neural network to generate a feature point image, before depth prediction processing is performed on the at least one local image of the image to be processed based on the at least one second neural network; and a local image generation module, configured to determine a local image region according to the feature point image and perform cropping processing on the local image region to generate at least one local image.
- 10. The apparatus according to claim 9, characterized in that, when the construction of the first neural network, the third neural network and the second neural network is completed, network parameter initialization is performed on the first neural network, the third neural network and the second neural network, and the initialized first neural network, third neural network and second neural network are trained in a minimum-loss-function manner, wherein the initialized network parameters are set according to a one-dimensional Gaussian distribution.
- 11. The apparatus according to claim 8, characterized in that the preset fusion weight is set in advance, and a weight setting module comprises: a depth map determining unit, configured to perform depth prediction processing on a test sample through the first neural network and the second neural network to generate a test global depth map and at least one test partial depth map; a test error determining unit, configured to obtain a first test error of the test partial depth map and a second test error of the region of the test global depth map corresponding to the test partial depth map, respectively; and a weight determining unit, configured to determine the preset fusion weight according to the first test error and the second test error.
- 12. The apparatus according to claim 9, characterized in that the first neural network and the third neural network form a multitask neural network, the front part of the multitask neural network comprising a first predetermined number of convolutional layers for extracting feature information of an input image; the rear part of the multitask neural network comprises a first branch and a second branch, the first branch comprising a second predetermined number of deconvolution layers for generating the global depth map of the input image, and the second branch comprising a pooling layer and a fully connected layer for performing cropping processing on the input image to generate at least one local image, wherein a pooling layer, a normalization layer and an activation function layer are connected after the convolutional layers, and an activation function layer is connected after the fully connected layer.
- 13. The apparatus according to claim 8, characterized in that the second neural network comprises a third predetermined number of convolutional layers and a fourth predetermined number of deconvolution layers, wherein a pooling layer, a normalization layer and an activation function layer are connected after the convolutional layers.
- 14. The apparatus according to any one of claims 8-13, characterized in that the image to be processed is a facial image, and the local image comprises at least one of the following: a left-eye image, a right-eye image, a nose image and a mouth image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710811182.4A CN107578435B (en) | 2017-09-11 | 2017-09-11 | A kind of picture depth prediction technique and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710811182.4A CN107578435B (en) | 2017-09-11 | 2017-09-11 | A kind of picture depth prediction technique and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107578435A true CN107578435A (en) | 2018-01-12 |
CN107578435B CN107578435B (en) | 2019-11-29 |
Family
ID=61033100
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710811182.4A Active CN107578435B (en) | 2017-09-11 | 2017-09-11 | A kind of picture depth prediction technique and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107578435B (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109191514A (en) * | 2018-10-23 | 2019-01-11 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating depth detection model |
CN109829886A (en) * | 2018-12-25 | 2019-05-31 | 苏州江奥光电科技有限公司 | A kind of pcb board defect inspection method based on depth information |
WO2019149206A1 (en) * | 2018-02-01 | 2019-08-08 | 深圳市商汤科技有限公司 | Depth estimation method and apparatus, electronic device, program, and medium |
CN110309706A (en) * | 2019-05-06 | 2019-10-08 | 深圳市华付信息技术有限公司 | Face critical point detection method, apparatus, computer equipment and storage medium |
CN110363296A (en) * | 2019-06-28 | 2019-10-22 | 腾讯科技(深圳)有限公司 | Task model acquisition methods and device, storage medium and electronic device |
CN111414923A (en) * | 2020-03-05 | 2020-07-14 | 南昌航空大学 | Indoor scene three-dimensional reconstruction method and system based on single RGB image |
CN111428859A (en) * | 2020-03-05 | 2020-07-17 | 北京三快在线科技有限公司 | Depth estimation network training method and device for automatic driving scene and autonomous vehicle |
CN112488104A (en) * | 2020-11-30 | 2021-03-12 | 华为技术有限公司 | Depth and confidence estimation system |
CN116721143A (en) * | 2023-08-04 | 2023-09-08 | 南京诺源医疗器械有限公司 | Depth information processing device and method for 3D medical image |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103177440A (en) * | 2012-12-20 | 2013-06-26 | 香港应用科技研究院有限公司 | System and method of generating image depth map |
CN106204522A (en) * | 2015-05-28 | 2016-12-07 | 奥多比公司 | The combined depth of single image is estimated and semantic tagger |
CN107038429A (en) * | 2017-05-03 | 2017-08-11 | 四川云图睿视科技有限公司 | A kind of multitask cascade face alignment method based on deep learning |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103177440A (en) * | 2012-12-20 | 2013-06-26 | 香港应用科技研究院有限公司 | System and method of generating image depth map |
CN106204522A (en) * | 2015-05-28 | 2016-12-07 | 奥多比公司 | The combined depth of single image is estimated and semantic tagger |
CN107038429A (en) * | 2017-05-03 | 2017-08-11 | 四川云图睿视科技有限公司 | A kind of multitask cascade face alignment method based on deep learning |
Non-Patent Citations (2)
Title |
---|
DAVID EIGEN 等: "Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture", 《PROC.IEEE ICCV》 * |
JOSE M. FACIL 等: "Single-View and Multi-View Depth Fusion", 《IEEE ROBOTICS AND AUTOMATION LETTERS》 * |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20200049833A (en) * | 2018-02-01 | 2020-05-08 | 선전 센스타임 테크놀로지 컴퍼니 리미티드 | Depth estimation methods and apparatus, electronic devices, programs and media |
US11308638B2 (en) | 2018-02-01 | 2022-04-19 | Shenzhen Sensetime Technology Co., Ltd. | Depth estimation method and apparatus, electronic device, program, and medium |
WO2019149206A1 (en) * | 2018-02-01 | 2019-08-08 | 深圳市商汤科技有限公司 | Depth estimation method and apparatus, electronic device, program, and medium |
KR102295403B1 (en) | 2018-02-01 | 2021-08-31 | 선전 센스타임 테크놀로지 컴퍼니 리미티드 | Depth estimation method and apparatus, electronic device, program and medium |
CN109191514B (en) * | 2018-10-23 | 2020-11-24 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating a depth detection model |
CN109191514A (en) * | 2018-10-23 | 2019-01-11 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating depth detection model |
CN109829886A (en) * | 2018-12-25 | 2019-05-31 | 苏州江奥光电科技有限公司 | A kind of pcb board defect inspection method based on depth information |
CN110309706A (en) * | 2019-05-06 | 2019-10-08 | 深圳市华付信息技术有限公司 | Face critical point detection method, apparatus, computer equipment and storage medium |
CN110363296A (en) * | 2019-06-28 | 2019-10-22 | 腾讯科技(深圳)有限公司 | Task model acquisition methods and device, storage medium and electronic device |
CN110363296B (en) * | 2019-06-28 | 2022-02-08 | 腾讯医疗健康(深圳)有限公司 | Task model obtaining method and device, storage medium and electronic device |
CN111414923A (en) * | 2020-03-05 | 2020-07-14 | 南昌航空大学 | Indoor scene three-dimensional reconstruction method and system based on single RGB image |
CN111428859A (en) * | 2020-03-05 | 2020-07-17 | 北京三快在线科技有限公司 | Depth estimation network training method and device for automatic driving scene and autonomous vehicle |
CN111414923B (en) * | 2020-03-05 | 2022-07-12 | 南昌航空大学 | Indoor scene three-dimensional reconstruction method and system based on single RGB image |
CN112488104A (en) * | 2020-11-30 | 2021-03-12 | 华为技术有限公司 | Depth and confidence estimation system |
CN112488104B (en) * | 2020-11-30 | 2024-04-09 | 华为技术有限公司 | Depth and confidence estimation system |
CN116721143A (en) * | 2023-08-04 | 2023-09-08 | 南京诺源医疗器械有限公司 | Depth information processing device and method for 3D medical image |
CN116721143B (en) * | 2023-08-04 | 2023-10-20 | 南京诺源医疗器械有限公司 | Depth information processing device and method for 3D medical image |
Also Published As
Publication number | Publication date |
---|---|
CN107578435B (en) | 2019-11-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107578435B (en) | A kind of picture depth prediction technique and device | |
CN109255831B (en) | Single-view face three-dimensional reconstruction and texture generation method based on multi-task learning | |
CN113012293B (en) | Stone carving model construction method, device, equipment and storage medium | |
CN105427385B (en) | A kind of high-fidelity face three-dimensional rebuilding method based on multilayer deformation model | |
CN105404392B (en) | Virtual method of wearing and system based on monocular cam | |
CN110084304B (en) | Target detection method based on synthetic data set | |
CN108495110A (en) | A kind of virtual visual point image generating method fighting network based on production | |
WO2015188684A1 (en) | Three-dimensional model reconstruction method and system | |
CN109978984A (en) | Face three-dimensional rebuilding method and terminal device | |
CN107154032B (en) | A kind of image processing method and device | |
CN110223377A (en) | One kind being based on stereo visual system high accuracy three-dimensional method for reconstructing | |
CN110148217A (en) | A kind of real-time three-dimensional method for reconstructing, device and equipment | |
JP2008535116A (en) | Method and apparatus for three-dimensional rendering | |
CN110197462A (en) | A kind of facial image beautifies in real time and texture synthesis method | |
CN104809638A (en) | Virtual glasses trying method and system based on mobile terminal | |
CN110246209B (en) | Image processing method and device | |
CN110189202A (en) | A kind of three-dimensional virtual fitting method and system | |
CN116109798A (en) | Image data processing method, device, equipment and medium | |
CN108520510B (en) | No-reference stereo image quality evaluation method based on overall and local analysis | |
CN104599317A (en) | Mobile terminal and method for achieving 3D (three-dimensional) scanning modeling function | |
CN110517306A (en) | A kind of method and system of the binocular depth vision estimation based on deep learning | |
CN107578469A (en) | A kind of 3D human body modeling methods and device based on single photo | |
CN109218706A (en) | A method of 3 D visual image is generated by single image | |
CN107469355A (en) | Game image creation method and device, terminal device | |
CN113144613B (en) | Model-based method for generating volume cloud |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right |
Effective date of registration: 20221123 Address after: 518000 2nd floor, building a, Tsinghua campus, Shenzhen University Town, Xili street, Nanshan District, Shenzhen City, Guangdong Province Patentee after: Tsinghua Shenzhen International Graduate School Address before: 518000 Nanshan Zhiyuan 1001, Xue Yuan Avenue, Nanshan District, Shenzhen, Guangdong. Patentee before: TSINGHUA-BERKELEY SHENZHEN INSTITUTE |