CN107578435B - Image depth prediction method and apparatus - Google Patents
- Publication number: CN107578435B
- Application number: CN201710811182.4A
- Authority
- CN
- China
- Prior art keywords
- network
- image
- depth
- test
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The embodiments of the invention disclose an image depth prediction method and apparatus. The method includes: performing depth prediction processing on an image to be processed based on a first neural network to generate an overall depth map; performing depth prediction processing on at least one local image of the image to be processed based on at least one second neural network to generate at least one local depth map; and weighting the overall depth map and the at least one local depth map according to preset fusion weights to generate a fused depth map. The embodiments of the invention solve the prior-art problems of low image depth prediction accuracy and complicated operation, and achieve high-accuracy depth prediction for images.
Description
Technical field
The embodiments of the present invention relate to image processing technology, and in particular to an image depth prediction method and apparatus.
Background technique
Depth prediction is a common problem in the fields of computer vision and image processing. Depth information can be used to convey 3D (Three Dimensions, three-dimensional) information and, further, to solve machine vision tasks such as scene understanding and object recognition. Traditional approaches to depth information extraction generally require multiple input images, such as multi-view images for structure from motion, multi-view images for photometric stereo, and multi-focus images. Existing methods learn the correlation between 2D images and 3D images and then obtain the depth information of an image. However, real images cover a large number of different scenes, and the gap between 2D and 3D images is wide, resulting in low depth prediction accuracy and poor results.
Summary of the invention
The present invention provides an image depth prediction method and apparatus to achieve high-accuracy depth prediction for images.
In a first aspect, an embodiment of the invention provides an image depth prediction method, comprising:
performing depth prediction processing on an image to be processed based on a first neural network to generate an overall depth map;
performing depth prediction processing on at least one local image of the image to be processed based on at least one second neural network to generate at least one local depth map; and
weighting the overall depth map and the at least one local depth map according to preset fusion weights to generate a fused depth map.
Further, before performing depth prediction processing on the at least one local image of the image to be processed based on the at least one second neural network, the method further includes:
performing feature recognition on the image to be processed based on a third neural network to generate a feature point image; and
determining local image regions according to the feature point image, and cropping the local image regions to generate the at least one local image.
Further, when construction of the first neural network, the third neural network, and the second neural network is completed, network parameter initialization is performed on the first neural network, the third neural network, and the second neural network, and the initialized first, third, and second neural networks are trained by minimizing a loss function, wherein the initial network parameters are set according to a one-dimensional Gaussian distribution.
Further, the preset fusion weights are set in advance, and the method of setting them includes:
performing depth prediction processing on a test sample with the first neural network and the second neural network to generate a test overall depth map and at least one test local depth map;
obtaining the first test error of each test local depth map and the second test error of the corresponding region of the test overall depth map; and
determining the preset fusion weights according to the first test error and the second test error.
Further, the first neural network and the third neural network form a multi-task neural network. The front part of the multi-task neural network includes a first preset number of convolutional layers for extracting the feature information of an input image; the rear part includes a first branch and a second branch. The first branch includes a second preset number of deconvolution layers for generating the overall depth map of the input image; the second branch includes a pooling layer and a fully connected layer for cropping the input image to generate at least one local image, wherein each convolutional layer is followed by a pooling layer, a normalization layer, and an activation function layer, and the fully connected layer is followed by an activation function layer.
Further, the second neural network includes a third preset number of convolutional layers and a fourth preset number of deconvolution layers, wherein each convolutional layer is followed by a pooling layer, a normalization layer, and an activation function layer.
Further, the image to be processed is a face image, and the local image includes at least one of the following: a left-eye image, a right-eye image, a nose image, and a mouth image.
In a second aspect, an embodiment of the invention further provides an image depth prediction apparatus, comprising:
an overall depth map generation module, configured to perform depth prediction processing on an image to be processed based on a first neural network to generate an overall depth map;
a local depth map generation module, configured to perform depth prediction processing on at least one local image of the image to be processed based on at least one second neural network to generate at least one local depth map; and
a fused depth map generation module, configured to weight the overall depth map and the at least one local depth map according to preset fusion weights to generate a fused depth map.
Further, the apparatus also includes:
a feature point determination module, configured to perform feature recognition on the image to be processed based on a third neural network to generate a feature point image, before depth prediction processing is performed on the at least one local image of the image to be processed based on the at least one second neural network; and
a local image generation module, configured to determine local image regions according to the feature point image and crop the local image regions to generate the at least one local image.
Further, when construction of the first neural network, the third neural network, and the second neural network is completed, network parameter initialization is performed on the first neural network, the third neural network, and the second neural network, and the initialized first, third, and second neural networks are trained by minimizing a loss function, wherein the initial network parameters are set according to a one-dimensional Gaussian distribution.
Further, the preset fusion weights are set in advance, and a weight setting module includes:
a depth map determination unit, configured to perform depth prediction processing on a test sample with the first neural network and the second neural network to generate a test overall depth map and at least one test local depth map;
a test error determination unit, configured to obtain the first test error of each test local depth map and the second test error of the corresponding region of the test overall depth map; and
a weight determination unit, configured to determine the preset fusion weights according to the first test error and the second test error.
Further, the first neural network and the third neural network form a multi-task neural network. The front part of the multi-task neural network includes a first preset number of convolutional layers for extracting the feature information of an input image; the rear part includes a first branch and a second branch. The first branch includes a second preset number of deconvolution layers for generating the overall depth map of the input image; the second branch includes a pooling layer and a fully connected layer for cropping the input image to generate at least one local image, wherein each convolutional layer is followed by a pooling layer, a normalization layer, and an activation function layer, and the fully connected layer is followed by an activation function layer.
Further, the second neural network includes a third preset number of convolutional layers and a fourth preset number of deconvolution layers, wherein each convolutional layer is followed by a pooling layer, a normalization layer, and an activation function layer.
Further, the image to be processed is a face image, and the local image includes at least one of the following: a left-eye image, a right-eye image, a nose image, and a mouth image.
In the embodiments of the present invention, depth prediction processing is performed by preset neural networks on an image to be processed and on at least one local image of that image to generate an overall depth map and local depth maps, and the overall depth map and the local depth maps are weighted and fused according to preset fusion weights to generate a high-accuracy fused depth map. This solves the prior-art problem of low depth prediction accuracy and achieves high-accuracy depth prediction for images.
Brief description of the drawings
Fig. 1 is a flowchart of an image depth prediction method provided by Embodiment 1 of the present invention;
Fig. 2A is a face image to be processed provided by Embodiment 1 of the present invention;
Fig. 2B is the overall depth map generated from the image to be processed by the first neural network in Embodiment 1;
Fig. 2C is a schematic diagram of the feature point image generated from the image to be processed by the third neural network in Embodiment 1;
Fig. 2D is a schematic diagram of a multi-task neural network provided by Embodiment 1;
Fig. 3 is a structural schematic diagram of an image depth prediction apparatus provided by Embodiment 2 of the present invention.
Specific embodiments
The present invention is described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described here are intended only to explain the present invention, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment 1
Fig. 1 is a flowchart of an image depth prediction method provided by Embodiment 1 of the present invention. This embodiment is applicable to automatically performing high-accuracy depth prediction on an image. The method can be executed by an image depth prediction apparatus provided by an embodiment of the present invention, and the apparatus can be implemented in software and/or hardware. The method specifically includes:
S110. Perform depth prediction processing on an image to be processed based on a first neural network to generate an overall depth map.
Here, depth prediction processing refers to extracting depth information from the image to be processed. Depth information refers to the actual hierarchical information of each object in the image, or near-far distance information; an image with depth information has a sense of depth and three-dimensionality and a better visual effect.
The overall depth map is an image containing the depth information of the image to be processed. The overall depth map is a grayscale image in which the gray value of each pixel characterizes depth; illustratively, the larger the gray value of a pixel, the farther the actual object, and the smaller the gray value, the closer the actual object.
In this embodiment, before depth prediction processing, the image to be processed is preprocessed. Preprocessing includes enlarging, shrinking, or segmenting the image, where image segmentation refers to deleting the background of the image to be processed to simplify the input to the first neural network. Optionally, the resolution of the image input to the first neural network is fixed; illustratively, the resolution of the image to be processed may be 384x384. Preprocessing simplifies the image and unifies its size, which helps the first neural network rapidly extract useful feature information and avoids interference from background information.
In this embodiment, the first neural network is obtained by training in advance. The first neural network may include convolutional layers, deconvolution layers, and pooling layers, and each convolutional layer may be followed by a pooling layer, an activation function layer, and a batch normalization (BN) layer, where the connection order of the pooling, activation, and normalization layers is not limited. Illustratively, there may be 4 convolutional layers and 5 deconvolution layers, with the 5 deconvolution layers connected after the 4 convolutional layers; the activation function may be, for example, a ReLU, PReLU, or RReLU function.
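The spatial sizes produced by such a convolution/deconvolution stack can be checked with the standard output-size formulas. A brief sketch; the kernel size, stride, and padding values below are illustrative assumptions, not parameters given in this embodiment:

```python
def conv_out(size, kernel, stride, padding):
    # Output size of a convolution: floor((in + 2p - k) / s) + 1
    return (size + 2 * padding - kernel) // stride + 1

def deconv_out(size, kernel, stride, padding, output_padding=0):
    # Output size of a transposed convolution (deconvolution):
    # (in - 1) * s - 2p + k + output_padding
    return (size - 1) * stride - 2 * padding + kernel + output_padding

side = 384  # fixed input resolution used in this embodiment
side = conv_out(side, kernel=3, stride=2, padding=1)  # 384 -> 192
restored = deconv_out(side, kernel=3, stride=2, padding=1, output_padding=1)
print(side, restored)  # 192 384
```

Note that a stride-2 deconvolution needs an output padding of 1 here to restore the exact input size, a common detail when pairing downsampling and upsampling layers.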
S120. Perform depth prediction processing on at least one local image of the image to be processed based on at least one second neural network to generate at least one local depth map.
In this embodiment, the topology of the second neural network may be the same as or different from that of the first neural network.
Optionally, there may be multiple second neural networks, which may have the same topology but different network parameters, being trained on different training samples. Different local images correspond to different second neural networks.
In this embodiment, the multiple local images may be generated by manually determining and cropping local image regions, or by automatically recognizing the local image regions of the image to be processed and cropping them automatically according to the recognition result. Optionally, the image to be processed is cropped based on a third neural network to generate at least one local image.
Here, cropping refers to identifying key information in the image to be processed and extracting the regions where that key information is located to form local images; a local image is an image containing local information of the image to be processed. Illustratively, if the image to be processed is an image of a person, the key information may be limb information; if the image to be processed is a face image, the key information may be facial feature information.
In this embodiment, the third neural network is obtained by training in advance. The third neural network may include convolutional layers, pooling layers, and a fully connected layer, and each convolutional layer may be followed by an activation function layer and a batch normalization (BN) layer. Illustratively, there may be 4 convolutional layers, 1 fully connected layer, and 1 pooling layer, where the pooling layer and the fully connected layer are connected in turn after the 4 convolutional layers; the activation function may be, for example, a ReLU, PReLU, or RReLU function.
Optionally, cropping the image to be processed based on the third neural network to generate at least one local image includes: performing feature recognition on the image to be processed based on the third neural network to generate a feature point image; determining local image regions according to the feature point image; and cropping the local image regions to generate at least one local image.
In this embodiment, feature points in the feature point image characterize the key information in the image to be processed. Illustratively, multiple feature points may form the outline of a piece of key information, or multiple feature points may cover the key information; the region enclosed by the feature point connecting lines, or the region covered by the feature points, is determined as a local image region, which is then cropped to generate a local image. Optionally, the feature points corresponding to different pieces of key information may differ; illustratively, different key information may use feature points of different colors or shapes.
Optionally, the image to be processed is a face image, and the local images include at least one of the following: a left-eye image, a right-eye image, a nose image, and a mouth image. Optionally, the local images may also include a left-eyebrow image and a right-eyebrow image.
Illustratively, referring to Fig. 2A, Fig. 2B, and Fig. 2C: Fig. 2A is a face image to be processed provided by Embodiment 1 of the present invention, where the face image has been obtained by preprocessing; Fig. 2B is the overall depth map generated from the image to be processed by the first neural network; Fig. 2C is a schematic diagram of the feature point image generated from the image to be processed by the third neural network, where the dots in Fig. 2C are the feature points, and feature points are present in the eye region, nose region, mouth region, and eyebrow region. According to these feature points, the image to be processed can be cropped to generate multiple local images.
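As a minimal sketch of this cropping step, the bounding box of the feature points of one facial part can be taken, expanded by a small margin, and sliced out of the image array. The margin value and the landmark coordinates below are hypothetical, not values from the patent:

```python
import numpy as np

def crop_local_image(image, points, margin=4):
    """Crop the region enclosing a set of feature points, with a margin."""
    h, w = image.shape[:2]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, x1 = max(min(xs) - margin, 0), min(max(xs) + margin, w)
    y0, y1 = max(min(ys) - margin, 0), min(max(ys) + margin, h)
    return image[y0:y1, x0:x1]

# Hypothetical left-eye landmarks on a 384x384 face image
face = np.zeros((384, 384), dtype=np.uint8)
left_eye_points = [(120, 150), (150, 145), (145, 160)]
eye_patch = crop_local_image(face, left_eye_points)
print(eye_patch.shape)  # (23, 38)
```

Clamping the box to the image borders keeps the crop valid for landmarks near the edge of the face image.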
Optionally, before a local image is input to the second neural network, the resolution of the local image is adjusted to a preset resolution; illustratively, the preset resolution may be 384x384. Correspondingly, the resolution of the local depth map obtained by the depth prediction processing of the second neural network is restored to the initial resolution of the local image. In this embodiment, increasing the resolution of the local image helps improve the accuracy of the local depth map.
Optionally, the second neural network includes a third preset number of convolutional layers and a fourth preset number of deconvolution layers, where each convolutional layer is followed by a pooling layer, a normalization layer, and an activation function layer, the connection order of which is not limited. Illustratively, the third preset number may be 4 and the fourth preset number may be 5; the activation function may be, for example, a ReLU, PReLU, or RReLU function.
It should be noted that step S110 and step S120 can be performed simultaneously; there is no required order between them.
S130. Weight the overall depth map and the at least one local depth map according to preset fusion weights to generate a fused depth map.
In this embodiment, both the overall depth map and the local depth maps characterize depth information by gray values, and the same gray value corresponds to the same depth value. Weighting the overall depth map and the at least one local depth map means weighting the gray value of each pixel of a local depth map with the corresponding gray value in the overall depth map according to the preset fusion weights to determine the fused depth map. The preset fusion weights corresponding to different local images may be the same or different. Illustratively, a left-eye local depth map is weighted pixel-by-pixel with the left-eye region of the overall depth map, and a mouth local depth map is weighted pixel-by-pixel with the mouth region of the overall depth map. Optionally, the preset fusion weights corresponding to these two local depth maps may be the same or different.
It should be noted that the accuracy of a local depth map is higher than that of the corresponding region of the overall depth map. In this embodiment, high-accuracy local depth maps are obtained and fused with the overall depth map according to the preset fusion weights to generate a high-accuracy fused depth map, which solves the prior-art problem of low depth prediction accuracy. Moreover, depth prediction processing based on neural networks is an end-to-end processing method and is easy to operate.
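The weighting of S130 can be sketched as follows: within each facial region, the fused value is a weighted average of the local depth map and the corresponding patch of the overall depth map, while pixels outside all regions keep the overall depth map's value. The region coordinates and the 0.5 weight below are hypothetical:

```python
import numpy as np

def fuse_depth_maps(overall, local_maps):
    """local_maps: list of (top, left, local_depth_map, local_weight) tuples."""
    fused = overall.astype(float).copy()
    for top, left, local, weight in local_maps:
        h, w = local.shape
        region = fused[top:top + h, left:left + w]
        # Weighted average of the local depth map and the overall map's region
        fused[top:top + h, left:left + w] = weight * local + (1.0 - weight) * region
    return fused

overall = np.full((8, 8), 100.0)   # toy overall depth map
left_eye = np.full((2, 3), 130.0)  # toy local depth map for one region
fused = fuse_depth_maps(overall, [(2, 1, left_eye, 0.5)])
print(fused[2, 1], fused[0, 0])  # 115.0 100.0
```

Per-region weights can differ, so a more trusted local depth map can dominate its region of the fused result.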
Optionally, three-dimensional image reconstruction is performed according to the fused depth map, which can be applied to video conferencing, video telephony, virtual games, face recognition, or film and animation production, and helps improve the clarity of subsequent image or video production.
Optionally, the preset fusion weights are set in advance, and the method of setting them includes: performing depth prediction processing on a test sample with the first neural network and the second neural network to generate a test overall depth map and at least one test local depth map; obtaining the first test error of each test local depth map and the second test error of the corresponding region of the test overall depth map; and determining the preset fusion weights according to the first test error and the second test error.
In this embodiment, on the basis that training of the first neural network and the second neural networks is complete, the test sample is input to the first neural network to obtain a test overall depth map, and the local images of the test sample are input to the corresponding second neural networks to generate the corresponding test local depth maps. The second test error of the test overall depth map is determined based on a standard overall depth map; illustratively, the error of each pixel is determined as the difference between the gray values of corresponding pixels in the standard overall depth map and the test overall depth map, and the mean of the per-pixel errors is taken as the second test error. Optionally, the second test error is determined for the region of the test overall depth map corresponding to the local image. Similarly, the first test error of each test local depth map is determined based on the standard local depth map.
Optionally, the weight of the overall depth map and the weight of a local depth map are each inversely proportional to the corresponding test error. Illustratively, taking the left-eye image as an example, a second depth error and a first depth error are determined from the second test error and the first test error of the left-eye image; for example, if the second depth error is 0.1mm and the first depth error is 0.2mm, the preset fusion weights of the corresponding region of the overall depth map and of the left-eye depth map may be 2/3 and 1/3. Optionally, different regions of the overall depth map may be set to different fusion weights.
The preset fusion weights are determined over a large number of test samples; illustratively, the number of test samples may be 1900.
In this embodiment, determining the preset fusion weights according to the test errors of the overall depth map and the local depth maps increases the weight of the high-accuracy depth map while reducing the weight of the low-accuracy depth map, further improving the depth prediction accuracy of the fused depth map.
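Under the inverse-proportionality rule above, a pair of fusion weights can be obtained by normalizing the reciprocals of the two test errors, so the depth map with the smaller error receives the larger weight. A minimal sketch, reusing the 0.1mm/0.2mm errors from the illustration above:

```python
def fusion_weights(error_a, error_b):
    """Weights inversely proportional to the test errors, normalized to sum to 1."""
    inv_a, inv_b = 1.0 / error_a, 1.0 / error_b
    total = inv_a + inv_b
    return inv_a / total, inv_b / total

w_a, w_b = fusion_weights(0.1, 0.2)
print(round(w_a, 4), round(w_b, 4))  # 0.6667 0.3333
```

The 0.1mm-error map receives weight 2/3 and the 0.2mm-error map receives weight 1/3, matching the example in this embodiment.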
In the technical solution of this embodiment, depth prediction processing is performed by preset neural networks on the image to be processed and on at least one local image of the image to be processed to generate an overall depth map and local depth maps, and the overall depth map and the local depth maps are weighted and fused according to preset fusion weights to generate a high-accuracy fused depth map, which solves the prior-art problem of low depth prediction accuracy and achieves high-accuracy depth prediction for images.
On the basis of the above embodiment, before depth prediction processing is performed on the image to be processed, the first neural network, the third neural network, and the second neural network are built and trained. When construction of the first, third, and second neural networks is completed, network parameter initialization is performed on them, and the initialized first, third, and second neural networks are trained by minimizing a loss function, wherein the initial network parameters are set according to a one-dimensional Gaussian distribution.
Here, network parameter initialization refers to setting initial network parameters for a neural network. In this embodiment, the initial network parameters of each neural network are set according to a one-dimensional Gaussian distribution instead of the random initialization of the prior art, which helps improve training efficiency and avoids the slow or failed convergence that random initialization can cause.
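Such an initialization can be sketched by drawing every weight from a zero-mean Gaussian. The mean and standard deviation below are assumed values; the embodiment only specifies that a one-dimensional Gaussian distribution is used:

```python
import numpy as np

def init_weights(shape, mean=0.0, std=0.01, seed=0):
    """Draw initial network parameters from a one-dimensional Gaussian distribution."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc=mean, scale=std, size=shape)

# Initialize a 3x3 convolution kernel with 16 output and 8 input channels
w = init_weights((16, 8, 3, 3))
print(w.shape)  # (16, 8, 3, 3)
```

In practice the standard deviation is often scaled to the layer's fan-in; the fixed value here is only a placeholder.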
Optionally, the training method of the first neural network includes: performing depth prediction processing on a first sample image with the first neural network to be trained to generate a first training image; generating a first loss function from the first training image and the standard overall depth map corresponding to the first sample image; and adjusting the network parameters of the first neural network to be trained according to the first loss function.
In this embodiment, the first sample images may be, for example, a large number of face images; illustratively, they may include 2500 color face images, of which 1100 are male face images and 1400 are female face images. Optionally, the first sample images are unified in size.
In this embodiment, the standard overall depth map may be set in advance or extracted during the training of the first neural network. For example, the depth information of a training image is extracted by a preset depth map extraction model; illustratively, the preset depth map extraction model may be an HourGlass model, which is obtained in advance. The first loss function characterizes the degree of inconsistency between the feature information of the training image generated by the neural network and the standard feature information of the standard overall depth map; the smaller the value of the first loss function, the better the robustness of the first neural network usually is. Illustratively, the first loss function may take the form of a mean squared error (Mean Squared Error, MSE).
In this embodiment, the gradient of the first loss function is back-propagated, and the network parameters of the first neural network are adjusted according to the first loss function. Optionally, the network parameters include, but are not limited to, weights and offset values.
Optionally, the training method of the second neural network includes: performing depth prediction processing on a second sample image with the second neural network to be trained to generate a second training image; generating a second loss function from the second training image and the standard local depth map corresponding to the second sample image; and adjusting the network parameters of the second neural network to be trained according to the second loss function.
In this embodiment, second sample images are set, where the second sample images match the first sample images. Illustratively, if the second neural network is used to perform depth prediction processing on left-eye images, the second sample images are the left-eye local images corresponding to the first sample images. The standard local depth map may be set in advance or extracted during the training of the second neural network. The second loss function characterizes the degree of inconsistency between the feature information of the second training image generated by the neural network and the standard feature information of the standard local depth map; the smaller the value of the second loss function, the better the robustness of the second neural network usually is. Illustratively, the second loss function may take the form of a mean squared error.
In this embodiment, the gradient of the second loss function is back-propagated, and the network parameters of the second neural network are adjusted according to the second loss function. Optionally, the network parameters include, but are not limited to, weights and offset values.
Optionally, the training method of the third neural network includes: cropping a third sample image with the third neural network to be trained to generate at least one training local image; obtaining the boundary coordinate information of each training local image and of the corresponding standard local image; determining a third loss function from the boundary coordinate information of the training local image and that of the standard local image; and adjusting the network parameters of the third neural network according to the third loss function.
In this embodiment, the third sample images may be identical to the first sample images, reducing the number of samples to collect. The third loss function characterizes the degree of inconsistency between the boundary coordinate information of the training local image generated by the neural network and that of the standard local image; illustratively, the mean error over the boundary pixels may be taken as the value of the third loss function. The gradient of the third loss function is back-propagated, and the network parameters of the third neural network are adjusted according to the third loss function. Optionally, the network parameters include, but are not limited to, weights and offset values.
Optionally, the first neural network and the third neural network form a multi-task neural network. Illustratively, referring to Fig. 2D, Fig. 2D is a schematic diagram of a multi-task neural network provided by Embodiment 1 of the present invention. The front part of the multi-task neural network includes a first preset number of convolutional layers for extracting the feature information of the input image; the rear part includes a first branch and a second branch. The first branch includes a second preset number of deconvolution layers for generating the overall depth map of the input image; the second branch includes a pooling layer and a fully connected layer for cropping the input image to generate at least one local image, where each convolutional layer is followed by a pooling layer, a normalization layer, and an activation function layer, and the fully connected layer is followed by an activation function layer.
In the present embodiment, depth prediction processing is carried out to image to be processed simultaneously by multitask neural network and cuts out place
Reason generates overall depth figure and at least one topography, realizes and be completed at the same time multiple tasks, instead of a neural network
It can be only done a task, simplify the training process of neural network.
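The layer counts ("first preset number", "second preset number") and strides are left open in the patent, so the following is only a shape-level sketch of the shared-trunk, two-branch layout, with illustrative assumed values (3x3 convolutions at stride 1, 2x pooling after each convolution, 2x upsampling per deconvolution):

```python
# Spatial size after one conv (stride 1, padding `pad`) followed by 2x pooling.
def conv_pool(size, k=3, pad=1, pool=2):
    return ((size + 2 * pad - k) + 1) // pool

# Spatial size after one transposed-convolution (deconvolution) upsampling step.
def deconv(size, stride=2):
    return size * stride

# Trace the multitask network: shared conv trunk, then branch 1 (deconvs
# restoring a full-resolution depth map) and branch 2 (pooling + fully
# connected layer emitting crop boundary coordinates). n_conv/n_deconv/n_boxes
# are illustrative stand-ins for the patent's "preset numbers".
def multitask_shapes(input_size=224, n_conv=4, n_deconv=4, n_boxes=4):
    s = input_size
    for _ in range(n_conv):        # shared front part: conv + pool halves size
        s = conv_pool(s)
    trunk = s
    depth = trunk
    for _ in range(n_deconv):      # branch 1: deconvolutions restore resolution
        depth = deconv(depth)
    boxes = (n_boxes, 4)           # branch 2: one (x1, y1, x2, y2) box per crop
    return trunk, depth, boxes
```

With these assumed values a 224x224 input shrinks to a 14x14 trunk feature map, and the first branch's four deconvolutions bring the depth map back to 224x224, matching the input.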
Embodiment two
Fig. 3 is a structural schematic diagram of an image depth prediction apparatus provided by Embodiment Two of the present invention. The apparatus specifically includes:
a global depth map generation module 210, configured to perform depth prediction processing on an image to be processed based on a first neural network to generate a global depth map;
a local depth map generation module 220, configured to perform depth prediction processing on at least one local image of the image to be processed based on at least one second neural network to generate at least one local depth map;
a fusion depth map generation module 230, configured to weight the global depth map and the at least one local depth map according to a preset fusion weight to generate a fusion depth map.
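The weighting performed by module 230 can be sketched as pasting each local depth map back at its crop location and blending it with the corresponding region of the global map. A scalar per-region weight `w_local` is an assumption here; the patent only states that a preset fusion weight is applied:

```python
import numpy as np

# Fuse a global depth map with local depth maps at their crop locations.
# boxes: (x1, y1, x2, y2) crop boundaries matching each local map's size.
def fuse_depth(global_map, local_maps, boxes, w_local=0.6):
    fused = global_map.copy()
    for local, (x1, y1, x2, y2) in zip(local_maps, boxes):
        region = fused[y1:y2, x1:x2]
        # Weighted combination inside the crop; global values elsewhere.
        fused[y1:y2, x1:x2] = w_local * local + (1.0 - w_local) * region
    return fused
```

Outside the crop regions the fused map equals the global prediction; inside each region the local prediction dominates to the extent set by the fusion weight.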
Optionally, the apparatus further includes:
a feature point determining module, configured to perform feature identification on the image to be processed based on a third neural network to generate a feature point image, before depth prediction processing is performed on the at least one local image of the image to be processed based on the at least one second neural network;
a local image generation module, configured to determine local image regions according to the feature point image and to crop the local image regions to generate at least one local image, wherein different image regions correspond to different feature point images.
Optionally, when the construction of the first neural network, the third neural network and the second neural network is completed, network parameter initialization is performed on the first neural network, the third neural network and the second neural network, and the initialized first neural network, third neural network and second neural network are trained by minimizing their loss functions, wherein the initialized network parameters are set according to a one-dimensional Gaussian distribution.
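A minimal sketch of this initialization, drawing weights from a one-dimensional Gaussian distribution (the standard deviation of 0.01 and zero-initialized biases are assumptions; the patent specifies only the Gaussian form):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

# Initialize one layer's parameters: Gaussian weights, zero biases.
def init_layer(fan_in, fan_out, std=0.01):
    w = rng.normal(loc=0.0, scale=std, size=(fan_in, fan_out))
    b = np.zeros(fan_out)
    return w, b
```

Each layer of the three networks would be initialized this way before the loss-minimization training described above begins.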
Optionally, the preset fusion weight is set in advance, and the weight setting module includes:
a depth map determining unit, configured to process a test sample through the first neural network and the second neural network to generate a test global depth map and at least one test local depth map;
a test error determining unit, configured to obtain, respectively, a first test error of the test local depth map and a second test error of the region of the test global depth map corresponding to the test local depth map;
a weight determining unit, configured to determine the preset fusion weight according to the first test error and the second test error.
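The patent does not state the rule the weight determining unit uses to turn the two test errors into a fusion weight. One plausible, purely assumed rule is inverse-error normalization, which assigns more weight to whichever depth map had the lower test error on the shared region:

```python
# Assumed weighting rule (not specified in the patent): normalize the
# inverse errors so the lower-error source receives the larger weight.
# e1: first test error (local depth map); e2: second test error (global
# depth map on the corresponding region). Returns the local map's weight.
def fusion_weight(e1, e2, eps=1e-8):
    inv1, inv2 = 1.0 / (e1 + eps), 1.0 / (e2 + eps)
    return inv1 / (inv1 + inv2)
```

Equal errors yield a weight of 0.5; a local error three times smaller than the global error pushes the local weight toward 0.75.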
Optionally, the first neural network and the third neural network form a multitask neural network. The front part of the multitask neural network includes a first preset number of convolutional layers for extracting the feature information of the input image;
the rear part of the multitask neural network includes a first branch and a second branch. The first branch includes a second preset number of deconvolution layers for generating the global depth map of the input image; the second branch includes a pooling layer and a fully connected layer for cropping the input image to generate at least one local image, wherein each convolutional layer is followed by a pooling layer, a normalization layer and an activation function layer, and the fully connected layer is followed by an activation function layer.
Optionally, the second neural network includes a third preset number of convolutional layers and a fourth preset number of deconvolution layers, wherein each convolutional layer is followed by a pooling layer, a normalization layer and an activation function layer.
Optionally, the image to be processed is a face image, and the local images include at least one of the following: a left-eye image, a right-eye image, a nose image and a mouth image.
The image depth prediction apparatus provided in the embodiments of the present invention can perform the image depth prediction method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to performing that method.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will appreciate that the invention is not limited to the specific embodiments described herein; various obvious changes, readjustments and substitutions can be made by those skilled in the art without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, the present invention is not limited to the above embodiments only, and may include more other equivalent embodiments without departing from the inventive concept, the scope of the invention being determined by the scope of the appended claims.
Claims (10)
1. An image depth prediction method, characterized by comprising:
performing depth prediction processing on an image to be processed based on a first neural network to generate a global depth map;
performing feature identification on the image to be processed based on a third neural network to generate a feature point image;
determining local image regions according to the feature point image, and cropping the local image regions to generate at least one local image;
performing depth prediction processing on the at least one local image of the image to be processed based on at least one second neural network to generate at least one local depth map;
weighting the global depth map and the at least one local depth map according to a preset fusion weight to generate a fusion depth map;
wherein the first neural network and the third neural network form a multitask neural network, a front part of the multitask neural network comprising a first preset number of convolutional layers for extracting feature information of an input image;
a rear part of the multitask neural network comprises a first branch and a second branch, the first branch comprising a second preset number of deconvolution layers for generating the global depth map of the input image, and the second branch comprising a pooling layer and a fully connected layer for cropping the input image to generate the at least one local image, wherein each convolutional layer is followed by a pooling layer, a normalization layer and an activation function layer, and the fully connected layer is followed by an activation function layer.
2. The method according to claim 1, characterized in that when the construction of the first neural network, the third neural network and the second neural network is completed, network parameter initialization is performed on the first neural network, the third neural network and the second neural network, and the initialized first neural network, third neural network and second neural network are trained by minimizing their loss functions, wherein the initialized network parameters are set according to a one-dimensional Gaussian distribution.
3. The method according to claim 1, characterized in that the preset fusion weight is set in advance, and the method of setting the preset fusion weight comprises:
performing depth prediction processing on a test sample based on the first neural network and the second neural network to generate a test global depth map and at least one test local depth map;
obtaining, respectively, a first test error of the test local depth map and a second test error of the region of the test global depth map corresponding to the test local depth map;
determining the preset fusion weight according to the first test error and the second test error.
4. The method according to claim 1, characterized in that the second neural network comprises a third preset number of convolutional layers and a fourth preset number of deconvolution layers, wherein each convolutional layer is followed by a pooling layer, a normalization layer and an activation function layer.
5. The method according to any one of claims 1-4, characterized in that the image to be processed is a face image, and the local image comprises at least one of the following: a left-eye image, a right-eye image, a nose image and a mouth image.
6. An image depth prediction apparatus, characterized by comprising:
a global depth map generation module, configured to perform depth prediction processing on an image to be processed based on a first neural network to generate a global depth map;
a feature point determining module, configured to perform feature identification on the image to be processed based on a third neural network to generate a feature point image;
a local image generation module, configured to determine local image regions according to the feature point image, and to crop the local image regions to generate at least one local image;
a local depth map generation module, configured to perform depth prediction processing on the at least one local image of the image to be processed based on at least one second neural network to generate at least one local depth map;
a fusion depth map generation module, configured to weight the global depth map and the at least one local depth map according to a preset fusion weight to generate a fusion depth map;
wherein the first neural network and the third neural network form a multitask neural network, a front part of the multitask neural network comprising a first preset number of convolutional layers for extracting feature information of an input image;
a rear part of the multitask neural network comprises a first branch and a second branch, the first branch comprising a second preset number of deconvolution layers for generating the global depth map of the input image, and the second branch comprising a pooling layer and a fully connected layer for cropping the input image to generate the at least one local image, wherein each convolutional layer is followed by a pooling layer, a normalization layer and an activation function layer, and the fully connected layer is followed by an activation function layer.
7. The apparatus according to claim 6, characterized in that when the construction of the first neural network, the third neural network and the second neural network is completed, network parameter initialization is performed on the first neural network, the third neural network and the second neural network, and the initialized first neural network, third neural network and second neural network are trained by minimizing their loss functions, wherein the initialized network parameters are set according to a one-dimensional Gaussian distribution.
8. The apparatus according to claim 6, characterized in that the preset fusion weight is set in advance, and the weight setting module comprises:
a depth map determining unit, configured to perform depth prediction processing on a test sample through the first neural network and the second neural network to generate a test global depth map and at least one test local depth map;
a test error determining unit, configured to obtain, respectively, a first test error of the test local depth map and a second test error of the region of the test global depth map corresponding to the test local depth map;
a weight determining unit, configured to determine the preset fusion weight according to the first test error and the second test error.
9. The apparatus according to claim 6, characterized in that the second neural network comprises a third preset number of convolutional layers and a fourth preset number of deconvolution layers, wherein each convolutional layer is followed by a pooling layer, a normalization layer and an activation function layer.
10. The apparatus according to any one of claims 6-9, characterized in that the image to be processed is a face image, and the local image comprises at least one of the following: a left-eye image, a right-eye image, a nose image and a mouth image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710811182.4A CN107578435B (en) | 2017-09-11 | 2017-09-11 | A kind of picture depth prediction technique and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710811182.4A CN107578435B (en) | 2017-09-11 | 2017-09-11 | A kind of picture depth prediction technique and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107578435A CN107578435A (en) | 2018-01-12 |
CN107578435B true CN107578435B (en) | 2019-11-29 |
Family
ID=61033100
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710811182.4A Active CN107578435B (en) | 2017-09-11 | 2017-09-11 | A kind of picture depth prediction technique and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107578435B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108335322B (en) | 2018-02-01 | 2021-02-12 | 深圳市商汤科技有限公司 | Depth estimation method and apparatus, electronic device, program, and medium |
CN109191514B (en) * | 2018-10-23 | 2020-11-24 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating a depth detection model |
CN109829886A (en) * | 2018-12-25 | 2019-05-31 | 苏州江奥光电科技有限公司 | A kind of pcb board defect inspection method based on depth information |
CN110309706B (en) * | 2019-05-06 | 2023-05-12 | 深圳华付技术股份有限公司 | Face key point detection method and device, computer equipment and storage medium |
CN110363296B (en) * | 2019-06-28 | 2022-02-08 | 腾讯医疗健康(深圳)有限公司 | Task model obtaining method and device, storage medium and electronic device |
CN111428859A (en) * | 2020-03-05 | 2020-07-17 | 北京三快在线科技有限公司 | Depth estimation network training method and device for automatic driving scene and autonomous vehicle |
CN111414923B (en) * | 2020-03-05 | 2022-07-12 | 南昌航空大学 | Indoor scene three-dimensional reconstruction method and system based on single RGB image |
CN112488104B (en) * | 2020-11-30 | 2024-04-09 | 华为技术有限公司 | Depth and confidence estimation system |
CN116721143B (en) * | 2023-08-04 | 2023-10-20 | 南京诺源医疗器械有限公司 | Depth information processing device and method for 3D medical image |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103177440A (en) * | 2012-12-20 | 2013-06-26 | 香港应用科技研究院有限公司 | System and method of generating image depth map |
CN106204522A (en) * | 2015-05-28 | 2016-12-07 | 奥多比公司 | The combined depth of single image is estimated and semantic tagger |
CN107038429A (en) * | 2017-05-03 | 2017-08-11 | 四川云图睿视科技有限公司 | A kind of multitask cascade face alignment method based on deep learning |
Non-Patent Citations (2)
Title |
---|
David Eigen et al.; "Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture"; Proc. IEEE ICCV; 2015; pp. 2650-2658 *
Jose M. Facil et al.; "Single-View and Multi-View Depth Fusion"; IEEE Robotics and Automation Letters; 2017; vol. 2, no. 4; pp. 1994-2001 *
Also Published As
Publication number | Publication date |
---|---|
CN107578435A (en) | 2018-01-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107578435B (en) | A kind of picture depth prediction technique and device | |
CN109255831B (en) | Single-view face three-dimensional reconstruction and texture generation method based on multi-task learning | |
CN110222628A (en) | A kind of face restorative procedure based on production confrontation network | |
CN108495110A (en) | A kind of virtual visual point image generating method fighting network based on production | |
CN110807364B (en) | Modeling and capturing method and system for three-dimensional face and eyeball motion | |
CN107609383A (en) | 3D face identity authentications and device | |
JP2008535116A (en) | Method and apparatus for three-dimensional rendering | |
CN107633165A (en) | 3D face identity authentications and device | |
CN107748869A (en) | 3D face identity authentications and device | |
CN103608847B (en) | A kind of method and apparatus built for iconic model | |
CN109978984A (en) | Face three-dimensional rebuilding method and terminal device | |
CN110197462A (en) | A kind of facial image beautifies in real time and texture synthesis method | |
CN116109798B (en) | Image data processing method, device, equipment and medium | |
EP1150254A3 (en) | Methods for creating an image for a three-dimensional display, for calculating depth information, and for image processing using the depth information | |
KR101759188B1 (en) | the automatic 3D modeliing method using 2D facial image | |
CN109598210A (en) | A kind of image processing method and device | |
CN110909634A (en) | Visible light and double infrared combined rapid in vivo detection method | |
CN111833236A (en) | Method and device for generating three-dimensional face model simulating user | |
CN110175505A (en) | Determination method, apparatus, storage medium and the electronic device of micro- expression type | |
CN106909904B (en) | Human face obverse method based on learnable deformation field | |
Beacco et al. | Automatic 3d character reconstruction from frontal and lateral monocular 2d rgb views | |
CN109218706A (en) | A method of 3 D visual image is generated by single image | |
CN105872516A (en) | Method and device for obtaining parallax parameters of three-dimensional film source | |
CN110602476B (en) | Hole filling method of Gaussian mixture model based on depth information assistance | |
CN116630508A (en) | 3D model processing method and device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
Effective date of registration: 20221123 Address after: 518000 2nd floor, building a, Tsinghua campus, Shenzhen University Town, Xili street, Nanshan District, Shenzhen City, Guangdong Province Patentee after: Shenzhen International Graduate School of Tsinghua University Address before: 518000 Nanshan Zhiyuan 1001, Xue Yuan Avenue, Nanshan District, Shenzhen, Guangdong. Patentee before: TSINGHUA-BERKELEY SHENZHEN INSTITUTE |