CN108305229A - A multi-view reconstruction method based on a deep-learning silhouette network - Google Patents
A multi-view reconstruction method based on a deep-learning silhouette network Download PDF Info
- Publication number
- CN108305229A (application CN201810081726.0A)
- Authority
- CN
- China
- Prior art keywords
- silhouette
- network
- shape
- dimensional
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G06T5/00 — Image enhancement or restoration
- G06N3/02 Neural networks; G06N3/04 Architecture, e.g. interconnection topology; G06N3/045 Combinations of networks
- G06N3/08 Learning methods; G06N3/084 Backpropagation, e.g. using gradient descent
- G06T7/00 Image analysis; G06T7/10 Segmentation; edge detection; G06T7/13 Edge detection
- G06T2207/10004 Still image; photographic image; G06T2207/10012 Stereo images
- G06T2207/20081 Training; learning
- G06T2207/20084 Artificial neural networks [ANN]
Abstract
The present invention proposes a multi-view reconstruction method based on a deep-learning silhouette network. Its main contents are: introducing the deep learning framework, three-dimensional shape encoding, building the silhouette network, and network training and testing. The process is as follows: a deep-learning silhouette network (SilNet) is introduced; the network learns a three-dimensional shape encoding of one or more input images, and this encoding is then used to condition a decoder that generates new views. A silhouette-based proxy loss is then introduced: since the decoder contains no three-dimensional representation, the network uses this two-dimensional loss to encode three-dimensional shape, and the loss is not limited by the resolution of a three-dimensional representation. A large blobby-object dataset is generated, the network is pre-trained on it, and the silhouette network is then fine-tuned on the sculpture dataset. The present invention uses a neural network to learn three-dimensional shape: generating silhouettes in new views forces the network to encode three-dimensional shape and to combine the information of multiple views efficiently, improving multi-view reconstruction performance.
Description
Technical field
The present invention relates to the field of view reconstruction, and more particularly to a multi-view reconstruction method based on a deep-learning silhouette network.
Background technology
Multi-view reconstruction recovers a three-dimensional model of a scene from pictures of the scene taken from different viewing angles; multi-view three-dimensional reconstruction of natural scenes has always been a basic problem of computer vision. Once a three-dimensional model of a target has been reconstructed, the information related to the target can be analyzed and processed quantitatively. Multi-view reconstruction is applied in medical imaging: on the one hand, biomedical imaging can obtain structural or functional information for biological research or clinical diagnosis; on the other hand, reconstruction can provide a model of human anatomy from which the best therapeutic scheme can be worked out before surgery. In military applications, information gathered in advance by simple means such as aerial photography can be turned, through multi-view reconstruction, into a three-dimensional model of the battlefield, helping to seize the initiative, control the situation, deploy forces, and plan strategy. As society grows ever more complex and criminal cases multiply, the situations faced in police casework also grow more complex; with multi-view reconstruction, a crime scene can be reconstructed in three dimensions and analyzed by simulation. A three-dimensional crime-scene animation and reconstruction analysis system lets investigators rebuild the scene from plan sketches, scene photographs, and verbal descriptions of the course of the crime; in addition, cameras, objects, environments, animals, and human figures can be designed and simulated in animation, so that crime scenes and the occurrence, course, and outcome of events can be simulated and reproduced. Through the advanced rendering functions of such a system, high-fidelity three-dimensional scene pictures and animations can be produced; combining these pictures and animations with sound and text yields multimedia video and imaging material of virtual crime scenes and case processes for investigative, forensic, and command personnel. Although there is much research on multi-view reconstruction, combining the information from multiple views and correspondingly improving reconstruction performance remains challenging when the baseline between the views is wide and the illumination conditions are complex.
The present invention proposes a multi-view reconstruction method based on a deep-learning silhouette network. A deep-learning silhouette network is introduced; the network learns a three-dimensional shape encoding of one or more input images, and this encoding is then used to condition a decoder that generates new views. A silhouette-based proxy loss is then introduced: since the decoder contains no three-dimensional representation, the network uses this two-dimensional loss to encode three-dimensional shape, and the loss is not limited by the resolution of a three-dimensional representation. A large blobby-object dataset is generated and the network is pre-trained on it, after which the silhouette network is fine-tuned on the sculpture dataset. By using a neural network to learn three-dimensional shape, generating silhouettes in new views forces the network to encode three-dimensional shape and to combine the information of multiple views efficiently, improving multi-view reconstruction performance.
Summary of the invention
For view reconstruction, the present invention proposes a multi-view reconstruction method based on a deep-learning silhouette network: a deep-learning silhouette network is introduced; the network learns a three-dimensional shape encoding of one or more input images; this encoding is then used to condition a decoder that generates new views; a silhouette-based proxy loss is introduced, and since the decoder contains no three-dimensional representation, the network uses this two-dimensional loss to encode three-dimensional shape, the loss not being limited by the resolution of a three-dimensional representation; a large blobby-object dataset is generated, the network is pre-trained on it, and the silhouette network is then fine-tuned.
To solve the above problems, the present invention proposes a multi-view reconstruction method based on a deep-learning silhouette network whose main contents include:
(1) introducing the deep learning framework;
(2) three-dimensional shape encoding;
(3) building the silhouette network;
(4) network training and testing.
The introduction of the deep learning framework brings in a blobby-object dataset and a dataset of real sculptures. The blobby-object dataset is used for pre-training; the sculpture dataset demonstrates that silhouettes can be used to learn and encode three-dimensional shape and to generate new silhouette views of sculptures of varied shape and material. To predict the three-dimensional shape of multiple images and handle smooth three-dimensional surfaces and sculpture textures, a deep-learning silhouette network (SilNet) is introduced. The network learns a three-dimensional shape encoding of one or more input images; this encoding is then used to condition a decoder that generates new views. A silhouette-based proxy loss is introduced: since the decoder contains no three-dimensional representation, the network uses this two-dimensional loss to encode three-dimensional shape, and the loss is not limited by the resolution of a three-dimensional representation. A large blobby-object dataset is generated, the network is pre-trained on it, and SilNet is then fine-tuned on the sculpture dataset.
Further, the blobby-object dataset consists of smooth surfaces created from implicit surfaces. It contains 11,706 blobby objects, split into training, evaluation, and test sets in a 75:10:15 ratio, with five images per object; the deep-learning silhouette network must learn to find an encoding of three-dimensional shape. Images are rendered by orthographic projection in Blender as follows: first, three light sources are placed at random around the object; second, the camera is rotated about the z-axis, with each render angle θ chosen at random from [0°, 120°]; finally, a complex texture model is used to ensure subsurface scattering and specular reflection.
Further, for the real sculpture dataset, a new dataset of 307 real sculptures is compiled. As with the blobby-object dataset, it is split into training, evaluation, and test sets in a 75:10:15 ratio. Whereas the blobby objects are rendered with five views each, the sculpture images are provided with no strict constraint on viewing direction.
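The 75:10:15 partition applied to both datasets can be sketched as follows; this is a minimal illustration, and the function name and fixed seed are our own choices, not part of the patent:

```python
import random

def split_dataset(items, ratios=(75, 10, 15), seed=0):
    """Shuffle items and split them into train/eval/test partitions
    according to the given integer ratios (here 75:10:15)."""
    rng = random.Random(seed)
    items = list(items)
    rng.shuffle(items)
    total = sum(ratios)
    n = len(items)
    n_train = n * ratios[0] // total
    n_eval = n * ratios[1] // total
    train = items[:n_train]
    evaluation = items[n_train:n_train + n_eval]
    test = items[n_train + n_eval:]
    return train, evaluation, test

# e.g. the 11706 blobby objects used for pre-training
train, evaluation, test = split_dataset(range(11706))
```

Because each object falls into exactly one partition, objects seen at test time are never seen during training, matching the per-object split described later in the text.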
Further, the loss function is as follows. Given a set of angles θ1…θN and a set of images I1…IN, let S denote the ground-truth silhouette, with Sx,y ∈ {0, 1}, where 0 denotes object and 1 denotes non-object. A function gθ′ is learned that generates a silhouette Ŝ at angle θ′. A per-pixel binary cross-entropy loss L minimizes the difference between the ground-truth silhouette and the predicted silhouette; it is given by
L(S, Ŝ) = −∑x,y [Sx,y log Ŝx,y + (1 − Sx,y) log(1 − Ŝx,y)]
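A minimal pure-Python sketch of this per-pixel binary cross-entropy, assuming predicted silhouette values lie in (0, 1); the clamping constant eps is our own numerical-safety addition, not part of the patent:

```python
import math

def silhouette_bce(S, S_hat, eps=1e-7):
    """Per-pixel binary cross-entropy between a ground-truth
    silhouette S (values in {0, 1}) and a predicted silhouette
    S_hat (values in (0, 1)), summed over all pixels."""
    total = 0.0
    for s_row, p_row in zip(S, S_hat):
        for s, p in zip(s_row, p_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total -= s * math.log(p) + (1 - s) * math.log(1 - p)
    return total

S = [[0, 0], [1, 1]]             # 0 = object, 1 = non-object, as in the text
S_hat = [[0.1, 0.2], [0.9, 0.8]]  # hypothetical decoder output
loss = silhouette_bce(S, S_hat)
```

Because the loss is computed on a two-dimensional silhouette, its cost does not grow with the resolution of any three-dimensional representation.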
For three-dimensional shape encoding, given one or more images, the task is to generate a new view of the three-dimensional object: the network is required to predict the object silhouette at a newly supplied viewpoint angle. To perform this task, the network must encode three-dimensional shape. Because the task concentrates on silhouettes, the network does not have to learn to predict image intensities, so the learning process is easier, and no geometric or voxel representation is needed during training. Geometrically, if the network can predict the silhouettes of multiple views, it must have encoded the silhouette of the three-dimensional object in the given images; SilNet therefore encodes three-dimensional shape. This encoding can then be required to create silhouettes of new objects at new viewpoints, or the learned three-dimensional shape can be probed by extracting it.
To build the silhouette network so that silhouettes can be generated from multiple views, SilNet uses an encoder-decoder design. It is built from an encoder f, replicated as many times as there are views; since all encoder copies share parameters, memory does not grow with the number of views. The encoder outputs are combined in a pooling layer φ that max-pools the feature vector of each module to learn a combined feature vector; this allows SilNet to handle multiple images while the pooling layer retains the important features of each image. A decoder gθ′ upsamples from the feature vector and generates a two-dimensional silhouette image at the new viewpoint θ′. Alternatively, the learned feature vector can pass through a three-dimensional decoder that produces a hidden three-dimensional shape representation, which a projection layer projects to a two-dimensional silhouette image. Each image and its angle θ are encoded in a separate encoder module; images are resized to 112 × 112, and θ is encoded as (sin θ, cos θ) to represent the circular distribution of angles. These θ values pass through two fully connected layers and are concatenated to the corresponding module. In the decoder, the feature vector is upsampled, followed by a pixel-wise sigmoid, and an additional convolutional layer is added after the last two upsampling layers.
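The max-pooling combination of per-view feature vectors and the (sin θ, cos θ) angle encoding can be sketched as follows; the toy feature vectors are stand-ins for outputs of the shared encoder, not real network activations:

```python
import math

def encode_angle(theta_deg):
    """Encode a render angle as (sin θ, cos θ), as SilNet does,
    so that the circular structure of the angle is preserved."""
    t = math.radians(theta_deg)
    return (math.sin(t), math.cos(t))

def pool_features(feature_vectors):
    """Combine per-view feature vectors by element-wise max pooling.
    Because every view runs through the same shared encoder, any
    number of views collapses into one fixed-size combined vector."""
    return [max(vals) for vals in zip(*feature_vectors)]

# toy feature vectors from three views of the same object
f1, f2, f3 = [0.1, 0.9, 0.3], [0.5, 0.2, 0.4], [0.0, 0.8, 0.7]
combined = pool_features([f1, f2, f3])
```

Element-wise max pooling is order-invariant, which is why the number and ordering of input views can vary without changing the architecture.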
Further, the three-dimensional decoder is used to extract a three-dimensional object and to determine whether the two-dimensional features encode information about three-dimensional shape. The decoder of SilNet is changed, with the encoder kept fixed, so that SilNet learns a hidden representation of three-dimensional shape: the combined feature vector is upsampled in the three-dimensional decoder using three-dimensional transposed convolutions, generating a volume of size 57 × 57 × 57, denoted V, followed by a sigmoid layer. This volume can be used as a three-dimensional representation of the object and is projected through the projection layer to obtain a silhouette. As with the two-dimensional decoder, a binary cross-entropy loss is computed on the projected silhouette, so the three-dimensional representation itself receives no direct loss.
Further, the projection layer is denoted Tθ′. Given a voxel grid assumed to represent three-dimensional shape, the loss function on silhouettes is applied by projecting the voxels to a two-dimensional image. V is first rotated by θ′ using a nearest-neighbour sampler, and whether a pixel is filled is determined by the minimum of all values along the depth axis at each pixel location. Assuming an orthographic projection along the depth direction after rotating by θ′, the projected image pixel pj,k is given by
pj,k = mini Vθ′(i, j, k)
The rotated volume Vθ′(i, j, k) is given by nearest-neighbour sampling of V at the coordinates rotated by θ′:
Vθ′(i, j, k) = V(Rθ′(i, j, k))
where Vθ′(i, j, k) denotes the rotated volume. The projection is a composition of differentiable functions, so the pixel-level classification loss can be back-propagated through this layer.
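The depth-axis projection can be sketched as follows; the rotation by θ′ is omitted for brevity, and only the min over the depth index is shown, using the convention from the text that 0 denotes object and 1 denotes non-object:

```python
def project_silhouette(V):
    """Orthographic projection of a voxel grid V[i][j][k] along the
    depth axis i: a pixel (j, k) is object (0) if any voxel along
    the ray is object, i.e. p[j][k] = min over i of V[i][j][k]."""
    depth = len(V)
    rows, cols = len(V[0]), len(V[0][0])
    return [[min(V[i][j][k] for i in range(depth))
             for k in range(cols)]
            for j in range(rows)]

# 2x2x2 toy volume with a single object voxel at (i=0, j=0, k=1)
V = [[[1, 0], [1, 1]],
     [[1, 1], [1, 1]]]
sil = project_silhouette(V)
```

Since min is (sub)differentiable, the same operation on a continuous sigmoid-valued volume lets the pixel-level loss back-propagate to the voxels.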
For network training and testing, results are reported on the test partition of each dataset using the intersection-over-union (IoU) evaluation measure. For a given predicted silhouette S and ground-truth silhouette S̄, it is defined as
IoU(S, S̄) = ∑x,y I(Sx,y) I(S̄x,y) / ∑x,y max(I(Sx,y), I(S̄x,y))
where I is the indicator function: I = 1 if the pixel corresponds to an object. The mean IoU is the average over all images. The datasets are partitioned at random into training, evaluation, and test groups so that each object appears in exactly one group; this ensures that SilNet must adapt to unseen objects. When training with an N-th module, N + 1 views of an object are selected at random: the mask of one view serves as the silhouette to be predicted, and the remainder serve as input images. This procedure is applied over the training, evaluation, and test sets. To compare the results of different modules, N + 1 views of an object and an N-th module are again selected at random, each new module including one additional, previously unselected view; when comparing different SilNets, these held-out views are kept consistent. On the blobby-object dataset, SilNet is trained with stochastic gradient descent with momentum 0.9, weight decay 0.001, and batch size 16; the network is then initialized from the blobby-object pre-training and fine-tuned on the sculpture dataset.
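The IoU evaluation measure can be sketched as follows, again using the convention from the text that pixel value 0 denotes object:

```python
def silhouette_iou(S, S_bar):
    """Intersection-over-union between two binary silhouettes.
    Following the text, a pixel is object when its value is 0,
    so the indicator is I(v) = 1 if v == 0."""
    inter = union = 0
    for s_row, g_row in zip(S, S_bar):
        for s, g in zip(s_row, g_row):
            a, b = (s == 0), (g == 0)   # indicator: pixel is object
            inter += a and b            # object in both silhouettes
            union += a or b             # object in either silhouette
    return inter / union if union else 1.0

S = [[0, 0], [1, 1]]      # hypothetical prediction: top row is object
S_bar = [[0, 1], [1, 1]]  # hypothetical ground truth: one object pixel
iou = silhouette_iou(S, S_bar)
```

The mean IoU reported in the text is simply this quantity averaged over all test images.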
Description of the drawings
Fig. 1 is the system framework diagram of the multi-view reconstruction method based on a deep-learning silhouette network of the present invention.
Fig. 2 shows new views generated by the multi-view reconstruction method of the present invention.
Fig. 3 is the training framework diagram of the multi-view reconstruction method of the present invention.
Fig. 4 shows dataset samples for the multi-view reconstruction method of the present invention.
Detailed description of the embodiments
It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another. The invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is the system framework diagram of the multi-view reconstruction method based on a deep-learning silhouette network of the present invention. The method mainly comprises introducing the deep learning framework, three-dimensional shape encoding, building the silhouette network, and network training and testing.
Fig. 2 shows new views generated by the multi-view reconstruction method based on a deep-learning silhouette network of the present invention. SilNet processes the input images together with their render angles θ and generates a new view of the sculpture at the requested angle θ′, whose silhouette is compared with the ground truth using the per-pixel binary cross-entropy loss.
Fig. 3 is the training framework diagram of the multi-view reconstruction method based on a deep-learning silhouette network of the present invention. Each image I and its render angle θ are processed by an independent copy of the shared encoder f, which outputs a feature vector; the pooled feature vector is passed to the decoder g, which generates the target silhouette at the new view θ′.
Fig. 4 shows dataset samples for the multi-view reconstruction method based on a deep-learning silhouette network of the present invention. Samples from each dataset illustrate the diversity and complexity of the sculptures: the blobby-object dataset is used for pre-training, while the sculpture dataset demonstrates that silhouettes can be used to learn and encode three-dimensional shape, generating new silhouette views of sculptures of varied shape and material.
The present invention is not limited to the details of the above embodiments; without departing from the spirit and scope of the invention, it may be realized by those skilled in the art in other specific forms. In addition, those skilled in the art may make various modifications and variations to the invention without departing from its spirit and scope, and such improvements and modifications shall also be regarded as falling within the protection scope of the invention. The appended claims are therefore intended to cover the preferred embodiments and all changes and modifications falling within the scope of the invention.
Claims (10)
1. a kind of multiple view method for reconstructing based on deep learning profile network, which is characterized in that main includes introducing depth
Practise framework (one);3D shape encodes (two);Build profile network (three);Network training and test (four).
2. based on the introducing deep learning framework (one) described in claims 1, which is characterized in that introduce mottled object data
Collection and true sculpture data set, wherein mottled object data set is used for pre-training, sculpture data set is for proving that profile can
Study and coding 3D shape, and new profile view is generated in variously-shaped and material sculpture, in order to predict multiple figures
The 3D shape of picture and the three-dimensional smooth surface of processing, are applied to smooth sculpture texture, introduce the wheel of a deep learning
Wide network (SilNet), network learn the 3D shape coding of one or more input pictures, use this volume later
Code adjusts decoder to generate new view, is subsequently introduced agency's loss based on profile, when decoder does not include three-dimensional characterize
When, this two dimension loss of Web vector graphic encodes 3D shape, and two dimension loss is not limited by three-dimensional characterization resolution ratio
System generates the simultaneously huge mottled object data set network of pre-training one, is finely adjusted to the SilNet on data set.
3. The blobby-object dataset according to claim 2, characterized in that it consists of smooth surfaces created from implicit surfaces and comprises 11706 blobby objects, divided in a 75:10:15 ratio into a training set, an evaluation set and a test set; each object has five images, so the deep-learning silhouette network must learn to find an encoding of the three-dimensional shape; the images are rendered in Blender with orthographic projection, and the steps are as follows: first, three light sources are randomly scattered around the object; second, the camera is rotated about the z-axis, with the value of each rendering angle θ chosen at random from [0°, 120°]; finally, a complex texture model is used to ensure that the object exhibits both surface scattering and specular reflection.
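The rendering setup in the steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name, the unit-cube light-position range, and the random-number handling are assumptions; only the three random lights and θ ∈ [0°, 120°] come from the claim.

```python
import random

def sample_render_params(seed=None):
    """Sample one render configuration as described in claim 3:
    three light sources scattered at random around the object and a
    camera rotation angle theta drawn uniformly from [0, 120] degrees.
    The unit-cube light range is an assumption, not from the patent."""
    rng = random.Random(seed)
    lights = [tuple(rng.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(3)]
    theta = rng.uniform(0.0, 120.0)  # rendering angle about the z-axis
    return {"lights": lights, "theta": theta}

params = sample_render_params(seed=0)
```

In practice these parameters would be handed to a renderer such as Blender; here they only illustrate the sampling ranges.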
4. The real-sculpture dataset according to claim 2, characterized in that a new dataset of 307 real sculptures is created and, like the blobby-object dataset, divided in a 75:10:15 ratio into a training set, an evaluation set and a test set; as with the blobby objects, image rendering produces 5 views per object, and no strict constraint is imposed on the orientation of the sculptures in the images.
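Both datasets use the same 75:10:15 split of objects. A minimal sketch of such a split is below, with the caveat that the shuffling and the rounding of fractional boundaries are assumptions the patent does not specify:

```python
import random

def split_indices(n, ratios=(75, 10, 15), seed=0):
    """Shuffle n object indices and split them in the 75:10:15
    train/evaluation/test ratio used in claims 3 and 4. The rounding
    of fractional boundaries is an assumption, not fixed by the patent."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train, val, test = split_indices(307)  # 307 sculptures, as in claim 4
```

Splitting by object (rather than by image) is what later lets claim 10 guarantee that test objects are unseen during training.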
5. The loss according to claim 2, characterized in that the loss function is as follows: given a set of angles θ1…θN and a set of images I1…IN, let S denote the ground-truth silhouette, with Sx,y ∈ {0, 1}, where 0 represents object and 1 represents non-object; a function gθ′ is learned that generates a silhouette Ŝ at a new angle θ′; the per-pixel binary cross-entropy loss function L minimizes the difference between the ground-truth silhouette and the predicted silhouette and is given by the following function:

$L = -\sum_{x,y}\left[\,S_{x,y}\log \hat{S}_{x,y} + (1 - S_{x,y})\log\left(1 - \hat{S}_{x,y}\right)\right]$
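This per-pixel binary cross-entropy can be transcribed directly; the sketch below uses plain Python over nested lists for clarity, and the clamping epsilon is a numerical-safety assumption, not part of the claim:

```python
import math

def silhouette_bce(target, pred, eps=1e-7):
    """Sum of per-pixel binary cross-entropy terms between a ground-truth
    silhouette `target` (entries in {0, 1}) and a predicted silhouette
    `pred` (entries in (0, 1)), as in the loss of claim 5."""
    loss = 0.0
    for t_row, p_row in zip(target, pred):
        for t, p in zip(t_row, p_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            loss -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return loss
```

For example, a single pixel with ground truth 1 and prediction 0.5 contributes log 2 ≈ 0.693 to the loss.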
6. The three-dimensional shape encoding (2) according to claim 1, characterized in that one or more given images are used to generate new views of a three-dimensional object, and the network is required to predict the object silhouette at a uniquely specified new viewpoint angle; to perform this task the network must encode three-dimensional shape, and because it concentrates on silhouettes the network does not have to learn to predict image intensities, so the learning process is easier, and no geometric-image or three-dimensional voxel characterization is needed during training; geometrically, if the network can predict the silhouettes of multiple views, it must have encoded the three-dimensional shape of the object in the given images; SilNet therefore encodes three-dimensional shape, and this encoding can then be required to create silhouettes of the object at new viewpoints, or the learned three-dimensional shape can be extracted for detection.
7. The construction of the silhouette network (3) according to claim 1, characterized in that, in order to generate silhouettes from multiple views, SilNet uses an encoder-decoder design; it consists of an encoder f that is replicated as many times as there are views, and because all encoders share parameters, memory does not grow with the number of views; the encoders are combined in a pooling layer φ that max-pools the feature vector of each module to learn a combined feature vector, which allows SilNet to handle multiple images while the pooling layer retains the important features of each image; a decoder gθ′ upsamples from the combined feature vector and generates a two-dimensional silhouette image at the new viewpoint θ′; the learned feature vector can also produce a hidden three-dimensional shape characterization through a three-dimensional decoder, and a projection layer projects this hidden shape to a two-dimensional silhouette image; each image and its θ are encoded in a separate encoder module, the images are resized to 112 × 112, and the parameter θ is encoded as (sin θ, cos θ) to represent the angular distribution; these θ values are passed through two fully connected layers and concatenated to the corresponding module; in the decoder, upsampling of the feature vector is followed by a pixel-wise sigmoid function, and an additional convolutional layer is added after the last two upsampling layers.
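Two small pieces of the design above, the (sin θ, cos θ) angle encoding and the element-wise max-pooling over per-view feature vectors, can be sketched in plain Python; the vector sizes here are toy values, not the network's real dimensions:

```python
import math

def encode_angle(theta_deg):
    """Encode a viewpoint angle as (sin(theta), cos(theta)), as in claim 7."""
    t = math.radians(theta_deg)
    return (math.sin(t), math.cos(t))

def max_pool_features(feature_vectors):
    """Element-wise maximum over the per-view feature vectors, mimicking
    the pooling layer phi that combines the shared-weight encoder modules."""
    return [max(vals) for vals in zip(*feature_vectors)]

combined = max_pool_features([[1.0, 5.0], [3.0, 2.0]])  # -> [3.0, 5.0]
```

Because the max is taken element-wise, the combined vector has the same length regardless of how many views are pooled, which is what lets SilNet accept a variable number of input images.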
8. The three-dimensional decoder according to claim 7, characterized in that, in order to extract the three-dimensional object and determine whether the two-dimensional features encode information about the three-dimensional shape, the decoder of SilNet is changed so that, with the encoder kept fixed, SilNet learns a hidden characterization of the three-dimensional shape; in the three-dimensional decoder, the combined feature vector is upsampled using three-dimensional transposed convolutions to generate a volume of 57 × 57 × 57, denoted V, followed by a sigmoid layer; this volume can be used as a three-dimensional representation of the object, and its silhouette is obtained by projecting it through the projection layer; as with the two-dimensional decoder, the binary cross-entropy loss is computed on the projected silhouette, so the three-dimensional representation receives no direct loss.
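The sigmoid layer that turns decoder logits into the occupancy volume V can be sketched as follows; the toy grid stands in for the 57 × 57 × 57 volume of the claim, and the function names are illustrative:

```python
import math

def sigmoid(x):
    """Logistic sigmoid, mapping a real logit into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def occupancy_volume(logits):
    """Apply a sigmoid element-wise to a 3-D grid of decoder logits,
    producing the occupancy volume V of claim 8 (values in (0, 1))."""
    return [[[sigmoid(v) for v in row] for row in plane] for plane in logits]

V = occupancy_volume([[[0.0, 2.0], [-2.0, 0.0]],
                      [[1.0, -1.0], [0.0, 3.0]]])
```

Since the loss is applied only after projection, gradients reach this volume indirectly, which is exactly the "no direct loss" property the claim states.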
9. The projection layer according to claim 7, characterized in that the projection layer is denoted Tθ′; given a voxel grid V that is assumed to represent the three-dimensional shape, the voxels are projected onto a two-dimensional image so that the loss function on silhouettes can be applied; V is first rotated by θ′ using a nearest-neighbour sampler to obtain Vθ′, and whether a pixel is filled is determined by the minimum over all depth values at each pixel location; an orthographic projection is then performed along the viewing direction after the rotation by θ′, so that the projected image pixel pj,k is given by:

$p_{j,k} = \min_{i} V_{\theta'}(i, j, k)$

The rotated voxel grid Vθ′(i, j, k) is given by nearest-neighbour sampling of the rotated coordinates:

$V_{\theta'}(i, j, k) = V\left(\operatorname{round}\left(R_{\theta'}^{-1}(i, j, k)\right)\right)$

where Rθ′ denotes the rotation by θ′ about the z-axis; the projection layer is a composition of differentiable functions, so the pixel-level classification loss can be backpropagated through this layer.
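The minimum-over-depth projection of the already-rotated volume can be sketched directly. With the convention of claim 5 that 0 marks object and 1 marks non-object, taking the minimum along the depth axis marks a pixel as object whenever any voxel on its ray is object; the rotation/resampling step is omitted here:

```python
def project_min_depth(volume):
    """Orthographic projection of a rotated voxel grid V_theta':
    output pixel p[j][k] is the minimum over the depth index i of
    V[i][j][k], matching p_{j,k} = min_i V_{theta'}(i, j, k) in claim 9."""
    depth = len(volume)
    rows, cols = len(volume[0]), len(volume[0][0])
    return [[min(volume[i][j][k] for i in range(depth))
             for k in range(cols)]
            for j in range(rows)]

# Two 2x2 depth slices; object voxels (0) shadow the pixels behind them.
proj = project_min_depth([[[1, 1], [0, 1]],
                          [[1, 0], [1, 1]]])  # -> [[1, 0], [0, 1]]
```

On real-valued occupancies the same `min` acts as a soft visibility test, and because `min` is piecewise linear the loss can still be backpropagated through it, as the claim notes.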
10. The network training and testing (4) according to claim 1, characterized in that the test partition of a dataset is reported using the intersection-over-union (IoU) evaluation function; for a given predicted silhouette S and ground-truth silhouette S̄ it is defined as:

$\mathrm{IoU}(S, \bar{S}) = \frac{\sum_{x,y} \mathbb{I}\left(S_{x,y} \wedge \bar{S}_{x,y}\right)}{\sum_{x,y} \mathbb{I}\left(S_{x,y} \vee \bar{S}_{x,y}\right)}$

where I is an indicator function with I = 1 if the pixel corresponds to an object, and the mean IoU denotes the average over all images; the dataset is randomized into a training group, an evaluation group and a test group such that each object belongs to exactly one group, which ensures that SilNet must adapt to unseen objects; for training with an N-th module, N + 1 views of an object are randomly selected: the mask of one view serves as the silhouette to be predicted while the remaining views serve as input images; after this calculation SilNet is retrained, keeping the same training set, evaluation set and test set; the N + 1 views of an object and the N-th module are randomly selected so that the results of different modules can be compared, each new module containing one additional, previously unselected view; when comparing different SilNets, these unselected views are kept consistent; for the blobby-object dataset, SilNet is trained using stochastic gradient descent with momentum set to 0.9, weight decay of 0.001 and batch size 16; the network is then initialized from the blobby-object pre-training and fine-tuned on the sculpture dataset.
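The IoU evaluation of claim 10 reduces to an intersection-over-union of two binary masks; in this sketch 1 is treated as the object label for readability (the claim itself counts pixels through the indicator I):

```python
def silhouette_iou(pred, gt):
    """Intersection-over-union between two binary silhouette masks:
    count of pixels that are object in both, divided by the count of
    pixels that are object in either."""
    inter = union = 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    return inter / union if union else 1.0

score = silhouette_iou([[1, 1], [0, 0]], [[1, 0], [0, 0]])  # -> 0.5
```

The mean IoU of claim 10 is then the average of this score over all test images.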
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810081726.0A CN108305229A (en) | 2018-01-29 | 2018-01-29 | A kind of multiple view method for reconstructing based on deep learning profile network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108305229A true CN108305229A (en) | 2018-07-20 |
Family
ID=62866589
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810081726.0A Withdrawn CN108305229A (en) | 2018-01-29 | 2018-01-29 | A kind of multiple view method for reconstructing based on deep learning profile network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108305229A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106780543A (en) * | 2017-01-13 | 2017-05-31 | 深圳市唯特视科技有限公司 | A kind of double framework estimating depths and movement technique based on convolutional neural networks |
CN107341506A (en) * | 2017-06-12 | 2017-11-10 | 华南理工大学 | A kind of Image emotional semantic classification method based on the expression of many-sided deep learning |
Non-Patent Citations (1)
Title |
---|
OLIVIA WILES: "SilNet: Single- and Multi-View Reconstruction by Learning from Silhouettes", arXiv:1711.07888v1 * |
Cited By (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109255831B (en) * | 2018-09-21 | 2020-06-12 | 南京大学 | Single-view face three-dimensional reconstruction and texture generation method based on multi-task learning |
CN109255831A (en) * | 2018-09-21 | 2019-01-22 | 南京大学 | The method that single-view face three-dimensional reconstruction and texture based on multi-task learning generate |
CN109410289A (en) * | 2018-11-09 | 2019-03-01 | 中国科学院武汉物理与数学研究所 | A kind of high lack sampling hyperpolarized gas lung MRI method for reconstructing of deep learning |
CN109410289B (en) * | 2018-11-09 | 2021-11-12 | 中国科学院精密测量科学与技术创新研究院 | Deep learning high undersampling hyperpolarized gas lung MRI reconstruction method |
CN109655011A (en) * | 2018-12-13 | 2019-04-19 | 北京健康有益科技有限公司 | A kind of method and system of Human Modeling dimension measurement |
CN109934107B (en) * | 2019-01-31 | 2022-03-01 | 北京市商汤科技开发有限公司 | Image processing method and device, electronic device and storage medium |
CN109840500A (en) * | 2019-01-31 | 2019-06-04 | 深圳市商汤科技有限公司 | A kind of 3 D human body posture information detection method and device |
CN109934107A (en) * | 2019-01-31 | 2019-06-25 | 北京市商汤科技开发有限公司 | Image processing method and device, electronic equipment and storage medium |
CN109977945A (en) * | 2019-02-26 | 2019-07-05 | 博众精工科技股份有限公司 | Localization method and system based on deep learning |
CN109993825A (en) * | 2019-03-11 | 2019-07-09 | 北京工业大学 | A kind of three-dimensional rebuilding method based on deep learning |
CN109993825B (en) * | 2019-03-11 | 2023-06-20 | 北京工业大学 | Three-dimensional reconstruction method based on deep learning |
CN110234040B (en) * | 2019-05-10 | 2022-08-09 | 九阳股份有限公司 | Food material image acquisition method of cooking equipment and cooking equipment |
CN110234040A (en) * | 2019-05-10 | 2019-09-13 | 九阳股份有限公司 | A kind of the food materials image acquiring method and cooking equipment of cooking equipment |
CN110197156A (en) * | 2019-05-30 | 2019-09-03 | 清华大学 | Manpower movement and the shape similarity metric method and device of single image based on deep learning |
CN110197156B (en) * | 2019-05-30 | 2021-08-17 | 清华大学 | Single-image human hand action and shape reconstruction method and device based on deep learning |
CN110113593A (en) * | 2019-06-11 | 2019-08-09 | 南开大学 | Wide baseline multi-view point video synthetic method based on convolutional neural networks |
CN110675488B (en) * | 2019-09-24 | 2023-02-28 | 电子科技大学 | Method for constructing modeling system of creative three-dimensional voxel model based on deep learning |
CN110675488A (en) * | 2019-09-24 | 2020-01-10 | 电子科技大学 | Construction method of creative three-dimensional voxel model modeling system based on deep learning |
CN111105475A (en) * | 2019-12-24 | 2020-05-05 | 电子科技大学 | Bone three-dimensional reconstruction method based on orthogonal angle X-ray |
CN111105475B (en) * | 2019-12-24 | 2022-11-22 | 电子科技大学 | Bone three-dimensional reconstruction method based on orthogonal angle X-ray |
CN111340067A (en) * | 2020-02-10 | 2020-06-26 | 天津大学 | Redistribution method for multi-view classification |
CN111340067B (en) * | 2020-02-10 | 2022-07-08 | 天津大学 | Redistribution method for multi-view classification |
CN111337898A (en) * | 2020-02-19 | 2020-06-26 | 北京百度网讯科技有限公司 | Laser point cloud processing method, device, equipment and storage medium |
CN111337898B (en) * | 2020-02-19 | 2022-10-14 | 北京百度网讯科技有限公司 | Laser point cloud processing method, device, equipment and storage medium |
CN114119723A (en) * | 2020-08-25 | 2022-03-01 | 辉达公司 | Generating views using one or more neural networks |
CN112037200A (en) * | 2020-08-31 | 2020-12-04 | 上海交通大学 | Method for automatically identifying anatomical features and reconstructing model in medical image |
CN112733698A (en) * | 2021-01-05 | 2021-04-30 | 北京大学 | Three-dimensional multi-view covariant representation learning method and three-dimensional object identification method |
CN113506362A (en) * | 2021-06-02 | 2021-10-15 | 湖南大学 | Method for synthesizing new view of single-view transparent object based on coding and decoding network |
CN113506362B (en) * | 2021-06-02 | 2024-03-19 | 湖南大学 | Method for synthesizing new view of single-view transparent object based on coding and decoding network |
CN113808006A (en) * | 2021-09-01 | 2021-12-17 | 南京信息工程大学 | Method and device for reconstructing three-dimensional grid model based on two-dimensional image |
CN116434009A (en) * | 2023-04-19 | 2023-07-14 | 应急管理部国家减灾中心(应急管理部卫星减灾应用中心) | Construction method and system for deep learning sample set of damaged building |
CN116434009B (en) * | 2023-04-19 | 2023-10-24 | 应急管理部国家减灾中心(应急管理部卫星减灾应用中心) | Construction method and system for deep learning sample set of damaged building |
CN117047286A (en) * | 2023-10-09 | 2023-11-14 | 东莞市富明钮扣有限公司 | Method for processing workpiece surface by laser, processing system, processor and storage medium |
CN117047286B (en) * | 2023-10-09 | 2024-01-16 | 东莞市富明钮扣有限公司 | Method for processing workpiece surface by laser, processing system, processor and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108305229A (en) | A kind of multiple view method for reconstructing based on deep learning profile network | |
Zhao et al. | Generative multiplane images: Making a 2d gan 3d-aware | |
Mildenhall et al. | Nerf: Representing scenes as neural radiance fields for view synthesis | |
Chibane et al. | Stereo radiance fields (srf): Learning view synthesis for sparse views of novel scenes | |
CN109255831B (en) | Single-view face three-dimensional reconstruction and texture generation method based on multi-task learning | |
Georgoulis et al. | Reflectance and natural illumination from single-material specular objects using deep learning | |
CN113706714A (en) | New visual angle synthesis method based on depth image and nerve radiation field | |
WO2021164759A1 (en) | Three-dimensional facial reconstruction | |
DE102019008168A1 (en) | Dynamic estimation of lighting parameters for positions within reality-enhanced scenes using a neural network | |
CN111753698B (en) | Multi-mode three-dimensional point cloud segmentation system and method | |
CN114255313B (en) | Three-dimensional reconstruction method and device for mirror surface object, computer equipment and storage medium | |
CN108665414A (en) | Natural scene picture generation method | |
CN114067041A (en) | Material generation method and device of three-dimensional model, computer equipment and storage medium | |
Zhou et al. | Photomat: A material generator learned from single flash photos | |
CN117372644A (en) | Three-dimensional content generation method based on period implicit representation | |
Li et al. | SpectralNeRF: Physically Based Spectral Rendering with Neural Radiance Field | |
Martin et al. | Synthesising light field volume visualisations using image warping in real-time | |
Lucas et al. | Discovering 3-D Structure of LEO Objects |
Fu et al. | Multi-scene representation learning with neural radiance fields | |
CN117333609B (en) | Image rendering method, network training method, device and medium | |
Cyriac | Neural Radiance Fields (NeRF) for 3D Visualization Rendering Based on 2D Images | |
CN118710792A (en) | 3D Gaussian sputtering scene reconstruction method based on view dependency difference decoupling | |
Papathomas et al. | The application of depth separation to the display of large data sets | |
CN118608667A (en) | Complex scene efficient rendering method based on visual perception radiation field | |
Jiang | Constructing and Assessing Surrogates for Volume Visualization Using Neural Networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | ||
Application publication date: 20180720 |