CN108305229A - A multi-view reconstruction method based on a deep learning silhouette network

A multi-view reconstruction method based on a deep learning silhouette network

Info

Publication number
CN108305229A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201810081726.0A
Other languages
Chinese (zh)
Inventor
夏春秋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Vision Technology Co Ltd
Original Assignee
Shenzhen Vision Technology Co Ltd
Application filed by Shenzhen Vision Technology Co Ltd
Priority to CN201810081726.0A
Publication of CN108305229A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/13 Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/10012 Stereo images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The present invention proposes a multi-view reconstruction method based on a deep learning silhouette network. Its main contents are: introducing a deep learning architecture, 3D shape encoding, building the silhouette network, and network training and testing. The process is as follows. A deep learning silhouette network is introduced that learns an encoding of 3D shape from one or more input images; a decoder conditioned on this encoding then generates new views. A silhouette-based proxy loss is then introduced: even when the decoder contains no 3D representation, the network uses this 2D loss to encode 3D shape, and the 2D loss is not limited by the resolution of a 3D representation. A large blobby object dataset is generated and used to pre-train the network, after which the silhouette network is fine-tuned on the dataset. The invention uses a neural network to learn 3D shape. Generating silhouettes in new views forces the network to encode 3D shape and to combine the information from multiple views effectively, which improves multi-view reconstruction performance.

Description

A multi-view reconstruction method based on a deep learning silhouette network
Technical field
The present invention relates to the field of view reconstruction, and more particularly to a multi-view reconstruction method based on a deep learning silhouette network.
Background
Multi-view reconstruction recovers a three-dimensional model of a scene from multiple pictures of the scene taken from different viewpoints. Multi-view three-dimensional reconstruction of natural scenes has always been a fundamental problem in computer vision: once a three-dimensional model of a target has been reconstructed, the target can be analysed quantitatively and information about it can be processed. Multi-view reconstruction is applied in medical imaging. On the one hand, view reconstruction yields visualised structural or functional information in biomedical imaging for use in biological research or clinical diagnosis; on the other hand, it can produce models of human anatomy, so that the best treatment plan can be worked out before surgery. In military operations, information gathered in advance by simple means such as aerial photography can be turned into a three-dimensional model of the battlefield, helping commanders seize the initiative, control the situation, deploy forces and plan strategy. As society grows more complex, criminal cases keep emerging and the situations the police face in handling them grow more complicated. Multi-view reconstruction supports three-dimensional reconstruction and simulation analysis of crime scenes: in a three-dimensional crime-scene animation restoration and analysis system, investigators rebuild the scene from a floor-plan sketch, scene photographs and a basic verbal description of how the crime was committed. Cameras, objects, environments, animals and human figures can also be animated and simulated, so as to reproduce simulated crime scenes, the people involved, and the occurrence, course and result of events. The advanced rendering functions of such a system produce high-fidelity three-dimensional scene pictures and animations which, combined with sound and text, yield multimedia video and image material of virtual three-dimensional crime scenes and case processes for investigation, technical and command personnel. Although multi-view reconstruction has been studied extensively, combining the information from multiple views and improving reconstruction performance accordingly remains challenging when there are few input views, the baseline is wide and the illumination conditions are complex.
The present invention proposes a multi-view reconstruction method based on a deep learning silhouette network. A deep learning silhouette network is introduced that learns an encoding of 3D shape from one or more input images; a decoder conditioned on this encoding then generates new views. A silhouette-based proxy loss is then introduced: even when the decoder contains no 3D representation, the network uses this 2D loss to encode 3D shape, and the 2D loss is not limited by the resolution of a 3D representation. A large blobby object dataset is generated and used to pre-train the network, after which the silhouette network is fine-tuned on the dataset. By learning 3D shape with a neural network, and by using the silhouettes generated in new views to force the network to encode 3D shape, the method combines the information from multiple views effectively and improves multi-view reconstruction performance.
Summary of the invention
For view reconstruction, the present invention proposes a multi-view reconstruction method based on a deep learning silhouette network. A deep learning silhouette network is introduced that learns an encoding of 3D shape from one or more input images; a decoder conditioned on this encoding then generates new views. A silhouette-based proxy loss is then introduced: even when the decoder contains no 3D representation, the network uses this 2D loss to encode 3D shape, and the 2D loss is not limited by the resolution of a 3D representation. A large blobby object dataset is generated and used to pre-train the network, after which the silhouette network is fine-tuned on the dataset.
To solve the above problems, the present invention proposes a multi-view reconstruction method based on a deep learning silhouette network, whose main contents include:
(1) introducing a deep learning architecture;
(2) 3D shape encoding;
(3) building the silhouette network;
(4) network training and testing.
In introducing the deep learning architecture, a blobby object dataset and a real sculpture dataset are introduced. The blobby object dataset is used for pre-training. The sculpture dataset is used to demonstrate that silhouettes make it possible to learn and encode 3D shape and to generate new silhouette views for sculptures of varied shapes and materials. To predict 3D shape from multiple images and to handle smooth three-dimensional surfaces, as found on smoothly textured sculptures, a deep learning silhouette network (SilNet) is introduced. The network learns an encoding of 3D shape from one or more input images, and a decoder conditioned on this encoding then generates new views. A silhouette-based proxy loss is then introduced: even when the decoder contains no 3D representation, the network uses this 2D loss to encode 3D shape, and the 2D loss is not limited by the resolution of a 3D representation. A large blobby object dataset is generated and used to pre-train the network, after which SilNet is fine-tuned on the dataset.
Further, the blobby object dataset consists of smooth surfaces created from implicit surfaces. It contains 11706 blobby objects, divided into training, validation and test sets in the ratio 75:10:15, with five images per object. To perform its task, the deep learning silhouette network must learn to discover an encoding of 3D shape. The images are rendered in Blender using orthographic projection, in the following steps. First, three light sources are placed at random around the object. Second, the camera is rotated about the z-axis, with each rendering angle θ drawn at random from [0°, 120°]. Finally, a complex texture model is used, so that the object exhibits both surface scattering and specular reflection.
Further, the real sculpture dataset is a newly compiled dataset of 307 real sculptures. Like the blobby object dataset, it is divided into training, validation and test sets in the ratio 75:10:15. As for the blobby objects, five views are rendered per sculpture, and no strict restriction is placed on the orientation of the sculptures in the images.
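The split and view sampling described for the two datasets can be sketched as follows. This is a minimal illustration, not the rendering pipeline itself: the function names `split_objects` and `sample_view_angles`, the fixed seeds and the rounding of the split sizes are all assumptions made for the example.

```python
import random

def split_objects(num_objects, ratios=(75, 10, 15), seed=0):
    """Shuffle object ids and split them train/validation/test by ratio.

    Splitting by object (not by image) keeps every view of one object
    inside a single subset.
    """
    ids = list(range(num_objects))
    random.Random(seed).shuffle(ids)
    total = sum(ratios)
    n_train = num_objects * ratios[0] // total
    n_val = num_objects * ratios[1] // total
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # remainder goes to the test set (assumption)
    return train, val, test

def sample_view_angles(num_views=5, low=0.0, high=120.0, rng=None):
    """Draw one rendering angle (degrees) per view, uniformly from [low, high]."""
    rng = rng or random.Random(0)
    return [rng.uniform(low, high) for _ in range(num_views)]

train, val, test = split_objects(11706)   # blobby object dataset size
angles = sample_view_angles()             # five views per object
print(len(train), len(val), len(test))
```

The same helper applies to the 307 sculptures by calling `split_objects(307)`.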
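Further, the loss function is as follows. Given a set of angles $\theta_1 \dots \theta_N$ and a set of images $I_1 \dots I_N$, let $S$ denote the ground-truth silhouette, with $S_{x,y} \in \{0, 1\}$, where 0 denotes object and 1 denotes non-object. A function $g_{\theta'}$ is learned that generates a predicted silhouette $\hat{S}$ at angle $\theta'$. A per-pixel binary cross-entropy loss $L$ minimises the difference between the ground-truth silhouette and the predicted silhouette, and is given by:
$$L = -\sum_{x,y} \left[ S_{x,y} \log \hat{S}_{x,y} + (1 - S_{x,y}) \log\left(1 - \hat{S}_{x,y}\right) \right]$$
where $\hat{S} = g_{\theta'}\left(\phi\left(f(I_1, \theta_1), \dots, f(I_N, \theta_N)\right)\right)$, with $f$ the encoder and $\phi$ the pooling layer.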
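A per-pixel binary cross-entropy of this kind can be written in a few lines of NumPy. The sketch below assumes the predicted silhouette is already a map of sigmoid outputs in (0, 1); the function name `silhouette_bce` and the clipping constant `eps` are illustrative choices, not part of the source.

```python
import numpy as np

def silhouette_bce(pred, target, eps=1e-7):
    """Binary cross-entropy summed over all pixels.

    pred   : predicted silhouette, values in (0, 1)
    target : ground-truth silhouette, values in {0, 1} (0 = object, 1 = non-object)
    """
    pred = np.clip(pred, eps, 1.0 - eps)  # avoid log(0); eps is an assumption
    per_pixel = -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
    return float(per_pixel.sum())

# Toy 4x4 silhouette with a 2x2 object (value 0) in the centre.
target = np.ones((4, 4))
target[1:3, 1:3] = 0.0
uniform = np.full((4, 4), 0.5)            # an uninformative prediction

print(silhouette_bce(uniform, target))    # 16 * log(2)
print(silhouette_bce(target, target))     # close to 0 for a perfect prediction
```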
In 3D shape encoding, given one or more images of a three-dimensional object, the network is asked to generate a new view: it must predict the silhouette of the object from a new viewpoint that is specified only by its angle. To perform this task the network needs to encode 3D shape. Because the task concentrates on silhouettes, the network does not have to learn to predict image intensities, which makes learning easier, and no representation of image geometry or 3D voxels is needed during training. Geometrically, if the network can predict the silhouettes in multiple views, it must have encoded the shape of the three-dimensional object in the given images. SilNet therefore encodes 3D shape; the encoding is then either used to create the silhouette of the object at a new viewpoint, or probed by extracting the learned 3D shape.
In building the silhouette network, SilNet uses an encoder-decoder design in order to generate silhouettes from multiple views. It consists of an encoder $f$ that is replicated at least as many times as there are views; because all encoder copies share parameters, memory does not grow. The encoders are combined in a pooling layer $\phi$, which max-pools the feature vectors of the individual modules to give a learned combined feature vector. This allows SilNet to handle multiple images, with the pooling layer retaining the salient features of each image. A decoder $g_{\theta'}$ upsamples from the feature vector to generate a 2D silhouette image at the new viewpoint $\theta'$. Alternatively, the learned feature vector is passed through a 3D decoder to generate a latent 3D shape representation, which a projection layer projects to a 2D silhouette. Each image and its $\theta$ are encoded in a separate encoder module. Images are resized to 112 × 112, and $\theta$ is encoded as $(\sin\theta, \cos\theta)$ to represent the distribution of the angle; these values are passed through two fully connected layers and concatenated to the corresponding module. In the decoder, the feature vector is upsampled and followed by a pixel-wise sigmoid function, and an additional convolutional layer is added after each of the last two upsampling layers.
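The two ideas that let the network accept a variable number of views, a shared encoder and max-pooling over per-view feature vectors, can be illustrated with a toy example. Everything below is a stand-in: `toy_encoder` is a single random linear layer with tanh, not the convolutional encoder of SilNet, and it skips the two fully connected layers that the text applies to the angle encoding.

```python
import numpy as np

def encode_angle(theta_deg):
    """Encode a viewing angle in degrees as (sin θ, cos θ)."""
    t = np.deg2rad(theta_deg)
    return np.array([np.sin(t), np.cos(t)])

def toy_encoder(image, theta_deg, weights):
    """Hypothetical stand-in for the shared encoder f (one linear layer + tanh)."""
    x = np.concatenate([image.ravel(), encode_angle(theta_deg)])
    return np.tanh(weights @ x)

def pool_views(features):
    """Pooling layer φ: element-wise max over the per-view feature vectors."""
    return np.max(np.stack(features), axis=0)

rng = np.random.default_rng(0)
# One weight matrix reused for every view, so adding views adds no parameters.
weights = rng.standard_normal((8, 112 * 112 + 2)) * 0.01
images = [rng.random((112, 112)) for _ in range(3)]
angles = [10.0, 55.0, 100.0]

features = [toy_encoder(im, th, weights) for im, th in zip(images, angles)]
combined = pool_views(features)
print(combined.shape)  # (8,)
```

Because the max is taken element-wise across views, the combined vector has the same size for any number of inputs and does not depend on the order in which the views are supplied.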
Further, the 3D decoder is used to extract the three-dimensional object and to determine whether the 2D features encode information about 3D shape. The decoder of SilNet is changed, with the encoder kept fixed, so that SilNet learns a latent representation of 3D shape. In the 3D decoder, the combined feature vector is upsampled using 3D transposed convolutions to produce a volume of size 57 × 57 × 57, denoted $V$, followed by a sigmoid layer. This volume serves as the 3D representation of the object and is projected with a projection layer to obtain a silhouette. As with the 2D decoder, the binary cross-entropy loss is computed on the projected silhouette, so no loss is applied directly to the 3D representation.
Further, the projection layer is denoted $T_{\theta'}$. Given a box of voxels $V$, assumed to represent the 3D shape, the layer projects it to a 2D image so that a loss on the silhouette can be used. $V$ is first rotated by $\theta'$ using a nearest-neighbour sampler to give $V_{\theta'}$. Whether a pixel is filled is then determined by the minimum over all depth values at that pixel location; since 0 denotes object, a pixel is marked as object if any voxel along its ray is object. Assuming orthographic projection and rotation by $\theta'$ about the vertical axis, the projected image pixel $p_{j,k}$ is given by:
$$p_{j,k} = \min_i V_{\theta'}(i, j, k)$$
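The rotated volume $V_{\theta'}(i, j, k)$ is given by:
$$V_{\theta'}(i, j, k) = V\left(\left\lfloor i\cos\theta' - k\sin\theta' \right\rceil,\; j,\; \left\lfloor i\sin\theta' + k\cos\theta' \right\rceil\right)$$
where $\lfloor\cdot\rceil$ denotes rounding to the nearest voxel index and $V_{\theta'}(i, j, k)$ is the rotated volume. Because the layer is a composition of differentiable functions, the pixel-level classification loss can be back-propagated through it.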
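The projection layer can be sketched in NumPy as a nearest-neighbour rotation about the vertical axis followed by a minimum along the depth axis. This is a forward-pass illustration only, under stated assumptions: the rotation is taken about the centre of the volume, cells that fall outside the volume after rotation are filled with 1 (non-object), and the function names are hypothetical.

```python
import numpy as np

def rotate_volume_nearest(V, theta_deg, fill=1.0):
    """Rotate V (indexed i=depth, j=vertical, k=horizontal) about the j axis.

    Uses nearest-neighbour sampling; rotating about the volume centre and
    filling out-of-range cells with `fill` are assumptions of this sketch.
    """
    n_i, n_j, n_k = V.shape
    t = np.deg2rad(theta_deg)
    ci, ck = (n_i - 1) / 2.0, (n_k - 1) / 2.0
    i, k = np.meshgrid(np.arange(n_i), np.arange(n_k), indexing="ij")
    # Source voxel for every output cell, rounded to the nearest index.
    src_i = np.rint(np.cos(t) * (i - ci) - np.sin(t) * (k - ck) + ci).astype(int)
    src_k = np.rint(np.sin(t) * (i - ci) + np.cos(t) * (k - ck) + ck).astype(int)
    inside = (src_i >= 0) & (src_i < n_i) & (src_k >= 0) & (src_k < n_k)
    out = np.full(V.shape, fill, dtype=V.dtype)
    out[i[inside], :, k[inside]] = V[src_i[inside], :, src_k[inside]]
    return out

def project_silhouette(V, theta_deg):
    """p[j, k] = min over depth i of the rotated volume (0 = object)."""
    return rotate_volume_nearest(V, theta_deg).min(axis=0)

# A 5x5x5 empty volume (all 1) with a single object voxel at the front.
V = np.ones((5, 5, 5))
V[0, 2, 2] = 0.0
front = project_silhouette(V, 0.0)
side = project_silhouette(V, 90.0)
print(front[2, 2], side[2, 4])
```

With the 0 = object convention, taking the minimum along each ray marks a pixel as object as soon as one voxel on that ray is object; a maximum would play the same role under the opposite convention.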
In network training and testing, results are reported on the test partition of each dataset using the intersection over union (IoU) metric. For a predicted silhouette $\hat{S}$ and ground-truth silhouette $S$, it is defined as
$$\mathrm{IoU}(\hat{S}, S) = \frac{\sum_{x,y} I(\hat{S}_{x,y}) \land I(S_{x,y})}{\sum_{x,y} I(\hat{S}_{x,y}) \lor I(S_{x,y})}$$
where $I$ is an indicator function, with $I = 1$ if the pixel corresponds to the object; the mean IoU is the average over all images. Each dataset is split at random into training, validation and test sets such that every object appears in exactly one set, which ensures that SilNet generalises to unseen objects. When training with $N$ modules, $N+1$ views of an object are selected at random; the mask of one view is used as the silhouette to be predicted and the remaining views are used as input images. After the results have been computed, SilNet is retrained using the same training, validation and test sets. To compare the results for different numbers of modules, $N+1$ views of an object are again selected at random for $N$ modules; each additional module receives one extra, previously unselected view, and when different SilNets are compared these unselected views are kept the same. On the blobby object dataset, SilNet is trained using stochastic gradient descent with momentum 0.9, weight decay 0.001 and batch size 16. The network trained on the blobby object dataset is then used as the initialisation for fine-tuning on the sculpture dataset.
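The IoU metric can be computed directly from two silhouette maps. The sketch below assumes the 0 = object convention used for the loss, a threshold of 0.5 on the predicted values, and a score of 1.0 when both silhouettes are empty; the threshold, that edge-case choice and the function names are assumptions of the example.

```python
import numpy as np

def silhouette_iou(pred, target, threshold=0.5):
    """Intersection over union of the object regions of two silhouettes."""
    pred_obj = pred < threshold        # indicator I: True where the pixel is object
    target_obj = target < threshold
    union = np.logical_or(pred_obj, target_obj).sum()
    if union == 0:
        return 1.0                     # assumption: two empty silhouettes match
    inter = np.logical_and(pred_obj, target_obj).sum()
    return float(inter / union)

def mean_iou(preds, targets):
    """Average IoU over a collection of images."""
    return float(np.mean([silhouette_iou(p, t) for p, t in zip(preds, targets)]))

target = np.ones((2, 2))
target[0, 0] = 0.0
target[0, 1] = 0.0                     # object covers the top row
pred = np.ones((2, 2))
pred[0, 0] = 0.1
pred[1, 0] = 0.2                       # prediction covers the left column

print(silhouette_iou(pred, target))    # 1 shared pixel out of 3 in the union
print(mean_iou([pred, target], [target, target]))
```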
Brief description of the drawings
Fig. 1 is a system framework diagram of the multi-view reconstruction method based on a deep learning silhouette network according to the present invention.
Fig. 2 shows new views generated by the multi-view reconstruction method based on a deep learning silhouette network according to the present invention.
Fig. 3 is a training framework diagram of the multi-view reconstruction method based on a deep learning silhouette network according to the present invention.
Fig. 4 shows dataset samples for the multi-view reconstruction method based on a deep learning silhouette network according to the present invention.
Detailed description of the embodiments
It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another. The invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a system framework diagram of the multi-view reconstruction method based on a deep learning silhouette network according to the present invention. The method mainly comprises introducing a deep learning architecture, 3D shape encoding, building the silhouette network, and network training and testing.
Fig. 2 shows new views generated by the multi-view reconstruction method based on a deep learning silhouette network according to the present invention. SilNet processes the images together with their rendering angles $\theta$ and generates new views of the sculpture. Given a set of angles $\theta_1 \dots \theta_N$ and a set of images $I_1 \dots I_N$, let $S$ denote the ground-truth silhouette, with $S_{x,y} \in \{0, 1\}$, where 0 denotes object and 1 denotes non-object. A function $g_{\theta'}$ is learned that generates a predicted silhouette $\hat{S}$ at angle $\theta'$. A per-pixel binary cross-entropy loss $L$ minimises the difference between the ground-truth silhouette and the predicted silhouette, and is given by:
$$L = -\sum_{x,y} \left[ S_{x,y} \log \hat{S}_{x,y} + (1 - S_{x,y}) \log\left(1 - \hat{S}_{x,y}\right) \right]$$
Fig. 3 is a training framework diagram of the multi-view reconstruction method based on a deep learning silhouette network according to the present invention. Each image $I$ and its rendering angle $\theta$ are processed by an independent encoder $f$, which outputs a feature vector, and the decoder $g$ generates the target silhouette in the new view $\theta'$. In order to generate silhouettes from multiple views, SilNet uses an encoder-decoder design. It consists of an encoder $f$ that is replicated at least as many times as there are views; because all encoder copies share parameters, memory does not grow. The encoders are combined in a pooling layer $\phi$, which max-pools the feature vectors of the individual modules to give a learned combined feature vector. This allows SilNet to handle multiple images, with the pooling layer retaining the salient features of each image. A decoder $g_{\theta'}$ upsamples from the feature vector to generate a 2D silhouette image at the new viewpoint $\theta'$. Alternatively, the learned feature vector is passed through a 3D decoder to generate a latent 3D shape representation, which a projection layer projects to a 2D silhouette. Each image and its $\theta$ are encoded in a separate encoder module. Images are resized to 112 × 112, and $\theta$ is encoded as $(\sin\theta, \cos\theta)$ to represent the distribution of the angle; these values are passed through two fully connected layers and concatenated to the corresponding module. In the decoder, the feature vector is upsampled and followed by a pixel-wise sigmoid function, and an additional convolutional layer is added after each of the last two upsampling layers.
Fig. 4 shows dataset samples for the multi-view reconstruction method based on a deep learning silhouette network according to the present invention. The samples from each dataset illustrate the diversity and complexity of the sculptures. The blobby object dataset is used for pre-training; the sculpture dataset is used to demonstrate that silhouettes make it possible to learn and encode 3D shape and to generate new silhouette views for sculptures of varied shapes and materials.
For those skilled in the art, the present invention is not limited to the details of the above embodiments, and it may be realised in other specific forms without departing from its spirit or scope. Those skilled in the art may also make various modifications and variations to the invention without departing from its spirit and scope, and such improvements and modifications shall also be regarded as falling within the scope of protection of the invention. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the invention.

Claims (10)

1. A multi-view reconstruction method based on a deep learning silhouette network, characterised in that it mainly comprises: introducing a deep learning architecture (1); 3D shape encoding (2); building the silhouette network (3); and network training and testing (4).
2. The introducing of a deep learning architecture (1) according to claim 1, characterised in that a blobby object dataset and a real sculpture dataset are introduced, wherein the blobby object dataset is used for pre-training and the sculpture dataset is used to demonstrate that silhouettes make it possible to learn and encode 3D shape and to generate new silhouette views for sculptures of varied shapes and materials; to predict 3D shape from multiple images and to handle smooth three-dimensional surfaces, as found on smoothly textured sculptures, a deep learning silhouette network (SilNet) is introduced; the network learns an encoding of 3D shape from one or more input images, and a decoder conditioned on this encoding then generates new views; a silhouette-based proxy loss is then introduced, such that even when the decoder contains no 3D representation the network uses this 2D loss to encode 3D shape, the 2D loss not being limited by the resolution of a 3D representation; a large blobby object dataset is generated and used to pre-train the network, after which SilNet is fine-tuned on the dataset.
3. The blobby object dataset according to claim 2, characterised in that it consists of smooth surfaces created from implicit surfaces and contains 11706 blobby objects, divided into training, validation and test sets in the ratio 75:10:15, with five images per object; to perform its task, the deep learning silhouette network must learn to discover an encoding of 3D shape; the images are rendered in Blender using orthographic projection, in the following steps: first, three light sources are placed at random around the object; second, the camera is rotated about the z-axis, with each rendering angle θ drawn at random from [0°, 120°]; finally, a complex texture model is used, so that the object exhibits both surface scattering and specular reflection.
4. The real sculpture dataset according to claim 2, characterised in that a new dataset of 307 real sculptures is compiled which, like the blobby object dataset, is divided into training, validation and test sets in the ratio 75:10:15; as for the blobby objects, five views are rendered per sculpture, and no strict restriction is placed on the orientation of the sculptures in the images.
5. The loss according to claim 2, characterised in that the loss function is as follows: given a set of angles $\theta_1 \dots \theta_N$ and a set of images $I_1 \dots I_N$, $S$ denotes the ground-truth silhouette, with $S_{x,y} \in \{0, 1\}$, where 0 denotes object and 1 denotes non-object; a function $g_{\theta'}$ is learned that generates a predicted silhouette $\hat{S}$ at angle $\theta'$; a per-pixel binary cross-entropy loss $L$ minimises the difference between the ground-truth silhouette and the predicted silhouette, and is given by:
$$L = -\sum_{x,y} \left[ S_{x,y} \log \hat{S}_{x,y} + (1 - S_{x,y}) \log\left(1 - \hat{S}_{x,y}\right) \right]$$
6. The 3D shape encoding (2) according to claim 1, characterized in that one or more given images are used to generate a new view of a 3D object; the network is required to predict the silhouette of the object given only the angle of the new viewpoint; to perform this task the network must encode 3D shape; because the task concentrates on silhouettes, the network does not have to learn to predict image intensities, which makes learning easier, and no geometric or voxel representation is needed during training; geometrically, if the network can predict silhouettes from multiple views, it must have encoded the shape of the 3D object in the given images; SilNet therefore encodes 3D shape, and this encoding can then be used to create silhouettes of the object at new viewpoints, or the learned 3D shape can be extracted and inspected.
7. The building of the silhouette network (3) according to claim 1, characterized in that, in order to generate silhouettes from multiple views, SilNet uses an encoder-decoder design; it comprises an encoder f that is replicated as many times as there are views, and because all encoder copies share parameters, memory use does not grow; the encoders are combined in a pooling layer φ that max-pools the feature vectors of the individual modules to learn a combined feature vector, which allows SilNet to handle multiple images and to retain the important features of each image in the pooling layer; a decoder gθ′ up-samples the feature vector to generate a 2D silhouette image at the new viewpoint θ′; the learned feature vector can also be passed through a 3D decoder to generate a latent 3D shape representation, which a projection layer projects to a 2D silhouette image; each image and its θ are encoded in a separate encoder module; images are resized to 112 × 112, and θ is encoded as (sin θ, cos θ) to represent the distribution of angles; these θ values are passed through two fully connected layers and concatenated into the corresponding module; in the decoder, the feature vector is up-sampled and followed by a pixel-wise sigmoid function, and an additional convolutional layer is added after the last two up-sampling layers.
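Two pieces of this claim are easy to illustrate in isolation: the (sin θ, cos θ) angle encoding and the max-pooling layer φ that merges per-view feature vectors. The sketch below uses NumPy and stands in for the real network layers; the function names `encode_angle` and `combine_views` and the toy feature vectors are assumptions made for illustration.

```python
import numpy as np

def encode_angle(theta_deg):
    """Encode a viewing angle in degrees as the pair (sin θ, cos θ)."""
    t = np.deg2rad(theta_deg)
    return np.array([np.sin(t), np.cos(t)])

def combine_views(features):
    """Pooling layer φ: element-wise maximum over per-view feature vectors.

    Because the maximum is taken per feature, the result has the same
    length for any number of views and does not depend on view order.
    """
    return np.max(np.stack(features, axis=0), axis=0)

# Two toy feature vectors, as if produced by the shared encoder f.
view_a = np.array([1.0, 5.0, 2.0])
view_b = np.array([4.0, 0.0, 3.0])
combined = combine_views([view_a, view_b])
```

Max-pooling is what lets one set of shared encoder weights serve one view or many: adding a view adds a row to the stack but leaves the combined vector the same size.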
8. The 3D decoder according to claim 7, characterized in that, in order to extract the 3D object and to determine whether the 2D features encode 3D shape information, the decoder of SilNet is changed so that, with the encoder held fixed, SilNet learns a latent representation of the 3D shape; in the 3D decoder, the combined feature vector is up-sampled using 3D transposed convolutions to produce a volume V of size 57 × 57 × 57, followed by a sigmoid layer; this volume can serve as a 3D representation of the object and is projected by the projection layer to obtain a silhouette; as with the 2D decoder, the binary cross-entropy loss is computed on the projected silhouette, so no loss is applied directly to the 3D representation.
9. The projection layer according to claim 7, characterized in that the projection layer is denoted Tθ′; given a voxel volume V that is assumed to represent the 3D shape, the layer projects it to a 2D image so that the loss function on silhouettes can be used; V is first rotated by θ′ using a nearest-neighbour sampler to obtain Vθ′; whether a pixel is filled is determined by the minimum over all depth values at each pixel location; assuming orthographic projection and rotation by θ′ about the vertical axis, the projected image pixel pj,k is given by:
The rotated volume Vθ′(i, j, k) is given by:
Here the rotated volume Vθ′(i, j, k) is a composition of differentiable functions, so the pixel-level classification loss can be back-propagated through this layer.
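The two formulas of this claim are not reproduced in this text. The forward pass they describe (nearest-neighbour rotation about the vertical axis, then a minimum along the depth axis) can be sketched in NumPy as below. Several points are assumptions rather than details from the claim: the index convention (i is depth, j is the vertical axis, k is horizontal), rotation about the grid centre, the sign of the rotation, and filling cells that fall outside the grid with 1 (non-object). The sketch covers only the forward computation, not the back-propagation.

```python
import numpy as np

def rotate_volume(V, theta_deg):
    """Rotate V about the vertical (j) axis with nearest-neighbour sampling."""
    n_i, n_j, n_k = V.shape
    t = np.deg2rad(theta_deg)
    ci, ck = (n_i - 1) / 2.0, (n_k - 1) / 2.0  # rotate about the grid centre
    i, k = np.meshgrid(np.arange(n_i), np.arange(n_k), indexing="ij")
    # For every target cell (i, k), find the nearest source cell in V.
    src_i = np.rint(np.cos(t) * (i - ci) - np.sin(t) * (k - ck) + ci).astype(int)
    src_k = np.rint(np.sin(t) * (i - ci) + np.cos(t) * (k - ck) + ck).astype(int)
    inside = (src_i >= 0) & (src_i < n_i) & (src_k >= 0) & (src_k < n_k)
    ii, kk = np.nonzero(inside)
    out = np.ones_like(V)  # 1 = non-object for cells sampled outside the grid
    out[ii, :, kk] = V[src_i[ii, kk], :, src_k[ii, kk]]
    return out

def project(V, theta_deg):
    """Orthographic silhouette: p[j, k] = min over depth i of V_theta[i, j, k].

    With 0 = object and 1 = non-object, a pixel is 0 as soon as any voxel
    along its ray is occupied.
    """
    return rotate_volume(V, theta_deg).min(axis=0)

# A 5 x 5 x 5 volume with a single occupied voxel at the centre.
V = np.ones((5, 5, 5))
V[2, 2, 2] = 0.0
silhouette = project(V, 37.0)
```

Taking the minimum (rather than the maximum) follows from the claim's convention that 0 marks the object; with the opposite labelling the same layer would use a maximum.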
10. The network training and testing (4) according to claim 1, characterized in that results are reported on the test partition of each dataset using the intersection over union (IoU) metric; for a given predicted silhouette S and a ground-truth silhouette, the IoU is defined as the number of pixels in the intersection of the two silhouettes divided by the number of pixels in their union, counted with the indicator function I, where I = 1 if a pixel corresponds to the object; the mean IoU is the average over all images; the dataset is randomly divided into training, validation and test groups such that each object appears in only one group, which ensures that SilNet generalizes to unseen objects; when training with N modules, N+1 views of an object are selected at random, the mask of one view being used as the silhouette to be predicted and the remaining views as input images; after the results are computed, SilNet is retrained on re-drawn training, validation and test sets; to compare the results of different numbers of modules, each additional module receives one additional previously unselected view, and when different SilNets are compared these views are kept the same; for the mottled object dataset, SilNet is trained using stochastic gradient descent with momentum 0.9, weight decay 0.001 and batch size 16; the network initialized on the mottled object dataset is then fine-tuned on the sculpture dataset.
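The IoU metric named in this claim can be sketched for binary silhouettes as follows. The function names `silhouette_iou` and `mean_iou` are illustrative, the default `object_value=0` follows the labelling in claim 5 (0 = object), and returning 1.0 when both silhouettes are empty is an assumption, since the claim does not cover that case.

```python
import numpy as np

def silhouette_iou(pred, target, object_value=0):
    """Intersection over union of the object pixels of two binary silhouettes."""
    p = np.asarray(pred) == object_value
    t = np.asarray(target) == object_value
    union = np.logical_or(p, t).sum()
    if union == 0:
        return 1.0  # assumption: two empty silhouettes count as a perfect match
    return float(np.logical_and(p, t).sum() / union)

def mean_iou(preds, targets, object_value=0):
    """Average IoU over a collection of (prediction, ground truth) pairs."""
    scores = [silhouette_iou(p, t, object_value) for p, t in zip(preds, targets)]
    return float(np.mean(scores))

pred = [[0, 0], [1, 1]]    # two object pixels predicted
target = [[0, 1], [1, 1]]  # one object pixel in the ground truth
score = silhouette_iou(pred, target)
```

A network output taken after the sigmoid would first have to be thresholded (for example at 0.5) to obtain the binary silhouette that this function expects.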
CN201810081726.0A 2018-01-29 2018-01-29 A kind of multiple view method for reconstructing based on deep learning profile network Withdrawn CN108305229A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810081726.0A CN108305229A (en) 2018-01-29 2018-01-29 A kind of multiple view method for reconstructing based on deep learning profile network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810081726.0A CN108305229A (en) 2018-01-29 2018-01-29 A kind of multiple view method for reconstructing based on deep learning profile network

Publications (1)

Publication Number Publication Date
CN108305229A true CN108305229A (en) 2018-07-20

Family

ID=62866589

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810081726.0A Withdrawn CN108305229A (en) 2018-01-29 2018-01-29 A kind of multiple view method for reconstructing based on deep learning profile network

Country Status (1)

Country Link
CN (1) CN108305229A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106780543A (en) * 2017-01-13 2017-05-31 深圳市唯特视科技有限公司 A kind of double framework estimating depths and movement technique based on convolutional neural networks
CN107341506A (en) * 2017-06-12 2017-11-10 华南理工大学 A kind of Image emotional semantic classification method based on the expression of many-sided deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
OLIVIA WILES: "SilNet : Single- and Multi-View Reconstruction by Learning from Silhouettes", 《ARXIV:1711.07888V1》 *

Cited By (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109255831B (en) * 2018-09-21 2020-06-12 南京大学 Single-view face three-dimensional reconstruction and texture generation method based on multi-task learning
CN109255831A (en) * 2018-09-21 2019-01-22 南京大学 The method that single-view face three-dimensional reconstruction and texture based on multi-task learning generate
CN109410289A (en) * 2018-11-09 2019-03-01 中国科学院武汉物理与数学研究所 A kind of high lack sampling hyperpolarized gas lung MRI method for reconstructing of deep learning
CN109410289B (en) * 2018-11-09 2021-11-12 中国科学院精密测量科学与技术创新研究院 Deep learning high undersampling hyperpolarized gas lung MRI reconstruction method
CN109655011A (en) * 2018-12-13 2019-04-19 北京健康有益科技有限公司 A kind of method and system of Human Modeling dimension measurement
CN109934107B (en) * 2019-01-31 2022-03-01 北京市商汤科技开发有限公司 Image processing method and device, electronic device and storage medium
CN109840500A (en) * 2019-01-31 2019-06-04 深圳市商汤科技有限公司 A kind of 3 D human body posture information detection method and device
CN109934107A (en) * 2019-01-31 2019-06-25 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium
CN109977945A (en) * 2019-02-26 2019-07-05 博众精工科技股份有限公司 Localization method and system based on deep learning
CN109993825A (en) * 2019-03-11 2019-07-09 北京工业大学 A kind of three-dimensional rebuilding method based on deep learning
CN109993825B (en) * 2019-03-11 2023-06-20 北京工业大学 Three-dimensional reconstruction method based on deep learning
CN110234040B (en) * 2019-05-10 2022-08-09 九阳股份有限公司 Food material image acquisition method of cooking equipment and cooking equipment
CN110234040A (en) * 2019-05-10 2019-09-13 九阳股份有限公司 A kind of the food materials image acquiring method and cooking equipment of cooking equipment
CN110197156A (en) * 2019-05-30 2019-09-03 清华大学 Manpower movement and the shape similarity metric method and device of single image based on deep learning
CN110197156B (en) * 2019-05-30 2021-08-17 清华大学 Single-image human hand action and shape reconstruction method and device based on deep learning
CN110113593A (en) * 2019-06-11 2019-08-09 南开大学 Wide baseline multi-view point video synthetic method based on convolutional neural networks
CN110675488B (en) * 2019-09-24 2023-02-28 电子科技大学 Method for constructing modeling system of creative three-dimensional voxel model based on deep learning
CN110675488A (en) * 2019-09-24 2020-01-10 电子科技大学 Construction method of creative three-dimensional voxel model modeling system based on deep learning
CN111105475A (en) * 2019-12-24 2020-05-05 电子科技大学 Bone three-dimensional reconstruction method based on orthogonal angle X-ray
CN111105475B (en) * 2019-12-24 2022-11-22 电子科技大学 Bone three-dimensional reconstruction method based on orthogonal angle X-ray
CN111340067A (en) * 2020-02-10 2020-06-26 天津大学 Redistribution method for multi-view classification
CN111340067B (en) * 2020-02-10 2022-07-08 天津大学 Redistribution method for multi-view classification
CN111337898A (en) * 2020-02-19 2020-06-26 北京百度网讯科技有限公司 Laser point cloud processing method, device, equipment and storage medium
CN111337898B (en) * 2020-02-19 2022-10-14 北京百度网讯科技有限公司 Laser point cloud processing method, device, equipment and storage medium
CN114119723A (en) * 2020-08-25 2022-03-01 辉达公司 Generating views using one or more neural networks
CN112037200A (en) * 2020-08-31 2020-12-04 上海交通大学 Method for automatically identifying anatomical features and reconstructing model in medical image
CN112733698A (en) * 2021-01-05 2021-04-30 北京大学 Three-dimensional multi-view covariant representation learning method and three-dimensional object identification method
CN113506362A (en) * 2021-06-02 2021-10-15 湖南大学 Method for synthesizing new view of single-view transparent object based on coding and decoding network
CN113506362B (en) * 2021-06-02 2024-03-19 湖南大学 Method for synthesizing new view of single-view transparent object based on coding and decoding network
CN113808006A (en) * 2021-09-01 2021-12-17 南京信息工程大学 Method and device for reconstructing three-dimensional grid model based on two-dimensional image
CN116434009A (en) * 2023-04-19 2023-07-14 应急管理部国家减灾中心(应急管理部卫星减灾应用中心) Construction method and system for deep learning sample set of damaged building
CN116434009B (en) * 2023-04-19 2023-10-24 应急管理部国家减灾中心(应急管理部卫星减灾应用中心) Construction method and system for deep learning sample set of damaged building
CN117047286A (en) * 2023-10-09 2023-11-14 东莞市富明钮扣有限公司 Method for processing workpiece surface by laser, processing system, processor and storage medium
CN117047286B (en) * 2023-10-09 2024-01-16 东莞市富明钮扣有限公司 Method for processing workpiece surface by laser, processing system, processor and storage medium

Similar Documents

Publication Publication Date Title
CN108305229A (en) A kind of multiple view method for reconstructing based on deep learning profile network
Zhao et al. Generative multiplane images: Making a 2d gan 3d-aware
Mildenhall et al. Nerf: Representing scenes as neural radiance fields for view synthesis
Chibane et al. Stereo radiance fields (srf): Learning view synthesis for sparse views of novel scenes
CN109255831B (en) Single-view face three-dimensional reconstruction and texture generation method based on multi-task learning
Georgoulis et al. Reflectance and natural illumination from single-material specular objects using deep learning
CN113706714A (en) New visual angle synthesis method based on depth image and nerve radiation field
WO2021164759A1 (en) Three-dimensional facial reconstruction
DE102019008168A1 (en) Dynamic estimation of lighting parameters for positions within reality-enhanced scenes using a neural network
CN111753698B (en) Multi-mode three-dimensional point cloud segmentation system and method
CN114255313B (en) Three-dimensional reconstruction method and device for mirror surface object, computer equipment and storage medium
CN108665414A (en) Natural scene picture generation method
CN114067041A (en) Material generation method and device of three-dimensional model, computer equipment and storage medium
Zhou et al. Photomat: A material generator learned from single flash photos
CN117372644A (en) Three-dimensional content generation method based on period implicit representation
Li et al. SpectralNeRF: Physically Based Spectral Rendering with Neural Radiance Field
Martin et al. Synthesising light field volume visualisations using image warping in real-time
Lucas et al. Discovering 3-D Structure of LEO Obects
Fu et al. Multi-scene representation learning with neural radiance fields
CN117333609B (en) Image rendering method, network training method, device and medium
Cyriac Neural Radiance Fields (NeRF) for 3D Visualization Rendering Based on 2D Images
CN118710792A (en) 3D Gaussian sputtering scene reconstruction method based on view dependency difference decoupling
Papathomas et al. The application of depth separation to the display of large data sets
CN118608667A (en) Complex scene efficient rendering method based on visual perception radiation field
Jiang Constructing and Assessing Surrogates for Volume Visualization Using Neural Networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20180720