CN113936117A - High-frequency region enhanced luminosity three-dimensional reconstruction method based on deep learning - Google Patents
High-frequency region enhanced luminosity three-dimensional reconstruction method based on deep learning
- Publication number
- CN113936117A CN113936117A CN202111524515.8A CN202111524515A CN113936117A CN 113936117 A CN113936117 A CN 113936117A CN 202111524515 A CN202111524515 A CN 202111524515A CN 113936117 A CN113936117 A CN 113936117A
- Authority
- CN
- China
- Prior art keywords
- layer
- attention weight
- surface normal
- network
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 35
- 238000013135 deep learning Methods 0.000 title claims abstract description 27
- 238000005286 illumination Methods 0.000 claims abstract description 27
- 230000006870 function Effects 0.000 claims abstract description 22
- 230000004580 weight loss Effects 0.000 claims abstract description 10
- 230000004913 activation Effects 0.000 claims description 12
- 230000000694 effects Effects 0.000 claims description 7
- 238000011176 pooling Methods 0.000 claims description 6
- 230000017105 transposition Effects 0.000 claims description 5
- 238000004364 calculation method Methods 0.000 claims description 3
- 238000010586 diagram Methods 0.000 description 3
- 238000001514 detection method Methods 0.000 description 1
- 238000003475 lamination Methods 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 230000037303 wrinkles Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/30—Polynomial surface description
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Optimization (AREA)
- Software Systems (AREA)
- Mathematical Analysis (AREA)
- Computational Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Algebra (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Health & Medical Sciences (AREA)
- Geometry (AREA)
- Computer Graphics (AREA)
- Databases & Information Systems (AREA)
- Image Analysis (AREA)
Abstract
The method captures multiple images of an object to be reconstructed with a photometric stereo system and uses a deep learning algorithm to output an accurate surface-normal three-dimensional reconstruction. A surface normal generation network is designed to generate the surface normal of the object to be reconstructed from the images and illumination; an attention weight generation network generates an attention weight map of the object to be reconstructed from the images; an attention weight loss function is processed pixel by pixel; the trained network is then used for surface normal reconstruction of photometric stereo images. The invention learns the surface normal and the high-frequency information separately through the proposed surface normal generation network and attention weight generation network, and trains with the proposed attention weight loss, thereby improving reconstruction accuracy in high-frequency surface regions such as wrinkles and edges. Compared with traditional photometric stereo methods, three-dimensional reconstruction accuracy is improved, particularly in the surface details of the object to be reconstructed.
Description
Technical Field
The invention relates to a high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning, and belongs to the field of photometric three-dimensional reconstruction.
Background
Three-dimensional reconstruction is a very important and fundamental problem in computer vision, and the photometric stereo algorithm is a high-precision, pixel-by-pixel three-dimensional reconstruction method that recovers the surface normal of an object using the grayscale variation cues provided by images under different illumination directions. Photometric stereo holds an irreplaceable position in many high-precision three-dimensional reconstruction tasks and has important application value in archaeological exploration, pipeline inspection, fine seabed mapping and the like.
However, existing deep learning based photometric stereo methods have large errors in high-frequency regions of the object surface, such as wrinkles and edges; existing methods produce blurred three-dimensional reconstruction results in exactly these regions, which are the places requiring emphasis and accurate reconstruction.
Disclosure of Invention
In view of the above problems, the present invention provides a high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning, so as to overcome the disadvantages of the prior art.
The high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning is characterized by comprising the following steps:
1) using a photometric stereo system, taking several images of the object to be reconstructed:
an image of the object to be reconstructed is captured under the illumination of a single parallel white light source; a Cartesian coordinate system is established with the center of the object to be reconstructed as the origin, and the position of the white light source is represented by a vector l = [x, y, z] in this coordinate system;
the position of the light source is then changed to capture an image under another illumination direction; usually at least 10 images under different illumination directions are captured, recorded as m_1, m_2, ..., m_j, with the corresponding light source positions recorded as l_1, l_2, ..., l_j, where j is a natural number greater than or equal to 10;
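The capture step above can be sketched as plain arrays, with NumPy standing in for the acquisition pipeline; the random data is a placeholder for real photographs, and only the shapes and the j ≥ 10 constraint come from the text.

```python
import numpy as np

j = 10          # number of illumination directions (j >= 10 per the method)
p, q = 32, 32   # image resolution p x q (the preferred values in the text)

# j images m_1..m_j, each p x q with 3 RGB channels
images = np.random.rand(j, p, q, 3)

# j light-source positions l_1..l_j, each a vector [x, y, z],
# here normalized to unit directions
lights = np.random.rand(j, 3)
lights /= np.linalg.norm(lights, axis=1, keepdims=True)

print(images.shape)  # (10, 32, 32, 3)
print(lights.shape)  # (10, 3)
```

These two arrays are exactly the inputs that step 2) feeds to the deep learning algorithm.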
2) input m_1, m_2, ..., m_j and l_1, l_2, ..., l_j into a deep learning algorithm and output an accurate surface-normal three-dimensional reconstruction:
the deep learning algorithm used is divided into the following four parts: (1) a surface normal generation network, (2) an attention weight generation network, (3) joint training with an attention weight loss function, and (4) network training; wherein:
(1) the surface normal generation network is designed to generate the surface normal of the object to be reconstructed from the images m_1, m_2, ..., m_j and illuminations l_1, l_2, ..., l_j;
(2) the attention weight generation network is designed to generate the attention weight map P of the object to be reconstructed from the images m_1, m_2, ..., m_j;
(3) the attention weight loss L is a loss function processed pixel by pixel; it is obtained by averaging the loss L_k at each pixel, i.e. L = (1/(p·q)) Σ_k L_k, where p×q is the resolution of image m and p, q ≥ 2^n with n ≥ 4;
the loss L_k at each pixel position comprises two parts: the first part is a gradient loss L_gradient with a coefficient term, and the second part is a normal loss L_normal with a coefficient term, i.e. L_k = P_k · L_gradient + λ(1 − P_k) · L_normal;
wherein the gradient loss is L_gradient = ‖∇n_k − ∇n̂_k‖, in which ∇n_k is the gradient at position k of the true surface normal n of the object to be reconstructed, ζ is the neighborhood pixel range used in computing the gradient (ζ can be set to 1, 2, 3, 4 or 5), and ∇n̂_k is the gradient at position k of the predicted surface normal n̂; n̂ denotes the surface normal predicted by the network and n the true surface normal;
the gradient loss sharpens the high-frequency representation of the surface normal in the network; P_k is the value of the attention weight map at pixel position k;
second, the normal loss is L_normal = 1 − n̂_k ● n_k, where ● denotes the dot product operation; λ is a hyperparameter balancing the gradient loss and the normal loss, with its range set to {7, 8, 9, 10};
the (1) surface normal generation network and (2) attention weight generation network can be linked through the (3) attention weight loss;
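The pixel-wise loss L_k = P_k·L_gradient + λ(1 − P_k)·L_normal can be sketched numerically. The original text elides the exact formulas, so two assumptions are made here: the normal loss is taken as 1 minus the dot product of predicted and true normals, and the gradient loss as an L2 comparison of finite-difference gradients over a ζ-pixel neighborhood. This is an interpretation, not the patent's reference implementation.

```python
import numpy as np

def finite_diff_grad(n, zeta=1):
    """Forward differences along both image axes with step zeta."""
    gx = np.roll(n, -zeta, axis=0) - n
    gy = np.roll(n, -zeta, axis=1) - n
    return gx, gy

def attention_weight_loss(n_hat, n_true, P, lam=8.0, zeta=1):
    # per-pixel normal loss: 1 - cosine between predicted and true normals
    l_normal = 1.0 - np.sum(n_hat * n_true, axis=-1)
    # per-pixel gradient loss: L2 difference of the normal-map gradients
    gx_h, gy_h = finite_diff_grad(n_hat, zeta)
    gx_t, gy_t = finite_diff_grad(n_true, zeta)
    l_grad = (np.linalg.norm(gx_h - gx_t, axis=-1)
              + np.linalg.norm(gy_h - gy_t, axis=-1))
    # L_k = P_k * L_gradient + lambda * (1 - P_k) * L_normal, averaged over pixels
    return np.mean(P * l_grad + lam * (1.0 - P) * l_normal)

p, q = 32, 32
n_true = np.zeros((p, q, 3)); n_true[..., 2] = 1.0   # flat surface: normals = +z
n_hat = n_true.copy()
P = np.random.rand(p, q)
print(attention_weight_loss(n_hat, n_true, P))  # 0.0 for a perfect prediction
```

Where the attention weight P_k is large, the gradient term dominates, which is how the loss emphasizes high-frequency regions.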
(4) network training
during network training, the network is continuously adjusted and optimized using the back-propagation algorithm to minimize the loss function; training stops when the set number of cycles is reached, at which point the optimal effect is considered achieved; alternatively, when L_normal is less than 0.03, training is considered to have reached the optimal effect and is stopped;
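The stopping rule in (4) can be sketched as a short loop: train for at most the set number of cycles, but stop early once the normal loss falls below 0.03. The `train_one_epoch` callable is a hypothetical placeholder for one back-propagation pass over the data.

```python
def train(train_one_epoch, max_epochs=30, normal_loss_threshold=0.03):
    """Run up to max_epochs passes; stop early when L_normal < threshold."""
    for epoch in range(1, max_epochs + 1):
        l_normal = train_one_epoch()          # returns the epoch's normal loss
        if l_normal < normal_loss_threshold:  # considered optimal: stop early
            return epoch, l_normal
    return max_epochs, l_normal

# Toy run: a normal loss that decays geometrically each epoch
losses = iter(0.5 * 0.8 ** k for k in range(100))
epoch, final = train(lambda: next(losses))
print(epoch, round(final, 4))  # -> 14 0.0275
```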
3) the trained network is used for surface normal reconstruction of photometric stereo images:
first, capture s or more images in different illumination directions, where s ≥ 10; then input m_1, m_2, ..., m_s and l_1, l_2, ..., l_s into the trained network to obtain the predicted surface normal n̂.
The (1) surface normal generation network is designed to generate the surface normal n̂ of the object to be reconstructed from the images m_1, m_2, ..., m_j and illuminations l_1, l_2, ..., l_j; the specific steps are as follows:
the resolution of image m is denoted p×q, with p, q ≥ 2^n and n ≥ 4, so that m ∈ ℝ^(p×q×3), where 3 represents the RGB channels; following the resolution p×q of m, the surface normal generation network first repeatedly fills the illumination l = [x, y, z] ∈ ℝ^3 into the space ℝ^(p×q×3); the filled illumination is recorded as h, so h ∈ ℝ^(p×q×3); at this point h and m have the same spatial size, and h and m are joined along the third dimension to form a new tensor belonging to ℝ^(p×q×6); with j images and illuminations as input, j fused tensors are obtained;
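The filling-and-joining step described above is plain broadcasting and concatenation; a minimal NumPy sketch for a single image:

```python
import numpy as np

p, q = 32, 32
m = np.random.rand(p, q, 3)        # one input image, p x q x 3
l = np.array([0.3, 0.5, 0.8])      # light direction [x, y, z]

h = np.broadcast_to(l, (p, q, 3))       # "repeatedly fill" l over the image grid
fused = np.concatenate([m, h], axis=2)  # join along the third dimension

print(h.shape)      # (32, 32, 3)
print(fused.shape)  # (32, 32, 6)
```

Repeating this for each of the j image/illumination pairs yields the j fused ℝ^(p×q×6) tensors the encoder consumes.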
these tensors are each passed through 4 convolutional layers; convolutional layers 1, 2, 3 and 4 all have 3×3 kernels and 'relu' activation functions, layers 2 and 4 are convolutions with stride 2, layers 1 and 3 are convolutions with stride 1, and the numbers of feature channels of convolutional layers 1, 2, 3 and 4 are 64, 128, 128 and 256 respectively;
then a max pooling layer pools the j tensors ∈ ℝ^(p/4×q/4×256) produced by the 4 convolutional layers into one tensor ∈ ℝ^(p/4×q/4×256);
computation then proceeds through convolutional layers 5, 6, 7 and 8, all with 3×3 kernels and 'relu' activation functions; layers 5 and 7 are transposed convolutions, layers 6 and 8 are convolutions with stride 1, and the numbers of feature channels of convolutional layers 5, 6, 7 and 8 are 128, 64 and 3;
finally, the tensor obtained from the 8th convolutional layer is normalized to unit modulus, yielding the surface normal n̂ of the object to be reconstructed.
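The layer stack above can be checked by tracking tensor shapes: the two stride-2 layers halve the resolution twice (to p/4 × q/4), max pooling across the j tensors keeps that shape, and the two transposed convolutions (layers 5 and 7, assumed here to each upsample by 2) restore full resolution. This sketch only does the shape bookkeeping; it is not the trained network itself.

```python
def surface_normal_net_shapes(p, q, j):
    """Return (encoder/pooled shape, final output shape) for a p x q input."""
    shapes = []
    # encoder: channels 64, 128, 128, 256; stride 2 at layers 2 and 4
    hw, enc_channels, strides = (p, q), [64, 128, 128, 256], [1, 2, 1, 2]
    for c, s in zip(enc_channels, strides):
        hw = (hw[0] // s, hw[1] // s)
        shapes.append((hw[0], hw[1], c))
    # max pooling fuses the j encoded tensors into one of the same shape
    # (j affects only how many tensors are pooled, not the shape)
    pooled = shapes[-1]
    # decoder: transposed convs at layers 5 and 7 each double the resolution;
    # the final layer outputs 3 channels, normalized to a unit normal
    h, w = pooled[0] * 4, pooled[1] * 4
    return pooled, (h, w, 3)

pooled, out = surface_normal_net_shapes(32, 32, j=10)
print(pooled)  # (8, 8, 256)  -> p/4 x q/4 x 256 after the encoder + pooling
print(out)     # (32, 32, 3)  -> full-resolution surface normal map
```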
The (2) attention weight generation network is designed to generate the attention weight map P of the object to be reconstructed from the images m_1, m_2, ..., m_j; the specific steps are as follows:
the attention weight generation network computes the gradient of each image m ∈ ℝ^(p×q×3); the gradient also belongs to the space ℝ^(p×q×3) and is joined and fused with the image along the third dimension to form a new tensor belonging to ℝ^(p×q×6); with j images and illuminations as input, j fused tensors are obtained;
first, the fused tensors are each passed through 3 convolutional layers, all with 3×3 kernels and a 'relu' activation function; layer 2 has stride 2, layers 1 and 3 have stride 1, and the numbers of feature channels of the three convolutional layers are 64, 128 and 128 respectively;
then a max pooling layer pools the j tensors ∈ ℝ^(p/2×q/2×128) produced by the 3 convolutional layers into one tensor ∈ ℝ^(p/2×q/2×128);
computation then proceeds through convolutional layers 5, 6 and 7, all with 3×3 kernels and 'relu' activation functions; layer 6 is a transposed convolution, layers 5 and 7 are convolutions with stride 1, and the numbers of feature channels of convolutional layers 5, 6 and 7 are 128, 64 and 1, thus yielding the attention weight map P of the object to be reconstructed.
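The attention network's input construction can be sketched as follows. The text does not name the gradient operator, so `np.gradient` over the two spatial axes is used as a plausible stand-in; the key point is that the gradient map keeps the p × q × 3 shape and is concatenated with the image into the p × q × 6 fused tensor.

```python
import numpy as np

p, q = 32, 32
m = np.random.rand(p, q, 3)            # one input image

gy, gx = np.gradient(m, axis=(0, 1))   # spatial gradients, each p x q x 3
grad = np.hypot(gx, gy)                # gradient magnitude, still p x q x 3
fused = np.concatenate([m, grad], axis=2)

print(grad.shape)   # (32, 32, 3)
print(fused.shape)  # (32, 32, 6)
```

Feeding the gradient alongside the raw image is what lets this branch localize high-frequency regions (edges, wrinkles) for the weight map P.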
The high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning is characterized in that, in the resolution p×q of image m, p takes a value of 16, 32, 48 or 64 and q takes a value of 16, 32, 48 or 64.
The high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning is characterized in that ζ is set to 1.
The high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning is characterized in that λ is set to 8.
The high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning is characterized in that the number of cycles is set to 30 epochs.
The high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning is characterized in that p takes the value 32 and q takes the value 32.
According to the high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning provided by the invention, the surface normal and the high-frequency information are learned separately through the surface normal generation network and the attention weight generation network, and training with the proposed attention weight loss improves the reconstruction accuracy of high-frequency surface regions such as wrinkles and edges. Compared with traditional photometric stereo methods, three-dimensional reconstruction accuracy is improved, particularly in the surface details of the object to be reconstructed.
The attention weight loss provided by the invention can also be applied to various low-level vision tasks, such as depth estimation, image deblurring and image dehazing, improving task accuracy and enriching image details.
Drawings
FIG. 1 is a flow chart of the present invention.
Fig. 2 is a schematic diagram of the surface normal generation network in step 2).
Fig. 3 is a schematic diagram of the attention weight generation network in step 2).
Fig. 4 is a schematic diagram of the application effect of the present invention, in which the first row shows the input images, the second row the generated weight maps, and the third row the generated surface normals.
Detailed Description
As shown in fig. 1, the high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning comprises the following steps:
1) using a photometric stereo system, taking several images of the object to be reconstructed:
an image of the object to be reconstructed is captured under the illumination of a single parallel white light source; a Cartesian coordinate system is established with the center of the object to be reconstructed as the origin, and the position of the white light source is represented by a vector l = [x, y, z] in this coordinate system;
the position of the light source is then changed to capture an image under another illumination direction; usually at least 10 images under different illumination directions are captured, recorded as m_1, m_2, ..., m_j, with the corresponding light source positions recorded as l_1, l_2, ..., l_j, where j is a natural number greater than or equal to 10;
2) input m_1, m_2, ..., m_j and l_1, l_2, ..., l_j into a deep learning algorithm and output an accurate surface-normal three-dimensional reconstruction:
the deep learning algorithm used is divided into the following four parts: (1) a surface normal generation network, (2) an attention weight generation network, (3) joint training with an attention weight loss function, and (4) network training;
(1) the surface normal generation network is designed to generate the surface normal of the object to be reconstructed from the images m_1, m_2, ..., m_j and illuminations l_1, l_2, ..., l_j;
the resolution of image m is denoted p×q, with p, q ≥ 2^n and n ≥ 4, so that m ∈ ℝ^(p×q×3), where 3 represents the RGB channels; as shown in FIG. 2, following the resolution p×q of m, the surface normal generation network first repeatedly fills the illumination l = [x, y, z] ∈ ℝ^3 into the space ℝ^(p×q×3); the filled illumination is recorded as h, so h ∈ ℝ^(p×q×3); at this point h and m have the same spatial size, and h and m are joined along the third dimension to form a new tensor belonging to ℝ^(p×q×6); with j images and illuminations as input, j fused tensors are obtained;
these tensors are each passed through 4 convolutional layers; convolutional layers 1, 2, 3 and 4 all have 3×3 kernels and 'relu' activation functions, layers 2 and 4 are convolutions with stride 2, layers 1 and 3 are convolutions with stride 1, and the numbers of feature channels of convolutional layers 1, 2, 3 and 4 are 64, 128, 128 and 256 respectively;
then a max pooling layer pools the j tensors ∈ ℝ^(p/4×q/4×256) produced by the 4 convolutional layers into one tensor ∈ ℝ^(p/4×q/4×256);
computation then proceeds through convolutional layers 5, 6, 7 and 8, all with 3×3 kernels and 'relu' activation functions; layers 5 and 7 are transposed convolutions, layers 6 and 8 are convolutions with stride 1, and the numbers of feature channels of convolutional layers 5, 6, 7 and 8 are 128, 64 and 3;
finally, the tensor obtained from the 8th convolutional layer is normalized to unit modulus, yielding the predicted surface normal n̂;
(2) The attention weight generation network is designed to generate the attention weight map P of the object to be reconstructed from the images m_1, m_2, ..., m_j:
the attention weight generation network computes the gradient of each image m ∈ ℝ^(p×q×3); the gradient also belongs to the space ℝ^(p×q×3) and, as shown in fig. 3, is fused with the image along the third dimension to form a new tensor belonging to ℝ^(p×q×6); with j images and illuminations as input, j fused tensors are obtained;
first, the fused tensors are each passed through 3 convolutional layers, all with 3×3 kernels and a 'relu' activation function; layer 2 has stride 2, layers 1 and 3 have stride 1, and the numbers of feature channels of the three convolutional layers are 64, 128 and 128 respectively;
then a max pooling layer pools the j tensors ∈ ℝ^(p/2×q/2×128) produced by the 3 convolutional layers into one tensor ∈ ℝ^(p/2×q/2×128);
computation then proceeds through convolutional layers 5, 6 and 7, all with 3×3 kernels and 'relu' activation functions; layer 6 is a transposed convolution, layers 5 and 7 are convolutions with stride 1, and the numbers of feature channels of convolutional layers 5, 6 and 7 are 128, 64 and 1, thus yielding the attention weight map P of the object to be reconstructed;
(3) The attention weight loss L is a loss function processed pixel by pixel; it is obtained by averaging the loss L_k at each pixel, i.e. L = (1/(p·q)) Σ_k L_k;
the loss L_k at each pixel position comprises two parts: the first part is a gradient loss L_gradient with a coefficient term, and the second part is a normal loss L_normal with a coefficient term, i.e. L_k = P_k · L_gradient + λ(1 − P_k) · L_normal;
wherein the gradient loss is L_gradient = ‖∇n_k − ∇n̂_k‖, in which ∇n_k is the gradient at position k of the true surface normal n of the object to be reconstructed, ζ is the neighborhood pixel range used in computing the gradient (ζ can be set to 1, 2, 3, 4 or 5, with a default setting of 1 in the invention), and ∇n̂_k is the gradient at position k of the predicted surface normal n̂; n̂ denotes the surface normal predicted by the network and n the true surface normal;
the gradient loss sharpens the high-frequency representation of the surface normal in the network; P_k, the value of the attention weight map at pixel position k, provides the coefficient of the first component L_gradient in the pixel-wise attention weight loss L_k, so where the attention weight value is large, the weight of the gradient loss is large;
second, the normal loss is L_normal = 1 − n̂_k ● n_k, where ● denotes the dot product operation; λ is a hyperparameter balancing the gradient loss and the normal loss, set here to 8; generally it can be set in {7, 8, 9, 10}, and taking 8 gives better results;
the (1) surface normal generation network and (2) attention weight generation network can be linked through the (3) attention weight loss;
(4) network training
during network training, the network is continuously adjusted and optimized using the back-propagation algorithm to minimize the loss function, and training stops when 30 epochs (cycles) are reached to achieve the optimal effect; alternatively, when L_normal is less than 0.03, training is considered to have reached the optimal effect and is stopped;
in the invention, training of the network finishes after 30 epochs, at which point training is considered to have achieved the optimal effect;
(5) the trained network is used for surface normal reconstruction of photometric stereo images:
first, s images in different illumination directions are captured, with s ≥ 10; m_1, m_2, ..., m_s and l_1, l_2, ..., l_s are input into the trained network to obtain the predicted surface normal n̂,
where p, q ∈ {16, 32, 48, 64}, λ ∈ {7, 8, 9, 10}, and ζ can be 1, 2, 3, 4 or 5.
Claims (8)
1. The high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning is characterized by comprising the following steps:
1) using a photometric stereo system, taking several images of the object to be reconstructed:
an image of the object to be reconstructed is captured under the illumination of a single parallel white light source; a Cartesian coordinate system is established with the center of the object to be reconstructed as the origin, and the position of the white light source is represented by a vector l = [x, y, z] in this coordinate system;
the position of the light source is then changed to capture an image under another illumination direction; usually at least 10 images under different illumination directions are captured, recorded as m_1, m_2, ..., m_j, with the corresponding light source positions recorded as l_1, l_2, ..., l_j, where j is a natural number greater than or equal to 10;
2) input m_1, m_2, ..., m_j and l_1, l_2, ..., l_j into a deep learning algorithm and output an accurate surface-normal three-dimensional reconstruction:
the deep learning algorithm used is divided into the following four parts: (1) a surface normal generation network, (2) an attention weight generation network, (3) joint training with an attention weight loss function, and (4) network training; wherein:
(1) the surface normal generation network is designed to generate the surface normal of the object to be reconstructed from the images m_1, m_2, ..., m_j and illuminations l_1, l_2, ..., l_j;
(2) the attention weight generation network is designed to generate the attention weight map P of the object to be reconstructed from the images m_1, m_2, ..., m_j;
(3) the attention weight loss L is a loss function processed pixel by pixel; it is obtained by averaging the loss L_k at each pixel, i.e. L = (1/(p·q)) Σ_k L_k, where p×q is the resolution of image m and p, q ≥ 2^n with n ≥ 4;
the loss L_k at each pixel position comprises two parts: the first part is a gradient loss L_gradient with a coefficient term, and the second part is a normal loss L_normal with a coefficient term, i.e. L_k = P_k · L_gradient + λ(1 − P_k) · L_normal;
wherein the gradient loss is L_gradient = ‖∇n_k − ∇n̂_k‖; ∇n_k is the gradient at position k of the true surface normal n of the object to be reconstructed;
ζ is the neighborhood pixel range used in computing the gradient, set in the range 1, 2, 3, 4, 5; ∇n̂_k is the gradient at position k of the predicted surface normal n̂;
P_k is the value of the attention weight map at pixel position k;
second, the normal loss is L_normal = 1 − n̂_k ● n_k, where ● denotes the dot product operation; λ is a hyperparameter balancing the gradient loss and the normal loss, with its range set to {7, 8, 9, 10};
the (1) surface normal generation network and (2) attention weight generation network can be linked through the (3) attention weight loss;
(4) network training
during network training, the network is continuously adjusted and optimized using the back-propagation algorithm to minimize the loss function; training stops when the set number of cycles is reached, at which point the optimal effect is considered achieved; alternatively, when L_normal is less than 0.03, training is considered to have reached the optimal effect and is stopped;
3) the trained network is used for surface normal reconstruction of photometric stereo images:
2. The deep learning based high-frequency region enhanced photometric stereo three-dimensional reconstruction method according to claim 1, wherein the (1) surface normal generation network is designed to generate the surface normal n̂ of the object to be reconstructed from the images m_1, m_2, ..., m_j and illuminations l_1, l_2, ..., l_j, the specific steps being as follows:
the resolution of image m is denoted p×q, with p, q ≥ 2^n and n ≥ 4, so that m ∈ ℝ^(p×q×3), where 3 represents the RGB channels; following the resolution p×q of m, the surface normal generation network first repeatedly fills the illumination l = [x, y, z] ∈ ℝ^3 into the space ℝ^(p×q×3); the filled illumination is recorded as h, so h ∈ ℝ^(p×q×3); at this point h and m have the same spatial size, and h and m are joined along the third dimension to form a new tensor belonging to ℝ^(p×q×6); with j images and illuminations as input, j fused tensors are obtained;
these tensors are each passed through 4 convolutional layers; convolutional layers 1, 2, 3 and 4 all have 3×3 kernels and 'relu' activation functions, layers 2 and 4 are convolutions with stride 2, layers 1 and 3 are convolutions with stride 1, and the numbers of feature channels of convolutional layers 1, 2, 3 and 4 are 64, 128, 128 and 256 respectively;
then a max pooling layer pools the j tensors ∈ ℝ^(p/4×q/4×256) produced by the 4 convolutional layers into one tensor ∈ ℝ^(p/4×q/4×256);
computation then proceeds through convolutional layers 5, 6, 7 and 8, all with 3×3 kernels and 'relu' activation functions; layers 5 and 7 are transposed convolutions, layers 6 and 8 are convolutions with stride 1, and the numbers of feature channels of convolutional layers 5, 6, 7 and 8 are 128, 64 and 3;
3. The deep learning based high-frequency region enhanced photometric stereo three-dimensional reconstruction method according to claim 1, wherein the (2) attention weight generation network is designed to generate the attention weight map P of the object to be reconstructed from the images m_1, m_2, ..., m_j, the specific steps being as follows:
the attention weight generation network computes the gradient of each image m ∈ ℝ^(p×q×3); the gradient also belongs to the space ℝ^(p×q×3) and is joined and fused with the image along the third dimension to form a new tensor belonging to ℝ^(p×q×6); with j images and illuminations as input, j fused tensors are obtained;
first, the fused tensors are each passed through 3 convolutional layers, all with 3×3 kernels and a 'relu' activation function; layer 2 has stride 2, layers 1 and 3 have stride 1, and the numbers of feature channels of the three convolutional layers are 64, 128 and 128 respectively;
then a max pooling layer pools the j tensors ∈ ℝ^(p/2×q/2×128) produced by the 3 convolutional layers into one tensor ∈ ℝ^(p/2×q/2×128);
computation then proceeds through convolutional layers 5, 6 and 7, all with 3×3 kernels and 'relu' activation functions; layer 6 is a transposed convolution, layers 5 and 7 are convolutions with stride 1, and the numbers of feature channels of convolutional layers 5, 6 and 7 are 128, 64 and 1, thus yielding the attention weight map P of the object to be reconstructed.
4. The deep learning based high-frequency region enhanced photometric stereo three-dimensional reconstruction method according to claim 1, wherein, in the resolution p×q of image m, p takes a value of 16, 32, 48 or 64 and q takes a value of 16, 32, 48 or 64.
5. The deep learning based high-frequency region enhanced photometric stereo three-dimensional reconstruction method according to claim 1, wherein ζ is set to 1.
6. The deep learning based high frequency region enhanced photometric stereo three dimensional reconstruction method according to claim 1 wherein λ is set to 8.
7. The deep learning-based high-frequency region enhanced photometric stereo three-dimensional reconstruction method according to claim 1, wherein the number of cycles is set to 30 epochs.
8. The deep learning-based high-frequency region-enhanced photometric stereo three-dimensional reconstruction method according to claim 4, wherein p is set to 32 and q is set to 32.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111524515.8A CN113936117B (en) | 2021-12-14 | 2021-12-14 | High-frequency region enhanced luminosity three-dimensional reconstruction method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113936117A true CN113936117A (en) | 2022-01-14 |
CN113936117B CN113936117B (en) | 2022-03-08 |
Family
ID=79288969
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111524515.8A Active CN113936117B (en) | 2021-12-14 | 2021-12-14 | High-frequency region enhanced luminosity three-dimensional reconstruction method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113936117B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114998507A (en) * | 2022-06-07 | 2022-09-02 | 天津大学 | Luminosity three-dimensional reconstruction method based on self-supervision learning |
CN115098563A (en) * | 2022-07-14 | 2022-09-23 | 中国海洋大学 | Time sequence abnormity detection method and system based on GCN and attention VAE |
CN118628371A (en) * | 2024-08-12 | 2024-09-10 | 南开大学 | Surface normal restoration method and device based on photometric stereo and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107862741A (en) * | 2017-12-10 | 2018-03-30 | 中国海洋大学 | A kind of single-frame images three-dimensional reconstruction apparatus and method based on deep learning |
CN108510573A (en) * | 2018-04-03 | 2018-09-07 | 南京大学 | A method of the multiple views human face three-dimensional model based on deep learning is rebuild |
CN109146934A (en) * | 2018-06-04 | 2019-01-04 | 成都通甲优博科技有限责任公司 | A kind of face three-dimensional rebuilding method and system based on binocular solid and photometric stereo |
CN110060212A (en) * | 2019-03-19 | 2019-07-26 | 中国海洋大学 | A kind of multispectral photometric stereo surface normal restoration methods based on deep learning |
US20210241478A1 (en) * | 2020-02-03 | 2021-08-05 | Nanotronics Imaging, Inc. | Deep Photometric Learning (DPL) Systems, Apparatus and Methods |
CN113538675A (en) * | 2021-06-30 | 2021-10-22 | 同济人工智能研究院(苏州)有限公司 | Neural network for calculating attention weight for laser point cloud and training method |
CN113762358A (en) * | 2021-08-18 | 2021-12-07 | 江苏大学 | Semi-supervised learning three-dimensional reconstruction method based on relative deep training |
Non-Patent Citations (2)
Title |
---|
CHENG-JIAN LIN等: "A Constrained Independent Component Analysis Based Photometric Stereo for 3D Human Face Reconstruction", 《2012 INTERNATIONAL SYMPOSIUM ON COMPUTER, CONSUMER AND CONTROL》 * |
陈加等: "深度学习在基于单幅图像的物体三维重建中的应用", 《自动化学报》 * |
Also Published As
Publication number | Publication date |
---|---|
CN113936117B (en) | 2022-03-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113936117B (en) | High-frequency region enhanced luminosity three-dimensional reconstruction method based on deep learning | |
Liu et al. | Meshdiffusion: Score-based generative 3d mesh modeling | |
Chen et al. | Point-based multi-view stereo network | |
Kwon et al. | Data-driven depth map refinement via multi-scale sparse representation | |
CN112215755B (en) | Image super-resolution reconstruction method based on back projection attention network | |
CN112634149B (en) | Point cloud denoising method based on graph convolution network | |
CN112348959A (en) | Adaptive disturbance point cloud up-sampling method based on deep learning | |
CN113962858A (en) | Multi-view depth acquisition method | |
Pottmann et al. | The isophotic metric and its application to feature sensitive morphology on surfaces | |
CN108171249B (en) | RGBD data-based local descriptor learning method | |
CN109598732A (en) | A kind of medical image cutting method based on three-dimensional space weighting | |
CN117575915B (en) | Image super-resolution reconstruction method, terminal equipment and storage medium | |
CN103679680A (en) | Stereo matching method and system | |
Rashid et al. | Single MR image super-resolution using generative adversarial network | |
CN115841422A (en) | Image splicing method based on pyramid structure super-resolution network | |
CN115631223A (en) | Multi-view stereo reconstruction method based on self-adaptive learning and aggregation | |
CN113361378B (en) | Human body posture estimation method using adaptive data enhancement | |
CN112991504B (en) | Improved hole filling method based on TOF camera three-dimensional reconstruction | |
Wang et al. | Mvdd: Multi-view depth diffusion models | |
CN116883467A (en) | Non-rigid registration method for medical image | |
CN116091762A (en) | Three-dimensional target detection method based on RGBD data and view cone | |
Amirkolaee et al. | Monocular depth estimation with geometrical guidance using a multi-level convolutional neural network | |
EP4191526A1 (en) | Apparatus and method with object posture estimating | |
CN113454678A (en) | Three-dimensional facial scan enhancement | |
CN114119916A (en) | Multi-view stereoscopic vision reconstruction method based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||