CN113936117A - High-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning - Google Patents

High-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning

Info

Publication number
CN113936117A
CN113936117A (application CN202111524515.8A)
Authority
CN
China
Prior art keywords
layer
attention weight
surface normal
network
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111524515.8A
Other languages
Chinese (zh)
Other versions
CN113936117B (en)
Inventor
举雅琨 (Yakun Ju)
董军宇 (Junyu Dong)
高峰 (Feng Gao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ocean University of China
Original Assignee
Ocean University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ocean University of China filed Critical Ocean University of China
Priority to CN202111524515.8A
Publication of CN113936117A
Application granted
Publication of CN113936117B
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/30: Polynomial surface description
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Software Systems (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The method captures a plurality of images of the object to be reconstructed with a photometric stereo system and uses a deep learning algorithm to output an accurate surface normal reconstruction. A surface normal generation network is designed to generate the surface normal of the object to be reconstructed from the images and the illuminations; an attention weight generation network generates an attention weight map of the object to be reconstructed from the images; an attention weight loss function is processed pixel by pixel; the trained network is then used for surface normal reconstruction from photometric stereo images. The invention learns the surface normal and the high-frequency information separately through the proposed surface normal generation network and attention weight generation network, and trains with the proposed attention weight loss, thereby improving the reconstruction accuracy of high-frequency surface regions such as wrinkles and edges. Compared with conventional photometric stereo methods, the three-dimensional reconstruction accuracy is improved, particularly in the surface details of the object to be reconstructed.

Description

High-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning
Technical Field
The invention relates to a high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning, and belongs to the field of photometric three-dimensional reconstruction.
Background
Three-dimensional reconstruction is a fundamental problem in computer vision. Photometric stereo is a high-precision, pixel-wise three-dimensional reconstruction method that recovers the surface normal of an object from the intensity-variation cues provided by images taken under different illumination directions. Photometric stereo is irreplaceable in many high-precision three-dimensional reconstruction tasks and has important application value in archaeological exploration, pipeline inspection, fine seabed mapping, and similar areas.
However, existing deep-learning-based photometric stereo methods produce large errors in high-frequency regions of the object surface such as wrinkles and edges, yielding blurred reconstructions precisely in the regions where accurate reconstruction is needed most.
Disclosure of Invention
In view of the above problems, the present invention provides a high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning, so as to overcome the shortcomings of the prior art.
The high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning comprises the following steps:
1) using a photometric stereo system, taking several images of the object to be reconstructed:
An image of the object to be reconstructed is taken under the illumination of a single parallel white light source. A Cartesian coordinate system is established with the center of the object to be reconstructed as the origin, and the position of the white light source is represented in this coordinate system by a vector l = [x, y, z];
the position of the light source is then changed to capture an image under another illumination direction; usually at least 10 images under different illumination directions are taken, denoted m_1, m_2, ..., m_j, with the corresponding light source positions denoted l_1, l_2, ..., l_j, where j is a natural number greater than or equal to 10;
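As a minimal sketch of the data this step produces (Python/PyTorch, with random stand-ins for the actual photographs; the channel-first tensor layout is an assumption of the sketches that follow, not part of the invention):

    import torch

    # Stand-ins for the j >= 10 captured photographs m_1 ... m_j and their
    # light-direction vectors l_1 ... l_j.
    j, p, q = 10, 32, 32
    images = torch.rand(j, 3, p, q)                      # RGB images, p x q
    lights = torch.randn(j, 3)
    lights = lights / lights.norm(dim=1, keepdim=True)   # unit vectors [x, y, z]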
2) inputting m_1, m_2, ..., m_j and l_1, l_2, ..., l_j into a deep learning algorithm and outputting an accurate surface normal reconstruction:
The deep learning algorithm consists of the following four parts: (1) a surface normal generation network, (2) an attention weight generation network, (3) joint training with an attention weight loss function, and (4) network training; wherein:
(1) the surface normal generation network is designed to generate the surface normal ñ of the object to be reconstructed from the images m_1, m_2, ..., m_j and the illuminations l_1, l_2, ..., l_j;
(2) the attention weight generation network is designed to generate the attention weight map P of the object to be reconstructed from the images m_1, m_2, ..., m_j;
(3) the attention weight loss L is a pixel-wise loss function, obtained by averaging the loss L_k over all pixels:
L = (1/(p·q)) · Σ_k L_k, the sum running over all p·q pixel positions k,
where p×q is the resolution of the image m, with p, q ≥ 2^n and n ≥ 4;
The loss at each pixel position, L_k, comprises two parts: the first is a gradient loss with coefficient term, L_gradient, and the second is a normal loss with coefficient term, L_normal; i.e. L_k = P_k · L_gradient + λ(1 − P_k) · L_normal;
where L_gradient = ‖∇n_k − ∇ñ_k‖², ∇n_k being the gradient at position k of the true surface normal n of the object to be reconstructed, ζ the neighborhood pixel range used in computing the gradient (ζ can be set to 1, 2, 3, 4 or 5), and ∇ñ_k the gradient at position k of the predicted surface normal ñ; here ñ denotes the surface normal predicted by the network and n the true surface normal;
The gradient loss sharpens the high-frequency representation of the surface normal in the network; P_k is the value of the attention weight map at pixel position k;
Second, L_normal = 1 − n_k ● ñ_k, where ● denotes the dot product operation; λ is a hyper-parameter that balances the gradient loss against the normal loss, with its range set to {7, 8, 9, 10};
the (1) surface normal generation network and (2) attention weight generation network can be linked through the (3) attention weight loss;
(4) network training
During network training, the parameters are continuously adjusted and optimized with the back-propagation algorithm to minimize the loss function, and training stops when the set number of cycles is reached, at which point the optimal effect is taken to be achieved; alternatively, when L_normal falls below 0.03, training is considered to have reached the optimal effect and is stopped;
3) the trained network is used for surface normal reconstruction of photometric stereo images:
First, s or more images under different illumination directions are taken, with s ≥ 10; then m_1, m_2, ..., m_s and l_1, l_2, ..., l_s are input into the trained network to obtain the predicted surface normal ñ.
The surface normal generation network (1) is designed to generate the surface normal ñ of the object to be reconstructed from the images m_1, m_2, ..., m_j and the illuminations l_1, l_2, ..., l_j. The specific steps are as follows:
The resolution of the image m is denoted p×q, with p, q ≥ 2^n and n ≥ 4, so m ∈ ℝ^(p×q×3), where 3 denotes the RGB channels. The surface normal generation network first tiles the illumination l = [x, y, z] ∈ ℝ³ over the p×q grid of m, filling a tensor in ℝ^(p×q×3); the tiled illumination is denoted h, so h ∈ ℝ^(p×q×3). Now h and m have the same spatial size, and they are concatenated along the third (channel) dimension to form a new tensor in ℝ^(p×q×6). With j input images and illuminations, j fused tensors are obtained;
These tensors are each passed through 4 convolutional layers. The kernels of convolutional layers 1, 2, 3 and 4 are all 3×3 with ReLU activations; layers 2 and 4 are convolutions with stride 2, layers 1 and 3 are convolutions with stride 1, and the numbers of feature channels of layers 1, 2, 3 and 4 are 64, 128, 128 and 256, respectively;
A max-pooling layer then pools the j four-layer-convolved tensors, each in ℝ^(p/4 × q/4 × 256), into a single tensor in ℝ^(p/4 × q/4 × 256);
This tensor is then processed by convolutional layers 5, 6, 7 and 8, all with 3×3 kernels and ReLU activations; layers 5 and 7 are transposed convolutions, layers 6 and 8 are convolutions with stride 1, and the numbers of feature channels of layers 5, 6, 7 and 8 are 128, 64 and 3;
Finally, the tensor produced by the layer-8 convolution is normalized to unit modulus, giving the surface normal ñ of the object to be reconstructed.
The attention weight generation network (2) is designed to generate the attention weight map P of the object to be reconstructed from the images m_1, m_2, ..., m_j. The specific steps are as follows:
For each image m ∈ ℝ^(p×q×3), the attention weight generation network computes its gradient map, which also lies in ℝ^(p×q×3), and the gradient is concatenated with the image along the third (channel) dimension to form a new tensor in ℝ^(p×q×6); with j input images and illuminations, j fused tensors are obtained;
These fused tensors are each passed through 3 convolutional layers, all with 3×3 kernels and ReLU activations; layer 2 has stride 2, layers 1 and 3 have stride 1, and the numbers of feature channels of the three layers are 64, 128 and 128, respectively;
A max-pooling layer then pools the j three-layer-convolved tensors, each in ℝ^(p/2 × q/2 × 128), into a single tensor in ℝ^(p/2 × q/2 × 128);
This tensor is then processed by convolutional layers 5, 6 and 7, all with 3×3 kernels and ReLU activations; layer 6 is a transposed convolution, layers 5 and 7 are convolutions with stride 1, and the numbers of feature channels of layers 5, 6 and 7 are 128, 64 and 1, respectively, yielding the attention weight map P of the object to be reconstructed.
The high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning is characterized in that, in the resolution p×q of the image m, p takes the value 16, 32, 48 or 64, and q takes the value 16, 32, 48 or 64.
The high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning is characterized in that ζ is set to 1.
The high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning is characterized in that λ is set to 8.
The high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning is characterized in that the number of cycles is set to 30 epochs.
The high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning is characterized in that p takes the value 32 and q takes the value 32.
According to the high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning provided by the invention, the surface normal and the high-frequency information are learned separately by the surface normal generation network and the attention weight generation network, and training with the proposed attention weight loss improves the reconstruction accuracy of high-frequency surface regions such as wrinkles and edges. Compared with conventional photometric stereo methods, the three-dimensional reconstruction accuracy is improved, particularly in the surface details of the object to be reconstructed.
The attention weight loss proposed by the invention can also be applied to various low-level vision tasks, such as depth estimation, image deblurring and image dehazing, improving task accuracy and enriching image details.
Drawings
FIG. 1 is a flow chart of the present invention.
Fig. 2 is a schematic diagram of the surface normal generation network in step 2).
Fig. 3 is a schematic diagram of the attention weight generation network in step 2).
Fig. 4 is a schematic diagram of the application effect of the present invention, in which the first row shows the input images, the second row the generated attention weight maps, and the third row the generated surface normals.
Detailed Description
As shown in FIG. 1, the high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning comprises the following steps:
1) using a photometric stereo system, taking several images of the object to be reconstructed:
An image of the object to be reconstructed is taken under the illumination of a single parallel white light source. A Cartesian coordinate system is established with the center of the object to be reconstructed as the origin, and the position of the white light source is represented in this coordinate system by a vector l = [x, y, z];
the position of the light source is then changed to capture an image under another illumination direction; usually at least 10 images under different illumination directions are taken, denoted m_1, m_2, ..., m_j, with the corresponding light source positions denoted l_1, l_2, ..., l_j, where j is a natural number greater than or equal to 10;
2) inputting m_1, m_2, ..., m_j and l_1, l_2, ..., l_j into a deep learning algorithm and outputting an accurate surface normal reconstruction:
The deep learning algorithm consists of the following four parts: (1) a surface normal generation network, (2) an attention weight generation network, (3) joint training with an attention weight loss function, and (4) network training;
(1) The surface normal generation network is designed to generate the surface normal ñ of the object to be reconstructed from the images m_1, m_2, ..., m_j and the illuminations l_1, l_2, ..., l_j.
The resolution of the image m is denoted p×q, with p, q ≥ 2^n and n ≥ 4, so m ∈ ℝ^(p×q×3), where 3 denotes the RGB channels. As shown in FIG. 2, the surface normal generation network first tiles the illumination l = [x, y, z] ∈ ℝ³ over the p×q grid of m, filling a tensor in ℝ^(p×q×3); the tiled illumination is denoted h, so h ∈ ℝ^(p×q×3). Now h and m have the same spatial size, and they are concatenated along the third (channel) dimension to form a new tensor in ℝ^(p×q×6). With j input images and illuminations, j fused tensors are obtained;
These tensors are each passed through 4 convolutional layers. The kernels of convolutional layers 1, 2, 3 and 4 are all 3×3 with ReLU activations; layers 2 and 4 are convolutions with stride 2, layers 1 and 3 are convolutions with stride 1, and the numbers of feature channels of layers 1, 2, 3 and 4 are 64, 128, 128 and 256, respectively;
A max-pooling layer then pools the j four-layer-convolved tensors, each in ℝ^(p/4 × q/4 × 256), into a single tensor in ℝ^(p/4 × q/4 × 256);
This tensor is then processed by convolutional layers 5, 6, 7 and 8, all with 3×3 kernels and ReLU activations; layers 5 and 7 are transposed convolutions, layers 6 and 8 are convolutions with stride 1, and the numbers of feature channels of layers 5, 6, 7 and 8 are 128, 64 and 3;
Finally, the tensor produced by the layer-8 convolution is normalized to unit modulus, giving the predicted surface normal ñ; a sketch of this network follows.
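Taken together, the fusion step and the layer sequence above can be sketched as a PyTorch module. This is an interpretation under stated assumptions, not the patent's reference code: padding of 1 keeps the 3×3 convolutions shape-preserving, layer 5 is given 128 output channels (the text lists three channel values for the four layers 5 to 8), and the unit normalization is applied directly to the raw layer-8 output:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def fuse(m, l):
        # m: (B, 3, p, q) image; l: (B, 3) light direction [x, y, z].
        B, _, p, q = m.shape
        h = l.view(B, 3, 1, 1).expand(B, 3, p, q)  # tile l over the p x q grid
        return torch.cat([m, h], dim=1)            # fused tensor, (B, 6, p, q)

    class NormalNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.c1 = nn.Conv2d(6, 64, 3, 1, 1)      # stride 1
            self.c2 = nn.Conv2d(64, 128, 3, 2, 1)    # stride 2
            self.c3 = nn.Conv2d(128, 128, 3, 1, 1)   # stride 1
            self.c4 = nn.Conv2d(128, 256, 3, 2, 1)   # stride 2
            # Layers 5 and 7 are transposed convolutions undoing the two
            # stride-2 reductions; layers 6 and 8 are stride-1 convolutions.
            self.c5 = nn.ConvTranspose2d(256, 128, 3, 2, 1, output_padding=1)
            self.c6 = nn.Conv2d(128, 128, 3, 1, 1)
            self.c7 = nn.ConvTranspose2d(128, 64, 3, 2, 1, output_padding=1)
            self.c8 = nn.Conv2d(64, 3, 3, 1, 1)

        def forward(self, fused):  # fused: (B, j, 6, p, q), the j fused tensors
            B, j, C, p, q = fused.shape
            x = fused.view(B * j, C, p, q)
            for conv in (self.c1, self.c2, self.c3, self.c4):
                x = F.relu(conv(x))
            # Max-pool the j feature tensors into one (element-wise maximum).
            x = x.view(B, j, 256, p // 4, q // 4).max(dim=1).values
            for conv in (self.c5, self.c6, self.c7):
                x = F.relu(conv(x))
            # Normalize the layer-8 output to unit modulus: the predicted normal.
            return F.normalize(self.c8(x), dim=1)

Max-pooling across the j fused tensors makes this sketch order-independent in the input images and lets it accept a variable number of them.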
(2) The attention weight generation network is designed to generate the attention weight map P of the object to be reconstructed from the images m_1, m_2, ..., m_j:
For each image m ∈ ℝ^(p×q×3), the attention weight generation network computes its gradient map, which also lies in ℝ^(p×q×3), and the gradient is concatenated with the image along the third (channel) dimension (FIG. 3) to form a new tensor in ℝ^(p×q×6); with j input images and illuminations, j fused tensors are obtained;
These fused tensors are each passed through 3 convolutional layers, all with 3×3 kernels and ReLU activations; layer 2 has stride 2, layers 1 and 3 have stride 1, and the numbers of feature channels of the three layers are 64, 128 and 128, respectively;
A max-pooling layer then pools the j three-layer-convolved tensors, each in ℝ^(p/2 × q/2 × 128), into a single tensor in ℝ^(p/2 × q/2 × 128);
This tensor is then processed by convolutional layers 5, 6 and 7, all with 3×3 kernels and ReLU activations; layer 6 is a transposed convolution, layers 5 and 7 are convolutions with stride 1, and the numbers of feature channels of layers 5, 6 and 7 are 128, 64 and 1, respectively, yielding the attention weight map P of the object to be reconstructed; a sketch follows.
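A matching sketch of the attention weight generation network under the same assumptions; the final sigmoid, which keeps P within [0, 1] so it can weight the two loss components, is an addition not stated in the text (the text applies ReLU to all layers):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AttentionNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.c1 = nn.Conv2d(6, 64, 3, 1, 1)      # stride 1
            self.c2 = nn.Conv2d(64, 128, 3, 2, 1)    # the single stride-2 layer
            self.c3 = nn.Conv2d(128, 128, 3, 1, 1)   # stride 1
            self.c5 = nn.Conv2d(128, 128, 3, 1, 1)   # the text numbers these 5-7
            self.c6 = nn.ConvTranspose2d(128, 64, 3, 2, 1, output_padding=1)
            self.c7 = nn.Conv2d(64, 1, 3, 1, 1)

        def forward(self, fused):  # fused: (B, j, 6, p, q), image + gradient map
            B, j, C, p, q = fused.shape
            x = fused.view(B * j, C, p, q)
            for conv in (self.c1, self.c2, self.c3):
                x = F.relu(conv(x))
            x = x.view(B, j, 128, p // 2, q // 2).max(dim=1).values  # pool over j
            x = F.relu(self.c6(F.relu(self.c5(x))))
            return torch.sigmoid(self.c7(x))  # attention weight map P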
(3) The attention weight loss L is a pixel-wise loss function, obtained by averaging the loss L_k over all pixels:
L = (1/(p·q)) · Σ_k L_k, the sum running over all p·q pixel positions k;
The loss at each pixel position, L_k, comprises two parts: the first is a gradient loss with coefficient term, L_gradient, and the second is a normal loss with coefficient term, L_normal; i.e. L_k = P_k · L_gradient + λ(1 − P_k) · L_normal;
where L_gradient = ‖∇n_k − ∇ñ_k‖², ∇n_k being the gradient at position k of the true surface normal n of the object to be reconstructed, ζ the neighborhood pixel range used in computing the gradient (ζ can be set to 1, 2, 3, 4 or 5; the default setting in the invention is 1), and ∇ñ_k the gradient at position k of the predicted surface normal ñ; here ñ denotes the surface normal predicted by the network and n the true surface normal;
The gradient loss sharpens the high-frequency representation of the surface normal in the network; P_k is the value of the attention weight map at pixel position k, so the attention weight loss weights its first component, L_gradient, pixel by pixel: where the attention weight value is large, the weight of the gradient loss is large;
Second, L_normal = 1 − n_k ● ñ_k, where ● denotes the dot product operation; λ is a hyper-parameter that balances the gradient loss against the normal loss and is set here to 8; in general it can be set within {7, 8, 9, 10}, and taking 8 gives the best effect;
the (1) surface normal generation network and (2) attention weight generation network can be linked through the (3) attention weight loss;
(4) network training
During network training, the parameters are continuously adjusted and optimized with the back-propagation algorithm to minimize the loss function, and training stops upon reaching 30 epochs (cycles), at which point the optimal effect is taken to be achieved; alternatively, when L_normal falls below 0.03, training is considered to have reached the optimal effect and is stopped;
In the invention, training of the network ends after 30 epochs, at which point it is considered to have achieved the optimal effect; a sketch of such a training loop follows.
(5) the trained network is used for surface normal reconstruction of photometric stereo images:
first shootingsThe images with different illumination directions are displayed,snot less than 10, mixing 1 , m 2 , ..., m s And l 1 , l 2 , ..., l s Inputting the trained network to obtain the predicted surface normal
Figure 568877DEST_PATH_IMAGE001
Here p, q ∈ {16, 32, 48, 64}, λ ∈ {7, 8, 9, 10}, and ζ can be 1, 2, 3, 4 or 5.
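Continuing the sketches above, inference is a single forward pass; `ms` and `ls` are hypothetical lists holding the s captured images as (1, 3, p, q) tensors and their light directions as (1, 3) tensors:

    with torch.no_grad():
        fused = torch.stack([fuse(m_i, l_i) for m_i, l_i in zip(ms, ls)], dim=1)
        n_tilde = net_n(fused)  # (1, 3, p, q) predicted unit surface normals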
The reconstruction effect is shown in FIG. 4: the first row shows the images taken of the object to be reconstructed, the second row the generated attention weight map P, and the third row the generated surface normal ñ.

Claims (8)

1. The high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning, characterized by comprising the following steps:
1) using a photometric stereo system, taking several images of the object to be reconstructed:
An image of the object to be reconstructed is taken under the illumination of a single parallel white light source. A Cartesian coordinate system is established with the center of the object to be reconstructed as the origin, and the position of the white light source is represented in this coordinate system by a vector l = [x, y, z];
the position of the light source is then changed to capture an image under another illumination direction; usually at least 10 images under different illumination directions are taken, denoted m_1, m_2, ..., m_j, with the corresponding light source positions denoted l_1, l_2, ..., l_j, where j is a natural number greater than or equal to 10;
2) inputting m_1, m_2, ..., m_j and l_1, l_2, ..., l_j into a deep learning algorithm and outputting an accurate surface normal reconstruction:
The deep learning algorithm consists of the following four parts: (1) a surface normal generation network, (2) an attention weight generation network, (3) joint training with an attention weight loss function, and (4) network training; wherein:
(1) the surface normal generation network is designed to generate the surface normal ñ of the object to be reconstructed from the images m_1, m_2, ..., m_j and the illuminations l_1, l_2, ..., l_j;
(2) the attention weight generation network is designed to generate the attention weight map P of the object to be reconstructed from the images m_1, m_2, ..., m_j;
(3) the attention weight loss L is a pixel-wise loss function, obtained by averaging the loss L_k over all pixels:
L = (1/(p·q)) · Σ_k L_k, the sum running over all p·q pixel positions k,
where p×q is the resolution of the image m, with p, q ≥ 2^n and n ≥ 4;
The loss at each pixel position, L_k, comprises two parts: the first is a gradient loss with coefficient term, L_gradient, and the second is a normal loss with coefficient term, L_normal; i.e. L_k = P_k · L_gradient + λ(1 − P_k) · L_normal;
wherein L_gradient = ‖∇n_k − ∇ñ_k‖², ∇n_k is the gradient at position k of the true surface normal n of the object to be reconstructed;
ζ is the neighborhood pixel range used in computing the gradient, and ζ can be set to 1, 2, 3, 4 or 5;
∇ñ_k is the gradient at position k of the predicted surface normal ñ;
ñ denotes the surface normal predicted by the network, and n denotes the true surface normal;
P_k is the value of the attention weight map at pixel position k;
second, L_normal = 1 − n_k ● ñ_k, where ● denotes the dot product operation; λ is a hyper-parameter that balances the gradient loss against the normal loss, with its range set to {7, 8, 9, 10};
the (1) surface normal generation network and (2) attention weight generation network can be linked through the (3) attention weight loss;
(4) network training
During network training, the parameters are continuously adjusted and optimized with the back-propagation algorithm to minimize the loss function, and training stops when the set number of cycles is reached, at which point the optimal effect is taken to be achieved; alternatively, when L_normal falls below 0.03, training is considered to have reached the optimal effect and is stopped;
3) the trained network is used for surface normal reconstruction of photometric stereo images:
First, s or more images under different illumination directions are taken, with s ≥ 10; then m_1, m_2, ..., m_s and l_1, l_2, ..., l_s are input into the trained network to obtain the predicted surface normal ñ.
2. The high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning according to claim 1, wherein the surface normal generation network (1) is designed to generate the surface normal ñ of the object to be reconstructed from the images m_1, m_2, ..., m_j and the illuminations l_1, l_2, ..., l_j, with the following specific steps:
The resolution of the image m is denoted p×q, with p, q ≥ 2^n and n ≥ 4, so m ∈ ℝ^(p×q×3), where 3 denotes the RGB channels. The surface normal generation network first tiles the illumination l = [x, y, z] ∈ ℝ³ over the p×q grid of m, filling a tensor in ℝ^(p×q×3); the tiled illumination is denoted h, so h ∈ ℝ^(p×q×3). Now h and m have the same spatial size, and they are concatenated along the third (channel) dimension to form a new tensor in ℝ^(p×q×6). With j input images and illuminations, j fused tensors are obtained;
These tensors are each passed through 4 convolutional layers. The kernels of convolutional layers 1, 2, 3 and 4 are all 3×3 with ReLU activations; layers 2 and 4 are convolutions with stride 2, layers 1 and 3 are convolutions with stride 1, and the numbers of feature channels of layers 1, 2, 3 and 4 are 64, 128, 128 and 256, respectively;
A max-pooling layer then pools the j four-layer-convolved tensors, each in ℝ^(p/4 × q/4 × 256), into a single tensor in ℝ^(p/4 × q/4 × 256);
This tensor is then processed by convolutional layers 5, 6, 7 and 8, all with 3×3 kernels and ReLU activations; layers 5 and 7 are transposed convolutions, layers 6 and 8 are convolutions with stride 1, and the numbers of feature channels of layers 5, 6, 7 and 8 are 128, 64 and 3;
Finally, the tensor produced by the layer-8 convolution is normalized to unit modulus, giving the surface normal ñ of the object to be reconstructed.
3. The high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning according to claim 1, wherein the attention weight generation network (2) is designed to generate the attention weight map P of the object to be reconstructed from the images m_1, m_2, ..., m_j, with the following specific steps:
For each image m ∈ ℝ^(p×q×3), the attention weight generation network computes its gradient map, which also lies in ℝ^(p×q×3), and the gradient is concatenated with the image along the third (channel) dimension to form a new tensor in ℝ^(p×q×6); with j input images and illuminations, j fused tensors are obtained;
These fused tensors are each passed through 3 convolutional layers, all with 3×3 kernels and ReLU activations; layer 2 has stride 2, layers 1 and 3 have stride 1, and the numbers of feature channels of the three layers are 64, 128 and 128, respectively;
A max-pooling layer then pools the j three-layer-convolved tensors, each in ℝ^(p/2 × q/2 × 128), into a single tensor in ℝ^(p/2 × q/2 × 128);
This tensor is then processed by convolutional layers 5, 6 and 7, all with 3×3 kernels and ReLU activations; layer 6 is a transposed convolution, layers 5 and 7 are convolutions with stride 1, and the numbers of feature channels of layers 5, 6 and 7 are 128, 64 and 1, respectively, yielding the attention weight map P of the object to be reconstructed.
4. The high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning according to claim 1, wherein, in the resolution p×q of the image m, p takes the value 16, 32, 48 or 64, and q takes the value 16, 32, 48 or 64.
5. The high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning according to claim 1, wherein ζ is set to 1.
6. The high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning according to claim 1, wherein λ is set to 8.
7. The high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning according to claim 1, wherein the number of cycles is set to 30 epochs.
8. The high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning according to claim 4, wherein p takes the value 32 and q takes the value 32.
CN202111524515.8A 2021-12-14 2021-12-14 High-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning Active CN113936117B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111524515.8A CN113936117B (en) 2021-12-14 2021-12-14 High-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111524515.8A CN113936117B (en) 2021-12-14 2021-12-14 High-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning

Publications (2)

Publication Number Publication Date
CN113936117A 2022-01-14
CN113936117B CN113936117B (en) 2022-03-08

Family

ID=79288969

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111524515.8A Active CN113936117B (en) 2021-12-14 2021-12-14 High-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning

Country Status (1)

Country Link
CN (1) CN113936117B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114998507A (en) * 2022-06-07 2022-09-02 天津大学 Luminosity three-dimensional reconstruction method based on self-supervision learning
CN115098563A (en) * 2022-07-14 2022-09-23 中国海洋大学 Time sequence abnormity detection method and system based on GCN and attention VAE
CN118628371A (en) * 2024-08-12 2024-09-10 南开大学 Surface normal restoration method and device based on photometric stereo and storage medium


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107862741A (en) * 2017-12-10 2018-03-30 中国海洋大学 A kind of single-frame images three-dimensional reconstruction apparatus and method based on deep learning
CN108510573A (en) * 2018-04-03 2018-09-07 南京大学 A method of the multiple views human face three-dimensional model based on deep learning is rebuild
CN109146934A (en) * 2018-06-04 2019-01-04 成都通甲优博科技有限责任公司 A kind of face three-dimensional rebuilding method and system based on binocular solid and photometric stereo
CN110060212A (en) * 2019-03-19 2019-07-26 中国海洋大学 A kind of multispectral photometric stereo surface normal restoration methods based on deep learning
US20210241478A1 (en) * 2020-02-03 2021-08-05 Nanotronics Imaging, Inc. Deep Photometric Learning (DPL) Systems, Apparatus and Methods
CN113538675A (en) * 2021-06-30 2021-10-22 同济人工智能研究院(苏州)有限公司 Neural network for calculating attention weight for laser point cloud and training method
CN113762358A (en) * 2021-08-18 2021-12-07 江苏大学 Semi-supervised learning three-dimensional reconstruction method based on relative deep training

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHENG-JIAN LIN等: "A Constrained Independent Component Analysis Based Photometric Stereo for 3D Human Face Reconstruction", 《2012 INTERNATIONAL SYMPOSIUM ON COMPUTER, CONSUMER AND CONTROL》 *
CHEN Jia et al.: "Application of Deep Learning in Object Three-Dimensional Reconstruction Based on a Single Image", Acta Automatica Sinica (《自动化学报》) *


Also Published As

Publication number Publication date
CN113936117B (en) 2022-03-08

Similar Documents

Publication Publication Date Title
CN113936117B (en) High-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning
Liu et al. Meshdiffusion: Score-based generative 3d mesh modeling
Chen et al. Point-based multi-view stereo network
Kwon et al. Data-driven depth map refinement via multi-scale sparse representation
CN112215755B (en) Image super-resolution reconstruction method based on back projection attention network
CN112634149B (en) Point cloud denoising method based on graph convolution network
CN112348959A (en) Adaptive disturbance point cloud up-sampling method based on deep learning
CN113962858A (en) Multi-view depth acquisition method
Pottmann et al. The isophotic metric and its application to feature sensitive morphology on surfaces
CN108171249B (en) RGBD data-based local descriptor learning method
CN109598732A (en) A kind of medical image cutting method based on three-dimensional space weighting
CN117575915B (en) Image super-resolution reconstruction method, terminal equipment and storage medium
CN103679680A (en) Stereo matching method and system
Rashid et al. Single MR image super-resolution using generative adversarial network
CN115841422A (en) Image splicing method based on pyramid structure super-resolution network
CN115631223A (en) Multi-view stereo reconstruction method based on self-adaptive learning and aggregation
CN113361378B (en) Human body posture estimation method using adaptive data enhancement
CN112991504B (en) Improved hole filling method based on TOF camera three-dimensional reconstruction
Wang et al. Mvdd: Multi-view depth diffusion models
CN116883467A (en) Non-rigid registration method for medical image
CN116091762A (en) Three-dimensional target detection method based on RGBD data and view cone
Amirkolaee et al. Monocular depth estimation with geometrical guidance using a multi-level convolutional neural network
EP4191526A1 (en) Apparatus and method with object posture estimating
CN113454678A (en) Three-dimensional facial scan enhancement
CN114119916A (en) Multi-view stereoscopic vision reconstruction method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant