CN113936117A - High-frequency region enhanced luminosity three-dimensional reconstruction method based on deep learning - Google Patents
High-frequency region enhanced luminosity three-dimensional reconstruction method based on deep learning
- Publication number
- CN113936117A (application number CN202111524515.8A)
- Authority
- CN
- China
- Prior art keywords
- surface normal
- attention weight
- reconstructed
- images
- layers
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/30—Polynomial surface description
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
Description
Technical Field
The present invention relates to a deep learning-based photometric stereo 3D reconstruction method with high-frequency region enhancement, and belongs to the field of photometric stereo 3D reconstruction.
Background
3D reconstruction is a fundamental problem in computer vision. Photometric stereo is a high-precision, per-pixel 3D reconstruction method that recovers the surface normals of an object from the intensity-variation cues in images captured under different illumination directions. Photometric stereo plays an irreplaceable role in many high-precision 3D reconstruction tasks, with important applications in archaeological exploration, pipeline inspection, and fine seabed mapping.
However, existing deep learning-based photometric stereo methods suffer from large errors in high-frequency regions of the object surface, such as wrinkles and edges, where they produce blurred reconstructions. Yet these are precisely the regions that demand attention and accurate reconstruction.
Summary of the Invention
In view of the above problems, the object of the present invention is to provide a deep learning-based high-frequency region enhanced photometric stereo 3D reconstruction method that overcomes the deficiencies of the prior art.
The deep learning-based high-frequency region enhanced photometric stereo 3D reconstruction method is characterized by the following steps:
1) Capture several images of the object to be reconstructed with a photometric stereo system:
The object to be reconstructed is photographed under a single parallel white light source. A Cartesian coordinate system is established with the center of the object as the origin, so that the position of the white light source is represented by the vector l = [x, y, z] in this coordinate system;
The light source is then moved and another image is captured under the new illumination direction. At least 10 images under different illumination directions are typically required, denoted m_1, m_2, ..., m_j, with the corresponding light source positions denoted l_1, l_2, ..., l_j, where j is a natural number greater than or equal to 10;
2) Feed m_1, m_2, ..., m_j and l_1, l_2, ..., l_j to the deep learning algorithm, which outputs an accurate surface normal reconstruction:
The deep learning algorithm consists of four parts: (1) a surface normal generation network, (2) an attention weight generation network, (3) joint training with the attention weight loss function, and (4) network training. Specifically:
(1) The surface normal generation network is designed to generate the surface normal ñ of the object to be reconstructed from the images m_1, m_2, ..., m_j and the illuminations l_1, l_2, ..., l_j;
(2) The attention weight generation network is designed to generate the attention weight map P of the object to be reconstructed from the images m_1, m_2, ..., m_j;
(3) The attention weight loss L is a per-pixel loss function, obtained by averaging the loss L_k over every pixel: L = (1/(p*q)) Σ_k L_k, where p*q is the resolution of image m, with p, q ≥ 2^n, n ≥ 4;
The loss L_k at each pixel position consists of two parts: a gradient loss L_gradient and a normal loss L_normal, each with a coefficient term, i.e. L_k = P_k · L_gradient + λ(1 − P_k) · L_normal;
Here L_gradient = ‖∇ñ_k − ∇n_k‖, where ∇n_k is the gradient of the true surface normal n of the object at position k, computed over a neighborhood of ζ pixels, with ζ taking a value from 1, 2, 3, 4, 5; ∇ñ_k is the gradient of the predicted surface normal at position k; ñ denotes the surface normal predicted by the network and n the true surface normal;
In the network, the gradient loss sharpens the high-frequency content of the surface normal; P_k is the value of the attention weight map at pixel position k;
Second, L_normal = 1 − ñ_k ● n_k, where ● denotes the dot product; λ is a hyperparameter that balances the gradient loss and the normal loss, chosen from {7, 8, 9, 10};
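For reference, the attention weight loss above can be sketched in PyTorch. The finite-difference gradient, the L1 reduction of the gradient term, the function name attention_weight_loss, and the tensor layout are assumptions made to turn the formulas into runnable code; they are not taken from the patent:

```python
import torch
import torch.nn.functional as F

def attention_weight_loss(n_pred, n_true, P, lam=8.0, zeta=1):
    """L = mean_k [ P_k * L_gradient + lam * (1 - P_k) * L_normal ].

    n_pred, n_true: (B, 3, H, W) unit surface normal maps
    P:              (B, 1, H, W) attention weight map
    """
    def grad(n):
        # Finite differences over a neighborhood of `zeta` pixels
        # (assumed reading of the zeta parameter), padded back to (H, W)
        gx = F.pad(n[:, :, :, zeta:] - n[:, :, :, :-zeta], (0, zeta, 0, 0))
        gy = F.pad(n[:, :, zeta:, :] - n[:, :, :-zeta, :], (0, 0, 0, zeta))
        return gx, gy

    gx_p, gy_p = grad(n_pred)
    gx_t, gy_t = grad(n_true)
    # Gradient loss: difference of the normal-map gradients (L1 reduction assumed)
    l_grad = (gx_p - gx_t).abs().sum(1, keepdim=True) + (gy_p - gy_t).abs().sum(1, keepdim=True)
    # Normal loss: 1 - dot product between predicted and true normals
    l_norm = 1.0 - (n_pred * n_true).sum(1, keepdim=True)
    return (P * l_grad + lam * (1.0 - P) * l_norm).mean()
```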
The attention weight loss (3) thus couples the surface normal generation network (1) and the attention weight generation network (2);
(4) Network training
During training, the network is iteratively adjusted and optimized with the back-propagation algorithm to minimize the loss function above; training stops when the set number of epochs is reached, or earlier once L_normal falls below 0.03, at which point training is considered to have reached its best effect;
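A minimal training loop consistent with these stopping rules might look as follows; the Adam optimizer, the learning rate, and the data loader interface are assumptions, while the epoch limit and the L_normal < 0.03 early stop come from the text:

```python
import torch

def train(normal_net, attention_net, loader, device="cuda", epochs=30):
    params = list(normal_net.parameters()) + list(attention_net.parameters())
    opt = torch.optim.Adam(params, lr=1e-3)  # optimizer and learning rate are assumptions
    for epoch in range(epochs):              # the patent fixes 30 epochs
        for imgs, lights, n_true in loader:  # imgs: (B, j, 3, H, W), lights: (B, j, 3)
            imgs, lights, n_true = imgs.to(device), lights.to(device), n_true.to(device)
            n_pred = normal_net(imgs, lights)
            P = attention_net(imgs)
            loss = attention_weight_loss(n_pred, n_true, P)
            opt.zero_grad()
            loss.backward()
            opt.step()
        # Alternative stopping rule from the text: stop once L_normal < 0.03
        with torch.no_grad():
            l_normal = (1.0 - (n_pred * n_true).sum(1)).mean()
        if l_normal < 0.03:
            break
```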
3) Apply the trained network to surface normal reconstruction from photometric stereo images:
First capture s or more images under different illumination directions, s ≥ 10, then feed m_1, m_2, ..., m_s and l_1, l_2, ..., l_s to the trained network to obtain the predicted surface normal ñ.
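In code, this step reduces to a single forward pass through the trained surface normal network (normal_net and the tensor shapes are the hypothetical names used in the sketches above and below):

```python
import torch

with torch.no_grad():
    n_pred = normal_net(imgs, lights)  # imgs: (1, s, 3, H, W), lights: (1, s, 3)
```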
The surface normal generation network (1) generates the surface normal of the object to be reconstructed from the images m_1, m_2, ..., m_j and the illuminations l_1, l_2, ..., l_j as follows:
The resolution of image m is denoted p*q, with p, q ≥ 2^n, n ≥ 4, so m ∈ ℝ^(p*q*3), where 3 is the number of RGB channels. The surface normal generation network first tiles the illumination l = [x, y, z] ∈ ℝ^3 over the resolution p*q of m to fill a space of ℝ^(p*q*3); the tiled illumination is denoted h, with h ∈ ℝ^(p*q*3), so that h and m have the same spatial size. h and m are then concatenated along the third (channel) dimension to form a new tensor in ℝ^(p*q*6); with j input images and illuminations, j fused tensors are obtained;
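This tiling-and-concatenation step maps directly to code; a minimal sketch, assuming a (channels, height, width) tensor layout and a hypothetical helper name fuse:

```python
import torch

def fuse(m: torch.Tensor, l: torch.Tensor) -> torch.Tensor:
    """m: (3, p, q) RGB image; l: (3,) light direction vector.
    Returns the fused (6, p, q) tensor described above."""
    p, q = m.shape[1], m.shape[2]
    h = l.view(3, 1, 1).expand(3, p, q)  # tile l over the p*q resolution
    return torch.cat([m, h], dim=0)      # concatenate along the channel dimension
```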
Each of these tensors is passed through 4 convolutional layers. Convolutional layers 1, 2, 3 and 4 all use 3*3 kernels and the ReLU activation function; layers 2 and 4 use stride 2, layers 1 and 3 use stride 1, and layers 1, 2, 3, 4 have 64, 128, 128 and 256 feature channels respectively;
A max pooling layer then pools the j four-layer-convolved tensors in ℝ^(p/4*q/4*256) into a single tensor in ℝ^(p/4*q/4*256);
This tensor is passed through convolutional layers 5, 6, 7 and 8, which all use 3*3 kernels and the ReLU activation function; layers 5 and 7 are transposed convolutions, layers 6 and 8 use stride 1, and layers 5, 6, 7, 8 have 128, 128, 64 and 3 feature channels respectively;
Finally, the tensor produced by layer 8 is normalized to unit norm, giving the surface normal ñ of the object to be reconstructed.
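Putting the eight layers together, the surface normal generation network can be sketched as below. The patent specifies only kernel sizes, strides, activations, and channel counts; the padding, the stride-2 transposed convolutions (so the output returns to the input resolution), and the omission of ReLU after layer 8 (normals need negative components) are assumptions:

```python
import torch
import torch.nn as nn

class SurfaceNormalNet(nn.Module):
    """Sketch of the surface normal generation network (layers 1-8)."""
    def __init__(self):
        super().__init__()
        def c(ci, co, s):  # 3*3 conv + ReLU block
            return nn.Sequential(nn.Conv2d(ci, co, 3, stride=s, padding=1), nn.ReLU())
        # Layers 1-4: strides 1, 2, 1, 2 with 64/128/128/256 channels
        self.enc = nn.Sequential(c(6, 64, 1), c(64, 128, 2), c(128, 128, 1), c(128, 256, 2))
        # Layers 5-8: layers 5 and 7 are transposed convolutions
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            c(128, 128, 1),
            nn.ConvTranspose2d(128, 64, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, stride=1, padding=1),  # layer 8; ReLU omitted by assumption
        )

    def forward(self, imgs, lights):
        # imgs: (B, j, 3, H, W); lights: (B, j, 3)
        B, j, _, H, W = imgs.shape
        h = lights.view(B, j, 3, 1, 1).expand(B, j, 3, H, W)  # tile each light over H*W
        x = torch.cat([imgs, h], dim=2).flatten(0, 1)         # (B*j, 6, H, W) fused tensors
        feats = self.enc(x).view(B, j, 256, H // 4, W // 4)
        pooled = feats.max(dim=1).values                      # max-pool across the j observations
        n = self.dec(pooled)                                  # (B, 3, H, W)
        return n / n.norm(dim=1, keepdim=True).clamp_min(1e-8)  # normalize to unit length
```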
The attention weight generation network (2) generates the attention weight map P of the object to be reconstructed from the images m_1, m_2, ..., m_j as follows:
The attention weight generation network computes the gradient of each image m ∈ ℝ^(p*q*3); this gradient also lies in ℝ^(p*q*3). The gradient is concatenated with the image along the third dimension to form a new tensor in ℝ^(p*q*6); with j input images, j fused tensors are obtained;
Each fused tensor is first passed through 3 convolutional layers, all with 3*3 kernels and the ReLU activation function; layer 2 uses stride 2, layers 1 and 3 use stride 1, and the three layers have 64, 128 and 128 feature channels respectively;
A max pooling layer then pools the j three-layer-convolved tensors in ℝ^(p/2*q/2*128) into a single tensor in ℝ^(p/2*q/2*128);
This tensor is passed through convolutional layers 5, 6 and 7, all with 3*3 kernels and the ReLU activation function; layer 6 is a transposed convolution, layers 5 and 7 use stride 1, and layers 5, 6, 7 have 128, 64 and 1 feature channels respectively, yielding the attention weight map P of the object to be reconstructed.
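Under the same assumptions, a sketch of the attention weight generation network; the image gradient operator, its reduction to three channels, and the final sigmoid (keeping P in [0, 1], as the loss requires) are filled in by assumption:

```python
import torch
import torch.nn as nn

class AttentionWeightNet(nn.Module):
    """Sketch of the attention weight generation network."""
    def __init__(self):
        super().__init__()
        def c(ci, co, s):  # 3*3 conv + ReLU block
            return nn.Sequential(nn.Conv2d(ci, co, 3, stride=s, padding=1), nn.ReLU())
        # Three encoder layers: strides 1, 2, 1 with 64/128/128 channels
        self.enc = nn.Sequential(c(6, 64, 1), c(64, 128, 2), c(128, 128, 1))
        # Layers 5-7: layer 6 is a transposed convolution
        self.dec = nn.Sequential(
            c(128, 128, 1),
            nn.ConvTranspose2d(128, 64, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 3, stride=1, padding=1),  # layer 7 -> one-channel map
        )

    @staticmethod
    def image_grad(m):
        # Forward differences in x and y, summed into a 3-channel map so the
        # fused input keeps 6 channels (this reduction is an assumption)
        gx = torch.zeros_like(m)
        gy = torch.zeros_like(m)
        gx[:, :, :, :-1] = m[:, :, :, 1:] - m[:, :, :, :-1]
        gy[:, :, :-1, :] = m[:, :, 1:, :] - m[:, :, :-1, :]
        return gx + gy

    def forward(self, imgs):
        # imgs: (B, j, 3, H, W)
        B, j, _, H, W = imgs.shape
        x = imgs.flatten(0, 1)                             # (B*j, 3, H, W)
        fused = torch.cat([x, self.image_grad(x)], dim=1)  # (B*j, 6, H, W)
        feats = self.enc(fused).view(B, j, 128, H // 2, W // 2)
        pooled = feats.max(dim=1).values                   # max-pool across the j observations
        return torch.sigmoid(self.dec(pooled))             # attention weight map P in [0, 1]
```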
The deep learning-based high-frequency region enhanced photometric stereo 3D reconstruction method described above is characterized in that, in the resolution p*q of the image m, p takes a value from 16, 32, 48, 64 and q takes a value from 16, 32, 48, 64.
The method described above is characterized in that ζ is set to 1.
The method described above is characterized in that λ is set to 8.
The method described above is characterized in that the number of training cycles is set to 30 epochs.
The method described above is characterized in that p is set to 32 and q is set to 32.
Through the proposed surface normal generation network and attention weight generation network, the deep learning-based high-frequency region enhanced photometric stereo 3D reconstruction method of the present invention learns the surface normal and the high-frequency information separately, and trains them with the proposed attention weight loss, which improves the reconstruction accuracy of high-frequency surface regions such as wrinkles and edges. Compared with previous photometric stereo methods, the 3D reconstruction accuracy is improved, especially for the surface details of the object to be reconstructed.
The attention weight loss proposed by the present invention can also be applied to a variety of low-level vision tasks, such as depth estimation, image deblurring and image dehazing, to improve task accuracy and enrich image detail.
Brief Description of the Drawings
Figure 1 is the flow chart of the present invention.
Figure 2 is a schematic diagram of the surface normal generation network of step 2).
Figure 3 is a schematic diagram of the attention weight generation network of step 2).
Figure 4 illustrates the application effect of the present invention: the first row shows the input images, the second row the generated attention weight maps, and the third row the generated surface normals.
Detailed Description
As shown in Figure 1, the deep learning-based high-frequency region enhanced photometric stereo 3D reconstruction method comprises the following steps:
1) Capture several images of the object to be reconstructed with a photometric stereo system:
The object to be reconstructed is photographed under a single parallel white light source. A Cartesian coordinate system is established with the center of the object as the origin, so that the position of the white light source is represented by the vector l = [x, y, z] in this coordinate system;
The light source is then moved and another image is captured under the new illumination direction. At least 10 images under different illumination directions are typically required, denoted m_1, m_2, ..., m_j, with the corresponding light source positions denoted l_1, l_2, ..., l_j, where j is a natural number greater than or equal to 10;
2) Feed m_1, m_2, ..., m_j and l_1, l_2, ..., l_j to the deep learning algorithm, which outputs an accurate surface normal reconstruction:
The deep learning algorithm consists of four parts: (1) a surface normal generation network, (2) an attention weight generation network, (3) joint training with the attention weight loss function, and (4) network training;
(1) The surface normal generation network is designed to generate the surface normal ñ of the object to be reconstructed from the images m_1, m_2, ..., m_j and the illuminations l_1, l_2, ..., l_j;
The resolution of image m is denoted p*q, with p, q ≥ 2^n, n ≥ 4, so m ∈ ℝ^(p*q*3), where 3 is the number of RGB channels. As shown in Figure 2, the surface normal generation network first tiles the illumination l = [x, y, z] ∈ ℝ^3 over the resolution p*q of m to fill a space of ℝ^(p*q*3); the tiled illumination is denoted h, with h ∈ ℝ^(p*q*3), so that h and m have the same spatial size. h and m are then concatenated along the third (channel) dimension to form a new tensor in ℝ^(p*q*6); with j input images and illuminations, j fused tensors are obtained;
Each of these tensors is passed through 4 convolutional layers. Convolutional layers 1, 2, 3 and 4 all use 3*3 kernels and the ReLU activation function; layers 2 and 4 use stride 2, layers 1 and 3 use stride 1, and layers 1, 2, 3, 4 have 64, 128, 128 and 256 feature channels respectively;
A max pooling layer then pools the j four-layer-convolved tensors in ℝ^(p/4*q/4*256) into a single tensor in ℝ^(p/4*q/4*256);
This tensor is passed through convolutional layers 5, 6, 7 and 8, which all use 3*3 kernels and the ReLU activation function; layers 5 and 7 are transposed convolutions, layers 6 and 8 use stride 1, and layers 5, 6, 7, 8 have 128, 128, 64 and 3 feature channels respectively;
Finally, the tensor produced by layer 8 is normalized to unit norm, giving the predicted surface normal ñ;
(2) The attention weight generation network is designed to generate the attention weight map of the object to be reconstructed from the images m_1, m_2, ..., m_j:
The attention weight generation network computes the gradient of each image m ∈ ℝ^(p*q*3); this gradient also lies in ℝ^(p*q*3). As shown in Figure 3, the gradient is concatenated with the image along the third dimension to form a new tensor in ℝ^(p*q*6); with j input images, j fused tensors are obtained;
Each fused tensor is first passed through 3 convolutional layers, all with 3*3 kernels and the ReLU activation function; layer 2 uses stride 2, layers 1 and 3 use stride 1, and the three layers have 64, 128 and 128 feature channels respectively;
A max pooling layer then pools the j three-layer-convolved tensors in ℝ^(p/2*q/2*128) into a single tensor in ℝ^(p/2*q/2*128);
This tensor is passed through convolutional layers 5, 6 and 7, all with 3*3 kernels and the ReLU activation function; layer 6 is a transposed convolution, layers 5 and 7 use stride 1, and layers 5, 6, 7 have 128, 64 and 1 feature channels respectively, yielding the attention weight map P of the object to be reconstructed;
(3) The attention weight loss L is a per-pixel loss function, obtained by averaging the loss L_k over every pixel: L = (1/(p*q)) Σ_k L_k;
The loss L_k at each pixel position consists of two parts: a gradient loss L_gradient and a normal loss L_normal, each with a coefficient term, i.e. L_k = P_k · L_gradient + λ(1 − P_k) · L_normal;
Here L_gradient = ‖∇ñ_k − ∇n_k‖, where ∇n_k is the gradient of the true surface normal n of the object at position k, computed over a neighborhood of ζ pixels; ζ takes a value from 1, 2, 3, 4, 5 and is set to 1 by default in the present invention; ∇ñ_k is the gradient of the predicted surface normal at position k; ñ denotes the surface normal predicted by the network and n the true surface normal;
In the network, the gradient loss sharpens the high-frequency content of the surface normal. P_k is the value of the attention weight map at pixel position k; that is, the attention weight provides the weight of the gradient loss term L_gradient in the per-pixel loss L_k, so the gradient loss is weighted heavily wherever the attention weight is large;
Second, L_normal = 1 − ñ_k ● n_k, where ● denotes the dot product; λ is a hyperparameter that balances the gradient loss and the normal loss, set to 8 here; it can generally be set to any of {7, 8, 9, 10}, with 8 giving good results;
The attention weight loss (3) thus couples the surface normal generation network (1) and the attention weight generation network (2);
(4) Network training
During training, the network is iteratively adjusted and optimized with the back-propagation algorithm to minimize the loss function above; training stops after 30 epochs, or earlier once L_normal falls below 0.03, at which point training is considered to have reached its best effect;
In the present invention, training of the network ends after 30 epochs, at which point training is considered to have reached its optimal effect;
(5) Apply the trained network to surface normal reconstruction from photometric stereo images:
First capture s or more images under different illumination directions, s ≥ 10, then feed m_1, m_2, ..., m_s and l_1, l_2, ..., l_s to the trained network to obtain the predicted surface normal ñ.
Here p, q ∈ {16, 32, 48, 64}, λ ∈ {7, 8, 9, 10}, and ζ can take the values 1, 2, 3, 4, 5.
The reconstruction results are shown in Figure 4: the first row shows the images captured of the object to be reconstructed, the second row the generated attention weight map P, and the third row the generated surface normal ñ.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111524515.8A CN113936117B (en) | 2021-12-14 | 2021-12-14 | High-frequency region enhanced luminosity three-dimensional reconstruction method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111524515.8A CN113936117B (en) | 2021-12-14 | 2021-12-14 | High-frequency region enhanced luminosity three-dimensional reconstruction method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113936117A true CN113936117A (en) | 2022-01-14 |
CN113936117B CN113936117B (en) | 2022-03-08 |
Family
ID=79288969
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111524515.8A Active CN113936117B (en) | 2021-12-14 | 2021-12-14 | High-frequency region enhanced luminosity three-dimensional reconstruction method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113936117B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114998507A (en) * | 2022-06-07 | 2022-09-02 | 天津大学 | Luminosity three-dimensional reconstruction method based on self-supervision learning |
CN115098563A (en) * | 2022-07-14 | 2022-09-23 | 中国海洋大学 | Time sequence abnormity detection method and system based on GCN and attention VAE |
CN118628371A (en) * | 2024-08-12 | 2024-09-10 | 南开大学 | Surface normal recovery method, device and storage medium based on photometric stereo |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107862741A (en) * | 2017-12-10 | 2018-03-30 | 中国海洋大学 | A kind of single-frame images three-dimensional reconstruction apparatus and method based on deep learning |
CN108510573A (en) * | 2018-04-03 | 2018-09-07 | 南京大学 | A method of the multiple views human face three-dimensional model based on deep learning is rebuild |
CN109146934A (en) * | 2018-06-04 | 2019-01-04 | 成都通甲优博科技有限责任公司 | A kind of face three-dimensional rebuilding method and system based on binocular solid and photometric stereo |
CN110060212A (en) * | 2019-03-19 | 2019-07-26 | 中国海洋大学 | A kind of multispectral photometric stereo surface normal restoration methods based on deep learning |
US20210241478A1 (en) * | 2020-02-03 | 2021-08-05 | Nanotronics Imaging, Inc. | Deep Photometric Learning (DPL) Systems, Apparatus and Methods |
CN113538675A (en) * | 2021-06-30 | 2021-10-22 | 同济人工智能研究院(苏州)有限公司 | Neural network for calculating attention weight for laser point cloud and training method |
CN113762358A (en) * | 2021-08-18 | 2021-12-07 | 江苏大学 | Semi-supervised learning three-dimensional reconstruction method based on relative deep training |
Non-Patent Citations (2)
Title |
---|
CHENG-JIAN LIN等: "A Constrained Independent Component Analysis Based Photometric Stereo for 3D Human Face Reconstruction", 《2012 INTERNATIONAL SYMPOSIUM ON COMPUTER, CONSUMER AND CONTROL》 * |
CHEN JIA et al.: "Application of Deep Learning in 3D Reconstruction of Objects from a Single Image", Acta Automatica Sinica * |
Also Published As
Publication number | Publication date |
---|---|
CN113936117B (en) | 2022-03-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113936117B (en) | High-frequency region enhanced luminosity three-dimensional reconstruction method based on deep learning | |
CN114549731B (en) | Method and device for generating visual angle image, electronic equipment and storage medium | |
Chen et al. | Point-based multi-view stereo network | |
CN106355570B (en) | A kind of binocular stereo vision matching method of combination depth characteristic | |
CN109377530B (en) | A Binocular Depth Estimation Method Based on Deep Neural Network | |
CN110427968B (en) | A binocular stereo matching method based on detail enhancement | |
CN109598754B (en) | Binocular depth estimation method based on depth convolution network | |
CN111915660B (en) | Binocular disparity matching method and system based on shared features and attention up-sampling | |
CN111833393A (en) | A binocular stereo matching method based on edge information | |
CN113313732A (en) | Forward-looking scene depth estimation method based on self-supervision learning | |
CN112802078A (en) | Depth map generation method and device | |
CN112288788B (en) | Monocular image depth estimation method | |
CN107133914A (en) | For generating the device of three-dimensional color image and method for generating three-dimensional color image | |
CN111553296B (en) | A Binary Neural Network Stereo Vision Matching Method Based on FPGA | |
CN112509021A (en) | Parallax optimization method based on attention mechanism | |
CN102903111B (en) | Large area based on Iamge Segmentation low texture area Stereo Matching Algorithm | |
CN115631223A (en) | Multi-view stereo reconstruction method based on self-adaptive learning and aggregation | |
CN115511708A (en) | Depth map super-resolution method and system based on uncertainty-aware feature transmission | |
CN112991504B (en) | An Improved Hole Filling Method Based on TOF Camera 3D Reconstruction | |
JP7398938B2 (en) | Information processing device and its learning method | |
CN117152330B (en) | Point cloud 3D model mapping method and device based on deep learning | |
Chen et al. | Pisr: Polarimetric neural implicit surface reconstruction for textureless and specular objects | |
CN109934863B (en) | A light field depth information estimation method based on densely connected convolutional neural network | |
Yang et al. | A new RBF reflection model for shape from shading | |
JP7508673B2 (en) | Computer vision method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |