CN113936117A

CN113936117A - High-frequency region enhanced luminosity three-dimensional reconstruction method based on deep learning

Info

Publication number: CN113936117A
Application number: CN202111524515.8A
Authority: CN
Inventors: 举雅琨; 董军宇; 高峰
Original assignee: Ocean University of China
Current assignee: Ocean University of China
Priority date: 2021-12-14
Filing date: 2021-12-14
Publication date: 2022-01-14
Anticipated expiration: 2041-12-14
Also published as: CN113936117B

Abstract

The method comprises the steps of shooting a plurality of images of an object to be reconstructed by using a photometric stereo system, outputting accurate surface normal three-dimensional reconstruction by using a deep learning algorithm, wherein a surface normal generation network is designed to generate the surface normal of the object to be reconstructed from the images and illumination; the attention weight generation network generates an attention weight map of an object to be reconstructed from the image; processing the attention weight loss function pixel by pixel; and then using the trained network for surface normal reconstruction of the photometric stereo image. The invention respectively learns the surface normal and high-frequency information through the proposed surface normal generation network and the attention weight generation network, and trains by using the proposed attention weight loss, thereby improving the reconstruction precision of the surface of a high-frequency region such as a fold edge. Compared with the traditional photometric stereo method, the three-dimensional reconstruction precision is improved, and particularly the details of the surface of an object to be reconstructed are improved.

Description

A Photometric Stereo 3D Reconstruction Method Based on Deep Learning for High-frequency Region Enhancement

技术领域technical field

本发明涉及一种基于深度学习的高频区域增强光度立体三维重建方法，属于多度立体三维重建领域。The invention relates to a high-frequency region-enhanced photometric three-dimensional reconstruction method based on deep learning, and belongs to the field of multi-degree three-dimensional reconstruction.

背景技术Background technique

三维重建算法是计算机视觉中非常重要且基础的问题，光度立体算法是一种高精度的逐像素三维重建方法，其利用不同光照方向下的图像提供的灰度变化线索恢复物体表面法向。光度立体在许多高精度三维重建任务求中有着不可替代的位置，例如其在考古勘探，管道检测，海床精细测绘等方面有着重要的应用价值。The 3D reconstruction algorithm is a very important and fundamental problem in computer vision. The photometric stereo algorithm is a high-precision pixel-by-pixel 3D reconstruction method, which uses the grayscale change clues provided by images under different illumination directions to restore the surface normal of the object. Photometric stereo has an irreplaceable position in many high-precision 3D reconstruction tasks. For example, it has important application value in archaeological exploration, pipeline inspection, and fine seabed mapping.

但现有的基于深度学习的光度立体方法在物体表面高频区域的误差很大，例如褶皱、边缘，现有方法在这些区域会生成模糊的三维重建结果，然而，这些区域正是重点关注并需要精确重建的地方。However, the existing deep learning-based photometric stereo methods have large errors in the high-frequency areas of the object surface, such as wrinkles and edges. The existing methods will generate blurred 3D reconstruction results in these areas. However, these areas are the focus of attention and Where precise reconstruction is required.

发明内容SUMMARY OF THE INVENTION

针对以上问题，本发明目的是提出了一种基于深度学习的高频区域增强光度立体三维重建方法，以克服现有技术的不足。In view of the above problems, the purpose of the present invention is to propose a high-frequency region-enhanced photometric three-dimensional reconstruction method based on deep learning, so as to overcome the deficiencies of the prior art.

基于深度学习的高频区域增强光度立体三维重建方法，其特征是包括以下步骤：A high-frequency region-enhanced photometric stereo 3D reconstruction method based on deep learning is characterized by including the following steps:

1）利用光度立体系统，拍摄若干张待重建物体的图像：1) Using the photometric stereo system, take several images of the object to be reconstructed:

待重建物体在单个平行白色光源的照射下拍摄图像，以待重建物体中心为坐标轴原点，建立笛卡尔坐标系，则白光光源的位置由该笛卡尔坐标系中向量 l = [x, y, z]表示；The image of the object to be reconstructed is taken under the illumination of a single parallel white light source, and the center of the object to be reconstructed is the origin of the coordinate axis to establish a Cartesian coordinate system, then the position of the white light source is determined by the vector l = [ x, y, z ] means;

改变该光源位置，在另一光照方向下获得拍摄图像；通常需至少拍摄10张以上的在不同光照方向照射下的图像，记作m ₁ , m ₂ , ..., m _j ，同时相对应的光源位置记作l ₁ , l ₂ , ...,l _j，j为大于等于10的自然数；Change the position of the light source and obtain a photographed image under another illumination direction; usually, at least 10 images under different illumination directions need to be photographed, denoted as m ₁ , m ₂ , ..., m _j , while corresponding The position of the light source is denoted as l ₁ , l ₂ , ..., l _j , where j is a natural number greater than or equal to 10;

2）利用深度学习算法输入m ₁ ,m ₂ , ..., m _j和l ₁ ,l ₂ , ..., l _j，输出准确的表面法向三维重建：2) Use the deep learning algorithm to input m ₁ , m ₂ , ..., m _j and l ₁ , l ₂ , ..., l _j , and output the accurate three-dimensional reconstruction of the surface normal:

所利用的深度学习算法分为以下四个部分：（1）表面法向生成网络，（2）注意力权重生成网络，（3）注意力权重损失函数联合训练,（4）网络训练；其中：The utilized deep learning algorithm is divided into the following four parts: (1) surface normal generation network, (2) attention weight generation network, (3) joint training of attention weight loss function, (4) network training; among which:

（1）表面法向生成网络被设计成从图像m ₁ ,m ₂ , ..., m _j和光照l ₁ ,l ₂ , ..., l _j中生成需要重建物体的表面法向

；(1) The surface normal generation network is designed to generate the surface normal of the object to be reconstructed from the images m ₁ , m ₂ , ..., m _j and the light l ₁ , l ₂ , ..., l _j

;

（2）注意力权重生成网络被设计成从图像m ₁ , m ₂ , ..., m _j中生成需要重建物体的注意力权重图P ；(2) The attention weight generation network is designed to generate the attention weight map P of the object to be reconstructed from the images m ₁ , m ₂ , ..., m _j ;

（3）注意力权重损失L是一个逐像素处理的损失函数，它由每一个像素点的损失L _k平均计算得到，公式为

；p*q为图像m的分辨率，p、q≥2ⁿ，n≥4；(3) The attention weight loss L is a _pixel -by-pixel processing loss function, which is calculated by the average loss Lk of each pixel, and the formula is

; p*q is the resolution of image m , p, q ≥ 2 ⁿ , n ≥ 4;

每一个像素位置的损失L _k包括两部分，第一部分是带有系数项的梯度损失L _gradient，第二部分是带有系数项的法向损失L _normal，即L _{k =} P _k L _gradient+λ(1 – P _k) L _normal；The loss L _k of each pixel position includes two parts, the first part is the gradient loss L _gradient with coefficient terms, and the second part is the normal loss L _normal with coefficient terms, that is, L _{k =} P _k L _gradient + λ (1 – P _k ) L _normal ;

其中，

，

是待重建物体的真实表面法向n在位置k的梯度，ζ是计算梯度时使用的邻域像素范围，ζ设置范围为1、2、3、4、5，

是预测的表面法向

在位置k的梯度；

表示网络预测的表面法向，

表示真实表面法向；in,

,

is the gradient of the real surface normal n of the object to be reconstructed at position k , ζ is the neighborhood pixel range used when calculating the gradient, ζ is set in the range of 1, 2, 3, 4, 5,

is the predicted surface normal

gradient at position k ;

represents the surface normal predicted by the network,

represents the true surface normal;

梯度损失在网络中可以锐化表面法向的高频表达；P _k为注意力权重图上像素位置k上的值；The gradient loss can sharpen the high-frequency representation of the surface normal in the network; P _k is the value at the pixel position k on the attention weight map;

其次，

，●代表点乘操作，λ是一个超参，目的是为了梯度损失和法向损失，设置范围为{7、8、9、10}；Second,

, ● represents the point multiplication operation, λ is a hyperparameter, the purpose is for gradient loss and normal loss, and the setting range is {7, 8, 9, 10};

通过上述（3）注意力权重损失可将（1）表面法向生成网络和（2）注意力权重生成网络建立起联系；Through the above (3) attention weight loss, (1) the surface normal generation network and (2) the attention weight generation network can be connected;

（4）网络训练(4) Network training

训练网络时，利用反向传播算法不断调整优化，最小化上述损失函数，使其在达到设定的循环次数时停止训练，以达到最优效果；或者L _normal小于0.03时，即认为训练已经达到最有效果，停止训练；When training the network, the back-propagation algorithm is used to continuously adjust and optimize to minimize the above loss function, so that it stops training when the set number of cycles is reached to achieve the optimal effect; or when L _normal is less than 0.03, it is considered that the training has reached The most effective, stop training;

3）将上述训练好的网络用于光度立体图像的表面法向重建：3) Use the above trained network for surface normal reconstruction of photometric stereo images:

先拍摄s张以上不同光照方向的图像， s≥10，将m ₁ , m ₂ , ..., m _s和l ₁ , l ₂ , ..., l _s输入训练好的网络，得到预测的表面法向

。 First shoot more than s images with different lighting directions, s≥10, input m ₁ , m ₂ , ..., m _s and l ₁ , l ₂ , ..., l _s into the trained network to get the predicted surface normal

.

所述（1）表面法向生成网络被设计成从图像m ₁ ,m ₂ , ..., m _j和光照l ₁ ,l ₂ , ..., l _j中生成需要重建物体的表面法向

具体如下：The (1) surface normal generation network is designed to generate the surface normal of the object to be reconstructed from the images m ₁ , m ₂ , ..., m _j and the illumination l ₁ , l ₂ , ..., l _j

details as follows:

图像m的分辨率记为p*q，p、q≥2ⁿ，n≥4，则m∈ℝ^p*q*3，其中3表示RGB；表面法向生成网络首先按照m的分辨率p*q将光照l = [x, y, z] ∈ℝ³重复填充至ℝ^p*q*3的空间中，将填充后的光照记为h，则h∈ℝ^p*q*3，此时h与m有相同的空间大小，将h与m在第三组维度上连接融合，成为新的张量，新的张量属于ℝ^p*q*6，在输入j张图像和光照的情况下，得到了j个融合后的张量；The resolution of image m is denoted as p*q, p, q ≥ 2 ⁿ , n ≥ 4, then m ∈ℝ ^p*q*3 , where 3 represents RGB; the surface normal generation network first follows the resolution p* of m q Fill the light l = [ x, y, z ] ∈ℝ ³ into the space of ℝ ^p*q* 3 repeatedly, and denote the filled light as h , then h ∈ℝ ^p*q*3 , at this time h It has the same space size as m , and connects h and m in the third set of dimensions to become a new tensor. The new tensor belongs to ℝ ^p*q*6 . In the case of inputting j images and lighting, Get j fused tensors;

将这些张量分别进行4层卷积层操作，卷积层1、2、3、4的卷积核大小均为3*3，均采用“relu”激活函数，其中第2层和第4层是步长“stride”为2的卷积，第1层和第3层是步长“stride”为1的卷积，卷积层1、2、3、4的特征通道数分别为64，128，128，256；These tensors are respectively subjected to 4-layer convolution layer operations. The convolution kernel sizes of convolution layers 1, 2, 3, and 4 are all 3*3, and the "relu" activation function is used. The second layer and the fourth layer It is a convolution with a stride of 2, the first and third layers are convolutions with a stride of 1, and the feature channels of convolutional layers 1, 2, 3, and 4 are 64 and 128, respectively. , 128, 256;

之后，利用最大池化层从j个经过4层卷积的张量∈ℝ^p/4*q/4*256池化为一个ℝ^p/4*q/4*256中的张量；Then, use the max pooling layer to pool from j tensors ∈ ℝ p/4*q/4* ²⁵⁶ through 4 layers of convolution into a tensor in ℝ ^p/4*q/4*256 ;

再经过卷积层5、6、7、8计算，卷积层5、6、7、8的卷积核大小均为3*3，均采用“relu”激活函数，其中第5层和第7层为转置卷积，第6层和第8层是步长“stride”为1的卷积，卷积层5、6、7、8的特征通道数为128、128、64、3；After the calculation of convolutional layers 5, 6, 7, and 8, the convolution kernel size of convolutional layers 5, 6, 7, and 8 are all 3*3, and the "relu" activation function is used. The layers are transposed convolutions, the sixth and eighth layers are convolutions with a stride of 1, and the number of feature channels for convolutional layers 5, 6, 7, and 8 are 128, 128, 64, and 3;

最后，对第8层卷积得到的张量进行规范化处理，使其模为1，得到需要重建物体的的表面法向

。Finally, normalize the tensor obtained by the convolution of the 8th layer, so that its modulus is 1, and obtain the surface normal of the object to be reconstructed.

.

所述（2）注意力权重生成网络被设计成从图像m ₁ , m ₂ , ..., m _j中生成需要重建物体的注意力权重图P具体如下：The (2) attention weight generation network is designed to generate the attention weight map P of the object to be reconstructed from the images m ₁ , m ₂ , ..., m _j as follows:

注意力权重生成网络对图像m∈ℝ^p*q*3计算其梯度值，该梯度值也属于空间ℝ^p*q*3，并将其梯度与图像在第三组维度上连接融合，成为新的张量，新的张量属于ℝ^p*q*6，在输入j张图像和光照的情况下，得到了j个融合后的张量；The attention weight generation network calculates its gradient value for the image m ∈ ℝ p*q* ³ , which also belongs to the space ℝ ^p*q*3 , and fuses its gradient with the image in the third set of dimensions to become a new , the new tensor belongs to ℝ ^p*q*6 , in the case of inputting j images and lighting, j fused tensors are obtained;

首先将这些融合后的张量分别进行3层的卷积层操作，这3层的卷积核大小均为3*3，均采用“relu”激活函数，其中第2层的步长“stride”为2，第1层和第3层的步长“stride”为1，四个卷积层的特征通道数分别为64、128、128；First, these fused tensors are respectively subjected to 3-layer convolution layer operations. The size of the convolution kernel of these 3 layers is 3*3, and the "relu" activation function is used, and the step size of the second layer is "stride". is 2, the stride "stride" of the first and third layers is 1, and the number of feature channels of the four convolutional layers is 64, 128, and 128, respectively;

之后，利用最大池化层从j个经过3层卷积的张量∈ℝ^p/2*q/2*128池化为一个ℝ^p/2*q/2*128中的张量；Afterwards, use the max pooling layer to pool from j tensors ^∈ ℝ p/2*q/2*128 with 3 layers of convolution into a tensor in ℝ ^p/2*q/2*128 ;

再经过卷积层5、6、7计算，卷积层5、6、7的卷积核大小均为3*3，均采用“relu”激活函数，其中第6层为转置卷积，第5层和第7层是步长“stride”为1的卷积，卷积层5、6、7的特征通道数为128、64、1，从而得到需要重建物体的注意力权重图P 。After the calculation of convolutional layers 5, 6, and 7, the size of the convolution kernels of convolutional layers 5, 6, and 7 are all 3*3, and the "relu" activation function is used. The sixth layer is transposed convolution, and the first Layers 5 and 7 are convolutions with a stride of 1, and the number of feature channels in convolutional layers 5, 6, and 7 is 128, 64, and 1, so as to obtain the attention weight map P of the object to be reconstructed.

所述的基于深度学习的高频区域增强光度立体三维重建方法，其特征是所述图像m的分辨率p*q中，p取值16，32，48，64，q取值16，32，48，64。The described deep learning-based high-frequency region-enhanced photometric stereoscopic three-dimensional reconstruction method is characterized in that in the resolution p*q of the image m, p is 16, 32, 48, 64, and q is 16, 32, 48, 64.

所述的基于深度学习的高频区域增强光度立体三维重建方法，其特征是所述ζ设置为1。The deep learning-based high-frequency region enhanced photometric stereoscopic three-dimensional reconstruction method is characterized in that the ζ is set to 1.

所述的基于深度学习的高频区域增强光度立体三维重建方法，其特征是所述λ设置为8。The said deep learning-based high-frequency region-enhanced photometric stereoscopic three-dimensional reconstruction method is characterized in that the λ is set to 8.

所述的基于深度学习的高频区域增强光度立体三维重建方法，其特征是所述的循环次数设定为30次epoch。The said deep learning-based high-frequency region-enhanced photometric stereoscopic three-dimensional reconstruction method is characterized in that the number of cycles is set to 30 epochs.

所述的基于深度学习的高频区域增强光度立体三维重建方法，其特征是所述p取值32，q取值32。The deep learning-based high-frequency region enhanced photometric stereoscopic three-dimensional reconstruction method is characterized in that the p value is 32, and the q value is 32.

本发明提出的基于深度学习的高频区域增强的光度立体三维重建方法，通过提出的表面法向生成网络，注意力权重生成网络，分别学习表面法向和高频信息，并利用提出的注意力权重损失进行训练，可以改善褶皱边缘等高频区域表面的重建精度。相比先前传统光度立体方法，提高了三维重建精度，尤其是待重建物体表面的细节。The photometric stereo 3D reconstruction method based on deep learning based on high-frequency region enhancement proposed by the present invention learns the surface normal and high-frequency information respectively through the proposed surface normal generation network and attention weight generation network, and utilizes the proposed attention Weight loss for training can improve the reconstruction accuracy of surfaces in high frequency regions such as wrinkled edges. Compared with the previous traditional photometric stereo method, the accuracy of 3D reconstruction is improved, especially the details of the surface of the object to be reconstructed.

本发明提出的注意力权重损失，可以应用于多种底层视觉任务，提高任务精度，丰富图像的细节，例如深度估计，图像去模糊和图像去雾。The attention weight loss proposed by the present invention can be applied to a variety of underlying visual tasks to improve task accuracy and enrich image details, such as depth estimation, image deblurring, and image dehazing.

附图说明Description of drawings

图1是本发明的流程图。Figure 1 is a flow chart of the present invention.

图2是步骤2）中表面法向生成网络示意图。Figure 2 is a schematic diagram of the surface normal generation network in step 2).

图3是步骤2）中注意力权重生成网络示意图。Figure 3 is a schematic diagram of the attention weight generation network in step 2).

图4是本发明的应用效果示意图，其中第一行为输入图像，第二行为生成的权重图像，第三行为生成的表面法向。4 is a schematic diagram of the application effect of the present invention, wherein the first behavior is an input image, the second behavior is a weighted image generated, and the third behavior is a surface normal generated.

具体实施方式Detailed ways

如图1，基于深度学习的高频区域增强光度立体三维重建方法，其特征是包括以下步骤：As shown in Figure 1, a high-frequency region-enhanced photometric stereo 3D reconstruction method based on deep learning is characterized by including the following steps:

改变该光源位置，在另一光照方向下获得拍摄图像；通常需至少拍摄10张以上的在不同光照方向照射下的图像，记作m ₁ , m ₂ , ..., m _j ，同时相对应的光源位置记作l ₁ , l ₂ , ...,l _j，j为大于等于10的自然数；Change the position of the light source and obtain a photographed image under another illumination direction; usually at least 10 images under different illumination directions need to be photographed, denoted as m ₁ , m ₂ , ..., m _j , while corresponding The position of the light source is denoted as l ₁ , l ₂ , ..., l _j , where j is a natural number greater than or equal to 10;

所利用的深度学习算法分为以下四个部分：（1）表面法向生成网络，（2）注意力权重生成网络，（3）注意力权重损失函数联合训练，（4）网络训练；The utilized deep learning algorithm is divided into the following four parts: (1) surface normal generation network, (2) attention weight generation network, (3) joint training of attention weight loss function, (4) network training;

;

图像m的分辨率记为p*q，p、q≥2ⁿ，n≥4，则m∈ℝ^p*q*3，其中3表示RGB；如图2，表面法向生成网络首先按照m的分辨率p*q将光照l = [x, y, z] ∈ℝ³重复填充至ℝ^p*q*3的空间中，将填充后的光照记为h，则h∈ℝ^p*q*3，此时h与m有相同的空间大小，将h与m在第三组维度上连接融合，成为新的张量，新的张量属于ℝ^p*q*6，在输入j张图像和光照的情况下，得到了j个融合后的张量；The resolution of image m is denoted as p*q, p, q ≥ 2 ⁿ , n ≥ 4, then m ∈ℝ ^p*q*3 , where 3 represents RGB; as shown in Figure 2, the surface normal generation network first follows m The resolution p*q fills the light l = [ x, y, z ] ∈ℝ ³ repeatedly into the space of ℝ ^p*q*3 , and denote the filled light as h , then h ∈ℝ ^p*q*3 , at this time h and m have the same space size, connect h and m in the third set of dimensions to become a new tensor, the new tensor belongs to ℝ ^p*q*6 , after inputting j images and lighting In the case of , j fused tensors are obtained;

最后，对第8层卷积得到的张量进行规范化处理，使其模为1，得到预测的表面法向

；Finally, normalize the tensor obtained by the 8th layer convolution to make it modulo 1 to obtain the predicted surface normal

;

（2）注意力权重生成网络被设计成从图像m ₁ , m ₂ , ..., m _j中生成需要重建物体的注意力权重图：(2) The attention weight generation network is designed to generate the attention weight map of the object to be reconstructed from the images m ₁ , m ₂ , ..., m _j :

注意力权重生成网络对图像m∈ℝ^p*q*3计算其梯度值，该梯度值也属于空间ℝ^p*q*3，并将其梯度与图像在第三组维度上连接融合，如图3，成为新的张量，新的张量属于ℝ^p*q*6，在输入j张图像和光照的情况下，得到了j个融合后的张量；The attention weight generation network calculates its gradient value for the image m ∈ ℝ p*q* ³ , which also belongs to the space ℝ ^p*q*3 , and fuses its gradient with the image in the third set of dimensions, as shown in the figure 3. Become a new tensor. The new tensor belongs to ℝ ^p*q*6 . In the case of inputting j images and lighting, j fused tensors are obtained;

再经过卷积层5、6、7计算，卷积层5、6、7的卷积核大小均为3*3，均采用“relu”激活函数，其中第6层为转置卷积，第5层和第7层是步长“stride”为1的卷积，卷积层5、6、7的特征通道数为128、64、1，从而得到待重建物体的注意力权重图P ；After the calculation of convolutional layers 5, 6, and 7, the size of the convolution kernels of convolutional layers 5, 6, and 7 are all 3*3, and the "relu" activation function is used. The sixth layer is transposed convolution, and the first The 5th and 7th layers are convolutions with a stride of 1, and the feature channels of convolutional layers 5, 6, and 7 are 128, 64, and 1, so as to obtain the attention weight map P of the object to be reconstructed;

；(3) The attention weight loss L is a _pixel -by-pixel processing loss function, which is calculated by the average loss Lk of each pixel, and the formula is

;

其中，

，in,

,

是待重建物体的真实表面法向n在位置k的梯度，ζ是计算梯度时使用的邻域像素范围，ζ设置范围为1、2、3、4、5，本发明中默认设置为1，

是预测的表面法向

在位置k的梯度；

表示网络预测的表面法向，

表示真实表面法向；

is the gradient of the real surface normal n of the object to be reconstructed at position k , ζ is the neighborhood pixel range used when calculating the gradient, ζ is set in the range of 1, 2, 3, 4, 5, and is set to 1 by default in the present invention,

is the predicted surface normal

gradient at position k ;

represents the surface normal predicted by the network,

represents the true surface normal;

梯度损失在网络中可以锐化表面法向的高频表达；P _k为注意力权重图上像素位置k上的值，也即注意力权重为逐像素的损失L _k提供第一个梯度损失分项L _gradient的权重，注意力权重值大的地方，梯度损失的权重就大；The gradient loss can sharpen the high-frequency representation of the surface normal in the network; P _k is the value at the pixel position k on the attention weight map, that is, the attention weight provides the first gradient loss score for the pixel-wise loss L _k . The weight of the item L _gradient , where the attention weight value is large, the weight of the gradient loss is large;

其次，

，●代表点乘操作，λ是一个超参，目的是为了梯度损失和法向损失，本文将其设置为8；一般可以设置为{7,8,9,10}，取8时可以获得较好的效果；Second,

, ● represents the point multiplication operation, λ is a hyperparameter, the purpose is for gradient loss and normal loss, this paper sets it to 8; generally it can be set to {7, 8, 9, 10}, when 8 is taken, it can be compared Good results;

（4）网络训练(4) Network training

训练网络时，利用反向传播算法不断调整优化，最小化上述损失函数，使其在达到30次epoch（循环）的时刻停止训练，以达到最优效果；或者L _normal小于0.03时，即认为训练已经达到最有效果，停止训练；When training the network, use the back-propagation algorithm to continuously adjust and optimize to minimize the above loss function, so that it stops training when it reaches 30 epochs (cycles) to achieve the optimal effect; or when L _normal is less than 0.03, it is considered that training Has reached the most effective, stop training;

在本发明中，在30个epoch后结束结束网络的训练，此时即认为训练已经达到最优效果；In the present invention, the training of the network is terminated after 30 epochs, at this time, it is considered that the training has reached the optimal effect;

（5）将上述训练好的网络用于光度立体图像的表面法向重建：(5) Use the above trained network for surface normal reconstruction of photometric stereo images:

先拍摄s张以上不同光照方向的图像， s≥10，将m₁, m₂, ..., m_s和l₁, l₂, ...,l_s输入训练好的网络，得到预测的表面法向

。First shoot more than s images with different lighting directions, s ≥ 10, input m ₁ , m ₂ , ..., m _s and l ₁ , l ₂ , ..., l _s into the trained network to get the predicted surface normal

.

其中p，q∈{16、32、48、64}，λ∈{7,8,910}，ζ可以取1，2，3，4，5。where p , q ∈ {16, 32, 48, 64}, λ ∈ {7, 8, 910}, ζ can take 1, 2, 3, 4, 5.

重建效果如图4所示。第一行表示待重建物体拍摄的图像，第二行代表生成的注意力权重图P，第三行代表生成的表面法向

。The reconstruction effect is shown in Figure 4. The first row represents the image captured by the object to be reconstructed, the second row represents the generated attention weight map P, and the third row represents the generated surface normal

.

Claims

1. A photometric stereoscopic three-dimensional reconstruction method based on high-frequency region enhancement based on deep learning, which is characterized by comprising the following steps:

1) Using the photometric stereo system, take several images of the object to be reconstructed:

The image of the object to be reconstructed is taken under the illumination of a single parallel white light source, and the center of the object to be reconstructed is the origin of the coordinate axis to establish a Cartesian coordinate system, then the position of the white light source is determined by the vector l = [ x, y, z ] means;

Change the position of the light source and obtain a photographed image under another illumination direction; usually, at least 10 images under different illumination directions need to be photographed, denoted as m ₁ , m ₂ , ..., m _j , while corresponding The position of the light source is denoted as l ₁ , l ₂ , ..., l _j , where j is a natural number greater than or equal to 10;

2) Use the deep learning algorithm to input m ₁ , m ₂ , ..., m _j and l ₁ , l ₂ , ..., l _j , and output the accurate three-dimensional reconstruction of the surface normal:

The utilized deep learning algorithm is divided into the following four parts: (1) surface normal generation network, (2) attention weight generation network, (3) joint training of attention weight loss function, (4) network training; among which:

(1) The surface normal generation network is designed to generate the surface normal of the object to be reconstructed from the images m ₁ , m ₂ , ..., m _j and the light l ₁ , l ₂ , ..., l _j

;

(2) The attention weight generation network is designed to generate the attention weight map P of the object to be reconstructed from the images m ₁ , m ₂ , ..., m _j ;

(3) The attention weight loss L is a _pixel -by-pixel processing loss function, which is calculated by the average loss Lk of each pixel, and the formula is

; p*q is the resolution of image m , p, q ≥ 2 ⁿ , n ≥ 4;

The loss L _k of each pixel position includes two parts, the first part is the gradient loss L _gradient with coefficient terms, and the second part is the normal loss L _normal with coefficient terms, that is, L _{k =} P _k L _gradient + λ (1 – P _k ) L _normal ;

in,

;

is the gradient of the true surface normal n of the object to be reconstructed at position k ;

ζ is the neighborhood pixel range used when calculating the gradient, and the setting range of ζ is 1, 2, 3, 4, and 5;

is the predicted surface normal

gradient at position k ;

represents the surface normal predicted by the network,

represents the true surface normal;

P _k is the value at pixel position k on the attention weight map;

Second,

Through the above (3) attention weight loss, (1) the surface normal generation network and (2) the attention weight generation network can be connected;

(4) Network training

When training the network, the back-propagation algorithm is used to continuously adjust and optimize to minimize the above loss function, so that it stops training when the set number of cycles is reached to achieve the optimal effect; or when L _normal is less than 0.03, it is considered that the training has reached The most effective, stop training;

3) Use the above trained network for surface normal reconstruction of photometric stereo images:

First shoot more than s images with different lighting directions, s≥10, input m ₁ , m ₂ , ..., m _s and l ₁ , l ₂ , ..., l _s into the trained network to get the predicted surface normal

.

2. The photometric stereo 3D reconstruction method based on deep learning for high frequency region enhancement according to claim 1, wherein (1) the surface normal generation network is designed to generate images from images m ₁ , m ₂ , ..., m _j and lighting l ₁ , l ₂ , ..., l _j generate the surface normal of the object that needs to be reconstructed

details as follows:

The resolution of image m is denoted as p*q, p, q ≥ 2 ⁿ , n ≥ 4, then m ∈ℝ ^p*q*3 , where 3 represents RGB; the surface normal generation network first follows the resolution p* of m q Fill the light l = [ x, y, z ] ∈ℝ ³ into the space of ℝ ^p*q* 3 repeatedly, and denote the filled light as h , then h ∈ℝ ^p*q*3 , at this time h It has the same space size as m , and connects h and m in the third set of dimensions to become a new tensor. The new tensor belongs to ℝ ^p*q*6 . In the case of inputting j images and lighting, Get j fused tensors;

These tensors are respectively subjected to 4-layer convolution layer operations. The convolution kernel sizes of convolution layers 1, 2, 3, and 4 are all 3*3, and the "relu" activation function is used. The second layer and the fourth layer It is a convolution with a stride of 2, the first and third layers are convolutions with a stride of 1, and the feature channels of convolutional layers 1, 2, 3, and 4 are 64 and 128, respectively. , 128, 256;

Then, use the max pooling layer to pool from j tensors ∈ ℝ p/4*q/4* ²⁵⁶ through 4 layers of convolution into a tensor in ℝ ^p/4*q/4*256 ;

After the calculation of convolutional layers 5, 6, 7, and 8, the convolution kernel size of convolutional layers 5, 6, 7, and 8 are all 3*3, and the "relu" activation function is used. The layers are transposed convolutions, the sixth and eighth layers are convolutions with a stride of 1, and the number of feature channels for convolutional layers 5, 6, 7, and 8 are 128, 128, 64, and 3;

Finally, normalize the tensor obtained by the convolution of the 8th layer, so that its modulus is 1, and obtain the surface normal of the object to be reconstructed.

.

3. The photometric stereo 3D reconstruction method based on deep learning for high frequency region enhancement according to claim 1, wherein (2) the attention weight generation network is designed to generate images from images m ₁ , m ₂ , ..., The attention weight map P generated in m _j for the object to be reconstructed is as follows:

The attention weight generation network calculates its gradient value for the image m ∈ ℝ p*q* ³ , which also belongs to the space ℝ ^p*q*3 , and fuses its gradient with the image in the third set of dimensions to become a new , the new tensor belongs to ℝ ^p*q*6 , in the case of inputting j images and lighting, j fused tensors are obtained;

First, these fused tensors are respectively subjected to 3-layer convolution layer operations. The size of the convolution kernel of these 3 layers is 3*3, and the "relu" activation function is used, and the step size of the second layer is "stride". is 2, the stride "stride" of the first and third layers is 1, and the number of feature channels of the four convolutional layers is 64, 128, and 128, respectively;

Afterwards, use the max pooling layer to pool from j tensors ^∈ ℝ p/2*q/2*128 with 3 layers of convolution into a tensor in ℝ ^p/2*q/2*128 ;

After the calculation of convolutional layers 5, 6, and 7, the size of the convolution kernels of convolutional layers 5, 6, and 7 are all 3*3, and the "relu" activation function is used. The sixth layer is transposed convolution, and the first Layers 5 and 7 are convolutions with a stride of 1, and the number of feature channels in convolutional layers 5, 6, and 7 is 128, 64, and 1, so as to obtain the attention weight map P of the object to be reconstructed.

4. The photometric stereoscopic three-dimensional reconstruction method based on deep learning of high-frequency region enhancement as claimed in claim 1, characterized in that in the resolution p*q of the image m, p takes a value of 16, 32, 48, 64, q takes values 16, 32, 48, 64.

5 . The photometric stereoscopic three-dimensional reconstruction method for high-frequency region enhancement based on deep learning according to claim 1 , wherein the ζ is set to 1. 6 .

6 . The photometric stereoscopic three-dimensional reconstruction method for high-frequency region enhancement based on deep learning according to claim 1 , wherein the λ is set to 8. 7 .

7 . The photometric stereoscopic three-dimensional reconstruction method for high-frequency region enhancement based on deep learning according to claim 1 , wherein the number of cycles is set to 30 epochs. 8 .

8 . The photometric stereoscopic three-dimensional reconstruction method for high-frequency region enhancement based on deep learning according to claim 4 , wherein the p value is 32, and the q value is 32. 9 .