CN116823914A - Unsupervised focal stack depth estimation method based on all-focusing image synthesis - Google Patents

Unsupervised focal stack depth estimation method based on all-focusing image synthesis

Info

Publication number
CN116823914A
CN116823914A (Application No. CN202311101094.7A)
Authority
CN
China
Prior art keywords
image
focus
representing
focal stack
pyramid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311101094.7A
Other languages
Chinese (zh)
Other versions
CN116823914B (en)
Inventor
黄章进
周萌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN202311101094.7A priority Critical patent/CN116823914B/en
Publication of CN116823914A publication Critical patent/CN116823914A/en
Application granted granted Critical
Publication of CN116823914B publication Critical patent/CN116823914B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an unsupervised focal stack depth estimation method based on all-in-focus image synthesis, which comprises the following steps: S1, computing all-in-focus images with an image-pyramid-based method and a focus-measurement-operator-based method, and fusing the resulting all-in-focus images to serve as supervision information; S2, performing high-frequency noise filtering and preliminary feature extraction on the focal stack through a three-dimensional perception module; S3, introducing a three-dimensional polarized self-attention mechanism into the focal stack, and splitting the input feature map into a channel-polarized feature map and a spatially polarized feature map; and S4, locating the layer of maximum sharpness in the focal stack with a layered depth probability prediction module, outputting the corresponding probability values, determining the layer of best sharpness, and obtaining the all-in-focus image. The method achieves relatively high accuracy and good generalization in depth prediction, is applicable to depth estimation tasks in different scenes, and is highly practical.

Description

Unsupervised focal stack depth estimation method based on all-in-focus image synthesis

Technical Field

The invention relates to the technical field of monocular depth estimation, and in particular to an unsupervised focal stack depth estimation method based on all-in-focus image synthesis.

Background

Supervised methods achieve high accuracy on depth estimation tasks, but they require ground-truth depth, which may be difficult to obtain in practical application scenarios. In recent years, with the continuous development of deep learning and ongoing exploration in computer vision, unsupervised monocular depth estimation has made great progress. Unsupervised monocular depth estimation infers the depth of a scene with computer vision algorithms without depth labels. Unsupervised focal stack depth estimation can be divided into two categories: reconstruction supervision and auxiliary supervision.

Reconstruction supervision trains the network with a reconstruction loss and thereby learns depth information. It treats unsupervised focal stack depth estimation as a special case of multi-view monocular depth estimation: the blur differences across the focus sequence are used to estimate scene depth, the stack is then refocused from the focus map and the estimated intermediate depth, a focal stack is output, and the reconstruction loss provides the supervision. However, because depth estimation is ill-posed, the reconstruction model easily produces multiple competing depth solutions and struggles to identify the optimal one, so the network is very unstable. At the same time, the intermediate representation is easily interpreted as a compressed encoding of the focal stack, which makes the model hard to converge; additional losses therefore usually have to be introduced to constrain the intermediate representation.

Auxiliary supervision guides the learning process with auxiliary information in the unsupervised setting, using all-in-focus images as the auxiliary supervision signal. Such a method first feeds the focal stack into an encoder-decoder structure and outputs a depth distribution probability for each focus distance; combining these probabilities with the focal stack and the focus distances yields an all-in-focus image together with a relatively coarse depth map. However, this kind of model has limitations, such as a large number of parameters and the requirement that the dataset itself provide all-in-focus images as supervision, so its applicability is severely limited. Therefore, how to provide an unsupervised focal stack depth estimation method based on all-in-focus image synthesis is an urgent problem for those skilled in the art.

Summary of the Invention

One purpose of the present invention is to propose an unsupervised focal stack depth estimation method based on all-in-focus image synthesis. The invention achieves relatively high accuracy and good generalization in depth prediction, is applicable to depth estimation tasks in different scenes, and is highly practical.

An unsupervised focal stack depth estimation method based on all-in-focus image synthesis according to an embodiment of the present invention includes:

S1. Compute all-in-focus images with the image-pyramid-based all-in-focus image synthesis method and the focus-measurement-operator-based all-in-focus image synthesis method, obtain the corresponding all-in-focus images, and fuse them to serve as supervision information;

S2. Perform high-frequency noise filtering and preliminary feature extraction on the focal stack with a three-dimensional perception module to obtain the initially extracted features; at the same time, pass the focal stack through a difference value calculation module to obtain features that encode blur ambiguity; concatenate the initially extracted features and the blur-ambiguity features to obtain the focus volume;

S3. Introduce a three-dimensional polarized self-attention mechanism into the focal stack, and split the input focus volume features into a channel-polarized feature map and a spatially polarized feature map;

S4. Pass the channel-polarized and spatially polarized feature maps through the depth probability prediction module, which locates the layer of maximum sharpness in the focal stack and outputs the corresponding probability values; determine the layer of best sharpness and obtain the all-in-focus image.

Optionally, the image pyramid specifically includes:

Gaussian pyramid downsampling: the original image, denoted G_0, forms the bottom level of the Gaussian pyramid at its original resolution. The i-th level of the Gaussian pyramid is defined as

G_i = Down(K ∗ G_{i-1});

where ∗ denotes the convolution operation, K is a fixed smoothing convolution kernel, and Down(·) denotes the downsampling step that removes the even rows and even columns of the input image.

Each downsampling step reduces the resolution of the input image to one quarter; by iterating the above step, the entire Gaussian pyramid is obtained.

Gaussian pyramid upsampling: the image is expanded to twice its size in each direction, the newly added rows and columns are filled with zeros, and the enlarged image is convolved with the same kernel as before multiplied by four, giving the reconstructed image.

The Laplacian pyramid is built from the reconstructed images; letting L_i denote the i-th level of the Laplacian pyramid:

L_i = G_i - 4K ∗ Up(G_{i+1});

where Up(·) denotes the upsampling process, i.e. expanding the image to twice its size in each direction and filling the newly added rows and columns with zeros.

The original image is thus decomposed into a Gaussian pyramid and a Laplacian pyramid. The same decomposition is performed for every image in the focal stack, yielding a set of image pyramids.
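By way of illustration, the following is a minimal Python sketch of this Gaussian/Laplacian decomposition using OpenCV's pyrDown/pyrUp, which implement the blur-and-drop and zero-insert-and-blur steps described above; the function name and the default number of levels are illustrative assumptions, not part of the patent.

```python
import cv2
import numpy as np

def build_pyramids(image, levels=4):
    """Decompose an image into a Gaussian pyramid and a Laplacian pyramid.

    cv2.pyrDown blurs with a fixed kernel and drops even rows/columns (the Down step);
    cv2.pyrUp inserts zero rows/columns and blurs with 4x the kernel (the Up step).
    """
    gaussian = [image.astype(np.float32)]
    for _ in range(levels):
        gaussian.append(cv2.pyrDown(gaussian[-1]))          # G_i = Down(K * G_{i-1})
    laplacian = []
    for i in range(levels):
        h, w = gaussian[i].shape[:2]
        up = cv2.pyrUp(gaussian[i + 1], dstsize=(w, h))     # 4K * Up(G_{i+1})
        laplacian.append(gaussian[i] - up)                  # L_i = G_i - 4K * Up(G_{i+1})
    laplacian.append(gaussian[-1])                          # coarsest level kept for reconstruction
    return gaussian, laplacian
```

Because every step is invertible up to border effects, summing the Laplacian levels back up from coarse to fine recovers the original image.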

Optionally, the fusion process of the image pyramids specifically includes:

Given a focal stack sequence:

S = { I_1(x, y), I_2(x, y), ..., I_N(x, y) };

where (x, y) are the spatial coordinates of a pixel and N is the number of images in the focus sequence; each image corresponds to a specific focus distance.

Image pyramid decomposition is performed on the focal stack S, giving a Gaussian pyramid and a Laplacian pyramid for each image, where j indexes the pyramid levels.

Focus measurement is carried out at every position of the Laplacian pyramids to obtain the index map A of maximum sharpness; the all-in-focus Laplacian pyramid is then generated from the index map and the Laplacian pyramids by selecting, at each level and position, the Laplacian coefficient of the stack image indicated by the index map.

The all-in-focus Laplacian pyramid is upsampled from top to bottom to obtain the all-in-focus image corresponding to the focal stack.
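A hedged sketch of this fusion step: for every pyramid level, the Laplacian coefficients of all stack images are compared with a focus measure and the per-pixel maximum is selected via an index map. The helper names and the assumption of 2-D (single-channel) pyramid levels are illustrative; the regional-information-entropy measure used by the patent can be substituted for `focus_measure`.

```python
import numpy as np

def fuse_laplacian_pyramids(laplacian_stack, focus_measure):
    """Fuse the Laplacian pyramids of a focal stack into one all-in-focus pyramid.

    laplacian_stack: list over the N stack images, each a list of 2-D pyramid levels.
    focus_measure:   callable mapping a 2-D array to a per-pixel sharpness map.
    """
    num_images = len(laplacian_stack)
    num_levels = len(laplacian_stack[0])
    fused = []
    for j in range(num_levels):
        coeffs = np.stack([laplacian_stack[k][j] for k in range(num_images)])   # (N, H_j, W_j)
        sharpness = np.stack([focus_measure(c) for c in coeffs])                # (N, H_j, W_j)
        index_map = np.argmax(sharpness, axis=0)                                # best image per pixel
        fused.append(np.take_along_axis(coeffs, index_map[None], axis=0)[0])
    return fused  # collapse coarsest-to-finest (upsample and add) to recover the all-in-focus image
```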

Optionally, the image-pyramid-based all-in-focus image synthesis method specifically includes: performing image pyramid decomposition on the input focal stack S to obtain a Gaussian pyramid and a Laplacian pyramid; computing the regional information entropy of the Laplacian pyramid to obtain a focus-measurement sharpness value for each level; extracting, at each level, the layer with the largest sharpness value as the all-in-focus content of that level; and reconstructing the final all-in-focus image.

Optionally, the focus-measurement-operator-based all-in-focus image synthesis method includes: applying a small-region neighborhood fusion operator to each image of the focus sequence to obtain a focus-measurement sharpness value for each focus image; performing index maximization to determine the index of the best sharpness; and extracting the pixel values from the focal stack according to that index to form the all-in-focus image.

Optionally, the focus-measurement-operator-based all-in-focus image synthesis method specifically includes:

Converting the vector-valued image into a scalar-valued image by vector operations to obtain a composite feature:

Let c denote a vector-valued pixel and s a scalar-valued pixel. A small patch of size n is selected in the vector-valued image; let c_c be the vector-valued pixel at the patch centre and c_j the vector-valued pixels inside the window Ω.

The scalar value s corresponding to the vector-valued centre pixel is obtained by scaling the lengths of the difference vectors inside the window.

For every other vector c_j inside the window Ω, the difference vector v_j = c_j - c_c with respect to the centre vector c_c is computed. A scalar response is then formed from these difference vectors: the dot products between difference vectors measure the similarity between features, the lengths of the cross products between the difference vectors v_j and the centre vector c_c are accumulated, and a local adaptive scaling factor normalizes the resulting value.

The resulting scalar-valued image is used in an index-maximization operation to evaluate image sharpness; according to the index of the best sharpness, the pixel value at the corresponding position is extracted from the input focal stack, giving the corresponding all-in-focus image.
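The exact scalar response of the patent's operator is not reproduced above; the sketch below substitutes a simplified stand-in (the sum of squared difference-vector lengths inside the window) to illustrate the overall patch-based sharpness measurement and per-pixel index-maximization pipeline. All names and shapes are assumptions.

```python
import numpy as np

def synthesize_all_in_focus(stack, patch=3):
    """Patch-based sharpness score plus per-pixel index maximization.

    stack: float array of shape (N, H, W, 3) holding the focal stack (RGB).
    The stand-in sharpness score is the sum of squared difference-vector lengths
    ||c_j - c_c||^2 inside the window; borders wrap here, which a real
    implementation would handle with padding instead.
    """
    n, h, w, _ = stack.shape
    r = patch // 2
    sharpness = np.zeros((n, h, w), dtype=np.float32)
    for k in range(n):
        img = stack[k]
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                if dy == 0 and dx == 0:
                    continue
                diff = np.roll(img, shift=(dy, dx), axis=(0, 1)) - img   # v_j = c_j - c_c
                sharpness[k] += np.sum(diff * diff, axis=-1)
    index = np.argmax(sharpness, axis=0)                                 # layer of best sharpness
    aif = np.take_along_axis(stack, index[None, ..., None], axis=0)[0]   # pick pixels from that layer
    return aif, index
```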

Optionally, the three-dimensional perception module performs the high-frequency noise filtering and preliminary feature extraction of the focal stack with a four-layer network structure; the module contains several parallel convolution layers with different kernel sizes and strides, which capture blur features at different scales.

S2 specifically includes:

S21. Filter the focal stack with a 3D convolutional network to extract blur features;

S22. Introduce a difference value calculation module into the network structure and feed the blur features into it; the module computes the difference values across the three RGB channels and fuses them into an RGB channel difference, where the channel indices correspond to the different color dimensions of the input features;

S23. Pass the result through a downsampling layer to obtain the RGB difference features, and fuse the RGB difference features with the blur features to construct the focus volume that incorporates blur ambiguity.
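A minimal PyTorch sketch of this fusion, assuming the 3-D perception module outputs features at half the spatial resolution and using pairwise absolute channel differences as a stand-in for the unspecified RGB-difference formula; all tensor shapes and names are assumptions.

```python
import torch
import torch.nn.functional as F

def build_focus_volume(stack_features, stack_rgb):
    """Concatenate coarse 3-D features with an RGB-difference cue into a focus volume.

    stack_features: (B, C, N, H/2, W/2) output of the 3-D perception module
                    (assumed to be at half spatial resolution).
    stack_rgb:      (B, 3, N, H, W) raw focal stack.
    """
    r, g, b = stack_rgb[:, 0], stack_rgb[:, 1], stack_rgb[:, 2]          # (B, N, H, W) each
    diff = (r - g).abs() + (g - b).abs() + (r - b).abs()                 # fused RGB channel difference
    diff = diff.unsqueeze(1)                                             # (B, 1, N, H, W)
    diff = F.avg_pool3d(diff, kernel_size=(1, 2, 2))                     # downsampling layer
    return torch.cat([stack_features, diff], dim=1)                      # focus volume
```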

Optionally, the channel-polarized feature map is obtained by applying a polarization transform to the input feature map x:

The polarization transform converts the input feature map x into two sets of basis vectors, which correspond to channel-level queries and keys.

A similarity score between the queries and keys is computed: the input is projected by 1×1 three-dimensional convolution layers, reshaped by two tensor-reshaping operators, combined by element-wise multiplication, and passed through a normalized exponential (Softmax) function and an activation function; the projections use a reduced number of intermediate channels.

Using the score as a weight, the input vectors are weighted and summed, which yields the channel-polarized feature map carrying the channel correlations; the weighting is applied with a channel-wise multiplication operator.
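A hedged PyTorch sketch of a channel-polarization branch adapted from the 2-D polarized self-attention design to 3-D focus volumes; the layer names, the C/2 bottleneck, and the Softmax/Sigmoid placement follow the common polarized-self-attention formulation and are assumptions rather than the patent's exact layers.

```python
import torch
import torch.nn as nn

class ChannelPolarizedAttention3D(nn.Module):
    """Channel-polarization branch for a 3-D focus volume (illustrative sketch)."""

    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.conv_q = nn.Conv3d(channels, 1, kernel_size=1)     # query projection (1x1x1 conv)
        self.conv_v = nn.Conv3d(channels, half, kernel_size=1)  # value projection with a bottleneck
        self.conv_z = nn.Conv3d(half, channels, kernel_size=1)  # restore the channel count
        self.softmax = nn.Softmax(dim=-1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                                        # x: (B, C, D, H, W)
        b, c, d, h, w = x.shape
        q = self.softmax(self.conv_q(x).view(b, 1, d * h * w))   # normalized spatial weights
        v = self.conv_v(x).view(b, c // 2, d * h * w)
        ctx = torch.matmul(v, q.transpose(1, 2)).view(b, c // 2, 1, 1, 1)  # channel descriptor
        score = self.sigmoid(self.conv_z(ctx))                   # per-channel attention weights
        return x * score                                         # channel-wise reweighted features
```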

Optionally, obtaining the spatially polarized feature map includes:

Applying a polarization transform to the input channel-polarized feature map to obtain two sets of polarization vectors;

where one set is obtained by global pooling over the three channels to capture global spatial features, and the other rearranges the pixels of the input feature map through three-dimensional convolution to enhance features along different spatial directions.

A similarity matrix is computed from the two sets of polarization vectors: the inputs are projected by standard 1×1 three-dimensional convolution layers with intermediate channel-convolution parameters, reshaped by tensor-reshaping operations, combined by matrix dot products, and aggregated with global pooling.

The corresponding weights are obtained from the similarity matrix and combined with the input channel-polarized features by weighted summation, giving a composite self-attention feature representation that associates channel and spatial features; the combination is applied with a spatial multiplication operator.
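The spatial branch can be sketched in the same spirit, applied to the channel-attended focus volume; the global-pooling query, Softmax normalization, and Sigmoid gating again follow the common polarized-self-attention formulation and are assumptions.

```python
import torch
import torch.nn as nn

class SpatialPolarizedAttention3D(nn.Module):
    """Spatial-polarization branch applied to the channel-attended focus volume (sketch)."""

    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.conv_q = nn.Conv3d(channels, half, kernel_size=1)  # globally pooled query
        self.conv_v = nn.Conv3d(channels, half, kernel_size=1)  # per-position values
        self.pool = nn.AdaptiveAvgPool3d(1)                      # global pooling over D, H, W
        self.softmax = nn.Softmax(dim=-1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, y):                                        # y: (B, C, D, H, W)
        b, c, d, h, w = y.shape
        q = self.softmax(self.pool(self.conv_q(y)).view(b, 1, c // 2))   # global channel descriptor
        v = self.conv_v(y).view(b, c // 2, d * h * w)
        attn = torch.matmul(q, v).view(b, 1, d, h, w)            # per-position relevance map
        return y * self.sigmoid(attn)                            # spatially reweighted features
```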

Optionally, S4 specifically includes:

S41. After an encoder-decoder network with the pooling layers removed, divide the output of the focal stack depth estimation network into multiple levels, each level corresponding to a specific focus distance;

S42. Apply a Softmax operation across the levels to determine the level of best sharpness, obtain the best focus position, and obtain the all-in-focus image;

S43. Obtain the final depth estimation result by a weighted summation of the per-level probability values.
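A minimal sketch of this per-pixel prediction: a Softmax across the focus planes gives a probability for each focus distance, a probability-weighted sum of the focus distances gives the depth estimate, and the same probabilities can re-weight the stack into an all-in-focus image; the tensor shapes and names are assumptions.

```python
import torch

def depth_and_aif_from_logits(logits, stack, focus_distances):
    """Per-pixel Softmax over focus planes, weighted-sum depth, and an all-in-focus image.

    logits:          (B, N, H, W) network scores, one channel per focus distance.
    stack:           (B, N, 3, H, W) input focal stack.
    focus_distances: 1-D tensor of length N with the focus distance of each plane.
    """
    prob = torch.softmax(logits, dim=1)                             # probability of best focus per plane
    depth = (prob * focus_distances.view(1, -1, 1, 1)).sum(dim=1)   # probability-weighted depth map
    aif = (prob.unsqueeze(2) * stack).sum(dim=1)                    # probability-weighted all-in-focus image
    return depth, aif
```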

The beneficial effects of the present invention are:

The invention first synthesizes all-in-focus images and uses them as supervision information, and then performs depth estimation through a coarse feature extraction module, a polarized self-attention module, and a layered depth estimation module. Synthesizing all-in-focus images from the focal stack as supervision and exploiting the association capability of the self-attention mechanism to obtain scene depth gives the invention relatively high accuracy and good generalization in depth prediction; it is applicable to depth estimation tasks in different scenes and is highly practical.

Description of the Drawings

The drawings are provided for a further understanding of the present invention and constitute a part of the specification; together with the embodiments, they serve to explain the present invention and do not limit it. In the drawings:

Fig. 1 shows the unsupervised focal stack depth estimation model of the unsupervised focal stack depth estimation method based on all-in-focus image synthesis proposed by the present invention;

Fig. 2 is a structural block diagram of the focus-measurement sharpness computation in the method;

Fig. 3 is a qualitative comparison of all-in-focus image synthesis in the method;

Fig. 4 is a structural block diagram of the three-dimensional perception module in the method;

Fig. 5 is a structural block diagram of the channel difference module in the method;

Fig. 6 is a visual comparison of generalization performance on DefocusNet for the method;

Fig. 7 is a visual comparison of generalization performance on MobileDepth for the method.

Detailed Description

The present invention is now described in further detail with reference to the accompanying drawings. These drawings are simplified schematic diagrams that illustrate the basic structure of the invention only in a schematic manner and therefore show only the parts relevant to the invention.

Referring to Fig. 1, an unsupervised focal stack depth estimation method based on all-in-focus image synthesis includes:

S1. Compute all-in-focus images with the image-pyramid-based all-in-focus image synthesis method and the focus-measurement-operator-based all-in-focus image synthesis method, obtain the corresponding all-in-focus images, and fuse them to serve as supervision information;

Referring to Fig. 2, this embodiment shows the process of synthesizing an all-in-focus image with the two methods.

In the figure, the focus sequence serves as the input. Gaussian pyramid downsampling: the original image, denoted G_0, forms the bottom level of the Gaussian pyramid at its original resolution; the i-th level of the Gaussian pyramid is defined as

G_i = Down(K ∗ G_{i-1});

where ∗ denotes the convolution operation, K is a fixed smoothing convolution kernel, and Down(·) denotes the downsampling step that removes the even rows and even columns of the input image.

Each downsampling step reduces the resolution of the input image to one quarter; by iterating the above step, the entire Gaussian pyramid is obtained. The i-th level of the Laplacian pyramid is then formed as

L_i = G_i - 4K ∗ Up(G_{i+1});

where Up(·) denotes the upsampling process, i.e. expanding the image to twice its size in each direction and filling the newly added rows and columns with zeros.

The original image is thus decomposed into a Gaussian pyramid and a Laplacian pyramid. The same decomposition is performed for every image in the focal stack, yielding a set of image pyramids.

In this embodiment, the fusion process of the image pyramids specifically includes:

Given a focal stack sequence:

S = { I_1(x, y), I_2(x, y), ..., I_N(x, y) };

where (x, y) are the spatial coordinates of a pixel and N is the number of images in the focus sequence; each image corresponds to a specific focus distance.

Image pyramid decomposition is performed on the focal stack S, giving a Gaussian pyramid and a Laplacian pyramid for each image, where j indexes the pyramid levels.

Focus measurement is carried out at every position of the Laplacian pyramids to obtain the index map A of maximum sharpness; the all-in-focus Laplacian pyramid is generated from the index map and the Laplacian pyramids by selecting, at each level and position, the Laplacian coefficient of the stack image indicated by the index map.

The all-in-focus Laplacian pyramid is upsampled from top to bottom to obtain the all-in-focus image corresponding to the focal stack.

In this embodiment, the image-pyramid-based all-in-focus image synthesis method specifically includes: decomposing the input focal stack S to obtain a Gaussian pyramid and a Laplacian pyramid; since the whole decomposition process is fully reversible, this image transform incurs no information loss; computing the regional information entropy of the Laplacian pyramid to obtain a focus-measurement sharpness value for each level; extracting, at each level, the layer with the largest sharpness value as the all-in-focus content of that level; and reconstructing the final all-in-focus image.

In this embodiment, the focus-measurement-operator-based all-in-focus image synthesis method includes: applying a small-region neighborhood fusion operator to each image of the focus sequence to obtain a focus-measurement sharpness value for each focus image; performing index maximization to determine the index of the best sharpness; and extracting the pixel values from the focal stack according to that index to form the all-in-focus image.

The all-in-focus image fusion algorithm of the present invention, based on image pyramids and a small-window fusion operator, can synthesize high-quality all-in-focus images. The proposed model uses a global association structure to effectively improve the accuracy of depth prediction, while its lightweight design gives the model real-time inference capability.

Referring to Fig. 3, in this embodiment the focus-measurement-operator-based all-in-focus image synthesis method specifically includes:

Converting the vector-valued image into a scalar-valued image by vector operations to obtain a composite feature:

Let c denote a vector-valued pixel and s a scalar-valued pixel. A small patch of size n is selected in the vector-valued image; let c_c be the vector-valued pixel at the patch centre and c_j the vector-valued pixels inside the window Ω.

The scalar value s corresponding to the vector-valued centre pixel is obtained by scaling the lengths of the difference vectors inside the window.

For every other vector c_j inside the window Ω, the difference vector v_j = c_j - c_c with respect to the centre vector c_c is computed. A scalar response is then formed from these difference vectors: the dot products between difference vectors measure the similarity between features, the lengths of the cross products between the difference vectors v_j and the centre vector c_c are accumulated, and a local adaptive scaling factor, which plays an important role in computing the scalar feature image, normalizes the resulting value.

The resulting scalar-valued image is used in an index-maximization operation to evaluate image sharpness; according to the index of the best sharpness, the pixel value at the corresponding position is extracted from the input focal stack, giving the corresponding all-in-focus image. With this method, high-quality all-in-focus images can be synthesized from the focus sequence.

S2. Perform high-frequency noise filtering and preliminary feature extraction on the focal stack with the three-dimensional perception module to obtain the initially extracted features; at the same time, pass the focal stack through the difference value calculation module to obtain features that encode blur ambiguity; concatenate the initially extracted features and the blur-ambiguity features to obtain the focus volume;

In this embodiment, the three-dimensional perception module performs the high-frequency noise filtering and preliminary feature extraction of the focal stack with a four-layer network structure; the module contains several parallel convolution layers with different kernel sizes and strides, which capture blur features at different scales.

Referring to Fig. 4, S2 specifically includes:

S21. Filter the focal stack with a 3D convolutional network to extract blur features;

S22. Introduce a difference value calculation module into the network structure and feed the blur features into it; the module computes the difference values across the three RGB channels and fuses them into an RGB channel difference, where the channel indices correspond to the different color dimensions of the input features;

S23. Pass the result through a downsampling layer to obtain the RGB difference features, and fuse the RGB difference features with the blur features to construct the focus volume that incorporates blur ambiguity.

S3. Introduce the three-dimensional polarized self-attention mechanism into the focal stack, and split the input focus volume features into a channel-polarized feature map and a spatially polarized feature map;

In this embodiment, the channel-polarized feature map is obtained by applying a polarization transform to the input feature map x:

The polarization transform converts the input feature map x into two sets of basis vectors, which correspond to channel-level queries and keys.

A similarity score between the queries and keys is computed: the input is projected by 1×1 three-dimensional convolution layers, reshaped by two tensor-reshaping operators, combined by element-wise multiplication, and passed through a normalized exponential (Softmax) function and an activation function; the projections use a reduced number of intermediate channels.

Using the score as a weight, the input vectors are weighted and summed, which yields the channel-polarized feature map carrying the channel correlations; the weighting is applied with a channel-wise multiplication operator.

In this embodiment, obtaining the spatially polarized feature map includes:

Applying a polarization transform to the input channel-polarized feature map to obtain two sets of polarization vectors, where one set is obtained by global pooling over the three channels to capture global spatial features, and the other rearranges the pixels of the input feature map through three-dimensional convolution to enhance features along different spatial directions.

A similarity matrix is computed from the two sets of polarization vectors: the inputs are projected by standard 1×1 three-dimensional convolution layers with intermediate channel-convolution parameters, reshaped by three tensor-reshaping operations, combined by matrix dot products, and aggregated with global pooling.

The corresponding weights are obtained from the similarity matrix and combined with the input channel-polarized features by weighted summation, giving a composite self-attention feature representation that associates channel and spatial features; the combination is applied with a spatial multiplication operator.

It should be noted that all of the above convolution operations and tensor-reshaping operations are performed over the three spatial-channel dimensions; the three-dimensional polarized self-attention mechanism can therefore account for channel correlations and spatial blur correlations at the same time.

The model proposed by the present invention performs well on smaller focal stacks while also having excellent generalization ability.

S4. Pass the channel-polarized and spatially polarized feature maps through the depth probability prediction module, which locates the layer of maximum sharpness in the focal stack and outputs the corresponding probability values; determine the layer of best sharpness and obtain the all-in-focus image.

In this embodiment, S4 specifically includes:

S41. After an encoder-decoder network with the pooling layers removed, divide the output of the focal stack depth estimation network into multiple levels, each level corresponding to a specific focus distance;

S42. Apply a Softmax operation across the levels to determine the level of best sharpness, obtain the best focus position, and obtain the all-in-focus image;

At test time, the blur information in the input focus sequence is used to determine the level at which the target depth lies, and the probability density function of the corresponding level is used to compute the depth probability values.

S43. Obtain the final depth estimation result by a weighted summation of the per-level probability values.

Example 1:

The invention was evaluated quantitatively on the 4D Light Field, DefocusNet and FlyingThings3D datasets.

As can be seen from Table 1 above, the proposed all-in-focus image synthesis method can synthesize fairly accurate all-in-focus images from a relatively small focal stack.

Tables 2 to 4 above give the quantitative comparison between the present invention and the latest methods on the 4D Light Field, DefocusNet and FlyingThings3D datasets.

As can be seen from Tables 1 to 4, the results on the 4D Light Field dataset show that, for unsupervised depth estimation, the invention improves on the AiFDepthNet method by 42.5% in MSE and 26.3% in RMSE. Compared with supervised methods, this method surpasses most of them, including VDFF, PSPNet and DDFF; even compared with the DefocusNet method, the gaps in MSE and RMSE are only 15.0% and 4.6%. The results on the DefocusNet and FlyingThings3D datasets show that, relative to the AiFDepthNet method, this method achieves higher accuracy in the MAE, MSE and RMSE metrics. Compared with the 16M parameters of the AiFDepthNet method, this method also has a smaller parameter count of 3.3M and higher computational efficiency.

The invention first synthesizes all-in-focus images and uses them as supervision information, and then performs depth estimation through a coarse feature extraction module, a polarized self-attention module, and a layered depth estimation module. Synthesizing all-in-focus images from the focal stack as supervision and exploiting the association capability of the self-attention mechanism to obtain scene depth gives the invention relatively high accuracy and good generalization in depth prediction; it is applicable to depth estimation tasks in different scenes and is highly practical.

The above describes only preferred specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to the technical solution of the present invention and its inventive concept, shall be covered by the protection scope of the present invention.

Claims (10)

1. An unsupervised focal stack depth estimation method based on all-in-focus image synthesis, comprising:
S1, computing all-in-focus images with an image-pyramid-based all-in-focus image synthesis method and a focus-measurement-operator-based all-in-focus image synthesis method to obtain corresponding all-in-focus images, and fusing the obtained all-in-focus images to serve as supervision information;
S2, performing high-frequency noise filtering and preliminary feature extraction on the focal stack through a three-dimensional perception module to obtain initially extracted features, simultaneously obtaining blur-ambiguity-encoding features through a difference value calculation module, and concatenating the initially extracted features and the blur-ambiguity features to obtain a focus volume;
S3, introducing a three-dimensional polarized self-attention mechanism into the focal stack, and dividing the input focus volume features into a channel-polarized feature map and a spatially polarized feature map;
S4, passing the channel-polarized feature map and the spatially polarized feature map through a depth probability prediction module that locates the layer where the maximum sharpness of the focal stack lies, outputting the corresponding probability values, determining the layer of best sharpness, and obtaining the all-in-focus image.
2. The unsupervised focal stack depth estimation method based on all-in-focus image synthesis according to claim 1, wherein the image pyramid specifically comprises:
Gaussian pyramid downsampling, with the original image G_0 as the bottom level of the Gaussian pyramid at its original resolution, the i-th level of the Gaussian pyramid being defined as
G_i = Down(K ∗ G_{i-1});
wherein ∗ denotes the convolution operation, K denotes a fixed smoothing convolution kernel, and Down(·) denotes a downsampling process that removes the even rows and even columns of the input image;
each downsampling step reducing the resolution of the input image to one quarter, the entire Gaussian pyramid being obtained by continuously iterating the above step;
Gaussian pyramid upsampling, expanding the image to twice its size in each direction, filling the newly added rows and columns with 0, and convolving the enlarged image with the same convolution kernel as before multiplied by four to obtain a reconstructed image;
introducing a Laplacian pyramid from the reconstructed images, with L_i denoting the i-th level of the Laplacian pyramid:
L_i = G_i - 4K ∗ Up(G_{i+1});
wherein Up(·) denotes the upsampling process, i.e. expanding the image to twice its size in each direction with the newly added rows and columns filled with 0;
the original image being decomposed into a Gaussian pyramid and a Laplacian pyramid, and the same decomposition operation being performed for each image in the focal stack, resulting in a set of image pyramids.
3. The unsupervised focal stack depth estimation method based on all-in-focus image synthesis according to claim 2, wherein the process of merging the image pyramids specifically comprises:
given a focal stack sequence
S = { I_1(x, y), I_2(x, y), ..., I_N(x, y) };
wherein (x, y) denote the spatial coordinates of a pixel and N denotes the number of focus images, each image corresponding to a specific focus distance;
decomposing the focal stack S into image pyramids to obtain a Gaussian pyramid and a Laplacian pyramid, wherein j denotes the pyramid level;
performing focus measurement at every position of the Laplacian pyramid to obtain the index map A corresponding to the maximum sharpness, the all-in-focus Laplacian pyramid being generated from the index map and the Laplacian pyramid by selecting, at each level and position, the Laplacian coefficient of the stack image indicated by the index map;
upsampling the all-in-focus Laplacian pyramid from top to bottom to obtain an all-in-focus image corresponding to the focal stack.
4. The unsupervised focal stack depth estimation method based on all-in-focus image synthesis according to claim 3, wherein the image-pyramid-based all-in-focus image synthesis method specifically comprises: decomposing the input focal stack S to obtain a Gaussian pyramid and a Laplacian pyramid; carrying out regional information entropy calculation on the Laplacian pyramid to obtain a focus-measurement sharpness value of each level; extracting the layer with the maximum sharpness value as the all-in-focus image of the corresponding level; and reconstructing to obtain a final all-in-focus image.
5. The unsupervised focal stack depth estimation method based on all-in-focus image synthesis according to claim 3, wherein the focus-measurement-operator-based all-in-focus image synthesis method comprises: applying a small-region neighborhood fusion operator to each image of the focus sequence to obtain a focus-measurement sharpness value of each focus image; carrying out index maximization to determine the index corresponding to the best sharpness; and extracting pixel values in the focal stack according to the index to serve as the all-in-focus image.
6. The unsupervised focal stack depth estimation method based on all-in-focus image synthesis according to claim 5, wherein the focus-measurement-operator-based all-in-focus image synthesis method specifically comprises:
converting the vector-valued image into a scalar-valued image through vector operations to obtain composite features:
letting c denote a vector-valued pixel and s a scalar-valued pixel, selecting a patch size n in the vector-valued image, letting c_c be the centre vector-valued pixel and c_j the vector-valued pixels within the window Ω;
wherein the scalar-valued pixel s corresponding to the vector-valued centre pixel is obtained by scaling the lengths of the difference vectors within the window;
computing, for every other vector c_j within the window Ω, the difference vector v_j = c_j - c_c with respect to the centre vector, and forming a scalar response in which the dot products between the difference vectors measure the similarity between features, the lengths of the cross products between the difference vectors v_j and the centre vector c_c are accumulated, and a local adaptive scaling factor scales the result;
applying the obtained scalar-valued image to an index-maximization operation to evaluate the sharpness of the image, and extracting the pixel value of the corresponding position from the input focal stack according to the index of the best sharpness to obtain the corresponding all-in-focus image.
7. The unsupervised focal stack depth estimation method based on all-in-focus image synthesis according to claim 1, wherein the three-dimensional perception module completes the high-frequency noise filtering and preliminary feature extraction of the focal stack through a four-layer network structure, the three-dimensional perception module comprising a plurality of parallel convolution layers with different convolution kernel sizes and strides for capturing blur features at different scales;
the step S2 specifically comprises the following steps:
S21, filtering the focal stack with a 3D convolutional network to extract blur features;
S22, introducing a difference value calculation module into the network structure, inputting the blur features into the difference value calculation module, and calculating, by the difference value calculation module, the difference values of the three RGB channels, the result being the fused RGB channel difference and the indices corresponding to the different color dimensions of the input features;
S23, obtaining RGB difference features through a downsampling layer, and fusing the RGB difference features and the blur features to construct a focus volume fused with blur ambiguity.
8. The unsupervised focal stack depth estimation method based on all-in-focus image synthesis according to claim 1, wherein the channel-polarized feature map is obtained by performing a polarization transform on an input feature map x:
the polarization transform converts the input feature map x into two sets of basis vectors corresponding to channel-level queries and keys;
a similarity score between the queries and keys is computed, wherein the inputs are projected by 1×1 three-dimensional convolution layers, reshaped by two tensor-reshaping operators, combined by element-level multiplication, and passed through a normalized exponential function and an activation function, with a reduced number of intermediate channels between the projections;
using the score as weights, the input vectors are weighted and summed to obtain the channel-polarized feature map carrying the channel correlations, wherein the weighting is applied with a channel-level multiplication operator.
9. The unsupervised focal stack depth estimation method based on all-in-focus image synthesis according to claim 8, wherein obtaining the spatially polarized feature map comprises:
performing a polarization transform on the input channel-polarized feature map to obtain two sets of polarization vectors, wherein one set is obtained by global pooling over the three channels to acquire global spatial features, and the other set rearranges the pixels in the input feature map through three-dimensional convolution to enhance features in different spatial directions;
computing a similarity matrix from the two sets of polarization vectors, wherein the inputs are projected by standard 1×1 three-dimensional convolution layers with intermediate channel-convolution parameters, reshaped by three tensor-reshaping operations, combined by matrix dot-product operations, and aggregated with global pooling;
obtaining corresponding weights through the similarity matrix, and performing weighted summation of the weights and the input channel-polarized features to obtain a composite self-attention feature representation associating the channel and spatial features, wherein the combination is applied with a spatial multiplication operator.
10. The unsupervised focal stack depth estimation method based on all-in-focus image synthesis according to claim 1, wherein S4 specifically comprises:
S41, after passing through an encoder-decoder network with the pooling layers removed, dividing the output of the focal stack depth estimation network into a plurality of levels, each level corresponding to a specific focus distance;
S42, applying a Softmax operation between the levels to determine the level where the best sharpness lies, obtaining the best focus position, and obtaining the all-in-focus image;
S43, obtaining a final depth estimation result by means of weighted summation of the per-level probability values.
CN202311101094.7A 2023-08-30 2023-08-30 Unsupervised focal stack depth estimation method based on all-focusing image synthesis Active CN116823914B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311101094.7A CN116823914B (en) 2023-08-30 2023-08-30 Unsupervised focal stack depth estimation method based on all-focusing image synthesis


Publications (2)

Publication Number Publication Date
CN116823914A true CN116823914A (en) 2023-09-29
CN116823914B CN116823914B (en) 2024-01-09

Family

ID=88141360

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311101094.7A Active CN116823914B (en) 2023-08-30 2023-08-30 Unsupervised focal stack depth estimation method based on all-focusing image synthesis

Country Status (1)

Country Link
CN (1) CN116823914B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118570070A (en) * 2024-06-05 2024-08-30 深圳市斯贝达电子有限公司 A super-resolution image enhancement method based on focal stacking


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120218386A1 (en) * 2011-02-28 2012-08-30 Duke University Systems and Methods for Comprehensive Focal Tomography
CN110246172A (en) * 2019-06-18 2019-09-17 首都师范大学 A kind of the light field total focus image extraction method and system of the fusion of two kinds of Depth cues
CN110751160A (en) * 2019-10-30 2020-02-04 华中科技大学 Method, device and system for detecting object in image
CN112465796A (en) * 2020-12-07 2021-03-09 清华大学深圳国际研究生院 Light field feature extraction method fusing focus stack and full-focus image
US20220309696A1 (en) * 2021-03-23 2022-09-29 Mediatek Inc. Methods and Apparatuses of Depth Estimation from Focus Information
CN114792430A (en) * 2022-04-24 2022-07-26 深圳市安软慧视科技有限公司 Pedestrian re-identification method, system and related equipment based on polarized self-attention
CN115830240A (en) * 2022-12-14 2023-03-21 山西大学 An Unsupervised Deep Learning 3D Reconstruction Method Based on Image Fusion Perspective

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Tian, B. et al.: "Fine-grained multi-focus image fusion based on edge features", Scientific Reports, vol. 13, no. 1
Zhou Meng et al.: "Focal stack depth estimation method based on defocus blur characteristics", Journal of Computer Applications, page 2
Zhang Xuefei: "Research on unsupervised deep learning models for monocular depth estimation", China Master's Theses Full-text Database


Also Published As

Publication number Publication date
CN116823914B (en) 2024-01-09

Similar Documents

Publication Publication Date Title
Lim et al. DSLR: Deep stacked Laplacian restorer for low-light image enhancement
CN111047548B (en) Attitude transformation data processing method and device, computer equipment and storage medium
Li et al. Model-informed multistage unsupervised network for hyperspectral image super-resolution
CN112507997B (en) Face super-resolution system based on multi-scale convolution and receptive field feature fusion
CN113673590B (en) Rain removal method, system and medium based on multi-scale hourglass densely connected network
CN114187331A (en) Unsupervised optical flow estimation method based on Transformer feature pyramid network
Ghorai et al. Multiple pyramids based image inpainting using local patch statistics and steering kernel feature
CN111861880B (en) Image super-fusion method based on regional information enhancement and block self-attention
CN102722863A (en) A Method for Super-Resolution Reconstruction of Depth Maps Using Autoregressive Models
CN113762147B (en) Facial expression migration method and device, electronic equipment and storage medium
CN113870335A (en) Monocular depth estimation method based on multi-scale feature fusion
CN115147271A (en) Multi-view information attention interaction network for light field super-resolution
CN113284051A (en) Face super-resolution method based on frequency decomposition multi-attention machine system
CN114638836A (en) An urban streetscape segmentation method based on highly effective driving and multi-level feature fusion
CN112686830A (en) Super-resolution method of single depth map based on image decomposition
CN116823914B (en) Unsupervised focal stack depth estimation method based on all-focusing image synthesis
Talreja et al. XTNSR: Xception-based transformer network for single image super resolution
CN114943646A (en) Gradient weight loss and attention mechanism super-resolution method based on texture guidance
CN112907641B (en) Multi-view depth estimation method based on detail information retention
Tang et al. MFFAGAN: generative adversarial network with multilevel feature fusion attention mechanism for remote sensing image super-resolution
CN113240584A (en) Multitask gesture picture super-resolution method based on picture edge information
CN114049436B (en) Improved cascade structure multi-view stereoscopic reconstruction method
CN114494007B (en) A text-guided natural image super-resolution reconstruction method
Jin et al. Boosting single image super-resolution learnt from implicit multi-image prior
CN114782256A (en) Image reconstruction method, image reconstruction device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant