CN114359041A - Light field image space super-resolution reconstruction method - Google Patents
Light field image space super-resolution reconstruction method
- Publication number: CN114359041A (application CN202111405987.1A)
- Authority: CN (China)
- Prior art keywords: feature maps, spatial, output, residual block, resolution
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Landscapes
- Image Analysis (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention discloses a light field image spatial super-resolution reconstruction method. The method constructs a spatial super-resolution network comprising an encoder, an aperture-level feature registration module, a light field feature enhancement module and a decoder. The encoder extracts multi-scale features from the up-sampled low-spatial-resolution light field image, the 2D high-resolution image and its blurred version. The aperture-level feature registration module learns the correspondence between the 2D high-resolution features and the low-resolution light field features, so as to register the 2D high-resolution features to each sub-aperture image and form registered high-resolution light field features. The light field feature enhancement module uses the registered high-resolution light field features to enhance the extracted shallow light field features, yielding enhanced high-resolution light field features. The decoder reconstructs the enhanced high-resolution light field features into a high-spatial-resolution light field image. The advantage of the method is that it can reconstruct high-spatial-resolution light field images with high quality and recover texture and detail information.
Description
Technical Field
The present invention relates to image super-resolution reconstruction technology, and in particular to a light field image spatial super-resolution reconstruction method.
Background Art
Unlike conventional digital cameras, a light field camera captures both the intensity (i.e., spatial information) and the direction (i.e., angular information) of the light rays in a scene, and thus records the real world more faithfully. The rich information contained in the 4-dimensional (4D) light field images captured by light field cameras enables many applications, such as refocusing, depth estimation, and virtual/augmented reality. Current commercial light field cameras use a microlens array to separate rays passing through the same scene point from different directions, thereby recording spatial and angular information simultaneously on the sensor plane. However, because the resolution of the sensor shared between the spatial and angular dimensions is limited, the captured 4D light field images provide high angular sampling (i.e., high angular resolution) at the cost of an inevitably reduced spatial resolution. Improving the spatial resolution of 4D light field images has therefore become an important and urgent problem in the light field research community.
In general, a 4D light field image admits several mutually convertible visualizations, such as the sub-aperture image (SAI) array based on the 2-dimensional (2D) spatial information, the micro-lens image (MLI) array based on the 2D angular information, and the epipolar plane image (EPI) that jointly displays one spatial dimension and one angular dimension. Intuitively, improving the spatial resolution of a 4D light field image amounts to improving the resolution of each 2D SAI it contains. A straightforward approach is therefore to apply existing 2D image super-resolution methods, such as the deep back-projection network proposed by Haris et al. or the deep Laplacian pyramid network proposed by Lai et al., independently to each SAI. However, this ignores the information embedded in the angular domain of the 4D light field image, and it is difficult to guarantee the angular consistency of the super-resolved results. Exploiting the high-dimensional structure of 4D light field images is thus the key to designing spatial super-resolution reconstruction methods for them. Existing spatial super-resolution methods for 4D light field images can be roughly divided into optimization-based and learning-based methods.
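As a point of reference for the visualizations just described, the following is a minimal NumPy sketch of how a 4D light field tensor can be rearranged into an SAI array, an MLI array, and an EPI slice; the array names and the (V, U, H, W) axis ordering are illustrative assumptions rather than the patent's notation.

```python
import numpy as np

V, U, H, W = 5, 5, 64, 64          # angular and spatial resolutions
lf = np.random.rand(V, U, H, W)    # 4D light field, single channel

# SAI array: tile the V*U sub-aperture images into a (H*U) x (W*V) mosaic,
# rows indexed by (u, h) and columns by (v, w).
sai = lf.transpose(1, 2, 0, 3).reshape(U * H, V * W)

# MLI array: each macro-pixel holds the V*U angular samples of one spatial point.
mli = lf.transpose(2, 0, 3, 1).reshape(H * V, W * U)

# An EPI is a slice with one angular and one spatial dimension varying:
epi = lf[:, U // 2, H // 2, :]     # shape (V, W): a (v, w) epipolar plane image
```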
Optimization-based methods usually use estimated disparity or depth information to model the relationship between the SAIs of a 4D light field image, and then formulate light field spatial super-resolution as an optimization problem. However, the disparity or depth information inferred from low-spatial-resolution light field images is not very reliable, so optimization-based methods deliver rather limited performance.
Learning-based methods explore the intrinsic high-dimensional structure of 4D light field images in a data-driven manner and learn the nonlinear mapping between low- and high-spatial-resolution light field images. For example, Yeung et al. use spatial-angular separable convolutions to iteratively exploit the spatial and angular information of 4D light field images; Wang et al. developed a spatial-angular interaction network to fuse the spatial and angular information; and Jin et al. proposed a novel fusion mechanism that exploits the complementary information between SAIs and recovers the parallax details of 4D light field images with a two-stage network. Although these methods achieve good performance at small reconstruction scales (e.g., 2×), they still cannot effectively recover sufficient texture and detail at large reconstruction scales (e.g., 8×). This is because a low-resolution light field image contains only limited spatial and angular information, and the above methods can only infer the details lost to the low resolution from the information inside the 4D light field image itself. Boominathan et al. proposed a light field spatial super-resolution method with hybrid input, which improves the spatial resolution of a 4D light field image by introducing an additional high-resolution 2D image as supplementary information; however, the averaging-based fusion mechanism in that method tends to blur the reconstruction results, and processing each SAI independently destroys the parallax structure of the reconstructed light field image.
In summary, although existing research has achieved good light field spatial super-resolution results at small reconstruction scales, it still falls short at large reconstruction scales (e.g., 8×); in particular, there is room for improvement in recovering the high-frequency texture of the reconstructed light field image, avoiding visual artifacts, and preserving the parallax structure.
Summary of the Invention
The technical problem to be solved by the present invention is to provide a light field image spatial super-resolution reconstruction method that combines a light field camera with a conventional 2D camera to form a heterogeneous imaging system. The light field camera provides rich angular information but limited spatial information, while the conventional 2D camera records only the intensity of the light and thus captures sufficient spatial information. By making full use of the angular and spatial information acquired by the two cameras, the method reconstructs high-spatial-resolution light field images with high quality, recovers the texture and detail of the reconstructed light field image, avoids the ghosting artifacts caused by parallax, and preserves the parallax structure.
The technical solution adopted by the present invention to solve the above technical problem is a light field image spatial super-resolution reconstruction method, characterized by comprising the following steps:
Step 1: Select Num color three-channel low-spatial-resolution light field images with spatial resolution W×H and angular resolution V×U, the corresponding Num color three-channel 2D high-resolution images with resolution αW×αH, and the corresponding Num color three-channel reference high-spatial-resolution light field images with spatial resolution αW×αH and angular resolution V×U, where Num>1 and α denotes the spatial-resolution upscaling factor, with α>1.
Step 2: Construct a convolutional neural network as the spatial super-resolution network. The spatial super-resolution network comprises an encoder for extracting multi-scale features, an aperture-level feature registration module for registering the light field features with the 2D high-resolution features, a shallow feature extraction layer for extracting shallow features from the low-spatial-resolution light field image, a light field feature enhancement module for fusing the light field features with the 2D high-resolution features, a spatial attention block for alleviating the registration errors in the coarse-scale features, and a decoder for reconstructing the latent features into a light field image.
For the encoder: it consists of a first convolutional layer, a second convolutional layer, a first residual block and a second residual block connected in sequence. The input end of the first convolutional layer receives three inputs in parallel: the sub-aperture image array of width αsW×V and height αsH×U obtained by rearranging the single-channel image LLR of a low-spatial-resolution light field image with spatial resolution W×H and angular resolution V×U after spatial up-sampling, denoted L↑LR; a single-channel image of the blurred 2D high-resolution image, of width αsW and height αsH, denoted ĨHR; and a single-channel image of the 2D high-resolution image, of width αsW and height αsH, denoted IHR. For L↑LR, the first convolutional layer outputs 64 feature maps of width αsW×V and height αsH×U, whose set is denoted YLF,0; for ĨHR, it outputs 64 feature maps of width αsW and height αsH, whose set is denoted ỸHR,0; for IHR, it outputs 64 feature maps of width αsW and height αsH, whose set is denoted YHR,0. The input end of the second convolutional layer receives three inputs in parallel: all feature maps in YLF,0, all feature maps in ỸHR,0 and all feature maps in YHR,0. For YLF,0, the second convolutional layer outputs 64 feature maps of width W×V and height H×U, whose set is denoted YLF,1; for ỸHR,0, it outputs 64 feature maps of width W and height H, whose set is denoted ỸHR,1; for YHR,0, it outputs 64 feature maps of width W and height H, whose set is denoted YHR,1. The input end of the first residual block receives all feature maps in YLF,1, ỸHR,1 and YHR,1 in parallel, and outputs, respectively, 64 feature maps of width W×V and height H×U whose set is denoted YLF,2, 64 feature maps of width W and height H whose set is denoted ỸHR,2, and 64 feature maps of width W and height H whose set is denoted YHR,2. The input end of the second residual block receives all feature maps in YLF,2, ỸHR,2 and YHR,2 in parallel, and outputs, respectively, 64 feature maps of width W×V and height H×U whose set is denoted YLF,3, 64 feature maps of width W and height H whose set is denoted ỸHR,3, and 64 feature maps of width W and height H whose set is denoted YHR,3. Here, L↑LR is the sub-aperture image array of width αsW×V and height αsH×U obtained by rearranging the image produced by bicubic-interpolation up-sampling of LLR, and ĨHR is obtained by first bicubic-interpolation down-sampling and then bicubic-interpolation up-sampling IHR; αs denotes the spatial-resolution sampling factor, with αs³ = α, and both the up-sampling factor of the bicubic up-sampling and the down-sampling factor of the bicubic down-sampling take the value αs (the stride-2 second convolutional layer implies αs = 2, so the half-scale widths and heights are written here as W×V, H×U, W and H). The convolution kernel of the first convolutional layer has size 3×3, stride 1, 1 input channel and 64 output channels; the convolution kernel of the second convolutional layer has size 3×3, stride 2, 64 input channels and 64 output channels; both the first and the second convolutional layers use the "ReLU" activation function.
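The following is a minimal PyTorch sketch of the encoder just described, under the assumptions that the three parallel streams share the layer weights (as suggested by the single set of layers receiving three inputs) and that αs = 2; class and variable names are illustrative.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """3x3 conv (ReLU) -> 3x3 conv (no activation), plus identity skip,
    matching the residual block definition later in the description."""
    def __init__(self, ch=64):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, stride=1, padding=1)

    def forward(self, x):
        return x + self.conv2(torch.relu(self.conv1(x)))

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 64, 3, stride=1, padding=1)   # full scale
        self.conv2 = nn.Conv2d(64, 64, 3, stride=2, padding=1)  # half scale
        self.res1 = ResidualBlock()
        self.res2 = ResidualBlock()

    def forward(self, x):
        f0 = torch.relu(self.conv1(x))   # level-0 features
        f1 = torch.relu(self.conv2(f0))  # level-1 features (half resolution)
        f2 = self.res1(f1)               # level-2 features
        f3 = self.res2(f2)               # level-3 features
        return f0, f1, f2, f3

# The same encoder (shared weights) is applied to the three parallel inputs:
enc = Encoder()
lf_up = torch.rand(1, 1, 2 * 64 * 5, 2 * 64 * 5)  # SAI array of L_LR after 2x up-sampling
i_hr = torch.rand(1, 1, 2 * 64, 2 * 64)           # 2D high-resolution image
i_blur = torch.rand(1, 1, 2 * 64, 2 * 64)         # blurred version of i_hr
Y_LF, Y_blur, Y_HR = enc(lf_up), enc(i_blur), enc(i_hr)
```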
For the aperture-level feature registration module: its input end receives three types of feature maps. The first type is all feature maps in YLF,1; the second type is all feature maps in ỸHR,1; the third type comprises four inputs, namely all feature maps in YHR,0, all feature maps in YHR,1, all feature maps in YHR,2 and all feature maps in YHR,3. In the aperture-level feature registration module, all feature maps in ỸHR,1, YHR,0, YHR,1, YHR,2 and YHR,3 are first replicated V×U times, so that the width of all feature maps in ỸHR,1, YHR,1, YHR,2 and YHR,3 becomes W×V and their height becomes H×U, i.e., their size matches that of the feature maps in YLF,1, and so that the width of all feature maps in YHR,0 becomes αsW×V and their height becomes αsH×U, i.e., their size matches that of the feature maps in YLF,0. Block matching is then performed between all feature maps in ỸHR,1 and all feature maps in YLF,1; after block matching, a coordinate index map of width W×V and height H×U is obtained, denoted PCI. Next, according to PCI, all feature maps in YHR,1 are spatially registered to all feature maps in YLF,1, yielding 64 registered feature maps of width W×V and height H×U, whose set is denoted FAlign,1. Likewise, according to PCI, all feature maps in YHR,2 are spatially registered to all feature maps in YLF,1, yielding 64 registered feature maps of width W×V and height H×U, whose set is denoted FAlign,2; and, according to PCI, all feature maps in YHR,3 are spatially registered to all feature maps in YLF,1, yielding 64 registered feature maps of width W×V and height H×U, whose set is denoted FAlign,3. PCI is then up-sampled by bicubic interpolation to obtain a coordinate index map of width αsW×V and height αsH×U, denoted P↑CI. Finally, according to P↑CI, all feature maps in YHR,0 are spatially registered to all feature maps in YLF,0, yielding 64 registered feature maps of width αsW×V and height αsH×U, whose set is denoted FAlign,0. The output end of the aperture-level feature registration module outputs all feature maps in FAlign,0, FAlign,1, FAlign,2 and FAlign,3. The accuracy metric used for block matching is the texture and structure similarity index, the size of the blocks used for block matching is 3×3, and the up-sampling factor of the bicubic-interpolation up-sampling is αs.
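The registration step can be pictured with the following sketch: for each block of the (replicated) blurred high-resolution features, the best-matching block in the up-sampled light field features is found, its coordinates are stored in an index map, and the sharp high-resolution features are warped by that map. A zero-mean correlation score and a local search window stand in here for the patent's texture and structure similarity index and its (unstated) search strategy, so this is an illustration of the mechanism rather than the patent's exact procedure.

```python
import numpy as np

def block_match(src, ref, block=3, search=4):
    """Index map sending each block centre of `ref` to the centre of the most
    similar block in `src`, searched within a local window."""
    H, W = ref.shape
    r = block // 2
    idx = np.zeros((H, W, 2), dtype=np.int64)
    for y in range(r, H - r):
        for x in range(r, W - r):
            patch = ref[y - r:y + r + 1, x - r:x + r + 1]
            best, best_yx = -np.inf, (y, x)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if r <= yy < H - r and r <= xx < W - r:
                        cand = src[yy - r:yy + r + 1, xx - r:xx + r + 1]
                        # zero-mean correlation as the matching score
                        score = np.sum((patch - patch.mean()) * (cand - cand.mean()))
                        if score > best:
                            best, best_yx = score, (yy, xx)
            idx[y, x] = best_yx
    return idx

def warp_by_index(feat, idx):
    """Registered feature map: take, at every position, the feature value
    found at the matched coordinate (a nearest-neighbour warp)."""
    ys, xs = idx[..., 0], idx[..., 1]
    return feat[ys, xs]

# Usage: P_CI = block_match(Y_LF1, Y_blur1_tiled); F_align1 = warp_by_index(Y_HR1_tiled, P_CI)
```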
For the shallow feature extraction layer: it consists of one fifth convolutional layer. The input end of the fifth convolutional layer receives the single-channel image LLR of a low-spatial-resolution light field image with spatial resolution W×H and angular resolution V×U, rearranged into a sub-aperture image array of width W×V and height H×U; the output end of the fifth convolutional layer outputs 64 feature maps of width W×V and height H×U, whose set is denoted FLR. The convolution kernel of the fifth convolutional layer has size 3×3, stride 1, 1 input channel and 64 output channels, and the fifth convolutional layer uses the "ReLU" activation function.
For the light field feature enhancement module: it consists of a first enhanced residual block, a second enhanced residual block and a third enhanced residual block connected in sequence. The input end of the first enhanced residual block receives all feature maps in FAlign,1 and all feature maps in FLR, and its output end outputs 64 feature maps of width W×V and height H×U, whose set is denoted FEn,1. The input end of the second enhanced residual block receives all feature maps in FAlign,2 and all feature maps in FEn,1, and its output end outputs 64 feature maps of width W×V and height H×U, whose set is denoted FEn,2. The input end of the third enhanced residual block receives all feature maps in FAlign,3 and all feature maps in FEn,2, and its output end outputs 64 feature maps of width W×V and height H×U, whose set is denoted FEn,3.
For the spatial attention block: it consists of a sixth convolutional layer and a seventh convolutional layer connected in sequence. The input end of the sixth convolutional layer receives all feature maps in FAlign,0, and its output end outputs 64 spatial attention feature maps of width αsW×V and height αsH×U, whose set is denoted FSA1. The input end of the seventh convolutional layer receives all spatial attention feature maps in FSA1, and its output end outputs 64 spatial attention feature maps of width αsW×V and height αsH×U, whose set is denoted FSA2. All feature maps in FAlign,0 are multiplied element-wise with all spatial attention feature maps in FSA2, and the set of the resulting feature maps is denoted FWA,0; all feature maps in FWA,0 are taken as the output of the spatial attention block. The convolution kernels of the sixth and seventh convolutional layers both have size 3×3, stride 1, 64 input channels and 64 output channels; the sixth convolutional layer uses the "ReLU" activation function and the seventh convolutional layer uses "Sigmoid".
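A minimal PyTorch sketch of this spatial attention block, with illustrative names:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.conv6 = nn.Conv2d(ch, ch, 3, stride=1, padding=1)
        self.conv7 = nn.Conv2d(ch, ch, 3, stride=1, padding=1)

    def forward(self, f_align0):
        f_sa1 = torch.relu(self.conv6(f_align0))
        f_sa2 = torch.sigmoid(self.conv7(f_sa1))  # per-position weights in (0, 1)
        return f_align0 * f_sa2                   # element-wise re-weighting

f_align0 = torch.rand(1, 64, 640, 640)  # width a_s*W*V, height a_s*H*U (toy sizes)
f_wa0 = SpatialAttention()(f_align0)
```

The sigmoid output acts as a per-position gate, so coarse-scale positions whose registration is unreliable contribute less to the final reconstruction.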
For the decoder: it consists of a third residual block, a fourth residual block, a sub-pixel convolutional layer, an eighth convolutional layer and a ninth convolutional layer connected in sequence. The input end of the third residual block receives all feature maps in FEn,3, and its output end outputs 64 feature maps of width W×V and height H×U, whose set is denoted FDec,1. The input end of the fourth residual block receives all feature maps in FDec,1, and its output end outputs 64 feature maps of width W×V and height H×U, whose set is denoted FDec,2. The input end of the sub-pixel convolutional layer receives all feature maps in FDec,2; its output end outputs 256 feature maps of width W×V and height H×U, which are further converted into 64 feature maps of width αsW×V and height αsH×U; the set of all converted feature maps is denoted FDec,3. The input end of the eighth convolutional layer receives the result of the element-wise addition of all feature maps in FDec,3 and all feature maps in FWA,0, and its output end outputs 64 feature maps of width αsW×V and height αsH×U, whose set is denoted FDec,4. The input end of the ninth convolutional layer receives all feature maps in FDec,4, and its output end outputs one reconstructed single-channel light field image of width αsW×V and height αsH×U, which is rearranged into a high-spatial-resolution single-channel light field image with spatial resolution αsW×αsH and angular resolution V×U, denoted LSR. The convolution kernel of the sub-pixel convolutional layer has size 3×3, stride 1, 64 input channels and 256 output channels; the convolution kernel of the eighth convolutional layer has size 3×3, stride 1, 64 input channels and 64 output channels; the convolution kernel of the ninth convolutional layer has size 1×1, stride 1, 64 input channels and 1 output channel. The sub-pixel convolutional layer and the eighth convolutional layer both use the "ReLU" activation function, while the ninth convolutional layer uses no activation function.
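A minimal PyTorch sketch of the decoder, assuming αs = 2 for the PixelShuffle step; names are illustrative.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):  # same structure as in the encoder sketch
    def __init__(self, ch=64):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        return x + self.conv2(torch.relu(self.conv1(x)))

class Decoder(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.res3 = ResidualBlock(ch)
        self.res4 = ResidualBlock(ch)
        self.subpixel = nn.Conv2d(ch, 4 * ch, 3, stride=1, padding=1)
        self.shuffle = nn.PixelShuffle(2)           # 256 maps -> 64 maps at 2x size
        self.conv8 = nn.Conv2d(ch, ch, 3, stride=1, padding=1)
        self.conv9 = nn.Conv2d(ch, 1, 1, stride=1)  # no activation

    def forward(self, f_en3, f_wa0):
        f_dec1 = self.res3(f_en3)
        f_dec2 = self.res4(f_dec1)
        f_dec3 = self.shuffle(torch.relu(self.subpixel(f_dec2)))
        f_dec4 = torch.relu(self.conv8(f_dec3 + f_wa0))  # fuse the attention skip
        return self.conv9(f_dec4)   # reconstructed single-channel SAI array
```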
Step 3: Convert each low-spatial-resolution light field image, the corresponding 2D high-resolution image and the corresponding reference high-spatial-resolution light field image (selected in Step 1) from the RGB color space to the YCbCr color space, and extract the Y-channel images. Then represent the Y-channel image of each low-spatial-resolution light field image as a sub-aperture image array of width W×V and height H×U. Next, let the sub-aperture image arrays rearranged from the Y-channel images of all low-spatial-resolution light field images, the Y-channel images of the corresponding 2D high-resolution images, and the Y-channel images of the corresponding reference high-spatial-resolution light field images constitute the training set. Then construct the pyramid network and train it with the training set. The specific process is as follows:
Step 3_1: Replicate the constructed spatial super-resolution network three times and cascade the copies; the weights of each spatial super-resolution network are shared, i.e., all parameters are identical. The overall network formed by the three spatial super-resolution networks is defined as the pyramid network. At each pyramid level, the reconstruction scale of the spatial super-resolution network is set to the same value as αs.
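A minimal sketch of this weight-shared cascade: one network instance applied three times, so the overall factor is αs³ (e.g., 2×2×2 = 8×). `SSRNet` is a placeholder for the full network assembled from the modules above, and the extra per-level inputs (up-sampled light field, blurred guide) are folded into the call for brevity.

```python
import torch.nn as nn

class PyramidNet(nn.Module):
    def __init__(self, ssr_net: nn.Module, levels: int = 3):
        super().__init__()
        self.ssr_net = ssr_net  # a single module => weights shared by all levels
        self.levels = levels

    def forward(self, lf_lr, hr_pyramid):
        """hr_pyramid: list of 2D HR guide images, one per level, coarse to fine
        (obtained by bicubic down-sampling of the full-resolution 2D image)."""
        lf = lf_lr
        outputs = []
        for level in range(self.levels):
            lf = self.ssr_net(lf, hr_pyramid[level])  # alpha_s x per level
            outputs.append(lf)
        return outputs  # supervised against a matching pyramid of label images
```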
Step 3_2: Down-sample the Y-channel image of each reference high-spatial-resolution light field image in the training set twice in spatial resolution, and take the resulting images as the label images; down-sample the Y-channel image of each 2D high-resolution image in the training set twice in the same way, and take the resulting images as the 2D high-resolution Y-channel images for the first spatial super-resolution network in the pyramid network. Then input the following into the first spatial super-resolution network of the constructed pyramid network for training: the sub-aperture image arrays rearranged from the Y-channel images of all low-spatial-resolution light field images in the training set; the sub-aperture image arrays rearranged from those Y-channel images after one spatial-resolution up-sampling; all 2D high-resolution Y-channel images for the first spatial super-resolution network; and the blurred 2D high-resolution Y-channel images obtained by applying one spatial-resolution down-sampling and one spatial-resolution up-sampling to those 2D high-resolution Y-channel images. This yields, for the Y-channel image of each low-spatial-resolution light field image in the training set, an αs-times reconstructed high-spatial-resolution Y-channel light field image. Both the spatial-resolution up-sampling and down-sampling use bicubic interpolation, and their scales are equal to αs. (A sketch of this bicubic input preparation is given after step 3_4.)
Step 3_3: Down-sample the Y-channel image of each reference high-spatial-resolution light field image in the training set once in spatial resolution, and take the resulting images as the label images; down-sample the Y-channel image of each 2D high-resolution image in the training set once in the same way, and take the resulting images as the 2D high-resolution Y-channel images for the second spatial super-resolution network in the pyramid network. Then input the following into the second spatial super-resolution network of the constructed pyramid network for training: the sub-aperture image arrays rearranged from the αs-times reconstructed high-spatial-resolution Y-channel light field images corresponding to the Y-channel images of all low-spatial-resolution light field images in the training set; the sub-aperture image arrays rearranged from those reconstructed images after one spatial-resolution up-sampling; all 2D high-resolution Y-channel images for the second spatial super-resolution network; and the blurred 2D high-resolution Y-channel images obtained by applying one spatial-resolution down-sampling and one spatial-resolution up-sampling to those 2D high-resolution Y-channel images. This yields, for the Y-channel image of each low-spatial-resolution light field image in the training set, an αs²-times reconstructed high-spatial-resolution Y-channel light field image. Both the spatial-resolution up-sampling and down-sampling use bicubic interpolation, and their scales are equal to αs.
Step 3_4: Take the Y-channel image of each reference high-spatial-resolution light field image in the training set as the label image, and take the Y-channel image of each 2D high-resolution image in the training set as the 2D high-resolution Y-channel image for the third spatial super-resolution network in the pyramid network. Then input the following into the third spatial super-resolution network of the constructed pyramid network for training: the sub-aperture image arrays rearranged from the αs²-times reconstructed high-spatial-resolution Y-channel light field images corresponding to the Y-channel images of all low-spatial-resolution light field images in the training set; the sub-aperture image arrays rearranged from those reconstructed images after one spatial-resolution up-sampling; all 2D high-resolution Y-channel images for the third spatial super-resolution network; and the blurred 2D high-resolution Y-channel images obtained by applying one spatial-resolution down-sampling and one spatial-resolution up-sampling to those 2D high-resolution Y-channel images. This yields, for the Y-channel image of each low-spatial-resolution light field image in the training set, an αs³-times reconstructed high-spatial-resolution Y-channel light field image. Both the spatial-resolution up-sampling and down-sampling use bicubic interpolation, and their scales are equal to αs.
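The per-level inputs of steps 3_2 to 3_4 can be prepared as in the following sketch (PIL/NumPy assumed): bicubic down-/up-sampling by αs builds the per-level 2D guide images and their blurred versions. Function names are illustrative assumptions.

```python
import numpy as np
from PIL import Image

def bicubic_resize(img, scale):
    """Bicubic resize of a single-channel float image by `scale`."""
    h, w = img.shape
    out = Image.fromarray(img).resize(
        (round(w * scale), round(h * scale)), Image.BICUBIC)
    return np.asarray(out, dtype=np.float32)

def level_inputs(y_hr_2d, alpha_s=2.0, level=0, levels=3):
    """2D guide image and its blurred version for pyramid level `level`
    (level 0 is the coarsest; the guide is down-sampled levels-1-level times)."""
    guide = y_hr_2d
    for _ in range(levels - 1 - level):
        guide = bicubic_resize(guide, 1.0 / alpha_s)
    # blurred guide: one down-sampling followed by one up-sampling
    blurred = bicubic_resize(bicubic_resize(guide, 1.0 / alpha_s), alpha_s)
    return guide, blurred
```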
After training, the optimal weight parameters of all convolution kernels in each spatial super-resolution network of the pyramid network are obtained, i.e., a well-trained spatial super-resolution network model is obtained.
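For completeness, the following is a minimal training-loop sketch for one pyramid level. The excerpt does not state the loss function or the optimizer; the L1 loss and Adam used here are common choices for super-resolution networks and are assumptions.

```python
import torch
import torch.nn as nn

def train_level(ssr_net, loader, epochs=10, lr=1e-4, device="cuda"):
    """loader yields (lf_input, lf_up_input, guide, blurred_guide, label)."""
    opt = torch.optim.Adam(ssr_net.parameters(), lr=lr)
    l1 = nn.L1Loss()
    ssr_net.train().to(device)
    for _ in range(epochs):
        for lf, lf_up, guide, blur, label in loader:
            lf, lf_up = lf.to(device), lf_up.to(device)
            guide, blur, label = guide.to(device), blur.to(device), label.to(device)
            pred = ssr_net(lf, lf_up, guide, blur)  # reconstructed SAI array
            loss = l1(pred, label)                  # assumed L1 reconstruction loss
            opt.zero_grad()
            loss.backward()
            opt.step()
    return ssr_net
```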
Step 4: Arbitrarily select one color three-channel low-spatial-resolution light field image and the corresponding color three-channel 2D high-resolution image as the test images. Convert both from the RGB color space to the YCbCr color space and extract the Y-channel images. Represent the Y-channel image of the low-spatial-resolution light field image as a sub-aperture image array. Then input into the spatial super-resolution network model: the sub-aperture image array rearranged from the Y-channel image of the low-spatial-resolution light field image; the sub-aperture image array rearranged from that Y-channel image after one spatial-resolution up-sampling; the Y-channel image of the 2D high-resolution image; and the blurred 2D high-resolution Y-channel image obtained by applying one spatial-resolution down-sampling and one spatial-resolution up-sampling to the Y-channel image of the 2D high-resolution image. Testing yields the reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of the low-spatial-resolution light field image. Afterwards, the Cb-channel image and the Cr-channel image of the low-spatial-resolution light field image are each up-sampled by bicubic interpolation, yielding the reconstructed high-spatial-resolution Cb-channel light field image and the reconstructed high-spatial-resolution Cr-channel light field image. Finally, the reconstructed high-spatial-resolution Y-channel, Cb-channel and Cr-channel light field images are concatenated along the color-channel dimension, and the concatenated result is converted back to the RGB color space, yielding the color three-channel reconstructed high-spatial-resolution light field image corresponding to the low-spatial-resolution light field image.
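A minimal sketch of this test-time color handling: super-resolve the Y channel with the trained model, bicubic-up-sample Cb and Cr, and convert back to RGB. `model`, `rgb_to_ycbcr` and `ycbcr_to_rgb` are placeholders, and `bicubic_resize` is the helper assumed in the earlier sketch.

```python
import numpy as np

def reconstruct_rgb(lf_lr_rgb, hr_2d_rgb, model, alpha=8.0):
    """lf_lr_rgb: (V, U, H, W, 3) low-resolution light field;
    hr_2d_rgb: (alpha*H, alpha*W, 3) 2D guide image."""
    y_lf, cb_lf, cr_lf = rgb_to_ycbcr(lf_lr_rgb)  # per-channel planes (placeholder)
    y_hr, _, _ = rgb_to_ycbcr(hr_2d_rgb)
    y_sr = model(y_lf, y_hr)                      # trained pyramid network (placeholder)
    # chroma channels: plain bicubic up-sampling per sub-aperture view
    cb_sr = np.stack([[bicubic_resize(cb_lf[v, u], alpha)
                       for u in range(cb_lf.shape[1])]
                      for v in range(cb_lf.shape[0])])
    cr_sr = np.stack([[bicubic_resize(cr_lf[v, u], alpha)
                       for u in range(cr_lf.shape[1])]
                      for v in range(cr_lf.shape[0])])
    return ycbcr_to_rgb(y_sr, cb_sr, cr_sr)       # concatenate and convert back
```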
In step 2 described above, the first residual block, the second residual block, the third residual block and the fourth residual block have the same structure, each consisting of a third convolutional layer and a fourth convolutional layer connected in sequence. The input end of the third convolutional layer in the first residual block receives three inputs in parallel: all feature maps in YLF,1, all feature maps in ỸHR,1 and all feature maps in YHR,1. For YLF,1, it outputs 64 feature maps of width W×V and height H×U, whose set is denoted Y'LF,1; for ỸHR,1, it outputs 64 feature maps of width W and height H, whose set is denoted Ỹ'HR,1; for YHR,1, it outputs 64 feature maps of width W and height H, whose set is denoted Y'HR,1. The input end of the fourth convolutional layer in the first residual block receives three inputs in parallel: all feature maps in Y'LF,1, Ỹ'HR,1 and Y'HR,1. For Y'LF,1, it outputs 64 feature maps of width W×V and height H×U, whose set is denoted Y''LF,1; for Ỹ'HR,1, it outputs 64 feature maps of width W and height H, whose set is denoted Ỹ''HR,1; for Y'HR,1, it outputs 64 feature maps of width W and height H, whose set is denoted Y''HR,1. All feature maps in YLF,1 are added element-wise to all feature maps in Y''LF,1, and the resulting feature maps are taken as the output of the first residual block for YLF,1; the set they form is YLF,2. All feature maps in ỸHR,1 are added element-wise to all feature maps in Ỹ''HR,1, and the resulting feature maps are taken as the output of the first residual block for ỸHR,1; the set they form is ỸHR,2. All feature maps in YHR,1 are added element-wise to all feature maps in Y''HR,1, and the resulting feature maps are taken as the output of the first residual block for YHR,1; the set they form is YHR,2.
The input end of the third convolutional layer in the second residual block receives three inputs in parallel: all feature maps in YLF,2, all feature maps in ỸHR,2 and all feature maps in YHR,2. For YLF,2, it outputs 64 feature maps of width W×V and height H×U, whose set is denoted Y'LF,2; for ỸHR,2, it outputs 64 feature maps of width W and height H, whose set is denoted Ỹ'HR,2; for YHR,2, it outputs 64 feature maps of width W and height H, whose set is denoted Y'HR,2. The input end of the fourth convolutional layer in the second residual block receives three inputs in parallel: all feature maps in Y'LF,2, Ỹ'HR,2 and Y'HR,2. For Y'LF,2, it outputs 64 feature maps of width W×V and height H×U, whose set is denoted Y''LF,2; for Ỹ'HR,2, it outputs 64 feature maps of width W and height H, whose set is denoted Ỹ''HR,2; for Y'HR,2, it outputs 64 feature maps of width W and height H, whose set is denoted Y''HR,2. All feature maps in YLF,2 are added element-wise to all feature maps in Y''LF,2, and the resulting feature maps are taken as the output of the second residual block for YLF,2; the set they form is YLF,3. All feature maps in ỸHR,2 are added element-wise to all feature maps in Ỹ''HR,2, and the resulting feature maps are taken as the output of the second residual block for ỸHR,2; the set they form is ỸHR,3. All feature maps in YHR,2 are added element-wise to all feature maps in Y''HR,2, and the resulting feature maps are taken as the output of the second residual block for YHR,2; the set they form is YHR,3.
The input end of the third convolutional layer in the third residual block receives all feature maps in FEn,3, and its output end outputs 64 feature maps of width W×V and height H×U, whose set is denoted F'Dec,1. The input end of the fourth convolutional layer in the third residual block receives all feature maps in F'Dec,1, and its output end outputs 64 feature maps of width W×V and height H×U, whose set is denoted F''Dec,1. All feature maps in FEn,3 are added element-wise to all feature maps in F''Dec,1, and the resulting feature maps are taken as the output of the third residual block; the set they form is FDec,1.
The input end of the third convolutional layer in the fourth residual block receives all feature maps in FDec,1, and its output end outputs 64 feature maps of width W×V and height H×U, whose set is denoted F'Dec,2. The input end of the fourth convolutional layer in the fourth residual block receives all feature maps in F'Dec,2, and its output end outputs 64 feature maps of width W×V and height H×U, whose set is denoted F''Dec,2. All feature maps in FDec,1 are added element-wise to all feature maps in F''Dec,2, and the resulting feature maps are taken as the output of the fourth residual block; the set they form is FDec,2.
In all of the above, the third and fourth convolutional layers in each of the first, second, third and fourth residual blocks have convolution kernels of size 3×3, stride 1, 64 input channels and 64 output channels; in each of these residual blocks, the third convolutional layer uses the "ReLU" activation function and the fourth convolutional layer uses no activation function.
In step 2 described above, the first enhanced residual block, the second enhanced residual block and the third enhanced residual block have the same structure, each consisting of a first spatial feature transform layer, a first spatial-angular convolutional layer, a second spatial feature transform layer, a second spatial-angular convolutional layer and a channel attention layer connected in sequence. The first and second spatial feature transform layers have the same structure, each consisting of a tenth convolutional layer and an eleventh convolutional layer in parallel. The first and second spatial-angular convolutional layers have the same structure, each consisting of a twelfth convolutional layer and a thirteenth convolutional layer connected in sequence. The channel attention layer consists of a global average pooling layer, a fourteenth convolutional layer and a fifteenth convolutional layer connected in sequence.
The input end of the tenth convolutional layer in the first spatial feature transform layer of the first enhanced residual block receives all feature maps in FAlign,1, and its output end outputs 64 feature maps of width W×V and height H×U, whose set is denoted Fγ,1. The input end of the eleventh convolutional layer in the first spatial feature transform layer of the first enhanced residual block receives all feature maps in FAlign,1, and its output end outputs 64 feature maps of width W×V and height H×U, whose set is denoted Fβ,1. The input end of the first spatial feature transform layer of the first enhanced residual block receives all feature maps in FLR; all feature maps in FLR are multiplied element-wise with all feature maps in Fγ,1, and the product is then added element-wise to all feature maps in Fβ,1. The resulting feature maps are taken as the output of the first spatial feature transform layer of the first enhanced residual block, and the set they form is denoted FSFT,1.
The input end of the twelfth convolutional layer in the first spatial-angular convolutional layer of the first enhanced residual block receives all feature maps in FSFT,1, and its output end outputs 64 feature maps of width W×V and height H×U, whose set is denoted FS,1. All feature maps in FS,1 are reorganized from the spatial dimension to the angular dimension; the input end of the thirteenth convolutional layer in the first spatial-angular convolutional layer of the first enhanced residual block receives the result of this reorganization of all feature maps in FS,1, and its output end outputs 64 feature maps of width W×V and height H×U (in the angular arrangement), whose set is denoted FA,1. All feature maps in FA,1 are reorganized back from the angular dimension to the spatial dimension, and the reorganized feature maps are taken as the output of the first spatial-angular convolutional layer of the first enhanced residual block; the set they form is denoted FSAC,1.
The input end of the tenth convolutional layer in the second spatial feature transform layer of the first enhanced residual block receives all feature maps in FAlign,1, and its output end outputs 64 feature maps of width W×V and height H×U, whose set is denoted Fγ,2. The input end of the eleventh convolutional layer in the second spatial feature transform layer of the first enhanced residual block receives all feature maps in FAlign,1, and its output end outputs 64 feature maps of width W×V and height H×U, whose set is denoted Fβ,2. The input end of the second spatial feature transform layer of the first enhanced residual block receives all feature maps in FSAC,1; all feature maps in FSAC,1 are multiplied element-wise with all feature maps in Fγ,2, and the product is then added element-wise to all feature maps in Fβ,2. The resulting feature maps are taken as the output of the second spatial feature transform layer of the first enhanced residual block, and the set they form is denoted FSFT,2.
The input of the twelfth convolutional layer in the second spatial angular convolutional layer of the first enhanced residual block receives the feature maps output by the second spatial feature transformation layer, and its output produces 64 feature maps of width W×V and height H×U. These feature maps are reorganized from the spatial dimension to the angular dimension; the input of the thirteenth convolutional layer in the second spatial angular convolutional layer of the first enhanced residual block receives the reorganized feature maps, and its output produces 64 feature maps of width W×V and height H×U, which are reorganized from the angular dimension back to the spatial dimension; the feature maps obtained after this reorganization are taken as the output of the second spatial angular convolutional layer of the first enhanced residual block.
The input of the global average pooling layer in the channel attention layer of the first enhanced residual block receives the feature maps output by the second spatial angular convolutional layer, and its output produces 64 feature maps of width W×V and height H×U; the set of these feature maps is denoted FGAP,1, and all feature values within any one feature map in FGAP,1 are identical. The input of the fourteenth convolutional layer in the channel attention layer of the first enhanced residual block receives all feature maps in FGAP,1, and its output produces 4 feature maps of width W×V and height H×U, whose set is denoted FDS,1. The input of the fifteenth convolutional layer in the channel attention layer of the first enhanced residual block receives all feature maps in FDS,1, and its output produces 64 feature maps of width W×V and height H×U, whose set is denoted FUS,1. All feature maps in FUS,1 are multiplied element-wise with the feature maps output by the second spatial angular convolutional layer, and the resulting feature maps are taken as the output of the channel attention layer of the first enhanced residual block; the set of these feature maps is denoted FCA,1.
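A sketch of this channel attention layer as described (global average pooling, a 64-to-4 bottleneck 1×1 convolution with ReLU, a 4-to-64 expansion 1×1 convolution with Sigmoid, then channel-wise rescaling); the class and attribute names are ours:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention: each channel is
    summarized by its global mean and re-weighted by a learned gate."""
    def __init__(self, channels=64, reduced=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # every map collapses to one value
        self.down = nn.Conv2d(channels, reduced, 1)  # fourteenth convolutional layer
        self.up = nn.Conv2d(reduced, channels, 1)    # fifteenth convolutional layer

    def forward(self, x):
        w = self.pool(x)                 # (B, 64, 1, 1): the constant-valued F_GAP maps
        w = torch.relu(self.down(w))     # F_DS
        w = torch.sigmoid(self.up(w))    # F_US
        return x * w                     # broadcast element-wise product -> F_CA
```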
All feature maps in FCA,1 are added element-wise to all feature maps in FLR, and the resulting feature maps are taken as the output of the first enhanced residual block; the set of these feature maps is FEn,1.
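Putting the pieces together, one enhanced residual block might read as below, reusing the SFTLayer, spatial_to_angular/angular_to_spatial and ChannelAttention sketches above; the composition mirrors the text (SFT, spatial-angular convolution, SFT, spatial-angular convolution, channel attention, identity shortcut), but the module names are ours:

```python
import torch.nn as nn

class SpatialAngularConv(nn.Module):
    """Twelfth convolution on the spatial layout, reorganize to the angular
    layout, thirteenth convolution, reorganize back; both use ReLU."""
    def __init__(self, U=5, V=5, channels=64):
        super().__init__()
        self.U, self.V = U, V
        self.conv_spa = nn.Conv2d(channels, channels, 3, padding=1)  # twelfth conv
        self.conv_ang = nn.Conv2d(channels, channels, 3, padding=1)  # thirteenth conv
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.conv_spa(x))
        x = spatial_to_angular(x, self.U, self.V)
        x = self.relu(self.conv_ang(x))
        return angular_to_spatial(x, self.U, self.V)

class EnhancedResidualBlock(nn.Module):
    """SFT -> spatial-angular conv -> SFT -> spatial-angular conv ->
    channel attention, with an identity shortcut from the block input."""
    def __init__(self, U=5, V=5, channels=64):
        super().__init__()
        self.sft1, self.sft2 = SFTLayer(channels), SFTLayer(channels)
        self.sac1 = SpatialAngularConv(U, V, channels)
        self.sac2 = SpatialAngularConv(U, V, channels)
        self.ca = ChannelAttention(channels)

    def forward(self, f_in, f_align):
        x = self.sac1(self.sft1(f_in, f_align))   # both SFT layers are fed
        x = self.sac2(self.sft2(x, f_align))      # by the same F_Align set
        return f_in + self.ca(x)                  # element-wise residual addition
```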
The input of the tenth convolutional layer in the first spatial feature transformation layer of the second enhanced residual block receives all feature maps in FAlign,2, and its output produces 64 feature maps of width W×V and height H×U; the input of the eleventh convolutional layer in the first spatial feature transformation layer of the second enhanced residual block likewise receives all feature maps in FAlign,2, and its output also produces 64 feature maps of width W×V and height H×U. The input of the first spatial feature transformation layer of the second enhanced residual block receives all feature maps in FEn,1; all feature maps in FEn,1 are multiplied element-wise with the feature maps output by this layer's tenth convolutional layer, the products are then added element-wise to the feature maps output by this layer's eleventh convolutional layer, and the resulting feature maps are taken as the output of the first spatial feature transformation layer of the second enhanced residual block.
The input of the twelfth convolutional layer in the first spatial angular convolutional layer of the second enhanced residual block receives the feature maps output by the first spatial feature transformation layer, and its output produces 64 feature maps of width W×V and height H×U. These feature maps are reorganized from the spatial dimension to the angular dimension; the input of the thirteenth convolutional layer in the first spatial angular convolutional layer of the second enhanced residual block receives the reorganized feature maps, and its output produces 64 feature maps of width W×V and height H×U, which are reorganized from the angular dimension back to the spatial dimension; the feature maps obtained after this reorganization are taken as the output of the first spatial angular convolutional layer of the second enhanced residual block.
The input of the tenth convolutional layer in the second spatial feature transformation layer of the second enhanced residual block receives all feature maps in FAlign,2, and its output produces 64 feature maps of width W×V and height H×U; the input of the eleventh convolutional layer in the second spatial feature transformation layer of the second enhanced residual block likewise receives all feature maps in FAlign,2, and its output also produces 64 feature maps of width W×V and height H×U. The input of the second spatial feature transformation layer of the second enhanced residual block receives the feature maps output by the first spatial angular convolutional layer; these feature maps are multiplied element-wise with the feature maps output by this layer's tenth convolutional layer, the products are then added element-wise to the feature maps output by this layer's eleventh convolutional layer, and the resulting feature maps are taken as the output of the second spatial feature transformation layer of the second enhanced residual block.
The input of the twelfth convolutional layer in the second spatial angular convolutional layer of the second enhanced residual block receives the feature maps output by the second spatial feature transformation layer, and its output produces 64 feature maps of width W×V and height H×U. These feature maps are reorganized from the spatial dimension to the angular dimension; the input of the thirteenth convolutional layer in the second spatial angular convolutional layer of the second enhanced residual block receives the reorganized feature maps, and its output produces 64 feature maps of width W×V and height H×U, which are reorganized from the angular dimension back to the spatial dimension; the feature maps obtained after this reorganization are taken as the output of the second spatial angular convolutional layer of the second enhanced residual block.
The input of the global average pooling layer in the channel attention layer of the second enhanced residual block receives the feature maps output by the second spatial angular convolutional layer, and its output produces 64 feature maps of width W×V and height H×U; the set of these feature maps is denoted FGAP,2, and all feature values within any one feature map in FGAP,2 are identical. The input of the fourteenth convolutional layer in the channel attention layer of the second enhanced residual block receives all feature maps in FGAP,2, and its output produces 4 feature maps of width W×V and height H×U, whose set is denoted FDS,2. The input of the fifteenth convolutional layer in the channel attention layer of the second enhanced residual block receives all feature maps in FDS,2, and its output produces 64 feature maps of width W×V and height H×U, whose set is denoted FUS,2. All feature maps in FUS,2 are multiplied element-wise with the feature maps output by the second spatial angular convolutional layer, and the resulting feature maps are taken as the output of the channel attention layer of the second enhanced residual block; the set of these feature maps is denoted FCA,2.
All feature maps in FCA,2 are added element-wise to all feature maps in FEn,1, and the resulting feature maps are taken as the output of the second enhanced residual block; the set of these feature maps is FEn,2.
The input of the tenth convolutional layer in the first spatial feature transformation layer of the third enhanced residual block receives all feature maps in FAlign,3, and its output produces 64 feature maps of width W×V and height H×U; the input of the eleventh convolutional layer in the first spatial feature transformation layer of the third enhanced residual block likewise receives all feature maps in FAlign,3, and its output also produces 64 feature maps of width W×V and height H×U. The input of the first spatial feature transformation layer of the third enhanced residual block receives all feature maps in FEn,2; all feature maps in FEn,2 are multiplied element-wise with the feature maps output by this layer's tenth convolutional layer, the products are then added element-wise to the feature maps output by this layer's eleventh convolutional layer, and the resulting feature maps are taken as the output of the first spatial feature transformation layer of the third enhanced residual block.
The input of the twelfth convolutional layer in the first spatial angular convolutional layer of the third enhanced residual block receives the feature maps output by the first spatial feature transformation layer, and its output produces 64 feature maps of width W×V and height H×U. These feature maps are reorganized from the spatial dimension to the angular dimension; the input of the thirteenth convolutional layer in the first spatial angular convolutional layer of the third enhanced residual block receives the reorganized feature maps, and its output produces 64 feature maps of width W×V and height H×U, which are reorganized from the angular dimension back to the spatial dimension; the feature maps obtained after this reorganization are taken as the output of the first spatial angular convolutional layer of the third enhanced residual block.
The input of the tenth convolutional layer in the second spatial feature transformation layer of the third enhanced residual block receives all feature maps in FAlign,3, and its output produces 64 feature maps of width W×V and height H×U; the input of the eleventh convolutional layer in the second spatial feature transformation layer of the third enhanced residual block likewise receives all feature maps in FAlign,3, and its output also produces 64 feature maps of width W×V and height H×U. The input of the second spatial feature transformation layer of the third enhanced residual block receives the feature maps output by the first spatial angular convolutional layer; these feature maps are multiplied element-wise with the feature maps output by this layer's tenth convolutional layer, the products are then added element-wise to the feature maps output by this layer's eleventh convolutional layer, and the resulting feature maps are taken as the output of the second spatial feature transformation layer of the third enhanced residual block.
The input of the twelfth convolutional layer in the second spatial angular convolutional layer of the third enhanced residual block receives the feature maps output by the second spatial feature transformation layer, and its output produces 64 feature maps of width W×V and height H×U. These feature maps are reorganized from the spatial dimension to the angular dimension; the input of the thirteenth convolutional layer in the second spatial angular convolutional layer of the third enhanced residual block receives the reorganized feature maps, and its output produces 64 feature maps of width W×V and height H×U, which are reorganized from the angular dimension back to the spatial dimension; the feature maps obtained after this reorganization are taken as the output of the second spatial angular convolutional layer of the third enhanced residual block.
The input of the global average pooling layer in the channel attention layer of the third enhanced residual block receives the feature maps output by the second spatial angular convolutional layer, and its output produces 64 feature maps of width W×V and height H×U; the set of these feature maps is denoted FGAP,3, and all feature values within any one feature map in FGAP,3 are identical. The input of the fourteenth convolutional layer in the channel attention layer of the third enhanced residual block receives all feature maps in FGAP,3, and its output produces 4 feature maps of width W×V and height H×U, whose set is denoted FDS,3. The input of the fifteenth convolutional layer in the channel attention layer of the third enhanced residual block receives all feature maps in FDS,3, and its output produces 64 feature maps of width W×V and height H×U, whose set is denoted FUS,3. All feature maps in FUS,3 are multiplied element-wise with the feature maps output by the second spatial angular convolutional layer, and the resulting feature maps are taken as the output of the channel attention layer of the third enhanced residual block; the set of these feature maps is denoted FCA,3.
All feature maps in FCA,3 are added element-wise to all feature maps in FEn,2, and the resulting feature maps are taken as the output of the third enhanced residual block; the set of these feature maps is FEn,3.
In the above, the tenth and eleventh convolutional layers in each of the first, second and third enhanced residual blocks all have 3×3 convolution kernels, a convolution stride of 1, 64 input channels and 64 output channels, and use no activation function; the twelfth and thirteenth convolutional layers in each of the three enhanced residual blocks all have 3×3 convolution kernels, a convolution stride of 1, 64 input channels and 64 output channels, and use the "ReLU" activation function; the fourteenth convolutional layer in each of the three enhanced residual blocks has a 1×1 convolution kernel, a convolution stride of 1, 64 input channels and 4 output channels, and uses the "ReLU" activation function; the fifteenth convolutional layer in each of the three enhanced residual blocks has a 1×1 convolution kernel, a convolution stride of 1, 4 input channels and 64 output channels, and uses the "Sigmoid" activation function.
Compared with the prior art, the present invention has the following advantages:
1) The method of the present invention exploits the fact that a conventional 2D camera captures rich spatial information, which can serve as compensating information for spatial-resolution reconstruction of light field images. It therefore uses a light field image and a 2D high-resolution image together and, on this basis, constructs an end-to-end convolutional neural network that fully exploits both sources of information to reconstruct a high-spatial-resolution light field image, recovering fine texture while preserving the parallax structure of the reconstruction result.
2) To establish the connection between the light field image and the 2D high-resolution image, the method constructs an aperture-level feature registration module that explores the correlation between the two in a high-dimensional feature space, so that the feature information of the 2D high-resolution image is accurately registered to every sub-aperture image of the light field image. In addition, the method uses the constructed light field feature enhancement module to fuse, at multiple levels, the registered high-resolution features with the shallow light field features extracted from the low-spatial-resolution light field image, effectively generating high-spatial-resolution light field features that can then be reconstructed into a high-spatial-resolution light field image.
3) To improve flexibility and practicality, the method adopts a pyramid network reconstruction scheme that progressively raises the spatial resolution of the light field image and restores texture and detail by reconstructing a super-resolution result of a specific scale at each pyramid level, so that multi-scale results (for example 2×, 4× and 8×) can be reconstructed in a single forward pass. Furthermore, the method shares weights across the pyramid levels, which effectively reduces the parameter count of the constructed pyramid network and lightens the training burden.
Description of the Drawings
Fig. 1 is a block diagram of the overall implementation flow of the method of the present invention;
Fig. 2 is a schematic diagram of the structure of the convolutional neural network, i.e. the spatial super-resolution network, constructed by the method of the present invention;
Fig. 3a is a schematic diagram of the structure of the light field feature enhancement module in the spatial super-resolution network constructed by the method of the present invention;
Fig. 3b is a schematic diagram of the structure of the first and second spatial feature transformation layers in the light field feature enhancement module of that network;
Fig. 3c is a schematic diagram of the structure of the first and second spatial angular convolutional layers in the light field feature enhancement module of that network;
Fig. 3d is a schematic diagram of the structure of the channel attention layer in the light field feature enhancement module of that network;
Fig. 4 is an explanatory diagram of the pyramid network reconstruction scheme established by the method of the present invention;
Fig. 5a is a reconstructed high-spatial-resolution light field image obtained by processing a low-spatial-resolution light field image from the tested EPFL light field image database with the bicubic interpolation method; the sub-aperture image at the central coordinates is shown;
Fig. 5b is a reconstructed high-spatial-resolution light field image obtained by processing a low-spatial-resolution light field image from the tested EPFL light field image database with the method of Haris et al.; the sub-aperture image at the central coordinates is shown;
Fig. 5c is a reconstructed high-spatial-resolution light field image obtained by processing a low-spatial-resolution light field image from the tested EPFL light field image database with the method of Lai et al.; the sub-aperture image at the central coordinates is shown;
Fig. 5d is a reconstructed high-spatial-resolution light field image obtained by processing a low-spatial-resolution light field image from the tested EPFL light field image database with the method of Yeung et al.; the sub-aperture image at the central coordinates is shown;
Fig. 5e is a reconstructed high-spatial-resolution light field image obtained by processing a low-spatial-resolution light field image from the tested EPFL light field image database with the method of Wang et al.; the sub-aperture image at the central coordinates is shown;
Fig. 5f is a reconstructed high-spatial-resolution light field image obtained by processing a low-spatial-resolution light field image from the tested EPFL light field image database with the method of Jin et al.; the sub-aperture image at the central coordinates is shown;
Fig. 5g is a reconstructed high-spatial-resolution light field image obtained by processing a low-spatial-resolution light field image from the tested EPFL light field image database with the method of Boominathan et al.; the sub-aperture image at the central coordinates is shown;
Fig. 5h is a reconstructed high-spatial-resolution light field image obtained by processing a low-spatial-resolution light field image from the tested EPFL light field image database with the method of the present invention; the sub-aperture image at the central coordinates is shown;
Fig. 5i is the label high-spatial-resolution light field image corresponding to the low-spatial-resolution light field image from the tested EPFL light field image database; the sub-aperture image at the central coordinates is shown;
Fig. 6a is a reconstructed high-spatial-resolution light field image obtained by processing a low-spatial-resolution light field image from the tested STFLytro light field image database with the bicubic interpolation method; the sub-aperture image at the central coordinates is shown;
Fig. 6b is a reconstructed high-spatial-resolution light field image obtained by processing a low-spatial-resolution light field image from the tested STFLytro light field image database with the method of Haris et al.; the sub-aperture image at the central coordinates is shown;
Fig. 6c is a reconstructed high-spatial-resolution light field image obtained by processing a low-spatial-resolution light field image from the tested STFLytro light field image database with the method of Lai et al.; the sub-aperture image at the central coordinates is shown;
Fig. 6d is a reconstructed high-spatial-resolution light field image obtained by processing a low-spatial-resolution light field image from the tested STFLytro light field image database with the method of Yeung et al.; the sub-aperture image at the central coordinates is shown;
Fig. 6e is a reconstructed high-spatial-resolution light field image obtained by processing a low-spatial-resolution light field image from the tested STFLytro light field image database with the method of Wang et al.; the sub-aperture image at the central coordinates is shown;
Fig. 6f is a reconstructed high-spatial-resolution light field image obtained by processing a low-spatial-resolution light field image from the tested STFLytro light field image database with the method of Jin et al.; the sub-aperture image at the central coordinates is shown;
Fig. 6g is a reconstructed high-spatial-resolution light field image obtained by processing a low-spatial-resolution light field image from the tested STFLytro light field image database with the method of Boominathan et al.; the sub-aperture image at the central coordinates is shown;
Fig. 6h is a reconstructed high-spatial-resolution light field image obtained by processing a low-spatial-resolution light field image from the tested STFLytro light field image database with the method of the present invention; the sub-aperture image at the central coordinates is shown;
Fig. 6i is the label high-spatial-resolution light field image corresponding to the low-spatial-resolution light field image from the tested STFLytro light field image database.
Detailed Description of the Embodiments
The present invention is described in further detail below with reference to the embodiments shown in the accompanying drawings.
With the development of immersive media and related technologies, users increasingly prefer interactive and immersive visual content such as images and videos. However, conventional 2D imaging captures only the intensity of the light in a scene and cannot provide scene depth. By contrast, 3D imaging acquires more scene information, yet the depth information it contains is limited and it is generally used for stereoscopic display. Light field imaging, an emerging imaging technique, can capture both the intensity and the direction of the light in a scene in a single shot and thus record the real world more faithfully, and it is receiving wide attention; meanwhile, optical instruments and devices based on light field imaging have been developed to promote the application and development of light field technology. Constrained by the size of the imaging sensor, however, the 4D light field images acquired by a light field camera suffer from a trade-off between spatial and angular resolution: while providing high angular resolution, a 4D light field image inevitably has low spatial resolution, which severely limits practical applications such as refocusing and depth estimation. To address this, the present invention proposes a light field image spatial super-resolution reconstruction method.
The method uses heterogeneous imaging to acquire a 2D high-resolution image at the same time as the light field image, and then exploits the captured 2D high-resolution image as supplementary information to help enhance the spatial resolution of the light field image. Specifically, a spatial super-resolution network is constructed, consisting mainly of an encoder, an aperture-level feature registration module, a light field feature enhancement module and a decoder. First, the encoder extracts multi-scale features separately from the upsampled low-spatial-resolution light field image, from a blurred version of the 2D high-resolution image, and from the 2D high-resolution image itself. The aperture-level feature registration module then learns the correspondence between the 2D high-resolution features and the low-resolution light field features, so that the 2D high-resolution features are registered to every sub-aperture image of the light field image, forming registered high-resolution light field features. Next, the light field feature enhancement module uses the registered high-resolution light field features to enhance the shallow light field features extracted from the input light field image, yielding enhanced high-resolution light field features. Finally, the decoder reconstructs the enhanced high-resolution light field features into a high-quality, high-spatial-resolution light field image. In addition, a pyramid network reconstruction architecture is adopted, so that a high-spatial-resolution light field image at a specific upsampling scale is reconstructed at each pyramid level and multi-scale reconstruction results can be generated simultaneously.
The overall implementation flow of the light field image spatial super-resolution reconstruction method proposed by the present invention is shown in the block diagram of Fig. 1; the method comprises the following steps:
Step 1: Select Num color three-channel low-spatial-resolution light field images with spatial resolution W×H and angular resolution V×U, the corresponding Num color three-channel 2D high-resolution images with resolution αW×αH, and the corresponding Num color three-channel reference high-spatial-resolution light field images with spatial resolution αW×αH and angular resolution V×U, where Num>1. In this embodiment Num=200, W×H is 75×50 and V×U is 5×5; α denotes the spatial-resolution improvement factor and is greater than 1, and in this embodiment α is set to 8.
Step 2: Construct a convolutional neural network as the spatial super-resolution network. As shown in Fig. 2, the spatial super-resolution network comprises an encoder for extracting multi-scale features, an aperture-level feature registration module for registering the light field features with the 2D high-resolution features, a shallow feature extraction layer for extracting shallow features from the low-spatial-resolution light field image, a light field feature enhancement module for fusing the light field features with the 2D high-resolution features, a spatial attention block for alleviating registration errors in the coarse-scale features, and a decoder for reconstructing the latent features into a light field image.
The encoder consists of a first convolutional layer, a second convolutional layer, a first residual block and a second residual block connected in sequence. The input of the first convolutional layer receives three inputs in parallel: the sub-aperture image array of width αsW×V and height αsH×U obtained by spatial-resolution upsampling and reorganization of the single-channel image LLR of a low-spatial-resolution light field image with spatial resolution W×H and angular resolution V×U; the single-channel image of a blurred 2D high-resolution image of width αsW and height αsH; and the single-channel image of a 2D high-resolution image of width αsW and height αsH, denoted IHR. For the upsampled light-field array, the output of the first convolutional layer produces 64 feature maps of width αsW×V and height αsH×U; for the blurred 2D high-resolution image it produces 64 feature maps of width αsW and height αsH; and for IHR it produces 64 feature maps of width αsW and height αsH, whose set is denoted YHR,0. The input of the second convolutional layer receives these three sets of feature maps in parallel; for the light-field branch its output produces 64 feature maps of width (αsW/2)×V and height (αsH/2)×U, for the blurred branch 64 feature maps of width αsW/2 and height αsH/2, and for the YHR,0 branch 64 feature maps of width αsW/2 and height αsH/2, whose set is denoted YHR,1. The input of the first residual block receives the three sets of feature maps in parallel and outputs, for each branch, 64 feature maps of the same size as its inputs; the set output for the YHR,1 branch is denoted YHR,2. The input of the second residual block likewise receives the three sets of feature maps in parallel and outputs 64 feature maps of the same size for each branch; the set output for the YHR,2 branch is denoted YHR,3. Here, the sub-aperture image array is obtained by applying existing bicubic interpolation upsampling to the single-channel image LLR and reorganizing the result into an array of width αsW×V and height αsH×U, and the blurred 2D high-resolution image is obtained by first downsampling IHR by bicubic interpolation and then upsampling it by bicubic interpolation. αs denotes the spatial-resolution sampling factor; in this embodiment αs takes the value 2, with αs³=α, and both the upsampling factor of the bicubic upsampling and the downsampling factor of the bicubic downsampling are αs. The first convolutional layer has a 3×3 convolution kernel, a convolution stride of 1, 1 input channel and 64 output channels; the second convolutional layer has a 3×3 convolution kernel, a convolution stride of 2, 64 input channels and 64 output channels; both the first and second convolutional layers use the "ReLU" activation function.
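A condensed PyTorch sketch of this encoder may help. The internal structure of the residual blocks is not spelled out at this point in the text, so the two-convolution form below is an assumption, and all names are ours; the same module is applied, with shared weights, to each of the three inputs in turn:

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """Plain two-convolution residual block; a conventional choice, since
    the patent does not detail the residual blocks here (an assumption)."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class Encoder(nn.Module):
    """Shared encoder applied to the upsampled LF array, the blurred 2D
    image and the sharp 2D image; returns the four feature levels
    (Y_0 .. Y_3 for the HR branch)."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 64, 3, stride=1, padding=1)   # first conv layer
        self.conv2 = nn.Conv2d(64, 64, 3, stride=2, padding=1)  # second conv, halves H and W
        self.res1, self.res2 = ResBlock(), ResBlock()
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        y0 = self.relu(self.conv1(x))
        y1 = self.relu(self.conv2(y0))
        y2 = self.res1(y1)
        y3 = self.res2(y2)
        return y0, y1, y2, y3
        # usage: enc(lf_array), enc(blurred_hr), enc(i_hr) with one Encoder instance
```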
The input of the aperture-level feature registration module receives three classes of feature maps: the first class comprises the high-level feature maps extracted from the upsampled light-field array, the second class comprises the high-level feature maps extracted from the blurred 2D high-resolution image, and the third class comprises four inputs, namely all feature maps in YHR,0, all feature maps in YHR,1, all feature maps in YHR,2 and all feature maps in YHR,3. Inside the aperture-level feature registration module, the blurred-branch feature maps and the feature maps in YHR,0, YHR,1, YHR,2 and YHR,3 are each first replicated V×U times, so that the widths of the blurred-branch feature maps and of the feature maps in YHR,1, YHR,2 and YHR,3 become (αsW/2)×V and their heights become (αsH/2)×U, matching the size of the light-field-branch feature maps, while the widths of the feature maps in YHR,0 become αsW×V and their heights become αsH×U, matching the size of the first-layer light-field features. Existing block matching is then performed between the light-field-branch feature maps and the replicated blurred-branch feature maps, yielding a coordinate index map of width (αsW/2)×V and height (αsH/2)×U, denoted PCI. Next, according to PCI, all feature maps in YHR,1 are spatially registered to the light-field-branch feature maps, giving 64 registered feature maps of width (αsW/2)×V and height (αsH/2)×U, whose set is denoted FAlign,1; similarly, according to PCI, all feature maps in YHR,2 are spatially registered, giving 64 registered feature maps of the same size, whose set is denoted FAlign,2, and all feature maps in YHR,3 are spatially registered, giving 64 registered feature maps of the same size, whose set is denoted FAlign,3. PCI is then upsampled by bicubic interpolation to obtain a coordinate index map of width αsW×V and height αsH×U; finally, according to this upsampled coordinate index map, all feature maps in YHR,0 are spatially registered, giving 64 registered feature maps of width αsW×V and height αsH×U, whose set is denoted FAlign,0. The output of the aperture-level feature registration module outputs all feature maps in FAlign,0, FAlign,1, FAlign,2 and FAlign,3. The accuracy measure used for block matching is a texture-and-structure similarity index, the block size used for block matching is 3×3, and the upsampling factor of the bicubic interpolation is αs. Because high-level features describe the similarity of images at the semantic level more compactly while suppressing irrelevant texture, block matching is performed here on the high-level features of the light-field and blurred branches; the resulting coordinate index map PCI reflects the spatial registration relationship between the feature maps of those two branches. Moreover, since the convolution operation does not change the spatial position information of feature maps, PCI also reflects the spatial registration relationships between the two branches at the other feature levels, and the coordinate index map obtained after bicubic upsampling reflects the spatial registration relationship at the resolution of the first-layer features.
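A greatly simplified sketch of per-position block matching and registration follows. It substitutes normalized cross-correlation between 3×3 patches for the texture-and-structure similarity index named in the text, and it materializes the full similarity matrix, which is only practical for small feature maps; it illustrates the idea of the coordinate index map rather than reproducing the patent's exact procedure, and all names are ours:

```python
import torch
import torch.nn.functional as F

def block_match(feat_lf, feat_hr, patch=3):
    """For every 3x3 patch of feat_lf (B, C, H, W), find the index of the most
    similar 3x3 patch of feat_hr under normalized cross-correlation (a stand-in
    for the texture-and-structure similarity index); returns a coordinate
    index map of shape (B, H*W)."""
    q = F.unfold(feat_lf, patch, padding=patch // 2)   # (B, C*9, H*W) query patches
    k = F.unfold(feat_hr, patch, padding=patch // 2)   # (B, C*9, H*W) key patches
    q = F.normalize(q, dim=1)
    k = F.normalize(k, dim=1)
    sim = torch.bmm(q.transpose(1, 2), k)              # (B, H*W, H*W): O(N^2) memory
    return sim.argmax(dim=2)                           # coordinate index map P_CI

def warp_by_index(feat, index):
    """Registration: rebuild the feature maps by gathering, at every spatial
    position, the HR feature vector that block matching selected."""
    B, C, H, W = feat.shape
    flat = feat.flatten(2)                             # (B, C, H*W)
    idx = index.unsqueeze(1).expand(-1, C, -1)         # (B, C, H*W)
    return flat.gather(2, idx).view(B, C, H, W)        # e.g. Y_HR,1 -> F_Align,1
```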
The shallow feature extraction layer consists of a single fifth convolutional layer. The input of the fifth convolutional layer receives the sub-aperture image array of width W×V and height H×U obtained by reorganizing the single-channel image LLR of a low-spatial-resolution light field image with spatial resolution W×H and angular resolution V×U, and its output produces 64 feature maps of width W×V and height H×U, whose set is denoted FLR. The fifth convolutional layer has a 3×3 convolution kernel, a convolution stride of 1, 1 input channel and 64 output channels, and uses the "ReLU" activation function.
As shown in Fig. 3a, the light field feature enhancement module consists of a first enhanced residual block, a second enhanced residual block and a third enhanced residual block connected in sequence. The input of the first enhanced residual block receives all feature maps in FAlign,1 and all feature maps in FLR; when αs takes the value 2, W×V equals (αsW/2)×V and H×U equals (αsH/2)×U, i.e. the feature maps in FLR have the same size as the feature maps in FAlign,1. The output of the first enhanced residual block produces 64 feature maps of width W×V and height H×U, whose set is denoted FEn,1. The input of the second enhanced residual block receives all feature maps in FAlign,2 and all feature maps in FEn,1, and its output produces 64 feature maps of width W×V and height H×U, whose set is denoted FEn,2. The input of the third enhanced residual block receives all feature maps in FAlign,3 and all feature maps in FEn,2, and its output produces 64 feature maps of width W×V and height H×U, whose set is denoted FEn,3.
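Chaining the three blocks, the whole enhancement module reduces to a short loop, reusing the EnhancedResidualBlock sketch given earlier; the names are ours:

```python
import torch.nn as nn

class LFFeatureEnhancement(nn.Module):
    """Three enhanced residual blocks in sequence; block i modulates the
    running light-field features with the registered features F_Align,i."""
    def __init__(self, U=5, V=5, channels=64):
        super().__init__()
        self.blocks = nn.ModuleList(
            EnhancedResidualBlock(U, V, channels) for _ in range(3))

    def forward(self, f_lr, f_align_1, f_align_2, f_align_3):
        f = f_lr
        for block, f_align in zip(self.blocks, (f_align_1, f_align_2, f_align_3)):
            f = block(f, f_align)
        return f  # F_En,3
```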
The spatial attention block consists of a sixth convolutional layer and a seventh convolutional layer connected in sequence. The input of the sixth convolutional layer receives all feature maps in FAlign,0, and its output produces 64 spatial attention feature maps of width αsW×V and height αsH×U, whose set is denoted FSA1. The input of the seventh convolutional layer receives all spatial attention feature maps in FSA1, and its output produces 64 spatial attention feature maps of width αsW×V and height αsH×U, whose set is denoted FSA2. All feature maps in FAlign,0 are multiplied element-wise with all spatial attention feature maps in FSA2; the set of the resulting feature maps is denoted FWA,0, and all feature maps in FWA,0 are taken as the output of the spatial attention block. The sixth and seventh convolutional layers both have 3×3 convolution kernels, a convolution stride of 1, 64 input channels and 64 output channels; the sixth convolutional layer uses the "ReLU" activation function and the seventh uses "Sigmoid".
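A sketch of this spatial attention block as described (ReLU convolution, Sigmoid convolution, element-wise gating of FAlign,0); the names are ours:

```python
import torch
import torch.nn as nn

class SpatialAttentionBlock(nn.Module):
    """Two 3x3 convolutions turn F_Align,0 into a per-position gate that
    down-weights badly registered positions before the skip connection."""
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)  # sixth conv layer
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)  # seventh conv layer

    def forward(self, f_align_0):
        a = torch.relu(self.conv1(f_align_0))    # F_SA1
        a = torch.sigmoid(self.conv2(a))         # F_SA2
        return f_align_0 * a                     # element-wise re-weighting -> F_WA,0
```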
The decoder consists of a third residual block, a fourth residual block, a sub-pixel convolutional layer, an eighth convolutional layer and a ninth convolutional layer connected in sequence. The input of the third residual block receives all feature maps in FEn,3, and its output produces 64 feature maps of width (αsW/2)×V and height (αsH/2)×U, whose set is denoted FDec,1. The input of the fourth residual block receives all feature maps in FDec,1, and its output produces 64 feature maps of the same size, whose set is denoted FDec,2. The input of the sub-pixel convolutional layer receives all feature maps in FDec,2, and its output produces 256 feature maps of width (αsW/2)×V and height (αsH/2)×U, which are further converted into 64 feature maps of width αsW×V and height αsH×U; the set of the converted feature maps is denoted FDec,3. The input of the eighth convolutional layer receives the result of the element-wise addition of all feature maps in FDec,3 and all feature maps in FWA,0, and its output produces 64 feature maps of width αsW×V and height αsH×U, whose set is denoted FDec,4. The input of the ninth convolutional layer receives all feature maps in FDec,4, and its output produces one reconstructed single-channel light field image of width αsW×V and height αsH×U, which is reorganized into a high-spatial-resolution single-channel light field image with spatial resolution αsW×αsH and angular resolution V×U, denoted LSR. The sub-pixel convolutional layer has a 3×3 convolution kernel, a convolution stride of 1, 64 input channels and 256 output channels; the eighth convolutional layer has a 3×3 convolution kernel, a convolution stride of 1, 64 input channels and 64 output channels; the ninth convolutional layer has a 1×1 convolution kernel, a convolution stride of 1, 64 input channels and 1 output channel. The sub-pixel convolutional layer and the eighth convolutional layer use the "ReLU" activation function, and the ninth convolutional layer uses no activation function.
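A sketch of the decoder, reusing the ResBlock sketch from the encoder above; the sub-pixel step is realized here with PixelShuffle, which performs exactly the described conversion of 256 maps into 64 maps at twice the size, and all names are ours:

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Two residual blocks, a x2 sub-pixel upsampling step, a fusion
    convolution fed by the spatial-attention skip, and a final 1x1
    convolution back to one channel (no activation on the output)."""
    def __init__(self, channels=64):
        super().__init__()
        self.res3, self.res4 = ResBlock(channels), ResBlock(channels)
        self.up = nn.Sequential(
            nn.Conv2d(channels, 4 * channels, 3, padding=1),  # sub-pixel conv: 64 -> 256
            nn.ReLU(inplace=True),
            nn.PixelShuffle(2))            # 256 maps -> 64 maps at twice the size
        self.conv8 = nn.Conv2d(channels, channels, 3, padding=1)  # eighth conv layer
        self.conv9 = nn.Conv2d(channels, 1, 1)                    # ninth conv layer

    def forward(self, f_en3, f_wa0):
        x = self.res4(self.res3(f_en3))    # F_Dec,1 then F_Dec,2
        x = self.up(x)                     # F_Dec,3
        x = torch.relu(self.conv8(x + f_wa0))  # element-wise skip addition -> F_Dec,4
        return self.conv9(x)               # reconstructed single-channel LF image
```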
Step 3: Convert each low-spatial-resolution light field image in the training set, its corresponding 2D high-resolution image, and its corresponding reference high-spatial-resolution light field image from the RGB color space to the YCbCr color space, and extract the Y-channel images. Then reorganize the Y-channel image of each low-spatial-resolution light field image into a sub-aperture image array of width W×V and height H×U. The sub-aperture image arrays reorganized from the Y-channel images of all low-spatial-resolution light field images, the Y-channel images of the corresponding 2D high-resolution images, and the Y-channel images of the corresponding reference high-spatial-resolution light field images constitute the training set. Next, construct the pyramid network and train it on the training set; the specific process is as follows:
Step 3_1: As shown in Figure 4, the constructed spatial super-resolution network is replicated three times and the three copies are cascaded, with the weights shared among them, i.e., the parameters are identical; the overall network formed by the three spatial super-resolution networks is defined as the pyramid network. At each pyramid level, the reconstruction scale of the spatial super-resolution network is set equal to αs; with αs = 2, the spatial resolution of the light field image is doubled at each level, so the final reconstruction scale reaches 8, i.e., α = αs³ = 8.
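Conceptually, the weight sharing can be realized by registering a single network module and invoking it once per pyramid level, as in the sketch below; the `sr_net` call signature is an assumption, since the actual network also takes the additional guide inputs described in the following steps:

```python
import torch.nn as nn

class PyramidNetwork(nn.Module):
    """One spatial super-resolution network reused at three pyramid levels:
    registering a single sub-module and calling it three times means the
    three levels share all parameters."""
    def __init__(self, sr_net: nn.Module, levels: int = 3):
        super().__init__()
        self.sr_net = sr_net
        self.levels = levels

    def forward(self, lf_y, guides):
        # guides: one 2D high-resolution Y-channel guide image per level
        outputs = []
        for level in range(self.levels):
            lf_y = self.sr_net(lf_y, guides[level])  # x2 reconstruction per level
            outputs.append(lf_y)
        return outputs  # reconstructions at scales 2x, 4x, 8x
```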
Step 3_2: Downsample the spatial resolution of the Y-channel image of each reference high-spatial-resolution light field image in the training set twice, and use the resulting images as label images. Downsample the Y-channel image of each 2D high-resolution image in the training set twice in the same way, and use the resulting images as the 2D high-resolution Y-channel images for the first spatial super-resolution network in the pyramid network. Then input the following into the first spatial super-resolution network of the constructed pyramid network for training: the sub-aperture image arrays reorganized from the Y-channel images of all low-spatial-resolution light field images in the training set; the sub-aperture image arrays reorganized from those Y-channel images after one spatial-resolution upsampling; all 2D high-resolution Y-channel images for the first spatial super-resolution network; and the blurred 2D high-resolution Y-channel images obtained by applying one spatial-resolution downsampling followed by one spatial-resolution upsampling to those 2D high-resolution Y-channel images. Training yields, for the Y-channel image of each low-spatial-resolution light field image in the training set, an αs-times reconstructed high-spatial-resolution Y-channel light field image. Both spatial-resolution upsampling and downsampling use bicubic interpolation, and both use a scale equal to αs.
Step 3_3: Apply a single spatial-resolution downsampling to the Y-channel image of each reference high-spatial-resolution light field image in the training set, and use the resulting images as label images. Apply the same single spatial-resolution downsampling to the Y-channel image of each 2D high-resolution image in the training set, and use the resulting images as the 2D high-resolution Y-channel images for the second spatial super-resolution network in the pyramid network. Then input the following into the second spatial super-resolution network of the constructed pyramid network for training: the sub-aperture image arrays reorganized from the αs-times reconstructed high-spatial-resolution Y-channel light field images corresponding to all low-spatial-resolution light field images in the training set; the sub-aperture image arrays reorganized from those reconstructed images after one spatial-resolution upsampling; all 2D high-resolution Y-channel images for the second spatial super-resolution network; and the blurred 2D high-resolution Y-channel images obtained by applying one spatial-resolution downsampling followed by one spatial-resolution upsampling to those 2D high-resolution Y-channel images. Training yields, for the Y-channel image of each low-spatial-resolution light field image in the training set, an αs²-times reconstructed high-spatial-resolution Y-channel light field image. Both spatial-resolution upsampling and downsampling use bicubic interpolation, and both use a scale equal to αs.
Step 3_4: Use the Y-channel image of each reference high-spatial-resolution light field image in the training set as the label image, and use the Y-channel image of each 2D high-resolution image in the training set as the 2D high-resolution Y-channel image for the third spatial super-resolution network in the pyramid network. Then input the following into the third spatial super-resolution network of the constructed pyramid network for training: the sub-aperture image arrays reorganized from the αs²-times reconstructed high-spatial-resolution Y-channel light field images corresponding to all low-spatial-resolution light field images in the training set; the sub-aperture image arrays reorganized from those reconstructed images after one spatial-resolution upsampling; all 2D high-resolution Y-channel images for the third spatial super-resolution network; and the blurred 2D high-resolution Y-channel images obtained by applying one spatial-resolution downsampling followed by one spatial-resolution upsampling to those 2D high-resolution Y-channel images. Training yields, for the Y-channel image of each low-spatial-resolution light field image in the training set, an αs³-times reconstructed high-spatial-resolution Y-channel light field image. Both spatial-resolution upsampling and downsampling use bicubic interpolation, and both use a scale equal to αs.
After training, the optimal weight parameters of all convolution kernels in each spatial super-resolution network of the pyramid network are obtained, i.e., a well-trained spatial super-resolution network model. The model realizes a specific super-resolution reconstruction scale at each pyramid level and can therefore output multi-scale super-resolution results in a single forward pass (scales of 2×, 4×, and 8× when αs = 2). Moreover, sharing weights across the spatial super-resolution networks at all pyramid levels effectively reduces the number of network parameters and the training burden.
Step 4: Arbitrarily select one color three-channel low-spatial-resolution light field image and its corresponding color three-channel 2D high-resolution image as test images. Convert both from the RGB color space to the YCbCr color space and extract the Y-channel images, then reorganize the Y-channel image of the low-spatial-resolution light field image into a sub-aperture image array. Input the following into the spatial super-resolution network model: the sub-aperture image array reorganized from the Y-channel image of the low-spatial-resolution light field image; the sub-aperture image array reorganized from that Y-channel image after one spatial-resolution upsampling; the Y-channel image of the 2D high-resolution image; and the blurred 2D high-resolution Y-channel image obtained by applying one spatial-resolution downsampling followed by one spatial-resolution upsampling to that Y-channel image. Testing yields the reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of the low-spatial-resolution light field image. Afterwards, upsample the Cb-channel and Cr-channel images of the low-spatial-resolution light field image separately by bicubic interpolation, obtaining the corresponding reconstructed high-spatial-resolution Cb-channel and Cr-channel light field images. Finally, concatenate the reconstructed high-spatial-resolution Y-channel, Cb-channel, and Cr-channel light field images along the color-channel dimension and convert the result back to the RGB color space, obtaining the color three-channel reconstructed high-spatial-resolution light field image corresponding to the low-spatial-resolution light field image.
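As an illustration of this test-time color handling, the sketch below runs the Y channel through the trained model and upsamples Cb/Cr bicubically. It assumes an overall scale of αs³ = 8, a `model` wrapping the trained network, and color-space conversion done outside the function; the model's additional guide-image inputs are omitted for brevity:

```python
import torch
import torch.nn.functional as F

def super_resolve_color(lf_ycbcr: torch.Tensor, model, scale: int = 8) -> torch.Tensor:
    """lf_ycbcr: (3, H*U, W*V) sub-aperture array in YCbCr. Returns the
    (3, scale*H*U, scale*W*V) reconstruction, still in YCbCr."""
    y, cb, cr = lf_ycbcr[0:1], lf_ycbcr[1:2], lf_ycbcr[2:3]
    with torch.no_grad():
        y_sr = model(y.unsqueeze(0)).squeeze(0)        # Y channel via the network
    def up(c: torch.Tensor) -> torch.Tensor:           # bicubic for chroma channels
        return F.interpolate(c.unsqueeze(0), scale_factor=scale,
                             mode='bicubic', align_corners=False).squeeze(0)
    return torch.cat([y_sr, up(cb), up(cr)], dim=0)    # channel-wise concatenation
```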
In this embodiment, in step 2, the first, second, third, and fourth residual blocks have the same structure: each consists of a third convolutional layer and a fourth convolutional layer connected in sequence. The input of the third convolutional layer in the first residual block receives three inputs in parallel: all feature maps in each of two further feature-map sets and all feature maps in YHR,1. For each of these three inputs, the output of the third convolutional layer produces 64 feature maps of the same width and height as that input. The input of the fourth convolutional layer in the first residual block receives the three resulting sets in parallel and, for each, its output produces 64 feature maps of the same size. Each of the three output sets of the fourth convolutional layer is then added element-wise to the corresponding input of the first residual block; the three resulting sets of feature maps are the three outputs of the first residual block, and for the YHR,1 input the resulting set is YHR,2.
The third convolutional layer in the second residual block likewise receives three inputs in parallel: all feature maps in each of two further feature-map sets and all feature maps in YHR,2. For each input its output produces 64 feature maps of the same size; the fourth convolutional layer in the second residual block processes the three resulting sets in the same way. Each of the three output sets of the fourth convolutional layer is added element-wise to the corresponding input of the second residual block to form the three outputs of the second residual block; for the YHR,2 input, the resulting set is YHR,3.
The input of the third convolutional layer in the third residual block receives all feature maps in FEn,3, and its output produces 64 feature maps of the same width and height; the input of the fourth convolutional layer in the third residual block receives those feature maps, and its output produces 64 feature maps of the same size. All feature maps in FEn,3 are added element-wise to the output of the fourth convolutional layer; the resulting feature maps are the output of the third residual block, and their set is FDec,1.
The input of the third convolutional layer in the fourth residual block receives all feature maps in FDec,1, and its output produces 64 feature maps of the same width and height; the input of the fourth convolutional layer in the fourth residual block receives those feature maps, and its output produces 64 feature maps of the same size. All feature maps in FDec,1 are added element-wise to the output of the fourth convolutional layer; the resulting feature maps are the output of the fourth residual block, and their set is FDec,2.
In the above, the convolution kernels of the third and fourth convolutional layers in each of the first, second, third, and fourth residual blocks are all 3×3 with stride 1, 64 input channels, and 64 output channels; in each block, the third convolutional layer uses the ReLU activation function and the fourth convolutional layer uses no activation function.
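A sketch of this residual block in PyTorch (names ours). Applying the same two convolutions to each parallel input stream is what gives the first and second residual blocks their weight sharing across the three streams:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """3x3 conv + ReLU -> 3x3 conv (no activation) -> skip connection."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels, 3, 1, 1)
        self.conv4 = nn.Conv2d(channels, channels, 3, 1, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.conv4(torch.relu(self.conv3(x)))

# For the first and second residual blocks, the three parallel input streams
# reuse the same block instance, i.e. the weights are shared:
# block = ResidualBlock()
# out_a, out_b, y_hr2 = block(in_a), block(in_b), block(y_hr1)
```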
In this embodiment, in step 2, as shown in Figures 3a, 3b, 3c, and 3d, the first, second, and third enhanced residual blocks have the same structure: each consists of a first spatial feature transformation layer, a first spatial-angular convolutional layer, a second spatial feature transformation layer, a second spatial-angular convolutional layer, and a channel attention layer connected in sequence. The first and second spatial feature transformation layers have the same structure, each composed of a tenth convolutional layer and an eleventh convolutional layer in parallel. The first and second spatial-angular convolutional layers have the same structure, each composed of a twelfth convolutional layer and a thirteenth convolutional layer connected in sequence. The channel attention layer consists of a global mean pooling layer, a fourteenth convolutional layer, and a fifteenth convolutional layer connected in sequence.
The input of the tenth convolutional layer in the first spatial feature transformation layer of the first enhanced residual block receives all feature maps in FAlign,1, and its output produces 64 feature maps of the same width and height; likewise, the input of the eleventh convolutional layer receives all feature maps in FAlign,1, and its output produces 64 feature maps of the same size. The input of the first spatial feature transformation layer receives all feature maps in FLR; all feature maps in FLR are multiplied element-wise with the output of the tenth convolutional layer, and the product is added element-wise to the output of the eleventh convolutional layer. The resulting feature maps are the output of the first spatial feature transformation layer of the first enhanced residual block.
The input of the twelfth convolutional layer in the first spatial-angular convolutional layer of the first enhanced residual block receives the output of the first spatial feature transformation layer, and its output produces 64 feature maps of the same width and height. These feature maps are reorganized from the spatial dimension to the angular dimension (reorganization is a conventional processing step for light field images: it only changes the arrangement order of the feature values within each feature map, not the values themselves). The input of the thirteenth convolutional layer receives the reorganized feature maps, and its output produces 64 feature maps of the same size; these are reorganized back from the angular dimension to the spatial dimension, and the resulting feature maps are the output of the first spatial-angular convolutional layer of the first enhanced residual block.
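The reorganization between the sub-aperture (spatial) layout and a macro-pixel (angular) layout can be expressed as pure reshaping, as sketched below. The exact macro-pixel ordering is our assumption, since the text only states that the operation permutes values without changing them; U and V denote the angular resolution (5×5 here):

```python
import torch
import torch.nn as nn

def reorganize(x: torch.Tensor, U: int, V: int, to_angular: bool) -> torch.Tensor:
    """Permute a (B, C, H*U, W*V) sub-aperture array into a macro-pixel
    layout and back; values are only reordered, never changed."""
    b, c, rows, cols = x.shape
    H, W = rows // U, cols // V
    if to_angular:
        x = x.view(b, c, U, H, V, W)   # sub-aperture layout: view-major rows/cols
    else:
        x = x.view(b, c, H, U, W, V)   # macro-pixel layout: pixel-major rows/cols
    x = x.permute(0, 1, 3, 2, 5, 4).contiguous()
    return x.view(b, c, rows, cols)

class SpatialAngularConv(nn.Module):
    """3x3 conv in the spatial layout, reorganize, 3x3 conv in the angular
    layout, reorganize back (both convs use ReLU, as specified)."""
    def __init__(self, channels: int = 64, U: int = 5, V: int = 5):
        super().__init__()
        self.U, self.V = U, V
        self.conv12 = nn.Conv2d(channels, channels, 3, 1, 1)
        self.conv13 = nn.Conv2d(channels, channels, 3, 1, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = torch.relu(self.conv12(x))                       # spatial convolution
        x = reorganize(x, self.U, self.V, to_angular=True)
        x = torch.relu(self.conv13(x))                       # angular convolution
        return reorganize(x, self.U, self.V, to_angular=False)
```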
The second spatial feature transformation layer of the first enhanced residual block again derives its modulation maps from FAlign,1: its tenth convolutional layer receives all feature maps in FAlign,1 and outputs 64 feature maps of the same width and height, and its eleventh convolutional layer does likewise. The layer's input receives the output of the first spatial-angular convolutional layer; that output is multiplied element-wise with the tenth layer's output, and the product is added element-wise to the eleventh layer's output, forming the output of the second spatial feature transformation layer of the first enhanced residual block.
The second spatial-angular convolutional layer of the first enhanced residual block processes that output in the same way as the first: its twelfth convolutional layer outputs 64 feature maps of the same size, these are reorganized from the spatial dimension to the angular dimension, its thirteenth convolutional layer outputs 64 feature maps of the same size, and the result is reorganized back from the angular dimension to the spatial dimension to form the output of the second spatial-angular convolutional layer.
The input of the global mean pooling layer in the channel attention layer of the first enhanced residual block receives the output of the second spatial-angular convolutional layer, and its output produces 64 feature maps of the same width and height; the set of these maps is denoted FGAP,1, and all values within each feature map in FGAP,1 are identical (the global mean pooling layer independently computes the global mean of each received feature map, converting each map into a single value, and then replicates that value to restore the spatial size). The input of the fourteenth convolutional layer receives all feature maps in FGAP,1, and its output produces 4 feature maps of the same size; the set is denoted FDS,1. The input of the fifteenth convolutional layer receives all feature maps in FDS,1, and its output produces 64 feature maps of the same size; the set is denoted FUS,1. All feature maps in FUS,1 are multiplied element-wise with the input of the channel attention layer; the resulting feature maps are the output of the channel attention layer of the first enhanced residual block, and their set is denoted FCA,1.
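This channel attention layer follows the familiar squeeze-and-excitation pattern; a minimal PyTorch sketch (names ours) is given below. Pooling to a 1×1 map and letting the multiplication broadcast is equivalent to the replicate-to-restore-size description above:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Global mean pooling -> 1x1 conv 64->4 (ReLU) -> 1x1 conv 4->64
    (Sigmoid) -> channel-wise reweighting of the layer input."""
    def __init__(self, channels: int = 64, reduced: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)           # global mean per channel
        self.conv14 = nn.Conv2d(channels, reduced, 1) # F_DS
        self.conv15 = nn.Conv2d(reduced, channels, 1) # F_US

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.pool(x)                              # F_GAP: one value per channel
        w = torch.sigmoid(self.conv15(torch.relu(self.conv14(w))))
        return x * w                                  # broadcast multiply = F_CA
```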
All feature maps in FCA,1 are added element-wise to all feature maps in FLR; the resulting feature maps are the output of the first enhanced residual block, and their set is FEn,1.
The input of the tenth convolutional layer in the first spatial feature transformation layer of the second enhanced residual block receives all feature maps in FAlign,2, and its output produces 64 feature maps of the same width and height; likewise, the eleventh convolutional layer receives all feature maps in FAlign,2 and outputs 64 feature maps of the same size. The first spatial feature transformation layer receives all feature maps in FEn,1; all feature maps in FEn,1 are multiplied element-wise with the output of the tenth convolutional layer, and the product is added element-wise to the output of the eleventh convolutional layer, forming the output of the first spatial feature transformation layer of the second enhanced residual block.

The first spatial-angular convolutional layer of the second enhanced residual block processes that output exactly as in the first enhanced residual block: its twelfth convolutional layer outputs 64 feature maps of the same size, these are reorganized from the spatial dimension to the angular dimension, its thirteenth convolutional layer outputs 64 feature maps of the same size, and the result is reorganized back from the angular dimension to the spatial dimension to form the output of the first spatial-angular convolutional layer.

The second spatial feature transformation layer of the second enhanced residual block again derives its modulation maps from FAlign,2 through its tenth and eleventh convolutional layers (64 feature maps each, of the same size) and applies them to the output of the first spatial-angular convolutional layer: that output is multiplied element-wise with the tenth layer's output, and the product is added element-wise to the eleventh layer's output, forming the output of the second spatial feature transformation layer.

The second spatial-angular convolutional layer of the second enhanced residual block processes that output in the same way: a twelfth convolutional layer (64 output feature maps of the same size), reorganization from the spatial dimension to the angular dimension, a thirteenth convolutional layer (64 output feature maps of the same size), and reorganization back from the angular dimension to the spatial dimension.

The channel attention layer of the second enhanced residual block receives the output of the second spatial-angular convolutional layer. Its global mean pooling layer outputs 64 feature maps of the same size (set FGAP,2, with all values within each map identical), its fourteenth convolutional layer outputs 4 feature maps of the same size (set FDS,2), and its fifteenth convolutional layer outputs 64 feature maps of the same size (set FUS,2). All feature maps in FUS,2 are multiplied element-wise with the input of the channel attention layer; the resulting feature maps form the set FCA,2, the output of the channel attention layer.

All feature maps in FCA,2 are added element-wise to all feature maps in FEn,1; the resulting feature maps are the output of the second enhanced residual block, and their set is FEn,2.
The third enhanced residual block operates in the same manner. The tenth and eleventh convolutional layers in its first spatial feature transformation layer each receive all feature maps in FAlign,3 and output 64 feature maps of the same width and height. The first spatial feature transformation layer receives all feature maps in FEn,2; all feature maps in FEn,2 are multiplied element-wise with the output of the tenth convolutional layer, and the product is added element-wise to the output of the eleventh convolutional layer, forming the output of the first spatial feature transformation layer of the third enhanced residual block.

The first spatial-angular convolutional layer of the third enhanced residual block processes that output as before: its twelfth convolutional layer outputs 64 feature maps of the same size, these are reorganized from the spatial dimension to the angular dimension, its thirteenth convolutional layer outputs 64 feature maps of the same size, and the result is reorganized back from the angular dimension to the spatial dimension to form the output of the first spatial-angular convolutional layer.

The second spatial feature transformation layer of the third enhanced residual block again derives its modulation maps from FAlign,3 through its tenth and eleventh convolutional layers (64 feature maps each, of the same size) and applies them to the output of the first spatial-angular convolutional layer: that output is multiplied element-wise with the tenth layer's output, and the product is added element-wise to the eleventh layer's output, forming the output of the second spatial feature transformation layer.

The second spatial-angular convolutional layer of the third enhanced residual block processes that output in the same way: a twelfth convolutional layer (64 output feature maps of the same size), reorganization from the spatial dimension to the angular dimension, a thirteenth convolutional layer (64 output feature maps of the same size), and reorganization back from the angular dimension to the spatial dimension.

The channel attention layer of the third enhanced residual block receives the output of the second spatial-angular convolutional layer. Its global mean pooling layer outputs 64 feature maps of the same size (set FGAP,3, with all values within each map identical), its fourteenth convolutional layer outputs 4 feature maps of the same size (set FDS,3), and its fifteenth convolutional layer outputs 64 feature maps of the same size (set FUS,3). All feature maps in FUS,3 are multiplied element-wise with the input of the channel attention layer; the resulting feature maps form the set FCA,3, the output of the channel attention layer.

All feature maps in FCA,3 are added element-wise to all feature maps in FEn,2; the resulting feature maps are the output of the third enhanced residual block, and their set is FEn,3.
In the above, the convolution kernels of the tenth and eleventh convolutional layers in each of the first, second, and third enhanced residual blocks are all 3×3 with stride 1, 64 input channels, and 64 output channels, and use no activation function. The kernels of the twelfth and thirteenth convolutional layers are all 3×3 with stride 1, 64 input channels, and 64 output channels, and use the ReLU activation function. The kernel of the fourteenth convolutional layer is 1×1 with stride 1, 64 input channels, and 4 output channels, and uses the ReLU activation function; the kernel of the fifteenth convolutional layer is 1×1 with stride 1, 4 input channels, and 64 output channels, and uses the Sigmoid activation function.
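Putting the pieces together, an enhanced residual block can be sketched as below, reusing the `SpatialAngularConv` and `ChannelAttention` sketches given earlier. Reading "multiply element-wise, then add element-wise" as the modulation x·γ + β is a standard interpretation of a spatial feature transformation layer, and all names are ours:

```python
import torch
import torch.nn as nn

class SFTLayer(nn.Module):
    """Spatial feature transformation: two parallel 3x3 convs (no activation)
    predict a scale and a shift from the aligned guide features F_Align."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv10 = nn.Conv2d(channels, channels, 3, 1, 1)  # scale branch
        self.conv11 = nn.Conv2d(channels, channels, 3, 1, 1)  # shift branch

    def forward(self, x: torch.Tensor, guide: torch.Tensor) -> torch.Tensor:
        return x * self.conv10(guide) + self.conv11(guide)

class EnhancedResidualBlock(nn.Module):
    """SFT -> spatial-angular conv -> SFT -> spatial-angular conv ->
    channel attention -> skip connection from the block input."""
    def __init__(self, channels: int = 64, U: int = 5, V: int = 5):
        super().__init__()
        self.sft1 = SFTLayer(channels)
        self.sft2 = SFTLayer(channels)
        self.sac1 = SpatialAngularConv(channels, U, V)
        self.sac2 = SpatialAngularConv(channels, U, V)
        self.ca = ChannelAttention(channels)

    def forward(self, x: torch.Tensor, f_align: torch.Tensor) -> torch.Tensor:
        y = self.sac1(self.sft1(x, f_align))
        y = self.sac2(self.sft2(y, f_align))
        return x + self.ca(y)   # e.g. F_En,1 = F_CA,1 + F_LR
```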
To further demonstrate the feasibility and effectiveness of the method of the present invention, experiments were carried out on the method.
The method of the present invention is implemented with the PyTorch deep learning framework. The light field images used for training and testing come from existing light field image databases covering both real-world and synthetic scenes; these databases can be freely downloaded online. To ensure the reliability and robustness of the tests, 200 light field images were randomly selected to form the training image set and another 70 light field images to form the test image set, with no overlap between the two sets. The basic information of the light field image databases used for the training and test image sets is shown in Table 1. The EPFL [1], INRIA [2], STFLytro [6], and Kalantari et al. [7] databases were captured with a Lytro light field camera, so their light field images constitute narrow-baseline light field data. The STFGantry [5] database was captured by moving a conventional camera mounted on a gantry, so its light field images have a larger baseline range and constitute wide-baseline light field data. The light field images in the HCI new [3] and HCI old [4] databases are synthetic and also constitute wide-baseline light field data.
Table 1. Basic information of the light field image databases used for the training and test image sets
The reference information (or download URLs) for the light field image databases used in the training and test image sets is as follows:
[1] Rerabek M, Ebrahimi T. New Light Field Image Dataset[C]//2016 Eighth International Conference on Quality of Multimedia Experience (QoMEX), 2016.
[2] Pendu M L, Jiang X, Guillemot C. Light Field Inpainting Propagation via Low Rank Matrix Completion[J]. IEEE Transactions on Image Processing, 2018, 27(4): 1981-1993.
[3] Honauer K, Johannsen O, Kondermann D, et al. A Dataset and Evaluation Methodology for Depth Estimation on 4D Light Fields[C]//Asian Conference on Computer Vision, 2016.
[4] Wanner S, Meister S, Goldluecke B. Datasets and Benchmarks for Densely Sampled 4D Light Fields[C]//International Symposium on Vision, Modeling and Visualization, 2013.
[5] Vaish V, Adams A. The (New) Stanford Light Field Archive, Computer Graphics Laboratory, Stanford University, 2008.
[6] Raj A S, Lowney M, Shah R, Wetzstein G. Stanford Lytro Light Field Archive. Available: http://lightfields.stanford.edu/index.html.
[7] Kalantari N K, Wang T C, Ramamoorthi R. Learning-Based View Synthesis for Light Field Cameras[J]. ACM Transactions on Graphics, 2016, 35(6): 1-10.
The light field images in the training and test image sets are each reorganized into sub-aperture image arrays. Because light field cameras exhibit a vignetting effect (manifesting as low visual quality in the boundary sub-aperture images), the angular resolution of the light field images used for training and testing is cropped to 9×9, i.e., only the high-quality central 9×9 views are kept. The central 5×5 views are then taken from each 9×9 light field image to form a light field image with an angular resolution of 5×5, and bicubic interpolation is used to downsample its spatial resolution by a scale of 8, i.e., the spatial resolution of the light field image is reduced to 1/8 of that of the original, yielding the low-spatial-resolution light field image. The original light field image with an angular resolution of 5×5 serves as the reference high-spatial-resolution light field image (i.e., the label image). One sub-aperture image is then selected from the initial 9×9 views (excluding the central 5×5 views) and kept at its original resolution as the 2D high-resolution image. The final training set therefore comprises the sub-aperture image arrays reorganized from the Y-channel images of 200 low-spatial-resolution light field images with an angular resolution of 5×5, the Y-channel images of the corresponding 200 2D high-resolution images, and the Y-channel images of the corresponding 200 reference high-spatial-resolution light field images. The final test set comprises the sub-aperture image arrays reorganized from the Y-channel images of 70 low-spatial-resolution light field images with an angular resolution of 5×5, the Y-channel images of the corresponding 70 2D high-resolution images, and the corresponding 70 reference high-spatial-resolution light field images; the 70 reference images are not involved in network inference or testing and are used only for subsequent subjective visual comparison and objective quality evaluation.
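A sketch of this data preparation, assuming the cropped views are stacked in a (9, 9, H, W) tensor; which boundary view serves as the 2D high-resolution image is not fixed above, so a corner view is used purely for illustration:

```python
import torch
import torch.nn.functional as F

def prepare_sample(views: torch.Tensor, scale: int = 8):
    """views: (9, 9, H, W) central crop of Y-channel sub-aperture views.
    Returns the 5x5 low-resolution light field, the 5x5 reference label,
    and one full-resolution boundary view as the 2D high-resolution image."""
    ref = views[2:7, 2:7]                                    # central 5x5 views
    H, W = ref.shape[-2:]
    lr = F.interpolate(ref.reshape(25, 1, H, W), scale_factor=1 / scale,
                       mode='bicubic', align_corners=False)  # 1/8 spatial size
    lr = lr.reshape(5, 5, H // scale, W // scale)
    guide = views[0, 0]                                      # a boundary view
    return lr, ref, guide
```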
When training the constructed spatial super-resolution network, the parameters of all convolution kernels are initialized with the MSRA initializer; the loss function is a combination of a pixel-domain L1-norm loss and a gradient loss; and the network is trained with the ADAM optimizer. The encoder and decoder of the spatial super-resolution network are first trained with a learning rate of 10^-4 until they reach a certain degree of convergence, after which the learning rate is set to 10^-4 and the entire spatial super-resolution network is trained; after 25 epochs of training, the learning rate is decayed by a scaling factor of 0.5.
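The following is a hedged sketch of this training configuration in PyTorch. The network module `net`, the gradient-loss weight `lam`, and the helper names are assumptions for illustration; the patent specifies only the loss composition, the initializer, the optimizer, and the learning-rate schedule.

```python
# Sketch of the loss, initialization, and optimizer setup described above.
import torch
import torch.nn as nn
import torch.nn.functional as F

def gradient_loss(pred, target):
    # L1 distance between horizontal and vertical image gradients
    dx_p, dx_t = pred[..., :, 1:] - pred[..., :, :-1], target[..., :, 1:] - target[..., :, :-1]
    dy_p, dy_t = pred[..., 1:, :] - pred[..., :-1, :], target[..., 1:, :] - target[..., :-1, :]
    return F.l1_loss(dx_p, dx_t) + F.l1_loss(dy_p, dy_t)

def total_loss(pred, target, lam=0.1):        # lam: assumed weighting, not given in the patent
    return F.l1_loss(pred, target) + lam * gradient_loss(pred, target)

def init_msra(m):                             # MSRA (Kaiming) initialization for conv kernels
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight)

# net.apply(init_msra)
# optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)
# scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=25, gamma=0.5)
```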
To demonstrate the performance of the method of the present invention, it is compared with the existing bicubic interpolation method and six existing image super-resolution reconstruction methods: the deep back-projection network method proposed by Haris et al., the deep Laplacian pyramid network method proposed by Lai et al., the spatial-angular separable convolution method proposed by Yeung et al., the spatial-angular interaction network method proposed by Wang et al., the two-stage network method proposed by Jin et al., and the hybrid-input method proposed by Boominathan et al. Among these, the methods of Haris et al. and Lai et al. are 2D image super-resolution reconstruction methods (applied independently to each sub-aperture image of the light field image); the methods of Yeung et al., Wang et al., and Jin et al. are conventional light field image spatial super-resolution reconstruction methods; and the method of Boominathan et al. is a light field image spatial super-resolution reconstruction method using hybrid input.
The objective quality metrics used here are PSNR (Peak Signal-to-Noise Ratio), SSIM (Structural Similarity Index), and an advanced objective quality metric for light field images (see Min X, Zhou J, Zhai G, et al. A Metric for Light Field Reconstruction, Compression, and Display Quality Evaluation [J]. IEEE Transactions on Image Processing, 2020, 29: 3790-3804). PSNR evaluates the objective quality of the super-resolution reconstructed image in terms of pixel reconstruction error, with higher values indicating better image quality. SSIM evaluates objective quality from the perspective of visual perception; its value lies between 0 and 1, with higher values indicating better image quality. The light field image objective quality metric jointly measures the spatial quality (texture and details) and the angular quality (parallax structure) of a light field image to effectively evaluate the objective quality of the super-resolution reconstruction, again with higher values indicating better image quality.
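For concreteness, the following is a minimal sketch of how PSNR and SSIM might be averaged over all sub-aperture images of a reconstructed light field; the (5, 5, H, W) array layout and the use of scikit-image are assumptions for illustration, and the light-field-specific metric of Min et al. is not reproduced here.

```python
# Sketch: mean PSNR/SSIM over the 5x5 sub-aperture views (Y channel in [0, 1]).
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def lf_psnr_ssim(rec, ref):
    """Average per-view PSNR and SSIM between reconstructed and reference light fields."""
    psnrs, ssims = [], []
    for u in range(rec.shape[0]):
        for v in range(rec.shape[1]):
            psnrs.append(peak_signal_noise_ratio(ref[u, v], rec[u, v], data_range=1.0))
            ssims.append(structural_similarity(ref[u, v], rec[u, v], data_range=1.0))
    return float(np.mean(psnrs)), float(np.mean(ssims))
```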
Table 2 compares the method of the present invention with the existing bicubic interpolation method and the existing light field image spatial super-resolution reconstruction methods on the PSNR (dB) metric; Table 3 gives the corresponding comparison on the SSIM metric; and Table 4 gives the corresponding comparison on the light field image objective quality metric. The objective data listed in Tables 2, 3, and 4 show that, compared with the existing light field image spatial super-resolution reconstruction methods (including the 2D image super-resolution reconstruction methods), the method of the present invention achieves higher quality scores on all three objective quality metrics, clearly exceeding all comparison methods. This indicates that the method of the present invention can effectively reconstruct the texture and detail information of light field images while recovering a good parallax structure. In particular, the method of the present invention achieves the best super-resolution reconstruction results on light field image databases with different baseline ranges and scene contents, which shows that it handles both narrow-baseline and wide-baseline light field data well and is robust to scene content.
Table 2: Comparison of the method of the present invention with the existing bicubic interpolation method and the existing light field image spatial super-resolution reconstruction methods on the PSNR (dB) metric
Table 3: Comparison of the method of the present invention with the existing bicubic interpolation method and the existing light field image spatial super-resolution reconstruction methods on the SSIM metric
Table 4: Comparison of the method of the present invention with the existing bicubic interpolation method and the existing light field image spatial super-resolution reconstruction methods on the light field image objective quality metric
Figs. 5a to 5h show the reconstructed high-spatial-resolution light field images obtained by processing a low-spatial-resolution light field image from the tested EPFL light field image database with, respectively, the bicubic interpolation method (Fig. 5a), the method of Haris et al. (Fig. 5b), the method of Lai et al. (Fig. 5c), the method of Yeung et al. (Fig. 5d), the method of Wang et al. (Fig. 5e), the method of Jin et al. (Fig. 5f), the method of Boominathan et al. (Fig. 5g), and the method of the present invention (Fig. 5h). Fig. 5i shows the label high-spatial-resolution light field image corresponding to that low-spatial-resolution light field image. In each case the sub-aperture image at the central coordinates is shown.
Likewise, Figs. 6a to 6h show the reconstructed high-spatial-resolution light field images obtained by processing a low-spatial-resolution light field image from the tested STFLytro light field image database with, respectively, the bicubic interpolation method (Fig. 6a), the method of Haris et al. (Fig. 6b), the method of Lai et al. (Fig. 6c), the method of Yeung et al. (Fig. 6d), the method of Wang et al. (Fig. 6e), the method of Jin et al. (Fig. 6f), the method of Boominathan et al. (Fig. 6g), and the method of the present invention (Fig. 6h). Fig. 6i shows the label high-spatial-resolution light field image corresponding to that low-spatial-resolution light field image. In each case the sub-aperture image at the central coordinates is shown.
Comparing Figs. 5a to 5h with Fig. 5i, and Figs. 6a to 6h with Fig. 6i, it can be clearly seen that the existing light field image spatial super-resolution reconstruction methods, including the 2D image super-resolution reconstruction methods, fail to recover the texture and detail information of the image in the reconstructed high-spatial-resolution light field images, as shown in the enlarged lower-left rectangular regions of Figs. 5a to 5f and the enlarged lower-right rectangular regions of Figs. 6a to 6f. The hybrid-input light field image spatial super-resolution reconstruction method achieves comparatively better results but still exhibits some blurring artifacts, as shown in the enlarged lower-left rectangular region of Fig. 5g and the enlarged lower-right rectangular region of Fig. 6g. In contrast, the high-spatial-resolution light field images reconstructed by the method of the present invention have clear textures and rich details and are close to the label high-spatial-resolution light field images (Figs. 5i and 6i) in subjective visual perception, which indicates that the method of the present invention can effectively recover the texture information of light field images. Moreover, by reconstructing every sub-aperture image with high quality, the method of the present invention well preserves the parallax structure of the final reconstructed high-spatial-resolution light field image.
The innovations of the method of the present invention are mainly as follows. First, heterogeneous imaging is used to acquire rich 2D spatial information while capturing high-dimensional light field data, i.e., a light field image and a 2D high-resolution image are captured simultaneously, and the information of the 2D high-resolution image is exploited to effectively improve the spatial resolution of the light field image and to recover the corresponding texture and details. Second, to establish and explore the relationship between the light field image and the 2D high-resolution image, the method constructs an aperture-level feature registration module and a light field feature enhancement module: the former accurately registers the 2D high-resolution information with the 4D light field image information, while the latter uses the registered high-resolution feature information to consistently enhance the visual information in the light field features, yielding enhanced high-resolution light field features. Third, a flexible pyramid reconstruction scheme is adopted, i.e., a coarse-to-fine reconstruction strategy progressively increases the spatial resolution of the light field image and recovers an accurate parallax structure, so that multi-scale super-resolution results can be reconstructed in a single forward inference. In addition, to reduce the parameter count and training burden of the pyramid network, weights are shared at each pyramid level.
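To make the third point concrete, the following is a hedged sketch of the coarse-to-fine pyramid idea with weight sharing: one refinement block is reused at every pyramid level, so a single forward pass yields super-resolved results at all intermediate scales. The module structure, channel width, and the choice of three ×2 levels (matching the ×8 overall factor) are illustrative assumptions, not the patent's exact architecture.

```python
# Sketch of a coarse-to-fine pyramid with a refinement block shared across levels.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedPyramidSR(nn.Module):
    def __init__(self, channels=32, levels=3):       # 3 levels: x2, x4, x8
        super().__init__()
        self.levels = levels
        self.head = nn.Conv2d(1, channels, 3, padding=1)
        self.refine = nn.Sequential(                  # shared across all pyramid levels
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.tail = nn.Conv2d(channels, 1, 3, padding=1)

    def forward(self, y_lr):
        feat = self.head(y_lr)
        outputs = []
        for _ in range(self.levels):
            feat = F.interpolate(feat, scale_factor=2,
                                 mode='bilinear', align_corners=False)
            feat = feat + self.refine(feat)           # residual refinement, weights shared
            outputs.append(self.tail(feat))           # reconstruction at this scale
        return outputs                                # [x2, x4, x8] results in one pass
```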