CN114359041A - Light field image space super-resolution reconstruction method
Abstract
The invention discloses a light field image spatial super-resolution reconstruction method that constructs a spatial super-resolution network comprising an encoder, an aperture-level feature registration module, a light field feature enhancement module and a decoder, among other components. The encoder extracts multi-scale features from an up-sampled low-spatial-resolution light field image, a 2D high-resolution image and its blurred version; the aperture-level feature registration module learns the correspondence between the 2D high-resolution features and the low-resolution light field features so as to register the 2D high-resolution features to each sub-aperture image and form registered high-resolution light field features; the light field feature enhancement module uses the registered high-resolution light field features to enhance the extracted shallow light field features, yielding enhanced high-resolution light field features; and the decoder reconstructs the enhanced high-resolution light field features into a high-spatial-resolution light field image. The method has the advantages that a high-spatial-resolution light field image can be reconstructed with high quality and that texture and detail information can be recovered.
Description
Technical Field
The invention relates to an image super-resolution reconstruction technology, in particular to a light field image space super-resolution reconstruction method.
Background
Unlike conventional digital cameras, light field cameras can capture both the intensity (i.e., spatial information) and the direction (i.e., angular information) of light rays in a scene, thereby recording the real world more faithfully. The rich information contained in the 4-Dimensional (4D) light field images acquired by light field cameras facilitates many applications, such as refocusing, depth estimation, and virtual/augmented reality. Current commercial light field cameras employ microlens arrays to separate light rays in different directions that pass through the same scene point, and then simultaneously record spatial and angular information on the sensor plane. However, because the sensor resolution shared by the spatial and angular dimensions is limited, the spatial resolution of the acquired 4D light field image is inevitably reduced when high angular sampling (i.e., high angular resolution) is provided. Improving the spatial resolution of 4D light field images has therefore become an important problem in light field research.
In general, a 4D light field image has several interconvertible visual representations, such as the Sub-Aperture Image (SAI) array based on 2-Dimensional (2D) spatial information, the Micro-Lens Image (MLI) array based on 2D angular information, and the Epipolar Plane Image (EPI), which combines 1D spatial and 1D angular information. Intuitively, increasing the spatial resolution of a 4D light field image amounts to increasing the resolution of each 2D SAI in the 4D light field image. It is therefore straightforward to apply existing super-resolution reconstruction methods for 2D images, such as the deep back-projection network proposed by Haris et al. or the deep Laplacian pyramid network proposed by Lai et al., to each SAI independently; however, this approach ignores the information embedded in the angular domain of the 4D light field image, and it is difficult to guarantee the angular consistency of the super-resolution result. The key to designing a 4D light field image spatial super-resolution reconstruction method is therefore to exploit the high-dimensional structural characteristics of the 4D light field image. Current spatial super-resolution reconstruction methods for 4D light field images can be broadly classified into two categories: optimization-based and learning-based.
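For readers implementing these representations, the following is a minimal sketch (not part of the patent) of converting a stack of sub-aperture images into the tiled SAI-array layout used in the description below and into a macro-pixel (micro-lens image) layout; the exact tiling conventions chosen here are assumptions for illustration.

```python
import torch

def tile_sai_array(lf: torch.Tensor) -> torch.Tensor:
    """[V, U, H, W] sub-aperture images -> one [H*U, W*V] image tiling the SAIs
    (U tiles vertically, V tiles horizontally), as used for the network inputs below."""
    V, U, H, W = lf.shape
    return lf.permute(1, 2, 0, 3).reshape(U * H, V * W)

def to_macropixel(lf: torch.Tensor) -> torch.Tensor:
    """[V, U, H, W] sub-aperture images -> [H*V, W*U] micro-lens image, where each
    spatial position (h, w) holds its V x U block of angular samples."""
    V, U, H, W = lf.shape
    return lf.permute(2, 0, 3, 1).reshape(H * V, W * U)

lf = torch.rand(5, 5, 32, 48)   # V = U = 5 angular samples, 32 x 48 spatial samples
sai_array, mli = tile_sai_array(lf), to_macropixel(lf)
```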
Optimization-based methods typically utilize estimated disparity or depth information to model the relationship between SAIs of 4D light-field images, thereby representing 4D light-field image spatial super-resolution reconstruction as an optimization problem. However, the disparity or depth information inferred from low spatial resolution light-field images is not very reliable and hence optimization-based methods exhibit rather limited performance.
Learning-based methods explore the intrinsic high-dimensional structure of the 4D light field image in a data-driven manner and thus learn a non-linear mapping between low-spatial-resolution and high-spatial-resolution light field images. For example, Yeung et al. iteratively exploit the spatial and angular information of the 4D light field image using spatial-angular separable convolutions. Wang et al. developed a spatial-angular interaction network to fuse the spatial and angular information of 4D light field images. Jin et al. propose a novel fusion mechanism to exploit the complementary information between SAIs and recover the parallax details of 4D light field images through a two-stage network. Although these methods achieve good performance at small reconstruction scales (e.g., 2x), they still fail to recover sufficient texture and detail information at large reconstruction scales (e.g., 8x). This is because low-resolution light field images contain limited spatial and angular information, so the details lost at low resolution can only be inferred from information within the 4D light field image itself. Boominathan et al. propose a spatial super-resolution reconstruction method using hybrid-input light field images, which improves the spatial resolution of the 4D light field image by introducing an additional high-resolution 2D image as supplementary information; however, the averaging-based fusion mechanism in that method easily blurs the reconstruction result, and processing each SAI independently destroys the parallax structure of the reconstructed light field image.
In summary, although existing research has achieved good spatial super-resolution reconstruction of light field images at small reconstruction scales, it still falls short at large reconstruction scales (such as 8x); in particular, there remains room for improvement in recovering high-frequency texture information in the reconstructed light field images, avoiding visual artifacts, and preserving the parallax structure.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a light field image spatial super-resolution reconstruction method that combines a light field camera and a conventional 2D camera into a heterogeneous imaging system: the light field camera provides rich angular information but limited spatial information, while the conventional 2D camera records only the intensity of light and thus provides sufficient spatial information. By fully exploiting the angular and spatial information acquired by the two cameras, the method can reconstruct a high-spatial-resolution light field image with high quality, recover the texture and detail information of the reconstructed light field image, avoid ghosting artifacts caused by parallax, and preserve the parallax structure.
The technical scheme adopted by the invention for solving the technical problems is as follows: a light field image space super-resolution reconstruction method is characterized by comprising the following steps:
Step 1: select Num low-spatial-resolution light field images with three color channels, spatial resolution W×H and angular resolution V×U, the corresponding Num 2D high-resolution images with three color channels and resolution αW×αH, and the corresponding Num reference high-spatial-resolution light field images with three color channels, spatial resolution αW×αH and angular resolution V×U; where Num > 1, α denotes the spatial resolution improvement factor, and α > 1;
Step 2: construct a convolutional neural network as the spatial super-resolution network. The spatial super-resolution network comprises an encoder for extracting multi-scale features, an aperture-level feature registration module for registering the light field features with the 2D high-resolution features, a shallow feature extraction layer for extracting shallow features from the low-spatial-resolution light field image, a light field feature enhancement module for fusing the light field features with the 2D high-resolution features, a spatial attention block for mitigating registration errors in the coarse-scale features, and a decoder for reconstructing the latent features into a light field image;
For the encoder: it consists of a first convolutional layer, a second convolutional layer, a first residual block and a second residual block connected in sequence. The input end of the first convolutional layer receives three inputs in parallel: the sub-aperture image array of width α_sW×V and height α_sH×U obtained by recombining, after spatial-resolution up-sampling, the single-channel image L_LR of a low-spatial-resolution light field image with spatial resolution W×H and angular resolution V×U; the single-channel blurred 2D high-resolution image of width α_sW and height α_sH; and the single-channel 2D high-resolution image of width α_sW and height α_sH, denoted I_HR. For each of these three inputs the first convolutional layer outputs 64 feature maps of the same width and height as that input; the three sets of output feature maps are, respectively, the scale-0 light field features, the scale-0 blurred high-resolution features and the set denoted Y_HR,0. The input end of the second convolutional layer receives these three sets in parallel and, since its convolution stride is 2, outputs for each of them 64 feature maps of half the width and half the height; the three output sets are the scale-1 light field features, the scale-1 blurred high-resolution features and the set denoted Y_HR,1. The input end of the first residual block receives these three sets in parallel and outputs for each of them 64 feature maps of the same width and height; the three output sets are the scale-2 light field features, the scale-2 blurred high-resolution features and the set denoted Y_HR,2. The input end of the second residual block receives the three scale-2 sets in parallel and outputs for each of them 64 feature maps of the same width and height; the three output sets are the scale-3 light field features, the scale-3 blurred high-resolution features and the set denoted Y_HR,3. Here, the sub-aperture image array fed to the encoder is obtained by bicubic-interpolation up-sampling of the single-channel image L_LR, followed by recombination into an array of width α_sW×V and height α_sH×U; the blurred 2D high-resolution image is obtained by first bicubic-interpolation down-sampling and then bicubic-interpolation up-sampling of I_HR; α_s denotes the spatial-resolution sampling factor, with α = α_s³, and both the up-sampling factor of the bicubic-interpolation up-sampling and the down-sampling factor of the bicubic-interpolation down-sampling are taken as α_s. The convolution kernel size of the first convolutional layer is 3×3, its convolution stride is 1, its number of input channels is 1 and its number of output channels is 64; the convolution kernel size of the second convolutional layer is 3×3, its convolution stride is 2, its number of input channels is 64 and its number of output channels is 64; the activation function of both the first and the second convolutional layer is ReLU;
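As an illustration only, the following PyTorch sketch mirrors the encoder just described (3×3 stride-1 convolution, 3×3 stride-2 convolution, two residual blocks), applied with shared weights to each of the three single-channel inputs; the class names, padding choice and example input sizes are assumptions, not the patent's reference implementation.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Two 3x3 convolutions with a skip connection (ReLU after the first only)."""
    def __init__(self, ch=64):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, 1, 1)
        self.conv2 = nn.Conv2d(ch, ch, 3, 1, 1)
    def forward(self, x):
        return x + self.conv2(torch.relu(self.conv1(x)))

class Encoder(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.conv1 = nn.Conv2d(1, ch, 3, 1, 1)    # first convolutional layer, stride 1
        self.conv2 = nn.Conv2d(ch, ch, 3, 2, 1)   # second convolutional layer, stride 2
        self.res1, self.res2 = ResBlock(ch), ResBlock(ch)
    def forward(self, x):
        f0 = torch.relu(self.conv1(x))            # scale-0 features (input size)
        f1 = torch.relu(self.conv2(f0))           # scale-1 features (half size)
        f2 = self.res1(f1)                        # scale-2 features
        f3 = self.res2(f2)                        # scale-3 features
        return f0, f1, f2, f3

enc = Encoder()
lf_up   = torch.rand(1, 1, 160, 160)   # up-sampled SAI array (illustrative size)
hr_blur = torch.rand(1, 1, 32, 32)     # blurred 2D HR image (illustrative size)
hr      = torch.rand(1, 1, 32, 32)     # 2D HR image I_HR
y_lf, y_blur, y_hr = (enc(t) for t in (lf_up, hr_blur, hr))  # same weights for all three inputs
```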
For the aperture-level feature registration module: its input end receives three types of feature maps. The first type is the light field features extracted by the encoder at the coarse scale, the second type is the blurred high-resolution features at the same scale, and the third type comprises four inputs, namely all feature maps in Y_HR,0, all feature maps in Y_HR,1, all feature maps in Y_HR,2 and all feature maps in Y_HR,3. In the aperture-level feature registration module, first the blurred high-resolution features and all feature maps in Y_HR,0, Y_HR,1, Y_HR,2 and Y_HR,3 are each replicated by a factor of V×U, so that the widths and heights of the replicated blurred high-resolution features and of the replicated Y_HR,1, Y_HR,2 and Y_HR,3 match those of the corresponding-scale light field features, while the width and height of the replicated Y_HR,0 become α_sW×V and α_sH×U, matching those of the scale-0 light field features. Then block matching is performed between the coarse-scale light field features and the replicated blurred high-resolution features; after block matching, one coordinate index map is obtained, denoted P_CI. Then, according to P_CI, all feature maps in the replicated Y_HR,1 are registered in spatial position with the scale-1 light field features, yielding 64 registration feature maps; the set of all obtained registration feature maps is denoted F_Align,1. Likewise, according to P_CI, all feature maps in the replicated Y_HR,2 are registered in spatial position with the scale-2 light field features, yielding 64 registration feature maps whose set is denoted F_Align,2, and all feature maps in the replicated Y_HR,3 are registered in spatial position with the scale-3 light field features, yielding 64 registration feature maps whose set is denoted F_Align,3. Next, bicubic-interpolation up-sampling is applied to P_CI to obtain a coordinate index map of width α_sW×V and height α_sH×U; finally, according to this up-sampled coordinate index map, all feature maps in the replicated Y_HR,0 are registered in spatial position with the scale-0 light field features, yielding 64 registration feature maps of width α_sW×V and height α_sH×U, whose set is denoted F_Align,0. The output of the aperture-level feature registration module consists of all feature maps in F_Align,0, F_Align,1, F_Align,2 and F_Align,3. The similarity measure used for block matching is the texture and structure similarity index, the block size used for block matching is 3×3, and the up-sampling factor of the bicubic-interpolation up-sampling is α_s;
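The registration step can be pictured with the hedged sketch below: assuming a coordinate index map P_CI that stores, for every position of the light field features, the flattened index of its best-matching position in the replicated high-resolution features (found by 3×3 block matching under the texture-and-structure similarity measure, which is omitted here), the high-resolution feature maps are warped by gathering at those indices.

```python
import torch

def warp_by_index(feat_hr: torch.Tensor, p_ci: torch.Tensor) -> torch.Tensor:
    """feat_hr: [C, H, W] replicated high-resolution features; p_ci: [H, W] long tensor of
    flattened target indices into H*W. Gathers, for each output position, the HR feature
    at its matched position, producing one level of registered features F_Align."""
    C, H, W = feat_hr.shape
    idx = p_ci.reshape(1, -1).expand(C, -1)               # [C, H*W]
    aligned = torch.gather(feat_hr.reshape(C, -1), 1, idx)
    return aligned.reshape(C, H, W)

feat_hr = torch.rand(64, 40, 60)
p_ci = torch.randint(0, 40 * 60, (40, 60))                # stand-in for the block-matching result
f_align = warp_by_index(feat_hr, p_ci)
```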
For the shallow feature extraction layer: it consists of one fifth convolutional layer. The input end of the fifth convolutional layer receives the single-channel image L_LR of a low-spatial-resolution light field image with spatial resolution W×H and angular resolution V×U, and the output end of the fifth convolutional layer outputs 64 feature maps of width W×V and height H×U; the set of all output feature maps is denoted F_LR. The convolution kernel size of the fifth convolutional layer is 3×3, its convolution stride is 1, its number of input channels is 1 and its number of output channels is 64, and the activation function of the fifth convolutional layer is ReLU;
For the light field feature enhancement module: it consists of a first enhancement residual block, a second enhancement residual block and a third enhancement residual block connected in sequence. The input end of the first enhancement residual block receives all feature maps in F_Align,1 and all feature maps in F_LR, and its output end outputs 64 feature maps of the same width and height as those in F_LR; the set of all output feature maps is denoted F_En,1. The input end of the second enhancement residual block receives all feature maps in F_Align,2 and all feature maps in F_En,1, and its output end outputs 64 feature maps of the same width and height; the set of all output feature maps is denoted F_En,2. The input end of the third enhancement residual block receives all feature maps in F_Align,3 and all feature maps in F_En,2, and its output end outputs 64 feature maps of the same width and height; the set of all output feature maps is denoted F_En,3;
For the spatial attention block: it consists of a sixth convolutional layer and a seventh convolutional layer connected in sequence. The input end of the sixth convolutional layer receives all feature maps in F_Align,0, and its output end outputs 64 spatial attention feature maps of width α_sW×V and height α_sH×U; the set of all output spatial attention feature maps is denoted F_SA1. The input end of the seventh convolutional layer receives all feature maps in F_SA1, and its output end outputs 64 spatial attention feature maps of width α_sW×V and height α_sH×U; the set of all output spatial attention feature maps is denoted F_SA2. All feature maps in F_Align,0 and all spatial attention feature maps in F_SA2 are multiplied element by element, and the set of all resulting feature maps is denoted F_WA,0; F_WA,0 is taken as the set of all feature maps output by the output end of the spatial attention block. The convolution kernel sizes of the sixth and seventh convolutional layers are both 3×3, their convolution strides are both 1, their numbers of input channels are 64 and their numbers of output channels are 64; the activation function of the sixth convolutional layer is ReLU and that of the seventh convolutional layer is Sigmoid;
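A minimal sketch of this spatial attention block (assuming 3×3 padding-1 convolutions and element-wise gating of F_Align,0) could look as follows.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Sixth (ReLU) and seventh (Sigmoid) 3x3 convolutional layers; the sigmoid map gates
    F_Align,0 element by element to give F_WA,0."""
    def __init__(self, ch=64):
        super().__init__()
        self.conv6 = nn.Conv2d(ch, ch, 3, 1, 1)
        self.conv7 = nn.Conv2d(ch, ch, 3, 1, 1)
    def forward(self, f_align0):
        f_sa1 = torch.relu(self.conv6(f_align0))
        f_sa2 = torch.sigmoid(self.conv7(f_sa1))
        return f_align0 * f_sa2

f_wa0 = SpatialAttention()(torch.rand(1, 64, 80, 100))
```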
For the decoder: it consists of a third residual block, a fourth residual block, a sub-pixel convolutional layer, an eighth convolutional layer and a ninth convolutional layer connected in sequence. The input end of the third residual block receives all feature maps in F_En,3, and its output end outputs 64 feature maps of the same width and height; the set of all output feature maps is denoted F_Dec,1. The input end of the fourth residual block receives all feature maps in F_Dec,1, and its output end outputs 64 feature maps of the same width and height; the set of all output feature maps is denoted F_Dec,2. The input end of the sub-pixel convolutional layer receives all feature maps in F_Dec,2; its output end outputs 256 feature maps of the same width and height, and these 256 feature maps are further converted into 64 feature maps of width α_sW×V and height α_sH×U; the set of all converted feature maps is denoted F_Dec,3. The input end of the eighth convolutional layer receives the result of element-by-element addition of all feature maps in F_Dec,3 and all feature maps in F_WA,0, and its output end outputs 64 feature maps of width α_sW×V and height α_sH×U; the set of all output feature maps is denoted F_Dec,4. The input end of the ninth convolutional layer receives all feature maps in F_Dec,4, and its output end outputs one reconstructed single-channel light field image of width α_sW×V and height α_sH×U; this reconstructed single-channel light field image is recombined into a high-spatial-resolution single-channel light field image with spatial resolution α_sW×α_sH and angular resolution V×U, denoted L_SR. The convolution kernel size of the sub-pixel convolutional layer is 3×3, its convolution stride is 1, its number of input channels is 64 and its number of output channels is 256; the convolution kernel size of the eighth convolutional layer is 3×3, its convolution stride is 1, its number of input channels is 64 and its number of output channels is 64; the convolution kernel size of the ninth convolutional layer is 1×1, its convolution stride is 1, its number of input channels is 64 and its number of output channels is 1; the activation function of both the sub-pixel convolutional layer and the eighth convolutional layer is ReLU, and the ninth convolutional layer uses no activation function;
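The decoder can be sketched as below; the sub-pixel convolution is realized with nn.PixelShuffle, and the up-scaling factor 2 (i.e., α_s = 2, three cascaded 2x stages for an overall 8x) is an assumption for illustration.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.conv1, self.conv2 = nn.Conv2d(ch, ch, 3, 1, 1), nn.Conv2d(ch, ch, 3, 1, 1)
    def forward(self, x):
        return x + self.conv2(torch.relu(self.conv1(x)))

class Decoder(nn.Module):
    def __init__(self, ch=64, scale=2):
        super().__init__()
        self.res3, self.res4 = ResBlock(ch), ResBlock(ch)
        self.subpixel = nn.Sequential(
            nn.Conv2d(ch, ch * scale * scale, 3, 1, 1),   # sub-pixel convolutional layer: 64 -> 256
            nn.PixelShuffle(scale),                       # back to 64 channels at 2x the size
            nn.ReLU(inplace=True))
        self.conv8 = nn.Conv2d(ch, ch, 3, 1, 1)           # eighth convolutional layer (ReLU)
        self.conv9 = nn.Conv2d(ch, 1, 1)                  # ninth convolutional layer (1x1, no activation)
    def forward(self, f_en3, f_wa0):
        f_dec3 = self.subpixel(self.res4(self.res3(f_en3)))
        f_dec4 = torch.relu(self.conv8(f_dec3 + f_wa0))   # element-wise skip addition of F_WA,0
        return self.conv9(f_dec4)                         # reconstructed single-channel SAI array

l_sr = Decoder()(torch.rand(1, 64, 50, 75), torch.rand(1, 64, 100, 150))
```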
Step 3: perform color space conversion on each low-spatial-resolution light field image in the training set, the corresponding 2D high-resolution image and the corresponding reference high-spatial-resolution light field image, i.e., convert them from the RGB color space to the YCbCr color space, and extract the Y-channel images; recombine the Y-channel image of each low-spatial-resolution light field image into a sub-aperture image array of width W×V and height H×U for representation; then form the training set from the sub-aperture image arrays recombined from the Y-channel images of all low-spatial-resolution light field images, the Y-channel images of the corresponding 2D high-resolution images, and the Y-channel images of the corresponding reference high-spatial-resolution light field images; then construct a pyramid network and train it with the training set, the specific process being as follows:
Step 3_1: copy the constructed spatial super-resolution network three times and cascade the copies, with the weights of the spatial super-resolution networks shared, i.e., all their parameters identical; define the whole network formed by the three spatial super-resolution networks as the pyramid network; at each pyramid level, the reconstruction scale of the spatial super-resolution network is set equal to α_s;
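A hedged sketch of the weight sharing in step 3_1: the same spatial super-resolution module instance is applied three times in cascade, so all three pyramid levels use identical parameters. The stand-in network `sr_net`, its argument order, and the use of whole-array bicubic up-sampling (the description up-samples each sub-aperture image before recombination) are simplifications.

```python
import torch.nn as nn
import torch.nn.functional as F

class PyramidNet(nn.Module):
    """Cascade one spatial super-resolution network three times; reusing the same module
    instance means the three pyramid levels share all parameters."""
    def __init__(self, sr_net: nn.Module, alpha_s: int = 2, levels: int = 3):
        super().__init__()
        self.sr_net = sr_net
        self.alpha_s, self.levels = alpha_s, levels
    def forward(self, lf_y, hr_y_levels, blur_hr_y_levels):
        outputs, current = [], lf_y
        for k in range(self.levels):
            # One bicubic up-sampling of the current light field before each stage
            # (applied to the whole array here for brevity).
            up = F.interpolate(current, scale_factor=self.alpha_s,
                               mode='bicubic', align_corners=False)
            current = self.sr_net(current, up, blur_hr_y_levels[k], hr_y_levels[k])
            outputs.append(current)   # alpha_s-, alpha_s^2-, alpha_s^3-times reconstructions
        return outputs
```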
Step 3_2: down-sample the Y-channel image of each reference high-spatial-resolution light field image in the training set twice in spatial resolution, and take the resulting image as the label image; down-sample the Y-channel image of each 2D high-resolution image in the training set twice in the same way, and take the resulting image as the 2D high-resolution Y-channel image for the first spatial super-resolution network in the pyramid network; then input into the first spatial super-resolution network of the constructed pyramid network, for training, the sub-aperture image arrays recombined from the Y-channel images of all low-spatial-resolution light field images in the training set, the sub-aperture image arrays recombined from those Y-channel images after one spatial-resolution up-sampling, the blurred 2D high-resolution Y-channel images obtained by one spatial-resolution down-sampling followed by one spatial-resolution up-sampling of the 2D high-resolution Y-channel images for the first spatial super-resolution network, and all the 2D high-resolution Y-channel images for the first spatial super-resolution network; this yields the α_s-times reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of each low-spatial-resolution light field image in the training set; the spatial-resolution up-sampling and down-sampling are both performed by bicubic interpolation, and their scale is equal to α_s;
Step 3_3: down-sample the Y-channel image of each reference high-spatial-resolution light field image in the training set once in spatial resolution, and take the resulting image as the label image; down-sample the Y-channel image of each 2D high-resolution image in the training set once in the same way, and take the resulting image as the 2D high-resolution Y-channel image for the second spatial super-resolution network in the pyramid network; then input into the second spatial super-resolution network of the constructed pyramid network, for training, the sub-aperture image arrays recombined from the α_s-times reconstructed high-spatial-resolution Y-channel light field images corresponding to the Y-channel images of all low-spatial-resolution light field images in the training set, the sub-aperture image arrays recombined from those reconstructed light field images after one spatial-resolution up-sampling, the blurred 2D high-resolution Y-channel images obtained by one spatial-resolution down-sampling followed by one spatial-resolution up-sampling of the 2D high-resolution Y-channel images for the second spatial super-resolution network, and all the 2D high-resolution Y-channel images for the second spatial super-resolution network; this yields the α_s²-times reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of each low-spatial-resolution light field image in the training set; the spatial-resolution up-sampling and down-sampling are both performed by bicubic interpolation, and their scale is equal to α_s;
Step 3_4: take the Y-channel image of each reference high-spatial-resolution light field image in the training set as the label image; take the Y-channel image of each 2D high-resolution image in the training set as the 2D high-resolution Y-channel image for the third spatial super-resolution network in the pyramid network; then input into the third spatial super-resolution network of the constructed pyramid network, for training, the sub-aperture image arrays recombined from the α_s²-times reconstructed high-spatial-resolution Y-channel light field images corresponding to the Y-channel images of all low-spatial-resolution light field images in the training set, the sub-aperture image arrays recombined from those reconstructed light field images after one spatial-resolution up-sampling, the blurred 2D high-resolution Y-channel images obtained by one spatial-resolution down-sampling followed by one spatial-resolution up-sampling of the 2D high-resolution Y-channel images for the third spatial super-resolution network, and all the 2D high-resolution Y-channel images for the third spatial super-resolution network; this yields the α_s³-times reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of each low-spatial-resolution light field image in the training set; the spatial-resolution up-sampling and down-sampling are both performed by bicubic interpolation, and their scale is equal to α_s;
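Steps 3_2 to 3_4 derive their level-specific inputs and labels from bicubic re-sampling; the sketch below shows one plausible way to generate, for a given pyramid level, the 2D high-resolution Y-channel image and its blurred counterpart, assuming α_s = 2 and three levels.

```python
import torch
import torch.nn.functional as F

def bicubic(x: torch.Tensor, scale: float) -> torch.Tensor:
    return F.interpolate(x, scale_factor=scale, mode='bicubic', align_corners=False)

def level_inputs(hr_y: torch.Tensor, level: int, levels: int = 3, alpha_s: int = 2):
    """hr_y: [N, 1, alpha*H, alpha*W] Y channel of the full-resolution 2D HR image.
    Returns the 2D HR Y image used at this pyramid level (level 0, 1 or 2) and its
    blurred counterpart (one down-sampling followed by one up-sampling)."""
    hr_level = hr_y
    for _ in range(levels - 1 - level):          # two down-samplings for level 0, one for level 1
        hr_level = bicubic(hr_level, 1.0 / alpha_s)
    blur_level = bicubic(bicubic(hr_level, 1.0 / alpha_s), float(alpha_s))
    return hr_level, blur_level

hr_y = torch.rand(1, 1, 256, 256)
hr_l0, blur_l0 = level_inputs(hr_y, level=0)     # inputs for the first spatial SR network
```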
After the training is finished, the optimal weight parameters of all convolution kernels in each spatial super-resolution network of the pyramid network are obtained, giving the trained spatial super-resolution network model;
Step 4: randomly select a low-spatial-resolution light field image with three color channels and the corresponding 2D high-resolution image with three color channels as test images; convert both from the RGB color space to the YCbCr color space and extract the Y-channel images; recombine the Y-channel image of the low-spatial-resolution light field image into a sub-aperture image array for representation; input into the trained spatial super-resolution network model the Y-channel image of the low-spatial-resolution light field image (as a sub-aperture image array, together with its once up-sampled version), the blurred 2D high-resolution Y-channel image obtained by one spatial-resolution down-sampling followed by one spatial-resolution up-sampling of the Y-channel image of the 2D high-resolution image, and the Y-channel image of the 2D high-resolution image, and test to obtain the reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of the low-spatial-resolution light field image; then up-sample the Cb-channel image and the Cr-channel image of the low-spatial-resolution light field image by bicubic interpolation to obtain the reconstructed high-spatial-resolution Cb-channel and Cr-channel light field images corresponding to the Cb-channel and Cr-channel images of the low-spatial-resolution light field image; finally, concatenate the reconstructed high-spatial-resolution Y-channel, Cb-channel and Cr-channel light field images along the color-channel dimension and convert the result back to the RGB color space, obtaining the reconstructed high-spatial-resolution light field image with three color channels corresponding to the low-spatial-resolution light field image.
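Step 4 can be summarized by the hedged sketch below: a BT.601-style YCbCr conversion (the exact color matrix used in the patent is not specified here), the trained model applied to the Y channel, bicubic up-sampling of Cb and Cr, and conversion back to RGB; `model` and `scale` are placeholders, and the model's full input list (high-resolution image, blurred image, etc.) is abbreviated to the Y channel for brevity.

```python
import torch
import torch.nn.functional as F

def rgb_to_ycbcr(x):                                       # x: [N, 3, H, W] in [0, 1]
    r, g, b = x[:, 0:1], x[:, 1:2], x[:, 2:3]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 0.5
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 0.5
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    cb, cr = cb - 0.5, cr - 0.5
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    return torch.cat([r, g, b], dim=1)

def reconstruct(lf_rgb, model, scale):
    y, cb, cr = rgb_to_ycbcr(lf_rgb)
    y_sr = model(y)                                         # trained spatial SR network (placeholder call)
    up = lambda c: F.interpolate(c, scale_factor=scale, mode='bicubic', align_corners=False)
    return ycbcr_to_rgb(y_sr, up(cb), up(cr)).clamp(0, 1)
```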
In step 2, the first, second, third and fourth residual blocks have the same structure: each consists of a third convolutional layer and a fourth convolutional layer connected in sequence, together with a skip connection that adds the block input to the output of the fourth convolutional layer element by element.

The input end of the third convolutional layer in the first residual block receives three inputs in parallel, namely the scale-1 light field features, the scale-1 blurred high-resolution features and all feature maps in Y_HR,1, and for each of them its output end outputs 64 feature maps of the same width and height; the input end of the fourth convolutional layer in the first residual block receives these three sets in parallel and for each of them its output end outputs 64 feature maps of the same width and height. For each of the three streams, the output of the fourth convolutional layer is added element by element to the corresponding input of the first residual block, and the three resulting sets of feature maps are taken as the outputs of the first residual block, namely the scale-2 light field features, the scale-2 blurred high-resolution features and Y_HR,2.

The second residual block processes the scale-2 light field features, the scale-2 blurred high-resolution features and all feature maps in Y_HR,2 in exactly the same way, and its three output sets are the scale-3 light field features, the scale-3 blurred high-resolution features and Y_HR,3.

The input end of the third convolutional layer in the third residual block receives all feature maps in F_En,3 and its output end outputs 64 feature maps of the same width and height; the input end of the fourth convolutional layer in the third residual block receives these feature maps and its output end outputs 64 feature maps of the same width and height; all feature maps in F_En,3 are added element by element to the output of the fourth convolutional layer, and the resulting feature maps are taken as all feature maps output by the output end of the third residual block, whose set is F_Dec,1.

The fourth residual block processes F_Dec,1 in the same way, and the set of all feature maps output by its output end is F_Dec,2.

In the above, the convolution kernel sizes of the third and fourth convolutional layers in each of the first, second, third and fourth residual blocks are all 3×3, the convolution strides are all 1, the numbers of input channels are all 64 and the numbers of output channels are all 64; in each residual block the third convolutional layer uses the ReLU activation function and the fourth convolutional layer uses no activation function.
In step 2, the first enhancement residual block, the second enhancement residual block and the third enhancement residual block have the same structure. Each consists of a first spatial feature transform layer, a first spatial-angular convolutional layer, a second spatial feature transform layer, a second spatial-angular convolutional layer and a channel attention layer connected in sequence. The first and second spatial feature transform layers have the same structure and each consist of a tenth convolutional layer and an eleventh convolutional layer arranged in parallel; the first and second spatial-angular convolutional layers have the same structure and each consist of a twelfth convolutional layer and a thirteenth convolutional layer connected in sequence; and the channel attention layer consists of a global mean pooling layer, a fourteenth convolutional layer and a fifteenth convolutional layer connected in sequence.
first increaseThe input of the tenth convolutional layer in the first spatial feature transform layer in the strong residual block receives FAlign,1The output end of the tenth convolutional layer in the first spatial feature transform layer in the first enhanced residual block outputs 64 width mapsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedAn input of an eleventh convolutional layer in the first spatial feature transform layer in the first enhanced residual block receives FAlign,1The output end of the eleventh convolutional layer in the first spatial feature transform layer in the first enhanced residual block outputs 64 width mapsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedThe input of the first spatial feature transform layer in the first enhanced residual block receives FLRAll feature maps in (1), will FLRAll the characteristic diagrams in (1) andmultiplying all the characteristic graphs element by element, and comparing the multiplication result with the resultThe obtained all feature maps are used as the output of the first spatial feature transform layer in the first enhanced residual blockAll feature maps outputted from the output end are described as a set of these feature maps
An input of a twelfth of the first spatial angle convolutional layers in the first enhanced residual block receivesOf the first spatial angle convolutional layer in the first enhancement residual block, the output end of the twelfth convolutional layer of the first spatial angle convolutional layer outputs 64 widthsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedTo pairPerforms a recombination operation from a spatial dimension to an angular dimension, an input of a thirteenth of the first spatial-angular convolutional layers of the first enhancement residual block receivingThe output end of the thirteenth convolutional layer of the first space angle convolutional layer in the first enhancement residual block outputs 64 widths as the result of the reorganization operation of all the feature maps in (1)And has a height ofThe feature map of (1) represents a set of all feature maps outputtedTo pairPerforming an operation of reconstructing all feature maps from an angle dimension to a space dimension, taking all feature maps obtained after the operation of reconstructing as all feature maps output by an output end of a first space angle convolution layer in a first enhanced residual block, and recording a set formed by the feature maps as a set
The input terminal of the tenth convolutional layer in the second spatial feature transform layer in the first enhanced residual block receives FAlign,1The output end of the tenth convolutional layer in the second spatial feature transform layer in the first enhanced residual block outputs 64 width mapsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedAn input of an eleventh convolutional layer in the second spatial feature transform layer in the first enhanced residual block receives FAlign,1The output end of the eleventh convolutional layer in the second spatial feature transform layer in the first enhanced residual block outputs 64 width mapsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedThe input of the second spatial feature transform layer in the first enhanced residual block receivesAll the characteristic diagrams in (1) willAll the characteristic diagrams in (1) andmultiplying all the characteristic graphs element by element, and comparing the multiplication result with the resultThe obtained feature maps are used as all feature maps output by the output end of the second spatial feature transform layer in the first enhanced residual block, and the set formed by the feature maps is recorded as a set
An input of a twelfth of the second spatial angle convolutional layers in the first enhanced residual block receivesOf the twelfth convolutional layer of the second spatial angle convolutional layers in the first enhancement residual block outputs 64 width picturesAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedTo pairPerforms a recombination operation from a spatial dimension to an angular dimension, an input of a thirteenth of the second spatial-angular convolutional layers of the first enhancement residual block receivingThe output end of the thirteenth convolution layer of the second space angle convolution layer in the first enhanced residual error block outputs 64 width valuesAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedTo pairPerforming recombination operation from angle dimension to space dimension on all feature maps in the first enhancement residual block, taking all feature maps obtained after the recombination operation as all feature maps output by the output end of the second spatial angle convolution layer in the first enhancement residual block, and recording a set formed by the feature maps as a set
The input of the global mean pooling layer in the channel attention layer in the first enhanced residual block receivesThe output end of the global mean pooling layer in the channel attention layer in the first enhanced residual block outputs 64 feature maps with the width ofAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FGAP,1,FGAP,1All feature values in each feature map in (1) are the same; the input of the fourteenth convolutional layer in the channel attention layer in the first enhanced residual block receives FGAP,1The output end of the fourteenth convolution layer in the channel attention layer in the first enhancement residual block outputs 4 widthAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FDS,1(ii) a The input of the fifteenth convolutional layer in the channel attention layer in the first enhanced residual block receives FDS,1Of the fifteenth convolutional layer in the channel attention layer in the first enhanced residual block outputs 64 widths ofAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FUS,1(ii) a F is to beUS,1All the characteristic diagrams in (1) andall feature maps in (1) are multiplied element by element, all obtained feature maps are used as all feature maps output by the output end of the channel attention layer in the first enhanced residual block, and a set formed by the feature maps is marked as FCA,1;
F is to beCA,1All feature maps in (1) and (F)LRAll feature maps in (1) are added element by element, and all obtained feature maps are used as the output end of the first enhanced residual blockAll the output feature maps, and the set formed by the feature maps is FEn,1;
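Putting the pieces of the first enhancement residual block together, the following sketch shows one possible realization of the spatial feature transform layers, spatial-angular convolutional layers and channel attention layer. The kernel sizes of the tenth to fifteenth convolutional layers, the ReLU between the fourteenth and fifteenth convolutional layers and the angular resolution U = V = 5 are assumptions; the 64-to-4-to-64 channel attention bottleneck follows the description above.

```python
import torch
import torch.nn as nn

def spatial_to_angular(x, U, V):
    """[N, C, U*H, V*W] SAI-tiled layout -> [N, C, H*U, W*V] macro-pixel layout."""
    N, C, UH, VW = x.shape
    H, W = UH // U, VW // V
    x = x.reshape(N, C, U, H, V, W).permute(0, 1, 3, 2, 5, 4)   # [N, C, H, U, W, V]
    return x.reshape(N, C, H * U, W * V)

def angular_to_spatial(x, U, V):
    """Inverse of spatial_to_angular."""
    N, C, HU, WV = x.shape
    H, W = HU // U, WV // V
    x = x.reshape(N, C, H, U, W, V).permute(0, 1, 3, 2, 5, 4)   # [N, C, U, H, V, W]
    return x.reshape(N, C, U * H, V * W)

class SFTLayer(nn.Module):
    """Spatial feature transform: modulate x as gamma(F_Align) * x + beta(F_Align)."""
    def __init__(self, ch=64):
        super().__init__()
        self.gamma = nn.Conv2d(ch, ch, 3, 1, 1)   # tenth convolutional layer (kernel size assumed)
        self.beta  = nn.Conv2d(ch, ch, 3, 1, 1)   # eleventh convolutional layer (kernel size assumed)
    def forward(self, x, cond):
        return x * self.gamma(cond) + self.beta(cond)

class SpatialAngularConv(nn.Module):
    """Convolve in the spatial layout, reorganize to the angular layout, convolve, reorganize back."""
    def __init__(self, ch, U, V):
        super().__init__()
        self.U, self.V = U, V
        self.spa = nn.Conv2d(ch, ch, 3, 1, 1)     # twelfth convolutional layer
        self.ang = nn.Conv2d(ch, ch, 3, 1, 1)     # thirteenth convolutional layer
    def forward(self, x):
        x = spatial_to_angular(self.spa(x), self.U, self.V)
        return angular_to_spatial(self.ang(x), self.U, self.V)

class ChannelAttention(nn.Module):
    """Global mean pooling, 64 -> 4 -> 64 channels, then channel-wise re-weighting."""
    def __init__(self, ch=64):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)       # global mean pooling layer
        self.down = nn.Conv2d(ch, 4, 1)           # fourteenth convolutional layer
        self.up   = nn.Conv2d(4, ch, 1)           # fifteenth convolutional layer
    def forward(self, x):
        return x * self.up(torch.relu(self.down(self.pool(x))))  # the ReLU here is an assumption

class EnhanceResBlock(nn.Module):
    def __init__(self, ch=64, U=5, V=5):
        super().__init__()
        self.sft1, self.sft2 = SFTLayer(ch), SFTLayer(ch)
        self.sa1, self.sa2 = SpatialAngularConv(ch, U, V), SpatialAngularConv(ch, U, V)
        self.ca = ChannelAttention(ch)
    def forward(self, x, f_align):
        out = self.sa1(self.sft1(x, f_align))
        out = self.sa2(self.sft2(out, f_align))
        return x + self.ca(out)                   # skip connection back to the block input

f_en1 = EnhanceResBlock()(torch.rand(1, 64, 5 * 20, 5 * 30), torch.rand(1, 64, 5 * 20, 5 * 30))
```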
The second enhancement residual block has exactly the same internal data flow as the first enhancement residual block, with F_Align,2 taking the place of F_Align,1 as the conditioning input of its two spatial feature transform layers and F_En,1 taking the place of F_LR as the input that is modulated and that provides the skip connection. The sets output by the global mean pooling layer, the fourteenth convolutional layer, the fifteenth convolutional layer and the channel attention layer in the second enhancement residual block are denoted F_GAP,2, F_DS,2, F_US,2 and F_CA,2, respectively, where all feature values of each feature map in F_GAP,2 are identical, F_DS,2 contains 4 feature maps and F_US,2 contains 64 feature maps. All feature maps in F_CA,2 and all feature maps in F_En,1 are added element by element, and the resulting feature maps are taken as all feature maps output by the output end of the second enhancement residual block, whose set is F_En,2.
The input end of the tenth convolutional layer in the first spatial feature transform layer in the third enhanced residual block receives FAlign,3, and its output end outputs 64 feature maps with the same width and height as the feature maps in FAlign,3; the input end of the eleventh convolutional layer in the first spatial feature transform layer in the third enhanced residual block also receives FAlign,3, and its output end likewise outputs 64 feature maps of the same width and height. The first spatial feature transform layer in the third enhanced residual block further receives all feature maps in FEn,2; all feature maps in FEn,2 are multiplied element by element with all feature maps output by the tenth convolutional layer, the multiplication result is added element by element to all feature maps output by the eleventh convolutional layer, and all feature maps thus obtained are taken as all feature maps output by the output end of the first spatial feature transform layer in the third enhanced residual block.
The input end of the twelfth convolutional layer of the first spatial-angle convolutional layer in the third enhanced residual block receives all feature maps output by the first spatial feature transform layer in the third enhanced residual block, and its output end outputs 64 feature maps of the same width and height; a recombination operation from the spatial dimension to the angular dimension is performed on all of these feature maps. The input end of the thirteenth convolutional layer of the first spatial-angle convolutional layer in the third enhanced residual block receives the result of this recombination operation, and its output end outputs 64 feature maps of the same width and height; a recombination operation from the angular dimension back to the spatial dimension is then performed, and all feature maps obtained after this recombination operation are taken as all feature maps output by the output end of the first spatial-angle convolutional layer in the third enhanced residual block.
The input end of the tenth convolutional layer in the second spatial feature transform layer in the third enhanced residual block receives FAlign,3, and its output end outputs 64 feature maps of the same width and height; the input end of the eleventh convolutional layer in the second spatial feature transform layer in the third enhanced residual block also receives FAlign,3, and its output end likewise outputs 64 feature maps of the same width and height. The second spatial feature transform layer in the third enhanced residual block further receives all feature maps output by the first spatial-angle convolutional layer in the third enhanced residual block; these feature maps are multiplied element by element with all feature maps output by the tenth convolutional layer, the multiplication result is added element by element to all feature maps output by the eleventh convolutional layer, and all feature maps thus obtained are taken as all feature maps output by the output end of the second spatial feature transform layer in the third enhanced residual block.
The input end of the twelfth convolutional layer of the second spatial-angle convolutional layer in the third enhanced residual block receives all feature maps output by the second spatial feature transform layer in the third enhanced residual block, and its output end outputs 64 feature maps of the same width and height; a recombination operation from the spatial dimension to the angular dimension is performed on all of these feature maps. The input end of the thirteenth convolutional layer of the second spatial-angle convolutional layer in the third enhanced residual block receives the result of this recombination operation, and its output end outputs 64 feature maps of the same width and height; a recombination operation from the angular dimension back to the spatial dimension is then performed, and all feature maps obtained after this recombination operation are taken as all feature maps output by the output end of the second spatial-angle convolutional layer in the third enhanced residual block.
The input end of the global mean pooling layer in the channel attention layer in the third enhanced residual block receives all feature maps output by the second spatial-angle convolutional layer in the third enhanced residual block, and its output end outputs 64 feature maps, the set of which is denoted as FGAP,3; all feature values in each feature map in FGAP,3 are the same. The input end of the fourteenth convolutional layer in the channel attention layer in the third enhanced residual block receives FGAP,3, and its output end outputs 4 feature maps, the set of which is denoted as FDS,3; the input end of the fifteenth convolutional layer in the channel attention layer in the third enhanced residual block receives FDS,3, and its output end outputs 64 feature maps, the set of which is denoted as FUS,3. All feature maps in FUS,3 are multiplied element by element with all feature maps output by the second spatial-angle convolutional layer in the third enhanced residual block, all feature maps thus obtained are taken as all feature maps output by the output end of the channel attention layer in the third enhanced residual block, and the set formed by these feature maps is denoted as FCA,3;
All feature maps in FCA,3 and all feature maps in FEn,2 are added element by element, all feature maps thus obtained are taken as all feature maps output by the output end of the third enhanced residual block, and the set formed by these feature maps is FEn,3;
In the above, the convolution kernels of the tenth and eleventh convolutional layers in each of the first, second and third enhanced residual blocks all have a size of 3 × 3, the convolution steps are all 1, the numbers of input channels are all 64, the numbers of output channels are all 64, and no activation function is adopted. The convolution kernels of the twelfth and thirteenth convolutional layers in each of the first, second and third enhanced residual blocks all have a size of 3 × 3, the convolution steps are all 1, the numbers of input channels are all 64, the numbers of output channels are all 64, and the activation function adopted is "ReLU". The convolution kernel of the fourteenth convolutional layer in each of the first, second and third enhanced residual blocks has a size of 1 × 1, a convolution step of 1, 64 input channels and 4 output channels, and the activation function adopted is "ReLU". The convolution kernel of the fifteenth convolutional layer in each of the first, second and third enhanced residual blocks has a size of 1 × 1, a convolution step of 1, 4 input channels and 64 output channels, and the activation function adopted is "Sigmoid".
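For ease of understanding, a minimal PyTorch-style sketch of the channel attention layer described above is given below. It is illustrative only: the class and variable names are not part of the patent, and the framework and default parameter values are assumptions made for illustration.

    import torch
    import torch.nn as nn

    class ChannelAttention(nn.Module):
        # Global mean pooling, a 1x1 conv (64 -> 4, "ReLU") and a 1x1 conv (4 -> 64, "Sigmoid");
        # the resulting per-channel weights rescale the input feature maps element by element.
        def __init__(self, channels=64, reduced=4):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(1)             # global mean pooling layer
            self.conv14 = nn.Conv2d(channels, reduced, 1)   # fourteenth convolutional layer
            self.conv15 = nn.Conv2d(reduced, channels, 1)   # fifteenth convolutional layer

        def forward(self, x):
            w = torch.sigmoid(self.conv15(torch.relu(self.conv14(self.pool(x)))))
            return x * w                                    # broadcast multiplication, i.e. the F_CA output

Pooling to a 1 × 1 map followed by a broadcast multiplication is equivalent to the description above in which every value of each pooled feature map is identical.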
Compared with the prior art, the invention has the advantages that:
1) The method of the invention takes into account that a conventional 2D camera can collect abundant spatial information, which can serve as compensation information for reconstructing the spatial resolution of the light field image; the light field image and the 2D high-resolution image are therefore used simultaneously, and on this basis an end-to-end convolutional neural network is constructed to make full use of the information of both, so as to reconstruct a high-spatial-resolution light field image, recover detailed texture information and maintain the parallax structure of the reconstruction result.
2) In order to establish the relation between the light field image and the 2D high-resolution image, the method constructs an aperture-level feature registration module to explore the correlation between the light field image and the 2D high-resolution image in a high-dimensional feature space, and further accurately registers the feature information of the 2D high-resolution image under the light field image; in addition, the method utilizes the constructed light field characteristic enhancement module to carry out multi-level fusion on the high-resolution characteristics obtained by registration and the shallow light field characteristics extracted from the low-spatial resolution light field image so as to effectively generate the high-spatial resolution light field characteristics, and further reconstruct the high-spatial resolution light field characteristics into the high-spatial resolution light field image.
3) In order to improve flexibility and practicability, the method adopts a pyramid network reconstruction mode, and the super-resolution results of specific scales are reconstructed at different pyramid levels so as to gradually improve the spatial resolution of the light field image and recover textures and details, so that multi-scale results (such as 2 x, 4 x and 8 x) can be reconstructed in one-time forward inference; in addition, the method adopts a weight sharing strategy under different pyramid levels so as to effectively reduce the parameter quantity of the constructed pyramid network and reduce the training burden.
Drawings
FIG. 1 is a block diagram of the overall implementation of the method of the present invention;
FIG. 2 is a schematic diagram of the structure of a convolutional neural network, namely a spatial super-resolution network, constructed by the method of the present invention;
FIG. 3a is a schematic diagram of the structure of a light field feature enhancement module in a convolutional neural network, i.e., a spatial super-resolution network, constructed by the method of the present invention;
FIG. 3b is a schematic diagram of the composition structure of the first spatial feature transform layer and the second spatial feature transform layer in the light field feature enhancement module in the convolutional neural network, i.e., the spatial super-resolution network, constructed by the method of the present invention;
FIG. 3c is a schematic diagram of the composition structure of the first spatial angle convolutional layer and the second spatial angle convolutional layer in the light field feature enhancement module in the convolutional neural network, i.e., the spatial super-resolution network, constructed by the method of the present invention;
FIG. 3d is a schematic diagram of the structure of the channel attention layer in the light field feature enhancement module in the convolutional neural network, i.e., the spatial super-resolution network, constructed by the method of the present invention;
FIG. 4 is a schematic diagram illustrating a pyramid network reconstruction method established by the method of the present invention;
FIG. 5a is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database by a bicubic interpolation method, where a sub-aperture image under a central coordinate is taken for display;
FIG. 5b is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database using Haris et al, where a sub-aperture image at a central coordinate is taken for display;
FIG. 5c is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database by the method of Lai et al, where a sub-aperture image under a central coordinate is taken for display;
FIG. 5d is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database by a method of Yeung et al, where a sub-aperture image under a central coordinate is taken for display;
FIG. 5e is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database by the method of Wang et al, where a sub-aperture image under a central coordinate is taken for display;
FIG. 5f is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database by Jin et al, where a sub-aperture image under a central coordinate is taken for display;
FIG. 5g is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database by Boominathan et al, where a sub-aperture image under a central coordinate is taken for display;
FIG. 5h is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database using the method of the present invention, where a sub-aperture image under a central coordinate is taken for display;
FIG. 5i is a label high spatial resolution light field image corresponding to a low spatial resolution light field image in a tested EPFL light field image database, where a sub-aperture image under a central coordinate is taken for display;
FIG. 6a is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by a bicubic interpolation method, wherein a sub-aperture image under a central coordinate is taken for display;
FIG. 6b is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by Haris et al, where a sub-aperture image at a central coordinate is taken for display;
FIG. 6c is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by the method of Lai et al, where a sub-aperture image under a central coordinate is taken for display;
FIG. 6d is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by a method of Yeung et al, where a sub-aperture image under a central coordinate is taken for display;
FIG. 6e is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by the method of Wang et al, where a sub-aperture image in a central coordinate is taken for display;
FIG. 6f is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by Jin et al, where a sub-aperture image under a central coordinate is taken for display;
FIG. 6g is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by Boominathan et al, where a sub-aperture image under a central coordinate is taken for display;
FIG. 6h is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database using the method of the present invention, where a sub-aperture image under a central coordinate is taken for display;
fig. 6i is a label high spatial resolution light field image corresponding to a low spatial resolution light field image in a tested STFLytro light field image database.
Detailed Description
The invention is described in further detail below with reference to the accompanying examples.
With the development of immersive media and technology, users are increasingly inclined to view visual content such as interactive and immersive images/videos. However, the conventional 2D imaging method can only collect the intensity information of the light in the scene, and cannot provide the depth information of the scene. In contrast, 3D imaging techniques can acquire more scene information, however, they contain limited depth information and are typically used for stereoscopic displays. As a new imaging technology, light field imaging is receiving wide attention, and can simultaneously acquire intensity and direction information of light in a scene in a single shooting, thereby more effectively recording the real world. Meanwhile, some optical instruments and devices based on light field imaging have been developed to promote the application and development of light field technology. Limited by the size of the imaging sensor, the 4D light field images acquired with a light field camera suffer from the problem of spatial and angular resolution being mutually compromised. In brief, while providing high angular resolution, the 4D light field image inevitably suffers from low spatial resolution, which seriously affects the practical applications of the 4D light field image, such as refocusing, depth estimation, etc., for which, the present invention proposes a light field image spatial super-resolution reconstruction method,
the method comprises the steps of acquiring a 2D high-resolution image while capturing a light field image through heterogeneous imaging, and further using the captured 2D high-resolution image as supplementary information to help enhance the spatial resolution of the light field image, wherein a spatial super-resolution network is constructed and mainly comprises an encoder, an aperture level feature registration module, a light field feature enhancement module, a decoder and the like; firstly, respectively extracting multi-scale features from an up-sampled low-spatial-resolution light field image, a blurred 2D high-resolution image and the 2D high-resolution image by using an encoder; then, learning the correspondence between the 2D high-resolution features and the low-resolution light field features through an aperture-level feature registration module so as to register the 2D high-resolution features under each sub-aperture image of the light field image and form registered high-resolution light field features; then, the light field characteristic enhancement module is used for enhancing shallow light field characteristics extracted from an input light field image by utilizing the high-resolution light field characteristics obtained through registration to obtain enhanced high-resolution light field characteristics; finally, reconstructing the enhanced high-resolution light field characteristics into a high-quality high-spatial resolution light field image by using a decoder; in addition, a pyramid network reconstruction architecture is adopted to reconstruct a high spatial resolution light field image of a specific up-sampling scale at each pyramid level, and then multi-scale reconstruction results can be generated simultaneously.
The invention provides a light field image space super-resolution reconstruction method, the overall implementation flow block diagram of which is shown in figure 1, and the method comprises the following steps:
step 1: selecting Num color three-channel low-spatial-resolution light field images with spatial resolution of W multiplied by H and angular resolution of V multiplied by U, corresponding Num color three-channel 2D high-resolution images with resolution of alpha W multiplied by alpha H, and corresponding Num color three-channel reference high-spatial-resolution light field images with spatial resolution of alpha W multiplied by alpha H and angular resolution of V multiplied by U; where Num > 1, Num in this embodiment is 200, W × H in this embodiment is 75 × 50, V × U is 5 × 5, α represents a spatial resolution improvement multiple, and a is greater than 1, and in this embodiment, α is 8.
Step 2: constructing a convolutional neural network as a spatial super-resolution network: as shown in fig. 2, the spatial super-resolution network includes an encoder for extracting multi-scale features, an aperture level feature registration module for registering light field features and 2D high resolution features, a shallow feature extraction layer for extracting shallow features from a low spatial resolution light field image, a light field feature enhancement module for fusing light field features and 2D high resolution features, a spatial attention block for mitigating registration errors in coarse-scale features, and a decoder for reconstructing potential features into a light field image.
For the encoder, it is composed of a first convolutional layer, a second convolutional layer, a first residual block and a second residual block connected in sequence. The input end of the first convolutional layer receives three inputs in parallel: the sub-aperture image array, with a width of αsW × V and a height of αsH × U, obtained by recombining the single-channel image LLR of the low-spatial-resolution light field image (spatial resolution W × H, angular resolution V × U) after spatial-resolution up-sampling; the single-channel image of the blurred 2D high-resolution image, with a width of αsW and a height of αsH; and the single-channel image of the 2D high-resolution image, with a width of αsW and a height of αsH, denoted as IHR. For each of the three inputs, the output end of the first convolutional layer outputs 64 feature maps of the same width and height as that input; the set of feature maps output for IHR is denoted as YHR,0. The input end of the second convolutional layer receives, in parallel, all feature maps output by the first convolutional layer for the three inputs, and for each of them outputs 64 feature maps whose width and height are half those of the corresponding input feature maps; the set of feature maps output for YHR,0 is denoted as YHR,1. The input end of the first residual block receives, in parallel, all feature maps output by the second convolutional layer for the three inputs, and for each of them outputs 64 feature maps of the same width and height; the set of feature maps output for YHR,1 is denoted as YHR,2. The input end of the second residual block receives, in parallel, all feature maps output by the first residual block for the three inputs, and for each of them outputs 64 feature maps of the same width and height; the set of feature maps output for YHR,2 is denoted as YHR,3. Here, the sub-aperture image array with a width of αsW × V and a height of αsH × U is obtained by up-sampling the single-channel image LLR of the low-spatial-resolution light field image with the existing bicubic interpolation and recombining the result; the blurred 2D high-resolution image is obtained by first down-sampling IHR with bicubic interpolation and then up-sampling it with bicubic interpolation; αs represents the spatial-resolution sampling factor, αs takes the value 2 in this embodiment, αs³ = α, and the up-sampling factor of the bicubic interpolation up-sampling and the down-sampling factor of the bicubic interpolation down-sampling both take the value αs. The convolution kernel of the first convolutional layer has a size of 3 × 3, a convolution step of 1, 1 input channel and 64 output channels; the convolution kernel of the second convolutional layer has a size of 3 × 3, a convolution step of 2, 64 input channels and 64 output channels; both the first and the second convolutional layer adopt the "ReLU" activation function.
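The following PyTorch-style sketch illustrates the encoder structure just described. It is not the claimed implementation: class names are invented, and the assumption that the same (weight-shared) layers process each of the three inputs is an interpretation of the "three inputs in parallel" wording.

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        # two 3x3 convs (stride 1, 64 channels), "ReLU" on the first, none on the second,
        # with an identity skip connection (matches the residual blocks described later)
        def __init__(self, channels=64):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

        def forward(self, x):
            return x + self.conv2(torch.relu(self.conv1(x)))

    class Encoder(nn.Module):
        # first conv: 3x3, stride 1, 1 -> 64, ReLU; second conv: 3x3, stride 2, 64 -> 64, ReLU
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 64, 3, stride=1, padding=1)
            self.conv2 = nn.Conv2d(64, 64, 3, stride=2, padding=1)
            self.res1 = ResidualBlock()
            self.res2 = ResidualBlock()

        def forward(self, x):
            f0 = torch.relu(self.conv1(x))    # full-resolution features
            f1 = torch.relu(self.conv2(f0))   # half-resolution features
            f2 = self.res1(f1)
            f3 = self.res2(f2)
            return f0, f1, f2, f3             # multi-scale features for one input

The encoder sketched above would be applied separately to the up-sampled light field sub-aperture array, the blurred 2D high-resolution image and the 2D high-resolution image, yielding in particular YHR,0 to YHR,3 for IHR.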
For the aperture-level feature registration module, its input end receives three types of feature maps: the first type is feature maps extracted by the encoder from the up-sampled light field sub-aperture image array, the second type is feature maps extracted by the encoder from the blurred 2D high-resolution image, and the third type comprises four inputs, namely all feature maps in YHR,0, all feature maps in YHR,1, all feature maps in YHR,2 and all feature maps in YHR,3. In the aperture-level feature registration module, the feature maps of the second type and the feature maps in YHR,0, YHR,1, YHR,2 and YHR,3 are first each repeated V × U times, so that the repeated second-type feature maps and the repeated feature maps of YHR,1, YHR,2 and YHR,3 match the width and height of the light field feature maps of the first type, and the repeated feature maps of YHR,0 reach a width of αsW × V and a height of αsH × U, matching the full-resolution light field feature maps. Then the existing block matching is performed between the light field feature maps of the first type and the repeated second-type feature maps, and a coordinate index map of the same width and height as these feature maps, denoted as PCI, is obtained after block matching. Next, according to PCI, all feature maps in YHR,1 are registered in spatial position to obtain 64 registration feature maps, whose set is denoted as FAlign,1; likewise, according to PCI, all feature maps in YHR,2 are registered in spatial position to obtain 64 registration feature maps, whose set is denoted as FAlign,2; and according to PCI, all feature maps in YHR,3 are registered in spatial position to obtain 64 registration feature maps, whose set is denoted as FAlign,3. PCI is then up-sampled by bicubic interpolation to obtain a coordinate index map with a width of αsW × V and a height of αsH × U, and according to this up-sampled index map all feature maps in YHR,0 are registered in spatial position to obtain 64 registration feature maps with a width of αsW × V and a height of αsH × U, whose set is denoted as FAlign,0. The output end of the aperture-level feature registration module outputs all feature maps in FAlign,0, FAlign,1, FAlign,2 and FAlign,3. The precision measure used for block matching is a texture-and-structure similarity index, the block size for block matching is 3 × 3, and the up-sampling factor of the bicubic interpolation up-sampling is αs. Block matching is performed on the high-level features because high-level features describe the similarity of images at the semantic level more closely while suppressing irrelevant textures; the coordinate index map PCI obtained by block matching therefore reflects the spatial-position registration relationship between the two sets of matched feature maps. Since the convolution operation does not change the spatial-position information of a feature map, PCI also reflects the spatial-position registration relationship between the feature maps in YHR,1, YHR,2 and YHR,3 and the corresponding light field feature maps, and the index map obtained by bicubic interpolation up-sampling of PCI reflects the spatial-position registration relationship between the feature maps in YHR,0 and the full-resolution light field feature maps.
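A toy sketch of the registration idea is given below. It is deliberately simplified: the patent uses a texture-and-structure similarity index as the matching criterion, whereas plain L2 patch distance is used here only to keep the example short, and the brute-force pairwise comparison is far more expensive than a practical block-matching implementation.

    import torch
    import torch.nn.functional as F

    def block_match(lf_feat, hr_feat, block=3):
        # For every 3x3 block of the light field features, find the best-matching block of
        # the (repeated) 2D high-resolution features and return its flattened coordinate index.
        lf_patches = F.unfold(lf_feat, block, padding=block // 2)   # (N, C*3*3, H*W)
        hr_patches = F.unfold(hr_feat, block, padding=block // 2)
        d = torch.cdist(lf_patches.transpose(1, 2), hr_patches.transpose(1, 2))
        return d.argmin(dim=-1)                                     # coordinate index map P_CI

    def align(hr_feat, index):
        # Re-arrange the 2D high-resolution features according to the coordinate index map so
        # that they are registered under every sub-aperture position of the light field.
        n, c, h, w = hr_feat.shape
        flat = hr_feat.reshape(n, c, h * w)
        idx = index.unsqueeze(1).expand(-1, c, -1)
        return torch.gather(flat, 2, idx).reshape(n, c, h, w)

In the module described above, block_match would be run once on the deepest features, and align would then be applied to YHR,1, YHR,2 and YHR,3 (and, after up-sampling the index map, to YHR,0).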
For the shallow feature extraction layer, it is composed of one fifth convolutional layer, the input end of which receives the single-channel image LLR of the low-spatial-resolution light field image with spatial resolution W × H and angular resolution V × U; the output end of the fifth convolutional layer outputs 64 feature maps with a width of W × V and a height of H × U, and the set formed by all output feature maps is denoted as FLR. The convolution kernel of the fifth convolutional layer has a size of 3 × 3, a convolution step of 1, 1 input channel and 64 output channels, and the activation function adopted by the fifth convolutional layer is "ReLU".
For the light field feature enhancement module, as shown in fig. 3a, it is composed of a first enhancement residual block, a second enhancement residual block and a third enhancement residual block connected in sequence. The input end of the first enhancement residual block receives all feature maps in FAlign,1 and all feature maps in FLR; since αs takes the value 2, (αsW × V)/2 is equal to W × V and (αsH × U)/2 is equal to H × U, i.e. the feature maps in FLR have the same size as the feature maps in FAlign,1. The output end of the first enhancement residual block outputs 64 feature maps of this size, and the set of all output feature maps is denoted as FEn,1; the input end of the second enhancement residual block receives all feature maps in FAlign,2 and all feature maps in FEn,1, and its output end outputs 64 feature maps of the same size, whose set is denoted as FEn,2; the input end of the third enhancement residual block receives all feature maps in FAlign,3 and all feature maps in FEn,2, and its output end outputs 64 feature maps of the same size, whose set is denoted as FEn,3.
For the spatial attention block, it consists of a sixth convolutional layer and a seventh convolutional layer connected in sequence. The input end of the sixth convolutional layer receives all feature maps in FAlign,0, and the output end of the sixth convolutional layer outputs 64 spatial attention feature maps with a width of αsW × V and a height of αsH × U; the set of all output spatial attention feature maps is denoted as FSA1. The input end of the seventh convolutional layer receives all spatial attention feature maps in FSA1, and the output end of the seventh convolutional layer outputs 64 spatial attention feature maps with a width of αsW × V and a height of αsH × U; the set of all output spatial attention feature maps is denoted as FSA2. All feature maps in FAlign,0 are multiplied element by element with all spatial attention feature maps in FSA2, and the set formed by all feature maps thus obtained is denoted as FWA,0; all feature maps in FWA,0 are taken as all feature maps output by the output end of the spatial attention block. The convolution kernels of the sixth and seventh convolutional layers both have a size of 3 × 3, the convolution steps are both 1, the numbers of input channels are both 64, the numbers of output channels are both 64, the activation function adopted by the sixth convolutional layer is "ReLU", and the activation function adopted by the seventh convolutional layer is "Sigmoid".
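An illustrative PyTorch-style sketch of this spatial attention block follows; names and the framework are assumptions for illustration only.

    import torch
    import torch.nn as nn

    class SpatialAttention(nn.Module):
        # sixth conv: 3x3, 64 -> 64, "ReLU"; seventh conv: 3x3, 64 -> 64, "Sigmoid";
        # the resulting attention map multiplies F_Align,0 element by element to give F_WA,0
        def __init__(self, channels=64):
            super().__init__()
            self.conv6 = nn.Conv2d(channels, channels, 3, padding=1)
            self.conv7 = nn.Conv2d(channels, channels, 3, padding=1)

        def forward(self, f_align0):
            a = torch.sigmoid(self.conv7(torch.relu(self.conv6(f_align0))))
            return f_align0 * a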
For the decoder, it is composed of a third residual block, a fourth residual block, a sub-pixel convolutional layer, an eighth convolutional layer and a ninth convolutional layer connected in sequence. The input end of the third residual block receives all feature maps in FEn,3, and the output end of the third residual block outputs 64 feature maps of the same width and height; the set of all output feature maps is denoted as FDec,1. The input end of the fourth residual block receives all feature maps in FDec,1, and the output end of the fourth residual block outputs 64 feature maps of the same width and height; the set of all output feature maps is denoted as FDec,2. The input end of the sub-pixel convolutional layer receives all feature maps in FDec,2; the output end of the sub-pixel convolutional layer outputs 256 feature maps of the same width and height, and these 256 feature maps are further converted into 64 feature maps with a width of αsW × V and a height of αsH × U; the set of all converted feature maps is denoted as FDec,3. The input end of the eighth convolutional layer receives the result of the element-by-element addition of all feature maps in FDec,3 and all feature maps in FWA,0, and the output end of the eighth convolutional layer outputs 64 feature maps with a width of αsW × V and a height of αsH × U; the set of all output feature maps is denoted as FDec,4. The input end of the ninth convolutional layer receives all feature maps in FDec,4, and the output end of the ninth convolutional layer outputs a reconstructed single-channel light field image with a width of αsW × V and a height of αsH × U; this reconstructed single-channel light field image is recombined into a high-spatial-resolution single-channel light field image with spatial resolution αsW × αsH and angular resolution V × U, which is denoted as LSR. The convolution kernel of the sub-pixel convolutional layer has a size of 3 × 3, a convolution step of 1, 64 input channels and 256 output channels; the convolution kernel of the eighth convolutional layer has a size of 3 × 3, a convolution step of 1, 64 input channels and 64 output channels; the convolution kernel of the ninth convolutional layer has a size of 1 × 1, a convolution step of 1, 64 input channels and 1 output channel; the activation functions adopted by the sub-pixel convolutional layer and the eighth convolutional layer are both "ReLU", and the ninth convolutional layer does not adopt an activation function.
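A minimal PyTorch-style sketch of this decoder is given below. The pixel-shuffle step realizes the 256-to-64-channel conversion with a twofold spatial enlargement described above; class names are illustrative only.

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        def __init__(self, c=64):
            super().__init__()
            self.conv1 = nn.Conv2d(c, c, 3, padding=1)
            self.conv2 = nn.Conv2d(c, c, 3, padding=1)
        def forward(self, x):
            return x + self.conv2(torch.relu(self.conv1(x)))

    class Decoder(nn.Module):
        def __init__(self, c=64):
            super().__init__()
            self.res3, self.res4 = ResidualBlock(c), ResidualBlock(c)
            self.subpixel = nn.Conv2d(c, c * 4, 3, padding=1)   # 64 -> 256
            self.shuffle = nn.PixelShuffle(2)                   # 256 -> 64, x2 enlargement
            self.conv8 = nn.Conv2d(c, c, 3, padding=1)
            self.conv9 = nn.Conv2d(c, 1, 1)

        def forward(self, f_en3, f_wa0):
            x = self.res4(self.res3(f_en3))
            x = self.shuffle(torch.relu(self.subpixel(x)))      # F_Dec,3
            x = torch.relu(self.conv8(x + f_wa0))               # skip from the spatial attention block
            return self.conv9(x)                                # reconstructed Y-channel light field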
And step 3: performing color space conversion on each low spatial resolution light field image in the training set, the corresponding 2D high resolution image and the corresponding reference high spatial resolution light field image, namely converting the RGB color space into the YCbCr color space, and extracting a Y-channel image; recombining the Y-channel images of each low spatial resolution light field image into a sub-aperture image array with the width of W multiplied by V and the height of H multiplied by U for representation; then, a sub-aperture image array recombined with Y-channel images of all the light field images with low spatial resolution in the training set, a corresponding Y-channel image of the 2D high-resolution image and a corresponding Y-channel image of the reference light field image with high spatial resolution form the training set; and then constructing a pyramid network, and training by using a training set, wherein the concrete process is as follows:
step 3_ 1: as shown in fig. 4, the constructed spatial super-resolution networks are copied three times and cascaded, the weight of each spatial super-resolution network is shared, that is, the parameters are all the same, and the overall network formed by the three spatial super-resolution networks is defined as a pyramid network; at each pyramid level, the reconstruction scale of the spatial super-resolution network is set to be equal to αsValues are the same, αsWhen the value is 2, the spatial resolution of the light field image is improved by 2 times, so that the final reconstruction scale can reach 8, namely, alpha is alphas 3=8。
Step 3_2: the Y-channel image of each reference high-spatial-resolution light field image in the training set is down-sampled twice in spatial resolution, and the images obtained after down-sampling are taken as label images; the Y-channel image of each 2D high-resolution image in the training set is down-sampled twice in the same way, and the images obtained after down-sampling are taken as the 2D high-resolution Y-channel images for the first spatial super-resolution network in the pyramid network. Then the sub-aperture image arrays recombined from the Y-channel images of all low-spatial-resolution light field images in the training set, the sub-aperture image arrays recombined from the images obtained by up-sampling these Y-channel images once in spatial resolution, the blurred 2D high-resolution Y-channel images obtained by performing one spatial-resolution down-sampling and one spatial-resolution up-sampling on the 2D high-resolution Y-channel images for the first spatial super-resolution network, and all 2D high-resolution Y-channel images for the first spatial super-resolution network in the pyramid network are input into the first spatial super-resolution network of the constructed pyramid network for training, so as to obtain the αs-times reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of each low-spatial-resolution light field image in the training set. The spatial-resolution up-sampling and down-sampling are both performed by bicubic interpolation, and their scale is equal to αs.
Step 3_3: the Y-channel image of each reference high-spatial-resolution light field image in the training set is down-sampled once in spatial resolution, and the images obtained after down-sampling are taken as label images; the Y-channel image of each 2D high-resolution image in the training set is down-sampled once in the same way, and the images obtained after down-sampling are taken as the 2D high-resolution Y-channel images for the second spatial super-resolution network in the pyramid network. Then the sub-aperture image arrays recombined from the αs-times reconstructed high-spatial-resolution Y-channel light field images corresponding to the Y-channel images of all low-spatial-resolution light field images in the training set, the sub-aperture image arrays recombined from the images obtained by up-sampling these reconstructions once in spatial resolution, the blurred 2D high-resolution Y-channel images obtained by performing one spatial-resolution down-sampling and one spatial-resolution up-sampling on the 2D high-resolution Y-channel images for the second spatial super-resolution network, and all 2D high-resolution Y-channel images for the second spatial super-resolution network in the pyramid network are input into the second spatial super-resolution network of the constructed pyramid network for training, so as to obtain the αs²-times reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of each low-spatial-resolution light field image in the training set. The spatial-resolution up-sampling and down-sampling are both performed by bicubic interpolation, and their scale is equal to αs.
Step 3_4: the Y-channel image of each reference high-spatial-resolution light field image in the training set is taken as a label image; the Y-channel image of each 2D high-resolution image in the training set is taken as the 2D high-resolution Y-channel image for the third spatial super-resolution network in the pyramid network. Then the sub-aperture image arrays recombined from the αs²-times reconstructed high-spatial-resolution Y-channel light field images corresponding to the Y-channel images of all low-spatial-resolution light field images in the training set, the sub-aperture image arrays recombined from the images obtained by up-sampling these reconstructions once in spatial resolution, the blurred 2D high-resolution Y-channel images obtained by performing one spatial-resolution down-sampling and one spatial-resolution up-sampling on the 2D high-resolution Y-channel images for the third spatial super-resolution network, and all 2D high-resolution Y-channel images for the third spatial super-resolution network in the pyramid network are input into the third spatial super-resolution network of the constructed pyramid network for training, so as to obtain the αs³-times reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of each low-spatial-resolution light field image in the training set. The spatial-resolution up-sampling and down-sampling are both performed by bicubic interpolation, and their scale is equal to αs.
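The per-level preparation of labels and 2D high-resolution Y-channel images in steps 3_2 to 3_4 can be summarised by the following sketch. It assumes OpenCV bicubic resizing and NumPy-style image arrays; function names are illustrative only.

    import cv2

    def bicubic(img, factor):
        # bicubic rescaling by `factor` (> 1 up-samples, < 1 down-samples)
        h, w = img.shape[:2]
        return cv2.resize(img, (round(w * factor), round(h * factor)),
                          interpolation=cv2.INTER_CUBIC)

    def prepare_level_data(hr_lf_y, hr_2d_y, level, alpha_s=2, levels=3):
        # For pyramid level `level` (1-based): the label is the reference high-resolution
        # light field Y image down-sampled (levels - level) times, the 2D high-resolution Y
        # image for that level is down-sampled the same number of times, and its blurred
        # version is produced by one further down-sampling followed by one up-sampling.
        n_down = levels - level
        label = bicubic(hr_lf_y, (1.0 / alpha_s) ** n_down)
        hr_2d = bicubic(hr_2d_y, (1.0 / alpha_s) ** n_down)
        blurred_2d = bicubic(bicubic(hr_2d, 1.0 / alpha_s), alpha_s)
        return label, hr_2d, blurred_2d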
After the training is finished, the optimal weight parameters of all convolution kernels in each spatial super-resolution network in the pyramid network are obtained, yielding a well-trained spatial super-resolution network model. The network model realizes a specific super-resolution reconstruction scale at each pyramid level, so that multi-scale super-resolution results can be output in a single forward inference (namely scales of 2 ×, 4 × and 8 × when αs takes the value 2); in addition, by sharing the weights of the spatial super-resolution network across the pyramid levels, the number of network parameters can be effectively reduced and the training burden lightened.
And 4, step 4: randomly selecting a low-spatial-resolution light field image with three color channels and a corresponding 2D high-resolution image with three color channels as test images; then, converting the low-spatial-resolution light field image of the three color channels and the corresponding 2D high-resolution image of the three color channels from an RGB color space to a YCbCr color space, and extracting a Y-channel image; recombining the Y-channel images of the light field image with low spatial resolution into a sub-aperture image array for representation; inputting blurred 2D high-resolution Y-channel images obtained by performing primary spatial resolution down-sampling and primary spatial resolution up-sampling on the Y-channel images of the low-spatial resolution light field images, the Y-channel images of the 2D high-resolution images and the Y-channel images of the 2D high-resolution images into a spatial super-resolution network model, and testing to obtain reconstructed high-spatial resolution Y-channel light field images corresponding to the Y-channel images of the low-spatial resolution light field images; then performing bicubic interpolation up-sampling on the Cb channel image and the Cr channel image of the low-spatial-resolution light field image respectively to obtain a reconstructed high-spatial-resolution Cb channel light field image corresponding to the Cb channel image of the low-spatial-resolution light field image and a reconstructed high-spatial-resolution Cr channel light field image corresponding to the Cr channel image of the low-spatial-resolution light field image; and finally, cascading the obtained reconstructed high-spatial-resolution Y-channel light field image, the reconstructed high-spatial-resolution Cb-channel light field image and the reconstructed high-spatial-resolution Cr-channel light field image on the dimension of a color channel, and converting the cascading result into an RGB color space again to obtain the reconstructed high-spatial-resolution light field image of the color three channels corresponding to the low-spatial-resolution light field image.
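The colour handling at test time can be illustrated as follows. This sketch processes a single sub-aperture image for brevity, uses OpenCV (which orders the channels as Y, Cr, Cb), and assumes model_y wraps the trained Y-channel pyramid network and returns a NumPy array; it is not the claimed implementation.

    import cv2

    def reconstruct_color(lf_lr_rgb, model_y):
        # Y goes through the trained network; Cb and Cr are bicubically up-sampled;
        # the three channels are then merged and converted back to RGB.
        ycrcb = cv2.cvtColor(lf_lr_rgb, cv2.COLOR_RGB2YCrCb)
        y, cr, cb = cv2.split(ycrcb)
        y_sr = model_y(y)                                   # reconstructed Y-channel light field
        h, w = y_sr.shape
        cr_sr = cv2.resize(cr, (w, h), interpolation=cv2.INTER_CUBIC)
        cb_sr = cv2.resize(cb, (w, h), interpolation=cv2.INTER_CUBIC)
        sr_ycrcb = cv2.merge([y_sr, cr_sr, cb_sr])
        return cv2.cvtColor(sr_ycrcb, cv2.COLOR_YCrCb2RGB)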
In this embodiment, in step 2, the first residual block, the second residual block, the third residual block and the fourth residual block have the same structure, each being composed of a third convolutional layer and a fourth convolutional layer connected in sequence.
The input end of the third convolutional layer in the first residual block receives three inputs in parallel, namely the three sets of feature maps output by the second convolutional layer for the up-sampled light field sub-aperture array, for the blurred 2D high-resolution image and for IHR (the last set being YHR,1); for each of the three inputs, the output end of the third convolutional layer in the first residual block outputs 64 feature maps of the same width and height. The input end of the fourth convolutional layer in the first residual block receives, in parallel, the three sets of feature maps output by the third convolutional layer, and for each of them outputs 64 feature maps of the same width and height. Each of the three inputs of the first residual block is then added element by element to the corresponding set of feature maps output by the fourth convolutional layer, and the three sets of feature maps thus obtained are taken as the outputs of the first residual block for the three inputs; in particular, all feature maps in YHR,1 are added element by element to the corresponding feature maps output by the fourth convolutional layer, and the resulting set is YHR,2.
The second residual block processes its three parallel inputs in exactly the same way: its third and fourth convolutional layers each output 64 feature maps of the same width and height for each input, each input is added element by element to the corresponding set of feature maps output by the fourth convolutional layer, and the three resulting sets of feature maps are the outputs of the second residual block; in particular, all feature maps in YHR,2 are added element by element to the corresponding feature maps output by the fourth convolutional layer, and the resulting set is YHR,3.
The input end of the third convolutional layer in the third residual block receives all feature maps in FEn,3, and its output end outputs 64 feature maps of the same width and height; the input end of the fourth convolutional layer in the third residual block receives these feature maps, and its output end outputs 64 feature maps of the same width and height. All feature maps in FEn,3 are added element by element to all feature maps output by the fourth convolutional layer in the third residual block, the feature maps thus obtained are taken as all feature maps output by the output end of the third residual block, and the set formed by them is FDec,1.
The input end of the third convolutional layer in the fourth residual block receives all feature maps in FDec,1, and its output end outputs 64 feature maps of the same width and height; the input end of the fourth convolutional layer in the fourth residual block receives these feature maps, and its output end outputs 64 feature maps of the same width and height. All feature maps in FDec,1 are added element by element to all feature maps output by the fourth convolutional layer in the fourth residual block, the feature maps thus obtained are taken as all feature maps output by the output end of the fourth residual block, and the set formed by them is FDec,2.
In the above, the convolution kernels of the third convolutional layer and the fourth convolutional layer in each of the first, second, third and fourth residual blocks all have a size of 3 × 3, the convolution steps are all 1, the numbers of input channels are all 64 and the numbers of output channels are all 64; the activation function adopted by the third convolutional layer in each of these residual blocks is "ReLU", and the fourth convolutional layer does not adopt an activation function.
In this embodiment, in step 2, as shown in fig. 3a, 3b, 3c and 3d, the first enhancement residual block, the second enhancement residual block and the third enhancement residual block have the same structure, and each of them is composed of a first spatial characteristic transformation layer, a first spatial angle convolution layer, a second spatial characteristic transformation layer, a second spatial angle convolution layer and a channel attention layer which are connected in sequence, the first spatial characteristic transformation layer and the second spatial characteristic transformation layer have the same structure, and each of them is composed of a tenth convolution layer and an eleventh convolution layer which are parallel, the first spatial angle convolution layer and the second spatial angle convolution layer have the same structure, and each of them is composed of a twelfth convolution layer and a thirteenth convolution layer which are connected in sequence, and the channel attention layer is composed of a global mean value pooling layer, a fourteenth convolution layer and a fifteenth convolution layer which are connected in sequence.
An input of a tenth convolutional layer in the first spatial feature transform layer in the first enhanced residual block receives FAlign,1The output end of the tenth convolutional layer in the first spatial feature transform layer in the first enhanced residual block outputs 64 width mapsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedAn input of an eleventh convolutional layer in the first spatial feature transform layer in the first enhanced residual block receives FAlign,1The output end of the eleventh convolutional layer in the first spatial feature transform layer in the first enhanced residual block outputs 64 width mapsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedThe input of the first spatial feature transform layer in the first enhanced residual block receives FLRAll feature maps in (1), will FLRAll the characteristic diagrams in (1) andmultiplying all the characteristic graphs element by element, and comparing the multiplication result with the resultAll feature maps in (1) are added element by element, all obtained feature maps are used as all feature maps output by the output end of the first spatial feature conversion layer in the first enhanced residual block, and a set formed by the feature maps is recorded as a set
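The spatial feature transform layer described above amounts to modulating the light field features with a scale map and a shift map predicted from the registered high-resolution features. A possible sketch, assuming (as one reading of the text) that the tenth convolutional layer supplies the multiplicative map and the eleventh the additive map, neither with an activation function:

```python
import torch
import torch.nn as nn

class SpatialFeatureTransform(nn.Module):
    """Sketch of the spatial feature transform layer: the registered
    high-resolution features (cond) are mapped by two parallel 3x3
    convolutions to a scale map and a shift map, which modulate the
    light field features element-wise: out = feat * scale + shift."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.scale_conv = nn.Conv2d(channels, channels, 3, 1, 1)  # "tenth" conv (assumed role)
        self.shift_conv = nn.Conv2d(channels, channels, 3, 1, 1)  # "eleventh" conv (assumed role)

    def forward(self, feat: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        scale = self.scale_conv(cond)
        shift = self.shift_conv(cond)
        return feat * scale + shift
```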
An input of a twelfth of the first spatial angle convolutional layers in the first enhanced residual block receivesOf the first spatial angle convolutional layer in the first enhancement residual block, the output end of the twelfth convolutional layer of the first spatial angle convolutional layer outputs 64 widthsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedTo pairPerforms a re-assembly operation from a spatial dimension to an angular dimension (the re-assembly operation is a conventional processing means of light field images, the re-assembly operation only changes the arrangement order of each feature value in the feature map, and does not change the size of the feature value), and an input end of a thirteenth convolutional layer in the first spatial angle convolutional layer in the first enhanced residual block receivesThe output end of the thirteenth convolutional layer of the first space angle convolutional layer in the first enhancement residual block outputs 64 widths as the result of the reorganization operation of all the feature maps in (1)And has a height ofThe feature map of (1) represents a set of all feature maps outputtedTo pairAll the characteristics ofThe maps are recombined from an angle dimension to a space dimension, all feature maps obtained after the recombination operation are taken as all feature maps output by the output end of the first space angle convolution layer in the first enhanced residual block, and a set formed by the feature maps is recorded as a set
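The recombination operation between the spatial and angular dimensions is a fixed re-indexing of feature values between a sub-aperture-image layout and a macro-pixel layout. A possible sketch is given below; the assumption that the U × V sub-aperture images are tiled along the height and width axes in that order is one common convention and is not stated explicitly above:

```python
import torch

def spatial_to_angular(x: torch.Tensor, U: int, V: int) -> torch.Tensor:
    """Rearrange a (B, C, U*h, V*w) sub-aperture-image array into a
    (B, C, h*U, w*V) macro-pixel array; values are only re-ordered."""
    B, C, H, W = x.shape
    h, w = H // U, W // V
    x = x.view(B, C, U, h, V, w)
    x = x.permute(0, 1, 3, 2, 5, 4)       # swap angular and spatial axes
    return x.reshape(B, C, h * U, w * V)

def angular_to_spatial(x: torch.Tensor, U: int, V: int) -> torch.Tensor:
    """Inverse of spatial_to_angular."""
    B, C, H, W = x.shape
    h, w = H // U, W // V
    x = x.view(B, C, h, U, w, V)
    x = x.permute(0, 1, 3, 2, 5, 4)
    return x.reshape(B, C, U * h, V * w)
```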
The input terminal of the tenth convolutional layer in the second spatial feature transform layer in the first enhanced residual block receives FAlign,1The output end of the tenth convolutional layer in the second spatial feature transform layer in the first enhanced residual block outputs 64 width mapsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedAn input of an eleventh convolutional layer in the second spatial feature transform layer in the first enhanced residual block receives FAlign,1The output end of the eleventh convolutional layer in the second spatial feature transform layer in the first enhanced residual block outputs 64 width mapsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedThe input of the second spatial feature transform layer in the first enhanced residual block receivesAll the characteristic diagrams in (1) willAll the characteristic diagrams in (1) andmultiplying all the characteristic graphs element by element, and comparing the multiplication result with the resultThe obtained feature maps are used as all feature maps output by the output end of the second spatial feature transform layer in the first enhanced residual block, and the set formed by the feature maps is recorded as a set
An input of a twelfth of the second spatial angle convolutional layers in the first enhanced residual block receivesOf the twelfth convolutional layer of the second spatial angle convolutional layers in the first enhancement residual block outputs 64 width picturesAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedTo pairPerforms a recombination operation from a spatial dimension to an angular dimension, an input of a thirteenth of the second spatial-angular convolutional layers of the first enhancement residual block receivingThe output end of the thirteenth convolution layer of the second space angle convolution layer in the first enhanced residual error block outputs 64 width valuesAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedTo pairPerforming recombination operation from angle dimension to space dimension on all feature maps in the first enhancement residual block, taking all feature maps obtained after the recombination operation as all feature maps output by the output end of the second spatial angle convolution layer in the first enhancement residual block, and recording a set formed by the feature maps as a set
The input of the global mean pooling layer in the channel attention layer in the first enhanced residual block receivesThe output end of the global mean pooling layer in the channel attention layer in the first enhanced residual block outputs 64 feature maps with the width ofAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FGAP,1,FGAP,1In each characteristic diagram ofAll the eigenvalues of (1) are the same (the global mean pooling layer is to calculate the global mean value for each eigenvalue received at the input end independently, and then convert one eigenvalue into a single eigenvalue, and then copy the obtained eigenvalue to restore the space size, i.e. copy the single eigenvalueMultiple, get a width ofAnd has a height ofCharacteristic map of (1); the input of the fourteenth convolutional layer in the channel attention layer in the first enhanced residual block receives FGAP,1The output end of the fourteenth convolution layer in the channel attention layer in the first enhancement residual block outputs 4 widthAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FDS,1(ii) a The input of the fifteenth convolutional layer in the channel attention layer in the first enhanced residual block receives FDS,1Of the fifteenth convolutional layer in the channel attention layer in the first enhanced residual block outputs 64 widths ofAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FUS,1(ii) a F is to beUS,1All the characteristic diagrams in (1) andall feature maps in (1) are multiplied element by element, all obtained feature maps are used as all feature maps output by the output end of the channel attention layer in the first enhanced residual block, and a set formed by the feature maps is marked as FCA,1。
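A compact sketch of this channel attention layer (global mean pooling, a 1 × 1 convolution reducing 64 to 4 channels, a 1 × 1 convolution expanding back to 64 channels, and element-wise re-weighting) is given below; broadcasting is used in place of the explicit copy-back to the full spatial size, and the activation choices follow the parameter description given later:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Sketch of the channel attention layer: global average pooling,
    a 1x1 conv reducing 64 -> 4 channels with ReLU, a 1x1 conv expanding
    4 -> 64 channels with Sigmoid, and element-wise re-weighting."""

    def __init__(self, channels: int = 64, reduced: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                      # global mean pooling
        self.down = nn.Conv2d(channels, reduced, kernel_size=1)  # "fourteenth" conv
        self.up = nn.Conv2d(reduced, channels, kernel_size=1)    # "fifteenth" conv
        self.relu = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.pool(x)               # one value per channel
        w = self.relu(self.down(w))
        w = self.sigmoid(self.up(w))   # per-channel weights in (0, 1)
        return x * w                   # broadcasting replaces the explicit spatial copy
```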
F is to beCA,1All feature maps in (1) and (F)LRAll the feature maps in the first enhancement residual block are added element by element, all the obtained feature maps are used as all the feature maps output by the output end of the first enhancement residual block, and the set formed by the feature maps is FEn,1。
The input terminal of the tenth convolutional layer in the first spatial feature transform layer in the second enhanced residual block receives FAlign,2The output end of the tenth convolutional layer in the first spatial feature transform layer in the second enhanced residual block outputs 64 width mapsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedAn input of an eleventh convolutional layer in the first spatial feature transform layer in the second enhanced residual block receives FAlign,2The output end of the eleventh convolutional layer in the first spatial feature transform layer in the second enhanced residual block outputs 64 width mapsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedFirst spatial feature transform layer in second enhanced residual blockReceiving end of FEn,1All feature maps in (1), will FEn,1All the characteristic diagrams in (1) andmultiplying all the characteristic graphs element by element, and comparing the multiplication result with the resultThe obtained feature maps are used as all feature maps output by the output end of the first spatial feature transform layer in the second enhanced residual block, and the set formed by the feature maps is recorded as a set
An input of a twelfth of the first spatial angle convolutional layers in the second enhanced residual block receivesOf the twelfth convolutional layer of the first spatial angle convolutional layer in the second enhanced residual block outputs 64 width signalsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedTo pairPerforms a recombination operation from a spatial dimension to an angular dimension, an input of a thirteenth of the first spatial-angular convolutional layers of the second enhancement residual block receivingThe output end of the thirteenth convolutional layer of the first space angle convolutional layer in the second enhanced residual block outputs 64 width valuesAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedTo pairPerforming an operation of reconstructing all feature maps from an angle dimension to a space dimension, using all feature maps obtained after the operation of reconstructing as all feature maps output by an output end of a first space angle convolution layer in a second enhanced residual block, and recording a set formed by the feature maps as a set
An input of a tenth convolutional layer in a second spatial feature transform layer in the second enhanced residual block receives FAlign,2The output end of the tenth convolutional layer in the second spatial feature transform layer in the second enhanced residual block outputs 64 width mapsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedAn input of an eleventh convolutional layer in a second spatial feature transform layer in the second enhanced residual block receives FAlign,2The output end of the eleventh convolutional layer in the second spatial feature transform layer in the second enhanced residual block outputs 64 width mapsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedReceiving end of second spatial feature transform layer in second enhanced residual blockAll the characteristic diagrams in (1) willAll the characteristic diagrams in (1) andmultiplying all the characteristic graphs element by element, and comparing the multiplication result with the resultThe obtained feature maps are used as all feature maps output by the output end of the second spatial feature conversion layer in the second enhanced residual block, and the set formed by the feature maps is recorded as a set
An input of a twelfth of the second spatial angle convolutional layers in the second enhanced residual block receivesOf the second spatial angle convolutional layer in the second enhancement residual block, and an output terminal of a twelfth convolutional layer of the second spatial angle convolutional layerOut of 64 widthsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedTo pairPerforms a recombination operation from a spatial dimension to an angular dimension, an input of a thirteenth of the second spatial-angular convolutional layers of the second enhancement residual block receivingThe output end of the thirteenth convolution layer of the second space angle convolution layer in the second enhanced residual block outputs 64 width valuesAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedTo pairPerforming an operation of reconstructing all feature maps from an angle dimension to a space dimension, using all feature maps obtained after the operation of reconstructing as all feature maps output by an output end of a second space angle convolution layer in a second enhanced residual block, and recording a set formed by the feature maps as a set
The input of the global mean pooling layer in the channel attention layer in the second enhanced residual block receivesThe output end of the global mean pooling layer in the channel attention layer in the second enhanced residual block outputs 64 width picturesAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FGAP,2,FGAP,2All feature values in each feature map in (1) are the same; the input of the fourteenth convolutional layer in the channel attention layer in the second enhanced residual block receives FGAP,2The output end of the fourteenth convolution layer in the channel attention layer in the second enhanced residual block outputs 4 widthAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FDS,2(ii) a The input of the fifteenth convolutional layer in the channel attention layer in the second enhanced residual block receives FDS,2Of the fifteenth convolutional layer in the channel attention layer in the second enhanced residual block outputs 64 widths ofAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FUS,2(ii) a F is to beUS,2All the characteristic diagrams in (1) andthe obtained all feature maps are used as all feature maps output by the output end of the channel attention layer in the second enhanced residual block, and the set formed by the feature maps is marked as FCA,2。
F is to beCA,2All feature maps in (1) and (F)En,1All the feature maps in the first enhancement residual block are added element by element, all the obtained feature maps are used as all the feature maps output by the output end of the second enhancement residual block, and the set formed by the feature maps is FEn,2。
An input of a tenth convolutional layer in the first spatial feature transform layer in the third enhanced residual block receives FAlign,3The output end of the tenth convolutional layer in the first spatial feature transform layer in the third enhanced residual block outputs 64 width mapsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedAn input of an eleventh convolutional layer in the first spatial feature transform layer in the third enhanced residual block receives FAlign,3The output end of the eleventh convolutional layer in the first spatial feature transform layer in the third enhanced residual block outputs 64 width mapsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedReceiving F at receiving end of first spatial feature transform layer in third enhanced residual blockEn,2All feature maps in (1), will FEn,2All the characteristic diagrams in (1) andmultiplying all the characteristic graphs element by element, and comparing the multiplication result with the resultAll feature maps in (1) are added element by element, all obtained feature maps are used as all feature maps output by the output end of the first spatial feature conversion layer in the third enhanced residual block, and a set formed by the feature maps is recorded as a set
An input of a twelfth of the first spatial angle convolutional layers in the third enhanced residual block receivesOf the twelfth convolutional layer of the first spatial angle convolutional layer in the third enhanced residual block outputs 64 width signalsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedTo pairPerforming a recombination operation of converting from a spatial dimension to an angular dimension on all feature maps in the third enhanced residual block, performing a first spatial-angular convolution in the third enhanced residual blockInput reception of a thirteenth of the layersThe output end of the thirteenth convolutional layer of the first spatial angle convolutional layer in the third enhanced residual block outputs 64 widths as the result of the recombination operation of all the feature maps in (1)And has a height ofThe feature map of (1) represents a set of all feature maps outputtedTo pairAll feature maps in the third enhancement residual block are recombined from an angle dimension to a space dimension, all feature maps obtained after the recombination operation are taken as all feature maps output by the output end of the first space angle convolution layer in the third enhancement residual block, and a set formed by the feature maps is recorded as a set
An input of a tenth convolutional layer in the second spatial feature transform layer in the third enhanced residual block receives FAlign,3The output end of the tenth convolutional layer in the second spatial feature transform layer in the third enhanced residual block outputs 64 widthAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedAn input of an eleventh convolutional layer in the second spatial feature transform layer in the third enhanced residual block receives FAlign,3The output end of the eleventh convolutional layer in the second spatial feature transform layer in the third enhanced residual block outputs 64 width mapsAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedReceiving end of second spatial feature transform layer in third enhanced residual blockAll the characteristic diagrams in (1) willAll the characteristic diagrams in (1) andmultiplying all the characteristic graphs element by element, and comparing the multiplication result with the resultAll feature maps in (1) are added element by element, all obtained feature maps are used as all feature maps output by the output end of the second spatial feature conversion layer in the third enhanced residual block, and a set formed by the feature maps is recorded as a set
An input of a twelfth of the second spatial angle convolutional layers in the third enhanced residual block receivesOf the twelfth convolutional layer of the second spatial angle convolutional layer in the third enhanced residual block outputs 64 width picturesAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedTo pairPerforms a recombination operation from a spatial dimension to an angular dimension, an input of a thirteenth of the second spatial-angular convolutional layers of the third enhancement residual block receivingThe output end of the thirteenth convolution layer of the second space angle convolution layer in the third enhanced residual block outputs 64 width valuesAnd has a height ofThe feature map of (1) represents a set of all feature maps outputtedTo pairAll feature maps in the third enhancement residual block are recombined from an angle dimension to a space dimension, and all feature maps obtained after the recombination operation are used as all feature maps output by the output end of the second space angle convolution layer in the third enhancement residual blockThe set of these feature maps is described as
The input of the global mean pooling layer in the channel attention layer in the third enhanced residual block receivesThe output end of the global mean pooling layer in the channel attention layer in the third enhanced residual block outputs 64 width imagesAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FGAP,3,FGAP,3All feature values in each feature map in (1) are the same; the input of the fourteenth convolutional layer in the channel attention layer in the third enhanced residual block receives FGAP,3The output end of the fourteenth convolution layer in the channel attention layer in the third enhanced residual block outputs 4 widthAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FDS,3(ii) a The input of the fifteenth convolutional layer in the channel attention layer in the third enhanced residual block receives FDS,3Of the fifteenth convolutional layer in the channel attention layer in the third enhanced residual block outputs 64 widths ofAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FUS,3(ii) a F is to beUS,3All the characteristic diagrams in (1) andall feature maps in (1) are multiplied element by element, all obtained feature maps are used as all feature maps output by the output end of the channel attention layer in the third enhanced residual block, and a set formed by the feature maps is marked as FCA,3。
F is to beCA,3All feature maps in (1) and (F)En,2All the feature maps in the third enhancement residual block are added element by element, all the obtained feature maps are used as all the feature maps output by the output end of the third enhancement residual block, and the set formed by the feature maps is FEn,3。
In the above, the sizes of convolution kernels of the tenth convolution layer and the eleventh convolution layer in each of the first enhancement residual block, the second enhancement residual block and the third enhancement residual block are all 3 × 3, the convolution step lengths are all 1, the number of input channels is all 64, the number of output channels is all 64, and no activation function is adopted, the sizes of convolution kernels of the twelfth convolution layer and the thirteenth convolution layer in each of the first enhancement residual block, the second enhancement residual block and the third enhancement residual block are all 3 × 3, the convolution step lengths are all 1, the number of input channels is all 64, the number of output channels is 64, the adopted activation functions are all "ReLU", the sizes of convolution kernels of the fourteenth convolution layer in each of the first enhancement residual block, the second enhancement residual block and the third enhancement residual block are 1 × 1, the convolution step lengths are 1, the number of input channels is 64, the number of output channels is 4, and the adopted activation function is "ReLU", the size of the convolution kernel of the fifteenth convolution layer in each of the first, second, and third enhanced residual blocks is 1 × 1, the convolution step is 1, the number of input channels is 4, the number of output channels is 64, and the employed activation function is "Sigmoid".
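Putting the pieces together, a schematic sketch of one enhanced residual block with the parameters listed above may look as follows; it reuses the SpatialFeatureTransform, ChannelAttention and spatial/angular reorganization sketches given earlier, and the module names and calling convention are illustrative assumptions rather than the exact implementation:

```python
import torch
import torch.nn as nn

class SpatialAngularConv(nn.Module):
    """Sketch of a spatial-angle convolution layer: a 3x3 conv in the
    sub-aperture (spatial) layout followed by a 3x3 conv in the
    macro-pixel (angular) layout, both with ReLU."""

    def __init__(self, channels: int = 64, U: int = 5, V: int = 5):
        super().__init__()
        self.U, self.V = U, V
        self.spatial_conv = nn.Conv2d(channels, channels, 3, 1, 1)  # "twelfth" conv
        self.angular_conv = nn.Conv2d(channels, channels, 3, 1, 1)  # "thirteenth" conv
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.relu(self.spatial_conv(x))
        x = spatial_to_angular(x, self.U, self.V)
        x = self.relu(self.angular_conv(x))
        return angular_to_spatial(x, self.U, self.V)

class EnhancedResidualBlock(nn.Module):
    """Sketch of one enhanced residual block: SFT -> spatial-angle conv ->
    SFT -> spatial-angle conv -> channel attention, plus a skip connection
    from the light field feature input."""

    def __init__(self, channels: int = 64, U: int = 5, V: int = 5):
        super().__init__()
        self.sft1 = SpatialFeatureTransform(channels)
        self.sac1 = SpatialAngularConv(channels, U, V)
        self.sft2 = SpatialFeatureTransform(channels)
        self.sac2 = SpatialAngularConv(channels, U, V)
        self.ca = ChannelAttention(channels)

    def forward(self, lf_feat: torch.Tensor, aligned_feat: torch.Tensor) -> torch.Tensor:
        x = self.sft1(lf_feat, aligned_feat)
        x = self.sac1(x)
        x = self.sft2(x, aligned_feat)
        x = self.sac2(x)
        x = self.ca(x)
        return x + lf_feat   # e.g. F_CA,1 + F_LR -> F_En,1
```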
To further illustrate the feasibility and effectiveness of the method of the present invention, experiments were conducted.
The method is implemented with the PyTorch deep learning framework. The light field images used for training and testing come from existing light field image databases, which include real-world scenes and synthetic scenes and are freely available for download over the internet. To ensure the reliability and robustness of the test, 200 light field images are randomly selected to form the training image set and 70 light field images are selected to form the test image set, where the light field images in the training image set and those in the test image set do not overlap. The basic information of the light field image databases used by the training and test image sets is shown in Table 1. The 4 light field image databases EPFL [1], INRIA [2], STFLytro [6] and Kalantari et al. [7] were captured with a Lytro light field camera, so the resulting light field images belong to narrow-baseline light field data; the STFGantry [5] light field image database was captured by moving a conventional camera mounted on a gantry, so the resulting light field images have a larger baseline range and belong to wide-baseline light field data; the light field images in the HCI new [3] and HCI old [4] light field image databases are artificially synthesized and also belong to wide-baseline light field data.
TABLE 1 basic information of light field image database used for training and testing image sets
The reference information (or download website) corresponding to the light field image database used by the training image set and the testing image set is as follows:
[1] rerabek M, Edbrohimi T.New light Field Image data set [ C ]//2016 Eighth International Conference on Quality of Multimedia Experience (QoMEX).2016 (New lightfield Image Dataset [ C ]// Eighth International Conference on Quality of Multimedia Experience, 2016.)
[2] Pendu M L, Jiang X, Guilleot C.light Field input Propagation Low speed Matrix Completion [ J ]. IEEE Transactions on Image Processing,2018,27(4): 1981. sup. supplement 1993 (Propagation of light Field repair by Low-Rank Matrix Completion, IEEE Image Processing journal, 2018,27(4): 1981. sup. supplement 1993)
[3] Honauer K, Johannsen O, Kondermann D, et al.A Dataset and Evaluation method for Depth Estimation on 4D Light Fields [ C ]// Asian Conference on Computer Vision,2016 (one Dataset for 4D Light field Depth Estimation and Evaluation method [ C ]// Asian Computer Vision Conference, 2016.)
[4] Wanner S, Meister S, B Goldquecke. Datases and Benchmarks for Densely Sampled 4D Light Fields [ C ]// International Symposium on Vision Modeling and Visualization,2013 (data sets for dense sampling 4D Light Fields and reference [ C ]// visual Modeling and Visualization International seminar, 2013.)
[5] Vaish V, Adams a. the (New) Stanford Light Field Archive, Computer Graphics Laboratory, Stanford University,2008. ((New) Stanford Light Field Archive, Computer Graphics Laboratory, Stanford University, 2008.)
[6] Raj A S, Lowney M, Shah R, Wetzstein G.Stanford Lytro Light Field Archive, Available: http:// lightfields.stanford.edu/index.html. (Stanford Lytro lightfield Archive, Available website: http:// lightfields.stanford.edu/index.html.)
[7] Kalantari N K, Wang T C, Ramamotorthi R.Learing-Based View Synthesis For Light Field Cameras [ J ]. ACM Transactions on Graphics,2016,35(6):1-10. (For learning-Based View Synthesis For Light Field Cameras [ J ]. ACM Graphics,2016,35 (6):1-10.)
Respectively recombining the light field images in the training image set and the test image set into a sub-aperture image array; considering that there is vignetting effect in the light field camera (appearing as low visual quality of the boundary sub-aperture image), the angular resolution of the light field image used for training and testing is clipped to 9 × 9, i.e. only the central high quality 9 × 9 view is taken; then, taking a 5 × 5 view of the center from the obtained light field image with the angular resolution of 9 × 9 to form a light field image with the angular resolution of 5 × 5, and performing spatial resolution downsampling on the light field image by using a bicubic interpolation method, wherein the downsampling scale is 8, namely the spatial resolution of the light field image is reduced to 1/8 of the original light field image, so as to obtain a light field image with low spatial resolution; taking the original light field image with the angular resolution of 5 multiplied by 5 as a reference high spatial resolution light field image (namely a label image); then, one sub-aperture image is selected from the initial 9 × 9 views (excluding the central 5 × 5 view) and the resolution is kept unchanged, so as to obtain a 2D high resolution image. Thus, the final training set includes an array of sub-aperture images recombined with 200Y-channel images of low spatial resolution light field images with angular resolution of 5 × 5, corresponding Y-channel images of 200 2D high resolution images, and corresponding Y-channel images of 200 reference high spatial resolution light field images; the final test set comprises a subaperture image array recombined by 70Y-channel images of low spatial resolution light field images with angular resolution of 5 x 5, corresponding Y-channel images of 70 2D high resolution images and corresponding 70 reference high spatial resolution light field images, wherein the 70 reference high spatial resolution light field images are not related to network inference or test and are only used for subsequent subjective visual comparison and objective quality evaluation.
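A possible sketch of the central-view cropping and bicubic spatial down-sampling described above is given below; the tensor layout (U, V, H, W), the helper name and the use of PyTorch's bicubic interpolation in place of the original interpolation routine are assumptions, and H and W are assumed divisible by the down-sampling scale:

```python
import torch
import torch.nn.functional as F

def prepare_low_resolution_views(sai: torch.Tensor, scale: int = 8, crop: int = 5):
    """Sketch of the data preparation step: keep the central crop x crop
    views of a (U, V, H, W) sub-aperture image stack and down-sample each
    view's spatial resolution by 'scale' with bicubic interpolation."""
    U, V, H, W = sai.shape
    u0, v0 = (U - crop) // 2, (V - crop) // 2
    center = sai[u0:u0 + crop, v0:v0 + crop]          # central crop x crop views
    lr = F.interpolate(center.reshape(crop * crop, 1, H, W),
                       scale_factor=1.0 / scale, mode="bicubic",
                       align_corners=False)
    return center, lr.reshape(crop, crop, H // scale, W // scale)
```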
When training the constructed spatial super-resolution network, the parameters of all convolution kernels are initialized with the MSRA initializer; the loss function is a combination of a pixel-domain L1-norm loss and a gradient loss; the network is trained with the ADAM optimizer. First, with a learning rate of 10⁻⁴, the encoder and decoder parts of the spatial super-resolution network are trained until they converge to a certain degree; then, with the learning rate again set to 10⁻⁴, the whole spatial super-resolution network is trained, and the learning rate is decayed by a scale factor of 0.5 after every 25 training epochs.
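A hedged sketch of this training configuration (pixel-domain L1 loss plus gradient loss, MSRA initialization, ADAM optimization, and step decay of the learning rate) is given below; the relative weight of the gradient term and the exact gradient operator are assumptions not specified in the text:

```python
import torch
import torch.nn.functional as F

def gradient_l1_loss(pred: torch.Tensor, target: torch.Tensor, grad_weight: float = 1.0):
    """Pixel-domain L1 loss plus an L1 loss on horizontal/vertical image
    gradients; grad_weight is an assumed hyper-parameter."""
    l1 = F.l1_loss(pred, target)
    dx_p, dy_p = pred[..., :, 1:] - pred[..., :, :-1], pred[..., 1:, :] - pred[..., :-1, :]
    dx_t, dy_t = target[..., :, 1:] - target[..., :, :-1], target[..., 1:, :] - target[..., :-1, :]
    grad = F.l1_loss(dx_p, dx_t) + F.l1_loss(dy_p, dy_t)
    return l1 + grad_weight * grad

# MSRA (He) initialization for all convolution kernels, e.g.:
# for m in network.modules():
#     if isinstance(m, torch.nn.Conv2d):
#         torch.nn.init.kaiming_normal_(m.weight)
#
# Optimizer and learning-rate schedule as described:
# optimizer = torch.optim.Adam(network.parameters(), lr=1e-4)
# scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=25, gamma=0.5)
```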
To illustrate the performance of the method of the present invention, it is compared with the existing bicubic interpolation method and six existing image super-resolution reconstruction methods: the method based on the deep back-projection network proposed by Haris et al., the method based on the deep Laplacian pyramid network proposed by Lai et al., the method based on spatial-angular separable convolution proposed by Yeung et al., the method based on the spatial-angular interaction network proposed by Wang et al., the method based on the two-stage network proposed by Jin et al., and the method based on hybrid input proposed by Boominathan et al. Among them, the methods of Haris et al. and Lai et al. are 2D image super-resolution reconstruction methods (applied independently to each sub-aperture image of the light field image), the methods of Yeung et al., Wang et al. and Jin et al. are ordinary light field image spatial super-resolution reconstruction methods, and the method of Boominathan et al. is a light field image spatial super-resolution reconstruction method using hybrid input.
Here, the objective quality evaluation indices used include PSNR (Peak Signal-to-Noise Ratio), SSIM (Structural Similarity Index), and an advanced objective quality evaluation index for light field images (see Min X, Zhou J, Zhai G, et al. A Metric for Light Field Reconstruction, Compression, and Display Quality Evaluation [J]. IEEE Transactions on Image Processing, 2020, 29: 3790-). PSNR measures pixel-level reconstruction fidelity, and a higher value indicates better image quality; SSIM evaluates the objective quality of the super-resolution reconstructed image from the perspective of visual perception, takes values between 0 and 1, and a higher value indicates better image quality; the objective quality evaluation index for light field images evaluates the objective quality of the super-resolution reconstructed image by jointly measuring the spatial quality (texture and detail) and the angular quality (parallax structure) of the light field image, and a higher value likewise indicates better image quality.
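As a concrete reference for the first index, PSNR can be computed per sub-aperture image and averaged over the light field; a minimal sketch, assuming pixel values normalized to [0, 1]:

```python
import torch

def lf_psnr(pred: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> float:
    """Mean PSNR (dB) over all sub-aperture images of a (U, V, H, W) light
    field, assuming pixel values are normalized to [0, max_val]."""
    mse = ((pred - target) ** 2).flatten(2).mean(dim=-1)          # per-view MSE, shape (U, V)
    psnr = 10.0 * torch.log10(max_val ** 2 / mse.clamp_min(1e-12))
    return psnr.mean().item()
```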
Table 2 shows the comparison of the method of the present invention with the existing bicubic interpolation method and the existing light field image spatial super-resolution reconstruction methods on the PSNR (dB) index, Table 3 shows the comparison on the SSIM index, and Table 4 shows the comparison on the objective quality evaluation index for light field images. As can be seen from the objective data listed in Tables 2, 3 and 4, compared with the existing light field image spatial super-resolution reconstruction methods (including the 2D image super-resolution reconstruction methods), the method of the present invention obtains higher quality scores on all three objective quality evaluation indices used, significantly higher than all comparison methods, which indicates that the method of the present invention can effectively reconstruct the texture and detail information of the light field image while recovering a better parallax structure; in particular, for light field image databases with different baseline ranges and scene contents, the method of the present invention achieves the best super-resolution reconstruction effect, which shows that it can handle both narrow-baseline and wide-baseline light field data well and has good robustness to scene content.
TABLE 2 Comparison of the method of the present invention with the existing bicubic interpolation method and the existing light field image spatial super-resolution reconstruction methods on the PSNR (dB) index
TABLE 3 Comparison of the method of the present invention with the existing bicubic interpolation method and the existing light field image spatial super-resolution reconstruction methods on the SSIM index
TABLE 4 Comparison of the method of the present invention with the existing bicubic interpolation method and the existing light field image spatial super-resolution reconstruction methods on the objective quality evaluation index for light field images
FIG. 5a shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database by a bicubic interpolation method, where a sub-aperture image under a central coordinate is taken for display; FIG. 5b shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database using Haris et al, where a sub-aperture image at a central coordinate is taken for display; FIG. 5c shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database by the method of Lai et al, where a sub-aperture image under a central coordinate is taken for display; FIG. 5d shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database by the method of Yeung et al, where a sub-aperture image under a central coordinate is taken for display; FIG. 5e shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database using the method of Wang et al, where a sub-aperture image at a central coordinate is taken for display; FIG. 5f shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database by the method of Jin et al, where a sub-aperture image under a central coordinate is taken for display; FIG. 5g shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database by using Boominathan et al, where a sub-aperture image under a central coordinate is taken for display; FIG. 5h shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database using the method of the present invention, where a sub-aperture image under a central coordinate is taken for display; fig. 5i shows the label high spatial resolution light field image corresponding to the low spatial resolution light field image in the EPFL light field image database under test, where the sub-aperture image in the central coordinate is taken for presentation.
FIG. 6a shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database using a bicubic interpolation method, where a sub-aperture image in a central coordinate is taken for display; FIG. 6b shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database using Haris et al, where a sub-aperture image at a central coordinate is taken for display; FIG. 6c shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by the method of Lai et al, where a sub-aperture image under a central coordinate is taken for display; FIG. 6d shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by a method of Yeung et al, where a sub-aperture image under a central coordinate is taken for display; FIG. 6e shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by the method of Wang et al, where a sub-aperture image in a central coordinate is taken for display; FIG. 6f shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by a method of Jin et al, where a sub-aperture image under a central coordinate is taken for display; FIG. 6g shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by using Boominathan et al, where a sub-aperture image under a central coordinate is taken for display; FIG. 6h shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database using the method of the present invention, where a sub-aperture image at a central coordinate is taken for display; fig. 6i shows the label high spatial resolution light field image corresponding to the low spatial resolution light field image in the STFLytro light field image database under test, here shown as a sub-aperture image in central coordinates.
Comparing fig. 5a to 5h with fig. 5i, and comparing fig. 6a to 6h with fig. 6i, respectively, it can be clearly seen that, with the existing light field image spatial super-resolution reconstruction methods, including the 2D image super-resolution reconstruction method, the reconstructed high spatial resolution light field image cannot recover the texture and detail information of the image, as shown in the lower left rectangular frame enlarged region in fig. 5a to 5f, and the lower right rectangular frame enlarged region in fig. 6a to 6 f; using the hybrid input light field image spatial super resolution reconstruction method achieves relatively better results but still contains some blurring artifacts as shown by the lower left rectangular box magnified region in fig. 5g and the lower right rectangular box magnified region in fig. 6 g; in contrast, the high spatial resolution light field image reconstructed by the method of the present invention has clear texture and rich details, and is close to the label high spatial resolution light field image (i.e. fig. 5i and fig. 6i) in subjective visual perception, which indicates that the method of the present invention can effectively recover the texture information of the light field image. In addition, by reconstructing each sub-aperture image with high quality, the method of the invention can well ensure the parallax structure of the finally reconstructed high-spatial-resolution light field image.
The innovation of the method is mainly as follows: firstly, acquiring abundant 2D spatial information while capturing high-dimensional light field data through heterogeneous imaging, namely capturing a light field image and a 2D high-resolution image simultaneously, further effectively improving the spatial resolution of the light field image by utilizing the information of the 2D high-resolution image, and recovering corresponding textures and details; secondly, in order to establish and explore the relation between the light field image and the 2D high-resolution image, the method respectively constructs an aperture-level feature registration module and a light field feature enhancement module, wherein the aperture-level feature registration module can accurately register 2D high-resolution information and 4D light field image information, and the light field feature enhancement module can consistently enhance visual information in light field features by using high-resolution feature information obtained by registration on the basis to obtain enhanced high-resolution light field features; and thirdly, a flexible pyramid reconstruction mode is adopted, namely the spatial resolution of the light field image is gradually improved and an accurate parallax structure is recovered by a coarse-to-fine reconstruction strategy, and then a multi-scale super-resolution result can be reconstructed in one-time forward inference. In addition, to reduce the number of parameters and training burden of the pyramid network, weight sharing is performed at each pyramid level.
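A schematic sketch of the weight-shared pyramid inference described in the third point is given below; the calling convention of the per-level network and the preparation of the 2D guidance image are simplified assumptions:

```python
import torch
import torch.nn.functional as F

def pyramid_super_resolve(net, lf_lr, hr_guidance, levels: int = 3, scale_per_level: int = 2):
    """Schematic sketch of the weight-shared pyramid: the same spatial
    super-resolution network 'net' is applied once per level, each call
    receiving the previous level's output together with the 2D
    high-resolution guidance image bicubically resized to that level.
    The (light field, guidance) calling convention of 'net' is an assumption."""
    outputs = []
    current = lf_lr
    for level in range(levels):
        factor = scale_per_level ** (levels - 1 - level)
        guidance = F.interpolate(hr_guidance, scale_factor=1.0 / factor,
                                 mode="bicubic", align_corners=False)
        current = net(current, guidance)   # shared weights at every pyramid level
        outputs.append(current)            # multi-scale results from one forward pass
    return outputs
```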
Claims (3)
1. A light field image space super-resolution reconstruction method is characterized by comprising the following steps:
step 1: selecting Num color three-channel low-spatial-resolution light field images with spatial resolution of W multiplied by H and angular resolution of V multiplied by U, corresponding Num color three-channel 2D high-resolution images with resolution of alpha W multiplied by alpha H, and corresponding Num color three-channel reference high-spatial-resolution light field images with spatial resolution of alpha W multiplied by alpha H and angular resolution of V multiplied by U; wherein Num is more than 1, alpha represents the spatial resolution improvement multiple, and the value of alpha is more than 1;
step 2: constructing a convolutional neural network as a spatial super-resolution network: the spatial super-resolution network comprises an encoder for extracting multi-scale features, an aperture level feature registration module for registering light field features and 2D high-resolution features, a shallow layer feature extraction layer for extracting shallow layer features from a low spatial resolution light field image, a light field feature enhancement module for fusing the light field features and the 2D high-resolution features, a spatial attention block for relieving registration errors in the coarse-scale features, and a decoder for reconstructing potential features into the light field image;
for the encoder, the encoder is composed of a first convolution layer, a second convolution layer, a first residual block and a second residual block which are connected in sequence, wherein the input end of the first convolution layer receives three inputs in parallel, and each input is a frame with spatial resolution of W × H and angle divisionSingle-channel image L of low-spatial-resolution light field image with resolution V multiplied by ULRThe width of the image reconstruction obtained after the spatial resolution up-sampling is alphasW x V and height of alphasH × U subaperture image array, which is denoted asA width of alphasW and a height of alphasThe single-channel image of the blurred 2D high-resolution image of H is described asAnd a width of alphasW and a height of alphasSingle channel image of H2D high resolution image, denoted as IHRThe output end of the first convolution layer is directed toOutput 64 frames with width alphasW x V and height of alphasH × U signature graph, will be directed toThe set of all the output feature maps is denoted asOutput terminal of the first winding layer is aimed atOutput 64 frames with width alphasW and a height of alphasH characteristic diagram, will be directed toThe set of all the output feature maps is denoted asOutput terminal of the first convolution layer is directed to IHROutput 64 frames with width alphasW and a height of alphasH signature of H will be directed to IHRThe set of all the output feature maps is denoted as YHR,0(ii) a The input terminal of the second convolutional layer receives three inputs in parallel, respectivelyAll the characteristic diagrams in (A),All feature maps and Y in (1)HR,0All feature maps in (1), the output of the second convolutional layer being directed toOutput 64 frames with width ofAnd has a height ofWill be directed toThe set of all the output feature maps is denoted asOutput terminal of the second convolution layer is aimed atOutput 64 frames with width ofAnd has a height ofWill be directed toThe set of all the output feature maps is denoted asThe output end of the second convolution layer is directed to YHR,0Output 64 frames with width ofAnd has a height ofWill be directed to YHRAnd the set of all the characteristic diagrams output by 0 is marked as YHR,1(ii) a The input terminal of the first residual block receives three inputs in parallel, respectivelyAll the characteristic diagrams in (A),All feature maps and Y in (1)HR,1The output of the first residual block is directed toOutput 64 frames with width ofAnd has a height ofWill be directed toThe set of all the output feature maps is denoted asOutput of the first residual block is directed toOutput 64 frames with width ofAnd has a height ofWill be directed toThe set of all the output feature maps is denoted asOutput of the first residual block for YHR,1Output 64 frames with width ofAnd has a height ofWill be directed to YHR,1The set of all the output feature maps is denoted as YHR,2(ii) a The input terminal of the second residual block receives three inputs in parallel, respectivelyAll the characteristic diagrams in (A),All feature maps and Y in (1)HR,2Of the second residual block, the output of the second residual block being directed toOutput 64 frames with width ofAnd has a height ofWill be directed toThe set of all the output feature maps is denoted asOutput pair of second residual blockOutput 64 frames with width ofAnd has a height ofWill be directed toThe set of all the output feature maps is denoted asOutput of the second residual block for 
YHR,2Output 64 frames with width ofAnd has a height ofWill be directed to YHR,2The set of all the output feature maps is denoted as YHR,3(ii) a Wherein,is a single-channel image L of a low spatial resolution light-field image with spatial resolution W × H and angular resolution V × ULRThe width of the image recombination obtained after the bicubic interpolation up-sampling is alphasW x V and height of alphasAn array of H U sub-aperture images,to pass through the pair IHRFirstly carrying out bicubic interpolation downsampling and then carrying out bicubic interpolation upsampling to obtain alphasRepresenting a spatial resolution sampling factor, alphas 3Alpha, the up-sampling factor of the up-sampling of the bicubic interpolation and the down-sampling factor of the down-sampling of the bicubic interpolation both take the value of alphasThe size of the convolution kernel of the first convolution layer is 3 × 3, the convolution step is 1, the number of input channels is 1, the number of output channels is 64, the size of the convolution kernel of the second convolution layer is 3 × 3, the convolution step is 2, the number of input channels is 64, the number of output channels is 64, and the activation functions adopted by the first convolution layer and the second convolution layer are both 'ReLU';
for the aperture level feature registration module, the input end of the aperture level feature registration module receives three types of feature maps, wherein the first type isAll characteristic diagrams in (1), the second class isThe third class includes four inputs, respectively YHR,0All feature maps in (1), YHR,1All feature maps in (1), YHR,2All feature maps in (1), YHR,3All feature maps in (1); in the aperture level feature registration module, first, the image data is processedAll feature maps in (1), YHR,0All feature maps in (1), YHR,1All feature maps in (1), YHR,2All ofFeature map and YHR,3All feature maps in (1) are each replicated by a factor of V × U, so thatAll feature maps in (1), YHR,1All feature maps in (1), YHR,2All feature maps and Y in (1)HR,3Becomes the width of all the feature maps inAnd the height becomesI.e. to obtain the dimensions andand matching the size of the feature map in (1) with YHR,0Becomes asW x V and height becomes alphasH × U, i.e. to size andthe dimensions of the feature maps in (1) match; then toAll characteristic figures in (1) andall the characteristic diagrams in the method are subjected to block matching, and a width of the characteristic diagram is obtained after the block matching is finishedAnd has a height ofIs marked as PCI(ii) a Then according to PCIIs a reaction of YHR,1All the characteristic diagrams in (1) andall feature maps in (1) are subjected to spatial position registration to obtain 64 feature maps with the width ofAnd has a height ofThe obtained set of all the registration feature maps is denoted as FAlign,1(ii) a Also according to PCIIs a reaction of YHR,2All the characteristic diagrams in (1) andall feature maps in (1) are subjected to spatial position registration to obtain 64 feature maps with the width ofAnd has a height ofThe obtained set of all the registration feature maps is denoted as FAlign,2(ii) a According to PCIIs a reaction of YHR,3All the characteristic diagrams in (1) andall feature maps in (1) are subjected to spatial position registration to obtain 64 feature maps with the width ofAnd has a height ofThe obtained set of all the registration feature maps is denoted as FAlign,3(ii) a For P againCIPerforming bicubic interpolation up-sampling to obtain a frame with width alphasW is multiplied by V andheight of alphasH × U coordinate index diagram, notedFinally according toWill YHR,0All the characteristic diagrams in (1) andall feature maps in the image are registered in space position to obtain 64 pieces of width alphasW x V and height of alphasH × U registration feature map, and F represents a set of all the obtained registration feature mapsAlign,0(ii) a Output F of aperture level feature registration moduleAlign,0All characteristic diagrams in (1), FAlign,1All characteristic diagrams in (1), FAlign,2All feature maps and F in (1)Align,3All feature maps in (1); wherein, the precision measurement index for block matching is a texture and structure similarity index, the size of the block for block matching is 3 multiplied by 3, and the up-sampling factor of the bicubic interpolation up-sampling is alphas;
For the shallow feature extraction layer, it is composed of 1 fifth convolution layer, the input end of which receives a single-channel image L of a low spatial resolution light field image with spatial resolution WxH and angular resolution VxULRThe output end of the fifth convolution layer outputs 64 characteristic diagrams with the width of W multiplied by V and the height of H multiplied by U, and the set formed by all the output characteristic diagrams is denoted as FLR(ii) a The convolution kernel of the fifth convolution layer has a size of 3 × 3, a convolution step size of 1, a number of input channels of 1, a number of output channels of 64, and the activation function adopted by the fifth convolution layer is "ReLU";
for the light field characteristic enhancement module, the light field characteristic enhancement module consists of a first enhancement residual block, a second enhancement residual block and a third enhancement residual block which are connected in sequence, wherein the input end of the first enhancement residual block receives FAlign,1All feature maps and F in (1)LROf 64 width at the output of the first enhancement residual blockAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FEn,1(ii) a The input of the second enhanced residual block receives FAlign,2All feature maps and F in (1)En,1Of 64 widths at the output of the second enhanced residual blockAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FEn,2(ii) a The input of the third enhanced residual block receives FAlign,3All feature maps and F in (1)En,2Of 64 width at the output of the third enhanced residual blockAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FEn,3;
For a spatial attention block, which consists of a sixth convolutional layer and a seventh convolutional layer connected in sequence, the input of the sixth convolutional layer receives FAlign,0The output end of the sixth convolutional layer outputs 64 characteristic graphs with the width of alphasW x V and height of alphasH × U spatial attention feature map, and F represents a set of all output spatial attention feature mapsSA1(ii) a Input terminal of seventh convolution layer receiving FSA1In (1)All spatial attention feature maps, the output end of the seventh convolutional layer outputs 64 width alphasW x V and height of alphasH × U spatial attention feature map, and F represents a set of all output spatial attention feature mapsSA2(ii) a F is to beAlign,0All feature maps in (1) and (F)SA2Multiplying all the spatial attention feature maps element by element, and recording the set formed by all the obtained feature maps as FWA,0(ii) a F is to beWA,0As all feature maps output by the output end of the spatial attention block; the sizes of convolution kernels of the sixth convolution layer and the seventh convolution layer are both 3 multiplied by 3, convolution step lengths are both 1, the number of input channels is 64, the number of output channels is 64, the activation function adopted by the sixth convolution layer is 'ReLU', and the activation function adopted by the seventh convolution layer is 'Sigmoid';
for the decoder, the decoder is composed of a third residual block, a fourth residual block, a sub-pixel convolution layer, an eighth convolution layer and a ninth convolution layer which are connected in sequence, wherein the input end of the third residual block receives FEn,3Of 64 widths at the output of the third residual blockAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FDec,1(ii) a The input of the fourth residual block receives FDec,1Of 64 width at the output of the fourth residual blockAnd has a height ofThe feature map of (1) is a set of all feature maps of (1) output, denoted as FDec,2(ii) a Input terminal of sub-pixel convolution layer receiving FDec,2All characteristic diagrams in (1)The output end of the sub-pixel convolution layer outputs 256 widthsAnd has a height ofAnd 256 widths are set asAnd has a height ofFurther converting the feature map into 64 pieces with the width alphasW x V and height of alphasH × U feature graph, and F represents a set of all converted feature graphsDec,3(ii) a Input terminal of eighth convolution layer receiving FDec,3All feature maps in (1) and (F)WA,0The result of element-by-element addition of all the feature maps in (1), the output end of the eighth convolutional layer outputs 64 width alphasW x V and height of alphasH × U feature map, and F represents a set of all output feature mapsDec,4(ii) a Input terminal of the ninth convolutional layer receives FDec,4The output end of the ninth convolutional layer outputs a characteristic diagram with a width of alphasW x V and height of alphasH multiplied by U, the single-channel light field image is reconstructed, and the width is alphasW x V and height of alphasReconstruction of H multiplied by U single-channel light field image into alpha-space resolutionsW×αsH and high spatial resolution single-channel light field image with angular resolution of V multiplied by U, which is recorded as LSR(ii) a The convolution kernel of the sub-pixel convolution layer has the size of 3 multiplied by 3, the convolution step is 1, the number of input channels is 64, the number of output channels is 256, the convolution kernel of the eighth convolution layer has the size of 3 multiplied by 3, the convolution step is 1, the number of input channels is 64, the number of output channels is 64, the convolution kernel of the ninth convolution layer has the size of 1 multiplied by 1, the convolution step is 1, the number of input channels is 64, the number of output channels is 1, and excitation adopted by the sub-pixel convolution layer and the eighth convolution layerThe active functions are all 'ReLU', and the ninth convolution layer does not adopt the active function;
Step 3: performing color space conversion on each low-spatial-resolution light field image in the training set, its corresponding 2D high-resolution image and its corresponding reference high-spatial-resolution light field image, i.e. converting them from the RGB color space to the YCbCr color space, and extracting the Y-channel images; reorganizing the Y-channel image of each low-spatial-resolution light field image into a sub-aperture image array of width W×V and height H×U for representation; the sub-aperture image arrays reorganized from the Y-channel images of all low-spatial-resolution light field images in the training set, the Y-channel images of the corresponding 2D high-resolution images and the Y-channel images of the corresponding reference high-spatial-resolution light field images then constitute the training set; next, constructing a pyramid network and training it with the training set, the specific process being as follows:
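The sketch below illustrates the two preprocessing operations of this step, Y-channel extraction and sub-aperture image array reorganization. The BT.601 luma weights and the (U, V, H, W) input layout are assumptions for illustration; the claim only specifies an RGB→YCbCr conversion and an array of width W×V and height H×U.

```python
import numpy as np

def rgb_to_y(img: np.ndarray) -> np.ndarray:
    """ITU-R BT.601 luma, a common choice for the Y channel of YCbCr.
    img: float array in [0, 1] with shape (..., 3)."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

def to_sub_aperture_array(lf_y: np.ndarray) -> np.ndarray:
    """Reorganize a Y-channel light field of shape (U, V, H, W) into a
    sub-aperture image array of height H*U and width W*V (U x V tiled views)."""
    U, V, H, W = lf_y.shape
    return lf_y.transpose(0, 2, 1, 3).reshape(U * H, V * W)
```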
Step 3_1: copying the constructed spatial super-resolution network three times and cascading the copies, with the three spatial super-resolution networks sharing weights, i.e. all their parameters are identical; the overall network formed by the three spatial super-resolution networks is defined as the pyramid network; at each pyramid level, the reconstruction scale of the spatial super-resolution network is set equal to the value of αs;
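The weight sharing in step 3_1 simply means one network object is reused at every pyramid level, as in the sketch below. The call signature of `sr_net` (light field array, 2D high-resolution Y image, blurred Y image) is our assumption, based on the inputs listed in steps 3_2 to 3_4 and step 4.

```python
import torch.nn as nn

class PyramidNetwork(nn.Module):
    """Sketch of the three-level pyramid: the same spatial super-resolution
    network (shared weights) is applied once per level, each level upscaling
    by the reconstruction scale alpha_s."""
    def __init__(self, sr_net: nn.Module, levels: int = 3):
        super().__init__()
        self.sr_net = sr_net          # one module reused at every level = weight sharing
        self.levels = levels

    def forward(self, lf_y, hr_y_per_level, blurred_hr_y_per_level):
        out = lf_y
        for level in range(self.levels):
            out = self.sr_net(out, hr_y_per_level[level], blurred_hr_y_per_level[level])
        return out
```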
Step 3_2: down-sampling the spatial resolution of the Y-channel image of each reference high-spatial-resolution light field image in the training set twice, and taking the down-sampled images as label images; down-sampling the Y-channel image of each 2D high-resolution image in the training set twice in the same way, and taking the down-sampled images as the 2D high-resolution Y-channel images for the first spatial super-resolution network in the pyramid network; then inputting, into the first spatial super-resolution network of the constructed pyramid network for training, the sub-aperture image arrays reorganized from the Y-channel images of all low-spatial-resolution light field images in the training set, the sub-aperture image arrays reorganized from the images obtained by one spatial-resolution up-sampling of those Y-channel images, all 2D high-resolution Y-channel images for the first spatial super-resolution network, and the blurred 2D high-resolution Y-channel images obtained by one spatial-resolution down-sampling followed by one spatial-resolution up-sampling of those 2D high-resolution Y-channel images, thereby obtaining the αs-fold reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of each low-spatial-resolution light field image in the training set; the spatial-resolution up-sampling and down-sampling both use bicubic interpolation, with a scale equal to the value of αs;
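The blurred 2D high-resolution Y-channel images used as network inputs in steps 3_2 to 3_4 (and again in step 4) are produced by one bicubic down-sampling followed by one bicubic up-sampling at scale αs, for example as in the sketch below; the function name and the example scale of 2 are ours.

```python
import torch
import torch.nn.functional as F

def blurred_hr(y_hr: torch.Tensor, alpha_s: int = 2) -> torch.Tensor:
    """One bicubic down-sampling followed by one bicubic up-sampling at scale
    alpha_s, returning a blurred image of the original size.
    y_hr: tensor of shape (N, 1, H, W)."""
    n, c, h, w = y_hr.shape
    low = F.interpolate(y_hr, size=(h // alpha_s, w // alpha_s),
                        mode="bicubic", align_corners=False)
    return F.interpolate(low, size=(h, w), mode="bicubic", align_corners=False)
```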
Step 3_3: down-sampling the spatial resolution of the Y-channel image of each reference high-spatial-resolution light field image in the training set once, and taking the down-sampled images as label images; down-sampling the Y-channel image of each 2D high-resolution image in the training set once in the same way, and taking the down-sampled images as the 2D high-resolution Y-channel images for the second spatial super-resolution network in the pyramid network; then inputting, into the second spatial super-resolution network of the constructed pyramid network for training, the sub-aperture image arrays reorganized from the αs-fold reconstructed high-spatial-resolution Y-channel light field images corresponding to the Y-channel images of all low-spatial-resolution light field images in the training set, the blurred versions of these reorganized sub-aperture image arrays obtained by one spatial-resolution down-sampling followed by one spatial-resolution up-sampling, all 2D high-resolution Y-channel images for the second spatial super-resolution network, and the blurred 2D high-resolution Y-channel images obtained by one spatial-resolution down-sampling followed by one spatial-resolution up-sampling of those 2D high-resolution Y-channel images, thereby obtaining the αs²-fold reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of each low-spatial-resolution light field image in the training set; the spatial-resolution up-sampling and down-sampling both use bicubic interpolation, with a scale equal to the value of αs;
Step 3_4: taking the Y-channel image of each reference high-spatial-resolution light field image in the training set as a label image; taking the Y-channel image of each 2D high-resolution image in the training set as the 2D high-resolution Y-channel image for the third spatial super-resolution network in the pyramid network; then inputting, into the third spatial super-resolution network of the pyramid network for training, the sub-aperture image arrays reorganized from the αs²-fold reconstructed high-spatial-resolution Y-channel light field images corresponding to the Y-channel images of all low-spatial-resolution light field images in the training set, the blurred versions of these reorganized sub-aperture image arrays obtained by one spatial-resolution down-sampling followed by one spatial-resolution up-sampling, all 2D high-resolution Y-channel images for the third spatial super-resolution network, and the blurred 2D high-resolution Y-channel images obtained by one spatial-resolution down-sampling followed by one spatial-resolution up-sampling of those 2D high-resolution Y-channel images, thereby obtaining the αs³-fold reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of each low-spatial-resolution light field image in the training set; the spatial-resolution up-sampling and down-sampling both use bicubic interpolation, with a scale equal to the value of αs;
After training is finished, the optimal weight parameters of all convolution kernels in each spatial super-resolution network of the pyramid network are obtained, yielding the trained spatial super-resolution network model;
Step 4: randomly selecting a three-color-channel low-spatial-resolution light field image and its corresponding three-color-channel 2D high-resolution image as test images; then converting both from the RGB color space to the YCbCr color space and extracting the Y-channel images; reorganizing the Y-channel image of the low-spatial-resolution light field image into a sub-aperture image array for representation; inputting the Y-channel image of the low-spatial-resolution light field image (as a sub-aperture image array), the Y-channel image of the 2D high-resolution image, and the blurred 2D high-resolution Y-channel image obtained by one spatial-resolution down-sampling followed by one spatial-resolution up-sampling of the Y-channel image of the 2D high-resolution image, into the trained spatial super-resolution network model for testing, thereby obtaining the reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of the low-spatial-resolution light field image; then up-sampling the Cb-channel image and the Cr-channel image of the low-spatial-resolution light field image by bicubic interpolation, obtaining the reconstructed high-spatial-resolution Cb-channel light field image and the reconstructed high-spatial-resolution Cr-channel light field image corresponding to the Cb-channel and Cr-channel images of the low-spatial-resolution light field image; finally, concatenating the reconstructed high-spatial-resolution Y-channel, Cb-channel and Cr-channel light field images along the color-channel dimension and converting the result back to the RGB color space, giving the reconstructed three-color-channel high-spatial-resolution light field image corresponding to the low-spatial-resolution light field image.
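A sketch of this test procedure is given below. It assumes 4-D tensors and a particular call signature for the trained model, and it leaves out the YCbCr↔RGB conversions and the sub-aperture reorganization, which would wrap around it; all names are illustrative.

```python
import torch.nn.functional as F

def super_resolve_color_lf(y_lf, cb_lf, cr_lf, y_hr2d, y_hr2d_blur, model, alpha_s=2):
    """Hypothetical test-time pipeline (names and tensor layout assumed).
    y_lf, cb_lf, cr_lf: YCbCr channels of the low-resolution light field, each a
    (N, 1, H, W) tensor already arranged as a sub-aperture image array;
    y_hr2d / y_hr2d_blur: the 2D high-resolution Y image and its blurred version;
    model: the trained spatial super-resolution network."""
    y_sr = model(y_lf, y_hr2d, y_hr2d_blur)          # network reconstructs the Y channel
    cb_sr = F.interpolate(cb_lf, scale_factor=alpha_s, mode="bicubic", align_corners=False)
    cr_sr = F.interpolate(cr_lf, scale_factor=alpha_s, mode="bicubic", align_corners=False)
    # The three channels would then be concatenated and converted back to RGB.
    return y_sr, cb_sr, cr_sr
```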
2. The light field image spatial super-resolution reconstruction method according to claim 1, wherein in step 2 the first, second, third and fourth residual blocks have the same structure, each consisting of a third convolutional layer and a fourth convolutional layer connected in sequence;
the third convolutional layer in the first residual block receives three inputs in parallel: all feature maps in each of the two feature-map sets specified for it in step 2, and all feature maps in YHR,1; for each of these three inputs, the third convolutional layer outputs 64 feature maps of the same width and height as that input, and the set of feature maps output for each input is recorded separately; the fourth convolutional layer in the first residual block receives these three sets in parallel and, for each of them, outputs 64 feature maps of the same width and height, again recorded as three separate sets; each of the three sets output by the fourth convolutional layer is then added element by element to the corresponding input of the first residual block, and the three sets of feature maps so obtained are the outputs of the first residual block for its three inputs; the set obtained for YHR,1 is denoted YHR,2;
the second residual block is connected in exactly the same way: its third convolutional layer receives in parallel the three sets output by the first residual block (including YHR,2), its fourth convolutional layer processes the three resulting sets, and each result is added element by element to the corresponding input of the second residual block; the set obtained for YHR,2 is denoted YHR,3;
the third residual block has a single input: its third convolutional layer receives all feature maps in FEn,3 and outputs 64 feature maps of the same width and height, its fourth convolutional layer receives these and outputs 64 feature maps of the same width and height, and all feature maps in FEn,3 are added element by element to the output of the fourth convolutional layer; all feature maps so obtained are the output of the third residual block, and the set they form is FDec,1; the fourth residual block is identical except that its input is FDec,1 and the set formed by its output feature maps is FDec,2;
in the above, the convolution kernels of the third and fourth convolutional layers in each of the first, second, third and fourth residual blocks are all 3×3, the convolution stride is 1, and the numbers of input and output channels are both 64; in each residual block the third convolutional layer uses the "ReLU" activation function and the fourth convolutional layer uses no activation function.
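As an illustration, the first and second residual blocks can be read as one shared pair of convolutions applied to three parallel streams, roughly as in the sketch below (PyTorch, names and padding assumed):

```python
import torch
import torch.nn as nn

class ParallelResidualBlock(nn.Module):
    """Sketch of the first/second residual blocks: one pair of 3x3 convolutions
    (ReLU after the first, none after the second) is applied with shared weights
    to three parallel streams, each stream keeping its own skip connection."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels, 3, stride=1, padding=1)
        self.conv4 = nn.Conv2d(channels, channels, 3, stride=1, padding=1)

    def forward(self, streams):
        # `streams` is a tuple of three tensors, e.g. (f_a, f_b, y_hr)
        return tuple(x + self.conv4(torch.relu(self.conv3(x))) for x in streams)
```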
3. The light field image spatial super-resolution reconstruction method according to claim 1 or 2, characterized in that in step 2 the first, second and third enhanced residual blocks have the same structure, each consisting of a first spatial feature transform layer, a first spatial-angle convolutional layer, a second spatial feature transform layer, a second spatial-angle convolutional layer and a channel attention layer connected in sequence; the first and second spatial feature transform layers have the same structure, each composed of a tenth convolutional layer and an eleventh convolutional layer in parallel; the first and second spatial-angle convolutional layers have the same structure, each composed of a twelfth convolutional layer and a thirteenth convolutional layer; the channel attention layer consists of a global mean pooling layer, a fourteenth convolutional layer and a fifteenth convolutional layer connected in sequence;
The input of the tenth convolutional layer in the first spatial feature transform layer in the first enhanced residual block receives all feature maps in FAlign,1, and its output is 64 feature maps of the same width and height as its input, recorded as a set; the input of the eleventh convolutional layer in the same spatial feature transform layer also receives all feature maps in FAlign,1, and its output is likewise 64 feature maps of the same width and height, recorded as a set; the input of the first spatial feature transform layer in the first enhanced residual block receives all feature maps in FLR; all feature maps in FLR are multiplied element by element with the feature maps output by the tenth convolutional layer, the result is added element by element to the feature maps output by the eleventh convolutional layer, and all feature maps so obtained form the output of the first spatial feature transform layer in the first enhanced residual block;
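This multiply-then-add modulation is the familiar spatial feature transform; a sketch follows. Assigning the scale branch to the tenth convolutional layer and the shift branch to the eleventh is our assumption, since the claim only fixes the two parallel convolutions.

```python
import torch
import torch.nn as nn

class SpatialFeatureTransform(nn.Module):
    """Sketch of a spatial feature transform layer: two parallel 3x3 convolutions
    over the aligned guidance features predict a per-pixel scale and shift, which
    modulate the light-field features."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv10 = nn.Conv2d(channels, channels, 3, stride=1, padding=1)
        self.conv11 = nn.Conv2d(channels, channels, 3, stride=1, padding=1)

    def forward(self, f_lf: torch.Tensor, f_align: torch.Tensor) -> torch.Tensor:
        scale = self.conv10(f_align)
        shift = self.conv11(f_align)
        return f_lf * scale + shift     # element-wise modulation of the LF features
```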
The input of the twelfth convolutional layer of the first spatial-angle convolutional layer in the first enhanced residual block receives all feature maps output by the first spatial feature transform layer, and its output is 64 feature maps of the same width and height as its input; these feature maps are reorganized from the spatial dimension to the angular dimension; the input of the thirteenth convolutional layer of the same spatial-angle convolutional layer receives the result of this reorganization, and its output is 64 feature maps, which are reorganized from the angular dimension back to the spatial dimension; all feature maps obtained after this reorganization are the output of the first spatial-angle convolutional layer in the first enhanced residual block, recorded as a set;
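The "recombination from the spatial dimension to the angular dimension" is, in our reading, the usual rearrangement between a sub-aperture image tiling and a macro-pixel tiling; a sketch of this rearrangement and of the resulting spatial-angle convolutional layer follows. U = V = 5 is only an example angular resolution, and the exact layout is an assumption.

```python
import torch
import torch.nn as nn

def spatial_to_angular(x: torch.Tensor, U: int, V: int) -> torch.Tensor:
    """From the sub-aperture (spatial) arrangement, shape (N, C, U*H, V*W), to the
    macro-pixel (angular) arrangement, shape (N, C, H*U, W*V), so that a
    convolution sees angular neighbours."""
    n, c, uh, vw = x.shape
    h, w = uh // U, vw // V
    x = x.view(n, c, U, h, V, w)
    return x.permute(0, 1, 3, 2, 5, 4).reshape(n, c, h * U, w * V)

def angular_to_spatial(x: torch.Tensor, U: int, V: int) -> torch.Tensor:
    """Inverse rearrangement, from the macro-pixel back to the sub-aperture layout."""
    n, c, hu, wv = x.shape
    h, w = hu // U, wv // V
    x = x.view(n, c, h, U, w, V)
    return x.permute(0, 1, 3, 2, 5, 4).reshape(n, c, U * h, V * w)

class SpatialAngleConv(nn.Module):
    """Conv in the spatial layout, reorganize to the angular layout, conv there,
    then reorganize back (a sketch of the twelfth/thirteenth convolutional layers)."""
    def __init__(self, channels: int = 64, U: int = 5, V: int = 5):
        super().__init__()
        self.U, self.V = U, V
        self.conv12 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv13 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = torch.relu(self.conv12(x))
        x = spatial_to_angular(x, self.U, self.V)
        x = torch.relu(self.conv13(x))
        return angular_to_spatial(x, self.U, self.V)
```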
The tenth and eleventh convolutional layers of the second spatial feature transform layer in the first enhanced residual block each receive all feature maps in FAlign,1 and each output 64 feature maps of the same width and height as their input, recorded as two sets; the input of the second spatial feature transform layer receives all feature maps output by the first spatial-angle convolutional layer; these feature maps are multiplied element by element with the feature maps output by the tenth convolutional layer, the result is added element by element to the feature maps output by the eleventh convolutional layer, and all feature maps so obtained form the output of the second spatial feature transform layer in the first enhanced residual block;
the second spatial-angle convolutional layer in the first enhanced residual block processes this output in the same way as the first spatial-angle convolutional layer: its twelfth convolutional layer outputs 64 feature maps, which are reorganized from the spatial dimension to the angular dimension, its thirteenth convolutional layer outputs 64 feature maps from the reorganized input, and these are reorganized from the angular dimension back to the spatial dimension; the feature maps obtained after this reorganization form the output of the second spatial-angle convolutional layer in the first enhanced residual block;
The input of the global mean pooling layer in the channel attention layer in the first enhanced residual block receives all feature maps output by the second spatial-angle convolutional layer, and its output is 64 feature maps in which all values of each feature map are equal (the global mean of the corresponding input feature map); the set of these feature maps is denoted FGAP,1; the input of the fourteenth convolutional layer in this channel attention layer receives all feature maps in FGAP,1, and its output is 4 feature maps, whose set is denoted FDS,1; the input of the fifteenth convolutional layer receives all feature maps in FDS,1, and its output is 64 feature maps, whose set is denoted FUS,1; all feature maps in FUS,1 are multiplied element by element with all feature maps output by the second spatial-angle convolutional layer, and all feature maps so obtained form the output of the channel attention layer in the first enhanced residual block; the set they form is denoted FCA,1;
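A sketch of the channel attention layer, using the kernel sizes and channel counts given at the end of this claim (1×1 convolutions, 64→4→64 channels, ReLU then Sigmoid), is given below; the PyTorch module and names are ours.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Sketch: global mean pooling, a 1x1 conv down to 4 channels (ReLU), a 1x1
    conv back to 64 channels (Sigmoid), then channel-wise rescaling of the input."""
    def __init__(self, channels: int = 64, reduced: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)               # global mean per feature map
        self.conv14 = nn.Conv2d(channels, reduced, 1)
        self.conv15 = nn.Conv2d(reduced, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.pool(x)                                  # F_GAP: one value per channel
        w = torch.relu(self.conv14(w))                    # F_DS
        w = torch.sigmoid(self.conv15(w))                 # F_US
        return x * w                                      # channel-wise reweighting (F_CA)
```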
All feature maps in FCA,1 are added element by element to all feature maps in FLR, and all feature maps so obtained are the feature maps output by the first enhanced residual block; the set they form is FEn,1;
The second enhanced residual block is connected in the same way, with FAlign,2 as the guidance input of its two spatial feature transform layers and FEn,1 as the feature input of its first spatial feature transform layer: the tenth and eleventh convolutional layers of its first spatial feature transform layer each receive all feature maps in FAlign,2, all feature maps in FEn,1 are multiplied element by element with the output of the tenth convolutional layer and the result is added element by element to the output of the eleventh convolutional layer; the first spatial-angle convolutional layer, the second spatial feature transform layer (again guided by FAlign,2), the second spatial-angle convolutional layer and the channel attention layer then operate exactly as in the first enhanced residual block, with the intermediate sets of the channel attention layer denoted FGAP,2 (64 feature maps in which all values of each map are equal), FDS,2 (4 feature maps) and FUS,2 (64 feature maps), and its output denoted FCA,2; all feature maps in FCA,2 are added element by element to all feature maps in FEn,1, and all feature maps so obtained are the feature maps output by the second enhanced residual block; the set they form is FEn,2;
The third enhanced residual block is likewise identical in structure, with FAlign,3 as the guidance input of its two spatial feature transform layers and FEn,2 as the feature input of its first spatial feature transform layer; the intermediate sets of its channel attention layer are denoted FGAP,3, FDS,3 and FUS,3, and the output of the channel attention layer is denoted FCA,3; all feature maps in FCA,3 are added element by element to all feature maps in FEn,2, and all feature maps so obtained are the feature maps output by the third enhanced residual block; the set they form is FEn,3;
In the above, in each of the first, second and third enhanced residual blocks: the convolution kernels of the tenth and eleventh convolutional layers are 3×3 with a convolution stride of 1, 64 input channels and 64 output channels, and no activation function; the convolution kernels of the twelfth and thirteenth convolutional layers are 3×3 with a convolution stride of 1, 64 input channels and 64 output channels, and the "ReLU" activation function; the convolution kernel of the fourteenth convolutional layer is 1×1 with a convolution stride of 1, 64 input channels and 4 output channels, and the "ReLU" activation function; the convolution kernel of the fifteenth convolutional layer is 1×1 with a convolution stride of 1, 4 input channels and 64 output channels, and the "Sigmoid" activation function.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111405987.1A CN114359041A (en) | 2021-11-24 | 2021-11-24 | Light field image space super-resolution reconstruction method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114359041A true CN114359041A (en) | 2022-04-15 |
Family
ID=81096214
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111405987.1A Pending CN114359041A (en) | 2021-11-24 | 2021-11-24 | Light field image space super-resolution reconstruction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114359041A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200402205A1 (en) * | 2019-06-18 | 2020-12-24 | Huawei Technologies Co., Ltd. | Real-time video ultra resolution |
CN112381711A (en) * | 2020-10-27 | 2021-02-19 | 深圳大学 | Light field image reconstruction model training and rapid super-resolution reconstruction method |
CN112950475A (en) * | 2021-03-05 | 2021-06-11 | 北京工业大学 | Light field super-resolution reconstruction method based on residual learning and spatial transformation network |
CN113139898A (en) * | 2021-03-24 | 2021-07-20 | 宁波大学 | Light field image super-resolution reconstruction method based on frequency domain analysis and deep learning |
Non-Patent Citations (1)
Title |
---|
DENG Wu et al., "Light field super-resolution reconstruction fusing global and local viewpoints", Application Research of Computers, vol. 36, no. 5, 31 May 2019 (2019-05-31), pages 1549-1559 *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116309067A (en) * | 2023-03-21 | 2023-06-23 | 安徽易刚信息技术有限公司 | Light field image space super-resolution method |
CN116309067B (en) * | 2023-03-21 | 2023-09-29 | 安徽易刚信息技术有限公司 | Light field image space super-resolution method |
CN117475088A (en) * | 2023-12-25 | 2024-01-30 | 浙江优众新材料科技有限公司 | Light field reconstruction model training method based on polar plane attention and related equipment |
CN117475088B (en) * | 2023-12-25 | 2024-03-19 | 浙江优众新材料科技有限公司 | Light field reconstruction model training method based on polar plane attention and related equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||