CN114359041A - Light field image space super-resolution reconstruction method

Light field image space super-resolution reconstruction method

Info

Publication number: CN114359041A (application CN202111405987.1A; granted as CN114359041B)
Authority: CN (China)
Prior art keywords: feature maps, spatial, output, residual block, resolution
Legal status: Granted (Active)
Other languages: Chinese (zh)
Other versions: CN114359041B (en)
Inventors: 陈晔曜, 郁梅, 蒋刚毅
Assignee (current and original): Ningbo University
Application filed by Ningbo University; priority to CN202111405987.1A

Landscapes

  • Image Analysis (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a light field image spatial super-resolution reconstruction method. It constructs a spatial super-resolution network comprising an encoder, an aperture-level feature registration module, a light field feature enhancement module, and a decoder. The encoder extracts multi-scale features from the upsampled low-spatial-resolution light field image, the 2D high-resolution image, and its blurred counterpart. The aperture-level feature registration module learns the correspondence between the 2D high-resolution features and the low-resolution light field features so as to register the 2D high-resolution features to each sub-aperture image, forming registered high-resolution light field features. The light field feature enhancement module uses the registered high-resolution light field features to enhance the extracted shallow light field features, yielding enhanced high-resolution light field features. The decoder reconstructs the enhanced high-resolution light field features into a high-spatial-resolution light field image. The advantage is that high-spatial-resolution light field images can be reconstructed with high quality, recovering texture and detail information.

Description

A method for spatial super-resolution reconstruction of light field images

Technical Field

The invention relates to image super-resolution reconstruction technology, and in particular to a light field image spatial super-resolution reconstruction method.

Background

Unlike conventional digital cameras, a light field camera captures both the intensity (i.e., spatial information) and the direction (i.e., angular information) of light rays in a scene, and thus records the real world more faithfully. At the same time, the rich information contained in the 4-dimensional (4D) light field images captured by light field cameras enables many applications, such as refocusing, depth estimation, and virtual/augmented reality. Current commercial light field cameras use a microlens array to separate rays passing through the same scene point in different directions, recording spatial and angular information simultaneously on the sensor plane. However, because the sensor resolution is shared between the spatial and angular dimensions, the captured 4D light field images provide high angular sampling (i.e., high angular resolution) at the cost of inevitably reduced spatial resolution. Improving the spatial resolution of 4D light field images has therefore become an important and urgent problem in light field research.

In general, a 4D light field image admits several mutually convertible visualizations, such as the sub-aperture image (SAI) array based on 2-dimensional (2D) spatial information, the micro-lens image (MLI) array based on 2D angular information, and the epipolar plane image (EPI), which jointly displays one spatial and one angular dimension. Intuitively, improving the spatial resolution of a 4D light field image means improving the resolution of each of its 2D SAIs. A straightforward approach is therefore to apply existing 2D image super-resolution methods, such as the deep back-projection network of Haris et al. or the deep Laplacian pyramid network of Lai et al., to each SAI independently. However, this ignores the information embedded in the angular domain of the 4D light field image, and the angular consistency of the super-resolved results is hard to guarantee. Exploiting the high-dimensional structure of 4D light field images is thus the key to designing spatial super-resolution reconstruction methods for them. Existing spatial super-resolution reconstruction methods for 4D light field images can be roughly divided into optimization-based and learning-based approaches.

Optimization-based methods usually use estimated disparity or depth information to model the relationship between the SAIs of a 4D light field image, and then cast 4D light field spatial super-resolution reconstruction as an optimization problem. However, disparity or depth inferred from low-spatial-resolution light field images is not very reliable, so the performance of optimization-based methods is rather limited.

Learning-based methods explore the intrinsic high-dimensional structure of 4D light field images in a data-driven way and learn the nonlinear mapping between low- and high-spatial-resolution light field images. For example, Yeung et al. use spatial-angular separable convolutions to iteratively exploit the spatial and angular information of 4D light field images. Wang et al. developed a spatial-angular interaction network to fuse this information. Jin et al. proposed a novel fusion mechanism that exploits the complementary information between SAIs and recovers the parallax details of 4D light field images with a two-stage network. Although these methods perform well at small reconstruction scales (e.g., 2×), at large scales (e.g., 8×) they still cannot effectively recover sufficient texture and detail, because low-resolution light field images contain limited spatial and angular information, and these methods can only infer the lost details from information inside the 4D light field image itself. Boominathan et al. proposed a hybrid-input light field spatial super-resolution method that introduces an additional high-resolution 2D image as supplementary information, but its average fusion mechanism tends to blur the reconstruction, and processing each SAI independently destroys the parallax structure of the reconstructed light field image.

In summary, although existing work achieves good light field spatial super-resolution reconstruction at small reconstruction scales, it still falls short at large scales (e.g., 8×). In particular, there is room for improvement in recovering the high-frequency texture of the reconstructed light field image, avoiding visual artifacts, and preserving the parallax structure.

Summary of the Invention

The technical problem to be solved by the present invention is to provide a light field image spatial super-resolution reconstruction method that combines a light field camera and a conventional 2D camera into a heterogeneous imaging system. The light field camera provides rich angular information but limited spatial information, while the conventional 2D camera captures only ray intensities and thus provides sufficient spatial information. The method makes full use of the angular and spatial information acquired by the two cameras to reconstruct high-spatial-resolution light field images with high quality, recovering the texture and detail of the reconstructed light field image while avoiding the ghosting artifacts caused by parallax and preserving the parallax structure.

The technical solution adopted by the present invention to solve the above technical problem is a light field image spatial super-resolution reconstruction method, characterized by comprising the following steps:

Step 1: Select Num color three-channel low-spatial-resolution light field images with spatial resolution W×H and angular resolution V×U, the corresponding Num color three-channel 2D high-resolution images with resolution αW×αH, and the corresponding Num color three-channel reference high-spatial-resolution light field images with spatial resolution αW×αH and angular resolution V×U, where Num > 1 and α denotes the spatial-resolution upscaling factor, with α > 1.

Step 2: Construct a convolutional neural network as the spatial super-resolution network. It comprises an encoder for extracting multi-scale features, an aperture-level feature registration module for registering the light field features and the 2D high-resolution features, a shallow feature extraction layer for extracting shallow features from the low-spatial-resolution light field image, a light field feature enhancement module for fusing the light field features and the 2D high-resolution features, a spatial attention block for mitigating registration errors in the coarse-scale features, and a decoder for reconstructing the latent features into a light field image.

The encoder consists of a first convolutional layer, a second convolutional layer, a first residual block, and a second residual block connected in sequence. The first convolutional layer receives three inputs in parallel: the single-channel image L_LR of a low-spatial-resolution light field image (spatial resolution W×H, angular resolution V×U) after spatial-resolution upsampling, rearranged into a sub-aperture image array of width α_sW×V and height α_sH×U, denoted L↑_LR; a single-channel image of the blurred 2D high-resolution image, of width α_sW and height α_sH, denoted Ī_HR; and a single-channel image of the 2D high-resolution image, of width α_sW and height α_sH, denoted I_HR. For L↑_LR, the first convolutional layer outputs 64 feature maps of width α_sW×V and height α_sH×U, whose set is denoted Y↑_LR,0; for Ī_HR it outputs 64 feature maps of width α_sW and height α_sH, denoted Ȳ_HR,0; for I_HR it outputs 64 feature maps of width α_sW and height α_sH, denoted Y_HR,0.

The second convolutional layer receives Y↑_LR,0, Ȳ_HR,0 and Y_HR,0 in parallel. For Y↑_LR,0 it outputs 64 feature maps of width (α_sW×V)/2 and height (α_sH×U)/2, denoted Y↑_LR,1; for Ȳ_HR,0 it outputs 64 feature maps of width (α_sW)/2 and height (α_sH)/2, denoted Ȳ_HR,1; for Y_HR,0 it outputs 64 feature maps of width (α_sW)/2 and height (α_sH)/2, denoted Y_HR,1. The first residual block receives Y↑_LR,1, Ȳ_HR,1 and Y_HR,1 in parallel and outputs, for each input, 64 feature maps of the same respective sizes; the resulting sets are denoted Y↑_LR,2, Ȳ_HR,2 and Y_HR,2. The second residual block receives Y↑_LR,2, Ȳ_HR,2 and Y_HR,2 in parallel and likewise outputs, for each input, 64 feature maps of the same respective sizes, denoted Y↑_LR,3, Ȳ_HR,3 and Y_HR,3.

Here, L↑_LR is obtained by bicubic-interpolation upsampling of the single-channel image L_LR of the low-spatial-resolution light field image and rearranging the result into the sub-aperture image array of width α_sW×V and height α_sH×U, and Ī_HR is obtained by first bicubic-interpolation downsampling and then bicubic-interpolation upsampling I_HR. α_s denotes the spatial-resolution sampling factor, with α_s³ = α; both the bicubic upsampling factor and the bicubic downsampling factor take the value α_s. The first convolutional layer has a 3×3 kernel, stride 1, 1 input channel, and 64 output channels; the second convolutional layer has a 3×3 kernel, stride 2, 64 input channels, and 64 output channels; both use the ReLU activation function.
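
For illustration, a minimal PyTorch-style sketch of this shared three-branch encoder follows. The module and variable names (Encoder, ResidualBlock, f0 through f3) are ours, and the internal layout of the residual blocks is an assumption consistent with the residual block description later in this document; the excerpt only fixes the kernel sizes, strides, and channel counts of the first and second convolutional layers.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    # Two 3x3 convolutions with an element-wise skip connection; the spatial
    # size is preserved, matching the third/fourth convolutional layers
    # described for the residual blocks (activation placement assumed).
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, stride=1, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return x + self.conv2(self.relu(self.conv1(x)))

class Encoder(nn.Module):
    # Shared encoder: conv1 (stride 1) -> conv2 (stride 2) -> two residual
    # blocks. The same weights process all three inputs: the upsampled light
    # field array, the blurred HR image, and the HR image.
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 64, 3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(64, 64, 3, stride=2, padding=1)
        self.res1 = ResidualBlock(64)
        self.res2 = ResidualBlock(64)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):               # x: (B, 1, H, W) single-channel input
        f0 = self.relu(self.conv1(x))   # scale 0, full resolution
        f1 = self.relu(self.conv2(f0))  # scale 1, half resolution
        f2 = self.res1(f1)              # scale 2, half resolution
        f3 = self.res2(f2)              # scale 3, half resolution
        return f0, f1, f2, f3
```

Calling this encoder once per input (L↑_LR, Ī_HR, I_HR) yields the three feature pyramids Y↑_LR,0..3, Ȳ_HR,0..3, and Y_HR,0..3 described above.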

The aperture-level feature registration module receives three types of feature maps at its input: the first type is the light field features Y↑_LR,0 and Y↑_LR,1 extracted from L↑_LR; the second type is the blurred-image features Ȳ_HR,1; the third type comprises four inputs, namely all feature maps in Y_HR,0, Y_HR,1, Y_HR,2, and Y_HR,3. In the module, all feature maps in Ȳ_HR,1, Y_HR,0, Y_HR,1, Y_HR,2, and Y_HR,3 are first each replicated V×U times, so that the widths of the feature maps in Ȳ_HR,1, Y_HR,1, Y_HR,2, and Y_HR,3 become (α_sW×V)/2 and their heights become (α_sH×U)/2, matching the size of the feature maps in Y↑_LR,1, and so that the feature maps in Y_HR,0 become α_sW×V wide and α_sH×U high, matching the size of the feature maps in Y↑_LR,0.

Block matching is then performed between all feature maps in Y↑_LR,1 and all (replicated) feature maps in Ȳ_HR,1; this yields a coordinate index map of width (α_sW×V)/2 and height (α_sH×U)/2, denoted P_CI. According to P_CI, all feature maps in Y_HR,1 are spatially registered to the feature maps in Y↑_LR,1, giving 64 registered feature maps of width (α_sW×V)/2 and height (α_sH×U)/2, whose set is denoted F_Align,1. Likewise, according to P_CI, all feature maps in Y_HR,2 are spatially registered to those in Y↑_LR,1, and the set of the resulting 64 registered feature maps of width (α_sW×V)/2 and height (α_sH×U)/2 is denoted F_Align,2; all feature maps in Y_HR,3 are registered in the same way, and the resulting set is denoted F_Align,3. Next, P_CI is upsampled by bicubic interpolation to obtain a coordinate index map of width α_sW×V and height α_sH×U, denoted P↑_CI. Finally, according to P↑_CI, all feature maps in Y_HR,0 are spatially registered to the feature maps in Y↑_LR,0, giving 64 registered feature maps of width α_sW×V and height α_sH×U, whose set is denoted F_Align,0. The output of the aperture-level feature registration module is all feature maps in F_Align,0, F_Align,1, F_Align,2, and F_Align,3. The matching accuracy measure used for block matching is a texture and structure similarity index, the block size used for matching is 3×3, and the upsampling factor of the bicubic interpolation is α_s.

The shallow feature extraction layer consists of a single fifth convolutional layer. Its input is the single-channel image L_LR of a low-spatial-resolution light field image (spatial resolution W×H, angular resolution V×U) rearranged into a sub-aperture image array of width W×V and height H×U; its output is 64 feature maps of width W×V and height H×U, whose set is denoted F_LR. The fifth convolutional layer has a 3×3 kernel, stride 1, 1 input channel, and 64 output channels, and uses the ReLU activation function.

The light field feature enhancement module consists of a first, a second, and a third enhanced residual block connected in sequence. The first enhanced residual block receives all feature maps in F_Align,1 and all feature maps in F_LR, and outputs 64 feature maps of width (α_sW×V)/2 and height (α_sH×U)/2, whose set is denoted F_En,1. The second enhanced residual block receives all feature maps in F_Align,2 and all feature maps in F_En,1, and outputs 64 feature maps of width (α_sW×V)/2 and height (α_sH×U)/2, denoted F_En,2. The third enhanced residual block receives all feature maps in F_Align,3 and all feature maps in F_En,2, and outputs 64 feature maps of width (α_sW×V)/2 and height (α_sH×U)/2, denoted F_En,3.
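
The excerpt does not detail the internals of an enhanced residual block, so the sketch below is an assumption: the registered HR features are concatenated with the incoming light field features, fused by a 1×1 convolution, and passed through a residual pair. The class name EnhancedResidualBlock is ours.

```python
import torch
import torch.nn as nn

class EnhancedResidualBlock(nn.Module):
    # Assumed fusion stage: concat(light field features, registered HR
    # features) -> 1x1 fusion conv -> two 3x3 convs with a skip connection.
    def __init__(self, channels=64):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, 1)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, f_lf, f_align):
        x = self.fuse(torch.cat([f_lf, f_align], dim=1))
        return x + self.conv2(self.relu(self.conv1(x)))
```

With this block, F_En,1 = block1(F_LR, F_Align,1), F_En,2 = block2(F_En,1, F_Align,2), and F_En,3 = block3(F_En,2, F_Align,3). The sizes of F_LR and F_Align,1 coincide when α_s = 2 (i.e., α = 8), which is consistent with the 2× sub-pixel upsampling in the decoder.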

The spatial attention block consists of a sixth and a seventh convolutional layer connected in sequence. The sixth convolutional layer receives all feature maps in F_Align,0 and outputs 64 spatial attention feature maps of width α_sW×V and height α_sH×U, whose set is denoted F_SA1. The seventh convolutional layer receives all spatial attention feature maps in F_SA1 and outputs 64 spatial attention feature maps of width α_sW×V and height α_sH×U, denoted F_SA2. All feature maps in F_Align,0 are multiplied element-wise with the spatial attention feature maps in F_SA2; the set of the resulting feature maps is denoted F_WA,0, and these feature maps are the output of the spatial attention block. The sixth and seventh convolutional layers both have 3×3 kernels, stride 1, 64 input channels, and 64 output channels; the sixth uses the ReLU activation function and the seventh uses Sigmoid.
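
This block is specified closely enough for a direct PyTorch-style transcription; only the class name SpatialAttention is ours.

```python
import torch.nn as nn

class SpatialAttention(nn.Module):
    # Two 3x3 convolutions as specified: ReLU after the first, Sigmoid after
    # the second; the resulting mask reweights F_Align,0 element-wise to
    # suppress registration errors in the coarse-scale features.
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, f_align0):
        return f_align0 * self.body(f_align0)   # F_WA,0
```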

The decoder consists of a third residual block, a fourth residual block, a sub-pixel convolutional layer, an eighth convolutional layer, and a ninth convolutional layer connected in sequence. The third residual block receives all feature maps in F_En,3 and outputs 64 feature maps of width (α_sW×V)/2 and height (α_sH×U)/2, whose set is denoted F_Dec,1. The fourth residual block receives all feature maps in F_Dec,1 and outputs 64 feature maps of width (α_sW×V)/2 and height (α_sH×U)/2, denoted F_Dec,2. The sub-pixel convolutional layer receives all feature maps in F_Dec,2 and outputs 256 feature maps of width (α_sW×V)/2 and height (α_sH×U)/2, which are then rearranged into 64 feature maps of width α_sW×V and height α_sH×U; the set of rearranged feature maps is denoted F_Dec,3. The eighth convolutional layer receives the element-wise sum of all feature maps in F_Dec,3 and all feature maps in F_WA,0, and outputs 64 feature maps of width α_sW×V and height α_sH×U, denoted F_Dec,4. The ninth convolutional layer receives all feature maps in F_Dec,4 and outputs a reconstructed single-channel light field image of width α_sW×V and height α_sH×U, which is rearranged into a high-spatial-resolution single-channel light field image with spatial resolution α_sW×α_sH and angular resolution V×U, denoted L_SR. The sub-pixel convolutional layer has a 3×3 kernel, stride 1, 64 input channels, and 256 output channels; the eighth convolutional layer has a 3×3 kernel, stride 1, 64 input channels, and 64 output channels; the ninth convolutional layer has a 1×1 kernel, stride 1, 64 input channels, and 1 output channel. The sub-pixel convolutional layer and the eighth convolutional layer use the ReLU activation function; the ninth convolutional layer uses no activation function.
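
A PyTorch-style sketch of the decoder, reusing the ResidualBlock from the encoder sketch above; the 64-to-256-channel sub-pixel layer followed by a rearrangement to 64 channels at double resolution corresponds to a 2× pixel shuffle.

```python
import torch.nn as nn

class Decoder(nn.Module):
    # Two residual blocks, a sub-pixel (pixel-shuffle) upsampling layer, a
    # 3x3 conv applied after adding the attention-weighted skip F_WA,0, and
    # a final 1x1 conv without activation.
    def __init__(self, channels=64):
        super().__init__()
        self.res1 = ResidualBlock(channels)
        self.res2 = ResidualBlock(channels)
        self.up = nn.Sequential(
            nn.Conv2d(channels, 4 * channels, 3, padding=1),  # 64 -> 256
            nn.PixelShuffle(2),                               # 256 -> 64, 2x size
            nn.ReLU(inplace=True),
        )
        self.conv8 = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                                   nn.ReLU(inplace=True))
        self.conv9 = nn.Conv2d(channels, 1, 1)

    def forward(self, f_en3, f_wa0):
        x = self.res2(self.res1(f_en3))   # F_Dec,1 then F_Dec,2
        x = self.up(x)                    # F_Dec,3 at full array resolution
        x = self.conv8(x + f_wa0)         # F_Dec,4
        return self.conv9(x)              # reconstructed single-channel array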

Step 3: Convert each selected low-spatial-resolution light field image, its corresponding 2D high-resolution image, and its corresponding reference high-spatial-resolution light field image from the RGB color space to the YCbCr color space, and extract the Y-channel images. Rearrange the Y-channel image of each low-spatial-resolution light field image into a sub-aperture image array of width W×V and height H×U. The sub-aperture image arrays rearranged from the Y-channel images of all low-spatial-resolution light field images, the Y-channel images of the corresponding 2D high-resolution images, and the Y-channel images of the corresponding reference high-spatial-resolution light field images constitute the training set. Then construct a pyramid network and train it with the training set, as follows:

Step 3_1: Replicate the constructed spatial super-resolution network three times and cascade the copies; the copies share weights, i.e., all their parameters are identical. The overall network formed by the three spatial super-resolution networks is defined as the pyramid network. At each pyramid level, the reconstruction scale of the spatial super-resolution network is set equal to α_s.
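
A minimal sketch of this weight-shared cascade: one network instance is applied three times, so three α_s× levels compose to α_s³ = α. Here sr_net is a hypothetical callable taking (light field, 2D HR guidance) and returning the upscaled light field; the per-level preparation of upsampled and blurred inputs (steps 3_2 to 3_4) is assumed to happen inside it or beforehand.

```python
import torch.nn as nn

class PyramidSR(nn.Module):
    # Three cascaded applications of one weight-shared spatial SR network.
    def __init__(self, sr_net):
        super().__init__()
        self.sr_net = sr_net          # single module, reused at every level

    def forward(self, lf, hr_images):
        # hr_images: per-level 2D HR guidance, pre-downsampled as in
        # steps 3_2 to 3_4 (level 3 uses the original HR image).
        outputs = []
        for hr in hr_images:
            lf = self.sr_net(lf, hr)  # each level upscales by alpha_s
            outputs.append(lf)
        return outputs
```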

Step 3_2: Downsample the Y-channel image of each reference high-spatial-resolution light field image in the training set twice in spatial resolution, and use the resulting images as label images. Downsample the Y-channel image of each 2D high-resolution image in the training set twice in the same way, and use the resulting images as the 2D high-resolution Y-channel images for the first spatial super-resolution network in the pyramid network. Then input into the first spatial super-resolution network of the constructed pyramid network: the sub-aperture image arrays rearranged from the Y-channel images of all low-spatial-resolution light field images in the training set; the sub-aperture image arrays rearranged from those Y-channel images after one spatial-resolution upsampling; all 2D high-resolution Y-channel images for the first network; and the blurred 2D high-resolution Y-channel images obtained from them by one spatial-resolution downsampling followed by one spatial-resolution upsampling. Training yields, for the Y-channel image of each low-spatial-resolution light field image in the training set, an α_s× reconstructed high-spatial-resolution Y-channel light field image. Both the spatial-resolution upsampling and downsampling are bicubic interpolation, with scale equal to α_s; the input preparation is sketched below.
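
A sketch of the per-level input preparation, using bicubic resampling via torch.nn.functional.interpolate; the function names bicubic and prepare_level_inputs are ours.

```python
import torch.nn.functional as F

def bicubic(x, scale):
    # Bicubic resampling, used for all up/down sampling in steps 3_2 to 3_4.
    return F.interpolate(x, scale_factor=scale, mode='bicubic',
                         align_corners=False)

def prepare_level_inputs(lf_views, hr_image, alpha_s=2):
    """lf_views: (B*V*U, 1, H, W) sub-aperture views of the (current) light
    field; hr_image: (B, 1, alpha_s*H, alpha_s*W) 2D HR guidance for this
    level. Returns the upsampled light field views and the blurred HR image
    (bicubic down then up), as fed to one pyramid level."""
    lf_up = bicubic(lf_views, alpha_s)
    hr_blur = bicubic(bicubic(hr_image, 1.0 / alpha_s), alpha_s)
    return lf_up, hr_blur
```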

Step 3_3: Downsample the Y-channel image of each reference high-spatial-resolution light field image in the training set once in spatial resolution, and use the resulting images as label images. Downsample the Y-channel image of each 2D high-resolution image in the training set once in the same way, and use the resulting images as the 2D high-resolution Y-channel images for the second spatial super-resolution network in the pyramid network. Then input into the second spatial super-resolution network of the constructed pyramid network: the sub-aperture image arrays rearranged from the α_s× reconstructed high-spatial-resolution Y-channel light field images corresponding to the Y-channel images of all low-spatial-resolution light field images in the training set; the sub-aperture image arrays rearranged from those reconstructed images after one spatial-resolution upsampling; all 2D high-resolution Y-channel images for the second network; and the blurred 2D high-resolution Y-channel images obtained from them by one spatial-resolution downsampling followed by one spatial-resolution upsampling. Training yields, for the Y-channel image of each low-spatial-resolution light field image in the training set, an α_s²× reconstructed high-spatial-resolution Y-channel light field image. Both the spatial-resolution upsampling and downsampling are bicubic interpolation, with scale equal to α_s.

Step 3_4: Use the Y-channel image of each reference high-spatial-resolution light field image in the training set as the label image, and use the Y-channel image of each 2D high-resolution image in the training set as the 2D high-resolution Y-channel image for the third spatial super-resolution network in the pyramid network. Then input into the third spatial super-resolution network of the constructed pyramid network: the sub-aperture image arrays rearranged from the α_s²× reconstructed high-spatial-resolution Y-channel light field images corresponding to the Y-channel images of all low-spatial-resolution light field images in the training set; the sub-aperture image arrays rearranged from those reconstructed images after one spatial-resolution upsampling; all 2D high-resolution Y-channel images for the third network; and the blurred 2D high-resolution Y-channel images obtained from them by one spatial-resolution downsampling followed by one spatial-resolution upsampling. Training yields, for the Y-channel image of each low-spatial-resolution light field image in the training set, an α_s³× reconstructed high-spatial-resolution Y-channel light field image. Both the spatial-resolution upsampling and downsampling are bicubic interpolation, with scale equal to α_s.

After training, the optimal weight parameters of all convolution kernels in each spatial super-resolution network of the pyramid network are obtained, i.e., the trained spatial super-resolution network model.
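
For completeness, a sketch of one optimization step at a single pyramid level. The excerpt does not state the training loss, so an L1 reconstruction loss against the label image is assumed here purely for illustration; train_step is a hypothetical helper.

```python
import torch

def train_step(sr_net, optimizer, batch):
    # batch: upsampled LF array, HR Y image, blurred HR Y image, label image.
    lf_up, hr, hr_blur, label = batch
    pred = sr_net(lf_up, hr, hr_blur)
    loss = torch.nn.functional.l1_loss(pred, label)  # assumed loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```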

Step 4: Arbitrarily select a color three-channel low-spatial-resolution light field image and its corresponding color three-channel 2D high-resolution image as test images. Convert both from the RGB color space to the YCbCr color space and extract the Y-channel images. Rearrange the Y-channel image of the low-spatial-resolution light field image into a sub-aperture image array. Then input into the trained spatial super-resolution network model: the sub-aperture image array rearranged from the Y-channel image of the low-spatial-resolution light field image; the sub-aperture image array rearranged from that Y-channel image after one spatial-resolution upsampling; the Y-channel image of the 2D high-resolution image; and the blurred 2D high-resolution Y-channel image obtained from it by one spatial-resolution downsampling followed by one spatial-resolution upsampling. Testing yields the reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of the low-spatial-resolution light field image. Next, upsample the Cb-channel and Cr-channel images of the low-spatial-resolution light field image by bicubic interpolation to obtain the reconstructed high-spatial-resolution Cb-channel and Cr-channel light field images. Finally, concatenate the reconstructed high-spatial-resolution Y-, Cb-, and Cr-channel light field images along the color-channel dimension and convert the result back to the RGB color space, obtaining the color three-channel reconstructed high-spatial-resolution light field image corresponding to the low-spatial-resolution light field image.
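
A sketch of this test-time pipeline: only the Y channel goes through the network, Cb/Cr are bicubic-upsampled, and the channels are re-stacked. The RGB/YCbCr conversions are omitted, sr_net is a hypothetical callable for the trained model, and reconstruct_color is our name.

```python
import torch
import torch.nn.functional as F

def reconstruct_color(sr_net, lf_ycbcr, hr_y, alpha=8):
    """lf_ycbcr: (V*U, 3, H, W) sub-aperture views in YCbCr; hr_y: Y channel
    of the 2D HR image. Returns (V*U, 3, alpha*H, alpha*W) YCbCr views."""
    y = sr_net(lf_ycbcr[:, 0:1], hr_y)     # network super-resolves Y only
    cbcr = F.interpolate(lf_ycbcr[:, 1:3], scale_factor=alpha,
                         mode='bicubic', align_corners=False)
    return torch.cat([y, cbcr], dim=1)     # re-stack channels; convert to RGB after
```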

In step 2, the first, second, third, and fourth residual blocks have the same structure: each consists of a third convolutional layer and a fourth convolutional layer connected in sequence. In the first residual block, the third convolutional layer receives three inputs in parallel: all feature maps in Y↑_LR,1, in Ȳ_HR,1, and in Y_HR,1. For Y↑_LR,1 it outputs 64 feature maps of width (α_sW×V)/2 and height (α_sH×U)/2, whose set is denoted Y↑'_LR,1; for Ȳ_HR,1 it outputs 64 feature maps of width (α_sW)/2 and height (α_sH)/2, denoted Ȳ'_HR,1; for Y_HR,1 it outputs 64 feature maps of width (α_sW)/2 and height (α_sH)/2, denoted Y'_HR,1. The fourth convolutional layer then receives Y↑'_LR,1, Ȳ'_HR,1, and Y'_HR,1 in parallel and outputs, for each, 64 feature maps of the same respective sizes, denoted Y↑''_LR,1, Ȳ''_HR,1, and Y''_HR,1. All feature maps in Y↑_LR,1 are added element-wise to those in Y↑''_LR,1, and the resulting feature maps are the output of the first residual block for Y↑_LR,1; their set is Y↑_LR,2. Likewise, the element-wise sum of Ȳ_HR,1 and Ȳ''_HR,1 gives the output for Ȳ_HR,1, whose set is Ȳ_HR,2, and the element-wise sum of Y_HR,1 and Y''_HR,1 gives the output for Y_HR,1, whose set is Y_HR,2.
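
Since one residual block serves all three encoder streams with shared weights, applying it is just three calls to the same module; a one-line sketch, where apply_residual_block is a hypothetical helper and block is an instance of the ResidualBlock from the encoder sketch above.

```python
def apply_residual_block(block, y_lr, y_blur, y_hr):
    # The shared-weight residual block processes the three encoder streams
    # in parallel; each stream keeps its own spatial size, so the element-wise
    # skip connection inside `block` stays well defined per stream.
    return block(y_lr), block(y_blur), block(y_hr)
```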

In the second residual block, the third convolutional layer receives three inputs in parallel: all feature maps in Y↑_LR,2, in Ȳ_HR,2, and in Y_HR,2. For Y↑_LR,2 it outputs 64 feature maps of width (α_sW×V)/2 and height (α_sH×U)/2, whose set is denoted Y↑'_LR,2; for Ȳ_HR,2 it outputs 64 feature maps of width (α_sW)/2 and height (α_sH)/2, denoted Ȳ'_HR,2; for Y_HR,2 it outputs 64 feature maps of width (α_sW)/2 and height (α_sH)/2, denoted Y'_HR,2. The fourth convolutional layer then receives Y↑'_LR,2, Ȳ'_HR,2, and Y'_HR,2 in parallel and outputs, for each, 64 feature maps of the same respective sizes, denoted Y↑''_LR,2, Ȳ''_HR,2, and Y''_HR,2. All feature maps in Y↑_LR,2 are added element-wise to those in Y↑''_LR,2, and the resulting feature maps are the output of the second residual block for Y↑_LR,2; their set is Y↑_LR,3. Likewise, the element-wise sum of Ȳ_HR,2 and Ȳ''_HR,2 gives the output for Ȳ_HR,2, whose set is Ȳ_HR,3, and the element-wise sum of Y_HR,2 and Y''_HR,2 gives the output for Y_HR,2, whose set is Y_HR,3.

The input of the third convolutional layer in the third residual block receives all feature maps in F_{En,3}, and this layer outputs 64 feature maps of the same width and height as its input. The input of the fourth convolutional layer in the third residual block receives these feature maps and likewise outputs 64 feature maps of the same width and height. All feature maps in F_{En,3} are added element-wise to the feature maps output by the fourth convolutional layer, and the resulting feature maps constitute the output of the third residual block; the set they form is F_{Dec,1}.

The input of the third convolutional layer in the fourth residual block receives all feature maps in F_{Dec,1}, and this layer outputs 64 feature maps of the same width and height as its input. The input of the fourth convolutional layer in the fourth residual block receives these feature maps and likewise outputs 64 feature maps of the same width and height. All feature maps in F_{Dec,1} are added element-wise to the feature maps output by the fourth convolutional layer, and the resulting feature maps constitute the output of the fourth residual block; the set they form is F_{Dec,2}.

In all of the above, the third and fourth convolutional layers of the first, second, third and fourth residual blocks have 3×3 convolution kernels, a stride of 1, 64 input channels and 64 output channels; in each of these residual blocks the third convolutional layer uses the "ReLU" activation function and the fourth convolutional layer uses no activation function.
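As an illustration (not part of the patent text), the residual blocks specified above admit a minimal PyTorch sketch; the class and variable names below are ours, and padding of 1 is assumed so that the 3×3, stride-1 convolutions preserve width and height, as the element-wise skip additions require. "Receives three inputs in parallel" is read here as the same block, with shared weights, being applied to each branch.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3, stride-1, 64->64 convolutions with an identity skip:
    ReLU after the first convolution, no activation after the second."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv_a = nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1)
        self.conv_b = nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.conv_b(self.relu(self.conv_a(x)))

block = ResidualBlock()
y_lr2 = torch.randn(1, 64, 40, 60)   # stand-in for one branch's input, e.g. Y_LR,2
y_lr3 = block(y_lr2)                 # the block's output for that branch, e.g. Y_LR,3
```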

In step 2, the first, second and third enhanced residual blocks share the same structure. Each consists of, connected in sequence, a first spatial feature transform layer, a first spatial-angular convolution layer, a second spatial feature transform layer, a second spatial-angular convolution layer and a channel attention layer. The first and second spatial feature transform layers have the same structure, each consisting of a tenth convolutional layer and an eleventh convolutional layer arranged in parallel; the first and second spatial-angular convolution layers have the same structure, each consisting of a twelfth convolutional layer and a thirteenth convolutional layer connected in sequence; and the channel attention layer consists of, connected in sequence, a global average pooling layer, a fourteenth convolutional layer and a fifteenth convolutional layer.

The input of the tenth convolutional layer in the first spatial feature transform layer of the first enhanced residual block receives all feature maps in F_{Align,1}, and this layer outputs 64 feature maps of the same width and height, which serve as an element-wise scale. The input of the eleventh convolutional layer in the same spatial feature transform layer also receives all feature maps in F_{Align,1}, and this layer outputs 64 feature maps of the same width and height, which serve as an element-wise shift. The input of the first spatial feature transform layer in the first enhanced residual block receives all feature maps in F_{LR}: all feature maps in F_{LR} are multiplied element-wise by the scale feature maps output by the tenth convolutional layer, and the shift feature maps output by the eleventh convolutional layer are then added element-wise to the product. The resulting feature maps constitute the output of the first spatial feature transform layer in the first enhanced residual block.
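The spatial feature transform just described can be sketched as follows (our own hypothetical naming; we assume, following the usual SFT convention, that the tenth convolutional layer produces the element-wise scale and the eleventh the element-wise shift):

```python
import torch
import torch.nn as nn

class SpatialFeatureTransform(nn.Module):
    """Predicts an element-wise scale and shift from the registered
    high-resolution features and modulates the input features:
    out = x * scale(cond) + shift(cond)."""
    def __init__(self, channels: int = 64):
        super().__init__()
        # tenth / eleventh convolutional layers: 3x3, stride 1, 64->64, no activation
        self.scale_conv = nn.Conv2d(channels, channels, 3, 1, 1)
        self.shift_conv = nn.Conv2d(channels, channels, 3, 1, 1)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        return x * self.scale_conv(cond) + self.shift_conv(cond)

# e.g. out = SpatialFeatureTransform()(f_lr, f_align1)
```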

The input of the twelfth convolutional layer in the first spatial-angular convolution layer of the first enhanced residual block receives all feature maps output by the first spatial feature transform layer, and this layer outputs 64 feature maps of the same width and height. These feature maps undergo a reorganization from the spatial dimension to the angular dimension, and the input of the thirteenth convolutional layer in the same spatial-angular convolution layer receives the result of this reorganization; the thirteenth convolutional layer outputs 64 feature maps of the corresponding width and height in the angular arrangement. The feature maps are then reorganized back from the angular dimension to the spatial dimension, and the feature maps obtained after this inverse reorganization constitute the output of the first spatial-angular convolution layer in the first enhanced residual block.
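A sketch of the spatial-angular convolution is given below. The exact tensor layout is not spelled out in the text, so we assume that the light field features tile the V×U sub-aperture grid in the height/width of a (B, 64, U·h, V·w) tensor, and that the spatial-to-angular reorganization is the usual macro-pixel rearrangement; all names are our own.

```python
import torch
import torch.nn as nn

def sai_to_macropixel(x: torch.Tensor, u: int, v: int) -> torch.Tensor:
    """Reorganize a sub-aperture-array layout (U*h, V*w) into a
    macro-pixel layout (h*U, w*V): spatial dimension -> angular dimension."""
    b, c, uh, vw = x.shape
    h, w = uh // u, vw // v
    x = x.view(b, c, u, h, v, w)        # split angular and spatial axes
    x = x.permute(0, 1, 3, 2, 5, 4)     # (b, c, h, u, w, v)
    return x.reshape(b, c, h * u, w * v)

def macropixel_to_sai(x: torch.Tensor, u: int, v: int) -> torch.Tensor:
    """Inverse reorganization: angular dimension -> spatial dimension."""
    b, c, hu, wv = x.shape
    h, w = hu // u, wv // v
    x = x.view(b, c, h, u, w, v)
    x = x.permute(0, 1, 3, 2, 5, 4)     # (b, c, u, h, v, w)
    return x.reshape(b, c, u * h, v * w)

class SpatialAngularConv(nn.Module):
    """Twelfth conv on the spatial arrangement, reorganize to the angular
    arrangement, thirteenth conv, reorganize back. Both convs are
    3x3, stride 1, 64->64, followed by ReLU."""
    def __init__(self, u: int = 5, v: int = 5, channels: int = 64):
        super().__init__()
        self.u, self.v = u, v
        self.spatial_conv = nn.Sequential(nn.Conv2d(channels, channels, 3, 1, 1), nn.ReLU(True))
        self.angular_conv = nn.Sequential(nn.Conv2d(channels, channels, 3, 1, 1), nn.ReLU(True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.spatial_conv(x)
        y = sai_to_macropixel(y, self.u, self.v)
        y = self.angular_conv(y)
        return macropixel_to_sai(y, self.u, self.v)
```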

The input of the tenth convolutional layer in the second spatial feature transform layer of the first enhanced residual block receives all feature maps in F_{Align,1}, and this layer outputs 64 feature maps of the same width and height (the element-wise scale); the input of the eleventh convolutional layer in the same spatial feature transform layer also receives all feature maps in F_{Align,1} and outputs 64 feature maps of the same width and height (the element-wise shift). The input of the second spatial feature transform layer receives all feature maps output by the first spatial-angular convolution layer: these feature maps are multiplied element-wise by the scale feature maps, and the shift feature maps are then added element-wise to the product. The resulting feature maps constitute the output of the second spatial feature transform layer in the first enhanced residual block.

The input of the twelfth convolutional layer in the second spatial-angular convolution layer of the first enhanced residual block receives all feature maps output by the second spatial feature transform layer, and this layer outputs 64 feature maps of the same width and height. These feature maps undergo a reorganization from the spatial dimension to the angular dimension, and the input of the thirteenth convolutional layer receives the result of this reorganization; the thirteenth convolutional layer outputs 64 feature maps of the corresponding width and height in the angular arrangement. The feature maps are then reorganized back from the angular dimension to the spatial dimension, and the feature maps obtained after this inverse reorganization constitute the output of the second spatial-angular convolution layer in the first enhanced residual block.

The input of the global average pooling layer in the channel attention layer of the first enhanced residual block receives all feature maps output by the second spatial-angular convolution layer, and it outputs 64 feature maps of the same width and height; the set they form is denoted F_{GAP,1}, and all values within each feature map in F_{GAP,1} are identical. The input of the fourteenth convolutional layer receives all feature maps in F_{GAP,1}, and this layer outputs 4 feature maps of the same width and height; their set is denoted F_{DS,1}. The input of the fifteenth convolutional layer receives all feature maps in F_{DS,1}, and this layer outputs 64 feature maps of the same width and height; their set is denoted F_{US,1}. All feature maps in F_{US,1} are multiplied element-wise with the feature maps output by the second spatial-angular convolution layer, and the resulting feature maps constitute the output of the channel attention layer in the first enhanced residual block; the set they form is denoted F_{CA,1}.
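This channel attention layer is the familiar squeeze-and-excitation pattern: global average pooling, a 64→4 bottleneck with ReLU, a 4→64 expansion with Sigmoid, then channel-wise rescaling. In the sketch below (our own naming), the pooled values are kept as 1×1 maps and broadcast during the final multiplication, which is equivalent to the constant-valued full-size maps described above.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """GAP -> 1x1 conv 64->4 (ReLU) -> 1x1 conv 4->64 (Sigmoid) -> rescale."""
    def __init__(self, channels: int = 64, reduced: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # F_GAP: one value per channel
        self.down = nn.Conv2d(channels, reduced, 1)  # fourteenth conv -> F_DS
        self.up = nn.Conv2d(reduced, channels, 1)    # fifteenth conv -> F_US
        self.relu, self.sigmoid = nn.ReLU(True), nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.sigmoid(self.up(self.relu(self.down(self.pool(x)))))
        return x * w                                 # F_CA: channel-wise rescaling
```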

All feature maps in F_{CA,1} are added element-wise to all feature maps in F_{LR}; the resulting feature maps constitute the output of the first enhanced residual block, and the set they form is F_{En,1}.

The input of the tenth convolutional layer in the first spatial feature transform layer of the second enhanced residual block receives all feature maps in F_{Align,2}, and this layer outputs 64 feature maps of the same width and height (the element-wise scale); the input of the eleventh convolutional layer in the same spatial feature transform layer also receives all feature maps in F_{Align,2} and outputs 64 feature maps of the same width and height (the element-wise shift). The input of the first spatial feature transform layer in the second enhanced residual block receives all feature maps in F_{En,1}: all feature maps in F_{En,1} are multiplied element-wise by the scale feature maps, and the shift feature maps are then added element-wise to the product. The resulting feature maps constitute the output of the first spatial feature transform layer in the second enhanced residual block.

The input of the twelfth convolutional layer in the first spatial-angular convolution layer of the second enhanced residual block receives all feature maps output by the first spatial feature transform layer of this block, and this layer outputs 64 feature maps of the same width and height. These feature maps undergo a reorganization from the spatial dimension to the angular dimension, and the input of the thirteenth convolutional layer receives the result of this reorganization; the thirteenth convolutional layer outputs 64 feature maps of the corresponding width and height in the angular arrangement. The feature maps are then reorganized back from the angular dimension to the spatial dimension, and the feature maps obtained after this inverse reorganization constitute the output of the first spatial-angular convolution layer in the second enhanced residual block.

The input of the tenth convolutional layer in the second spatial feature transform layer of the second enhanced residual block receives all feature maps in F_{Align,2}, and this layer outputs 64 feature maps of the same width and height (the element-wise scale); the input of the eleventh convolutional layer in the same spatial feature transform layer also receives all feature maps in F_{Align,2} and outputs 64 feature maps of the same width and height (the element-wise shift). The input of the second spatial feature transform layer receives all feature maps output by the first spatial-angular convolution layer of this block: these feature maps are multiplied element-wise by the scale feature maps, and the shift feature maps are then added element-wise to the product. The resulting feature maps constitute the output of the second spatial feature transform layer in the second enhanced residual block.

The input of the twelfth convolutional layer in the second spatial-angular convolution layer of the second enhanced residual block receives all feature maps output by the second spatial feature transform layer of this block, and this layer outputs 64 feature maps of the same width and height. These feature maps undergo a reorganization from the spatial dimension to the angular dimension, and the input of the thirteenth convolutional layer receives the result of this reorganization; the thirteenth convolutional layer outputs 64 feature maps of the corresponding width and height in the angular arrangement. The feature maps are then reorganized back from the angular dimension to the spatial dimension, and the feature maps obtained after this inverse reorganization constitute the output of the second spatial-angular convolution layer in the second enhanced residual block.

The input of the global average pooling layer in the channel attention layer of the second enhanced residual block receives all feature maps output by the second spatial-angular convolution layer of this block, and it outputs 64 feature maps of the same width and height; the set they form is denoted F_{GAP,2}, and all values within each feature map in F_{GAP,2} are identical. The input of the fourteenth convolutional layer receives all feature maps in F_{GAP,2}, and this layer outputs 4 feature maps of the same width and height; their set is denoted F_{DS,2}. The input of the fifteenth convolutional layer receives all feature maps in F_{DS,2}, and this layer outputs 64 feature maps of the same width and height; their set is denoted F_{US,2}. All feature maps in F_{US,2} are multiplied element-wise with the feature maps output by the second spatial-angular convolution layer of this block, and the resulting feature maps constitute the output of the channel attention layer in the second enhanced residual block; the set they form is denoted F_{CA,2}.

All feature maps in F_{CA,2} are added element-wise to all feature maps in F_{En,1}; the resulting feature maps constitute the output of the second enhanced residual block, and the set they form is F_{En,2}.

The input of the tenth convolutional layer in the first spatial feature transform layer of the third enhanced residual block receives all feature maps in F_{Align,3}, and this layer outputs 64 feature maps of the same width and height (the element-wise scale); the input of the eleventh convolutional layer in the same spatial feature transform layer also receives all feature maps in F_{Align,3} and outputs 64 feature maps of the same width and height (the element-wise shift). The input of the first spatial feature transform layer in the third enhanced residual block receives all feature maps in F_{En,2}: all feature maps in F_{En,2} are multiplied element-wise by the scale feature maps, and the shift feature maps are then added element-wise to the product. The resulting feature maps constitute the output of the first spatial feature transform layer in the third enhanced residual block.

The input of the twelfth convolutional layer in the first spatial-angular convolution layer of the third enhanced residual block receives all feature maps output by the first spatial feature transform layer of this block, and this layer outputs 64 feature maps of the same width and height. These feature maps undergo a reorganization from the spatial dimension to the angular dimension, and the input of the thirteenth convolutional layer receives the result of this reorganization; the thirteenth convolutional layer outputs 64 feature maps of the corresponding width and height in the angular arrangement. The feature maps are then reorganized back from the angular dimension to the spatial dimension, and the feature maps obtained after this inverse reorganization constitute the output of the first spatial-angular convolution layer in the third enhanced residual block.

The input of the tenth convolutional layer in the second spatial feature transform layer of the third enhanced residual block receives all feature maps in F_{Align,3}, and this layer outputs 64 feature maps of the same width and height (the element-wise scale); the input of the eleventh convolutional layer in the same spatial feature transform layer also receives all feature maps in F_{Align,3} and outputs 64 feature maps of the same width and height (the element-wise shift). The input of the second spatial feature transform layer receives all feature maps output by the first spatial-angular convolution layer of this block: these feature maps are multiplied element-wise by the scale feature maps, and the shift feature maps are then added element-wise to the product. The resulting feature maps constitute the output of the second spatial feature transform layer in the third enhanced residual block.

The input of the twelfth convolutional layer in the second spatial-angular convolution layer of the third enhanced residual block receives all feature maps output by the second spatial feature transform layer of this block, and this layer outputs 64 feature maps of the same width and height. These feature maps undergo a reorganization from the spatial dimension to the angular dimension, and the input of the thirteenth convolutional layer receives the result of this reorganization; the thirteenth convolutional layer outputs 64 feature maps of the corresponding width and height in the angular arrangement. The feature maps are then reorganized back from the angular dimension to the spatial dimension, and the feature maps obtained after this inverse reorganization constitute the output of the second spatial-angular convolution layer in the third enhanced residual block.

The input of the global average pooling layer in the channel attention layer of the third enhanced residual block receives all feature maps output by the second spatial-angular convolution layer of this block, and it outputs 64 feature maps of the same width and height; the set they form is denoted F_{GAP,3}, and all values within each feature map in F_{GAP,3} are identical. The input of the fourteenth convolutional layer receives all feature maps in F_{GAP,3}, and this layer outputs 4 feature maps of the same width and height; their set is denoted F_{DS,3}. The input of the fifteenth convolutional layer receives all feature maps in F_{DS,3}, and this layer outputs 64 feature maps of the same width and height; their set is denoted F_{US,3}. All feature maps in F_{US,3} are multiplied element-wise with the feature maps output by the second spatial-angular convolution layer of this block, and the resulting feature maps constitute the output of the channel attention layer in the third enhanced residual block; the set they form is denoted F_{CA,3}.

All feature maps in F_{CA,3} are added element-wise to all feature maps in F_{En,2}; the resulting feature maps constitute the output of the third enhanced residual block, and the set they form is F_{En,3}.

In all of the above, the tenth and eleventh convolutional layers of the first, second and third enhanced residual blocks have 3×3 kernels, a stride of 1, 64 input channels and 64 output channels, and use no activation function; the twelfth and thirteenth convolutional layers have 3×3 kernels, a stride of 1, 64 input channels and 64 output channels, and use the "ReLU" activation function; the fourteenth convolutional layer has a 1×1 kernel, a stride of 1, 64 input channels and 4 output channels, and uses the "ReLU" activation function; and the fifteenth convolutional layer has a 1×1 kernel, a stride of 1, 4 input channels and 64 output channels, and uses the "Sigmoid" activation function.
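Combining the sketches given earlier (SpatialFeatureTransform, SpatialAngularConv and ChannelAttention, all hypothetical re-implementations with our own names), one enhanced residual block can be outlined as follows; the wiring follows the description above, and the closing comment shows how the three blocks are chained:

```python
import torch.nn as nn

class EnhancedResidualBlock(nn.Module):
    """F_En = F_in + CA(SAC2(SFT2(SAC1(SFT1(F_in, F_align)), F_align)))."""
    def __init__(self, u: int = 5, v: int = 5, channels: int = 64):
        super().__init__()
        self.sft1 = SpatialFeatureTransform(channels)
        self.sac1 = SpatialAngularConv(u, v, channels)
        self.sft2 = SpatialFeatureTransform(channels)
        self.sac2 = SpatialAngularConv(u, v, channels)
        self.ca = ChannelAttention(channels)

    def forward(self, f_in, f_align):
        y = self.sft1(f_in, f_align)
        y = self.sac1(y)
        y = self.sft2(y, f_align)
        y = self.sac2(y)
        return f_in + self.ca(y)

# The enhancement module chains three such blocks:
# F_En,1 = block1(F_LR,   F_Align,1)
# F_En,2 = block2(F_En,1, F_Align,2)
# F_En,3 = block3(F_En,2, F_Align,3)
```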

Compared with the prior art, the present invention has the following advantages:

1) The method of the present invention exploits the fact that a conventional 2D camera captures rich spatial information, which can serve as compensation information for spatial-resolution reconstruction of light field images. It therefore uses a light field image and a 2D high-resolution image together and, on this basis, constructs an end-to-end convolutional neural network that makes full use of both sources of information to reconstruct a high-spatial-resolution light field image, recovering fine texture information while preserving the parallax structure of the reconstruction result.

2) To establish the connection between the light field image and the 2D high-resolution image, the method constructs an aperture-level feature registration module that explores the correlation between the two in a high-dimensional feature space, so that the feature information of the 2D high-resolution image is accurately registered to the light field image. In addition, the method uses the constructed light field feature enhancement module to fuse, at multiple levels, the registered high-resolution features with the shallow light field features extracted from the low-spatial-resolution light field image, effectively generating high-spatial-resolution light field features that can then be reconstructed into a high-spatial-resolution light field image.

3) To improve flexibility and practicality, the method adopts a pyramid reconstruction scheme that progressively increases the spatial resolution of the light field image and restores texture and detail by reconstructing a super-resolution result of a specific scale at each pyramid level, so that multi-scale results (e.g. 2×, 4× and 8×) can be reconstructed in a single forward inference. Furthermore, a weight-sharing strategy is applied across pyramid levels, which effectively reduces the number of parameters of the pyramid network and lightens the training burden.
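Advantage 3 can be made concrete with a short sketch of the weight-shared pyramid loop (hypothetical: the `stage` module stands for one pyramid level of the network described in this patent, and its `(lf, hr_image)` call signature is our own assumption):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidSuperResolver(nn.Module):
    """Applies one shared super-resolution stage repeatedly, doubling the
    spatial resolution of the light field at each pyramid level."""
    def __init__(self, stage: nn.Module, levels: int = 3):
        super().__init__()
        self.stage = stage    # one instance -> shared weights across all levels
        self.levels = levels  # 3 levels -> 2x, 4x and 8x results

    def forward(self, lf, hr_image):
        results = []
        for _ in range(self.levels):
            lf = F.interpolate(lf, scale_factor=2, mode='bicubic', align_corners=False)
            lf = self.stage(lf, hr_image)  # refine the upsampled light field
            results.append(lf)
        return results  # [2x, 4x, 8x] in a single forward inference
```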

Description of the drawings

Fig. 1 is a block diagram of the overall implementation flow of the method of the present invention;

Fig. 2 is a schematic diagram of the structure of the convolutional neural network, i.e. the spatial super-resolution network, constructed by the method of the present invention;

Fig. 3a is a schematic diagram of the structure of the light field feature enhancement module in the spatial super-resolution network constructed by the method of the present invention;

Fig. 3b is a schematic diagram of the structure of the first and second spatial feature transform layers in the light field feature enhancement module;

Fig. 3c is a schematic diagram of the structure of the first and second spatial-angular convolution layers in the light field feature enhancement module;

Fig. 3d is a schematic diagram of the structure of the channel attention layer in the light field feature enhancement module;

Fig. 4 is a schematic illustration of the pyramid reconstruction scheme established by the method of the present invention;

Fig. 5a is the reconstructed high-spatial-resolution light field image obtained by processing a low-spatial-resolution light field image from the tested EPFL light field image database with the bicubic interpolation method, shown as the sub-aperture image at the central coordinates;

Fig. 5b is the corresponding reconstruction obtained with the method of Haris et al., shown in the same way;

Fig. 5c is the corresponding reconstruction obtained with the method of Lai et al., shown in the same way;

Fig. 5d is the corresponding reconstruction obtained with the method of Yeung et al., shown in the same way;

Fig. 5e is the corresponding reconstruction obtained with the method of Wang et al., shown in the same way;

Fig. 5f is the corresponding reconstruction obtained with the method of Jin et al., shown in the same way;

Fig. 5g is the corresponding reconstruction obtained with the method of Boominathan et al., shown in the same way;

Fig. 5h is the corresponding reconstruction obtained with the method of the present invention, shown in the same way;

Fig. 5i is the label high-spatial-resolution light field image corresponding to the low-spatial-resolution light field image from the tested EPFL light field image database, shown as the sub-aperture image at the central coordinates;

Fig. 6a is the reconstructed high-spatial-resolution light field image obtained by processing a low-spatial-resolution light field image from the tested STFLytro light field image database with the bicubic interpolation method, shown as the sub-aperture image at the central coordinates;

Fig. 6b is the corresponding reconstruction obtained with the method of Haris et al., shown in the same way;

Fig. 6c is the corresponding reconstruction obtained with the method of Lai et al., shown in the same way;

Fig. 6d is the corresponding reconstruction obtained with the method of Yeung et al., shown in the same way;

Fig. 6e is the corresponding reconstruction obtained with the method of Wang et al., shown in the same way;

Fig. 6f is the corresponding reconstruction obtained with the method of Jin et al., shown in the same way;

Fig. 6g is the corresponding reconstruction obtained with the method of Boominathan et al., shown in the same way;

Fig. 6h is the corresponding reconstruction obtained with the method of the present invention, shown in the same way;

Fig. 6i is the label high-spatial-resolution light field image corresponding to the low-spatial-resolution light field image in the tested STFLytro light field image database.

Detailed description

The present invention is described in further detail below with reference to the embodiments and the accompanying drawings.

With the development of immersive media and related technologies, users increasingly prefer interactive and immersive visual content such as images and videos. However, conventional 2D imaging only captures the intensity of the light in a scene and cannot provide scene depth information. By contrast, 3D imaging acquires more scene information, but the depth information it contains is limited and it is generally used for stereoscopic display. Light field imaging, an emerging imaging technology, can capture both the intensity and the direction of the light in a scene in a single shot and thus record the real world more effectively; it is therefore receiving wide attention, and optical instruments and devices based on light field imaging have been developed to promote the application and development of light field technology. Limited by the size of the imaging sensor, however, the 4D light field image acquired by a light field camera suffers from a trade-off between spatial and angular resolution: while providing high angular resolution, it inevitably has low spatial resolution, which severely restricts practical applications of 4D light field images such as refocusing and depth estimation. In view of this, the present invention proposes a light field image spatial super-resolution reconstruction method.

The method acquires, through heterogeneous imaging, a 2D high-resolution image at the same time as the light field image is captured, and the captured 2D high-resolution image then serves as supplementary information to help enhance the spatial resolution of the light field image. Specifically, a spatial super-resolution network is constructed that mainly comprises an encoder, an aperture-level feature registration module, a light field feature enhancement module and a decoder. First, the encoder extracts multi-scale features from the upsampled low-spatial-resolution light field image, the blurred 2D high-resolution image and the 2D high-resolution image itself. The aperture-level feature registration module then learns the correspondence between the 2D high-resolution features and the low-resolution light field features, registering the 2D high-resolution features to every sub-aperture image of the light field image to form registered high-resolution light field features. The light field feature enhancement module next uses the registered high-resolution light field features to enhance the shallow light field features extracted from the input light field image, yielding enhanced high-resolution light field features. Finally, the decoder reconstructs the enhanced high-resolution light field features into a high-quality, high-spatial-resolution light field image. In addition, a pyramid reconstruction architecture is adopted so that a high-spatial-resolution light field image at a specific upsampling scale is reconstructed at each pyramid level, and multi-scale reconstruction results can be generated simultaneously.

The light field image spatial super-resolution reconstruction method proposed by the present invention, whose overall implementation flow is shown in Fig. 1, comprises the following steps:

Step 1: Select Num color three-channel low-spatial-resolution light field images with spatial resolution W×H and angular resolution V×U, the corresponding Num color three-channel 2D high-resolution images with resolution αW×αH, and the corresponding Num color three-channel reference high-spatial-resolution light field images with spatial resolution αW×αH and angular resolution V×U, where Num>1. In this embodiment Num=200, W×H is 75×50, V×U is 5×5, and α denotes the spatial-resolution upscaling factor, with α>1; in this embodiment α=8.

Step 2: Construct a convolutional neural network as the spatial super-resolution network. As shown in Fig. 2, the spatial super-resolution network comprises an encoder for extracting multi-scale features, an aperture-level feature registration module for registering light field features with 2D high-resolution features, a shallow feature extraction layer for extracting shallow features from the low-spatial-resolution light field image, a light field feature enhancement module for fusing light field features with 2D high-resolution features, a spatial attention block for mitigating registration errors in the coarse-scale features, and a decoder for reconstructing the latent features into a light field image.

The encoder consists of a first convolutional layer, a second convolutional layer, a first residual block and a second residual block connected in sequence. The input of the first convolutional layer receives three inputs in parallel: a sub-aperture image array of width αsW×V and height αsH×U, denoted LLR↑, obtained by spatial-resolution upsampling of the single-channel image LLR of a low-spatial-resolution light field image with spatial resolution W×H and angular resolution V×U, followed by reorganization; a single-channel image of the blurred 2D high-resolution image, of width αsW and height αsH, denoted IBlur; and a single-channel image of the 2D high-resolution image, of width αsW and height αsH, denoted IHR. For LLR↑, the output of the first convolutional layer is 64 feature maps of width αsW×V and height αsH×U, whose set is denoted YLR,0; for IBlur it is 64 feature maps of width αsW and height αsH, denoted YBlur,0; and for IHR it is 64 feature maps of width αsW and height αsH, denoted YHR,0. The input of the second convolutional layer receives YLR,0, YBlur,0 and YHR,0 in parallel; its output is 64 feature maps of width (αsW/2)×V and height (αsH/2)×U for YLR,0, denoted YLR,1, 64 feature maps of width αsW/2 and height αsH/2 for YBlur,0, denoted YBlur,1, and 64 feature maps of width αsW/2 and height αsH/2 for YHR,0, denoted YHR,1. The input of the first residual block receives YLR,1, YBlur,1 and YHR,1 in parallel; its output is 64 feature maps at the same respective sizes for each input, denoted YLR,2, YBlur,2 and YHR,2. The input of the second residual block receives YLR,2, YBlur,2 and YHR,2 in parallel; its output is 64 feature maps at the same respective sizes for each input, denoted YLR,3, YBlur,3 and YHR,3. Here, LLR↑ is obtained by applying existing bicubic interpolation upsampling to the single-channel image LLR and reorganizing the result into a sub-aperture image array of width αsW×V and height αsH×U; IBlur is obtained by first bicubic-interpolation downsampling and then bicubic-interpolation upsampling of IHR; αs denotes the spatial-resolution sampling factor, with αs=2 in this embodiment and αs³=α, and both the upsampling factor of the bicubic upsampling and the downsampling factor of the bicubic downsampling are αs. The first convolutional layer has 3×3 convolution kernels, stride 1, 1 input channel and 64 output channels; the second convolutional layer has 3×3 convolution kernels, stride 2, 64 input channels and 64 output channels; both the first and the second convolutional layer use the "ReLU" activation function.
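For illustration, the following is a minimal PyTorch sketch of the encoder as described above: one weight-shared stack (3×3 conv with stride 1, 3×3 conv with stride 2, two residual blocks) applied in parallel to the three inputs. Class names, variable names and the toy tensor sizes are illustrative assumptions, not part of the invention.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """conv(3x3, s1) + ReLU + conv(3x3, s1), with an identity skip connection."""
    def __init__(self, ch=64):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, stride=1, padding=1)
    def forward(self, x):
        return x + self.conv2(torch.relu(self.conv1(x)))

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 64, 3, stride=1, padding=1)   # level 0
        self.conv2 = nn.Conv2d(64, 64, 3, stride=2, padding=1)  # level 1, halves H and W
        self.res1, self.res2 = ResBlock(), ResBlock()
    def forward(self, x):
        y0 = torch.relu(self.conv1(x))   # Y*,0
        y1 = torch.relu(self.conv2(y0))  # Y*,1
        y2 = self.res1(y1)               # Y*,2
        y3 = self.res2(y2)               # Y*,3
        return y0, y1, y2, y3

# The same encoder instance (shared weights) is applied to the three inputs:
enc = Encoder()
a_s, W, H, V, U = 2, 75, 50, 5, 5
lf_up   = torch.randn(1, 1, a_s * H * U, a_s * W * V)  # L_LR, upsampled SAI array
hr_blur = torch.randn(1, 1, a_s * H, a_s * W)          # blurred 2D HR image
hr      = torch.randn(1, 1, a_s * H, a_s * W)          # 2D HR image
Y_LR, Y_Blur, Y_HR = enc(lf_up), enc(hr_blur), enc(hr)
```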

The input of the aperture-level feature registration module receives three types of feature maps: the first type is all feature maps in YLR,3; the second type is all feature maps in YBlur,3; the third type comprises four inputs, namely all feature maps in YHR,0, in YHR,1, in YHR,2 and in YHR,3. In the aperture-level feature registration module, the feature maps in YBlur,3, YHR,0, YHR,1, YHR,2 and YHR,3 are first each replicated V×U times, so that the widths of the maps in YBlur,3, YHR,1, YHR,2 and YHR,3 become (αsW/2)×V and their heights become (αsH/2)×U, i.e. their sizes match the sizes of the maps in YLR,3, and so that the widths of the maps in YHR,0 become αsW×V and their heights become αsH×U, i.e. their sizes match the sizes of the maps in YLR,0. Existing block matching is then performed between all feature maps in YLR,3 and all feature maps in YBlur,3; it yields a coordinate index map of width (αsW/2)×V and height (αsH/2)×U, denoted PCI. Next, according to PCI, all feature maps in YHR,1 are spatially registered with all feature maps in YLR,1, giving 64 registered feature maps of width (αsW/2)×V and height (αsH/2)×U, whose set is denoted FAlign,1. Likewise, according to PCI, all feature maps in YHR,2 are spatially registered with all feature maps in YLR,2, giving 64 registered feature maps of width (αsW/2)×V and height (αsH/2)×U, denoted FAlign,2, and all feature maps in YHR,3 are spatially registered with all feature maps in YLR,3, giving 64 registered feature maps of width (αsW/2)×V and height (αsH/2)×U, denoted FAlign,3. PCI is then upsampled by bicubic interpolation to obtain a coordinate index map of width αsW×V and height αsH×U, denoted PCI↑; finally, according to PCI↑, all feature maps in YHR,0 are spatially registered with all feature maps in YLR,0, giving 64 registered feature maps of width αsW×V and height αsH×U, whose set is denoted FAlign,0. The output of the aperture-level feature registration module is all feature maps in FAlign,0, FAlign,1, FAlign,2 and FAlign,3. The accuracy metric used for block matching is the texture-and-structure similarity index, the block size used for matching is 3×3, and the upsampling factor of the bicubic interpolation is αs. Because high-level features describe the similarity of images at the semantic level more compactly while suppressing irrelevant textures, block matching is performed here between the feature maps in YLR,3 and those in YBlur,3, and the resulting coordinate index map PCI reflects the spatial registration relationship between the feature maps in YBlur,3 and those in YLR,3. In addition, since the convolution operation does not change the spatial position information of the feature maps, PCI also reflects the spatial registration relationship between the feature maps in YHR,1 and those in YLR,1, between the feature maps in YHR,2 and those in YLR,2, and between the feature maps in YHR,3 and those in YLR,3, while PCI↑ obtained by bicubic upsampling reflects the spatial registration relationship between the feature maps in YHR,0 and those in YLR,0.
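The following toy sketch illustrates the idea of aperture-level registration by block matching: 3×3 patches of the coarsest light field features are matched against the blurred high-resolution features, the argmax indices form the coordinate index map, and the high-resolution features are warped accordingly. For simplicity it scores patches with plain normalized correlation instead of the texture-and-structure similarity index, and warps at pixel rather than block granularity; all names and sizes are assumptions.

```python
import torch
import torch.nn.functional as F

def block_match(lf_feat, hr_feat, patch=3):
    """For every spatial position of lf_feat, return the flat index of the
    best-matching position in hr_feat (both tensors: [C, H, W])."""
    pad = patch // 2
    # Unfold into one 3x3 patch per position: [H*W, C*patch*patch]
    q = F.unfold(lf_feat[None], patch, padding=pad)[0].t()
    k = F.unfold(hr_feat[None], patch, padding=pad)[0].t()
    q, k = F.normalize(q, dim=1), F.normalize(k, dim=1)
    score = q @ k.t()            # [H*W, H*W] patch similarity
    return score.argmax(dim=1)   # coordinate index map P_CI, flattened

def warp_by_index(hr_feat, index):
    """Gather hr_feat values according to the index map (pixel-level warp)."""
    C, H, W = hr_feat.shape
    return hr_feat.reshape(C, H * W)[:, index].reshape(C, H, W)

# Matching is done on the coarsest (level-3) features; toy sizes here.
lf3 = torch.randn(64, 20, 30)    # Y_LR,3
hr3 = torch.randn(64, 20, 30)    # replicated Y_Blur,3
p_ci = block_match(lf3, hr3)
f_align3 = warp_by_index(torch.randn(64, 20, 30), p_ci)  # warp Y_HR,3 -> F_Align,3
```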

The shallow feature extraction layer consists of a single fifth convolutional layer. The input of the fifth convolutional layer receives the sub-aperture image array of width W×V and height H×U obtained by reorganizing the single-channel image LLR of a low-spatial-resolution light field image with spatial resolution W×H and angular resolution V×U; its output is 64 feature maps of width W×V and height H×U, whose set is denoted FLR. The fifth convolutional layer has 3×3 convolution kernels, stride 1, 1 input channel and 64 output channels, and uses the "ReLU" activation function.

As shown in Fig. 3a, the light field feature enhancement module consists of a first enhanced residual block, a second enhanced residual block and a third enhanced residual block connected in sequence. The input of the first enhanced residual block receives all feature maps in FAlign,1 and all feature maps in FLR; when αs=2, W×V is equivalent to (αsW/2)×V and H×U is equivalent to (αsH/2)×U, i.e. the maps in FLR have the same size as the maps in FAlign,1. The output of the first enhanced residual block is 64 feature maps of width (αsW/2)×V and height (αsH/2)×U, whose set is denoted FEn,1. The input of the second enhanced residual block receives all feature maps in FAlign,2 and all feature maps in FEn,1; its output is 64 feature maps of width (αsW/2)×V and height (αsH/2)×U, denoted FEn,2. The input of the third enhanced residual block receives all feature maps in FAlign,3 and all feature maps in FEn,2; its output is 64 feature maps of width (αsW/2)×V and height (αsH/2)×U, denoted FEn,3.

The spatial attention block consists of a sixth convolutional layer and a seventh convolutional layer connected in sequence. The input of the sixth convolutional layer receives all feature maps in FAlign,0; its output is 64 spatial attention feature maps of width αsW×V and height αsH×U, whose set is denoted FSA1. The input of the seventh convolutional layer receives all spatial attention feature maps in FSA1; its output is 64 spatial attention feature maps of width αsW×V and height αsH×U, denoted FSA2. All feature maps in FAlign,0 are multiplied element-wise with all spatial attention feature maps in FSA2; the set of resulting feature maps, denoted FWA,0, is the output of the spatial attention block. Both the sixth and the seventh convolutional layer have 3×3 convolution kernels, stride 1, 64 input channels and 64 output channels; the sixth convolutional layer uses the "ReLU" activation function and the seventh uses "Sigmoid".
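A minimal sketch of the spatial attention block described above (3×3 conv + ReLU, 3×3 conv + Sigmoid, element-wise gating of FAlign,0); class names and toy sizes are assumptions.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.conv6 = nn.Conv2d(ch, ch, 3, padding=1)  # "sixth" conv, followed by ReLU
        self.conv7 = nn.Conv2d(ch, ch, 3, padding=1)  # "seventh" conv, followed by Sigmoid
    def forward(self, f_align0):
        f_sa1 = torch.relu(self.conv6(f_align0))      # F_SA1
        f_sa2 = torch.sigmoid(self.conv7(f_sa1))      # F_SA2, values in (0, 1)
        return f_align0 * f_sa2                       # F_WA,0: attention-gated features

f_align0 = torch.randn(1, 64, 100, 150)  # toy stand-in for the full-size features
f_wa0 = SpatialAttention()(f_align0)
```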

The decoder consists of a third residual block, a fourth residual block, a sub-pixel convolutional layer, an eighth convolutional layer and a ninth convolutional layer connected in sequence. The input of the third residual block receives all feature maps in FEn,3; its output is 64 feature maps of width (αsW/2)×V and height (αsH/2)×U, whose set is denoted FDec,1. The input of the fourth residual block receives all feature maps in FDec,1; its output is 64 feature maps of width (αsW/2)×V and height (αsH/2)×U, denoted FDec,2. The input of the sub-pixel convolutional layer receives all feature maps in FDec,2; its output is 256 feature maps of width (αsW/2)×V and height (αsH/2)×U, which are further converted into 64 feature maps of width αsW×V and height αsH×U; the set of converted feature maps is denoted FDec,3. The input of the eighth convolutional layer receives the result of element-wise addition of all feature maps in FDec,3 and all feature maps in FWA,0; its output is 64 feature maps of width αsW×V and height αsH×U, denoted FDec,4. The input of the ninth convolutional layer receives all feature maps in FDec,4; its output is one reconstructed single-channel light field image of width αsW×V and height αsH×U, which is reorganized into a high-spatial-resolution single-channel light field image with spatial resolution αsW×αsH and angular resolution V×U, denoted LSR. The sub-pixel convolutional layer has 3×3 convolution kernels, stride 1, 64 input channels and 256 output channels; the eighth convolutional layer has 3×3 convolution kernels, stride 1, 64 input channels and 64 output channels; the ninth convolutional layer has 1×1 convolution kernels, stride 1, 64 input channels and 1 output channel. The sub-pixel convolutional layer and the eighth convolutional layer use the "ReLU" activation function; the ninth convolutional layer uses no activation function.
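A minimal sketch of the decoder, assuming the sub-pixel convolutional layer is realized as a 3×3 convolution to 256 channels followed by a PixelShuffle with factor 2 (256 = 64×2×2); names and toy sizes are assumptions.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """conv3x3 + ReLU + conv3x3 with an identity skip (as in the encoder sketch)."""
    def __init__(self, ch=64):
        super().__init__()
        self.c1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.c2 = nn.Conv2d(ch, ch, 3, padding=1)
    def forward(self, x):
        return x + self.c2(torch.relu(self.c1(x)))

class Decoder(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.res3, self.res4 = ResBlock(ch), ResBlock(ch)
        self.subpixel = nn.Sequential(
            nn.Conv2d(ch, 256, 3, padding=1),  # sub-pixel conv: 64 -> 256 maps
            nn.ReLU(inplace=True),
            nn.PixelShuffle(2),                # 256 maps -> 64 maps at 2x size
        )
        self.conv8 = nn.Conv2d(ch, ch, 3, padding=1)  # followed by ReLU
        self.conv9 = nn.Conv2d(ch, 1, 1)              # 1x1 conv, no activation
    def forward(self, f_en3, f_wa0):
        f_dec3 = self.subpixel(self.res4(self.res3(f_en3)))  # F_Dec,1..3
        f_dec4 = torch.relu(self.conv8(f_dec3 + f_wa0))      # skip from F_WA,0
        return self.conv9(f_dec4)  # reconstructed single-channel LF array

f_en3 = torch.randn(1, 64, 50, 75)    # toy half-resolution features
f_wa0 = torch.randn(1, 64, 100, 150)  # toy full-resolution gated features
out = Decoder()(f_en3, f_wa0)         # [1, 1, 100, 150]
```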

Step 3: Convert each selected low-spatial-resolution light field image, its corresponding 2D high-resolution image and its corresponding reference high-spatial-resolution light field image from the RGB color space to the YCbCr color space, and extract the Y-channel images. Reorganize the Y-channel image of each low-spatial-resolution light field image into a sub-aperture image array of width W×V and height H×U. The sub-aperture image arrays of the Y-channel images of all low-spatial-resolution light field images, the Y-channel images of the corresponding 2D high-resolution images, and the Y-channel images of the corresponding reference high-spatial-resolution light field images then constitute the training set. Next, construct the pyramid network and train it on the training set; the specific procedure is:
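The following sketch illustrates this preprocessing: extracting the Y channel (here with the BT.601 luma coefficients, an assumption) and reorganizing a 4D light field into the sub-aperture image array of height H×U and width W×V. Function names are illustrative.

```python
import torch

def rgb_to_y(rgb):
    """rgb: [..., 3, H, W] in [0, 1] -> Y channel [..., 1, H, W] (BT.601)."""
    r, g, b = rgb.unbind(dim=-3)
    return (0.299 * r + 0.587 * g + 0.114 * b).unsqueeze(-3)

def lf_to_sai_array(lf):
    """lf: [U, V, H, W] -> sub-aperture image array of shape [H*U, W*V]
    (U rows and V columns of W x H sub-aperture images)."""
    U, V, H, W = lf.shape
    return lf.permute(0, 2, 1, 3).reshape(U * H, V * W)

lf_rgb = torch.rand(5, 5, 3, 50, 75)        # [U, V, 3, H, W]
lf_y = rgb_to_y(lf_rgb).squeeze(-3)         # [5, 5, 50, 75]
sai = lf_to_sai_array(lf_y)                 # [250, 375] = (H*U) x (W*V)
```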

Step 3_1: As shown in Fig. 4, replicate the constructed spatial super-resolution network three times and cascade the copies; the weights of the networks are shared, i.e. their parameters are identical. The overall network formed by the three spatial super-resolution networks is defined as the pyramid network. At each pyramid level, the reconstruction scale of the spatial super-resolution network is set equal to αs; with αs=2, each level doubles the spatial resolution of the light field image, so the final reconstruction scale reaches 8, i.e. α=αs³=8.
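Weight sharing across the pyramid levels can be sketched as follows: a single network instance is applied three times in a cascade, so the three "copies" share all parameters by construction. TinySRNet is only a stand-in for the spatial super-resolution network; its internals are not the invention's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySRNet(nn.Module):
    """Stand-in x2 super-resolution network: bicubic guess + learned residual."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 1, 3, padding=1)
    def forward(self, x):
        x = F.interpolate(x, scale_factor=2, mode='bicubic', align_corners=False)
        return x + self.conv(x)

net = TinySRNet()                 # one instance = one shared set of weights
x = torch.randn(1, 1, 50, 75)
for level in range(3):            # cascade: 2x, 4x, 8x overall
    x = net(x)                    # the same parameters at every pyramid level
print(x.shape)                    # torch.Size([1, 1, 400, 600])
```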

Step 3_2: Downsample the spatial resolution of the Y-channel image of each reference high-spatial-resolution light field image in the training set twice, and use the resulting images as label images; downsample the Y-channel image of each 2D high-resolution image in the training set twice in the same way, and use the resulting images as the 2D high-resolution Y-channel images for the first spatial super-resolution network in the pyramid network. Then input the following into the first spatial super-resolution network of the constructed pyramid network for training: the sub-aperture image arrays of the Y-channel images of all low-spatial-resolution light field images in the training set; the sub-aperture image arrays obtained by upsampling these Y-channel images once in spatial resolution; all 2D high-resolution Y-channel images for the first network; and the blurred 2D high-resolution Y-channel images obtained from them by one spatial-resolution downsampling followed by one spatial-resolution upsampling. This yields, for the Y-channel image of each low-spatial-resolution light field image in the training set, an αs-times reconstructed high-spatial-resolution Y-channel light field image. Both the spatial-resolution upsampling and the spatial-resolution downsampling use bicubic interpolation, and both their scales equal αs.

Step 3_3: Downsample the spatial resolution of the Y-channel image of each reference high-spatial-resolution light field image in the training set once, and use the resulting images as label images; downsample the Y-channel image of each 2D high-resolution image once in the same way, and use the resulting images as the 2D high-resolution Y-channel images for the second spatial super-resolution network in the pyramid network. Then input the following into the second spatial super-resolution network of the constructed pyramid network for training: the sub-aperture image arrays reorganized from the αs-times reconstructed high-spatial-resolution Y-channel light field images corresponding to the Y-channel images of all low-spatial-resolution light field images in the training set; the sub-aperture image arrays obtained by upsampling these reconstructions once in spatial resolution; all 2D high-resolution Y-channel images for the second network; and the blurred 2D high-resolution Y-channel images obtained from them by one spatial-resolution downsampling followed by one spatial-resolution upsampling. This yields, for each Y-channel image, an αs²-times reconstructed high-spatial-resolution Y-channel light field image. Both upsampling and downsampling use bicubic interpolation with scale αs.

Step 3_4: Use the Y-channel image of each reference high-spatial-resolution light field image in the training set directly as the label image, and the Y-channel image of each 2D high-resolution image in the training set as the 2D high-resolution Y-channel image for the third spatial super-resolution network in the pyramid network. Then input the following into the third spatial super-resolution network of the constructed pyramid network for training: the sub-aperture image arrays reorganized from the αs²-times reconstructed high-spatial-resolution Y-channel light field images corresponding to the Y-channel images of all low-spatial-resolution light field images in the training set; the sub-aperture image arrays obtained by upsampling these reconstructions once in spatial resolution; all 2D high-resolution Y-channel images for the third network; and the blurred 2D high-resolution Y-channel images obtained from them by one spatial-resolution downsampling followed by one spatial-resolution upsampling. This yields, for each Y-channel image, an αs³-times reconstructed high-spatial-resolution Y-channel light field image. Both upsampling and downsampling use bicubic interpolation with scale αs.
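The per-level data preparation of Steps 3_2 to 3_4 can be sketched as follows: labels and guidance images for level k are produced by 3−k extra bicubic ×αs downsamplings, and the blurred guidance by one further down/up round trip. Helper and variable names are assumptions.

```python
import torch
import torch.nn.functional as F

def bicubic(x, scale):
    return F.interpolate(x, scale_factor=scale, mode='bicubic',
                         align_corners=False)

a_s = 2
ref_y = torch.rand(1, 1, 400, 600)   # reference HR Y light field (8x of the LR LF)
hr_y  = torch.rand(1, 1, 400, 600)   # 2D HR guidance image, Y channel

for level in range(1, 4):            # pyramid levels 1..3 (Steps 3_2..3_4)
    label, guide = ref_y, hr_y
    for _ in range(3 - level):       # 2, 1, 0 extra x(1/a_s) downsamplings
        label = bicubic(label, 1 / a_s)
        guide = bicubic(guide, 1 / a_s)
    # one down/up round trip produces the blurred guidance image
    guide_blur = bicubic(bicubic(guide, 1 / a_s), a_s)
    # `label`, `guide` and `guide_blur` feed the level-`level` network,
    # together with the (reorganized) LF input from the previous level.
```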

After training, the optimal weight parameters of all convolution kernels in each spatial super-resolution network of the pyramid network are obtained, i.e. a well-trained spatial super-resolution network model. The model performs a specific super-resolution reconstruction scale at each pyramid level and can therefore output multi-scale super-resolution results in a single forward pass (scales 2×, 4× and 8× when αs=2); moreover, sharing weights across the spatial super-resolution networks at the different pyramid levels effectively reduces the number of network parameters and lowers the training burden.

Step 4: Arbitrarily select a color three-channel low-spatial-resolution light field image and the corresponding color three-channel 2D high-resolution image as test images. Convert both from the RGB color space to the YCbCr color space and extract the Y-channel images; reorganize the Y-channel image of the low-spatial-resolution light field image into a sub-aperture image array. Then input into the trained spatial super-resolution network model: the sub-aperture image array of the Y-channel image of the low-spatial-resolution light field image; the sub-aperture image array obtained by upsampling that Y-channel image once in spatial resolution; the Y-channel image of the 2D high-resolution image; and the blurred 2D high-resolution Y-channel image obtained from it by one spatial-resolution downsampling followed by one spatial-resolution upsampling. Testing then produces the reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of the low-spatial-resolution light field image. Next, apply bicubic interpolation upsampling to the Cb-channel and Cr-channel images of the low-spatial-resolution light field image, obtaining the reconstructed high-spatial-resolution Cb-channel and Cr-channel light field images. Finally, concatenate the reconstructed high-spatial-resolution Y-, Cb- and Cr-channel light field images along the color-channel dimension and convert the concatenated result back to the RGB color space, obtaining the color three-channel reconstructed high-spatial-resolution light field image corresponding to the low-spatial-resolution light field image.
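A sketch of this chroma handling: only the Y channel goes through the network, while Cb and Cr are bicubically upsampled and the three channels are reassembled. It is shown per sub-aperture view, and the call to the trained model is replaced by a bicubic stand-in so the sketch runs on its own.

```python
import torch
import torch.nn.functional as F

alpha = 8
# One sub-aperture view of the test light field in YCbCr (toy values):
lf_y  = torch.rand(1, 1, 50, 75)
lf_cb = torch.rand(1, 1, 50, 75)
lf_cr = torch.rand(1, 1, 50, 75)

# In the real pipeline sr_y comes from the trained pyramid model; bicubic is
# used here only as a runnable placeholder.
sr_y  = F.interpolate(lf_y,  scale_factor=alpha, mode='bicubic', align_corners=False)
sr_cb = F.interpolate(lf_cb, scale_factor=alpha, mode='bicubic', align_corners=False)
sr_cr = F.interpolate(lf_cr, scale_factor=alpha, mode='bicubic', align_corners=False)

sr_ycbcr = torch.cat([sr_y, sr_cb, sr_cr], dim=1)  # concat along the channel dim
# ...then convert sr_ycbcr back to the RGB color space.
```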

In this embodiment, in Step 2 the first residual block, the second residual block, the third residual block and the fourth residual block have the same structure, each consisting of a third convolutional layer and a fourth convolutional layer connected in sequence. The input of the third convolutional layer in the first residual block receives three inputs in parallel: all feature maps in YLR,1, all feature maps in YBlur,1 and all feature maps in YHR,1. For YLR,1, its output is 64 feature maps of width (αsW/2)×V and height (αsH/2)×U, whose set is denoted Y′LR,1; for YBlur,1 it is 64 feature maps of width αsW/2 and height αsH/2, denoted Y′Blur,1; and for YHR,1 it is 64 feature maps of width αsW/2 and height αsH/2, denoted Y′HR,1. The input of the fourth convolutional layer in the first residual block receives Y′LR,1, Y′Blur,1 and Y′HR,1 in parallel; its output is 64 feature maps at the same respective sizes for each input, denoted Y″LR,1, Y″Blur,1 and Y″HR,1. All feature maps in YLR,1 are added element-wise to all feature maps in Y″LR,1, and the resulting feature maps are the output of the first residual block for YLR,1; the set they form is YLR,2. Likewise, all feature maps in YBlur,1 are added element-wise to all feature maps in Y″Blur,1, giving the output of the first residual block for YBlur,1, i.e. YBlur,2, and all feature maps in YHR,1 are added element-wise to all feature maps in Y″HR,1, giving the output of the first residual block for YHR,1, i.e. YHR,2.

The input of the third convolutional layer in the second residual block receives three inputs in parallel: all feature maps in YLR,2, all feature maps in YBlur,2 and all feature maps in YHR,2. For YLR,2, its output is 64 feature maps of width (αsW/2)×V and height (αsH/2)×U, whose set is denoted Y′LR,2; for YBlur,2 it is 64 feature maps of width αsW/2 and height αsH/2, denoted Y′Blur,2; and for YHR,2 it is 64 feature maps of width αsW/2 and height αsH/2, denoted Y′HR,2. The input of the fourth convolutional layer in the second residual block receives Y′LR,2, Y′Blur,2 and Y′HR,2 in parallel; its output is 64 feature maps at the same respective sizes for each input, denoted Y″LR,2, Y″Blur,2 and Y″HR,2. All feature maps in YLR,2 are added element-wise to all feature maps in Y″LR,2, and the resulting feature maps are the output of the second residual block for YLR,2; the set they form is YLR,3. Likewise, all feature maps in YBlur,2 are added element-wise to all feature maps in Y″Blur,2, giving the output of the second residual block for YBlur,2, i.e. YBlur,3, and all feature maps in YHR,2 are added element-wise to all feature maps in Y″HR,2, giving the output of the second residual block for YHR,2, i.e. YHR,3.

The input of the third convolutional layer in the third residual block receives all feature maps in FEn,3; its output is 64 feature maps of width (αsW/2)×V and height (αsH/2)×U, whose set is denoted F′Dec,1. The input of the fourth convolutional layer in the third residual block receives all feature maps in F′Dec,1; its output is 64 feature maps of width (αsW/2)×V and height (αsH/2)×U, denoted F″Dec,1. All feature maps in FEn,3 are added element-wise to all feature maps in F″Dec,1, and the resulting feature maps are the output of the third residual block; the set they form is FDec,1.

The input of the third convolutional layer in the fourth residual block receives all feature maps in FDec,1; its output is 64 feature maps of width (αsW/2)×V and height (αsH/2)×U, whose set is denoted F′Dec,2. The input of the fourth convolutional layer in the fourth residual block receives all feature maps in F′Dec,2; its output is 64 feature maps of width (αsW/2)×V and height (αsH/2)×U, denoted F″Dec,2. All feature maps in FDec,1 are added element-wise to all feature maps in F″Dec,2, and the resulting feature maps are the output of the fourth residual block; the set they form is FDec,2.

In each of the first, second, third and fourth residual blocks, the third and fourth convolutional layers have 3×3 convolution kernels, stride 1, 64 input channels and 64 output channels; in each block, the third convolutional layer uses the "ReLU" activation function and the fourth convolutional layer uses no activation function.
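Collecting these parameters, the plain residual block used four times in the network can be sketched as follows; the class name is an assumption.

```python
import torch
import torch.nn as nn

class PlainResBlock(nn.Module):
    """3x3 conv (stride 1) + ReLU, 3x3 conv (stride 1, no activation),
    then an identity skip connection, all at 64 channels."""
    def __init__(self, ch=64):
        super().__init__()
        self.conv3 = nn.Conv2d(ch, ch, 3, stride=1, padding=1)  # "third" conv
        self.conv4 = nn.Conv2d(ch, ch, 3, stride=1, padding=1)  # "fourth" conv
    def forward(self, x):
        return x + self.conv4(torch.relu(self.conv3(x)))

x = torch.randn(1, 64, 100, 150)
assert PlainResBlock()(x).shape == x.shape  # the block preserves feature size
```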

In this embodiment, in Step 2, as shown in Figs. 3a, 3b, 3c and 3d, the first, second and third enhanced residual blocks have the same structure: each consists of a first spatial feature transformation layer, a first spatial-angular convolutional layer, a second spatial feature transformation layer, a second spatial-angular convolutional layer and a channel attention layer connected in sequence. The first and second spatial feature transformation layers have the same structure, each consisting of a tenth convolutional layer and an eleventh convolutional layer in parallel; the first and second spatial-angular convolutional layers have the same structure, each consisting of a twelfth convolutional layer and a thirteenth convolutional layer connected in sequence; the channel attention layer consists of a global average pooling layer, a fourteenth convolutional layer and a fifteenth convolutional layer connected in sequence.

The input of the tenth convolutional layer in the first spatial feature transformation layer of the first enhanced residual block receives all feature maps in FAlign,1, and its output is 64 feature maps with the same width and height as the feature maps in FLR; these form the scale maps of this layer. The input of the eleventh convolutional layer in the same spatial feature transformation layer also receives all feature maps in FAlign,1, and its output is 64 feature maps of the same width and height; these form the shift maps of this layer. The input of the first spatial feature transformation layer receives all feature maps in FLR; the feature maps in FLR are multiplied element-wise with the scale maps, and the shift maps are then added element-wise to the product. The resulting feature maps are the output of the first spatial feature transformation layer of the first enhanced residual block.
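To make the data flow concrete, the following is a minimal PyTorch sketch of such a spatial feature transformation layer; the class and variable names are illustrative and not part of the patent, and sub-aperture views are assumed to be stacked along the batch dimension.

import torch
import torch.nn as nn

class SpatialFeatureTransform(nn.Module):
    # Predicts per-pixel scale and shift maps from the aligned high-resolution
    # features (the tenth/eleventh convolutional layers: 3x3 kernel, stride 1,
    # 64->64 channels, no activation) and modulates the light field features.
    def __init__(self, channels: int = 64):
        super().__init__()
        self.scale_conv = nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1)
        self.shift_conv = nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1)

    def forward(self, lf_feat: torch.Tensor, aligned_feat: torch.Tensor) -> torch.Tensor:
        scale = self.scale_conv(aligned_feat)   # the "scale" maps
        shift = self.shift_conv(aligned_feat)   # the "shift" maps
        return lf_feat * scale + shift          # element-wise multiply, then add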

The input of the twelfth convolutional layer in the first spatial-angular convolutional layer of the first enhanced residual block receives all feature maps output by the first spatial feature transformation layer, and its output is 64 feature maps of the same width and height. These feature maps undergo a reorganization from the spatial dimension to the angular dimension (reorganization is a routine operation on light field images; it only changes the arrangement order of the feature values within each feature map, not the values themselves). The input of the thirteenth convolutional layer receives the reorganized feature maps, and its output is 64 feature maps of the same width and height, which are then reorganized back from the angular dimension to the spatial dimension. The feature maps obtained after this reorganization are the output of the first spatial-angular convolutional layer of the first enhanced residual block.
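A hedged sketch of the spatial-angular convolutional layer follows; the reshape-based reorganization between the spatial and angular dimensions is one common way to implement the operation described above, and an angular resolution of 5×5 is assumed to match the experiments.

import torch
import torch.nn as nn

class SpatialAngularConv(nn.Module):
    # The twelfth conv (3x3, ReLU) acts on the spatial dimensions; the features
    # are then reorganized so that the thirteenth conv (3x3, ReLU) acts on the
    # angular dimensions, after which they are reorganized back.
    def __init__(self, channels: int = 64, ang: int = 5):
        super().__init__()
        self.ang = ang
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, 1, 1), nn.ReLU(inplace=True))
        self.angular_conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, 1, 1), nn.ReLU(inplace=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        na, c, h, w = x.shape                   # na = batch * ang * ang views
        n = na // (self.ang * self.ang)
        x = self.spatial_conv(x)
        # spatial -> angular reorganization (only reorders feature values)
        x = x.view(n, self.ang, self.ang, c, h, w).permute(0, 4, 5, 3, 1, 2)
        x = x.reshape(n * h * w, c, self.ang, self.ang)
        x = self.angular_conv(x)
        # angular -> spatial reorganization
        x = x.view(n, h, w, c, self.ang, self.ang).permute(0, 4, 5, 3, 1, 2)
        return x.reshape(na, c, h, w)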

The tenth and eleventh convolutional layers in the second spatial feature transformation layer of the first enhanced residual block likewise receive all feature maps in FAlign,1 and each output 64 feature maps of the same width and height, forming the scale maps and shift maps of this layer, respectively. The input of the second spatial feature transformation layer receives all feature maps output by the first spatial-angular convolutional layer; these feature maps are multiplied element-wise with the scale maps, the shift maps are then added element-wise to the product, and the resulting feature maps are the output of the second spatial feature transformation layer of the first enhanced residual block.

The input of the twelfth convolutional layer in the second spatial-angular convolutional layer of the first enhanced residual block receives all feature maps output by the second spatial feature transformation layer, and its output is 64 feature maps of the same width and height. After the reorganization from the spatial dimension to the angular dimension, the thirteenth convolutional layer outputs 64 feature maps of the same width and height, which are reorganized back from the angular dimension to the spatial dimension. The feature maps obtained after this reorganization are the output of the second spatial-angular convolutional layer of the first enhanced residual block.

The input of the global average pooling layer in the channel attention layer of the first enhanced residual block receives all feature maps output by the second spatial-angular convolutional layer, and its output is 64 feature maps of the same width and height; denote this set as FGAP,1. All feature values within each feature map in FGAP,1 are identical (the global average pooling layer independently computes the global mean of each received feature map, collapsing it to a single value, which is then replicated to restore the spatial size). The input of the fourteenth convolutional layer receives all feature maps in FGAP,1, and its output is 4 feature maps of the same width and height; denote this set as FDS,1. The input of the fifteenth convolutional layer receives all feature maps in FDS,1, and its output is 64 feature maps of the same width and height; denote this set as FUS,1. All feature maps in FUS,1 are multiplied element-wise with the feature maps output by the second spatial-angular convolutional layer, and the resulting feature maps are the output of the channel attention layer of the first enhanced residual block; denote this set as FCA,1.
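A minimal sketch of the channel attention layer, assuming the layer sizes stated below (1×1 convolutions with a 64→4→64 bottleneck); tensor broadcasting stands in for the explicit value replication described above.

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    # Global average pooling, a 1x1 squeeze conv (64->4, ReLU) and a 1x1
    # excitation conv (4->64, Sigmoid), whose output reweights the input.
    def __init__(self, channels: int = 64, reduced: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)     # global mean of each feature map
        self.squeeze = nn.Conv2d(channels, reduced, kernel_size=1)   # "fourteenth" conv
        self.excite = nn.Conv2d(reduced, channels, kernel_size=1)    # "fifteenth" conv
        self.relu = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.relu(self.squeeze(self.pool(x)))
        w = self.sigmoid(self.excite(w))
        return x * w    # element-wise reweighting, broadcast over height and width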

All feature maps in FCA,1 are added element-wise to the feature maps in FLR, and the resulting feature maps form the output of the first enhanced residual block; this set is FEn,1.
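Composing the three sketches above gives one possible reading of an entire enhanced residual block and of the chaining between blocks; this is illustrative, not the patent's reference implementation, and it reuses the SpatialFeatureTransform, SpatialAngularConv and ChannelAttention classes sketched earlier.

import torch.nn as nn

class EnhancedResidualBlock(nn.Module):
    # First SFT -> first spatial-angular conv -> second SFT ->
    # second spatial-angular conv -> channel attention -> identity addition.
    def __init__(self, channels: int = 64, ang: int = 5):
        super().__init__()
        self.sft1 = SpatialFeatureTransform(channels)
        self.sac1 = SpatialAngularConv(channels, ang)
        self.sft2 = SpatialFeatureTransform(channels)
        self.sac2 = SpatialAngularConv(channels, ang)
        self.ca = ChannelAttention(channels)

    def forward(self, lf_feat, aligned_feat):
        out = self.sft1(lf_feat, aligned_feat)
        out = self.sac1(out)
        out = self.sft2(out, aligned_feat)
        out = self.sac2(out)
        out = self.ca(out)
        return out + lf_feat    # residual (element-wise) addition

In this composition, the chaining described in the text corresponds to FEn,1 = block1(FLR, FAlign,1), FEn,2 = block2(FEn,1, FAlign,2) and FEn,3 = block3(FEn,2, FAlign,3).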

The second enhanced residual block processes its inputs in the same manner. The tenth and eleventh convolutional layers in its first spatial feature transformation layer receive all feature maps in FAlign,2 and each output 64 feature maps of the same width and height, forming the scale maps and shift maps of this layer. The input of this first spatial feature transformation layer receives all feature maps in FEn,1; these are multiplied element-wise with the scale maps, the shift maps are then added element-wise to the product, and the resulting feature maps are the output of the first spatial feature transformation layer of the second enhanced residual block.

The input of the twelfth convolutional layer in the first spatial-angular convolutional layer of the second enhanced residual block receives all feature maps output by the preceding spatial feature transformation layer, and its output is 64 feature maps of the same width and height. After the reorganization from the spatial dimension to the angular dimension, the thirteenth convolutional layer outputs 64 feature maps of the same width and height, which are reorganized back from the angular dimension to the spatial dimension to give the output of the first spatial-angular convolutional layer of the second enhanced residual block.

The tenth and eleventh convolutional layers in the second spatial feature transformation layer of the second enhanced residual block likewise receive all feature maps in FAlign,2 and each output 64 feature maps of the same width and height, forming the scale maps and shift maps of this layer. The input of this second spatial feature transformation layer receives all feature maps output by the first spatial-angular convolutional layer of the block; these are multiplied element-wise with the scale maps, the shift maps are then added element-wise to the product, and the resulting feature maps are the output of the second spatial feature transformation layer of the second enhanced residual block.

The input of the twelfth convolutional layer in the second spatial-angular convolutional layer of the second enhanced residual block receives all feature maps output by the second spatial feature transformation layer, and its output is 64 feature maps of the same width and height. After the reorganization from the spatial dimension to the angular dimension, the thirteenth convolutional layer outputs 64 feature maps of the same width and height, which are reorganized back from the angular dimension to the spatial dimension to give the output of the second spatial-angular convolutional layer of the second enhanced residual block.

The input of the global average pooling layer in the channel attention layer of the second enhanced residual block receives all feature maps output by the second spatial-angular convolutional layer, and its output is 64 feature maps of the same width and height; denote this set as FGAP,2, where all feature values within each feature map are identical. The fourteenth convolutional layer receives all feature maps in FGAP,2 and outputs 4 feature maps of the same width and height, denoted FDS,2; the fifteenth convolutional layer receives all feature maps in FDS,2 and outputs 64 feature maps of the same width and height, denoted FUS,2. All feature maps in FUS,2 are multiplied element-wise with the feature maps output by the second spatial-angular convolutional layer, and the resulting feature maps are the output of the channel attention layer of the second enhanced residual block; denote this set as FCA,2.

All feature maps in FCA,2 are added element-wise to the feature maps in FEn,1, and the resulting feature maps form the output of the second enhanced residual block; this set is FEn,2.

The third enhanced residual block follows the same pattern. The tenth and eleventh convolutional layers in its first spatial feature transformation layer receive all feature maps in FAlign,3 and each output 64 feature maps of the same width and height, forming the scale maps and shift maps of this layer. The input of this first spatial feature transformation layer receives all feature maps in FEn,2; these are multiplied element-wise with the scale maps, the shift maps are then added element-wise to the product, and the resulting feature maps are the output of the first spatial feature transformation layer of the third enhanced residual block.

The input of the twelfth convolutional layer in the first spatial-angular convolutional layer of the third enhanced residual block receives all feature maps output by the preceding spatial feature transformation layer, and its output is 64 feature maps of the same width and height. After the reorganization from the spatial dimension to the angular dimension, the thirteenth convolutional layer outputs 64 feature maps of the same width and height, which are reorganized back from the angular dimension to the spatial dimension to give the output of the first spatial-angular convolutional layer of the third enhanced residual block.

The tenth and eleventh convolutional layers in the second spatial feature transformation layer of the third enhanced residual block likewise receive all feature maps in FAlign,3 and each output 64 feature maps of the same width and height, forming the scale maps and shift maps of this layer. The input of this second spatial feature transformation layer receives all feature maps output by the first spatial-angular convolutional layer of the block; these are multiplied element-wise with the scale maps, the shift maps are then added element-wise to the product, and the resulting feature maps are the output of the second spatial feature transformation layer of the third enhanced residual block.

The input of the twelfth convolutional layer in the second spatial-angular convolutional layer of the third enhanced residual block receives all feature maps output by the second spatial feature transformation layer, and its output is 64 feature maps of the same width and height. After the reorganization from the spatial dimension to the angular dimension, the thirteenth convolutional layer outputs 64 feature maps of the same width and height, which are reorganized back from the angular dimension to the spatial dimension to give the output of the second spatial-angular convolutional layer of the third enhanced residual block.

The input of the global average pooling layer in the channel attention layer of the third enhanced residual block receives all feature maps output by the second spatial-angular convolutional layer, and its output is 64 feature maps of the same width and height; denote this set as FGAP,3, where all feature values within each feature map are identical. The fourteenth convolutional layer receives all feature maps in FGAP,3 and outputs 4 feature maps of the same width and height, denoted FDS,3; the fifteenth convolutional layer receives all feature maps in FDS,3 and outputs 64 feature maps of the same width and height, denoted FUS,3. All feature maps in FUS,3 are multiplied element-wise with the feature maps output by the second spatial-angular convolutional layer, and the resulting feature maps are the output of the channel attention layer of the third enhanced residual block; denote this set as FCA,3.

All feature maps in FCA,3 are added element-wise to the feature maps in FEn,2, and the resulting feature maps form the output of the third enhanced residual block; this set is FEn,3.

As described above, in each of the first, second and third enhanced residual blocks, the tenth and eleventh convolutional layers have 3×3 kernels, a stride of 1, 64 input channels and 64 output channels, and use no activation function; the twelfth and thirteenth convolutional layers have 3×3 kernels, a stride of 1, 64 input channels and 64 output channels, and use the ReLU activation function; the fourteenth convolutional layer has a 1×1 kernel, a stride of 1, 64 input channels and 4 output channels, and uses the ReLU activation function; and the fifteenth convolutional layer has a 1×1 kernel, a stride of 1, 4 input channels and 64 output channels, and uses the Sigmoid activation function.

To further demonstrate the feasibility and effectiveness of the proposed method, experiments were conducted.

The proposed method is implemented with the PyTorch deep learning framework. The light field images used for training and testing are taken from existing light field image databases covering both real-world and synthetic scenes; these databases are freely available online. To ensure reliable and robust evaluation, 200 light field images were randomly selected to form the training image set and another 70 to form the test image set, with no overlap between the two. Basic information about the databases used is given in Table 1. The EPFL [1], INRIA [2], STFLytro [6] and Kalantari et al. [7] databases were captured with a Lytro light field camera, so their images constitute narrow-baseline light field data. The STFGantry [5] database was captured by moving a conventional camera mounted on a gantry, so its images have a larger baseline range and constitute wide-baseline light field data. The light field images in the HCI new [3] and HCI old [4] databases are synthetic and also belong to wide-baseline light field data.

Table 1. Basic information about the light field image databases used for the training and test image sets


The references (or download URLs) for the light field image databases used for the training and test image sets are as follows:

[1] Rerabek M, Ebrahimi T. New Light Field Image Dataset[C]//2016 Eighth International Conference on Quality of Multimedia Experience (QoMEX), 2016.

[2] Pendu M L, Jiang X, Guillemot C. Light Field Inpainting Propagation via Low Rank Matrix Completion[J]. IEEE Transactions on Image Processing, 2018, 27(4): 1981-1993.

[3] Honauer K, Johannsen O, Kondermann D, et al. A Dataset and Evaluation Methodology for Depth Estimation on 4D Light Fields[C]//Asian Conference on Computer Vision, 2016.

[4] Wanner S, Meister S, Goldluecke B. Datasets and Benchmarks for Densely Sampled 4D Light Fields[C]//International Symposium on Vision, Modeling and Visualization, 2013.

[5] Vaish V, Adams A. The (New) Stanford Light Field Archive, Computer Graphics Laboratory, Stanford University, 2008.

[6] Raj A S, Lowney M, Shah R, Wetzstein G. Stanford Lytro Light Field Archive. Available: http://lightfields.stanford.edu/index.html.

[7] Kalantari N K, Wang T C, Ramamoorthi R. Learning-Based View Synthesis for Light Field Cameras[J]. ACM Transactions on Graphics, 2016, 35(6): 1-10.

The light field images in the training and test image sets are each reorganized into sub-aperture image arrays. Because light field cameras exhibit a vignetting effect (manifested as low visual quality in the border sub-aperture images), the angular resolution of the light field images used for training and testing is cropped to 9×9, i.e., only the high-quality central 9×9 views are kept. From each resulting 9×9 light field image, the central 5×5 views are then extracted to form a light field image with an angular resolution of 5×5, whose spatial resolution is downsampled by bicubic interpolation with a scale factor of 8 (i.e., to 1/8 of the spatial resolution of the original light field image), yielding a low-spatial-resolution light field image. The original light field image with an angular resolution of 5×5 serves as the reference high-spatial-resolution light field image (i.e., the label image). One sub-aperture image is then selected from the original 9×9 views (excluding the central 5×5 views) and kept at its original resolution to serve as the 2D high-resolution image. The final training set thus consists of the Y-channel sub-aperture image arrays of 200 low-spatial-resolution light field images with an angular resolution of 5×5, the Y-channel images of the corresponding 200 2D high-resolution images, and the Y-channel images of the corresponding 200 reference high-spatial-resolution light field images. The final test set consists of the Y-channel sub-aperture image arrays of 70 low-spatial-resolution light field images with an angular resolution of 5×5, the Y-channel images of the corresponding 70 2D high-resolution images, and the corresponding 70 reference high-spatial-resolution light field images, where the 70 reference images are not involved in network inference or testing and are used only for the subsequent subjective visual comparison and objective quality evaluation.
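The view cropping and downsampling described above can be sketched as follows; the function name, the tensor layout and the choice of border view for the 2D high-resolution image are assumptions, and F.interpolate in bicubic mode stands in for the bicubic downsampling.

import torch
import torch.nn.functional as F

def prepare_sample(lf_y: torch.Tensor, scale: int = 8):
    # lf_y: (9, 9, H, W) Y-channel sub-aperture array, already cropped to the
    # high-quality central 9x9 views.
    label = lf_y[2:7, 2:7]                           # central 5x5 views = HR label
    a, b, h, w = label.shape
    views = label.reshape(a * b, 1, h, w).float()
    low = F.interpolate(views, scale_factor=1.0 / scale,
                        mode='bicubic', align_corners=False)
    low = low.reshape(a, b, h // scale, w // scale)  # low-spatial-resolution LF
    hr_2d = lf_y[0, 0]                               # one view outside the central
    return low, hr_2d, label                         # 5x5, kept at full resolution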

When training the constructed spatial super-resolution network, the parameters of all convolution kernels are initialized with the MSRA initializer; the loss function is a combination of a pixel-domain L1-norm loss and a gradient loss; and the network is trained with the ADAM optimizer. The encoder and decoder of the spatial super-resolution network are first trained with a learning rate of 10⁻⁴ until they roughly converge, after which the entire spatial super-resolution network is trained, again with a learning rate of 10⁻⁴; after 25 epochs of training, the learning rate is decayed by a factor of 0.5.
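A hedged sketch of this training procedure follows; the exact form of the gradient loss is not specified in the text, so an L1 loss on horizontal and vertical image gradients is assumed here, and net, loader and num_epochs are placeholders.

import torch

def gradient_l1(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # L1 loss on horizontal/vertical image gradients (assumed form)
    dx = lambda t: t[..., :, 1:] - t[..., :, :-1]
    dy = lambda t: t[..., 1:, :] - t[..., :-1, :]
    return ((dx(pred) - dx(target)).abs().mean()
            + (dy(pred) - dy(target)).abs().mean())

for m in net.modules():                       # MSRA (Kaiming) initialization
    if isinstance(m, torch.nn.Conv2d):
        torch.nn.init.kaiming_normal_(m.weight)

optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=25, gamma=0.5)
for epoch in range(num_epochs):
    for lr_lf, hr_img, label in loader:
        pred = net(lr_lf, hr_img)             # reconstructed HR light field
        loss = (pred - label).abs().mean() + gradient_l1(pred, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()                          # x0.5 decay after 25 epochs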

To demonstrate the performance of the proposed method, it is compared with the existing bicubic interpolation method and six existing image super-resolution reconstruction methods: the deep back-projection network method of Haris et al., the deep Laplacian pyramid network method of Lai et al., the spatial-angular separable convolution method of Yeung et al., the spatial-angular interaction network method of Wang et al., the two-stage network method of Jin et al., and the hybrid-input method of Boominathan et al. Among these, the methods of Haris et al. and Lai et al. are 2D image super-resolution methods (applied independently to each sub-aperture image of the light field image); the methods of Yeung et al., Wang et al. and Jin et al. are ordinary light field image spatial super-resolution methods; and the method of Boominathan et al. is a light field image spatial super-resolution method using hybrid input.

The objective quality metrics used here are PSNR (Peak Signal-to-Noise Ratio), SSIM (Structural Similarity Index), and an advanced objective quality metric for light field images (see Min X, Zhou J, Zhai G, et al. A Metric for Light Field Reconstruction, Compression, and Display Quality Evaluation[J]. IEEE Transactions on Image Processing, 2020, 29: 3790-3804). PSNR evaluates the objective quality of the super-resolved image in terms of pixel reconstruction error, with higher values indicating better quality. SSIM evaluates the objective quality from the perspective of visual perception; its value lies between 0 and 1, with higher values indicating better quality. The light field image quality metric jointly measures the spatial quality (texture and detail) and angular quality (parallax structure) of the light field image, again with higher values indicating better quality.
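For reference, PSNR and SSIM can be computed per sub-aperture image and averaged over the light field, e.g. as in the sketch below (function names are illustrative; scikit-image provides the SSIM implementation used here).

import numpy as np
from skimage.metrics import structural_similarity

def psnr(ref: np.ndarray, rec: np.ndarray, peak: float = 1.0) -> float:
    # pixel-reconstruction-error metric; higher is better
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def lf_psnr_ssim(ref_views, rec_views):
    # average PSNR/SSIM over all sub-aperture images of the light field
    p = [psnr(r, x) for r, x in zip(ref_views, rec_views)]
    s = [structural_similarity(r, x, data_range=1.0)
         for r, x in zip(ref_views, rec_views)]
    return float(np.mean(p)), float(np.mean(s))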

Table 2 compares the proposed method with the existing bicubic interpolation method and the existing light field image spatial super-resolution methods in terms of PSNR (dB); Table 3 gives the comparison in terms of SSIM; and Table 4 gives the comparison in terms of the light field image objective quality metric. As the objective results in Tables 2, 3 and 4 show, compared with the existing light field image spatial super-resolution methods (including the 2D image super-resolution methods), the proposed method achieves higher scores on all three objective quality metrics, clearly surpassing all compared methods; this indicates that the proposed method can effectively reconstruct the texture and detail information of light field images while recovering a good parallax structure. In particular, the proposed method achieves the best super-resolution results on light field databases with different baseline ranges and scene contents, which shows that it handles both narrow-baseline and wide-baseline light field data well and is robust to scene content.

Table 2. Comparison of the proposed method with the existing bicubic interpolation method and the existing light field image spatial super-resolution methods in terms of PSNR (dB)


Table 3. Comparison of the proposed method with the existing bicubic interpolation method and the existing light field image spatial super-resolution methods in terms of SSIM


Table 4. Comparison of the proposed method with the existing bicubic interpolation method and the existing light field image spatial super-resolution methods in terms of the light field image objective quality metric


Figs. 5a to 5h show the reconstructed high-spatial-resolution light field images obtained by processing a low-spatial-resolution light field image from the tested EPFL light field image database with, respectively, the bicubic interpolation method, the method of Haris et al., the method of Lai et al., the method of Yeung et al., the method of Wang et al., the method of Jin et al., the method of Boominathan et al., and the proposed method; Fig. 5i shows the corresponding label high-spatial-resolution light field image. In each case, the sub-aperture image at the central coordinates is shown.

Figs. 6a to 6h show the reconstructed high-spatial-resolution light field images obtained by processing a low-spatial-resolution light field image from the tested STFLytro light field image database with, respectively, the bicubic interpolation method, the method of Haris et al., the method of Lai et al., the method of Yeung et al., the method of Wang et al., the method of Jin et al., the method of Boominathan et al., and the proposed method; Fig. 6i shows the corresponding label high-spatial-resolution light field image. In each case, the sub-aperture image at the central coordinates is shown.

Comparing Figs. 5a to 5h with Fig. 5i, and Figs. 6a to 6h with Fig. 6i, it can be clearly seen that the existing light field image spatial super-resolution reconstruction methods, including the 2D image super-resolution reconstruction methods, cannot recover the texture and detail information of the image in the reconstructed high-spatial-resolution light field images, as shown in the enlarged lower-left rectangular regions of Figs. 5a to 5f and the enlarged lower-right rectangular regions of Figs. 6a to 6f. The light field image spatial super-resolution reconstruction method using hybrid input achieves relatively better results, but its reconstructions still contain some blurring artifacts, as shown in the enlarged lower-left rectangular region of Fig. 5g and the enlarged lower-right rectangular region of Fig. 6g. In contrast, the high-spatial-resolution light field images reconstructed by the method of the present invention have clear textures and rich details, and are close to the label high-spatial-resolution light field images (i.e., Figs. 5i and 6i) in subjective visual perception, which shows that the method of the present invention can effectively recover the texture information of light field images. In addition, by reconstructing every sub-aperture image with high quality, the method of the present invention well preserves the parallax structure of the finally reconstructed high-spatial-resolution light field image.

The innovations of the method of the present invention are mainly as follows. First, heterogeneous imaging is used to acquire rich 2D spatial information while capturing high-dimensional light field data, i.e., a light field image and a 2D high-resolution image are captured simultaneously, so that the information of the 2D high-resolution image can be exploited to effectively improve the spatial resolution of the light field image and recover the corresponding textures and details. Second, to establish and explore the connection between the light field image and the 2D high-resolution image, the method of the present invention constructs an aperture-level feature registration module and a light field feature enhancement module: the former accurately registers the 2D high-resolution information with the 4D light field image information, and the latter then uses the registered high-resolution feature information to consistently enhance the visual information in the light field features, yielding enhanced high-resolution light field features. Third, a flexible pyramid reconstruction scheme is adopted, i.e., a coarse-to-fine reconstruction strategy progressively improves the spatial resolution of the light field image and recovers an accurate parallax structure, so that multi-scale super-resolution results can be reconstructed in a single forward inference. In addition, to reduce the parameter count and training burden of the pyramid network, weights are shared at every pyramid level.
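To make the weight-shared, coarse-to-fine pyramid concrete, the following is a minimal sketch, not the patented implementation: the names TinySRNet and pyramid_infer are illustrative stand-ins, and the tiny network only mimics the interface of the full spatial super-resolution network. It shows how one network, reused with the same parameters at every pyramid level, yields multi-scale results in a single forward pass.

```python
import torch
import torch.nn as nn

class TinySRNet(nn.Module):
    """Illustrative stand-in for the spatial super-resolution network."""
    def __init__(self, scale=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),  # upscale by alpha_s at one level
        )

    def forward(self, x):
        return self.body(x)

def pyramid_infer(sr_net, lf_y, levels=3):
    # Coarse-to-fine: every level reuses the SAME weights, so three
    # levels give an overall scale of alpha_s ** 3 in one inference.
    outputs, cur = [], lf_y
    for _ in range(levels):
        cur = sr_net(cur)      # alpha_s x reconstruction at this level
        outputs.append(cur)    # multi-scale results from one forward pass
    return outputs

sr_net = TinySRNet(scale=2)
lf_y = torch.rand(1, 1, 64, 64)  # toy Y-channel sub-aperture array
print([tuple(r.shape) for r in pyramid_infer(sr_net, lf_y)])  # 2x, 4x, 8x
```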

Claims (3)

1. A light field image space super-resolution reconstruction method is characterized by comprising the following steps:
step 1: selecting Num color three-channel low-spatial-resolution light field images with spatial resolution $W\times H$ and angular resolution $V\times U$, the corresponding Num color three-channel 2D high-resolution images with resolution $\alpha W\times\alpha H$, and the corresponding Num color three-channel reference high-spatial-resolution light field images with spatial resolution $\alpha W\times\alpha H$ and angular resolution $V\times U$; wherein Num > 1, $\alpha$ denotes the spatial-resolution improvement multiple, and $\alpha$ > 1;
step 2: constructing a convolutional neural network as the spatial super-resolution network: the spatial super-resolution network comprises an encoder for extracting multi-scale features, an aperture-level feature registration module for registering the light field features and the 2D high-resolution features, a shallow feature extraction layer for extracting shallow features from the low-spatial-resolution light field image, a light field feature enhancement module for fusing the light field features and the 2D high-resolution features, a spatial attention block for relieving registration errors in the coarse-scale features, and a decoder for reconstructing the latent features into the light field image;
for the encoder, it is composed of a first convolution layer, a second convolution layer, a first residual block and a second residual block which are connected in sequence, wherein the input end of the first convolution layer receives three inputs in parallel: the single-channel image $L^{LR}$ of a low-spatial-resolution light field image with spatial resolution $W\times H$ and angular resolution $V\times U$, recombined after spatial-resolution up-sampling into a sub-aperture image array of width $\alpha_s W\times V$ and height $\alpha_s H\times U$, denoted $L^{LR\uparrow}$; the single-channel image, of width $\alpha_s W$ and height $\alpha_s H$, of the blurred 2D high-resolution image, denoted $\tilde{I}^{HR}$; and the single-channel image, of width $\alpha_s W$ and height $\alpha_s H$, of the 2D high-resolution image, denoted $I^{HR}$; for $L^{LR\uparrow}$, the output end of the first convolution layer outputs 64 feature maps of width $\alpha_s W\times V$ and height $\alpha_s H\times U$, and the set of all feature maps output for $L^{LR\uparrow}$ is denoted $Y^{LR,0}$; for $\tilde{I}^{HR}$, it outputs 64 feature maps of width $\alpha_s W$ and height $\alpha_s H$, denoted $\tilde{Y}^{HR,0}$; for $I^{HR}$, it outputs 64 feature maps of width $\alpha_s W$ and height $\alpha_s H$, denoted $Y^{HR,0}$; the input end of the second convolution layer receives three inputs in parallel, namely all feature maps in $Y^{LR,0}$, all feature maps in $\tilde{Y}^{HR,0}$ and all feature maps in $Y^{HR,0}$; for $Y^{LR,0}$, the output end of the second convolution layer outputs 64 feature maps of width $\frac{\alpha_s W}{2}\times V$ and height $\frac{\alpha_s H}{2}\times U$, denoted $Y^{LR,1}$; for $\tilde{Y}^{HR,0}$, it outputs 64 feature maps of width $\frac{\alpha_s W}{2}$ and height $\frac{\alpha_s H}{2}$, denoted $\tilde{Y}^{HR,1}$; for $Y^{HR,0}$, it outputs 64 feature maps of width $\frac{\alpha_s W}{2}$ and height $\frac{\alpha_s H}{2}$, denoted $Y^{HR,1}$; the input end of the first residual block receives three inputs in parallel, namely all feature maps in $Y^{LR,1}$, all feature maps in $\tilde{Y}^{HR,1}$ and all feature maps in $Y^{HR,1}$; for $Y^{LR,1}$, the output end of the first residual block outputs 64 feature maps of width $\frac{\alpha_s W}{2}\times V$ and height $\frac{\alpha_s H}{2}\times U$, denoted $Y^{LR,2}$; for $\tilde{Y}^{HR,1}$, it outputs 64 feature maps of width $\frac{\alpha_s W}{2}$ and height $\frac{\alpha_s H}{2}$, denoted $\tilde{Y}^{HR,2}$; for $Y^{HR,1}$, it outputs 64 feature maps of width $\frac{\alpha_s W}{2}$ and height $\frac{\alpha_s H}{2}$, denoted $Y^{HR,2}$; the input end of the second residual block receives three inputs in parallel, namely all feature maps in $Y^{LR,2}$, all feature maps in $\tilde{Y}^{HR,2}$ and all feature maps in $Y^{HR,2}$; for $Y^{LR,2}$, the output end of the second residual block outputs 64 feature maps of width $\frac{\alpha_s W}{2}\times V$ and height $\frac{\alpha_s H}{2}\times U$, denoted $Y^{LR,3}$; for $\tilde{Y}^{HR,2}$, it outputs 64 feature maps of width $\frac{\alpha_s W}{2}$ and height $\frac{\alpha_s H}{2}$, denoted $\tilde{Y}^{HR,3}$; for $Y^{HR,2}$, it outputs 64 feature maps of width $\frac{\alpha_s W}{2}$ and height $\frac{\alpha_s H}{2}$, denoted $Y^{HR,3}$; wherein $L^{LR\uparrow}$ is the sub-aperture image array of width $\alpha_s W\times V$ and height $\alpha_s H\times U$ recombined from the image obtained by bicubic-interpolation up-sampling of $L^{LR}$, $\tilde{I}^{HR}$ is obtained by first bicubic-interpolation down-sampling and then bicubic-interpolation up-sampling $I^{HR}$, $\alpha_s$ denotes the spatial-resolution sampling factor with $\alpha_s^3=\alpha$, and the up-sampling factor of the bicubic-interpolation up-sampling and the down-sampling factor of the bicubic-interpolation down-sampling both take the value $\alpha_s$; the size of the convolution kernel of the first convolution layer is $3\times 3$ with convolution step 1, 1 input channel and 64 output channels; the size of the convolution kernel of the second convolution layer is $3\times 3$ with convolution step 2, 64 input channels and 64 output channels; and the activation functions adopted by the first convolution layer and the second convolution layer are both "ReLU";
for the aperture-level feature registration module, its input end receives three types of feature maps: the first type is all feature maps in $Y^{LR,3}$, the second type is all feature maps in $\tilde{Y}^{HR,3}$, and the third type comprises four inputs, namely all feature maps in $Y^{HR,0}$, all feature maps in $Y^{HR,1}$, all feature maps in $Y^{HR,2}$ and all feature maps in $Y^{HR,3}$; in the aperture-level feature registration module, first, all feature maps in $\tilde{Y}^{HR,3}$, $Y^{HR,0}$, $Y^{HR,1}$, $Y^{HR,2}$ and $Y^{HR,3}$ are each replicated by a factor of $V\times U$, so that the width of all feature maps in $\tilde{Y}^{HR,3}$, $Y^{HR,1}$, $Y^{HR,2}$ and $Y^{HR,3}$ becomes $\frac{\alpha_s W}{2}\times V$ and their height becomes $\frac{\alpha_s H}{2}\times U$, i.e. their sizes match those of the feature maps in $Y^{LR,3}$, while the width of all feature maps in $Y^{HR,0}$ becomes $\alpha_s W\times V$ and their height becomes $\alpha_s H\times U$, i.e. their sizes match those of the feature maps in $Y^{LR,0}$; then block matching is performed between all feature maps in $Y^{LR,3}$ and all feature maps in $\tilde{Y}^{HR,3}$, and after block matching one coordinate index map of width $\frac{\alpha_s W}{2}\times V$ and height $\frac{\alpha_s H}{2}\times U$ is obtained and denoted $P^{CI}$; then, according to $P^{CI}$, all feature maps in $Y^{HR,1}$ are registered in spatial position with all feature maps in $Y^{LR,1}$ to obtain 64 registration feature maps of width $\frac{\alpha_s W}{2}\times V$ and height $\frac{\alpha_s H}{2}\times U$, and the set of all obtained registration feature maps is denoted $F^{Align,1}$; likewise, according to $P^{CI}$, all feature maps in $Y^{HR,2}$ are registered in spatial position with all feature maps in $Y^{LR,2}$ to obtain 64 registration feature maps of width $\frac{\alpha_s W}{2}\times V$ and height $\frac{\alpha_s H}{2}\times U$, denoted $F^{Align,2}$; according to $P^{CI}$, all feature maps in $Y^{HR,3}$ are registered in spatial position with all feature maps in $Y^{LR,3}$ to obtain 64 registration feature maps of width $\frac{\alpha_s W}{2}\times V$ and height $\frac{\alpha_s H}{2}\times U$, denoted $F^{Align,3}$; next, bicubic-interpolation up-sampling is performed on $P^{CI}$ to obtain one coordinate index map of width $\alpha_s W\times V$ and height $\alpha_s H\times U$, denoted $P^{CI\uparrow}$; finally, according to $P^{CI\uparrow}$, all feature maps in $Y^{HR,0}$ are registered in spatial position with all feature maps in $Y^{LR,0}$ to obtain 64 registration feature maps of width $\alpha_s W\times V$ and height $\alpha_s H\times U$, denoted $F^{Align,0}$; the output of the aperture-level feature registration module is all feature maps in $F^{Align,0}$, all feature maps in $F^{Align,1}$, all feature maps in $F^{Align,2}$ and all feature maps in $F^{Align,3}$; wherein the accuracy measure used for block matching is the texture-and-structure similarity index, the block size used for block matching is $3\times 3$, and the up-sampling factor of the bicubic-interpolation up-sampling is $\alpha_s$;
For the shallow feature extraction layer, it is composed of one fifth convolution layer, whose input end receives the single-channel image $L^{LR}$ of a low-spatial-resolution light field image with spatial resolution $W\times H$ and angular resolution $V\times U$, recombined into a sub-aperture image array of width $W\times V$ and height $H\times U$; the output end of the fifth convolution layer outputs 64 feature maps of width $W\times V$ and height $H\times U$, and the set of all output feature maps is denoted $F^{LR}$; the size of the convolution kernel of the fifth convolution layer is $3\times 3$ with convolution step 1, 1 input channel and 64 output channels, and the activation function adopted by the fifth convolution layer is "ReLU";
for the light field feature enhancement module, it is composed of a first enhanced residual block, a second enhanced residual block and a third enhanced residual block which are connected in sequence, wherein the input end of the first enhanced residual block receives all feature maps in $F^{Align,1}$ and all feature maps in $F^{LR}$, and its output end outputs 64 feature maps of width $\frac{\alpha_s W}{2}\times V$ and height $\frac{\alpha_s H}{2}\times U$, the set of all output feature maps being denoted $F^{En,1}$; the input end of the second enhanced residual block receives all feature maps in $F^{Align,2}$ and all feature maps in $F^{En,1}$, and its output end outputs 64 feature maps of width $\frac{\alpha_s W}{2}\times V$ and height $\frac{\alpha_s H}{2}\times U$, denoted $F^{En,2}$; the input end of the third enhanced residual block receives all feature maps in $F^{Align,3}$ and all feature maps in $F^{En,2}$, and its output end outputs 64 feature maps of width $\frac{\alpha_s W}{2}\times V$ and height $\frac{\alpha_s H}{2}\times U$, denoted $F^{En,3}$;
For the spatial attention block, it is composed of a sixth convolution layer and a seventh convolution layer which are connected in sequence, wherein the input end of the sixth convolution layer receives all feature maps in $F^{Align,0}$, and its output end outputs 64 spatial attention feature maps of width $\alpha_s W\times V$ and height $\alpha_s H\times U$, the set of all output spatial attention feature maps being denoted $F^{SA1}$; the input end of the seventh convolution layer receives all spatial attention feature maps in $F^{SA1}$, and its output end outputs 64 spatial attention feature maps of width $\alpha_s W\times V$ and height $\alpha_s H\times U$, denoted $F^{SA2}$; all feature maps in $F^{Align,0}$ are multiplied element by element with all spatial attention feature maps in $F^{SA2}$, and the set formed by all obtained feature maps is denoted $F^{WA,0}$; all feature maps in $F^{WA,0}$ serve as all feature maps output by the output end of the spatial attention block; the sizes of the convolution kernels of the sixth convolution layer and the seventh convolution layer are both $3\times 3$, the convolution steps are both 1, the numbers of input channels and output channels are both 64, the activation function adopted by the sixth convolution layer is "ReLU", and the activation function adopted by the seventh convolution layer is "Sigmoid";
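A minimal sketch of this block, under assumed PyTorch names (SpatialAttention is illustrative, not the patented code), is:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.conv6 = nn.Conv2d(ch, ch, 3, padding=1)  # "sixth" conv, ReLU
        self.conv7 = nn.Conv2d(ch, ch, 3, padding=1)  # "seventh" conv, Sigmoid

    def forward(self, f_align0):
        # Per-pixel mask that down-weights badly registered positions.
        mask = torch.sigmoid(self.conv7(torch.relu(self.conv6(f_align0))))
        return f_align0 * mask  # F_WA,0

sa = SpatialAttention()
f_align0 = torch.rand(1, 64, 40, 40)  # toy registered full-scale features
print(sa(f_align0).shape)
```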
for the decoder, it is composed of a third residual block, a fourth residual block, a sub-pixel convolution layer, an eighth convolution layer and a ninth convolution layer which are connected in sequence, wherein the input end of the third residual block receives all feature maps in $F^{En,3}$, and its output end outputs 64 feature maps of width $\frac{\alpha_s W}{2}\times V$ and height $\frac{\alpha_s H}{2}\times U$, the set of all output feature maps being denoted $F^{Dec,1}$; the input end of the fourth residual block receives all feature maps in $F^{Dec,1}$, and its output end outputs 64 feature maps of width $\frac{\alpha_s W}{2}\times V$ and height $\frac{\alpha_s H}{2}\times U$, denoted $F^{Dec,2}$; the input end of the sub-pixel convolution layer receives all feature maps in $F^{Dec,2}$, and its output end outputs 256 feature maps of width $\frac{\alpha_s W}{2}\times V$ and height $\frac{\alpha_s H}{2}\times U$; the 256 feature maps of width $\frac{\alpha_s W}{2}\times V$ and height $\frac{\alpha_s H}{2}\times U$ are further converted into 64 feature maps of width $\alpha_s W\times V$ and height $\alpha_s H\times U$, and the set of all converted feature maps is denoted $F^{Dec,3}$; the input end of the eighth convolution layer receives the result of element-by-element addition of all feature maps in $F^{Dec,3}$ and all feature maps in $F^{WA,0}$, and its output end outputs 64 feature maps of width $\alpha_s W\times V$ and height $\alpha_s H\times U$, denoted $F^{Dec,4}$; the input end of the ninth convolution layer receives all feature maps in $F^{Dec,4}$, and its output end outputs one reconstructed single-channel light field image of width $\alpha_s W\times V$ and height $\alpha_s H\times U$, which is recombined into a high-spatial-resolution single-channel light field image with spatial resolution $\alpha_s W\times\alpha_s H$ and angular resolution $V\times U$, denoted $L^{SR}$; the size of the convolution kernel of the sub-pixel convolution layer is $3\times 3$ with convolution step 1, 64 input channels and 256 output channels; the size of the convolution kernel of the eighth convolution layer is $3\times 3$ with convolution step 1, 64 input channels and 64 output channels; the size of the convolution kernel of the ninth convolution layer is $1\times 1$ with convolution step 1, 64 input channels and 1 output channel; the activation functions adopted by the sub-pixel convolution layer and the eighth convolution layer are both "ReLU", and the ninth convolution layer does not adopt an activation function;
step 3: performing color space conversion on each low-spatial-resolution light field image in the training set, its corresponding 2D high-resolution image and its corresponding reference high-spatial-resolution light field image, namely converting them from the RGB color space to the YCbCr color space, and extracting the Y-channel images; recombining the Y-channel image of each low-spatial-resolution light field image into a sub-aperture image array of width $W\times V$ and height $H\times U$ for representation; then forming the training set from the sub-aperture image arrays recombined from the Y-channel images of all low-spatial-resolution light field images, the Y-channel images of the corresponding 2D high-resolution images, and the Y-channel images of the corresponding reference high-spatial-resolution light field images; and then constructing a pyramid network and training it with the training set, the specific process being as follows:
step 3_1: copying the constructed spatial super-resolution network three times and cascading the copies, with the weights of each spatial super-resolution network shared, i.e. all parameters identical; the whole network formed by the three spatial super-resolution networks is defined as the pyramid network; at each pyramid level, the reconstruction scale of the spatial super-resolution network is set equal to the value of $\alpha_s$;
step 3_2: down-sampling the spatial resolution of the Y-channel image of each reference high-spatial-resolution light field image in the training set twice, and taking the images obtained after down-sampling as the label images; down-sampling the spatial resolution of the Y-channel image of each 2D high-resolution image in the training set twice in the same way, and taking the images obtained after down-sampling as the 2D high-resolution Y-channel images for the first spatial super-resolution network in the pyramid network; then inputting, into the first spatial super-resolution network of the constructed pyramid network for training, the sub-aperture image arrays recombined from the Y-channel images of all low-spatial-resolution light field images in the training set, the sub-aperture image arrays recombined from the images obtained by one spatial-resolution up-sampling of those Y-channel images, all 2D high-resolution Y-channel images for the first spatial super-resolution network, and the blurred 2D high-resolution Y-channel images obtained by one spatial-resolution down-sampling and one spatial-resolution up-sampling of the 2D high-resolution Y-channel images for the first spatial super-resolution network, thereby obtaining the $\alpha_s\times$ reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of each low-spatial-resolution light field image in the training set; the spatial-resolution up-sampling and the spatial-resolution down-sampling are both performed by bicubic interpolation, and their scales are equal to the value of $\alpha_s$;
step 3_3: down-sampling the spatial resolution of the Y-channel image of each reference high-spatial-resolution light field image in the training set once, and taking the images obtained after down-sampling as the label images; down-sampling the spatial resolution of the Y-channel image of each 2D high-resolution image in the training set once in the same way, and taking the images obtained after down-sampling as the 2D high-resolution Y-channel images for the second spatial super-resolution network in the pyramid network; then inputting, into the second spatial super-resolution network of the constructed pyramid network for training, the sub-aperture image arrays recombined from the $\alpha_s\times$ reconstructed high-spatial-resolution Y-channel light field images corresponding to the Y-channel images of all low-spatial-resolution light field images in the training set, the sub-aperture image arrays recombined from the images obtained by one spatial-resolution up-sampling of those $\alpha_s\times$ reconstructed Y-channel light field images, all 2D high-resolution Y-channel images for the second spatial super-resolution network, and the blurred 2D high-resolution Y-channel images obtained by one spatial-resolution down-sampling and one spatial-resolution up-sampling of the 2D high-resolution Y-channel images for the second spatial super-resolution network, thereby obtaining the $\alpha_s^2\times$ reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of each low-spatial-resolution light field image in the training set; the spatial-resolution up-sampling and the spatial-resolution down-sampling are both performed by bicubic interpolation, and their scales are equal to the value of $\alpha_s$;
step 3_4: taking the Y-channel image of each reference high-spatial-resolution light field image in the training set as the label image; taking the Y-channel image of each 2D high-resolution image in the training set as the 2D high-resolution Y-channel image for the third spatial super-resolution network in the pyramid network; then inputting, into the third spatial super-resolution network of the constructed pyramid network for training, the sub-aperture image arrays recombined from the $\alpha_s^2\times$ reconstructed high-spatial-resolution Y-channel light field images corresponding to the Y-channel images of all low-spatial-resolution light field images in the training set, the sub-aperture image arrays recombined from the images obtained by one spatial-resolution up-sampling of those $\alpha_s^2\times$ reconstructed Y-channel light field images, all 2D high-resolution Y-channel images for the third spatial super-resolution network, and the blurred 2D high-resolution Y-channel images obtained by one spatial-resolution down-sampling and one spatial-resolution up-sampling of the 2D high-resolution Y-channel images for the third spatial super-resolution network, thereby obtaining the $\alpha_s^3\times$ reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of each low-spatial-resolution light field image in the training set; the spatial-resolution up-sampling and the spatial-resolution down-sampling are both performed by bicubic interpolation, and their scales are equal to the value of $\alpha_s$;
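The level-wise data preparation of steps 3_2 to 3_4 can be sketched as follows (illustrative tensors only; $\alpha_s=2$ is assumed here): the labels for level k are the reference Y-channel images bicubically down-sampled 3-k times, the 2D high-resolution guide is down-sampled to the same scale, and its blurred copy is one extra bicubic down/up round trip.

```python
import torch
import torch.nn.functional as F

def down(x, s=2):
    return F.interpolate(x, scale_factor=1 / s, mode='bicubic',
                         align_corners=False)

def up(x, s=2):
    return F.interpolate(x, scale_factor=s, mode='bicubic',
                         align_corners=False)

ref_y = torch.rand(1, 1, 256, 256)  # reference high-resolution Y channel
hr_y = torch.rand(1, 1, 256, 256)   # 2D high-resolution Y channel

for level in range(3):              # pyramid levels 1, 2, 3
    times = 2 - level               # step 3_2: twice, 3_3: once, 3_4: none
    label, guide = ref_y, hr_y
    for _ in range(times):
        label, guide = down(label), down(guide)
    blurred_guide = up(down(guide))  # one down- then one up-sampling
    print(level + 1, label.shape[-1], guide.shape[-1],
          blurred_guide.shape[-1])
```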
after training is finished, the optimal weight parameters of all convolution kernels in each spatial super-resolution network of the pyramid network are obtained, giving a well-trained spatial super-resolution network model;
step 4: randomly selecting a color three-channel low-spatial-resolution light field image and its corresponding color three-channel 2D high-resolution image as the test images; then converting them from the RGB color space to the YCbCr color space and extracting the Y-channel images; recombining the Y-channel image of the low-spatial-resolution light field image into a sub-aperture image array for representation; inputting the Y-channel image of the low-spatial-resolution light field image, the Y-channel image of the 2D high-resolution image, and the blurred 2D high-resolution Y-channel image obtained by one spatial-resolution down-sampling and one spatial-resolution up-sampling of the Y-channel image of the 2D high-resolution image into the trained spatial super-resolution network model, and obtaining through testing the reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of the low-spatial-resolution light field image; then performing bicubic-interpolation up-sampling on the Cb-channel image and the Cr-channel image of the low-spatial-resolution light field image respectively, to obtain the reconstructed high-spatial-resolution Cb-channel light field image and the reconstructed high-spatial-resolution Cr-channel light field image corresponding to the Cb-channel image and the Cr-channel image of the low-spatial-resolution light field image; and finally concatenating the reconstructed high-spatial-resolution Y-channel, Cb-channel and Cr-channel light field images along the color-channel dimension, and converting the concatenation result back to the RGB color space to obtain the reconstructed color three-channel high-spatial-resolution light field image corresponding to the low-spatial-resolution light field image.
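This test procedure can be sketched as follows, assuming a trained model with a simple image-in, image-out interface (super_resolve and the stand-in model are illustrative, not the patented code): the Y channel is super-resolved by the network, the Cb and Cr channels by bicubic interpolation, and the three channels are merged and converted back to RGB.

```python
import numpy as np
import torch
import torch.nn.functional as F
from PIL import Image

def super_resolve(model, rgb, scale=8):
    y, cb, cr = Image.fromarray(rgb).convert('YCbCr').split()
    y_t = torch.from_numpy(np.asarray(y, np.float32) / 255.0)[None, None]
    with torch.no_grad():
        y_sr = model(y_t).clamp(0, 1)        # network output for Y channel
    def bicubic(ch):                          # Cb/Cr: plain bicubic upscale
        t = torch.from_numpy(np.asarray(ch, np.float32))[None, None]
        return F.interpolate(t, scale_factor=scale, mode='bicubic',
                             align_corners=False)[0, 0].clamp(0, 255)
    ycbcr = np.stack([(y_sr[0, 0] * 255).numpy(),
                      bicubic(cb).numpy(), bicubic(cr).numpy()], axis=-1)
    return np.asarray(Image.fromarray(ycbcr.astype(np.uint8),
                                      mode='YCbCr').convert('RGB'))

model = torch.nn.Upsample(scale_factor=8, mode='bicubic')  # stand-in net
rgb = (np.random.rand(32, 32, 3) * 255).astype(np.uint8)
print(super_resolve(model, rgb).shape)                     # (256, 256, 3)
```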
2. The light field image space super-resolution reconstruction method according to claim 1, characterized in that in step 2, the first residual block, the second residual block, the third residual block and the fourth residual block have the same structure, each being composed of a third convolution layer and a fourth convolution layer which are connected in sequence; the input end of the third convolution layer in the first residual block receives three inputs in parallel, namely all feature maps in $Y^{LR,1}$, all feature maps in $\tilde{Y}^{HR,1}$ and all feature maps in $Y^{HR,1}$; for $Y^{LR,1}$, the output end of the third convolution layer in the first residual block outputs 64 feature maps of width $\frac{\alpha_s W}{2}\times V$ and height $\frac{\alpha_s H}{2}\times U$, and the set of all feature maps output for $Y^{LR,1}$ is denoted $M^{LR,2}$; for $\tilde{Y}^{HR,1}$, it outputs 64 feature maps of width $\frac{\alpha_s W}{2}$ and height $\frac{\alpha_s H}{2}$, denoted $\tilde{M}^{HR,2}$; for $Y^{HR,1}$, it outputs 64 feature maps of width $\frac{\alpha_s W}{2}$ and height $\frac{\alpha_s H}{2}$, denoted $M^{HR,2}$; the input end of the fourth convolution layer in the first residual block receives three inputs in parallel, namely all feature maps in $M^{LR,2}$, all feature maps in $\tilde{M}^{HR,2}$ and all feature maps in $M^{HR,2}$; for $M^{LR,2}$, the output end of the fourth convolution layer in the first residual block outputs 64 feature maps of width $\frac{\alpha_s W}{2}\times V$ and height $\frac{\alpha_s H}{2}\times U$, denoted $R^{LR,2}$; for $\tilde{M}^{HR,2}$, it outputs 64 feature maps of width $\frac{\alpha_s W}{2}$ and height $\frac{\alpha_s H}{2}$, denoted $\tilde{R}^{HR,2}$; for $M^{HR,2}$, it outputs 64 feature maps of width $\frac{\alpha_s W}{2}$ and height $\frac{\alpha_s H}{2}$, denoted $R^{HR,2}$; all feature maps in $Y^{LR,1}$ and all feature maps in $R^{LR,2}$ are added element by element, and all obtained feature maps serve as all feature maps output by the output end of the first residual block for $Y^{LR,1}$, the set they form being $Y^{LR,2}$; all feature maps in $\tilde{Y}^{HR,1}$ and all feature maps in $\tilde{R}^{HR,2}$ are added element by element, and all obtained feature maps serve as all feature maps output by the output end of the first residual block for $\tilde{Y}^{HR,1}$, the set they form being $\tilde{Y}^{HR,2}$; all feature maps in $Y^{HR,1}$ and all feature maps in $R^{HR,2}$ are added element by element, and all obtained feature maps serve as all feature maps output by the output end of the first residual block for $Y^{HR,1}$, the set they form being $Y^{HR,2}$.
The input end of the third convolution layer in the second residual block receives three inputs in parallel, namely all feature maps in $Y^{LR,2}$, all feature maps in $\tilde{Y}^{HR,2}$ and all feature maps in $Y^{HR,2}$; for $Y^{LR,2}$, the output end of the third convolution layer in the second residual block outputs 64 feature maps of width $\frac{\alpha_s W}{2}\times V$ and height $\frac{\alpha_s H}{2}\times U$, and the set of all feature maps output for $Y^{LR,2}$ is denoted $M^{LR,3}$; for $\tilde{Y}^{HR,2}$, it outputs 64 feature maps of width $\frac{\alpha_s W}{2}$ and height $\frac{\alpha_s H}{2}$, denoted $\tilde{M}^{HR,3}$; for $Y^{HR,2}$, it outputs 64 feature maps of width $\frac{\alpha_s W}{2}$ and height $\frac{\alpha_s H}{2}$, denoted $M^{HR,3}$; the input end of the fourth convolution layer in the second residual block receives three inputs in parallel, namely all feature maps in $M^{LR,3}$, all feature maps in $\tilde{M}^{HR,3}$ and all feature maps in $M^{HR,3}$; for $M^{LR,3}$, the output end of the fourth convolution layer in the second residual block outputs 64 feature maps of width $\frac{\alpha_s W}{2}\times V$ and height $\frac{\alpha_s H}{2}\times U$, denoted $R^{LR,3}$; for $\tilde{M}^{HR,3}$, it outputs 64 feature maps of width $\frac{\alpha_s W}{2}$ and height $\frac{\alpha_s H}{2}$, denoted $\tilde{R}^{HR,3}$; for $M^{HR,3}$, it outputs 64 feature maps of width $\frac{\alpha_s W}{2}$ and height $\frac{\alpha_s H}{2}$, denoted $R^{HR,3}$; all feature maps in $Y^{LR,2}$ and all feature maps in $R^{LR,3}$ are added element by element, and all obtained feature maps serve as all feature maps output by the output end of the second residual block for $Y^{LR,2}$, the set they form being $Y^{LR,3}$; all feature maps in $\tilde{Y}^{HR,2}$ and all feature maps in $\tilde{R}^{HR,3}$ are added element by element, and all obtained feature maps serve as all feature maps output by the output end of the second residual block for $\tilde{Y}^{HR,2}$, the set they form being $\tilde{Y}^{HR,3}$; all feature maps in $Y^{HR,2}$ and all feature maps in $R^{HR,3}$ are added element by element, and all obtained feature maps serve as all feature maps output by the output end of the second residual block for $Y^{HR,2}$, the set they form being $Y^{HR,3}$.
The input end of the third convolution layer in the third residual block receives all feature maps in $F^{En,3}$; the output end of the third convolution layer in the third residual block outputs 64 feature maps of width $\frac{\alpha_s W}{2}\times V$ and height $\frac{\alpha_s H}{2}\times U$, and the set of all output feature maps is denoted $M^{Dec,1}$; the input end of the fourth convolution layer in the third residual block receives all feature maps in $M^{Dec,1}$; the output end of the fourth convolution layer in the third residual block outputs 64 feature maps of width $\frac{\alpha_s W}{2}\times V$ and height $\frac{\alpha_s H}{2}\times U$, denoted $R^{Dec,1}$; all feature maps in $F^{En,3}$ and all feature maps in $R^{Dec,1}$ are added element by element, and all obtained feature maps serve as all feature maps output by the output end of the third residual block, the set they form being $F^{Dec,1}$.
The input end of the third convolution layer in the fourth residual block receives all feature maps in $F^{Dec,1}$; the output end of the third convolution layer in the fourth residual block outputs 64 feature maps of width $\frac{\alpha_s W}{2}\times V$ and height $\frac{\alpha_s H}{2}\times U$, and the set of all output feature maps is denoted $M^{Dec,2}$; the input end of the fourth convolution layer in the fourth residual block receives all feature maps in $M^{Dec,2}$; the output end of the fourth convolution layer in the fourth residual block outputs 64 feature maps of width $\frac{\alpha_s W}{2}\times V$ and height $\frac{\alpha_s H}{2}\times U$, denoted $R^{Dec,2}$; all feature maps in $F^{Dec,1}$ and all feature maps in $R^{Dec,2}$ are added element by element, and all obtained feature maps serve as all feature maps output by the output end of the fourth residual block, the set they form being $F^{Dec,2}$.
In the above, the sizes of the convolution kernels of the third convolution layer and the fourth convolution layer in each of the first residual block, the second residual block, the third residual block and the fourth residual block are all $3\times 3$, the convolution steps are all 1, the numbers of input channels and output channels are all 64, the activation function adopted by the third convolution layer in each residual block is "ReLU", and the fourth convolution layer in each residual block does not adopt an activation function.
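A minimal sketch of this residual block, under assumed PyTorch names (ResBlock and the toy tensors are illustrative), shows the ReLU "third" convolution, the linear "fourth" convolution, the element-wise skip connection, and the one-weight-set, three-parallel-inputs usage of this claim.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.conv3 = nn.Conv2d(ch, ch, 3, padding=1)  # third conv layer
        self.conv4 = nn.Conv2d(ch, ch, 3, padding=1)  # fourth conv layer

    def forward(self, *branches):
        # One weight set applied to several parallel inputs; each output
        # is the branch input plus the conv-path result (skip connection).
        return tuple(x + self.conv4(torch.relu(self.conv3(x)))
                     for x in branches)

rb = ResBlock()
y_lr1 = torch.rand(1, 64, 100, 100)  # light field branch (toy sizes)
y_bl1 = torch.rand(1, 64, 20, 20)    # blurred 2D HR branch
y_hr1 = torch.rand(1, 64, 20, 20)    # 2D HR branch
y_lr2, y_bl2, y_hr2 = rb(y_lr1, y_bl1, y_hr1)
print(y_lr2.shape, y_bl2.shape, y_hr2.shape)
```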
3. The light field image spatial super-resolution reconstruction method according to claim 1 or 2, characterized in that in step 2, the first enhanced residual block, the second enhanced residual block and the third enhanced residual block have the same structure, each consisting of a first spatial feature transform layer, a first spatial-angular convolution layer, a second spatial feature transform layer, a second spatial-angular convolution layer and a channel attention layer which are connected in sequence, wherein the first spatial feature transform layer and the second spatial feature transform layer have the same structure, each composed of a tenth convolution layer and an eleventh convolution layer in parallel; the first spatial-angular convolution layer and the second spatial-angular convolution layer have the same structure, each composed of a twelfth convolution layer and a thirteenth convolution layer; and the channel attention layer is composed of a global mean pooling layer, a fourteenth convolution layer and a fifteenth convolution layer which are connected in sequence;
the input end of the tenth convolution layer in the first spatial feature transform layer in the first enhanced residual block receives all feature maps in $F^{Align,1}$, and its output end outputs 64 feature maps of width $\frac{\alpha_s W}{2}\times V$ and height $\frac{\alpha_s H}{2}\times U$, the set of all output feature maps being denoted $F^{\gamma}_{1,1}$; the input end of the eleventh convolution layer in the first spatial feature transform layer in the first enhanced residual block receives all feature maps in $F^{Align,1}$, and its output end outputs 64 feature maps of width $\frac{\alpha_s W}{2}\times V$ and height $\frac{\alpha_s H}{2}\times U$, denoted $F^{\beta}_{1,1}$; the input end of the first spatial feature transform layer in the first enhanced residual block receives all feature maps in $F^{LR}$; all feature maps in $F^{LR}$ are multiplied element by element with all feature maps in $F^{\gamma}_{1,1}$, the multiplication result is added element by element to all feature maps in $F^{\beta}_{1,1}$, and all obtained feature maps serve as all feature maps output by the output end of the first spatial feature transform layer in the first enhanced residual block, the set they form being denoted $F^{SFT}_{1,1}$; the input end of the twelfth convolution layer of the first spatial-angular convolution layer in the first enhanced residual block receives all feature maps in $F^{SFT}_{1,1}$, and its output end outputs 64 feature maps of width $\frac{\alpha_s W}{2}\times V$ and height $\frac{\alpha_s H}{2}\times U$, denoted $F^{S}_{1,1}$; a reorganization operation from the spatial dimension to the angular dimension is performed on all feature maps in $F^{S}_{1,1}$; the input end of the thirteenth convolution layer of the first spatial-angular convolution layer in the first enhanced residual block receives the result of the reorganization operation on all feature maps in $F^{S}_{1,1}$, and its output end outputs 64 feature maps of width $\frac{\alpha_s W}{2}\times V$ and height $\frac{\alpha_s H}{2}\times U$, denoted $F^{A}_{1,1}$; a reorganization operation from the angular dimension to the spatial dimension is performed on all feature maps in $F^{A}_{1,1}$, and all feature maps obtained after the reorganization operation serve as all feature maps output by the output end of the first spatial-angular convolution layer in the first enhanced residual block, the set they form being denoted $F^{SA}_{1,1}$; the input end of the tenth convolution layer in the second spatial feature transform layer in the first enhanced residual block receives all feature maps in $F^{Align,1}$, and its output end outputs 64 feature maps of width $\frac{\alpha_s W}{2}\times V$ and height $\frac{\alpha_s H}{2}\times U$, denoted $F^{\gamma}_{1,2}$; the input end of the eleventh convolution layer in the second spatial feature transform layer in the first enhanced residual block receives all feature maps in $F^{Align,1}$, and its output end outputs 64 feature maps of width $\frac{\alpha_s W}{2}\times V$ and height $\frac{\alpha_s H}{2}\times U$, denoted $F^{\beta}_{1,2}$; the input end of the second spatial feature transform layer in the first enhanced residual block receives all feature maps in $F^{SA}_{1,1}$; all feature maps in $F^{SA}_{1,1}$ are multiplied element by element with all feature maps in $F^{\gamma}_{1,2}$, the multiplication result is added element by element to all feature maps in $F^{\beta}_{1,2}$, and all obtained feature maps serve as all feature maps output by the output end of the second spatial feature transform layer in the first enhanced residual block, the set they form being denoted $F^{SFT}_{1,2}$; the input end of the twelfth convolution layer of the second spatial-angular convolution layer in the first enhanced residual block receives all feature maps in $F^{SFT}_{1,2}$, and its output end outputs 64 feature maps of width $\frac{\alpha_s W}{2}\times V$ and height $\frac{\alpha_s H}{2}\times U$, denoted $F^{S}_{1,2}$; a reorganization operation from the spatial dimension to the angular dimension is performed on all feature maps in $F^{S}_{1,2}$; the input end of the thirteenth convolution layer of the second spatial-angular convolution layer in the first enhanced residual block receives the result of the reorganization operation on all feature maps in $F^{S}_{1,2}$, and its output end outputs 64 feature maps of width $\frac{\alpha_s W}{2}\times V$ and height $\frac{\alpha_s H}{2}\times U$, denoted $F^{A}_{1,2}$; a reorganization operation from the angular dimension to the spatial dimension is performed on all feature maps in $F^{A}_{1,2}$, and all feature maps obtained after the reorganization operation serve as all feature maps output by the output end of the second spatial-angular convolution layer in the first enhanced residual block, the set they form being denoted $F^{SA}_{1,2}$; the input end of the global mean pooling layer in the channel attention layer in the first enhanced residual block receives all feature maps in $F^{SA}_{1,2}$, and its output end outputs 64 feature maps of width $\frac{\alpha_s W}{2}\times V$ and height $\frac{\alpha_s H}{2}\times U$, the set of all output feature maps being denoted $F^{GAP,1}$, where all feature values in each feature map in $F^{GAP,1}$ are the same; the input end of the fourteenth convolution layer in the channel attention layer in the first enhanced residual block receives all feature maps in $F^{GAP,1}$, and its output end outputs 4 feature maps of width $\frac{\alpha_s W}{2}\times V$ and height $\frac{\alpha_s H}{2}\times U$, denoted $F^{DS,1}$; the input end of the fifteenth convolution layer in the channel attention layer in the first enhanced residual block receives all feature maps in $F^{DS,1}$, and its output end outputs 64 feature maps of width $\frac{\alpha_s W}{2}\times V$ and height $\frac{\alpha_s H}{2}\times U$, denoted $F^{US,1}$; all feature maps in $F^{US,1}$ are multiplied element by element with all feature maps in $F^{SA}_{1,2}$, and all obtained feature maps serve as all feature maps output by the output end of the channel attention layer in the first enhanced residual block, the set they form being denoted $F^{CA,1}$; all feature maps in $F^{CA,1}$ and all feature maps in $F^{LR}$ are added element by element, and all obtained feature maps serve as all feature maps output by the output end of the first enhanced residual block, the set they form being $F^{En,1}$.
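A minimal sketch of one enhanced residual block under assumed names (EnhancedResBlock and its members are illustrative, and the per-view GAP and the (B, C, U, V, H, W)-style reshapes are one plausible reading of the spatial-to-angular reorganization, not the patented code): two SFT modulations driven by the registered HR features, each followed by a spatial-angular convolution pair, then a squeeze-and-excitation style channel attention, with a skip connection around the whole block.

```python
import torch
import torch.nn as nn

class EnhancedResBlock(nn.Module):
    def __init__(self, ch=64, ratio=16):
        super().__init__()
        self.gamma = nn.ModuleList(nn.Conv2d(ch, ch, 3, padding=1) for _ in range(2))
        self.beta = nn.ModuleList(nn.Conv2d(ch, ch, 3, padding=1) for _ in range(2))
        self.spa = nn.ModuleList(nn.Conv2d(ch, ch, 3, padding=1) for _ in range(2))
        self.ang = nn.ModuleList(nn.Conv2d(ch, ch, 3, padding=1) for _ in range(2))
        self.ca_down = nn.Conv2d(ch, ch // ratio, 1)  # "fourteenth" conv, 64 -> 4
        self.ca_up = nn.Conv2d(ch // ratio, ch, 1)    # "fifteenth" conv, 4 -> 64

    def sa_conv(self, f, u, v, i):
        # f: (B*U*V, C, H, W). Spatial conv, regroup spatial -> angular,
        # angular conv, then regroup angular -> spatial.
        b = f.shape[0] // (u * v); c, h, w = f.shape[1:]
        f = torch.relu(self.spa[i](f))
        f = f.view(b, u * v, c, h * w).permute(0, 3, 2, 1).reshape(b * h * w, c, u, v)
        f = torch.relu(self.ang[i](f))
        return f.view(b, h * w, c, u * v).permute(0, 3, 2, 1).reshape(b * u * v, c, h, w)

    def forward(self, f_in, f_align, u, v):
        f = f_in
        for i in range(2):  # SFT modulation, then spatial-angular conv
            f = f * self.gamma[i](f_align) + self.beta[i](f_align)
            f = self.sa_conv(f, u, v, i)
        w_ca = torch.sigmoid(self.ca_up(torch.relu(
            self.ca_down(f.mean(dim=(2, 3), keepdim=True)))))  # channel attention
        return f_in + f * w_ca  # residual output F_En

erb = EnhancedResBlock()
u = v = 5
f_lr = torch.rand(u * v, 64, 16, 16)     # per-view shallow LF features
f_align = torch.rand(u * v, 64, 16, 16)  # per-view registered HR features
print(erb(f_lr, f_align, u, v).shape)
```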
The input terminal of the tenth convolutional layer in the first spatial feature transform layer in the second enhanced residual block receives FAlign,2The output end of the tenth convolutional layer in the first spatial feature transform layer in the second enhanced residual block outputs 64 width maps
Figure FDA0003372237690000141
And has a height of
Figure FDA0003372237690000142
The feature map of (1) represents a set of all feature maps outputted
Figure FDA0003372237690000143
An input of an eleventh convolutional layer in the first spatial feature transform layer in the second enhanced residual block receives FAlign,2The output end of the eleventh convolutional layer in the first spatial feature transform layer in the second enhanced residual block outputs 64 width maps
Figure FDA0003372237690000144
And has a height of
Figure FDA0003372237690000145
The feature map of (1) represents a set of all feature maps outputted
Figure FDA0003372237690000146
Receiving F at receiving end of first spatial feature transform layer in second enhanced residual blockEn,1All feature maps in (1), will FEn,1All the characteristic diagrams in (1) and
Figure FDA0003372237690000147
multiplying all the characteristic graphs element by element, and comparing the multiplication result with the result
Figure FDA0003372237690000148
The obtained feature maps are used as all feature maps output by the output end of the first spatial feature transform layer in the second enhanced residual block, and the set formed by the feature maps is recorded as a set
Figure FDA0003372237690000149
An input of a twelfth of the first spatial angle convolutional layers in the second enhanced residual block receives
Figure FDA00033722376900001410
All feature maps in (1), first spatial angle volume in second enhancement residual blockThe output end of the twelfth convolution layer of the lamination outputs 64 widths
Figure FDA00033722376900001411
And has a height of
Figure FDA00033722376900001412
The feature map of (1) represents a set of all feature maps outputted
Figure FDA00033722376900001413
To pair
Figure FDA00033722376900001414
Performs a recombination operation from a spatial dimension to an angular dimension, an input of a thirteenth of the first spatial-angular convolutional layers of the second enhancement residual block receiving
Figure FDA00033722376900001415
The output end of the thirteenth convolutional layer of the first space angle convolutional layer in the second enhanced residual block outputs 64 width values
Figure FDA00033722376900001416
And has a height of
Figure FDA00033722376900001417
The feature map of (1) represents a set of all feature maps outputted
Figure FDA00033722376900001418
To pair
Figure FDA00033722376900001419
Performing an operation of reconstructing all feature maps from an angle dimension to a space dimension, and outputting all feature maps obtained after the operation of reconstructing as output ends of the first space angle convolution layer in the second enhanced residual blockAll feature maps are referred to as a set of feature maps
Figure FDA00033722376900001420
An input of a tenth convolutional layer in a second spatial feature transform layer in the second enhanced residual block receives FAlign,2The output end of the tenth convolutional layer in the second spatial feature transform layer in the second enhanced residual block outputs 64 width maps
Figure FDA00033722376900001421
And has a height of
Figure FDA00033722376900001422
The feature map of (1) represents a set of all feature maps outputted
Figure FDA0003372237690000151
An input of an eleventh convolutional layer in a second spatial feature transform layer in the second enhanced residual block receives FAlign,2The output end of the eleventh convolutional layer in the second spatial feature transform layer in the second enhanced residual block outputs 64 width maps
Figure FDA0003372237690000152
And has a height of
Figure FDA0003372237690000153
The feature map of (1) represents a set of all feature maps outputted
Figure FDA0003372237690000154
Receiving end of second spatial feature transform layer in second enhanced residual block
Figure FDA0003372237690000155
All the characteristic diagrams in (1) will
Figure FDA0003372237690000156
All the characteristic diagrams in (1) and
Figure FDA0003372237690000157
multiplying all the characteristic graphs element by element, and comparing the multiplication result with the result
Figure FDA0003372237690000158
The obtained feature maps are used as all feature maps output by the output end of the second spatial feature conversion layer in the second enhanced residual block, and the set formed by the feature maps is recorded as a set
Figure FDA0003372237690000159
The input of the twelfth convolutional layer in the second spatial-angle convolutional layer in the second enhanced residual block receives all feature maps in FSFT2,2; the output end of the twelfth convolutional layer in the second spatial-angle convolutional layer in the second enhanced residual block outputs 64 feature maps with the same width and height as its input, and the set formed by all output feature maps is denoted FSpa2,2; a recombination operation from the spatial dimension to the angular dimension is performed on all feature maps in FSpa2,2; the input of the thirteenth convolutional layer in the second spatial-angle convolutional layer in the second enhanced residual block receives all feature maps obtained by the recombination operation; the output end of the thirteenth convolutional layer in the second spatial-angle convolutional layer in the second enhanced residual block outputs 64 feature maps with the same width and height as its input, and the set formed by all output feature maps is denoted FAng2,2; a recombination operation from the angular dimension back to the spatial dimension is performed on all feature maps in FAng2,2, and all feature maps obtained after this recombination operation are taken as all feature maps output by the output end of the second spatial-angle convolutional layer in the second enhanced residual block; the set formed by these feature maps is denoted FSAC2,2.
The input of the global mean pooling layer in the channel attention layer in the second enhanced residual block receives all feature maps in FSAC2,2; the output end of the global mean pooling layer in the channel attention layer in the second enhanced residual block outputs 64 feature maps with the same width and height as its input, and the set formed by all output feature maps is denoted FGAP,2; all feature values in each feature map in FGAP,2 are the same. The input of the fourteenth convolutional layer in the channel attention layer in the second enhanced residual block receives all feature maps in FGAP,2; the output end of the fourteenth convolutional layer in the channel attention layer in the second enhanced residual block outputs 4 feature maps with the same width and height as its input, and the set formed by all output feature maps is denoted FDS,2. The input of the fifteenth convolutional layer in the channel attention layer in the second enhanced residual block receives all feature maps in FDS,2; the output end of the fifteenth convolutional layer in the channel attention layer in the second enhanced residual block outputs 64 feature maps with the same width and height as its input, and the set formed by all output feature maps is denoted FUS,2. All feature maps in FUS,2 are multiplied element by element with all feature maps in FSAC2,2, and all feature maps thus obtained are taken as all feature maps output by the output end of the channel attention layer in the second enhanced residual block; the set formed by these feature maps is denoted FCA,2.
All feature maps in FCA,2 are added element by element to all feature maps in FEn,1, and all feature maps thus obtained are taken as all feature maps output by the output end of the second enhanced residual block; the set formed by these feature maps is FEn,2.
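This channel attention layer follows the familiar squeeze-and-excitation pattern: pool each channel to its mean, compress 64 channels to 4 with a ReLU, expand back to 64 with a Sigmoid, and rescale the input channels. A minimal PyTorch sketch under that reading (illustrative names; channel widths taken from the text):

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Global mean pooling -> 1x1 conv (64->4, ReLU) -> 1x1 conv (4->64, Sigmoid) -> rescale."""
    def __init__(self, channels=64, reduced=4):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)                     # F_GAP: one mean per channel
        self.down = nn.Conv2d(channels, reduced, 1, stride=1)  # "fourteenth" conv, 64 -> 4
        self.up = nn.Conv2d(reduced, channels, 1, stride=1)    # "fifteenth" conv, 4 -> 64
        self.relu = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        w = self.sigmoid(self.up(self.relu(self.down(self.gap(x)))))
        # broadcasting w over H and W reproduces the constant-valued F_GAP expansion
        return x * w
```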
The input of the tenth convolutional layer in the first spatial feature transform layer in the third enhanced residual block receives all feature maps in FAlign,3; the output end of the tenth convolutional layer in the first spatial feature transform layer in the third enhanced residual block outputs 64 feature maps with the same width and height as its input, and the set formed by all output feature maps (the scale maps) is denoted FScale1,3; the input of the eleventh convolutional layer in the first spatial feature transform layer in the third enhanced residual block receives all feature maps in FAlign,3; the output end of the eleventh convolutional layer in the first spatial feature transform layer in the third enhanced residual block outputs 64 feature maps with the same width and height as its input, and the set formed by all output feature maps (the shift maps) is denoted FShift1,3; the receiving end of the first spatial feature transform layer in the third enhanced residual block receives all feature maps in FEn,2; all feature maps in FEn,2 are multiplied element by element with all feature maps in FScale1,3, the multiplication result is added element by element to all feature maps in FShift1,3, and all feature maps thus obtained are taken as all feature maps output by the output end of the first spatial feature transform layer in the third enhanced residual block; the set formed by these feature maps is denoted FSFT1,3.
The input of the twelfth convolutional layer in the first spatial-angle convolutional layer in the third enhanced residual block receives all feature maps in FSFT1,3; the output end of the twelfth convolutional layer in the first spatial-angle convolutional layer in the third enhanced residual block outputs 64 feature maps with the same width and height as its input, and the set formed by all output feature maps is denoted FSpa1,3; a recombination operation from the spatial dimension to the angular dimension is performed on all feature maps in FSpa1,3; the input of the thirteenth convolutional layer in the first spatial-angle convolutional layer in the third enhanced residual block receives all feature maps obtained by the recombination operation; the output end of the thirteenth convolutional layer in the first spatial-angle convolutional layer in the third enhanced residual block outputs 64 feature maps with the same width and height as its input, and the set formed by all output feature maps is denoted FAng1,3; a recombination operation from the angular dimension back to the spatial dimension is performed on all feature maps in FAng1,3, and all feature maps obtained after this recombination operation are taken as all feature maps output by the output end of the first spatial-angle convolutional layer in the third enhanced residual block; the set formed by these feature maps is denoted FSAC1,3.
The input of the tenth convolutional layer in the second spatial feature transform layer in the third enhanced residual block receives all feature maps in FAlign,3; the output end of the tenth convolutional layer in the second spatial feature transform layer in the third enhanced residual block outputs 64 feature maps with the same width and height as its input, and the set formed by all output feature maps (the scale maps) is denoted FScale2,3; the input of the eleventh convolutional layer in the second spatial feature transform layer in the third enhanced residual block receives all feature maps in FAlign,3; the output end of the eleventh convolutional layer in the second spatial feature transform layer in the third enhanced residual block outputs 64 feature maps with the same width and height as its input, and the set formed by all output feature maps (the shift maps) is denoted FShift2,3; the receiving end of the second spatial feature transform layer in the third enhanced residual block receives all feature maps in FSAC1,3; all feature maps in FSAC1,3 are multiplied element by element with all feature maps in FScale2,3, the multiplication result is added element by element to all feature maps in FShift2,3, and all feature maps thus obtained are taken as all feature maps output by the output end of the second spatial feature transform layer in the third enhanced residual block; the set formed by these feature maps is denoted FSFT2,3.
The input of the twelfth convolutional layer in the second spatial-angle convolutional layer in the third enhanced residual block receives all feature maps in FSFT2,3; the output end of the twelfth convolutional layer in the second spatial-angle convolutional layer in the third enhanced residual block outputs 64 feature maps with the same width and height as its input, and the set formed by all output feature maps is denoted FSpa2,3; a recombination operation from the spatial dimension to the angular dimension is performed on all feature maps in FSpa2,3; the input of the thirteenth convolutional layer in the second spatial-angle convolutional layer in the third enhanced residual block receives all feature maps obtained by the recombination operation; the output end of the thirteenth convolutional layer in the second spatial-angle convolutional layer in the third enhanced residual block outputs 64 feature maps with the same width and height as its input, and the set formed by all output feature maps is denoted FAng2,3; a recombination operation from the angular dimension back to the spatial dimension is performed on all feature maps in FAng2,3, and all feature maps obtained after this recombination operation are taken as all feature maps output by the output end of the second spatial-angle convolutional layer in the third enhanced residual block; the set formed by these feature maps is denoted FSAC2,3.
The input of the global mean pooling layer in the channel attention layer in the third enhanced residual block receives all feature maps in FSAC2,3; the output end of the global mean pooling layer in the channel attention layer in the third enhanced residual block outputs 64 feature maps with the same width and height as its input, and the set formed by all output feature maps is denoted FGAP,3; all feature values in each feature map in FGAP,3 are the same. The input of the fourteenth convolutional layer in the channel attention layer in the third enhanced residual block receives all feature maps in FGAP,3; the output end of the fourteenth convolutional layer in the channel attention layer in the third enhanced residual block outputs 4 feature maps with the same width and height as its input, and the set formed by all output feature maps is denoted FDS,3. The input of the fifteenth convolutional layer in the channel attention layer in the third enhanced residual block receives all feature maps in FDS,3; the output end of the fifteenth convolutional layer in the channel attention layer in the third enhanced residual block outputs 64 feature maps with the same width and height as its input, and the set formed by all output feature maps is denoted FUS,3. All feature maps in FUS,3 are multiplied element by element with all feature maps in FSAC2,3, and all feature maps thus obtained are taken as all feature maps output by the output end of the channel attention layer in the third enhanced residual block; the set formed by these feature maps is denoted FCA,3.
All feature maps in FCA,3 are added element by element to all feature maps in FEn,2, and all feature maps thus obtained are taken as all feature maps output by the output end of the third enhanced residual block; the set formed by these feature maps is FEn,3.
In the above, the tenth and eleventh convolutional layers in each of the first, second and third enhanced residual blocks all have 3×3 convolution kernels, a stride of 1, 64 input channels and 64 output channels, and use no activation function; the twelfth and thirteenth convolutional layers in each of the first, second and third enhanced residual blocks all have 3×3 convolution kernels, a stride of 1, 64 input channels and 64 output channels, and use the "ReLU" activation function; the fourteenth convolutional layer in each of the first, second and third enhanced residual blocks has a 1×1 convolution kernel, a stride of 1, 64 input channels and 4 output channels, and uses the "ReLU" activation function; the fifteenth convolutional layer in each of the first, second and third enhanced residual blocks has a 1×1 convolution kernel, a stride of 1, 4 input channels and 64 output channels, and uses the "Sigmoid" activation function.
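Read as a whole, each enhanced residual block chains two spatial feature transform layers and two spatial-angle convolutional layers, applies channel attention, and adds the block input back as a skip connection (FEn,k = FCA,k + FEn,k-1). The sketch below reuses the illustrative modules from the earlier sketches and matches the kernel sizes, strides, and channel counts listed above; it is an assumed assembly, not the filing's own code.

```python
import torch.nn as nn

class EnhancedResidualBlock(nn.Module):
    """SFT -> spatial-angle conv -> SFT -> spatial-angle conv -> channel attention -> skip."""
    def __init__(self, channels=64, u=5, v=5):
        super().__init__()
        self.sft1 = SpatialFeatureTransform(channels)
        self.sac1 = SpatialAngleConv(channels, u, v)
        self.sft2 = SpatialFeatureTransform(channels)
        self.sac2 = SpatialAngleConv(channels, u, v)
        self.ca = ChannelAttention(channels)

    def forward(self, f_en_prev, f_align):
        y = self.sft1(f_en_prev, f_align)   # modulate with aligned HR features
        y = self.sac1(y)
        y = self.sft2(y, f_align)           # second modulation with the same condition
        y = self.sac2(y)
        y = self.ca(y)
        return y + f_en_prev                # FEn,k = FCA,k + FEn,k-1

# e.g. the third block: f_en3 = EnhancedResidualBlock()(f_en2, f_align3)
```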
CN202111405987.1A 2021-11-24 2021-11-24 A method for spatial super-resolution reconstruction of light field images Active CN114359041B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111405987.1A CN114359041B (en) 2021-11-24 2021-11-24 A method for spatial super-resolution reconstruction of light field images


Publications (2)

Publication Number Publication Date
CN114359041A true CN114359041A (en) 2022-04-15
CN114359041B CN114359041B (en) 2024-11-26

Family

ID=81096214

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111405987.1A Active CN114359041B (en) 2021-11-24 2021-11-24 A method for spatial super-resolution reconstruction of light field images

Country Status (1)

Country Link
CN (1) CN114359041B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109741260A (en) * 2018-12-29 2019-05-10 天津大学 An efficient super-resolution method based on deep backprojection network
US20200402205A1 (en) * 2019-06-18 2020-12-24 Huawei Technologies Co., Ltd. Real-time video ultra resolution
CN112381711A (en) * 2020-10-27 2021-02-19 深圳大学 Light field image reconstruction model training and rapid super-resolution reconstruction method
CN112669214A (en) * 2021-01-04 2021-04-16 东北大学 Fuzzy image super-resolution reconstruction method based on alternative direction multiplier algorithm
CN112950475A (en) * 2021-03-05 2021-06-11 北京工业大学 Light field super-resolution reconstruction method based on residual learning and spatial transformation network
CN113139898A (en) * 2021-03-24 2021-07-20 宁波大学 Light field image super-resolution reconstruction method based on frequency domain analysis and deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DENG Wu et al.: "Light Field Super-Resolution Reconstruction Fusing Global and Local Perspectives", Application Research of Computers, vol. 36, no. 5, 31 May 2019 (2019-05-31), pages 1549-1559 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116309067A (en) * 2023-03-21 2023-06-23 安徽易刚信息技术有限公司 Light field image space super-resolution method
CN116309067B (en) * 2023-03-21 2023-09-29 安徽易刚信息技术有限公司 Light field image space super-resolution method
CN117114987A (en) * 2023-07-17 2023-11-24 重庆理工大学 Light field image super-resolution reconstruction method based on sub-pixel and gradient guidance
CN117475088A (en) * 2023-12-25 2024-01-30 浙江优众新材料科技有限公司 Light field reconstruction model training method based on polar plane attention and related equipment
CN117475088B (en) * 2023-12-25 2024-03-19 浙江优众新材料科技有限公司 Light field reconstruction model training method based on polar plane attention and related equipment

Also Published As

Publication number Publication date
CN114359041B (en) 2024-11-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant