CN114359041A - Light field image space super-resolution reconstruction method

Info

Publication number: CN114359041A
Application number: CN202111405987.1A
Authority: CN (China)
Legal status: Pending
Inventors: 陈晔曜, 郁梅, 蒋刚毅
Applicant: Ningbo University
Original language: Chinese (zh)
Classification: Compression, Expansion, Code Conversion, And Decoders

Abstract

The invention discloses a light field image spatial super-resolution reconstruction method that constructs a spatial super-resolution network comprising an encoder, an aperture-level feature registration module, a light field feature enhancement module and a decoder. The encoder extracts multi-scale features from the up-sampled low-spatial-resolution light field image, a 2D high-resolution image and its blurred version. The aperture-level feature registration module learns the correspondence between the 2D high-resolution features and the low-resolution light field features in order to register the 2D high-resolution features to every sub-aperture image, forming registered high-resolution light field features. The light field feature enhancement module uses the registered high-resolution light field features to enhance the extracted shallow light field features, producing enhanced high-resolution light field features. The decoder reconstructs the enhanced high-resolution light field features into a high-spatial-resolution light field image. The method can reconstruct a high-spatial-resolution light field image with high quality and recover texture and detail information.

Description

Light field image space super-resolution reconstruction method
Technical Field
The invention relates to an image super-resolution reconstruction technology, in particular to a light field image space super-resolution reconstruction method.
Background
Unlike conventional digital cameras, a light field camera captures both the intensity (spatial information) and the direction (angular information) of light rays in a scene, and thus records the real world more faithfully. The rich information carried by the 4-Dimensional (4D) light field images acquired by light field cameras benefits many applications such as refocusing, depth estimation and virtual/augmented reality. Current commercial light field cameras use a microlens array to separate rays of different directions that pass through the same scene point, and then record spatial and angular information simultaneously on the sensor plane. However, because the sensor resolution shared by the spatial and angular dimensions is limited, providing high angular sampling (high angular resolution) inevitably reduces the spatial resolution of the acquired 4D light field image; improving the spatial resolution of 4D light field images has therefore become an important problem in light field research.
In general, a 4D light field image admits several interconvertible visualizations: a Sub-Aperture Image (SAI) array based on the 2-Dimensional (2D) spatial information, a Micro-Lens Image (MLI) array based on the 2D angular information, and an Epipolar Plane Image (EPI) combining one spatial and one angular dimension. Intuitively, increasing the spatial resolution of a 4D light field image means increasing the resolution of each 2D SAI. It is therefore straightforward to apply existing 2D image super-resolution methods, such as the deep back-projection network of Haris et al. or the deep Laplacian pyramid network of Lai et al., to each SAI independently, but doing so ignores the information embedded in the angular domain of the 4D light field and makes it difficult to guarantee the angular consistency of the super-resolved result. The key to designing a 4D light field spatial super-resolution method is thus to exploit the high-dimensional structure of the 4D light field image. Existing spatial super-resolution methods for 4D light field images can be broadly divided into two categories: optimization-based and learning-based.
Optimization-based methods typically utilize estimated disparity or depth information to model the relationship between SAIs of 4D light-field images, thereby representing 4D light-field image spatial super-resolution reconstruction as an optimization problem. However, the disparity or depth information inferred from low spatial resolution light-field images is not very reliable and hence optimization-based methods exhibit rather limited performance.
Learning-based methods explore the intrinsic high-dimensional structure of the 4D light field image in a data-driven manner and learn a non-linear mapping from the low-spatial-resolution light field image to the high-spatial-resolution light field image. For example, Yeung et al. iteratively exploit the spatial and angular information of the 4D light field using spatial-angular separable convolutions. Wang et al. developed a spatial-angular interaction network to fuse the spatial and angular information of 4D light field images. Jin et al. proposed a novel fusion mechanism that exploits the complementary information between SAIs and recovers the parallax detail of the 4D light field through a two-stage network. Although these methods perform well at small reconstruction scales (e.g., 2×), they still fail to recover sufficient texture and detail at large reconstruction scales (e.g., 8×), because a low-resolution light field contains only limited spatial and angular information, so the details lost at low resolution can only be inferred from within the 4D light field itself. Boominathan et al. proposed a spatial super-resolution method for hybrid-input light fields that improves the spatial resolution of the 4D light field by introducing an additional high-resolution 2D image as supplementary information; however, its averaging-based fusion tends to blur the reconstruction, and processing each SAI independently destroys the parallax structure of the reconstructed light field.
In summary, although existing work achieves good light field spatial super-resolution at small reconstruction scales, it still falls short at large reconstruction scales (e.g., 8×); in particular, there remains room for improvement in recovering high-frequency texture, avoiding visual artifacts and preserving the parallax structure of the reconstructed light field.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a light field image spatial super-resolution reconstruction method that combines a light field camera and a conventional 2D camera into a heterogeneous imaging system: the light field camera provides rich angular information but limited spatial information, while the conventional 2D camera records only the intensity of light and therefore provides ample spatial information. By fully exploiting the angular and spatial information acquired by the two cameras, the method can reconstruct a high-spatial-resolution light field image with high quality, recover its texture and detail information, avoid ghosting artifacts caused by parallax, and preserve the parallax structure.
The technical scheme adopted by the invention to solve the above technical problem is as follows: a light field image spatial super-resolution reconstruction method, characterized by comprising the following steps:

Step 1: select Num low-spatial-resolution light field images of three color channels with spatial resolution W×H and angular resolution V×U, the corresponding Num 2D high-resolution images of three color channels with resolution αW×αH, and the corresponding Num reference high-spatial-resolution light field images of three color channels with spatial resolution αW×αH and angular resolution V×U; where Num > 1, α denotes the spatial-resolution improvement factor, and α > 1;

Step 2: construct a convolutional neural network as the spatial super-resolution network. The spatial super-resolution network comprises an encoder for extracting multi-scale features, an aperture-level feature registration module for registering the light field features with the 2D high-resolution features, a shallow feature extraction layer for extracting shallow features from the low-spatial-resolution light field image, a light field feature enhancement module for fusing the light field features with the 2D high-resolution features, a spatial attention block for alleviating registration errors in the registered features, and a decoder for reconstructing the latent features into a light field image;

For the encoder: it consists of a first convolutional layer, a second convolutional layer, a first residual block and a second residual block connected in sequence. The input of the first convolutional layer receives three inputs in parallel: the sub-aperture image array L_LR↑ of width α_s·W·V and height α_s·H·U, obtained by spatial-resolution up-sampling of the single-channel image L_LR of a low-spatial-resolution light field image with spatial resolution W×H and angular resolution V×U followed by recombination into a sub-aperture image array; the single-channel blurred 2D high-resolution image I_HR,Blur of width α_s·W and height α_s·H; and the single-channel 2D high-resolution image I_HR of width α_s·W and height α_s·H. For L_LR↑ the output of the first convolutional layer produces 64 feature maps of width α_s·W·V and height α_s·H·U, and the set of all feature maps output for L_LR↑ is denoted Y_LF,0; for I_HR,Blur it produces 64 feature maps of width α_s·W and height α_s·H, whose set is denoted Y_Blur,0; for I_HR it produces 64 feature maps of width α_s·W and height α_s·H, whose set is denoted Y_HR,0. The input of the second convolutional layer receives three inputs in parallel: all feature maps in Y_LF,0, all feature maps in Y_Blur,0 and all feature maps in Y_HR,0. For Y_LF,0 the output of the second convolutional layer produces 64 feature maps of width (α_s·W·V)/2 and height (α_s·H·U)/2, whose set is denoted Y_LF,1; for Y_Blur,0 it produces 64 feature maps of width (α_s·W)/2 and height (α_s·H)/2, whose set is denoted Y_Blur,1; for Y_HR,0 it produces 64 feature maps of width (α_s·W)/2 and height (α_s·H)/2, whose set is denoted Y_HR,1. The input of the first residual block receives three inputs in parallel: all feature maps in Y_LF,1, all feature maps in Y_Blur,1 and all feature maps in Y_HR,1. For Y_LF,1 the output of the first residual block produces 64 feature maps of width (α_s·W·V)/2 and height (α_s·H·U)/2, whose set is denoted Y_LF,2; for Y_Blur,1 it produces 64 feature maps of width (α_s·W)/2 and height (α_s·H)/2, whose set is denoted Y_Blur,2; for Y_HR,1 it produces 64 feature maps of width (α_s·W)/2 and height (α_s·H)/2, whose set is denoted Y_HR,2. The input of the second residual block receives three inputs in parallel: all feature maps in Y_LF,2, all feature maps in Y_Blur,2 and all feature maps in Y_HR,2. For Y_LF,2 the output of the second residual block produces 64 feature maps of width (α_s·W·V)/2 and height (α_s·H·U)/2, whose set is denoted Y_LF,3; for Y_Blur,2 it produces 64 feature maps of width (α_s·W)/2 and height (α_s·H)/2, whose set is denoted Y_Blur,3; for Y_HR,2 it produces 64 feature maps of width (α_s·W)/2 and height (α_s·H)/2, whose set is denoted Y_HR,3. Here, L_LR↑ is the sub-aperture image array of width α_s·W·V and height α_s·H·U obtained by recombining the images produced by bicubic-interpolation up-sampling of the single-channel image L_LR of the low-spatial-resolution light field image with spatial resolution W×H and angular resolution V×U; I_HR,Blur is obtained by first applying bicubic-interpolation down-sampling and then bicubic-interpolation up-sampling to I_HR; α_s denotes the spatial-resolution sampling factor, with α_s³ = α, and the up-sampling factor of the bicubic-interpolation up-sampling and the down-sampling factor of the bicubic-interpolation down-sampling are both taken as α_s. The convolution kernel of the first convolutional layer has size 3×3, convolution stride 1, 1 input channel and 64 output channels; the convolution kernel of the second convolutional layer has size 3×3, convolution stride 2, 64 input channels and 64 output channels; both the first and the second convolutional layer use the "ReLU" activation function;
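The following is a minimal PyTorch sketch of the encoder just described, under the assumption that the same first/second convolutional layers and residual blocks are applied in parallel (with shared weights) to the three single-channel inputs; the class and variable names are illustrative and do not come from the patent.

```python
# Minimal PyTorch sketch of the multi-scale encoder described above.  The same
# layers are applied in parallel (shared weights) to the three single-channel
# inputs.  Class and variable names are illustrative, not from the patent.
import torch.nn as nn

class ResBlock(nn.Module):
    """3x3 conv (ReLU) -> 3x3 conv (no activation) plus identity skip connection."""
    def __init__(self, ch=64):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return x + self.conv2(self.relu(self.conv1(x)))

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 64, 3, stride=1, padding=1)   # full-scale features
        self.conv2 = nn.Conv2d(64, 64, 3, stride=2, padding=1)  # stride 2: half resolution
        self.res1 = ResBlock(64)
        self.res2 = ResBlock(64)
        self.relu = nn.ReLU(inplace=True)

    def forward_branch(self, x):
        f0 = self.relu(self.conv1(x))   # e.g. Y_*,0
        f1 = self.relu(self.conv2(f0))  # e.g. Y_*,1
        f2 = self.res1(f1)              # e.g. Y_*,2
        f3 = self.res2(f2)              # e.g. Y_*,3
        return f0, f1, f2, f3

    def forward(self, lf_up, hr_blur, hr):
        # lf_up:   (B, 1, a_s*H*U, a_s*W*V)  up-sampled light-field SAI array L_LR↑
        # hr_blur: (B, 1, a_s*H,   a_s*W)    blurred 2D HR image
        # hr:      (B, 1, a_s*H,   a_s*W)    2D HR image
        return (self.forward_branch(lf_up),
                self.forward_branch(hr_blur),
                self.forward_branch(hr))
```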
For the aperture-level feature registration module: its input receives three classes of feature maps. The first class is all feature maps in Y_LF,1; the second class is all feature maps in Y_Blur,1; the third class comprises four inputs, namely all feature maps in Y_HR,0, all feature maps in Y_HR,1, all feature maps in Y_HR,2 and all feature maps in Y_HR,3. In the aperture-level feature registration module, all feature maps in Y_Blur,1, Y_HR,0, Y_HR,1, Y_HR,2 and Y_HR,3 are first replicated V×U times each, so that the width of all feature maps in Y_Blur,1, Y_HR,1, Y_HR,2 and Y_HR,3 becomes (α_s·W·V)/2 and their height becomes (α_s·H·U)/2, i.e. their size matches that of the feature maps in Y_LF,1, and the width of all feature maps in Y_HR,0 becomes α_s·W·V and their height becomes α_s·H·U, i.e. their size matches that of the feature maps in Y_LF,0. Block matching is then performed between all feature maps in Y_LF,1 and all feature maps in Y_Blur,1; after block matching, a coordinate index map of width (α_s·W·V)/2 and height (α_s·H·U)/2 is obtained and denoted P_CI. Next, according to P_CI, all feature maps in Y_HR,1 are spatially registered with all feature maps in Y_LF,1, yielding 64 registration feature maps of width (α_s·W·V)/2 and height (α_s·H·U)/2; the set of all obtained registration feature maps is denoted F_Align,1. Likewise, according to P_CI, all feature maps in Y_HR,2 are spatially registered with all feature maps in Y_LF,2, yielding 64 registration feature maps of width (α_s·W·V)/2 and height (α_s·H·U)/2, whose set is denoted F_Align,2; and according to P_CI, all feature maps in Y_HR,3 are spatially registered with all feature maps in Y_LF,3, yielding 64 registration feature maps of width (α_s·W·V)/2 and height (α_s·H·U)/2, whose set is denoted F_Align,3. P_CI is then up-sampled by bicubic interpolation to obtain a coordinate index map of width α_s·W·V and height α_s·H·U, denoted P_CI↑. Finally, according to P_CI↑, all feature maps in Y_HR,0 are spatially registered with all feature maps in Y_LF,0, yielding 64 registration feature maps of width α_s·W·V and height α_s·H·U, whose set is denoted F_Align,0. The output of the aperture-level feature registration module is all feature maps in F_Align,0, all feature maps in F_Align,1, all feature maps in F_Align,2 and all feature maps in F_Align,3. The accuracy measure used for block matching is a texture-and-structure similarity index, the block size used for block matching is 3×3, and the up-sampling factor of the bicubic-interpolation up-sampling is α_s;
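A simplified sketch of the registration step follows: 3×3 blocks of the light-field features of one sub-aperture view are matched against blocks of the blurred HR features, the index of the best match forms the coordinate index map, and the sharp HR features are gathered at the matched positions. Plain L2 patch distance is substituted here for the texture-and-structure similarity index used by the patent, the exhaustive search is only practical for small feature maps, and all function names are illustrative.

```python
# Simplified sketch of aperture-level feature registration for one view.
# L2 patch distance replaces the patent's texture-and-structure similarity
# index; the exhaustive O((H*W)^2) search is for illustration only.
import torch
import torch.nn.functional as F

def block_match_indices(lf_feat, blur_feat, patch=3):
    """lf_feat, blur_feat: (C, H, W) feature maps of one view, same spatial size."""
    pad = patch // 2
    # 3x3 patches around every position, flattened: (H*W, C*patch*patch)
    lf_patches = F.unfold(lf_feat[None], patch, padding=pad)[0].t()
    hr_patches = F.unfold(blur_feat[None], patch, padding=pad)[0].t()
    dist = torch.cdist(lf_patches, hr_patches)   # (H*W, H*W) pairwise patch distances
    return dist.argmin(dim=1)                    # flattened coordinate index map P_CI

def register_hr_features(hr_feat, indices):
    """hr_feat: (C, H, W) sharp HR features; indices: (H*W,) matched positions."""
    C, H, W = hr_feat.shape
    aligned = hr_feat.reshape(C, H * W)[:, indices]   # gather HR features by index
    return aligned.reshape(C, H, W)                   # registered features F_Align
```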
For the shallow feature extraction layer: it consists of a single fifth convolutional layer. Its input receives the single-channel image L_LR of the low-spatial-resolution light field image with spatial resolution W×H and angular resolution V×U, represented as a sub-aperture image array; the output of the fifth convolutional layer produces 64 feature maps of width W·V and height H·U, and the set of all output feature maps is denoted F_LR. The convolution kernel of the fifth convolutional layer has size 3×3, convolution stride 1, 1 input channel and 64 output channels, and the fifth convolutional layer uses the "ReLU" activation function;

For the light field feature enhancement module: it consists of a first enhancement residual block, a second enhancement residual block and a third enhancement residual block connected in sequence. The input of the first enhancement residual block receives all feature maps in F_Align,1 and all feature maps in F_LR; the output of the first enhancement residual block produces 64 feature maps of width (α_s·W·V)/2 and height (α_s·H·U)/2, and the set of all output feature maps is denoted F_En,1. The input of the second enhancement residual block receives all feature maps in F_Align,2 and all feature maps in F_En,1; its output produces 64 feature maps of width (α_s·W·V)/2 and height (α_s·H·U)/2, whose set is denoted F_En,2. The input of the third enhancement residual block receives all feature maps in F_Align,3 and all feature maps in F_En,2; its output produces 64 feature maps of width (α_s·W·V)/2 and height (α_s·H·U)/2, whose set is denoted F_En,3;
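A short sketch of how the three enhancement residual blocks are chained: each block consumes the registered HR features at its level and the output of the previous block, with the first block starting from the shallow features F_LR. The EnhancedResBlock module it refers to is sketched after the detailed block description further below; all names and default values are illustrative.

```python
# Chaining of the three enhancement residual blocks (illustrative names).
# EnhancedResBlock is sketched after the detailed block description below.
import torch.nn as nn

class LFFeatureEnhancement(nn.Module):
    def __init__(self, ch=64, U=5, V=5):
        super().__init__()
        self.blocks = nn.ModuleList(EnhancedResBlock(ch, U, V) for _ in range(3))

    def forward(self, f_lr, f_align_list):   # f_align_list: [F_Align,1, F_Align,2, F_Align,3]
        feat = f_lr
        for block, f_align in zip(self.blocks, f_align_list):
            feat = block(feat, f_align)      # F_En,1, F_En,2, F_En,3 in turn
        return feat
```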
For the spatial attention block: it consists of a sixth convolutional layer and a seventh convolutional layer connected in sequence. The input of the sixth convolutional layer receives all feature maps in F_Align,0; the output of the sixth convolutional layer produces 64 spatial attention feature maps of width α_s·W·V and height α_s·H·U, and the set of all output spatial attention feature maps is denoted F_SA1. The input of the seventh convolutional layer receives all spatial attention feature maps in F_SA1; its output produces 64 spatial attention feature maps of width α_s·W·V and height α_s·H·U, whose set is denoted F_SA2. All feature maps in F_Align,0 are multiplied element-wise with all spatial attention feature maps in F_SA2, and the set of all resulting feature maps is denoted F_WA,0; the feature maps in F_WA,0 are all feature maps output by the spatial attention block. The convolution kernels of the sixth and seventh convolutional layers both have size 3×3, convolution stride 1, 64 input channels and 64 output channels; the sixth convolutional layer uses the "ReLU" activation function and the seventh convolutional layer uses the "Sigmoid" activation function;
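A minimal sketch of the spatial attention block just described (two 3×3 convolutions with ReLU and Sigmoid activations, followed by element-wise re-weighting); names are illustrative.

```python
# Spatial attention block: an attention map computed from F_Align,0 re-weights
# the full-resolution registered HR features element by element.
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.conv6 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv7 = nn.Conv2d(ch, ch, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, f_align0):
        attn = self.sigmoid(self.conv7(self.relu(self.conv6(f_align0))))  # F_SA2
        return f_align0 * attn                                            # F_WA,0
```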
For the decoder: it consists of a third residual block, a fourth residual block, a sub-pixel convolutional layer, an eighth convolutional layer and a ninth convolutional layer connected in sequence. The input of the third residual block receives all feature maps in F_En,3; the output of the third residual block produces 64 feature maps of width (α_s·W·V)/2 and height (α_s·H·U)/2, and the set of all output feature maps is denoted F_Dec,1. The input of the fourth residual block receives all feature maps in F_Dec,1; its output produces 64 feature maps of width (α_s·W·V)/2 and height (α_s·H·U)/2, whose set is denoted F_Dec,2. The input of the sub-pixel convolutional layer receives all feature maps in F_Dec,2; its output produces 256 feature maps of width (α_s·W·V)/2 and height (α_s·H·U)/2, which are then converted into 64 feature maps of width α_s·W·V and height α_s·H·U; the set of all converted feature maps is denoted F_Dec,3. The input of the eighth convolutional layer receives the result of element-wise addition of all feature maps in F_Dec,3 and all feature maps in F_WA,0; the output of the eighth convolutional layer produces 64 feature maps of width α_s·W·V and height α_s·H·U, whose set is denoted F_Dec,4. The input of the ninth convolutional layer receives all feature maps in F_Dec,4; the output of the ninth convolutional layer produces one reconstructed single-channel light field image of width α_s·W·V and height α_s·H·U, and this reconstructed single-channel light field image is recombined into a reconstructed high-spatial-resolution single-channel light field image with spatial resolution α_s·W×α_s·H and angular resolution V×U, denoted L_SR. The convolution kernel of the sub-pixel convolutional layer has size 3×3, convolution stride 1, 64 input channels and 256 output channels; the convolution kernel of the eighth convolutional layer has size 3×3, convolution stride 1, 64 input channels and 64 output channels; the convolution kernel of the ninth convolutional layer has size 1×1, convolution stride 1, 64 input channels and 1 output channel; the sub-pixel convolutional layer and the eighth convolutional layer both use the "ReLU" activation function, and the ninth convolutional layer uses no activation function;
and step 3: performing color space conversion on each low spatial resolution light field image in the training set, the corresponding 2D high resolution image and the corresponding reference high spatial resolution light field image, namely converting the RGB color space into the YCbCr color space, and extracting a Y-channel image; recombining the Y-channel images of each low spatial resolution light field image into a sub-aperture image array with the width of W multiplied by V and the height of H multiplied by U for representation; then, a sub-aperture image array recombined with Y-channel images of all the light field images with low spatial resolution in the training set, a corresponding Y-channel image of the 2D high-resolution image and a corresponding Y-channel image of the reference light field image with high spatial resolution form the training set; and then constructing a pyramid network, and training by using a training set, wherein the concrete process is as follows:
step 3_ 1: copying the constructed spatial super-resolution network three times, cascading, sharing the weight of each spatial super-resolution network, namely, all the parameters are the same, and defining the whole network formed by the three spatial super-resolution networks as a pyramid network; at each pyramid level, the reconstruction scale of the spatial super-resolution network is set to be equal to αsThe values are the same;
step 3_ 2: carrying out two times of spatial resolution downsampling on a Y-channel image of each reference high spatial resolution light field image in the training set, and taking an image obtained after downsampling as a label image; carrying out the same spatial resolution down-sampling twice on the Y-channel image of each 2D high-resolution image in the training set, and taking the image obtained after the down-sampling as a 2D high-resolution Y-channel image aiming at a first spatial super-resolution network in the pyramid network; then recombining the Y-channel images of all the low spatial resolution light field images in the training set to obtain a sub-aperture image array, and performing primary spatial resolution up-sampling on the Y-channel images of all the low spatial resolution light field images in the training set to obtain an image recombined sub-aperture image array, and inputting all 2D high-resolution Y-channel images aiming at the first spatial super-resolution network in the pyramid network and all 2D high-resolution Y-channel images aiming at the first spatial super-resolution network in the pyramid network into the first spatial super-resolution network in the constructed pyramid network for training to obtain alpha corresponding to the Y-channel image of each low-spatial-resolution light field image in the training set.sReconstructing a high-spatial-resolution Y-channel light field image; the spatial resolution up-sampling and the spatial resolution down-sampling are performed by bicubic interpolation, and the scale of the spatial resolution up-sampling and the spatial resolution down-sampling is equal to alphasThe values are the same;
step 3_ 3: carrying out single spatial resolution down-sampling on a Y-channel image of each reference high spatial resolution light field image in the training set, and taking an image obtained after the down-sampling as a label image; carrying out single same spatial resolution down-sampling on the Y-channel image of each 2D high-resolution image in the training set, and taking the image obtained after the down-sampling as a 2D high-resolution Y-channel image for a second spatial super-resolution network in the pyramid network; then corresponding alpha of Y-channel images of all the low spatial resolution light field images in the training setsMultiple reconstruction high spatial resolution Y-channel light field imageRecombined subaperture image array, alpha corresponding to Y-channel image of all low spatial resolution light field images in training setsInputting fuzzy 2D high-resolution Y-channel images obtained by performing primary spatial resolution down-sampling and primary spatial resolution up-sampling on the reconstructed image subaperture image array, all 2D high-resolution Y-channel images aiming at the second spatial super-resolution network in the pyramid network and all 2D high-resolution Y-channel images aiming at the second spatial super-resolution network in the pyramid network into the second spatial super-resolution network in the constructed pyramid network for training to obtain alpha corresponding to the Y-channel image of each low-spatial resolution light field image in the training sets 2Reconstructing a high-spatial-resolution Y-channel light field image; the spatial resolution up-sampling and the spatial resolution down-sampling are performed by bicubic interpolation, and the scale of the spatial resolution up-sampling and the spatial resolution down-sampling is equal to alphasThe values are the same;
step 3_ 4: taking a Y-channel image of each reference high-spatial-resolution light field image in the training set as a label image; taking the Y-channel image of each 2D high-resolution image in the training set as a 2D high-resolution Y-channel image for a third spatial super-resolution network in the pyramid network; then corresponding alpha of Y-channel images of all the low spatial resolution light field images in the training sets 2Sub-aperture image array for reconstructing high-spatial-resolution Y-channel light field image recombination in multiple mode, and alpha corresponding to Y-channel images of all low-spatial-resolution light field images in training sets 2Inputting blurred 2D high-resolution Y-channel images obtained by performing one-time spatial resolution down-sampling and one-time spatial resolution up-sampling on a sub-aperture image array of image recombination obtained by performing one-time spatial resolution up-sampling on a multiple-reconstruction high-spatial-resolution Y-channel light field image, all 2D high-resolution Y-channel images aiming at a third spatial super-resolution network in a pyramid network and all 2D high-resolution Y-channel images aiming at the third spatial super-resolution network in the pyramid network into a sub-aperture image array of image recombination, all 2D high-resolution Y-channel images obtained by performing one-time spatial resolution down-sampling and one-time spatial resolution up-sampling on a multiple-reconstruction high-spatial-resolution Y-channel light field imageTraining in a third spatial super-resolution network in the constructed pyramid network to obtain alpha corresponding to the Y-channel image of each low spatial resolution light field image in the training sets 3Reconstructing a high-spatial-resolution Y-channel light field image; the spatial resolution up-sampling and the spatial resolution down-sampling are performed by bicubic interpolation, and the scale of the spatial resolution up-sampling and the spatial resolution down-sampling is equal to alphasThe values are the same;
obtaining the optimal weight parameters of all convolution kernels in each spatial super-resolution network in the pyramid network after the training is finished, and obtaining a well-trained spatial super-resolution network model;
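A sketch of running the three-level pyramid, assuming a hypothetical callable net(lf, lf_up, hr_blur, hr) that implements one spatial super-resolution network with reconstruction scale α_s; the guide image and its blurred version are regenerated by bicubic resampling at each level, as in steps 3_2 to 3_4. The interface is an assumption made for illustration only.

```python
# Three-level pyramid sketch.  NOTE: a faithful implementation up-samples each
# sub-aperture view separately before re-tiling; here the whole SAI array is
# resampled at once for brevity.
import torch.nn.functional as F

def pyramid_forward(net, lf_y, hr_y, a_s=2):
    # lf_y: (B, 1, H*U, W*V) Y-channel SAI array;  hr_y: (B, 1, a*H, a*W), a = a_s**3
    outputs = []
    current = lf_y
    for level in range(1, 4):
        # guide image at this level's target resolution, and its blurred version
        hr_level = F.interpolate(hr_y, scale_factor=a_s ** level / a_s ** 3,
                                 mode='bicubic', align_corners=False)
        hr_blur = F.interpolate(
            F.interpolate(hr_level, scale_factor=1 / a_s, mode='bicubic', align_corners=False),
            scale_factor=a_s, mode='bicubic', align_corners=False)
        lf_up = F.interpolate(current, scale_factor=a_s, mode='bicubic', align_corners=False)
        current = net(current, lf_up, hr_blur, hr_level)   # one a_s-times reconstruction
        outputs.append(current)
    return outputs   # a_s-, a_s^2- and a_s^3-times reconstructions
```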
and 4, step 4: randomly selecting a low-spatial-resolution light field image with three color channels and a corresponding 2D high-resolution image with three color channels as test images; then, converting the low-spatial-resolution light field image of the three color channels and the corresponding 2D high-resolution image of the three color channels from an RGB color space to a YCbCr color space, and extracting a Y-channel image; recombining the Y-channel images of the light field image with low spatial resolution into a sub-aperture image array for representation; inputting blurred 2D high-resolution Y-channel images obtained by performing primary spatial resolution down-sampling and primary spatial resolution up-sampling on the Y-channel images of the low-spatial resolution light field images, the Y-channel images of the 2D high-resolution images and the Y-channel images of the 2D high-resolution images into a spatial super-resolution network model, and testing to obtain reconstructed high-spatial resolution Y-channel light field images corresponding to the Y-channel images of the low-spatial resolution light field images; then performing bicubic interpolation up-sampling on the Cb channel image and the Cr channel image of the low-spatial-resolution light field image respectively to obtain a reconstructed high-spatial-resolution Cb channel light field image corresponding to the Cb channel image of the low-spatial-resolution light field image and a reconstructed high-spatial-resolution Cr channel light field image corresponding to the Cr channel image of the low-spatial-resolution light field image; and finally, cascading the obtained reconstructed high-spatial-resolution Y-channel light field image, the reconstructed high-spatial-resolution Cb-channel light field image and the reconstructed high-spatial-resolution Cr-channel light field image on the dimension of a color channel, and converting the cascading result into an RGB color space again to obtain the reconstructed high-spatial-resolution light field image of the color three channels corresponding to the low-spatial-resolution light field image.
In step 2, the first residual block, the second residual block, the third residual block and the fourth residual block have the same structure, each consisting of a third convolutional layer and a fourth convolutional layer connected in sequence.

The input of the third convolutional layer in the first residual block receives three inputs in parallel: all feature maps in Y_LF,1, all feature maps in Y_Blur,1 and all feature maps in Y_HR,1. For Y_LF,1 the third convolutional layer outputs 64 feature maps of width (α_s·W·V)/2 and height (α_s·H·U)/2, and for Y_Blur,1 and Y_HR,1 it outputs 64 feature maps each of width (α_s·W)/2 and height (α_s·H)/2. The input of the fourth convolutional layer in the first residual block receives, in parallel, the three sets of feature maps output by the third convolutional layer, and for each of them outputs 64 feature maps of the same width and height. All feature maps output by the fourth convolutional layer for Y_LF,1 are added element-wise to all feature maps in Y_LF,1, and the resulting feature maps are all feature maps output by the first residual block for Y_LF,1; the set formed by these feature maps is Y_LF,2. Likewise, all feature maps output by the fourth convolutional layer for Y_Blur,1 are added element-wise to all feature maps in Y_Blur,1, giving all feature maps output by the first residual block for Y_Blur,1, whose set is Y_Blur,2; and all feature maps output by the fourth convolutional layer for Y_HR,1 are added element-wise to all feature maps in Y_HR,1, giving all feature maps output by the first residual block for Y_HR,1, whose set is Y_HR,2.
The input of the third convolutional layer in the second residual block receives three inputs in parallel: all feature maps in Y_LF,2, all feature maps in Y_Blur,2 and all feature maps in Y_HR,2. For each of these inputs the third convolutional layer outputs 64 feature maps of the same width and height as its input, and the fourth convolutional layer in the second residual block receives these three sets in parallel and again outputs 64 feature maps of the same size for each. All feature maps output by the fourth convolutional layer for Y_LF,2 are added element-wise to all feature maps in Y_LF,2, giving all feature maps output by the second residual block for Y_LF,2, whose set is Y_LF,3; all feature maps output by the fourth convolutional layer for Y_Blur,2 are added element-wise to all feature maps in Y_Blur,2, giving all feature maps output by the second residual block for Y_Blur,2, whose set is Y_Blur,3; and all feature maps output by the fourth convolutional layer for Y_HR,2 are added element-wise to all feature maps in Y_HR,2, giving all feature maps output by the second residual block for Y_HR,2, whose set is Y_HR,3.
The input of the third convolutional layer in the third residual block receives all feature maps in F_En,3; the output of this third convolutional layer produces 64 feature maps of width (α_s·W·V)/2 and height (α_s·H·U)/2, which are passed to the fourth convolutional layer in the third residual block, whose output again produces 64 feature maps of width (α_s·W·V)/2 and height (α_s·H·U)/2. All feature maps in F_En,3 are added element-wise to all feature maps output by the fourth convolutional layer, and the resulting feature maps are all feature maps output by the third residual block; the set formed by these feature maps is F_Dec,1.
The input of the third convolutional layer in the fourth residual block receives all feature maps in F_Dec,1; the output of this third convolutional layer produces 64 feature maps of width (α_s·W·V)/2 and height (α_s·H·U)/2, which are passed to the fourth convolutional layer in the fourth residual block, whose output again produces 64 feature maps of width (α_s·W·V)/2 and height (α_s·H·U)/2. All feature maps in F_Dec,1 are added element-wise to all feature maps output by the fourth convolutional layer, and the resulting feature maps are all feature maps output by the fourth residual block; the set formed by these feature maps is F_Dec,2.
In the above, the convolution kernels of the third and fourth convolutional layers in each of the first, second, third and fourth residual blocks have size 3×3, convolution stride 1, 64 input channels and 64 output channels; in each of these residual blocks, the third convolutional layer uses the "ReLU" activation function and the fourth convolutional layer uses no activation function.
In the step 2, the first enhancement residual block, the second enhancement residual block and the third enhancement residual block have the same structure, and are composed of a first spatial characteristic transformation layer, a first spatial angle convolution layer, a second spatial characteristic transformation layer, a second spatial angle convolution layer and a channel attention layer which are connected in sequence, the first spatial characteristic transformation layer and the second spatial characteristic transformation layer have the same structure and are composed of a tenth convolution layer and an eleventh convolution layer which are parallel, the first spatial angle convolution layer and the second spatial angle convolution layer have the same structure and are composed of a twelfth convolution layer and a thirteenth convolution layer which are connected in sequence, and the channel attention layer is composed of a global mean value pooling layer, a fourteenth convolution layer and a fifteenth convolution layer which are connected in sequence;
first increaseThe input of the tenth convolutional layer in the first spatial feature transform layer in the strong residual block receives FAlign,1The output end of the tenth convolutional layer in the first spatial feature transform layer in the first enhanced residual block outputs 64 width maps
Figure BDA0003372237700000134
And has a height of
Figure BDA0003372237700000135
The feature map of (1) represents a set of all feature maps outputted
Figure BDA0003372237700000136
An input of an eleventh convolutional layer in the first spatial feature transform layer in the first enhanced residual block receives FAlign,1The output end of the eleventh convolutional layer in the first spatial feature transform layer in the first enhanced residual block outputs 64 width maps
Figure BDA0003372237700000137
And has a height of
Figure BDA0003372237700000138
The feature map of (1) represents a set of all feature maps outputted
Figure BDA0003372237700000139
The input of the first spatial feature transform layer in the first enhanced residual block receives FLRAll feature maps in (1), will FLRAll the characteristic diagrams in (1) and
Figure BDA00033722377000001310
multiplying all the characteristic graphs element by element, and comparing the multiplication result with the result
Figure BDA00033722377000001311
The obtained all feature maps are used as the output of the first spatial feature transform layer in the first enhanced residual blockAll feature maps outputted from the output end are described as a set of these feature maps
Figure BDA00033722377000001312
An input of a twelfth of the first spatial angle convolutional layers in the first enhanced residual block receives
Figure BDA00033722377000001313
Of the first spatial angle convolutional layer in the first enhancement residual block, the output end of the twelfth convolutional layer of the first spatial angle convolutional layer outputs 64 widths
Figure BDA0003372237700000141
And has a height of
Figure BDA0003372237700000142
The feature map of (1) represents a set of all feature maps outputted
Figure BDA0003372237700000143
To pair
Figure BDA0003372237700000144
Performs a recombination operation from a spatial dimension to an angular dimension, an input of a thirteenth of the first spatial-angular convolutional layers of the first enhancement residual block receiving
Figure BDA0003372237700000145
The output end of the thirteenth convolutional layer of the first space angle convolutional layer in the first enhancement residual block outputs 64 widths as the result of the reorganization operation of all the feature maps in (1)
Figure BDA0003372237700000146
And has a height of
Figure BDA0003372237700000147
The feature map of (1) represents a set of all feature maps outputted
Figure BDA0003372237700000148
To pair
Figure BDA0003372237700000149
Performing an operation of reconstructing all feature maps from an angle dimension to a space dimension, taking all feature maps obtained after the operation of reconstructing as all feature maps output by an output end of a first space angle convolution layer in a first enhanced residual block, and recording a set formed by the feature maps as a set
Figure BDA00033722377000001410
The input terminal of the tenth convolutional layer in the second spatial feature transform layer in the first enhanced residual block receives FAlign,1The output end of the tenth convolutional layer in the second spatial feature transform layer in the first enhanced residual block outputs 64 width maps
Figure BDA00033722377000001411
And has a height of
Figure BDA00033722377000001412
The feature map of (1) represents a set of all feature maps outputted
Figure BDA00033722377000001413
An input of an eleventh convolutional layer in the second spatial feature transform layer in the first enhanced residual block receives FAlign,1The output end of the eleventh convolutional layer in the second spatial feature transform layer in the first enhanced residual block outputs 64 width maps
Figure BDA00033722377000001414
And has a height of
Figure BDA00033722377000001415
The feature map of (1) represents a set of all feature maps outputted
Figure BDA00033722377000001416
The input of the second spatial feature transform layer in the first enhanced residual block receives
Figure BDA00033722377000001417
All the characteristic diagrams in (1) will
Figure BDA00033722377000001418
All the characteristic diagrams in (1) and
Figure BDA00033722377000001419
multiplying all the characteristic graphs element by element, and comparing the multiplication result with the result
Figure BDA00033722377000001420
The obtained feature maps are taken as all feature maps output by the output end of the second spatial feature transform layer in the first enhanced residual block. The input end of the twelfth convolutional layer of the second spatial angle convolutional layer in the first enhanced residual block receives all feature maps output by the second spatial feature transform layer in the first enhanced residual block, and the output end of this twelfth convolutional layer outputs 64 feature maps with width of (α_s·W×V)/2 and height of (α_s·H×U)/2; a recombination operation from the spatial dimension to the angular dimension is performed on all of these feature maps; the input end of the thirteenth convolutional layer of the second spatial angle convolutional layer in the first enhanced residual block receives the result of the recombination operation, and the output end of this thirteenth convolutional layer outputs 64 feature maps with width of (α_s·W×V)/2 and height of (α_s·H×U)/2; a recombination operation from the angular dimension back to the spatial dimension is then performed on all of these feature maps, and all feature maps obtained after the recombination operation are taken as all feature maps output by the output end of the second spatial angle convolutional layer in the first enhanced residual block.
The input end of the global mean pooling layer in the channel attention layer in the first enhanced residual block receives all feature maps output by the second spatial angle convolutional layer in the first enhanced residual block, and the output end of the global mean pooling layer outputs 64 feature maps with width of (α_s·W×V)/2 and height of (α_s·H×U)/2; the set formed by all the output feature maps is denoted as F_GAP,1, and all feature values within each feature map in F_GAP,1 are the same. The input end of the fourteenth convolutional layer in the channel attention layer in the first enhanced residual block receives all feature maps in F_GAP,1, and its output end outputs 4 feature maps with width of (α_s·W×V)/2 and height of (α_s·H×U)/2, the set of which is denoted as F_DS,1. The input end of the fifteenth convolutional layer in the channel attention layer in the first enhanced residual block receives all feature maps in F_DS,1, and its output end outputs 64 feature maps with width of (α_s·W×V)/2 and height of (α_s·H×U)/2, the set of which is denoted as F_US,1. All feature maps in F_US,1 are multiplied element by element with all feature maps output by the second spatial angle convolutional layer in the first enhanced residual block; the resulting feature maps are taken as all feature maps output by the output end of the channel attention layer in the first enhanced residual block, and the set formed by these feature maps is denoted as F_CA,1.
All feature maps in F_CA,1 are added element by element with all feature maps in F_LR; the resulting feature maps are taken as all feature maps output by the output end of the first enhanced residual block, and the set formed by these feature maps is denoted as F_En,1.
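The channel attention layer described above follows a squeeze-and-excitation pattern: global mean pooling, a 1×1 channel-reduction convolution with ReLU, a 1×1 channel-expansion convolution with Sigmoid, and an element-wise rescaling of the incoming feature maps. The following is a minimal PyTorch sketch of that pattern; the class and variable names are illustrative and are not taken from the patent.

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (64 -> 4 -> 64 channels),
    mirroring the fourteenth/fifteenth 1x1 convolutions described in the text."""
    def __init__(self, channels: int = 64, reduced: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                       # global mean pooling
        self.down = nn.Conv2d(channels, reduced, kernel_size=1)   # 64 -> 4, followed by ReLU
        self.up = nn.Conv2d(reduced, channels, kernel_size=1)     # 4 -> 64, followed by Sigmoid
        self.relu = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.pool(x)                  # one value per channel (cf. F_GAP)
        w = self.relu(self.down(w))       # cf. F_DS
        w = self.sigmoid(self.up(w))      # cf. F_US
        return x * w                      # element-wise rescaling (cf. F_CA before the skip)

if __name__ == "__main__":
    feats = torch.randn(1, 64, 50 * 5, 75 * 5)   # toy (H*U) x (W*V) feature tensor
    print(ChannelAttention()(feats).shape)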
The input end of the tenth convolutional layer in the first spatial feature transform layer in the second enhanced residual block receives all feature maps in F_Align,2, and its output end outputs 64 feature maps with width of (α_s·W×V)/2 and height of (α_s·H×U)/2; the input end of the eleventh convolutional layer in the first spatial feature transform layer in the second enhanced residual block likewise receives all feature maps in F_Align,2 and outputs 64 feature maps of the same size. The receiving end of the first spatial feature transform layer in the second enhanced residual block receives all feature maps in F_En,1; all feature maps in F_En,1 are multiplied element by element with all feature maps output by the tenth convolutional layer, the multiplication result is added element by element with all feature maps output by the eleventh convolutional layer, and the resulting feature maps are taken as all feature maps output by the output end of the first spatial feature transform layer in the second enhanced residual block.
The input end of the twelfth convolutional layer of the first spatial angle convolutional layer in the second enhanced residual block receives all feature maps output by the first spatial feature transform layer in the second enhanced residual block, and its output end outputs 64 feature maps with width of (α_s·W×V)/2 and height of (α_s·H×U)/2; a recombination operation from the spatial dimension to the angular dimension is performed on all of these feature maps; the input end of the thirteenth convolutional layer of the first spatial angle convolutional layer in the second enhanced residual block receives the result of the recombination operation, and its output end outputs 64 feature maps with width of (α_s·W×V)/2 and height of (α_s·H×U)/2; a recombination operation from the angular dimension back to the spatial dimension is then performed, and all feature maps obtained after the recombination operation are taken as all feature maps output by the output end of the first spatial angle convolutional layer in the second enhanced residual block.
The input end of the tenth convolutional layer in the second spatial feature transform layer in the second enhanced residual block receives all feature maps in F_Align,2, and its output end outputs 64 feature maps with width of (α_s·W×V)/2 and height of (α_s·H×U)/2; the input end of the eleventh convolutional layer in the second spatial feature transform layer in the second enhanced residual block likewise receives all feature maps in F_Align,2 and outputs 64 feature maps of the same size. The receiving end of the second spatial feature transform layer in the second enhanced residual block receives all feature maps output by the first spatial angle convolutional layer in the second enhanced residual block; these feature maps are multiplied element by element with all feature maps output by the tenth convolutional layer, the multiplication result is added element by element with all feature maps output by the eleventh convolutional layer, and the resulting feature maps are taken as all feature maps output by the output end of the second spatial feature transform layer in the second enhanced residual block.
The input end of the twelfth convolutional layer of the second spatial angle convolutional layer in the second enhanced residual block receives all feature maps output by the second spatial feature transform layer in the second enhanced residual block, and its output end outputs 64 feature maps with width of (α_s·W×V)/2 and height of (α_s·H×U)/2; a recombination operation from the spatial dimension to the angular dimension is performed on all of these feature maps; the input end of the thirteenth convolutional layer of the second spatial angle convolutional layer in the second enhanced residual block receives the result of the recombination operation, and its output end outputs 64 feature maps with width of (α_s·W×V)/2 and height of (α_s·H×U)/2; a recombination operation from the angular dimension back to the spatial dimension is then performed, and all feature maps obtained after the recombination operation are taken as all feature maps output by the output end of the second spatial angle convolutional layer in the second enhanced residual block.
The input end of the global mean pooling layer in the channel attention layer in the second enhanced residual block receives all feature maps output by the second spatial angle convolutional layer in the second enhanced residual block, and its output end outputs 64 feature maps with width of (α_s·W×V)/2 and height of (α_s·H×U)/2; the set formed by all the output feature maps is denoted as F_GAP,2, and all feature values within each feature map in F_GAP,2 are the same. The input end of the fourteenth convolutional layer in the channel attention layer in the second enhanced residual block receives all feature maps in F_GAP,2, and its output end outputs 4 feature maps with width of (α_s·W×V)/2 and height of (α_s·H×U)/2, the set of which is denoted as F_DS,2. The input end of the fifteenth convolutional layer in the channel attention layer in the second enhanced residual block receives all feature maps in F_DS,2, and its output end outputs 64 feature maps with width of (α_s·W×V)/2 and height of (α_s·H×U)/2, the set of which is denoted as F_US,2. All feature maps in F_US,2 are multiplied element by element with all feature maps output by the second spatial angle convolutional layer in the second enhanced residual block; the resulting feature maps are taken as all feature maps output by the output end of the channel attention layer in the second enhanced residual block, and the set formed by these feature maps is denoted as F_CA,2.
All feature maps in F_CA,2 are added element by element with all feature maps in F_En,1; the resulting feature maps are taken as all feature maps output by the output end of the second enhanced residual block, and the set formed by these feature maps is denoted as F_En,2.
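The spatial angle convolutional layers alternate a convolution applied on the sub-aperture (spatial) arrangement with a convolution applied after the feature maps have been recombined into the angular arrangement, so that both spatial and angular correlations of the 4D light field are exploited. The sketch below shows one plausible way to implement the spatial-to-angular and angular-to-spatial recombination with tensor reshapes; the sub-aperture-grid and macro-pixel layout conventions are assumptions for illustration and are not taken verbatim from the patent.

import torch
import torch.nn as nn

class SpatialAngularConv(nn.Module):
    """A spatial 3x3 convolution followed by an angular 3x3 convolution, with
    reshapes between the sub-aperture-image layout and the macro-pixel layout.
    Layout conventions here are illustrative assumptions."""
    def __init__(self, channels: int = 64, u: int = 5, v: int = 5):
        super().__init__()
        self.u, self.v = u, v
        self.spatial_conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.angular_conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, H, W = x.shape                     # H = h*u, W = w*v for a u x v view grid
        u, v = self.u, self.v
        h, w = H // u, W // v
        x = self.relu(self.spatial_conv(x))
        # spatial -> angular: regroup the same pixel of every view into a u x v macro-pixel
        x = x.view(n, c, u, h, v, w).permute(0, 1, 3, 2, 5, 4).reshape(n, c, h * u, w * v)
        x = self.relu(self.angular_conv(x))
        # angular -> spatial: back to the sub-aperture-image grid
        x = x.view(n, c, h, u, w, v).permute(0, 1, 3, 2, 5, 4).reshape(n, c, u * h, v * w)
        return x

if __name__ == "__main__":
    x = torch.randn(1, 64, 50 * 5, 75 * 5)   # toy sub-aperture array of 5x5 views
    print(SpatialAngularConv()(x).shape)      # size is preserved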
The third enhanced residual block processes its inputs in the same way. The tenth and eleventh convolutional layers in the first spatial feature transform layer in the third enhanced residual block each receive all feature maps in F_Align,3 and each output 64 feature maps with width of (α_s·W×V)/2 and height of (α_s·H×U)/2; the receiving end of this first spatial feature transform layer receives all feature maps in F_En,2, which are multiplied element by element with the output of the tenth convolutional layer and then added element by element with the output of the eleventh convolutional layer, and the resulting feature maps are taken as all feature maps output by the output end of the first spatial feature transform layer in the third enhanced residual block. The twelfth convolutional layer of the first spatial angle convolutional layer in the third enhanced residual block receives these feature maps and outputs 64 feature maps with width of (α_s·W×V)/2 and height of (α_s·H×U)/2, which are recombined from the spatial dimension to the angular dimension; the thirteenth convolutional layer receives the recombined feature maps and outputs 64 feature maps of the same size, which are recombined from the angular dimension back to the spatial dimension and taken as all feature maps output by the output end of the first spatial angle convolutional layer in the third enhanced residual block. The tenth and eleventh convolutional layers in the second spatial feature transform layer in the third enhanced residual block again each receive all feature maps in F_Align,3 and each output 64 feature maps with width of (α_s·W×V)/2 and height of (α_s·H×U)/2; the receiving end of this second spatial feature transform layer receives all feature maps output by the first spatial angle convolutional layer in the third enhanced residual block, which are multiplied element by element with the output of the tenth convolutional layer and then added element by element with the output of the eleventh convolutional layer, and the resulting feature maps are taken as all feature maps output by the output end of the second spatial feature transform layer in the third enhanced residual block. The second spatial angle convolutional layer in the third enhanced residual block then applies its twelfth convolutional layer, the recombination from the spatial dimension to the angular dimension, its thirteenth convolutional layer, and the recombination from the angular dimension back to the spatial dimension, and the resulting 64 feature maps with width of (α_s·W×V)/2 and height of (α_s·H×U)/2 are taken as all feature maps output by the output end of the second spatial angle convolutional layer in the third enhanced residual block.
The input end of the global mean pooling layer in the channel attention layer in the third enhanced residual block receives all feature maps output by the second spatial angle convolutional layer in the third enhanced residual block, and its output end outputs 64 feature maps with width of (α_s·W×V)/2 and height of (α_s·H×U)/2; the set formed by all the output feature maps is denoted as F_GAP,3, and all feature values within each feature map in F_GAP,3 are the same. The fourteenth convolutional layer in the channel attention layer in the third enhanced residual block receives all feature maps in F_GAP,3 and outputs 4 feature maps with width of (α_s·W×V)/2 and height of (α_s·H×U)/2, the set of which is denoted as F_DS,3; the fifteenth convolutional layer receives all feature maps in F_DS,3 and outputs 64 feature maps with width of (α_s·W×V)/2 and height of (α_s·H×U)/2, the set of which is denoted as F_US,3. All feature maps in F_US,3 are multiplied element by element with all feature maps output by the second spatial angle convolutional layer in the third enhanced residual block; the resulting feature maps are taken as all feature maps output by the output end of the channel attention layer in the third enhanced residual block, and the set formed by these feature maps is denoted as F_CA,3.
All feature maps in F_CA,3 are added element by element with all feature maps in F_En,2; the resulting feature maps are taken as all feature maps output by the output end of the third enhanced residual block, and the set formed by these feature maps is denoted as F_En,3.
In the above, the convolution kernels of the tenth and eleventh convolutional layers in each of the first, second and third enhanced residual blocks all have size 3×3, convolution step 1, 64 input channels and 64 output channels, and no activation function is adopted; the convolution kernels of the twelfth and thirteenth convolutional layers in each of the first, second and third enhanced residual blocks all have size 3×3, convolution step 1, 64 input channels and 64 output channels, and the adopted activation function is "ReLU"; the convolution kernel of the fourteenth convolutional layer in each of the first, second and third enhanced residual blocks has size 1×1, convolution step 1, 64 input channels and 4 output channels, and the adopted activation function is "ReLU"; the convolution kernel of the fifteenth convolutional layer in each of the first, second and third enhanced residual blocks has size 1×1, convolution step 1, 4 input channels and 64 output channels, and the adopted activation function is "Sigmoid".
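Putting the pieces together, each enhanced residual block modulates the light field features with the registered high-resolution features through two spatial feature transform steps, interleaves them with two spatial angle convolutional layers, applies channel attention, and finally adds the block input back. The sketch below is one plausible reading of this structure using the kernel sizes listed above; it assumes the ChannelAttention and SpatialAngularConv sketches given earlier are in scope, and it is illustrative rather than the patent's reference implementation.

import torch
import torch.nn as nn

class SFTLayer(nn.Module):
    """Spatial feature transform: scale and shift maps predicted from the aligned
    high-resolution features (the tenth/eleventh 3x3 convolutions, no activation)."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.scale = nn.Conv2d(channels, channels, 3, padding=1)
        self.shift = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, feat: torch.Tensor, aligned: torch.Tensor) -> torch.Tensor:
        return feat * self.scale(aligned) + self.shift(aligned)

class EnhancedResidualBlock(nn.Module):
    """SFT -> spatial-angular conv -> SFT -> spatial-angular conv -> channel attention -> skip.
    Relies on the SpatialAngularConv and ChannelAttention sketches defined above."""
    def __init__(self, channels: int = 64, u: int = 5, v: int = 5):
        super().__init__()
        self.sft1 = SFTLayer(channels)
        self.sac1 = SpatialAngularConv(channels, u, v)
        self.sft2 = SFTLayer(channels)
        self.sac2 = SpatialAngularConv(channels, u, v)
        self.ca = ChannelAttention(channels)

    def forward(self, feat: torch.Tensor, aligned: torch.Tensor) -> torch.Tensor:
        out = self.sac1(self.sft1(feat, aligned))
        out = self.sac2(self.sft2(out, aligned))
        return self.ca(out) + feat        # residual connection to the block input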
Compared with the prior art, the invention has the advantages that:
1) The method of the invention exploits the fact that a conventional 2D camera can capture abundant spatial information, which can serve as compensation information for reconstructing the spatial resolution of the light field image; the light field image and the 2D high-resolution image are therefore used jointly, and an end-to-end convolutional neural network is constructed on this basis to make full use of both, so as to reconstruct a high-spatial-resolution light field image, recover detailed texture information, and preserve the parallax structure of the reconstruction result.
2) To establish the relationship between the light field image and the 2D high-resolution image, the method constructs an aperture-level feature registration module that explores their correlation in a high-dimensional feature space and accurately registers the feature information of the 2D high-resolution image under each sub-aperture image of the light field; in addition, the method uses the constructed light field feature enhancement module to fuse, at multiple levels, the registered high-resolution features with the shallow light field features extracted from the low-spatial-resolution light field image, so as to effectively generate high-spatial-resolution light field features and reconstruct them into a high-spatial-resolution light field image.
3) To improve flexibility and practicability, the method adopts a pyramid network reconstruction scheme in which a super-resolution result of a specific scale is reconstructed at each pyramid level, gradually increasing the spatial resolution of the light field image and recovering textures and details, so that multi-scale results (e.g. 2×, 4× and 8×) can be reconstructed in a single forward inference; moreover, a weight-sharing strategy is adopted across the pyramid levels to effectively reduce the parameter count of the constructed pyramid network and lighten the training burden.
Drawings
FIG. 1 is a block diagram of the overall implementation of the method of the present invention;
FIG. 2 is a schematic diagram of the structure of a convolutional neural network, namely a spatial super-resolution network, constructed by the method of the present invention;
FIG. 3a is a schematic diagram of the structure of a light field feature enhancement module in a convolutional neural network, i.e., a spatial super-resolution network, constructed by the method of the present invention;
FIG. 3b is a schematic diagram of the composition structure of the first spatial feature transform layer and the second spatial feature transform layer in the light field feature enhancement module in the convolutional neural network, i.e., the spatial super-resolution network, constructed by the method of the present invention;
FIG. 3c is a schematic diagram of the composition structure of the first spatial angle convolutional layer and the second spatial angle convolutional layer in the light field feature enhancement module in the convolutional neural network, i.e., the spatial super-resolution network, constructed by the method of the present invention;
FIG. 3d is a schematic diagram of the structure of the channel attention layer in the light field feature enhancement module in the convolutional neural network, i.e., the spatial super-resolution network, constructed by the method of the present invention;
FIG. 4 is a schematic diagram illustrating a pyramid network reconstruction method established by the method of the present invention;
FIG. 5a is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database by a bicubic interpolation method, where a sub-aperture image under a central coordinate is taken for display;
FIG. 5b is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database using Haris et al, where a sub-aperture image at a central coordinate is taken for display;
FIG. 5c is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database by the method of Lai et al, where a sub-aperture image under a central coordinate is taken for display;
FIG. 5d is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database by a method of Yeung et al, where a sub-aperture image under a central coordinate is taken for display;
FIG. 5e is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database by the method of Wang et al, where a sub-aperture image under a central coordinate is taken for display;
FIG. 5f is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database by Jin et al, where a sub-aperture image under a central coordinate is taken for display;
FIG. 5g is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database by Boominathan et al, where a sub-aperture image under a central coordinate is taken for display;
FIG. 5h is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database using the method of the present invention, where a sub-aperture image under a central coordinate is taken for display;
FIG. 5i is a label high spatial resolution light field image corresponding to a low spatial resolution light field image in a tested EPFL light field image database, where a sub-aperture image under a central coordinate is taken for display;
FIG. 6a is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by a bicubic interpolation method, wherein a sub-aperture image under a central coordinate is taken for display;
FIG. 6b is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by Haris et al, where a sub-aperture image at a central coordinate is taken for display;
FIG. 6c is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by the method of Lai et al, where a sub-aperture image under a central coordinate is taken for display;
FIG. 6d is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by a method of Yeung et al, where a sub-aperture image under a central coordinate is taken for display;
FIG. 6e is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by the method of Wang et al, where a sub-aperture image in a central coordinate is taken for display;
FIG. 6f is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by Jin et al, where a sub-aperture image under a central coordinate is taken for display;
FIG. 6g is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by Boominathan et al, where a sub-aperture image under a central coordinate is taken for display;
FIG. 6h is a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database using the method of the present invention, where a sub-aperture image under a central coordinate is taken for display;
fig. 6i is a label high spatial resolution light field image corresponding to a low spatial resolution light field image in a tested STFLytro light field image database.
Detailed Description
The invention is described in further detail below with reference to the accompanying examples.
With the development of immersive media and technology, users are increasingly inclined to view visual content such as interactive and immersive images/videos. However, the conventional 2D imaging method can only collect the intensity information of the light in the scene, and cannot provide the depth information of the scene. In contrast, 3D imaging techniques can acquire more scene information, however, they contain limited depth information and are typically used for stereoscopic displays. As a new imaging technology, light field imaging is receiving wide attention, and can simultaneously acquire intensity and direction information of light in a scene in a single shooting, thereby more effectively recording the real world. Meanwhile, some optical instruments and devices based on light field imaging have been developed to promote the application and development of light field technology. Limited by the size of the imaging sensor, the 4D light field images acquired with a light field camera suffer from the problem of spatial and angular resolution being mutually compromised. In brief, while providing high angular resolution, the 4D light field image inevitably suffers from low spatial resolution, which seriously affects the practical applications of the 4D light field image, such as refocusing, depth estimation, etc., for which, the present invention proposes a light field image spatial super-resolution reconstruction method,
the method comprises the steps of acquiring a 2D high-resolution image while capturing a light field image through heterogeneous imaging, and further using the captured 2D high-resolution image as supplementary information to help enhance the spatial resolution of the light field image, wherein a spatial super-resolution network is constructed and mainly comprises an encoder, an aperture level feature registration module, a light field feature enhancement module, a decoder and the like; firstly, respectively extracting multi-scale features from an up-sampled low-spatial-resolution light field image, a blurred 2D high-resolution image and the 2D high-resolution image by using an encoder; then, learning the correspondence between the 2D high-resolution features and the low-resolution light field features through an aperture-level feature registration module so as to register the 2D high-resolution features under each sub-aperture image of the light field image and form registered high-resolution light field features; then, the light field characteristic enhancement module is used for enhancing shallow light field characteristics extracted from an input light field image by utilizing the high-resolution light field characteristics obtained through registration to obtain enhanced high-resolution light field characteristics; finally, reconstructing the enhanced high-resolution light field characteristics into a high-quality high-spatial resolution light field image by using a decoder; in addition, a pyramid network reconstruction architecture is adopted to reconstruct a high spatial resolution light field image of a specific up-sampling scale at each pyramid level, and then multi-scale reconstruction results can be generated simultaneously.
The invention provides a light field image space super-resolution reconstruction method, the overall implementation flow block diagram of which is shown in figure 1, and the method comprises the following steps:
step 1: selecting Num color three-channel low-spatial-resolution light field images with spatial resolution of W multiplied by H and angular resolution of V multiplied by U, corresponding Num color three-channel 2D high-resolution images with resolution of alpha W multiplied by alpha H, and corresponding Num color three-channel reference high-spatial-resolution light field images with spatial resolution of alpha W multiplied by alpha H and angular resolution of V multiplied by U; where Num > 1, Num in this embodiment is 200, W × H in this embodiment is 75 × 50, V × U is 5 × 5, α represents a spatial resolution improvement multiple, and a is greater than 1, and in this embodiment, α is 8.
Step 2: constructing a convolutional neural network as a spatial super-resolution network: as shown in fig. 2, the spatial super-resolution network includes an encoder for extracting multi-scale features, an aperture level feature registration module for registering light field features and 2D high resolution features, a shallow feature extraction layer for extracting shallow features from a low spatial resolution light field image, a light field feature enhancement module for fusing light field features and 2D high resolution features, a spatial attention block for mitigating registration errors in coarse-scale features, and a decoder for reconstructing potential features into a light field image.
For the encoder, it is composed of a first convolutional layer, a second convolutional layer, a first residual block and a second residual block which are connected in sequence. The input end of the first convolutional layer receives three inputs in parallel: the sub-aperture image array with width of α_s·W×V and height of α_s·H×U obtained by image recombination after spatial-resolution up-sampling of the single-channel image L_LR of the low-spatial-resolution light field image with spatial resolution W×H and angular resolution V×U; the single-channel blurred 2D high-resolution image with width of α_s·W and height of α_s·H; and the single-channel image of the 2D high-resolution image with width of α_s·W and height of α_s·H, denoted as I_HR. For the up-sampled sub-aperture image array, the output end of the first convolutional layer outputs 64 feature maps with width of α_s·W×V and height of α_s·H×U; for the blurred 2D high-resolution image, it outputs 64 feature maps with width of α_s·W and height of α_s·H; and for I_HR, it outputs 64 feature maps with width of α_s·W and height of α_s·H, the set of which is denoted as Y_HR,0.
The input end of the second convolutional layer receives the three sets of feature maps output by the first convolutional layer in parallel. For the light field stream, the output end of the second convolutional layer outputs 64 feature maps with width of (α_s·W×V)/2 and height of (α_s·H×U)/2; for the blurred 2D high-resolution stream, it outputs 64 feature maps with width of (α_s·W)/2 and height of (α_s·H)/2; and for the 2D high-resolution stream, it outputs 64 feature maps with width of (α_s·W)/2 and height of (α_s·H)/2, the set of which is denoted as Y_HR,1.
The input end of the first residual block receives the three sets of feature maps output by the second convolutional layer in parallel. For the light field stream, the output end of the first residual block outputs 64 feature maps with width of (α_s·W×V)/2 and height of (α_s·H×U)/2; for the blurred 2D high-resolution stream, it outputs 64 feature maps with width of (α_s·W)/2 and height of (α_s·H)/2; and for the 2D high-resolution stream, it outputs 64 feature maps with width of (α_s·W)/2 and height of (α_s·H)/2, the set of which is denoted as Y_HR,2.
The input end of the second residual block receives the three sets of feature maps output by the first residual block in parallel. For the light field stream, the output end of the second residual block outputs 64 feature maps with width of (α_s·W×V)/2 and height of (α_s·H×U)/2; for the blurred 2D high-resolution stream, it outputs 64 feature maps with width of (α_s·W)/2 and height of (α_s·H)/2; and for the 2D high-resolution stream, it outputs 64 feature maps with width of (α_s·W)/2 and height of (α_s·H)/2, the set of which is denoted as Y_HR,3.
Here, the up-sampled sub-aperture image array is obtained by up-sampling the single-channel image L_LR of the low-spatial-resolution light field image (spatial resolution W×H, angular resolution V×U) with the existing bicubic interpolation and recombining the result into an array with width of α_s·W×V and height of α_s·H×U, and the blurred 2D high-resolution image is obtained by first performing bicubic interpolation down-sampling and then bicubic interpolation up-sampling on I_HR. α_s represents the spatial-resolution sampling factor; in this embodiment α_s takes the value 2 and α_s^3 = α; the up-sampling factor of the bicubic interpolation up-sampling and the down-sampling factor of the bicubic interpolation down-sampling both take the value α_s. The convolution kernel of the first convolutional layer has size 3×3, convolution step 1, 1 input channel and 64 output channels; the convolution kernel of the second convolutional layer has size 3×3, convolution step 2, 64 input channels and 64 output channels; both the first and the second convolutional layers adopt the "ReLU" activation function.
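The encoder therefore runs the same weight-shared stack (a 3×3 convolution, a stride-2 convolution, and two residual blocks) over each of the three single-channel inputs, producing features at a full scale and a half scale. A schematic PyTorch sketch of such a shared multi-scale feature extractor is given below; the class names and the exact form of the residual block are illustrative assumptions.

import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Plain two-convolution residual block (a possible reading of the third/fourth
    convolutional layers described later in the text)."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)

class Encoder(nn.Module):
    """Shared multi-scale feature extractor applied separately to the up-sampled
    light field array, the blurred 2D HR image and the 2D HR image (all single-channel)."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv1 = nn.Sequential(nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True))
        self.conv2 = nn.Sequential(nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.res1 = ResBlock(channels)
        self.res2 = ResBlock(channels)

    def forward(self, x: torch.Tensor):
        f0 = self.conv1(x)      # full-resolution features (level 0, e.g. Y_HR,0)
        f1 = self.conv2(f0)     # half-resolution features (level 1)
        f2 = self.res1(f1)      # level 2
        f3 = self.res2(f2)      # level 3
        return f0, f1, f2, f3

if __name__ == "__main__":
    enc = Encoder()
    feats = enc(torch.randn(1, 1, 400, 600))   # e.g. the 2D HR image stream
    print([f.shape for f in feats])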
For the aperture-level feature registration module, its input end receives three classes of feature maps: the first class is all light field feature maps output by the second residual block; the second class is all blurred 2D high-resolution feature maps output by the second residual block; the third class includes four inputs, namely all feature maps in Y_HR,0, all feature maps in Y_HR,1, all feature maps in Y_HR,2 and all feature maps in Y_HR,3. In the aperture-level feature registration module, first, all blurred 2D high-resolution feature maps output by the second residual block and all feature maps in Y_HR,0, Y_HR,1, Y_HR,2 and Y_HR,3 are each replicated V×U times, so that the blurred 2D high-resolution feature maps and the feature maps in Y_HR,1, Y_HR,2 and Y_HR,3 obtain width of (α_s·W×V)/2 and height of (α_s·H×U)/2, matching the size of the light field feature maps output by the second residual block, and the feature maps in Y_HR,0 obtain width of α_s·W×V and height of α_s·H×U, matching the size of the light field feature maps output by the first convolutional layer. Then the existing block matching is performed between the replicated blurred 2D high-resolution feature maps and the light field feature maps output by the second residual block, and after block matching a coordinate index map with width of (α_s·W×V)/2 and height of (α_s·H×U)/2 is obtained, denoted as P_CI. Then, according to P_CI, all feature maps in Y_HR,1 are spatially registered with the light field feature maps of the corresponding level, giving 64 registered feature maps with width of (α_s·W×V)/2 and height of (α_s·H×U)/2, and the set formed by all the obtained registered feature maps is denoted as F_Align,1; likewise, according to P_CI, all feature maps in Y_HR,2 are spatially registered to obtain 64 registered feature maps of the same size, the set of which is denoted as F_Align,2, and all feature maps in Y_HR,3 are spatially registered to obtain 64 registered feature maps of the same size, the set of which is denoted as F_Align,3. P_CI is then up-sampled by bicubic interpolation to obtain a coordinate index map with width of α_s·W×V and height of α_s·H×U; finally, according to this up-sampled coordinate index map, all feature maps in Y_HR,0 are spatially registered with the full-resolution light field feature maps to obtain 64 registered feature maps with width of α_s·W×V and height of α_s·H×U, and the set formed by all the obtained registered feature maps is denoted as F_Align,0. The aperture-level feature registration module outputs all feature maps in F_Align,0, F_Align,1, F_Align,2 and F_Align,3. The accuracy metric used for block matching is a texture-and-structure similarity index, the block size used for block matching is 3×3, and the up-sampling factor of the bicubic interpolation up-sampling is α_s. Block matching is performed on the deepest features because high-level features better describe image similarity at the semantic level while suppressing irrelevant textures; the coordinate index map P_CI obtained by block matching between the deepest blurred 2D high-resolution features and the deepest light field features therefore reflects the spatial position registration relationship between these two sets of feature maps. Since the convolution operations do not change the spatial position information of the feature maps, P_CI also reflects the spatial position registration relationship between the 2D high-resolution feature maps and the light field feature maps at the other levels of the same resolution, and the coordinate index map obtained after bicubic interpolation up-sampling reflects the spatial position registration relationship between the full-resolution 2D high-resolution feature maps and the full-resolution light field feature maps.
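The aperture-level registration can thus be viewed as a nearest-patch search: for every patch position in the light field features, the best-matching patch in the (replicated) blurred 2D high-resolution features is located, and the resulting coordinate index map is used to gather patches from the sharp 2D high-resolution features. The sketch below shows a brute-force version of this idea using unfolded 3×3 patches; the normalized dot-product similarity is a simplifying stand-in for the texture-and-structure similarity index specified in the text, and the function names are illustrative.

import torch
import torch.nn.functional as F

def block_match(lf_feat: torch.Tensor, hr_feat: torch.Tensor, patch: int = 3) -> torch.Tensor:
    """For each spatial position of lf_feat, return the flat index of the most similar
    3x3 patch in hr_feat. Shapes: (1, C, H, W) each. Brute force, O(L^2) memory,
    intended only to illustrate the idea on small inputs."""
    q = F.unfold(lf_feat, patch, padding=patch // 2)   # (1, C*9, H*W) query patches
    k = F.unfold(hr_feat, patch, padding=patch // 2)   # (1, C*9, H*W) key patches
    q = F.normalize(q, dim=1)
    k = F.normalize(k, dim=1)
    sim = torch.bmm(q.transpose(1, 2), k)              # (1, H*W, H*W) similarity matrix
    return sim.argmax(dim=2)                           # (1, H*W) coordinate index map (cf. P_CI)

def warp_by_index(hr_feat: torch.Tensor, index: torch.Tensor, patch: int = 3) -> torch.Tensor:
    """Gather the 3x3 patches of hr_feat at the matched positions and fold them back,
    producing features registered to the light field grid (cf. F_Align)."""
    _, _, h, w = hr_feat.shape
    patches = F.unfold(hr_feat, patch, padding=patch // 2)                # (1, C*9, H*W)
    idx = index.unsqueeze(1).expand(-1, patches.shape[1], -1).contiguous()
    gathered = torch.gather(patches, 2, idx)
    out = F.fold(gathered, (h, w), patch, padding=patch // 2)
    norm = F.fold(torch.ones_like(gathered), (h, w), patch, padding=patch // 2)
    return out / norm                                                     # average overlapping patches

if __name__ == "__main__":
    lf, hr_blur, hr_sharp = (torch.randn(1, 64, 20, 30) for _ in range(3))
    aligned = warp_by_index(hr_sharp, block_match(lf, hr_blur))
    print(aligned.shape)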
For the shallow feature extraction layer, it is composed of one fifth convolutional layer. The input end of the fifth convolutional layer receives the single-channel image L_LR of the low-spatial-resolution light field image with spatial resolution W×H and angular resolution V×U; the output end of the fifth convolutional layer outputs 64 feature maps with width of W×V and height of H×U, and the set formed by all the output feature maps is denoted as F_LR. The convolution kernel of the fifth convolutional layer has size 3×3, convolution step 1, 1 input channel and 64 output channels, and the activation function adopted by the fifth convolutional layer is "ReLU".
For the light field feature enhancement module, as shown in fig. 3a, it is composed of a first enhanced residual block, a second enhanced residual block and a third enhanced residual block which are connected in sequence. The input end of the first enhanced residual block receives all feature maps in F_Align,1 and all feature maps in F_LR; since α_s takes the value 2, W×V is equivalent to (α_s·W×V)/2 and H×U is equivalent to (α_s·H×U)/2, i.e. the feature maps in F_LR have the same size as the feature maps in F_Align,1. The output end of the first enhanced residual block outputs 64 feature maps with width of (α_s·W×V)/2 and height of (α_s·H×U)/2, and the set formed by all the output feature maps is denoted as F_En,1. The input end of the second enhanced residual block receives all feature maps in F_Align,2 and all feature maps in F_En,1, and its output end outputs 64 feature maps with width of (α_s·W×V)/2 and height of (α_s·H×U)/2, the set of which is denoted as F_En,2. The input end of the third enhanced residual block receives all feature maps in F_Align,3 and all feature maps in F_En,2, and its output end outputs 64 feature maps with width of (α_s·W×V)/2 and height of (α_s·H×U)/2, the set of which is denoted as F_En,3.
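Chaining the three enhanced residual blocks, each fed with its own registered feature set, gives the light field feature enhancement module. A short sketch follows, reusing the EnhancedResidualBlock sketch above; the class name is illustrative.

import torch.nn as nn

class LFFeatureEnhancement(nn.Module):
    """Three enhanced residual blocks in sequence; block k is modulated by F_Align,k."""
    def __init__(self, channels: int = 64, u: int = 5, v: int = 5):
        super().__init__()
        self.blocks = nn.ModuleList([EnhancedResidualBlock(channels, u, v) for _ in range(3)])

    def forward(self, f_lr, aligned_feats):
        # aligned_feats is expected to be [F_Align,1, F_Align,2, F_Align,3]
        feat = f_lr
        for block, aligned in zip(self.blocks, aligned_feats):
            feat = block(feat, aligned)
        return feat                      # corresponds to F_En,3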
For the spatial attention block, it is composed of a sixth convolutional layer and a seventh convolutional layer which are connected in sequence. The input end of the sixth convolutional layer receives all feature maps in F_Align,0, and its output end outputs 64 spatial attention feature maps with width of α_s·W×V and height of α_s·H×U, the set of which is denoted as F_SA1; the input end of the seventh convolutional layer receives all spatial attention feature maps in F_SA1, and its output end outputs 64 spatial attention feature maps with width of α_s·W×V and height of α_s·H×U, the set of which is denoted as F_SA2. All feature maps in F_Align,0 are multiplied element by element with all spatial attention feature maps in F_SA2, and the set formed by all the obtained feature maps is denoted as F_WA,0; F_WA,0 is taken as all feature maps output by the output end of the spatial attention block. The convolution kernels of the sixth and seventh convolutional layers both have size 3×3, convolution step 1, 64 input channels and 64 output channels; the activation function adopted by the sixth convolutional layer is "ReLU", and the activation function adopted by the seventh convolutional layer is "Sigmoid".
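The spatial attention block is thus a simple two-convolution gate over the full-resolution registered features, intended to down-weight positions where the registration is unreliable. A minimal sketch, with illustrative names:

import torch
import torch.nn as nn

class SpatialAttentionBlock(nn.Module):
    """3x3 conv + ReLU, 3x3 conv + Sigmoid, then element-wise gating of F_Align,0."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, f_align0: torch.Tensor) -> torch.Tensor:
        attn = self.sigmoid(self.conv2(self.relu(self.conv1(f_align0))))
        return f_align0 * attn            # corresponds to F_WA,0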
For the decoder, it is composed of a third residual block, a fourth residual block, a sub-pixel convolutional layer, an eighth convolutional layer and a ninth convolutional layer which are connected in sequence. The input end of the third residual block receives all feature maps in F_En,3, and its output end outputs 64 feature maps with width of (α_s·W×V)/2 and height of (α_s·H×U)/2, the set of which is denoted as F_Dec,1; the input end of the fourth residual block receives all feature maps in F_Dec,1, and its output end outputs 64 feature maps with width of (α_s·W×V)/2 and height of (α_s·H×U)/2, the set of which is denoted as F_Dec,2. The input end of the sub-pixel convolutional layer receives all feature maps in F_Dec,2, and its output end outputs 256 feature maps with width of (α_s·W×V)/2 and height of (α_s·H×U)/2; these 256 feature maps are further converted into 64 feature maps with width of α_s·W×V and height of α_s·H×U, and the set formed by all the converted feature maps is denoted as F_Dec,3. The input end of the eighth convolutional layer receives the result of element-by-element addition of all feature maps in F_Dec,3 and all feature maps in F_WA,0, and its output end outputs 64 feature maps with width of α_s·W×V and height of α_s·H×U, the set of which is denoted as F_Dec,4. The input end of the ninth convolutional layer receives all feature maps in F_Dec,4, and its output end outputs a reconstructed single-channel light field image with width of α_s·W×V and height of α_s·H×U, which is recombined into a high-spatial-resolution single-channel light field image with spatial resolution α_s·W×α_s·H and angular resolution V×U, denoted as L_SR. The convolution kernel of the sub-pixel convolutional layer has size 3×3, convolution step 1, 64 input channels and 256 output channels; the convolution kernel of the eighth convolutional layer has size 3×3, convolution step 1, 64 input channels and 64 output channels; the convolution kernel of the ninth convolutional layer has size 1×1, convolution step 1, 64 input channels and 1 output channel; the activation functions adopted by the sub-pixel convolutional layer and the eighth convolutional layer are both "ReLU", and the ninth convolutional layer does not use an activation function.
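The decoder's sub-pixel convolutional layer is the standard pixel-shuffle upsampling: a convolution expands 64 channels to 256, and rearranging those channels into 2×2 spatial blocks doubles the width and height. A sketch of the decoder under this reading (α_s = 2) follows; it reuses the ResBlock sketch from the encoder section and is illustrative rather than the patent's exact implementation.

import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Two residual blocks, sub-pixel (pixel-shuffle) upsampling by 2, a fusion
    convolution with the spatially attended F_WA,0, and a 1x1 output convolution."""
    def __init__(self, channels: int = 64, scale: int = 2):
        super().__init__()
        self.res3 = ResBlock(channels)                       # ResBlock from the encoder sketch
        self.res4 = ResBlock(channels)
        self.upconv = nn.Conv2d(channels, channels * scale * scale, 3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)                # 256 maps -> 64 maps at 2x size
        self.conv8 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv9 = nn.Conv2d(channels, 1, 1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, f_en3: torch.Tensor, f_wa0: torch.Tensor) -> torch.Tensor:
        x = self.res4(self.res3(f_en3))
        x = self.shuffle(self.relu(self.upconv(x)))          # corresponds to F_Dec,3
        x = self.relu(self.conv8(x + f_wa0))                 # fuse with the attended features
        return self.conv9(x)                                 # reconstructed single-channel LF array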
Step 3: performing color space conversion on each low-spatial-resolution light field image in the training set, the corresponding 2D high-resolution image and the corresponding reference high-spatial-resolution light field image, namely converting from the RGB color space to the YCbCr color space, and extracting the Y-channel images; recombining the Y-channel image of each low-spatial-resolution light field image into a sub-aperture image array with width of W×V and height of H×U for representation; then forming the training set from the sub-aperture image arrays recombined from the Y-channel images of all the low-spatial-resolution light field images, the Y-channel images of the corresponding 2D high-resolution images and the Y-channel images of the corresponding reference high-spatial-resolution light field images; and then constructing a pyramid network and training it with the training set, the specific process being as follows:
step 3_ 1: as shown in fig. 4, the constructed spatial super-resolution networks are copied three times and cascaded, the weight of each spatial super-resolution network is shared, that is, the parameters are all the same, and the overall network formed by the three spatial super-resolution networks is defined as a pyramid network; at each pyramid level, the reconstruction scale of the spatial super-resolution network is set to be equal to αsValues are the same, αsWhen the value is 2, the spatial resolution of the light field image is improved by 2 times, so that the final reconstruction scale can reach 8, namely, alpha is alphas 3=8。
Step 3_2: performing spatial-resolution down-sampling twice on the Y-channel image of each reference high-spatial-resolution light field image in the training set, and taking the images obtained after down-sampling as label images; performing the same spatial-resolution down-sampling twice on the Y-channel image of each 2D high-resolution image in the training set, and taking the images obtained after down-sampling as the 2D high-resolution Y-channel images for the first spatial super-resolution network in the pyramid network; then inputting the sub-aperture image arrays recombined from the Y-channel images of all the low-spatial-resolution light field images in the training set, the sub-aperture image arrays recombined from the images obtained by performing one spatial-resolution up-sampling on those Y-channel images, the blurred 2D high-resolution Y-channel images obtained by performing one spatial-resolution down-sampling and one spatial-resolution up-sampling on all the 2D high-resolution Y-channel images for the first spatial super-resolution network, and all the 2D high-resolution Y-channel images for the first spatial super-resolution network into the first spatial super-resolution network in the constructed pyramid network for training, so as to obtain the α_s-times reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of each low-spatial-resolution light field image in the training set; the spatial-resolution up-sampling and down-sampling are performed by bicubic interpolation, and their scale is equal to α_s.
Step 3_3: performing spatial-resolution down-sampling once on the Y-channel image of each reference high-spatial-resolution light field image in the training set, and taking the images obtained after down-sampling as label images; performing the same spatial-resolution down-sampling once on the Y-channel image of each 2D high-resolution image in the training set, and taking the images obtained after down-sampling as the 2D high-resolution Y-channel images for the second spatial super-resolution network in the pyramid network; then inputting the sub-aperture image arrays recombined from the α_s-times reconstructed high-spatial-resolution Y-channel light field images corresponding to the Y-channel images of all the low-spatial-resolution light field images in the training set, the blurred 2D high-resolution Y-channel images obtained by performing one spatial-resolution down-sampling and one spatial-resolution up-sampling on all the 2D high-resolution Y-channel images for the second spatial super-resolution network, and all the 2D high-resolution Y-channel images for the second spatial super-resolution network into the second spatial super-resolution network in the constructed pyramid network for training, so as to obtain the α_s^2-times reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of each low-spatial-resolution light field image in the training set; the spatial-resolution up-sampling and down-sampling are performed by bicubic interpolation, and their scale is equal to α_s.
Step 3_4: taking the Y-channel image of each reference high-spatial-resolution light field image in the training set as a label image; taking the Y-channel image of each 2D high-resolution image in the training set as the 2D high-resolution Y-channel image for the third spatial super-resolution network in the pyramid network; then inputting the sub-aperture image arrays recombined from the α_s^2-times reconstructed high-spatial-resolution Y-channel light field images corresponding to the Y-channel images of all the low-spatial-resolution light field images in the training set, the blurred 2D high-resolution Y-channel images obtained by performing one spatial-resolution down-sampling and one spatial-resolution up-sampling on all the 2D high-resolution Y-channel images for the third spatial super-resolution network, and all the 2D high-resolution Y-channel images for the third spatial super-resolution network into the third spatial super-resolution network in the constructed pyramid network for training, so as to obtain the α_s^3-times reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of each low-spatial-resolution light field image in the training set; the spatial-resolution up-sampling and down-sampling are performed by bicubic interpolation, and their scale is equal to α_s.
After training is finished, the optimal weight parameters of all convolution kernels in each spatial super-resolution network of the pyramid network are obtained, giving a well-trained spatial super-resolution network model. Each pyramid level implements a specific super-resolution reconstruction scale, so multi-scale super-resolution results can be output in one forward inference (namely, scales of 2×, 4× and 8× when α_s takes the value 2). In addition, by sharing weights among the spatial super-resolution networks at the different pyramid levels, the number of network parameters can be effectively reduced and the training burden lightened.
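A minimal sketch of how a shared-weight pyramid could produce the 2×, 4× and 8× results in one forward pass (assuming α_s = 2 and three levels); `SpatialSRNet`-style level modules and the argument layout are illustrative names, not identifiers from the patent.

```python
import torch.nn as nn

class PyramidSR(nn.Module):
    def __init__(self, level_net: nn.Module, num_levels: int = 3):
        super().__init__()
        # the same module instance is reused at every level, so its weights are shared
        self.level_net = level_net
        self.num_levels = num_levels

    def forward(self, lf_lr, hr2d_per_level, blurred_per_level):
        outputs = []                      # 2x, 4x, 8x reconstructions when alpha_s = 2
        lf = lf_lr
        for lvl in range(self.num_levels):
            lf = self.level_net(lf, hr2d_per_level[lvl], blurred_per_level[lvl])
            outputs.append(lf)
        return outputs
```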
Step 4: Randomly select a three-color-channel low-spatial-resolution light field image and the corresponding three-color-channel 2D high-resolution image as test images. Convert both from the RGB color space to the YCbCr color space and extract the Y-channel images; recombine the Y-channel image of the low-spatial-resolution light field image into a sub-aperture image array representation. Input the Y-channel image of the low-spatial-resolution light field image, the Y-channel image of the 2D high-resolution image, and the blurred 2D high-resolution Y-channel image obtained by performing one spatial-resolution down-sampling followed by one spatial-resolution up-sampling on the Y-channel image of the 2D high-resolution image, into the trained spatial super-resolution network model, and obtain by testing the reconstructed high-spatial-resolution Y-channel light field image corresponding to the Y-channel image of the low-spatial-resolution light field image. Then up-sample the Cb-channel image and the Cr-channel image of the low-spatial-resolution light field image separately by bicubic interpolation, obtaining the reconstructed high-spatial-resolution Cb-channel light field image and the reconstructed high-spatial-resolution Cr-channel light field image. Finally, concatenate the reconstructed high-spatial-resolution Y-channel, Cb-channel and Cr-channel light field images along the color-channel dimension and convert the result back to the RGB color space, obtaining the reconstructed three-color-channel high-spatial-resolution light field image corresponding to the low-spatial-resolution light field image.
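After the trained model reconstructs the Y channel, the chroma channels are handled with plain bicubic interpolation and the result is converted back to RGB. The per-view sketch below illustrates this color-space handling; OpenCV stores the planes in Y, Cr, Cb order, and the function and variable names are illustrative assumptions.

```python
import cv2
import numpy as np

def merge_sr_channels(sr_y, lr_view_rgb):
    """Combine the network's Y-channel output with bicubically up-sampled chroma.

    sr_y        : (H, W) reconstructed high-resolution Y channel of one sub-aperture view
    lr_view_rgb : (h, w, 3) the corresponding low-resolution sub-aperture view, RGB in [0, 1]
    """
    H, W = sr_y.shape
    ycrcb_lr = cv2.cvtColor(lr_view_rgb.astype(np.float32), cv2.COLOR_RGB2YCrCb)
    sr_cr = cv2.resize(ycrcb_lr[..., 1], (W, H), interpolation=cv2.INTER_CUBIC)
    sr_cb = cv2.resize(ycrcb_lr[..., 2], (W, H), interpolation=cv2.INTER_CUBIC)
    sr_ycrcb = np.stack([sr_y, sr_cr, sr_cb], axis=-1).astype(np.float32)
    return cv2.cvtColor(sr_ycrcb, cv2.COLOR_YCrCb2RGB)
```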
In this embodiment, in step 2, the first, second, third and fourth residual blocks have the same structure, each consisting of a third convolutional layer and a fourth convolutional layer connected in sequence.
The input end of the third convolutional layer in the first residual block receives three inputs in parallel, namely two further feature-map sets together with all the feature maps in Y_HR,1. For each of these three inputs, the output end of the third convolutional layer in the first residual block outputs 64 feature maps with the same width and height as the corresponding input feature maps. The input end of the fourth convolutional layer in the first residual block receives these three output sets in parallel, and for each of them its output end again outputs 64 feature maps of the same width and height. The feature maps output by the fourth convolutional layer for each stream are then added element by element to the feature maps of the corresponding input of the first residual block, and the resulting feature maps form the three sets output by the output end of the first residual block; the set obtained from Y_HR,1 is denoted Y_HR,2.
The second residual block is processed in the same way: the input end of its third convolutional layer receives, in parallel, the two feature-map sets and Y_HR,2 output by the first residual block; its third and fourth convolutional layers each output 64 feature maps of the same width and height for every stream; and the outputs of the fourth convolutional layer are added element by element to the corresponding inputs of the second residual block to form its three output sets, the set obtained from Y_HR,2 being denoted Y_HR,3.
The input end of the third convolutional layer in the third residual block receives all the feature maps in F_En,3, and its output end outputs 64 feature maps of the same width and height; the input end of the fourth convolutional layer in the third residual block receives these feature maps, and its output end outputs 64 feature maps of the same width and height; all the feature maps in F_En,3 are added element by element to the feature maps output by the fourth convolutional layer, and the resulting feature maps are taken as all the feature maps output by the output end of the third residual block, the set formed by them being F_Dec,1. The fourth residual block operates identically: its third convolutional layer receives F_Dec,1, its fourth convolutional layer processes the result, all the feature maps in F_Dec,1 are added element by element to the output of the fourth convolutional layer, and the set of resulting feature maps output by the output end of the fourth residual block is F_Dec,2.
In the above, the convolution kernels of the third and fourth convolutional layers in each of the first, second, third and fourth residual blocks are all of size 3×3, the convolution step is 1, the number of input channels is 64 and the number of output channels is 64; the third convolutional layer in each residual block uses the "ReLU" activation function, while the fourth convolutional layer uses no activation function.
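A minimal PyTorch sketch of one such residual block under the stated configuration (3×3 kernels, stride 1, 64 channels, ReLU after the first convolution only, identity skip connection); the class name is illustrative, and the multi-stream variant simply applies the same block to each input stream.

```python
import torch
import torch.nn as nn

class PlainResidualBlock(nn.Module):
    """Two 3x3 convolutions (ReLU after the first, none after the second) plus a skip."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1)
        self.conv4 = nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv4(self.relu(self.conv3(x)))
        return x + out   # element-wise residual addition

# the first and second residual blocks process three parallel streams with this structure,
# e.g. y1, y2, y3 = block(x1), block(x2), block(x3)
```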
In this embodiment, in step 2, as shown in fig. 3a, fig. 3b, fig. 3c and fig. 3d, the first, second and third enhancement residual blocks have the same structure, each consisting of a first spatial feature transform layer, a first spatial-angular convolution layer, a second spatial feature transform layer, a second spatial-angular convolution layer and a channel attention layer connected in sequence. The first and second spatial feature transform layers have the same structure, each consisting of a tenth convolutional layer and an eleventh convolutional layer arranged in parallel; the first and second spatial-angular convolution layers have the same structure, each consisting of a twelfth convolutional layer and a thirteenth convolutional layer connected in sequence; and the channel attention layer consists of a global mean pooling layer, a fourteenth convolutional layer and a fifteenth convolutional layer connected in sequence.
In the first enhancement residual block, the tenth and eleventh convolutional layers of the first spatial feature transform layer each receive all the feature maps in F_Align,1 and each output 64 feature maps of the same width and height. The input end of the first spatial feature transform layer receives all the feature maps in F_LR; the feature maps in F_LR are multiplied element by element with the feature maps output by the tenth convolutional layer, and the result is added element by element to the feature maps output by the eleventh convolutional layer; all feature maps so obtained are the feature maps output by the output end of the first spatial feature transform layer. The twelfth convolutional layer of the first spatial-angular convolution layer receives these feature maps and outputs 64 feature maps; these maps are subjected to a reorganization operation from the spatial dimension to the angular dimension (the reorganization operation is a conventional processing means for light field images; it only changes the arrangement order of the feature values in the feature maps and does not change the feature values); the thirteenth convolutional layer receives the reorganized maps and outputs 64 feature maps, which are reorganized back from the angular dimension to the spatial dimension and taken as the feature maps output by the output end of the first spatial-angular convolution layer. The second spatial feature transform layer operates in the same way: its tenth and eleventh convolutional layers each receive F_Align,1 and each output 64 feature maps, its input end receives the output of the first spatial-angular convolution layer, and that output is multiplied element by element with the output of the tenth convolutional layer and then added element by element to the output of the eleventh convolutional layer. The second spatial-angular convolution layer then repeats the spatial convolution, spatial-to-angular reorganization, angular convolution and angular-to-spatial reorganization on this result. In the channel attention layer, the global mean pooling layer receives the output of the second spatial-angular convolution layer and outputs 64 feature maps in which all feature values within each map are identical (the global mean pooling layer independently computes the global mean of each received feature map, turning the map into a single value, and then copies that value to restore the spatial size); the set of these maps is denoted F_GAP,1. The fourteenth convolutional layer receives F_GAP,1 and outputs 4 feature maps, whose set is denoted F_DS,1; the fifteenth convolutional layer receives F_DS,1 and outputs 64 feature maps, whose set is denoted F_US,1. All the feature maps in F_US,1 are multiplied element by element with the output of the second spatial-angular convolution layer, and the resulting feature maps are the output of the channel attention layer, whose set is denoted F_CA,1. Finally, all the feature maps in F_CA,1 are added element by element to all the feature maps in F_LR, and the resulting feature maps are taken as all the feature maps output by the output end of the first enhancement residual block; the set formed by them is F_En,1.
The second enhancement residual block is processed in exactly the same way, except that its spatial feature transform layers are driven by F_Align,2 and its input is F_En,1: the first spatial feature transform layer modulates F_En,1 with the scale and shift maps derived from F_Align,2, the two spatial-angular convolution layers and the second spatial feature transform layer follow as above, the channel attention layer produces F_GAP,2, F_DS,2, F_US,2 and F_CA,2 in turn, and all the feature maps in F_CA,2 are added element by element to all the feature maps in F_En,1 to give the output of the second enhancement residual block, F_En,2.
The third enhancement residual block likewise uses F_Align,3 in its spatial feature transform layers and takes F_En,2 as its input; its channel attention layer produces F_GAP,3, F_DS,3, F_US,3 and F_CA,3, and all the feature maps in F_CA,3 are added element by element to all the feature maps in F_En,2 to give the output of the third enhancement residual block, F_En,3.
In the above, the convolution kernels of the tenth and eleventh convolutional layers in each of the first, second and third enhancement residual blocks are all of size 3×3, the convolution step is 1, the number of input channels is 64 and the number of output channels is 64, and no activation function is used; the convolution kernels of the twelfth and thirteenth convolutional layers are all of size 3×3, the convolution step is 1, the number of input channels is 64 and the number of output channels is 64, and the activation function used is "ReLU"; the convolution kernel of the fourteenth convolutional layer is of size 1×1, the convolution step is 1, the number of input channels is 64 and the number of output channels is 4, and the activation function used is "ReLU"; the convolution kernel of the fifteenth convolutional layer is of size 1×1, the convolution step is 1, the number of input channels is 4 and the number of output channels is 64, and the activation function used is "Sigmoid".
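A compact PyTorch sketch of the three components described above (spatial feature transform, spatial-angular convolution with a spatial-to-angular reorganization, and channel attention), written for an angular resolution of U×V sub-aperture views; the class and argument names are illustrative, and the reorganization is implemented here as a plain tensor reshape, which is one common way to realize it.

```python
import torch
import torch.nn as nn

class SFTLayer(nn.Module):
    """Spatial feature transform: scale/shift maps predicted from the aligned HR features."""
    def __init__(self, ch=64):
        super().__init__()
        self.scale_conv = nn.Conv2d(ch, ch, 3, 1, 1)   # "tenth" convolutional layer
        self.shift_conv = nn.Conv2d(ch, ch, 3, 1, 1)   # "eleventh" convolutional layer

    def forward(self, x, cond):
        return x * self.scale_conv(cond) + self.shift_conv(cond)

class SpatialAngularConv(nn.Module):
    """3x3 spatial conv, reshape to the angular dimension, 3x3 angular conv, reshape back."""
    def __init__(self, ch=64, ang=(5, 5)):
        super().__init__()
        self.ang = ang
        self.spatial_conv = nn.Sequential(nn.Conv2d(ch, ch, 3, 1, 1), nn.ReLU(True))
        self.angular_conv = nn.Sequential(nn.Conv2d(ch, ch, 3, 1, 1), nn.ReLU(True))

    def forward(self, x):
        # x: (B*U*V, C, H, W) -- sub-aperture views stacked along the batch dimension
        U, V = self.ang
        BUV, C, H, W = x.shape
        B = BUV // (U * V)
        x = self.spatial_conv(x)
        # spatial -> angular reorganization: the view grid becomes the 2D grid the next conv sees
        x = x.view(B, U, V, C, H, W).permute(0, 4, 5, 3, 1, 2).reshape(B * H * W, C, U, V)
        x = self.angular_conv(x)
        # angular -> spatial reorganization
        x = x.view(B, H, W, C, U, V).permute(0, 4, 5, 3, 1, 2).reshape(B * U * V, C, H, W)
        return x

class ChannelAttention(nn.Module):
    def __init__(self, ch=64, reduced=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                                   # global mean pooling
        self.down = nn.Sequential(nn.Conv2d(ch, reduced, 1), nn.ReLU(True))   # "fourteenth" layer
        self.up = nn.Sequential(nn.Conv2d(reduced, ch, 1), nn.Sigmoid())      # "fifteenth" layer

    def forward(self, x):
        # broadcasting over H, W plays the role of copying the pooled value back to full size
        return x * self.up(self.down(self.pool(x)))

class EnhancedResidualBlock(nn.Module):
    def __init__(self, ch=64, ang=(5, 5)):
        super().__init__()
        self.sft1, self.sft2 = SFTLayer(ch), SFTLayer(ch)
        self.sa1, self.sa2 = SpatialAngularConv(ch, ang), SpatialAngularConv(ch, ang)
        self.ca = ChannelAttention(ch)

    def forward(self, x, cond):
        out = self.sa1(self.sft1(x, cond))
        out = self.sa2(self.sft2(out, cond))
        return x + self.ca(out)
```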
To further illustrate the feasibility and effectiveness of the method of the present invention, experiments were conducted on it.
The method is implemented with the PyTorch deep learning framework. The light field images used for training and testing come from existing light field image databases, which cover real-world scenes and synthetic scenes and are freely available for download over the internet. To ensure the reliability and robustness of the test, 200 light field images are randomly selected to form a training image set and 70 light field images are selected to form a test image set, where the training and test image sets do not overlap. The basic information of the light field image databases used by the training and testing image sets is shown in Table 1. The four databases EPFL [1], INRIA [2], STFLytro [6] and Kalantari et al. [7] were captured with a Lytro light field camera, so the resulting light field images belong to narrow-baseline light field data; the STFGantry [5] database was captured by moving a conventional camera mounted on a gantry, so its light field images have a larger baseline range and belong to wide-baseline light field data; the light field images in the HCI new [3] and HCI old [4] databases are synthetically generated and also belong to wide-baseline light field data.
TABLE 1 basic information of light field image database used for training and testing image sets
The reference information (or download website) for the light field image databases used by the training and testing image sets is as follows:
[1] Rerabek M, Ebrahimi T. New Light Field Image Dataset [C]// 2016 Eighth International Conference on Quality of Multimedia Experience (QoMEX), 2016.
[2] Le Pendu M, Jiang X, Guillemot C. Light Field Inpainting Propagation via Low Rank Matrix Completion [J]. IEEE Transactions on Image Processing, 2018, 27(4): 1981-1993.
[3] Honauer K, Johannsen O, Kondermann D, et al. A Dataset and Evaluation Methodology for Depth Estimation on 4D Light Fields [C]// Asian Conference on Computer Vision, 2016.
[4] Wanner S, Meister S, Goldluecke B. Datasets and Benchmarks for Densely Sampled 4D Light Fields [C]// International Symposium on Vision, Modeling and Visualization, 2013.
[5] Vaish V, Adams A. The (New) Stanford Light Field Archive. Computer Graphics Laboratory, Stanford University, 2008.
[6] Raj A S, Lowney M, Shah R, Wetzstein G. Stanford Lytro Light Field Archive. Available: http://lightfields.stanford.edu/index.html.
[7] Kalantari N K, Wang T C, Ramamoorthi R. Learning-Based View Synthesis for Light Field Cameras [J]. ACM Transactions on Graphics, 2016, 35(6): 1-10.
The light field images in the training image set and the test image set are each recombined into sub-aperture image arrays. Considering that the light field camera exhibits a vignetting effect (manifesting as low visual quality of the boundary sub-aperture images), the angular resolution of the light field images used for training and testing is cropped to 9 × 9, i.e. only the central high-quality 9 × 9 views are taken. Then the central 5 × 5 views are taken from the obtained light field images with an angular resolution of 9 × 9 to form light field images with an angular resolution of 5 × 5, and these are spatially down-sampled by bicubic interpolation with a down-sampling scale of 8, i.e. the spatial resolution of the light field image is reduced to 1/8 of the original, to obtain the low-spatial-resolution light field images; the original light field images with an angular resolution of 5 × 5 are taken as the reference high-spatial-resolution light field images (i.e. label images); then one sub-aperture image is selected from the initial 9 × 9 views (excluding the central 5 × 5 views) with its resolution kept unchanged, to obtain the 2D high-resolution image. Thus, the final training set includes the sub-aperture image arrays recombined from the Y-channel images of 200 low-spatial-resolution light field images with an angular resolution of 5 × 5, the Y-channel images of the corresponding 200 2D high-resolution images, and the Y-channel images of the corresponding 200 reference high-spatial-resolution light field images; the final test set includes the sub-aperture image arrays recombined from the Y-channel images of 70 low-spatial-resolution light field images with an angular resolution of 5 × 5, the Y-channel images of the corresponding 70 2D high-resolution images and the corresponding 70 reference high-spatial-resolution light field images, where the 70 reference high-spatial-resolution light field images are not involved in network inference or testing and are only used for the subsequent subjective visual comparison and objective quality evaluation.
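The data preparation just described (central 9 × 9 crop, central 5 × 5 view selection, 8× bicubic spatial down-sampling, and choice of one off-centre view as the 2D high-resolution image) can be sketched as follows; the array shapes and the particular off-centre view index are illustrative assumptions.

```python
import numpy as np
import cv2

def prepare_sample(lf, scale=8):
    """lf: (U0, V0, H, W) Y-channel light field with U0 = V0 = 9 high-quality central views."""
    U0, V0, H, W = lf.shape
    # central 5x5 views form the light field to be super-resolved
    c = U0 // 2
    lf_5x5 = lf[c - 2:c + 3, c - 2:c + 3]                       # reference HR light field (label)
    # 8x bicubic spatial down-sampling of every sub-aperture image
    lf_lr = np.stack([[cv2.resize(v, (W // scale, H // scale), interpolation=cv2.INTER_CUBIC)
                       for v in row] for row in lf_5x5])
    # one view outside the central 5x5 (e.g. a corner view) keeps full resolution as the 2D HR image
    hr2d = lf[0, 0]
    return lf_lr, hr2d, lf_5x5
```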
When training the constructed spatial super-resolution network, the parameters of all convolution kernels are initialized with the MSRA initializer; the loss function is a combination of the pixel-domain L1-norm loss and a gradient loss; the network is trained with the ADAM optimizer. First, the encoder and decoder parts of the spatial super-resolution network are trained with a learning rate of 10⁻⁴ until they converge to a certain degree; then the whole spatial super-resolution network is trained with a learning rate of 10⁻⁴, and the learning rate is attenuated by a scale factor of 0.5 after every 25 epochs of training.
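A hedged sketch of this training configuration in PyTorch (Kaiming/MSRA initialization, L1 plus gradient loss, Adam, step decay); the gradient-loss formulation and the loss weight shown here are common choices and may differ from the exact forms used in the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def msra_init(module):
    if isinstance(module, nn.Conv2d):
        nn.init.kaiming_normal_(module.weight)        # MSRA initialization
        if module.bias is not None:
            nn.init.zeros_(module.bias)

def gradient_loss(pred, target):
    # L1 distance between horizontal/vertical image gradients (one common formulation)
    dx_p, dy_p = pred[..., :, 1:] - pred[..., :, :-1], pred[..., 1:, :] - pred[..., :-1, :]
    dx_t, dy_t = target[..., :, 1:] - target[..., :, :-1], target[..., 1:, :] - target[..., :-1, :]
    return F.l1_loss(dx_p, dx_t) + F.l1_loss(dy_p, dy_t)

def total_loss(pred, target, lam=1.0):
    return F.l1_loss(pred, target) + lam * gradient_loss(pred, target)

# net.apply(msra_init)
# optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)
# scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=25, gamma=0.5)
```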
In order to illustrate the performance of the method of the present invention, it is compared with the existing bicubic interpolation method and six existing image super-resolution reconstruction methods: the method based on a deep back-projection network proposed by Haris et al., the method based on a deep Laplacian pyramid network proposed by Lai et al., the method based on spatial-angular separable convolution proposed by Yeung et al., the method based on a spatial-angular interaction network proposed by Wang et al., the method based on a two-stage network proposed by Jin et al., and the method based on hybrid input proposed by Boominathan et al. Among them, the methods of Haris et al. and Lai et al. are 2D image super-resolution reconstruction methods (applied independently to each sub-aperture image of the light field image); the methods of Yeung et al., Wang et al. and Jin et al. are conventional light field image spatial super-resolution reconstruction methods; and the method of Boominathan et al. is a light field image spatial super-resolution reconstruction method using hybrid input.
Here, the objective quality evaluation indices used include PSNR (Peak Signal-to-Noise Ratio), SSIM (Structural Similarity Index), and an advanced objective quality evaluation index for light field images (see Min X, Zhou J, Zhai G, et al. A Metric for Light Field Reconstruction, Compression, and Display Quality Evaluation [J]. IEEE Transactions on Image Processing, 2020, 29: 3790-). PSNR evaluates the objective quality of the super-resolution reconstructed image in terms of pixel-level fidelity, with higher values indicating better image quality; SSIM evaluates the objective quality of the super-resolution reconstructed image from the perspective of visual perception, takes values from 0 to 1, and higher values indicate better image quality; the objective quality evaluation index for light field images evaluates the objective quality of the super-resolution reconstructed image by jointly measuring the spatial quality (texture and detail) and the angular quality (parallax structure) of the light field image, and higher values indicate better image quality.
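For reference, PSNR and SSIM on Y-channel light fields can be computed as in the following sketch (using scikit-image, assumed available, and averaging over all sub-aperture views, which is one common convention; the light-field-specific index from Min et al. requires the authors' own implementation and is not reproduced here).

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def lf_psnr_ssim(sr_lf, gt_lf, data_range=1.0):
    """sr_lf, gt_lf: (U, V, H, W) Y-channel light fields; returns mean PSNR/SSIM over all views."""
    psnrs, ssims = [], []
    for u in range(gt_lf.shape[0]):
        for v in range(gt_lf.shape[1]):
            psnrs.append(peak_signal_noise_ratio(gt_lf[u, v], sr_lf[u, v], data_range=data_range))
            ssims.append(structural_similarity(gt_lf[u, v], sr_lf[u, v], data_range=data_range))
    return float(np.mean(psnrs)), float(np.mean(ssims))
```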
Table 2 shows the comparison of the method of the present invention with the existing bicubic interpolation method and the existing light field image spatial super-resolution reconstruction methods on the PSNR (dB) index, Table 3 shows the comparison on the SSIM index, and Table 4 shows the comparison on the objective quality evaluation index for light field images. As can be seen from the objective data listed in Tables 2, 3 and 4, compared with the existing light field image spatial super-resolution reconstruction methods (including the 2D image super-resolution reconstruction methods), the method of the present invention obtains higher quality scores on all three objective quality evaluation indices used and is significantly better than all comparison methods, which indicates that it can effectively reconstruct the texture and detail information of the light field image while recovering a better parallax structure. In particular, the method of the present invention achieves the best super-resolution reconstruction effect on light field image databases with different baseline ranges and scene contents, which shows that it can handle both narrow-baseline and wide-baseline light field data well and is robust to scene content.
TABLE 2 Comparison of the method of the present invention with the existing bicubic interpolation method and the existing light field image spatial super-resolution reconstruction methods on the PSNR (dB) index
TABLE 3 comparison of SSIM index using the method of the present invention with existing bicubic interpolation method and existing light field image spatial super resolution reconstruction method
Table 4 comparison of the method of the present invention with the existing bicubic interpolation method and the existing light field image space super-resolution reconstruction method on the objective quality evaluation index of the light field image
FIG. 5a shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database by a bicubic interpolation method, where a sub-aperture image under a central coordinate is taken for display; FIG. 5b shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database using Haris et al, where a sub-aperture image at a central coordinate is taken for display; FIG. 5c shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database by the method of Lai et al, where a sub-aperture image under a central coordinate is taken for display; FIG. 5d shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database by the method of Yeung et al, where a sub-aperture image under a central coordinate is taken for display; FIG. 5e shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database using the method of Wang et al, where a sub-aperture image at a central coordinate is taken for display; FIG. 5f shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database by the method of Jin et al, where a sub-aperture image under a central coordinate is taken for display; FIG. 5g shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database by using Boominathan et al, where a sub-aperture image under a central coordinate is taken for display; FIG. 5h shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested EPFL light field image database using the method of the present invention, where a sub-aperture image under a central coordinate is taken for display; fig. 5i shows the label high spatial resolution light field image corresponding to the low spatial resolution light field image in the EPFL light field image database under test, where the sub-aperture image in the central coordinate is taken for presentation.
FIG. 6a shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database using a bicubic interpolation method, where a sub-aperture image in a central coordinate is taken for display; FIG. 6b shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database using Haris et al, where a sub-aperture image at a central coordinate is taken for display; FIG. 6c shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by the method of Lai et al, where a sub-aperture image under a central coordinate is taken for display; FIG. 6d shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by a method of Yeung et al, where a sub-aperture image under a central coordinate is taken for display; FIG. 6e shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by the method of Wang et al, where a sub-aperture image in a central coordinate is taken for display; FIG. 6f shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by a method of Jin et al, where a sub-aperture image under a central coordinate is taken for display; FIG. 6g shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database by using Boominathan et al, where a sub-aperture image under a central coordinate is taken for display; FIG. 6h shows a reconstructed high spatial resolution light field image obtained by processing a low spatial resolution light field image in a tested STFLytro light field image database using the method of the present invention, where a sub-aperture image at a central coordinate is taken for display; fig. 6i shows the label high spatial resolution light field image corresponding to the low spatial resolution light field image in the STFLytro light field image database under test, here shown as a sub-aperture image in central coordinates.
Comparing FIG. 5a to FIG. 5h with FIG. 5i, and FIG. 6a to FIG. 6h with FIG. 6i, it can be clearly seen that the existing light field image spatial super-resolution reconstruction methods, including the 2D image super-resolution reconstruction methods, cannot recover the texture and detail information of the image in the reconstructed high spatial resolution light field image, as shown by the enlarged lower-left rectangular regions in FIG. 5a to FIG. 5f and the enlarged lower-right rectangular regions in FIG. 6a to FIG. 6f; the hybrid-input light field image spatial super-resolution reconstruction method achieves relatively better results but still contains some blurring artifacts, as shown by the enlarged lower-left rectangular region in FIG. 5g and the enlarged lower-right rectangular region in FIG. 6g; in contrast, the high spatial resolution light field image reconstructed by the method of the present invention has clear texture and rich details and is close to the label high spatial resolution light field image (i.e., FIG. 5i and FIG. 6i) in subjective visual perception, which indicates that the method of the present invention can effectively recover the texture information of the light field image. In addition, by reconstructing each sub-aperture image with high quality, the method of the present invention can well preserve the parallax structure of the finally reconstructed high spatial resolution light field image.
The main innovations of the method are as follows: first, rich 2D spatial information is acquired while capturing high-dimensional light field data through heterogeneous imaging, i.e., a light field image and a 2D high-resolution image are captured simultaneously, so that the information of the 2D high-resolution image can be used to effectively improve the spatial resolution of the light field image and to recover the corresponding textures and details; second, in order to establish and exploit the relation between the light field image and the 2D high-resolution image, the method constructs an aperture-level feature registration module and a light field feature enhancement module, where the aperture-level feature registration module accurately registers the 2D high-resolution information with the 4D light field image information, and the light field feature enhancement module then uses the registered high-resolution feature information to consistently enhance the visual information in the light field features and obtain enhanced high-resolution light field features; third, a flexible pyramid reconstruction scheme is adopted, i.e., the spatial resolution of the light field image is gradually improved and an accurate parallax structure is recovered by a coarse-to-fine reconstruction strategy, so that multi-scale super-resolution results can be reconstructed in a single forward inference. In addition, to reduce the number of parameters and the training burden of the pyramid network, the weights are shared across all pyramid levels.

Claims (3)

1. A light field image space super-resolution reconstruction method is characterized by comprising the following steps:
step 1: selecting Num color three-channel low spatial resolution light field images with spatial resolution W×H and angular resolution V×U, the corresponding Num color three-channel 2D high-resolution images with resolution αW×αH, and the corresponding Num color three-channel reference high spatial resolution light field images with spatial resolution αW×αH and angular resolution V×U; wherein Num > 1, α denotes the spatial resolution improvement factor, and α > 1;
step 2: constructing a convolutional neural network as a spatial super-resolution network: the spatial super-resolution network comprises an encoder for extracting multi-scale features, an aperture level feature registration module for registering light field features and 2D high-resolution features, a shallow layer feature extraction layer for extracting shallow layer features from a low spatial resolution light field image, a light field feature enhancement module for fusing the light field features and the 2D high-resolution features, a spatial attention block for relieving registration errors in the coarse-scale features, and a decoder for reconstructing potential features into the light field image;
for the encoder, it is composed of a first convolutional layer, a second convolutional layer, a first residual block and a second residual block which are connected in sequence; the input end of the first convolutional layer receives three inputs in parallel: the first input is the sub-aperture image array of width α_sW×V and height α_sH×U, denoted L_LR↑, obtained by recombining the single-channel image L_LR of a low spatial resolution light field image with spatial resolution W×H and angular resolution V×U after spatial resolution up-sampling; the second input is the single-channel image of the blurred 2D high-resolution image, of width α_sW and height α_sH, denoted I_Blur; the third input is the single-channel image of the 2D high-resolution image, of width α_sW and height α_sH, denoted I_HR; the output end of the first convolutional layer outputs, for L_LR↑, 64 feature maps of width α_sW×V and height α_sH×U, the set of all feature maps output for L_LR↑ being denoted Y_LR,0, outputs, for I_Blur, 64 feature maps of width α_sW and height α_sH, the set of all feature maps output for I_Blur being denoted Y_Blur,0, and outputs, for I_HR, 64 feature maps of width α_sW and height α_sH, the set of all feature maps output for I_HR being denoted Y_HR,0; the input end of the second convolutional layer receives three inputs in parallel, namely all feature maps in Y_LR,0, all feature maps in Y_Blur,0 and all feature maps in Y_HR,0; the output end of the second convolutional layer outputs, for Y_LR,0, 64 feature maps of width (α_sW×V)/2 and height (α_sH×U)/2, whose set is denoted Y_LR,1, outputs, for Y_Blur,0, 64 feature maps of width (α_sW)/2 and height (α_sH)/2, whose set is denoted Y_Blur,1, and outputs, for Y_HR,0, 64 feature maps of width (α_sW)/2 and height (α_sH)/2, whose set is denoted Y_HR,1; the input end of the first residual block receives three inputs in parallel, namely all feature maps in Y_LR,1, all feature maps in Y_Blur,1 and all feature maps in Y_HR,1; the output end of the first residual block outputs, for Y_LR,1, 64 feature maps of width (α_sW×V)/2 and height (α_sH×U)/2, whose set is denoted Y_LR,2, outputs, for Y_Blur,1, 64 feature maps of width (α_sW)/2 and height (α_sH)/2, whose set is denoted Y_Blur,2, and outputs, for Y_HR,1, 64 feature maps of width (α_sW)/2 and height (α_sH)/2, whose set is denoted Y_HR,2; the input end of the second residual block receives three inputs in parallel, namely all feature maps in Y_LR,2, all feature maps in Y_Blur,2 and all feature maps in Y_HR,2; the output end of the second residual block outputs, for Y_LR,2, 64 feature maps of width (α_sW×V)/2 and height (α_sH×U)/2, whose set is denoted Y_LR,3, outputs, for Y_Blur,2, 64 feature maps of width (α_sW)/2 and height (α_sH)/2, whose set is denoted Y_Blur,3, and outputs, for Y_HR,2, 64 feature maps of width (α_sW)/2 and height (α_sH)/2, whose set is denoted Y_HR,3; wherein L_LR↑ is the sub-aperture image array of width α_sW×V and height α_sH×U obtained by recombining the image produced by bicubic interpolation up-sampling of the single-channel image L_LR of the low spatial resolution light field image with spatial resolution W×H and angular resolution V×U, I_Blur is obtained by first carrying out bicubic interpolation down-sampling and then bicubic interpolation up-sampling on I_HR, α_s denotes the spatial resolution sampling factor with α_s³ = α, and the up-sampling factor of the bicubic interpolation up-sampling and the down-sampling factor of the bicubic interpolation down-sampling both take the value α_s; the convolution kernel of the first convolutional layer has size 3×3, convolution stride 1, 1 input channel and 64 output channels, the convolution kernel of the second convolutional layer has size 3×3, convolution stride 2, 64 input channels and 64 output channels, and the activation functions adopted by the first convolutional layer and the second convolutional layer are both "ReLU";
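To make the structure of this three-branch, weight-shared encoder easier to follow, a minimal PyTorch-style sketch is given below. It is illustrative only and not the implementation of the present method; the channel width of 64, the stride-2 second layer and the use of one shared encoder instance for the three inputs follow the description above, while the class names, padding choices and tensor layout are assumptions.

```python
import torch.nn as nn

class PlainResidualBlock(nn.Module):
    """Residual block of the encoder: 3x3 conv + ReLU, 3x3 conv, skip connection."""
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class Encoder(nn.Module):
    """Multi-scale encoder shared by the three inputs L_LR^up, I_Blur and I_HR."""
    def __init__(self, ch=64):
        super().__init__()
        self.conv1 = nn.Conv2d(1, ch, 3, stride=1, padding=1)   # full-resolution features (scale 0)
        self.conv2 = nn.Conv2d(ch, ch, 3, stride=2, padding=1)  # stride 2 halves width and height (scale 1)
        self.res1 = PlainResidualBlock(ch)                      # scale 2
        self.res2 = PlainResidualBlock(ch)                      # scale 3
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        y0 = self.relu(self.conv1(x))
        y1 = self.relu(self.conv2(y0))
        y2 = self.res1(y1)
        y3 = self.res2(y2)
        return [y0, y1, y2, y3]

# One shared encoder instance processes the three inputs in parallel:
# Y_LR = encoder(L_LR_up); Y_Blur = encoder(I_Blur); Y_HR = encoder(I_HR).
```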
for the aperture level feature registration module, its input end receives three types of feature maps: the first type is all feature maps in Y_LR,1, the second type is all feature maps in Y_Blur,1, and the third type includes four inputs, namely all feature maps in Y_HR,0, all feature maps in Y_HR,1, all feature maps in Y_HR,2 and all feature maps in Y_HR,3; in the aperture level feature registration module, first, all feature maps in Y_Blur,1, Y_HR,0, Y_HR,1, Y_HR,2 and Y_HR,3 are each replicated V×U times, so that the width of all feature maps in Y_Blur,1, Y_HR,1, Y_HR,2 and Y_HR,3 becomes (α_sW×V)/2 and their height becomes (α_sH×U)/2, i.e. their size matches the size of the feature maps in Y_LR,1, and the width of all feature maps in Y_HR,0 becomes α_sW×V and their height becomes α_sH×U, i.e. their size matches the size of the feature maps in Y_LR,0; then block matching is carried out between all feature maps in Y_LR,1 and all feature maps in the replicated Y_Blur,1, and after block matching a coordinate index map of width (α_sW×V)/2 and height (α_sH×U)/2 is obtained, denoted P_CI; then, according to P_CI, all feature maps in Y_HR,1 are registered in spatial position with all feature maps in Y_LR,1 to obtain 64 registration feature maps of width (α_sW×V)/2 and height (α_sH×U)/2, and the set of all obtained registration feature maps is denoted F_Align,1; likewise, according to P_CI, all feature maps in Y_HR,2 are registered in spatial position with all feature maps in Y_LR,2 to obtain 64 registration feature maps of width (α_sW×V)/2 and height (α_sH×U)/2, and the set of all obtained registration feature maps is denoted F_Align,2; according to P_CI, all feature maps in Y_HR,3 are registered in spatial position with all feature maps in Y_LR,3 to obtain 64 registration feature maps of width (α_sW×V)/2 and height (α_sH×U)/2, and the set of all obtained registration feature maps is denoted F_Align,3; P_CI is then up-sampled by bicubic interpolation to obtain a coordinate index map of width α_sW×V and height α_sH×U, denoted P_CI↑; finally, according to P_CI↑, all feature maps in Y_HR,0 are registered in spatial position with all feature maps in Y_LR,0 to obtain 64 registration feature maps of width α_sW×V and height α_sH×U, and the set of all obtained registration feature maps is denoted F_Align,0; the output end of the aperture level feature registration module outputs all feature maps in F_Align,0, all feature maps in F_Align,1, all feature maps in F_Align,2 and all feature maps in F_Align,3; wherein the accuracy measure used for block matching is a texture and structure similarity index, the block size used for block matching is 3×3, and the up-sampling factor of the bicubic interpolation up-sampling is α_s;
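The block matching and the index-based warping are only described functionally above; the following PyTorch-style sketch illustrates the idea. Normalized patch correlation is used here as a stand-in for the texture and structure similarity index (which is not reproduced), nearest-neighbour index gathering stands for the spatial position registration, and all function names are illustrative; the sketch favours clarity over memory efficiency.

```python
import torch
import torch.nn.functional as F

def block_match(f_lf, f_blur, patch=3):
    """Return a coordinate index map P_CI: for every position of the light-field
    features f_lf, the index of the best-matching position in the replicated
    blurred-HR features f_blur. Similarity here is normalized 3x3 patch correlation,
    a simple stand-in for the texture-and-structure similarity index of the claim."""
    b, c, h, w = f_lf.shape
    q = F.unfold(f_lf, patch, padding=patch // 2)     # (b, c*9, h*w) query patches
    k = F.unfold(f_blur, patch, padding=patch // 2)   # (b, c*9, h*w) key patches
    q = F.normalize(q, dim=1)
    k = F.normalize(k, dim=1)
    corr = torch.bmm(q.transpose(1, 2), k)            # (b, h*w, h*w) patch similarities
    return corr.argmax(dim=2)                         # (b, h*w) best-match index per position

def register_by_index(f_hr, p_ci):
    """Warp (replicated) HR features onto the light-field grid by gathering the matched positions."""
    b, c, h, w = f_hr.shape
    flat = f_hr.view(b, c, h * w)
    idx = p_ci.unsqueeze(1).expand(-1, c, -1)         # (b, c, h*w)
    return flat.gather(2, idx).view(b, c, h, w)

# F_Align,k = register_by_index(replicated Y_HR,k, P_CI) for k = 1, 2, 3; for F_Align,0
# the index map is bicubically up-sampled (indices rescaled) before registering Y_HR,0.
```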
for the shallow feature extraction layer, it is composed of one fifth convolutional layer; the input end of the fifth convolutional layer receives the single-channel image L_LR of the low spatial resolution light field image with spatial resolution W×H and angular resolution V×U, and the output end of the fifth convolutional layer outputs 64 feature maps of width W×V and height H×U, the set of all output feature maps being denoted F_LR; the convolution kernel of the fifth convolutional layer has size 3×3, convolution stride 1, 1 input channel and 64 output channels, and the activation function adopted by the fifth convolutional layer is "ReLU";
for the light field feature enhancement module, it is composed of a first enhanced residual block, a second enhanced residual block and a third enhanced residual block which are connected in sequence; the input end of the first enhanced residual block receives all feature maps in F_Align,1 and all feature maps in F_LR, the output end of the first enhanced residual block outputs 64 feature maps of width (α_sW×V)/2 and height (α_sH×U)/2, and the set of all output feature maps is denoted F_En,1; the input end of the second enhanced residual block receives all feature maps in F_Align,2 and all feature maps in F_En,1, the output end of the second enhanced residual block outputs 64 feature maps of width (α_sW×V)/2 and height (α_sH×U)/2, and the set of all output feature maps is denoted F_En,2; the input end of the third enhanced residual block receives all feature maps in F_Align,3 and all feature maps in F_En,2, the output end of the third enhanced residual block outputs 64 feature maps of width (α_sW×V)/2 and height (α_sH×U)/2, and the set of all output feature maps is denoted F_En,3;
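As a reading aid, the chaining of the three enhanced residual blocks can be sketched as follows; this is illustrative PyTorch-style code, `make_block` is an assumed factory building one enhanced residual block, for which a separate sketch is given under claim 3 below.

```python
import torch.nn as nn

class LFFeatureEnhancement(nn.Module):
    """Chain of three enhanced residual blocks; each block modulates the running
    light-field features with one scale of registered HR features F_Align,k."""
    def __init__(self, make_block):
        super().__init__()
        self.block1 = make_block()
        self.block2 = make_block()
        self.block3 = make_block()

    def forward(self, f_lr, f_align1, f_align2, f_align3):
        f_en1 = self.block1(f_lr, f_align1)    # F_En,1
        f_en2 = self.block2(f_en1, f_align2)   # F_En,2
        f_en3 = self.block3(f_en2, f_align3)   # F_En,3
        return f_en3
```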
for the spatial attention block, it is composed of a sixth convolutional layer and a seventh convolutional layer which are connected in sequence; the input end of the sixth convolutional layer receives all feature maps in F_Align,0, the output end of the sixth convolutional layer outputs 64 spatial attention feature maps of width α_sW×V and height α_sH×U, and the set of all output spatial attention feature maps is denoted F_SA1; the input end of the seventh convolutional layer receives all spatial attention feature maps in F_SA1, the output end of the seventh convolutional layer outputs 64 spatial attention feature maps of width α_sW×V and height α_sH×U, and the set of all output spatial attention feature maps is denoted F_SA2; all feature maps in F_Align,0 are multiplied element by element with all spatial attention feature maps in F_SA2, and the set of all obtained feature maps is denoted F_WA,0; all feature maps in F_WA,0 are taken as the feature maps output by the output end of the spatial attention block; the convolution kernels of the sixth convolutional layer and the seventh convolutional layer both have size 3×3, convolution stride 1, 64 input channels and 64 output channels, the activation function adopted by the sixth convolutional layer is "ReLU", and the activation function adopted by the seventh convolutional layer is "Sigmoid";
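A minimal sketch of this spatial attention block, assuming the layer sizes stated above (illustrative, not the patent's implementation):

```python
import torch.nn as nn

class SpatialAttentionBlock(nn.Module):
    """Two 3x3 convolutions (ReLU then Sigmoid) produce a per-pixel attention map
    that re-weights F_Align,0 to suppress coarse-scale registration errors."""
    def __init__(self, ch=64):
        super().__init__()
        self.conv6 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv7 = nn.Conv2d(ch, ch, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, f_align0):
        attn = self.sigmoid(self.conv7(self.relu(self.conv6(f_align0))))  # F_SA2
        return f_align0 * attn                                            # F_WA,0
```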
for the decoder, it is composed of a third residual block, a fourth residual block, a sub-pixel convolutional layer, an eighth convolutional layer and a ninth convolutional layer which are connected in sequence; the input end of the third residual block receives all feature maps in F_En,3, the output end of the third residual block outputs 64 feature maps of width (α_sW×V)/2 and height (α_sH×U)/2, and the set of all output feature maps is denoted F_Dec,1; the input end of the fourth residual block receives all feature maps in F_Dec,1, the output end of the fourth residual block outputs 64 feature maps of width (α_sW×V)/2 and height (α_sH×U)/2, and the set of all output feature maps is denoted F_Dec,2; the input end of the sub-pixel convolutional layer receives all feature maps in F_Dec,2, the output end of the sub-pixel convolutional layer outputs 256 feature maps of width (α_sW×V)/2 and height (α_sH×U)/2, and these 256 feature maps are further converted into 64 feature maps of width α_sW×V and height α_sH×U, the set of all converted feature maps being denoted F_Dec,3; the input end of the eighth convolutional layer receives the result of the element-by-element addition of all feature maps in F_Dec,3 and all feature maps in F_WA,0, the output end of the eighth convolutional layer outputs 64 feature maps of width α_sW×V and height α_sH×U, and the set of all output feature maps is denoted F_Dec,4; the input end of the ninth convolutional layer receives all feature maps in F_Dec,4, the output end of the ninth convolutional layer outputs one reconstructed single-channel light field image of width α_sW×V and height α_sH×U, and this reconstructed single-channel light field image of width α_sW×V and height α_sH×U is recombined into a reconstructed high spatial resolution single-channel light field image with spatial resolution α_sW×α_sH and angular resolution V×U, denoted L_SR; the convolution kernel of the sub-pixel convolutional layer has size 3×3, convolution stride 1, 64 input channels and 256 output channels, the convolution kernel of the eighth convolutional layer has size 3×3, convolution stride 1, 64 input channels and 64 output channels, the convolution kernel of the ninth convolutional layer has size 1×1, convolution stride 1, 64 input channels and 1 output channel, the activation functions adopted by the sub-pixel convolutional layer and the eighth convolutional layer are both "ReLU", and the ninth convolutional layer does not adopt an activation function;
step 3: performing color space conversion on each selected low spatial resolution light field image, the corresponding 2D high-resolution image and the corresponding reference high spatial resolution light field image, i.e., converting them from the RGB color space to the YCbCr color space, and extracting the Y-channel images; the Y-channel image of each low spatial resolution light field image is recombined into a sub-aperture image array of width W×V and height H×U for representation; the sub-aperture image arrays recombined from the Y-channel images of all the low spatial resolution light field images, the corresponding Y-channel images of the 2D high-resolution images and the corresponding Y-channel images of the reference high spatial resolution light field images then form the training set; a pyramid network is then constructed and trained with the training set, the concrete process being as follows:
step 3_1: copying the constructed spatial super-resolution network three times and cascading the three copies, with the weights of the spatial super-resolution networks shared, i.e. all their parameters identical; the whole network formed by the three spatial super-resolution networks is defined as a pyramid network; at each pyramid level, the reconstruction scale of the spatial super-resolution network is set equal to the value of α_s;
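A minimal sketch of the weight-shared pyramid of step 3_1 (illustrative; `sr_net` stands for the spatial super-resolution network of claim 1, and the auxiliary inputs of each level, i.e. the up-sampled light field and the blurred 2D image, are assumed to be derived inside `sr_net`):

```python
import torch.nn as nn

class PyramidSR(nn.Module):
    """Three cascaded pyramid levels reusing one spatial super-resolution network,
    so that all levels share the same parameters."""
    def __init__(self, sr_net: nn.Module):
        super().__init__()
        self.sr_net = sr_net  # single, weight-shared instance

    def forward(self, lf_lr, hr_images):
        # hr_images[k] is the 2D high-resolution image prepared for pyramid level k
        # (down-sampled twice, once, and not at all, respectively).
        outputs = []
        x = lf_lr
        for hr in hr_images:
            x = self.sr_net(x, hr)   # each level raises the spatial resolution by alpha_s
            outputs.append(x)
        return outputs               # multi-scale results from one forward inference
```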
step 3_2: performing spatial resolution down-sampling twice on the Y-channel image of each reference high spatial resolution light field image in the training set, and taking the images obtained after down-sampling as label images; performing the same spatial resolution down-sampling twice on the Y-channel image of each 2D high-resolution image in the training set, and taking the images obtained after down-sampling as the 2D high-resolution Y-channel images for the first spatial super-resolution network in the pyramid network; then inputting, into the first spatial super-resolution network of the constructed pyramid network for training, the sub-aperture image arrays obtained by recombining the Y-channel images of all the low spatial resolution light field images in the training set, the sub-aperture image arrays obtained by recombining the images produced by one spatial resolution up-sampling of those Y-channel images, all the 2D high-resolution Y-channel images for the first spatial super-resolution network, and the blurred 2D high-resolution Y-channel images obtained by performing one spatial resolution down-sampling and then one spatial resolution up-sampling on the 2D high-resolution Y-channel images for the first spatial super-resolution network, so as to obtain the α_s-times reconstructed high spatial resolution Y-channel light field image corresponding to the Y-channel image of each low spatial resolution light field image in the training set; wherein the spatial resolution up-sampling and the spatial resolution down-sampling are both performed by bicubic interpolation, and the scale of the spatial resolution up-sampling and of the spatial resolution down-sampling is equal to the value of α_s;
step 3_3: performing spatial resolution down-sampling once on the Y-channel image of each reference high spatial resolution light field image in the training set, and taking the images obtained after down-sampling as label images; performing the same spatial resolution down-sampling once on the Y-channel image of each 2D high-resolution image in the training set, and taking the images obtained after down-sampling as the 2D high-resolution Y-channel images for the second spatial super-resolution network in the pyramid network; then inputting, into the second spatial super-resolution network of the constructed pyramid network for training, the sub-aperture image arrays obtained by recombining the α_s-times reconstructed high spatial resolution Y-channel light field images corresponding to the Y-channel images of all the low spatial resolution light field images in the training set, the sub-aperture image arrays obtained by recombining the images produced by one spatial resolution up-sampling of those α_s-times reconstructed high spatial resolution Y-channel light field images, all the 2D high-resolution Y-channel images for the second spatial super-resolution network, and the blurred 2D high-resolution Y-channel images obtained by performing one spatial resolution down-sampling and then one spatial resolution up-sampling on the 2D high-resolution Y-channel images for the second spatial super-resolution network, so as to obtain the α_s²-times reconstructed high spatial resolution Y-channel light field image corresponding to the Y-channel image of each low spatial resolution light field image in the training set; wherein the spatial resolution up-sampling and the spatial resolution down-sampling are both performed by bicubic interpolation, and the scale of the spatial resolution up-sampling and of the spatial resolution down-sampling is equal to the value of α_s;
step 3_4: taking the Y-channel image of each reference high spatial resolution light field image in the training set as a label image; taking the Y-channel image of each 2D high-resolution image in the training set as the 2D high-resolution Y-channel image for the third spatial super-resolution network in the pyramid network; then inputting, into the third spatial super-resolution network of the constructed pyramid network for training, the sub-aperture image arrays obtained by recombining the α_s²-times reconstructed high spatial resolution Y-channel light field images corresponding to the Y-channel images of all the low spatial resolution light field images in the training set, the sub-aperture image arrays obtained by recombining the images produced by one spatial resolution up-sampling of those α_s²-times reconstructed high spatial resolution Y-channel light field images, all the 2D high-resolution Y-channel images for the third spatial super-resolution network, and the blurred 2D high-resolution Y-channel images obtained by performing one spatial resolution down-sampling and then one spatial resolution up-sampling on the 2D high-resolution Y-channel images for the third spatial super-resolution network, so as to obtain the α_s³-times reconstructed high spatial resolution Y-channel light field image corresponding to the Y-channel image of each low spatial resolution light field image in the training set; wherein the spatial resolution up-sampling and the spatial resolution down-sampling are both performed by bicubic interpolation, and the scale of the spatial resolution up-sampling and of the spatial resolution down-sampling is equal to the value of α_s;
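The per-level data preparation of steps 3_2 to 3_4 (bicubic down-sampling of the labels and 2D images, and the down-then-up blurring) can be sketched as follows; `prepare_level_inputs` and the default α_s = 2 are illustrative assumptions, and the light field tensor is assumed to stack the sub-aperture views along the batch dimension so that bicubic resampling acts on each view separately.

```python
import torch.nn.functional as F

def bicubic(img, scale):
    """Bicubic resampling by the given factor (scale < 1 down-samples, > 1 up-samples)."""
    return F.interpolate(img, scale_factor=scale, mode='bicubic', align_corners=False)

def prepare_level_inputs(lf_y, hr_y, ref_y, level, alpha_s=2):
    """Assemble the network inputs and the label of pyramid level `level` (1, 2 or 3).
    lf_y:  light field Y channel fed to this level, shape (B*V*U, 1, H, W)
    hr_y:  full 2D high-resolution Y channel
    ref_y: full reference high-resolution light field Y channel (for the label)."""
    down = (1.0 / alpha_s) ** (3 - level)          # level 1: two down-samplings (done in one step here)
    hr_level = bicubic(hr_y, down) if level < 3 else hr_y
    label = bicubic(ref_y, down) if level < 3 else ref_y
    blur_level = bicubic(bicubic(hr_level, 1.0 / alpha_s), float(alpha_s))  # down then up -> blurred 2D image
    lf_up = bicubic(lf_y, float(alpha_s))          # per-view up-sampled light field
    return (lf_y, lf_up, blur_level, hr_level), label
```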
after the training is finished, the optimal weight parameters of all convolution kernels in each spatial super-resolution network in the pyramid network are obtained, giving a trained spatial super-resolution network model;
step 4: randomly selecting a color three-channel low spatial resolution light field image and the corresponding color three-channel 2D high-resolution image as test images; then converting the color three-channel low spatial resolution light field image and the corresponding color three-channel 2D high-resolution image from the RGB color space to the YCbCr color space and extracting the Y-channel images; the Y-channel image of the low spatial resolution light field image is recombined into a sub-aperture image array for representation; the Y-channel image of the low spatial resolution light field image, the Y-channel image of the 2D high-resolution image and the blurred 2D high-resolution Y-channel image obtained by performing one spatial resolution down-sampling and one spatial resolution up-sampling on the Y-channel image of the 2D high-resolution image are input into the trained spatial super-resolution network model, and the reconstructed high spatial resolution Y-channel light field image corresponding to the Y-channel image of the low spatial resolution light field image is obtained by testing; then bicubic interpolation up-sampling is performed on the Cb-channel image and the Cr-channel image of the low spatial resolution light field image respectively, giving the reconstructed high spatial resolution Cb-channel light field image corresponding to the Cb-channel image of the low spatial resolution light field image and the reconstructed high spatial resolution Cr-channel light field image corresponding to the Cr-channel image of the low spatial resolution light field image; finally, the obtained reconstructed high spatial resolution Y-channel light field image, the reconstructed high spatial resolution Cb-channel light field image and the reconstructed high spatial resolution Cr-channel light field image are concatenated along the color channel dimension, and the concatenation result is converted back to the RGB color space to obtain the reconstructed color three-channel high spatial resolution light field image corresponding to the low spatial resolution light field image.
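Step 4 can be summarized by the following illustrative sketch; `rgb_to_ycbcr` and `ycbcr_to_rgb` are assumed colour-conversion helpers, the trained model is assumed to take the four Y-channel inputs of claim 1 (light field, up-sampled light field, blurred 2D image, 2D image), and α_s = 2 is an assumed factor.

```python
import torch
import torch.nn.functional as F

def up(img, s):
    """Bicubic resampling by factor s."""
    return F.interpolate(img, scale_factor=s, mode='bicubic', align_corners=False)

def super_resolve_color(lf_rgb, hr_rgb, model, rgb_to_ycbcr, ycbcr_to_rgb, alpha_s=2):
    """Step-4 style inference: only the Y channel passes through the trained network;
    Cb and Cr are bicubically up-sampled and re-attached afterwards.
    lf_rgb stacks the sub-aperture views along the batch dimension."""
    y, cb, cr = rgb_to_ycbcr(lf_rgb)                        # per sub-aperture view
    y_hr, _, _ = rgb_to_ycbcr(hr_rgb)
    y_hr_blur = up(up(y_hr, 1.0 / alpha_s), float(alpha_s)) # down then up -> blurred 2D Y image
    y_sr = model(y, up(y, float(alpha_s)), y_hr_blur, y_hr) # reconstructed HR Y-channel light field
    cb_sr = up(cb, float(alpha_s))
    cr_sr = up(cr, float(alpha_s))
    return ycbcr_to_rgb(torch.cat([y_sr, cb_sr, cr_sr], dim=1))
```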
2. The light field image spatial super-resolution reconstruction method according to claim 1, characterized in that in step 2, the first residual block, the second residual block, the third residual block and the fourth residual block have the same structure, each being composed of a third convolutional layer and a fourth convolutional layer which are connected in sequence;

the input end of the third convolutional layer in the first residual block receives three inputs in parallel, namely all feature maps in Y_LR,1, all feature maps in Y_Blur,1 and all feature maps in Y_HR,1; the output end of the third convolutional layer in the first residual block outputs, for Y_LR,1, 64 feature maps of width (α_sW×V)/2 and height (α_sH×U)/2, and outputs, for Y_Blur,1 and for Y_HR,1, 64 feature maps each of width (α_sW)/2 and height (α_sH)/2; the input end of the fourth convolutional layer in the first residual block receives, in parallel, the three sets of feature maps output by the third convolutional layer, and its output end outputs, for each of them, 64 feature maps of the same width and height as the corresponding input; all feature maps output by the fourth convolutional layer for the Y_LR,1 branch are added element by element with all feature maps in Y_LR,1, and all obtained feature maps are taken as the feature maps output by the output end of the first residual block for Y_LR,1, the set formed by these feature maps being Y_LR,2; all feature maps output by the fourth convolutional layer for the Y_Blur,1 branch are added element by element with all feature maps in Y_Blur,1, and all obtained feature maps are taken as the feature maps output by the output end of the first residual block for Y_Blur,1, the set formed by these feature maps being Y_Blur,2; all feature maps output by the fourth convolutional layer for the Y_HR,1 branch are added element by element with all feature maps in Y_HR,1, and all obtained feature maps are taken as the feature maps output by the output end of the first residual block for Y_HR,1, the set formed by these feature maps being Y_HR,2;

the input end of the third convolutional layer in the second residual block receives three inputs in parallel, namely all feature maps in Y_LR,2, all feature maps in Y_Blur,2 and all feature maps in Y_HR,2, and the second residual block processes them in exactly the same way as the first residual block; the sets formed by the feature maps output by the output end of the second residual block for Y_LR,2, Y_Blur,2 and Y_HR,2 are denoted Y_LR,3, Y_Blur,3 and Y_HR,3, respectively;

the input end of the third convolutional layer in the third residual block receives all feature maps in F_En,3, and its output end outputs 64 feature maps of width (α_sW×V)/2 and height (α_sH×U)/2; the input end of the fourth convolutional layer in the third residual block receives these feature maps, and its output end outputs 64 feature maps of width (α_sW×V)/2 and height (α_sH×U)/2; all feature maps in F_En,3 are added element by element with all feature maps output by the fourth convolutional layer in the third residual block, and all obtained feature maps are taken as the feature maps output by the output end of the third residual block, the set formed by these feature maps being F_Dec,1;

the input end of the third convolutional layer in the fourth residual block receives all feature maps in F_Dec,1, and its output end outputs 64 feature maps of width (α_sW×V)/2 and height (α_sH×U)/2; the input end of the fourth convolutional layer in the fourth residual block receives these feature maps, and its output end outputs 64 feature maps of width (α_sW×V)/2 and height (α_sH×U)/2; all feature maps in F_Dec,1 are added element by element with all feature maps output by the fourth convolutional layer in the fourth residual block, and all obtained feature maps are taken as the feature maps output by the output end of the fourth residual block, the set formed by these feature maps being F_Dec,2;

in the above, the convolution kernels of the third convolutional layer and of the fourth convolutional layer in each of the first residual block, the second residual block, the third residual block and the fourth residual block all have size 3×3, convolution stride 1, 64 input channels and 64 output channels; in each of these residual blocks, the activation function adopted by the third convolutional layer is "ReLU", and the fourth convolutional layer does not adopt an activation function.
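A sketch of the claim-2 residual block and of its shared-weight application to the three encoder branches (illustrative PyTorch-style code; the layer names follow the claim, everything else is an assumption):

```python
import torch.nn as nn

class ClaimTwoResidualBlock(nn.Module):
    """Third conv layer (3x3, ReLU) followed by fourth conv layer (3x3, no activation),
    with the block input added back element by element (skip connection)."""
    def __init__(self, ch=64):
        super().__init__()
        self.conv3 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv4 = nn.Conv2d(ch, ch, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return x + self.conv4(self.relu(self.conv3(x)))

def apply_to_branches(block, y_lf, y_blur, y_hr):
    """The same block (shared weights) processes the three encoder branches in parallel,
    e.g. (Y_LR,1, Y_Blur,1, Y_HR,1) -> (Y_LR,2, Y_Blur,2, Y_HR,2)."""
    return block(y_lf), block(y_blur), block(y_hr)
```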
3. The light field image spatial super-resolution reconstruction method according to claim 1 or 2, characterized in that in step 2, the first enhanced residual block, the second enhanced residual block and the third enhanced residual block have the same structure, each being composed of a first spatial feature transform layer, a first spatial-angular convolutional layer, a second spatial feature transform layer, a second spatial-angular convolutional layer and a channel attention layer which are connected in sequence; the first spatial feature transform layer and the second spatial feature transform layer have the same structure, each being composed of a tenth convolutional layer and an eleventh convolutional layer arranged in parallel; the first spatial-angular convolutional layer and the second spatial-angular convolutional layer have the same structure; the channel attention layer is composed of a global mean pooling layer, a fourteenth convolutional layer and a fifteenth convolutional layer which are connected in sequence;
the input end of the tenth convolutional layer in the first spatial feature transform layer in the first enhanced residual block receives all feature maps in F_Align,1, and its output end outputs 64 feature maps of width (α_sW×V)/2 and height (α_sH×U)/2; the input end of the eleventh convolutional layer in the first spatial feature transform layer in the first enhanced residual block receives all feature maps in F_Align,1, and its output end outputs 64 feature maps of width (α_sW×V)/2 and height (α_sH×U)/2; the input end of the first spatial feature transform layer in the first enhanced residual block receives all feature maps in F_LR; all feature maps in F_LR are multiplied element by element with all feature maps output by its tenth convolutional layer, the multiplication result is added element by element with all feature maps output by its eleventh convolutional layer, and all obtained feature maps are taken as the feature maps output by the output end of the first spatial feature transform layer in the first enhanced residual block;

the input end of the twelfth convolutional layer in the first spatial-angular convolutional layer in the first enhanced residual block receives all feature maps output by the first spatial feature transform layer, and its output end outputs 64 feature maps of width (α_sW×V)/2 and height (α_sH×U)/2; a reorganization operation from the spatial dimension to the angular dimension is performed on these feature maps; the input end of the thirteenth convolutional layer in the first spatial-angular convolutional layer in the first enhanced residual block receives the result of this reorganization operation, and its output end outputs 64 feature maps of width (α_sW×V)/2 and height (α_sH×U)/2; a reorganization operation from the angular dimension back to the spatial dimension is then performed on these feature maps, and all feature maps obtained after this reorganization operation are taken as the feature maps output by the output end of the first spatial-angular convolutional layer in the first enhanced residual block;

the input end of the tenth convolutional layer in the second spatial feature transform layer in the first enhanced residual block receives all feature maps in F_Align,1, and its output end outputs 64 feature maps of width (α_sW×V)/2 and height (α_sH×U)/2; the input end of the eleventh convolutional layer in the second spatial feature transform layer in the first enhanced residual block receives all feature maps in F_Align,1, and its output end outputs 64 feature maps of width (α_sW×V)/2 and height (α_sH×U)/2; the input end of the second spatial feature transform layer in the first enhanced residual block receives all feature maps output by the first spatial-angular convolutional layer; these feature maps are multiplied element by element with all feature maps output by its tenth convolutional layer, the multiplication result is added element by element with all feature maps output by its eleventh convolutional layer, and all obtained feature maps are taken as the feature maps output by the output end of the second spatial feature transform layer in the first enhanced residual block;

the second spatial-angular convolutional layer in the first enhanced residual block processes the output of the second spatial feature transform layer in exactly the same way as the first spatial-angular convolutional layer, i.e. its twelfth convolutional layer, a reorganization from the spatial dimension to the angular dimension, its thirteenth convolutional layer and a reorganization from the angular dimension back to the spatial dimension are applied in sequence, and all obtained feature maps, of width (α_sW×V)/2 and height (α_sH×U)/2, are taken as the feature maps output by the output end of the second spatial-angular convolutional layer in the first enhanced residual block;

the input end of the global mean pooling layer in the channel attention layer in the first enhanced residual block receives all feature maps output by the second spatial-angular convolutional layer, and its output end outputs 64 feature maps of width (α_sW×V)/2 and height (α_sH×U)/2, the set of all output feature maps being denoted F_GAP,1, where all feature values within each feature map in F_GAP,1 are the same; the input end of the fourteenth convolutional layer in the channel attention layer in the first enhanced residual block receives all feature maps in F_GAP,1, and its output end outputs 4 feature maps of width (α_sW×V)/2 and height (α_sH×U)/2, the set of all output feature maps being denoted F_DS,1; the input end of the fifteenth convolutional layer in the channel attention layer in the first enhanced residual block receives all feature maps in F_DS,1, and its output end outputs 64 feature maps of width (α_sW×V)/2 and height (α_sH×U)/2, the set of all output feature maps being denoted F_US,1; all feature maps in F_US,1 are multiplied element by element with all feature maps output by the second spatial-angular convolutional layer, and all obtained feature maps are taken as the feature maps output by the output end of the channel attention layer in the first enhanced residual block, the set formed by these feature maps being denoted F_CA,1;

all feature maps in F_CA,1 are added element by element with all feature maps in F_LR, and all obtained feature maps are taken as the feature maps output by the output end of the first enhanced residual block, the set formed by these feature maps being F_En,1;
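The first enhanced residual block described above can be sketched as follows (illustrative PyTorch-style code, not the patent's implementation). The features are kept here as (B, C, V, U, h, w) tensors instead of the tiled sub-aperture array, so the reorganization between the spatial and angular dimensions becomes a reshape; the comments indicate which convolutional layer of the claim each module corresponds to, and the final Sigmoid, the per-view pooling and the 1x1 kernels of the channel attention convolutions are assumptions where details are not stated.

```python
import torch.nn as nn

class EnhancedResidualBlock(nn.Module):
    """Sketch of one enhanced residual block: two spatial feature transform (SFT) layers
    conditioned on registered HR features, two spatial-angular convolution layers,
    a channel attention layer, and a skip connection."""
    def __init__(self, ch=64, reduction=16):
        super().__init__()
        self.sft1_scale = nn.Conv2d(ch, ch, 3, padding=1)   # "tenth" conv layer
        self.sft1_shift = nn.Conv2d(ch, ch, 3, padding=1)   # "eleventh" conv layer
        self.sft2_scale = nn.Conv2d(ch, ch, 3, padding=1)
        self.sft2_shift = nn.Conv2d(ch, ch, 3, padding=1)
        self.spa1 = nn.Conv2d(ch, ch, 3, padding=1)         # "twelfth" conv (spatial)
        self.ang1 = nn.Conv2d(ch, ch, 3, padding=1)         # "thirteenth" conv (angular)
        self.spa2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.ang2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.ca = nn.Sequential(                             # channel attention: pool, 64 -> 4 -> 64
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid())

    @staticmethod
    def _spatial(x, conv):
        # 2D convolution applied per sub-aperture view.
        b, c, v, u, h, w = x.shape
        y = conv(x.permute(0, 2, 3, 1, 4, 5).reshape(b * v * u, c, h, w))
        return y.reshape(b, v, u, c, h, w).permute(0, 3, 1, 2, 4, 5)

    @staticmethod
    def _angular(x, conv):
        # Reorganize spatial -> angular, convolve over (V, U), reorganize back.
        b, c, v, u, h, w = x.shape
        y = conv(x.permute(0, 4, 5, 1, 2, 3).reshape(b * h * w, c, v, u))
        return y.reshape(b, h, w, c, v, u).permute(0, 3, 4, 5, 1, 2)

    def _sft(self, x, cond, scale_conv, shift_conv):
        # Spatial feature transform: x * scale + shift, conditioned on registered HR features.
        return x * self._spatial(cond, scale_conv) + self._spatial(cond, shift_conv)

    def forward(self, x, f_align):
        y = self._sft(x, f_align, self.sft1_scale, self.sft1_shift)
        y = self._angular(self._spatial(y, self.spa1), self.ang1)
        y = self._sft(y, f_align, self.sft2_scale, self.sft2_shift)
        y = self._angular(self._spatial(y, self.spa2), self.ang2)
        b, c, v, u, h, w = y.shape
        attn = self.ca(y.permute(0, 2, 3, 1, 4, 5).reshape(b * v * u, c, h, w))
        attn = attn.reshape(b, v, u, c, 1, 1).permute(0, 3, 1, 2, 4, 5)
        return x + y * attn                                  # skip connection: F_En = F_CA + input
```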
The input terminal of the tenth convolutional layer in the first spatial feature transform layer in the second enhanced residual block receives FAlign,2The output end of the tenth convolutional layer in the first spatial feature transform layer in the second enhanced residual block outputs 64 width maps
Figure FDA0003372237690000141
And has a height of
Figure FDA0003372237690000142
The feature map of (1) represents a set of all feature maps outputted
Figure FDA0003372237690000143
An input of an eleventh convolutional layer in the first spatial feature transform layer in the second enhanced residual block receives FAlign,2The output end of the eleventh convolutional layer in the first spatial feature transform layer in the second enhanced residual block outputs 64 width maps
Figure FDA0003372237690000144
And has a height of
Figure FDA0003372237690000145
The feature map of (1) represents a set of all feature maps outputted
Figure FDA0003372237690000146
Receiving F at receiving end of first spatial feature transform layer in second enhanced residual blockEn,1All feature maps in (1), will FEn,1All the characteristic diagrams in (1) and
Figure FDA0003372237690000147
multiplying all the characteristic graphs element by element, and comparing the multiplication result with the result
Figure FDA0003372237690000148
The obtained feature maps are used as all feature maps output by the output end of the first spatial feature transform layer in the second enhanced residual block, and the set formed by the feature maps is recorded as a set
Figure FDA0003372237690000149
An input of a twelfth of the first spatial angle convolutional layers in the second enhanced residual block receives
Figure FDA00033722376900001410
All feature maps in (1), first spatial angle volume in second enhancement residual blockThe output end of the twelfth convolution layer of the lamination outputs 64 widths
Figure FDA00033722376900001411
And has a height of
Figure FDA00033722376900001412
The feature map of (1) represents a set of all feature maps outputted
Figure FDA00033722376900001413
To pair
Figure FDA00033722376900001414
Performs a recombination operation from a spatial dimension to an angular dimension, an input of a thirteenth of the first spatial-angular convolutional layers of the second enhancement residual block receiving
Figure FDA00033722376900001415
The output end of the thirteenth convolutional layer of the first space angle convolutional layer in the second enhanced residual block outputs 64 width values
Figure FDA00033722376900001416
And has a height of
Figure FDA00033722376900001417
The feature map of (1) represents a set of all feature maps outputted
Figure FDA00033722376900001418
To pair
Figure FDA00033722376900001419
Performing an operation of reconstructing all feature maps from an angle dimension to a space dimension, and outputting all feature maps obtained after the operation of reconstructing as output ends of the first space angle convolution layer in the second enhanced residual blockAll feature maps are referred to as a set of feature maps
Figure FDA00033722376900001420
An input of a tenth convolutional layer in a second spatial feature transform layer in the second enhanced residual block receives FAlign,2The output end of the tenth convolutional layer in the second spatial feature transform layer in the second enhanced residual block outputs 64 width maps
Figure FDA00033722376900001421
And has a height of
Figure FDA00033722376900001422
The feature map of (1) represents a set of all feature maps outputted
Figure FDA0003372237690000151
An input of an eleventh convolutional layer in a second spatial feature transform layer in the second enhanced residual block receives FAlign,2The output end of the eleventh convolutional layer in the second spatial feature transform layer in the second enhanced residual block outputs 64 width maps
Figure FDA0003372237690000152
And has a height of
Figure FDA0003372237690000153
The feature map of (1) represents a set of all feature maps outputted
Figure FDA0003372237690000154
Receiving end of second spatial feature transform layer in second enhanced residual block
Figure FDA0003372237690000155
All the characteristic diagrams in (1) will
Figure FDA0003372237690000156
All the characteristic diagrams in (1) and
Figure FDA0003372237690000157
multiplying all the characteristic graphs element by element, and comparing the multiplication result with the result
Figure FDA0003372237690000158
The obtained feature maps are used as all feature maps output by the output end of the second spatial feature conversion layer in the second enhanced residual block, and the set formed by the feature maps is recorded as a set
Figure FDA0003372237690000159
The input of the twelfth convolutional layer in the second spatial-angle convolutional layer in the second enhanced residual block receives all feature maps in [Figure FDA00033722376900001510]; the output of this twelfth convolutional layer is 64 feature maps of width [Figure FDA00033722376900001511] and height [Figure FDA00033722376900001512], the set of all output feature maps being denoted [Figure FDA00033722376900001513]. All feature maps in [Figure FDA00033722376900001514] undergo a recombination operation from the spatial dimension to the angular dimension; the input of the thirteenth convolutional layer in the second spatial-angle convolutional layer in the second enhanced residual block receives all feature maps in [Figure FDA00033722376900001515] after this recombination; the output of this thirteenth convolutional layer is 64 feature maps of width [Figure FDA00033722376900001516] and height [Figure FDA00033722376900001517], the set of all output feature maps being denoted [Figure FDA00033722376900001518]. All feature maps in [Figure FDA00033722376900001519] are then recombined from the angular dimension back to the spatial dimension, all feature maps obtained after this recombination are taken as the output of the second spatial-angle convolutional layer in the second enhanced residual block, and the set they form is denoted [Figure FDA00033722376900001520].
The input of the global mean pooling layer in the channel attention layer in the second enhanced residual block receives all feature maps in [Figure FDA00033722376900001521]; the output of this global mean pooling layer is 64 feature maps of width [Figure FDA00033722376900001522] and height [Figure FDA00033722376900001523], the set of all output feature maps being denoted F_GAP,2, where all feature values in each feature map in F_GAP,2 are identical. The input of the fourteenth convolutional layer in the channel attention layer in the second enhanced residual block receives all feature maps in F_GAP,2; the output of this fourteenth convolutional layer is 4 feature maps of width [Figure FDA0003372237690000161] and height [Figure FDA0003372237690000162], the set of all output feature maps being denoted F_DS,2. The input of the fifteenth convolutional layer in the channel attention layer in the second enhanced residual block receives all feature maps in F_DS,2; the output of this fifteenth convolutional layer is 64 feature maps of width [Figure FDA0003372237690000163] and height [Figure FDA0003372237690000164], the set of all output feature maps being denoted F_US,2. All feature maps in F_US,2 are multiplied element by element with all feature maps in [Figure FDA0003372237690000165], all feature maps thus obtained are taken as the output of the channel attention layer in the second enhanced residual block, and the set they form is denoted F_CA,2.
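The channel attention layer described above (global mean pooling, a 64-to-4 bottleneck convolution with ReLU, a 4-to-64 convolution with Sigmoid, then channel-wise reweighting) follows the familiar squeeze-and-excitation pattern. A hedged sketch, assuming PyTorch and an (N, 64, h, w) tensor layout, is given below; the pooled maps are kept at size 1x1 and broadcast during the multiplication, which is equivalent to the constant-valued full-size maps the claim describes.

# Hypothetical sketch of the channel attention layer (PyTorch assumed).
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int = 64, reduced: int = 4):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)                     # global mean pooling
        self.down = nn.Conv2d(channels, reduced, 1, stride=1)  # "fourteenth" conv, 64 -> 4
        self.up = nn.Conv2d(reduced, channels, 1, stride=1)    # "fifteenth" conv, 4 -> 64
        self.relu = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.gap(x)               # F_GAP: one value per channel
        w = self.relu(self.down(w))   # F_DS
        w = self.sigmoid(self.up(w))  # F_US: per-channel weights in (0, 1)
        return x * w                  # element-wise reweighting -> F_CA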
All feature maps in F_CA,2 are added element by element to all feature maps in F_En,1; all feature maps thus obtained are taken as the output of the second enhanced residual block, and the set they form is F_En,2.
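Putting the pieces together, an enhanced residual block as claimed applies, in order, a spatial feature transform, a spatial-angle convolutional layer, a second spatial feature transform, a second spatial-angle convolutional layer and a channel attention layer, then adds the block input back in. A hedged composition sketch reusing the hypothetical modules above (PyTorch assumed) follows.

# Hypothetical composition of one enhanced residual block, reusing the
# SpatialFeatureTransform, SpatialAngleConv and ChannelAttention sketches above.
import torch
import torch.nn as nn

class EnhancedResidualBlock(nn.Module):
    def __init__(self, channels: int = 64, ang_res: int = 5):
        super().__init__()
        self.sft1 = SpatialFeatureTransform(channels)
        self.sa1 = SpatialAngleConv(channels, ang_res)
        self.sft2 = SpatialFeatureTransform(channels)
        self.sa2 = SpatialAngleConv(channels, ang_res)
        self.ca = ChannelAttention(channels)

    def forward(self, f_in: torch.Tensor, f_align: torch.Tensor) -> torch.Tensor:
        x = self.sft1(f_in, f_align)  # modulate block input with aligned features
        x = self.sa1(x)               # first spatial-angle convolutional layer
        x = self.sft2(x, f_align)     # second modulation with the same F_Align
        x = self.sa2(x)               # second spatial-angle convolutional layer
        x = self.ca(x)                # channel attention -> F_CA
        return x + f_in               # residual connection -> F_En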
The input of the tenth convolutional layer in the first spatial feature transform layer in the third enhanced residual block receives all feature maps in F_Align,3; the output of this tenth convolutional layer is 64 feature maps of width [Figure FDA0003372237690000166] and height [Figure FDA0003372237690000167], the set of all output feature maps being denoted [Figure FDA0003372237690000168]. The input of the eleventh convolutional layer in the first spatial feature transform layer in the third enhanced residual block also receives all feature maps in F_Align,3; the output of this eleventh convolutional layer is 64 feature maps of width [Figure FDA0003372237690000169] and height [Figure FDA00033722376900001610], the set of all output feature maps being denoted [Figure FDA00033722376900001611]. The receiving end of the first spatial feature transform layer in the third enhanced residual block receives all feature maps in F_En,2; all feature maps in F_En,2 are multiplied element by element with all feature maps in [Figure FDA00033722376900001612], the result of the multiplication is added element by element to all feature maps in [Figure FDA00033722376900001613], all feature maps thus obtained are taken as the output of the first spatial feature transform layer in the third enhanced residual block, and the set they form is denoted [Figure FDA00033722376900001614].
The input of the twelfth convolutional layer in the first spatial-angle convolutional layer in the third enhanced residual block receives all feature maps in [Figure FDA00033722376900001615]; the output of this twelfth convolutional layer is 64 feature maps of width [Figure FDA00033722376900001616] and height [Figure FDA00033722376900001617], the set of all output feature maps being denoted [Figure FDA00033722376900001618]. All feature maps in [Figure FDA00033722376900001619] undergo a recombination operation from the spatial dimension to the angular dimension; the input of the thirteenth convolutional layer in the first spatial-angle convolutional layer in the third enhanced residual block receives all feature maps in [Figure FDA0003372237690000171] after this recombination; the output of this thirteenth convolutional layer is 64 feature maps of width [Figure FDA0003372237690000172] and height [Figure FDA0003372237690000173], the set of all output feature maps being denoted [Figure FDA0003372237690000174]. All feature maps in [Figure FDA0003372237690000175] are then recombined from the angular dimension back to the spatial dimension, all feature maps obtained after this recombination are taken as the output of the first spatial-angle convolutional layer in the third enhanced residual block, and the set they form is denoted [Figure FDA0003372237690000176].
The input of the tenth convolutional layer in the second spatial feature transform layer in the third enhanced residual block receives all feature maps in F_Align,3; the output of this tenth convolutional layer is 64 feature maps of width [Figure FDA0003372237690000177] and height [Figure FDA0003372237690000178], the set of all output feature maps being denoted [Figure FDA0003372237690000179]. The input of the eleventh convolutional layer in the second spatial feature transform layer in the third enhanced residual block also receives all feature maps in F_Align,3; the output of this eleventh convolutional layer is 64 feature maps of width [Figure FDA00033722376900001710] and height [Figure FDA00033722376900001711], the set of all output feature maps being denoted [Figure FDA00033722376900001712]. The receiving end of the second spatial feature transform layer in the third enhanced residual block receives all feature maps in [Figure FDA00033722376900001713]; all feature maps in [Figure FDA00033722376900001714] are multiplied element by element with all feature maps in [Figure FDA00033722376900001715], the result of the multiplication is added element by element to all feature maps in [Figure FDA00033722376900001716], all feature maps thus obtained are taken as the output of the second spatial feature transform layer in the third enhanced residual block, and the set they form is denoted [Figure FDA00033722376900001717].
The input of the twelfth convolutional layer in the second spatial-angle convolutional layer in the third enhanced residual block receives all feature maps in [Figure FDA00033722376900001718]; the output of this twelfth convolutional layer is 64 feature maps of width [Figure FDA00033722376900001719] and height [Figure FDA00033722376900001720], the set of all output feature maps being denoted [Figure FDA00033722376900001721]. All feature maps in [Figure FDA00033722376900001722] undergo a recombination operation from the spatial dimension to the angular dimension; the input of the thirteenth convolutional layer in the second spatial-angle convolutional layer in the third enhanced residual block receives all feature maps in [Figure FDA00033722376900001723] after this recombination; the output of this thirteenth convolutional layer is 64 feature maps of width [Figure FDA00033722376900001724] and height [Figure FDA00033722376900001725], the set of all output feature maps being denoted [Figure FDA0003372237690000181]. All feature maps in [Figure FDA0003372237690000182] are then recombined from the angular dimension back to the spatial dimension, all feature maps obtained after this recombination are taken as the output of the second spatial-angle convolutional layer in the third enhanced residual block, and the set they form is denoted [Figure FDA0003372237690000183].
The input of the global mean pooling layer in the channel attention layer in the third enhanced residual block receives all feature maps in [Figure FDA0003372237690000184]; the output of this global mean pooling layer is 64 feature maps of width [Figure FDA0003372237690000185] and height [Figure FDA0003372237690000186], the set of all output feature maps being denoted F_GAP,3, where all feature values in each feature map in F_GAP,3 are identical. The input of the fourteenth convolutional layer in the channel attention layer in the third enhanced residual block receives all feature maps in F_GAP,3; the output of this fourteenth convolutional layer is 4 feature maps of width [Figure FDA0003372237690000187] and height [Figure FDA0003372237690000188], the set of all output feature maps being denoted F_DS,3. The input of the fifteenth convolutional layer in the channel attention layer in the third enhanced residual block receives all feature maps in F_DS,3; the output of this fifteenth convolutional layer is 64 feature maps of width [Figure FDA0003372237690000189] and height [Figure FDA00033722376900001810], the set of all output feature maps being denoted F_US,3. All feature maps in F_US,3 are multiplied element by element with all feature maps in [Figure FDA00033722376900001811], all feature maps thus obtained are taken as the output of the channel attention layer in the third enhanced residual block, and the set they form is denoted F_CA,3.
All feature maps in F_CA,3 are added element by element to all feature maps in F_En,2; all feature maps thus obtained are taken as the output of the third enhanced residual block, and the set they form is F_En,3.
In the above, for each of the first, second and third enhanced residual blocks: the tenth and eleventh convolutional layers have 3×3 convolution kernels, convolution stride 1, 64 input channels, 64 output channels, and no activation function; the twelfth and thirteenth convolutional layers have 3×3 convolution kernels, convolution stride 1, 64 input channels, 64 output channels, and "ReLU" activation; the fourteenth convolutional layer has a 1×1 convolution kernel, convolution stride 1, 64 input channels, 4 output channels, and "ReLU" activation; the fifteenth convolutional layer has a 1×1 convolution kernel, convolution stride 1, 4 input channels, 64 output channels, and "Sigmoid" activation.
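To make these hyperparameters concrete, the following hedged snippet (PyTorch assumed; padding, the identity of the first block's input, and all names are illustrative assumptions, not part of the claim) instantiates the stated convolution configurations and chains three enhanced residual blocks, each conditioned on its own set of aligned features.

# Hypothetical sketch: the convolution settings stated above, and the chaining
# of three enhanced residual blocks (PyTorch assumed; padding=1 is an assumption
# made so that spatial size is preserved).
import torch
import torch.nn as nn

# tenth/eleventh convs: 3x3, stride 1, 64 -> 64, no activation
sft_conv = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1)
# twelfth/thirteenth convs: 3x3, stride 1, 64 -> 64, ReLU
sa_conv = nn.Sequential(nn.Conv2d(64, 64, 3, stride=1, padding=1), nn.ReLU(inplace=True))
# fourteenth conv: 1x1, stride 1, 64 -> 4, ReLU
ca_down = nn.Sequential(nn.Conv2d(64, 4, 1, stride=1), nn.ReLU(inplace=True))
# fifteenth conv: 1x1, stride 1, 4 -> 64, Sigmoid
ca_up = nn.Sequential(nn.Conv2d(4, 64, 1, stride=1), nn.Sigmoid())

# Chaining the three enhanced residual blocks (see the sketch above):
blocks = nn.ModuleList([EnhancedResidualBlock(64) for _ in range(3)])

def light_field_feature_enhancement(f_shallow, f_align_1, f_align_2, f_align_3):
    """f_shallow stands in for the first block's input (an assumption here);
    returns the set corresponding to F_En,3."""
    f_en = f_shallow
    for block, f_align in zip(blocks, (f_align_1, f_align_2, f_align_3)):
        f_en = block(f_en, f_align)
    return f_en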
CN202111405987.1A 2021-11-24 2021-11-24 Light field image space super-resolution reconstruction method Pending CN114359041A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111405987.1A CN114359041A (en) 2021-11-24 2021-11-24 Light field image space super-resolution reconstruction method


Publications (1)

Publication Number Publication Date
CN114359041A true CN114359041A (en) 2022-04-15

Family

ID=81096214

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111405987.1A Pending CN114359041A (en) 2021-11-24 2021-11-24 Light field image space super-resolution reconstruction method

Country Status (1)

Country Link
CN (1) CN114359041A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200402205A1 (en) * 2019-06-18 2020-12-24 Huawei Technologies Co., Ltd. Real-time video ultra resolution
CN112381711A (en) * 2020-10-27 2021-02-19 深圳大学 Light field image reconstruction model training and rapid super-resolution reconstruction method
CN112950475A (en) * 2021-03-05 2021-06-11 北京工业大学 Light field super-resolution reconstruction method based on residual learning and spatial transformation network
CN113139898A (en) * 2021-03-24 2021-07-20 宁波大学 Light field image super-resolution reconstruction method based on frequency domain analysis and deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
邓武 et al.: "融合全局与局部视角的光场超分辨率重建" [Light field super-resolution reconstruction fusing global and local viewpoints], 《计算机应用研究》 [Application Research of Computers], vol. 36, no. 5, 31 May 2019 (2019-05-31), pages 1549-1559 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116309067A (en) * 2023-03-21 2023-06-23 安徽易刚信息技术有限公司 Light field image space super-resolution method
CN116309067B (en) * 2023-03-21 2023-09-29 安徽易刚信息技术有限公司 Light field image space super-resolution method
CN117475088A (en) * 2023-12-25 2024-01-30 浙江优众新材料科技有限公司 Light field reconstruction model training method based on polar plane attention and related equipment
CN117475088B (en) * 2023-12-25 2024-03-19 浙江优众新材料科技有限公司 Light field reconstruction model training method based on polar plane attention and related equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination